Bioinformatics database – A Step-by-Step Guide to Database Creation using MySQL and PHP
September 26, 2023Table of Contents
MySQL Installation:
For Windows:
- Download MySQL Installer:
- Visit the MySQL official download page
- Download the MySQL Installer community version.
- Install MySQL:
- Run the installer that you downloaded.
- Follow the on-screen instructions to install MySQL, choosing the “Developer Default” to install the MySQL server and other MySQL tools.
- Configure MySQL:
- Once installation is complete, configure the MySQL server. Use the default settings for a development machine.
- Set the root password when prompted. Make sure to remember it as you’ll need it to access MySQL.
For Linux:
- Update Packages:shell
sudo apt-get update
- Install MySQL Server:shell
sudo apt-get install mysql-server
- Secure MySQL Installation:shell
sudo mysql_secure_installation
- You’ll be prompted to set up a password for the MySQL root user. Remember this password.
Starting MySQL:
For Windows:
- Open MySQL Command Line Client from Start Menu.
- Enter your password when prompted.
For Linux:
mysql -u root -p
- Enter your password when prompted.
Creating a Database and Tables:
1. Create a Database:
CREATE DATABASE DiseaseDB;
2. Use the Database:
USE DiseaseDB;
3. Create a Table:
For the bioinformatics database, we’ll create a table called Proteins
to store information about proteins related to a particular disease.
CREATE TABLE Proteins (
UniProtID VARCHAR(15) PRIMARY KEY,
ProteinName VARCHAR(255) NOT NULL,
ProteinSequence TEXT NOT NULL,
ExternalPDBID VARCHAR(15),
ExternalPubMedID INT
);
4. Insert Data:
Insert some example data into the Proteins table.
INSERT INTO Proteins (UniProtID, ProteinName, ProteinSequence, ExternalPDBID, ExternalPubMedID)
VALUES
('P12345', 'Example Protein', 'MTEITAAMVKELRESTGAG', '1XYZ', 123456789),
('P67890', 'Another Protein', 'MQDRVKRPMNAFIVWSRDQ', '2ABC', 987654321);
5. Retrieve Data:
To retrieve data from the Proteins table, you can use the SELECT statement.
SELECT * FROM Proteins;
Advanced MySQL Operations:
1. Creating Index:
Indexes are used to retrieve data from the database more quickly. For example, you can create an index on the ProteinName
column.
CREATE INDEX idx_protein_name ON Proteins (ProteinName);
2. Altering Tables:
If you need to add a new column to the Proteins table, you can use the ALTER TABLE statement.
ALTER TABLE Proteins ADD COLUMN GeneName VARCHAR(255);
3. Joining Tables:
If you have another table, and you want to perform a JOIN operation, you can do it as follows. Suppose there is another table named Diseases
:
CREATE TABLE Diseases (
DiseaseID INT PRIMARY KEY,
DiseaseName VARCHAR(255) NOT NULL,
UniProtID VARCHAR(15),
FOREIGN KEY (UniProtID) REFERENCES Proteins (UniProtID)
);
Now, if you want to join Proteins
and Diseases
on UniProtID
, you can do the following:
SELECT * FROM Proteins
JOIN Diseases ON Proteins.UniProtID = Diseases.UniProtID;
MySQL Maintenance and Optimization:
- Optimize Table: To reclaim unused space and to defragment the data file, you can use the OPTIMIZE TABLE statement.sql
OPTIMIZE TABLE Proteins;
- Backup Database: It’s crucial to regularly back up your database.shell
mysqldump -u root -p DiseaseDB > DiseaseDB_backup.sql
- Enter your MySQL password when prompted.
- Restore Database: If needed, you can restore your database from a backup file.shell
mysql -u root -p DiseaseDB < DiseaseDB_backup.sql
- Enter your MySQL password when prompted.
This is a concise overview to help you get started with MySQL using a bioinformatics example. For more complex and advanced usage, you can refer to the official MySQL documentation or explore specific MySQL tutorials focused on advanced concepts and optimizations.
Step 1: Data Input
Once you have your tables set up, you’ll likely want to input additional data, or modify existing data. Here’s how you can do that, continuing with our bioinformatics example:
Insert Additional Data
INSERT INTO Proteins (UniProtID, ProteinName, ProteinSequence, ExternalPDBID, ExternalPubMedID)
VALUES ('P34567', 'Protein Three', 'SDKJHSKJDHKSJHDKS', '3DEF', 456123789);
If you have many rows to insert, you can use batch insert:
INSERT INTO Proteins (UniProtID, ProteinName, ProteinSequence, ExternalPDBID, ExternalPubMedID)
VALUES
('P11111', 'Protein Four', 'AIUEOAIUEOAIUEO', '4GHI', 112233445),
('P22222', 'Protein Five', 'BCDGEFBCDGEFBCDG', '5JKL', 556677889);
Update Data
If you want to update existing rows, you can use the UPDATE statement. For example, to add a GeneName to a row in the Proteins table:
UPDATE Proteins SET GeneName = 'GeneOne' WHERE UniProtID = 'P12345';
Delete Data
To delete rows, use the DELETE statement. For example, to delete a protein with a specific UniProtID:
DELETE FROM Proteins WHERE UniProtID = 'P67890';
Step 2: Complex Queries
MySQL allows you to perform complex queries to retrieve and manipulate your data.
Retrieve Specific Columns:
SELECT UniProtID, ProteinName FROM Proteins WHERE ExternalPubMedID > 500000000;
Sorting
You can also sort the results using the ORDER BY clause:
SELECT * FROM Proteins ORDER BY ProteinName DESC;
Aggregations
MySQL provides several aggregation functions like COUNT, AVG, SUM, MIN, and MAX to perform operations on your data.
SELECT COUNT(*) FROM Proteins WHERE ExternalPubMedID > 500000000;
Step 3: User Management and Permissions
MySQL allows you to create different users and grant them specific permissions to secure your database.
Create User
To create a new user:
CREATE USER 'bio_user'@'localhost' IDENTIFIED BY 'password';
Grant Permissions
To grant permissions to a user:
GRANT SELECT, INSERT, UPDATE ON DiseaseDB.* TO 'bio_user'@'localhost';
Revoke Permissions
To revoke permissions from a user:
REVOKE INSERT, UPDATE ON DiseaseDB.* FROM 'bio_user'@'localhost';
Step 4: Performance Optimization
The performance of the MySQL server can be optimized by analyzing and optimizing queries and indexing strategy.
Query Execution Plan
You can analyze the query execution plan using the EXPLAIN keyword:
EXPLAIN SELECT * FROM Proteins WHERE ProteinName = 'Example Protein';
Profiling
MySQL has a built-in profiler that you can use to analyze the performance of your queries.
SET profiling = 1;
SELECT * FROM Proteins WHERE ProteinName = 'Example Protein';
SHOW PROFILES;
Indexing Strategy
Based on the query execution plan and profiling results, you can decide which columns need indexing. For example, if you query the ProteinName column frequently, consider adding an index:
CREATE INDEX idx_name ON Proteins (ProteinName);
Step 5: Database Maintenance
Database maintenance is crucial to ensure the stability and performance of your system.
Table Maintenance
MySQL provides various table maintenance statements like CHECK TABLE, REPAIR TABLE, and OPTIMIZE TABLE.
CHECK TABLE Proteins;
REPAIR TABLE Proteins;
OPTIMIZE TABLE Proteins;
Update Statistics
MySQL uses stored key distribution statistics when deciding the order in which to join tables when you are performing a join on something other than a constant. Those statistics can be updated using the ANALYZE TABLE statement.
ANALYZE TABLE Proteins;
Step 6: Backup and Restore
Finally, regularly back up your database to prevent data loss, and know how to restore it if needed.
Backup with mysqldump:
mysqldump -u root -p DiseaseDB > DiseaseDB_backup.sql
Restore from Backup:
mysql -u root -p DiseaseDB < DiseaseDB_backup.sql
This extensive tutorial will give you a deeper understanding of MySQL from beginner level to advanced, and it should serve as a comprehensive guide for database management in a bioinformatics context.
Step 7: Advanced Query Techniques
Beyond the basics, understanding advanced query techniques can be beneficial in extracting meaningful information from your bioinformatics database effectively.
a) Sub-Queries and Nested Selects:
Sub-queries allow you to perform operations on the output of another query.
SELECT ProteinName
FROM Proteins
WHERE UniProtID IN (SELECT UniProtID FROM Diseases WHERE DiseaseName = 'DiseaseOne');
b) Group By and Having Clauses:
These clauses are essential for performing complex aggregations and filtrations.
SELECT ExternalPDBID, COUNT(*)
FROM Proteins
GROUP BY ExternalPDBID
HAVING COUNT(*) > 1;
Step 8: Scripting and Automation
Often, you would need to automate your database interactions using scripting languages like Python.
a) Python MySQL Connection:
import mysql.connectormydb = mysql.connector.connect(
host="localhost",
user="bio_user",
password="password",
database="DiseaseDB"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM Proteins")
myresult = mycursor.fetchall()
for x in myresult:
print(x)
b) Data Insertion through Python:
import mysql.connectormydb = mysql.connector.connect(
host="localhost",
user="bio_user",
password="password",
database="DiseaseDB"
)
mycursor = mydb.cursor()
sql = "INSERT INTO Proteins (UniProtID, ProteinName, ProteinSequence, ExternalPDBID, ExternalPubMedID) VALUES (%s, %s, %s, %s, %s)"
val = ("P43210", "Protein Six", "HSTGVCYTK", "6MNO", 123789456)
mycursor.execute(sql, val)
mydb.commit()
print(mycursor.rowcount, "record inserted.")
Step 9: Error Handling and Logging
To ensure that your system is robust and secure, implement proper error handling and logging for your database interactions.
a) Error Handling in Python:
import mysql.connector
from mysql.connector import Errortry:
connection = mysql.connector.connect(
host="localhost",
user="bio_user",
password="password",
database="DiseaseDB"
)
if connection.is_connected():
cursor = connection.cursor(dictionary=True)
cursor.execute("SELECT * FROM Proteins")
records = cursor.fetchall()
for row in records:
print(row)
except Error as e:
print("Error while connecting to MySQL", e)
finally:
if (connection.is_connected()):
cursor.close()
connection.close()
print("MySQL connection is closed")
b) Logging in Python:
You can use the logging
library in Python to log errors or any other messages.
import logginglogging.basicConfig(filename='database.log', level=logging.ERROR)
try:
# Your database interaction code here
pass
except Exception as e:
logging.error("Error occurred: %s", e)
Step 10: Regular Audits and Updates
Lastly, periodically audit your database for any anomalies, and ensure that your MySQL server and any related software are kept up to date with security patches and updates.
a) Audits:
- Regularly review user permissions and revoke any unnecessary privileges.
- Check for any unusual data entries or anomalies in your database.
b) Updates:
- Regularly update your MySQL server and any connected software to the latest versions to benefit from the latest features and security updates.
- Review and update your database schemas as required, to adapt to changing requirements and optimize performance.
By implementing these practices and continuously learning more about MySQL and database management, you can maintain a robust, secure, and efficient bioinformatics database system.
To connect the MySQL database using PHP and Apache, you will need to set up a local development environment with a software package like XAMPP, which includes Apache, MySQL, and PHP. Here’s a step-by-step tutorial:
Step 1: Install XAMPP
- Download XAMPP from the official website: XAMPP Download Page.
- Follow the installation instructions to install XAMPP on your laptop.
- Once installed, open the XAMPP Control Panel and start the Apache and MySQL services.
Step 2: Connect PHP to MySQL Database
- Open a text editor and create a new PHP file. You can name it
index.php
. - Write PHP code to connect to the MySQL database.
$servername = "localhost";
$username = "bio_user"; // your MySQL username
$password = "password"; // your MySQL password
$dbname = "DiseaseDB"; // your MySQL database name// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}
echo "Connected successfully";
- Save this
index.php
file in thehtdocs
directory inside the directory where XAMPP is installed. For example, if XAMPP is installed inC:\xampp
, save theindex.php
file inC:\xampp\htdocs
.
Step 3: Test the Connection
- Open your web browser.
- Type
http://localhost/index.php
in the address bar and press Enter. - If everything is set up correctly, you should see “Connected successfully” displayed in your browser.
Step 4: Fetching and Displaying Data
Once you have confirmed that your PHP file can connect to MySQL, you can extend your PHP code to fetch and display data from your database.
Edit the index.php
file and add the following PHP code:
$sql = "SELECT UniProtID, ProteinName, ExternalPubMedID FROM Proteins";
$result = $conn->query($sql);if ($result->num_rows > 0) {
// Output data of each row
while($row = $result->fetch_assoc()) {
echo "<br> UniProt ID: ". $row["UniProtID"]. " - Name: ". $row["ProteinName"]. " - PubMed ID: " . $row["ExternalPubMedID"] . "<br>";
}
} else {
echo "0 results";
}
$conn->close();
Step 5: Display the Result
- Save your updated
index.php
file. - Refresh your browser with
http://localhost/index.php
. - The page should display the rows from your Proteins table, or “0 results” if the table is empty.
Step 6: Create an Interface
If you want to create a user interface with forms for interacting with the database, you can further expand your PHP code to handle form inputs and perform various operations like inserting, updating, and deleting records in the database.
For example, a simple HTML form to add new proteins can be as follows:
<form method="post" action="insert_protein.php">
<label for="uniProtID">UniProt ID:</label>
<input type="text" name="uniProtID" required><br><label for="proteinName">Protein Name:</label>
<input type="text" name="proteinName" required><br>
<label for="pubMedID">PubMed ID:</label>
<input type="text" name="pubMedID" required><br>
<input type="submit" value="Add Protein">
</form>
And insert_protein.php
would look something like:
$servername = "localhost";
$username = "bio_user";
$password = "password";
$dbname = "DiseaseDB";// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}
$uniProtID = $_POST['uniProtID'];
$proteinName = $_POST['proteinName'];
$pubMedID = $_POST['pubMedID'];
$sql = "INSERT INTO Proteins (UniProtID, ProteinName, ExternalPubMedID) VALUES ('$uniProtID', '$proteinName', '$pubMedID')";
if ($conn->query($sql) === TRUE) {
echo "New record created successfully";
} else {
echo "Error: " . $sql . "<br>" . $conn->error;
}
$conn->close();
Remember that the above examples are simplistic and do not sanitize user input or use prepared statements, which are crucial for security. Make sure to validate and sanitize user inputs and use prepared statements or parameterized queries to avoid SQL injection attacks when developing production applications.
By now, you should have a local Apache server running PHP connected to your MySQL bioinformatics database and can continue to build upon this foundation to suit your specific needs in bioinformatics.
Step 7: Enhancing Security and Interactivity
When working with PHP and databases, especially when developing applications that might be used in a production environment, consider the following aspects:
a) Use Prepared Statements and Parameterized Queries:
To avoid SQL injection attacks, always use prepared statements or parameterized queries when interacting with the database. Here’s an example of using prepared statements in PHP:
// Prepare and bind
$stmt = $conn->prepare("INSERT INTO Proteins (UniProtID, ProteinName, ExternalPubMedID) VALUES (?, ?, ?)");
$stmt->bind_param("sss", $uniProtID, $proteinName, $pubMedID);// Set parameters and execute
$uniProtID = $_POST['uniProtID'];
$proteinName = $_POST['proteinName'];
$pubMedID = $_POST['pubMedID'];
$stmt->execute();
echo "New record created successfully";
$stmt->close();
$conn->close();
b) Input Validation and Sanitization:
Always validate and sanitize user input before using it, especially if it’s used in database queries or displayed on web pages.
$uniProtID = mysqli_real_escape_string($conn, $_POST['uniProtID']);
$proteinName = mysqli_real_escape_string($conn, $_POST['proteinName']);
$pubMedID = mysqli_real_escape_string($conn, $_POST['pubMedID']);
c) Session Management:
For applications that require user authentication, proper session management is crucial.
session_start(); // Start the session at the beginning of your script
$_SESSION["username"] = "bio_user"; // Store session data
d) Error Handling:
Incorporate proper error handling to ensure that the application can fail gracefully, and no sensitive information is exposed to the users.
ini_set('display_errors', '0'); // Do not display errors to users
error_reporting(E_ALL); // Report all errors
log_errors = "On"; // Log errors
Step 8: Developing Interactive UI
To make user interaction more interactive and user-friendly, you can use JavaScript along with HTML and CSS to create dynamic content and responsive designs.
a) Create Responsive Designs:
Use CSS frameworks like Bootstrap to create responsive and modern designs.
<link href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" rel="stylesheet">
b) Use JavaScript for Dynamic Content:
Use JavaScript to make your web pages more interactive and to perform client-side validations.
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>
Step 9: API Integration
For accessing external services or databases like PubMed, you might need to interact with their APIs.
a) Use PHP for API Calls:
PHP can be used to make HTTP requests to external APIs and process the response.
$response = file_get_contents("https://api.example.com/data?param=value");
$data = json_decode($response);
b) Integrate API Data:
You can integrate the data received from external APIs with your application’s data and display it to the users or store it in your local database.
Step 10: Application Deployment
Once the application is fully developed and tested, it can be deployed to a web server. This process involves transferring your application files and setting up the environment on a remote server. It’s crucial to have proper security configurations in place in a production environment to protect sensitive data.
Step 11: Continuous Improvement
After deployment, keep monitoring the application’s performance and user feedback to make necessary improvements. Keep learning about new technologies and best practices to improve your skills and your application over time.
Step 12: Advanced Feature Implementation
a) Pagination:
If your database grows large, displaying all results on one page is not practical. You can implement pagination to display a limited number of results per page.
$limit = 10; // Number of entries to show in a page.
// Look for a GET variable page if not found default is 1.
if (isset($_GET["page"])) {
$pn = $_GET["page"];
}
else {
$pn=1;
};
$start_from = ($pn-1) * $limit;
$sql = "SELECT * FROM Proteins LIMIT $start_from, $limit";
b) Search and Filter:
Implementing a search and filter feature will allow users to easily find specific records.
$searchTerm = mysqli_real_escape_string($conn, $_POST['search']);
$sql = "SELECT * FROM Proteins WHERE ProteinName LIKE '%$searchTerm%'";
c) Data Visualization:
Integrate libraries like Chart.js or D3.js to visualize data in various formats such as tables, charts, and graphs.
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
d) User Authentication and Authorization:
Implement user login functionality, and define roles and permissions to control access to various parts of your application.
if ($_SESSION['user_role'] != 'admin') {
die('Unauthorized access');
}
Step 13: Documentation and Code Comments
Thoroughly document your code, including inline comments, to explain the logic and flow of your application. This will help others (or even yourself, in the future) understand and maintain the codebase.
/**
* This function connects to the database and returns the connection object.
* @return mysqli $conn - The database connection object.
*/
function connectToDatabase() {
// ... function logic ...
}
Step 14: Automated Testing and Deployment
Develop automated tests to verify the correctness of your code and ensure that changes do not introduce bugs. Set up continuous integration and continuous deployment (CI/CD) pipelines to automate the testing and deployment processes.
phpunit --configuration phpunit.xml
Step 15: Performance Optimization
Optimize the performance of your application by profiling and identifying bottlenecks, optimizing database queries, and using caching mechanisms.
$result = $cache->get('my_query');
if (!$result) {
$result = $db->query('SELECT * FROM Proteins');
$cache->set('my_query', $result);
}
Step 16: User Feedback and Iteration
Collect user feedback regularly and make iterative improvements to your application based on the feedback and emerging requirements.
Step 17: Continuous Learning and Upgrading
Stay updated with the latest technologies, security best practices, and industry trends. Regularly upgrade your application to use the latest libraries and frameworks and to address deprecated features and security vulnerabilities.
Conclusion
By adopting these advanced steps, you will not only have a functional and user-friendly application but also a well-maintained, optimized, and scalable solution. Each step is crucial for the development lifecycle, ensuring the longevity and reliability of your bioinformatics application. Regular upgrades, user feedback, and continuous learning are essential for keeping the application relevant and useful in the ever-evolving field of bioinformatics.