bioinformatics database-mysql-php

Bioinformatics database – A Step-by-Step Guide to Database Creation using MySQL and PHP

September 26, 2023 Off By admin
Shares

Table of Contents

MySQL Installation:

For Windows:

  1. Download MySQL Installer:
    • Visit the MySQL official download page
    • Download the MySQL Installer community version.
  2. Install MySQL:
    • Run the installer that you downloaded.
    • Follow the on-screen instructions to install MySQL, choosing the “Developer Default” to install the MySQL server and other MySQL tools.
  3. Configure MySQL:
    • Once installation is complete, configure the MySQL server. Use the default settings for a development machine.
    • Set the root password when prompted. Make sure to remember it as you’ll need it to access MySQL.

For Linux:

  1. Update Packages:
    shell
    sudo apt-get update
  2. Install MySQL Server:
    shell
    sudo apt-get install mysql-server
  3. Secure MySQL Installation:
    shell
    sudo mysql_secure_installation
    • You’ll be prompted to set up a password for the MySQL root user. Remember this password.

Starting MySQL:

For Windows:

  • Open MySQL Command Line Client from Start Menu.
  • Enter your password when prompted.

For Linux:

shell
mysql -u root -p
  • Enter your password when prompted.

Creating a Database and Tables:

1. Create a Database:

sql
CREATE DATABASE DiseaseDB;

2. Use the Database:

sql
USE DiseaseDB;

3. Create a Table:

For the bioinformatics database, we’ll create a table called Proteins to store information about proteins related to a particular disease.

sql
CREATE TABLE Proteins (
UniProtID VARCHAR(15) PRIMARY KEY,
ProteinName VARCHAR(255) NOT NULL,
ProteinSequence TEXT NOT NULL,
ExternalPDBID VARCHAR(15),
ExternalPubMedID INT
);

4. Insert Data:

Insert some example data into the Proteins table.

sql
INSERT INTO Proteins (UniProtID, ProteinName, ProteinSequence, ExternalPDBID, ExternalPubMedID)
VALUES
('P12345', 'Example Protein', 'MTEITAAMVKELRESTGAG', '1XYZ', 123456789),
('P67890', 'Another Protein', 'MQDRVKRPMNAFIVWSRDQ', '2ABC', 987654321);

5. Retrieve Data:

To retrieve data from the Proteins table, you can use the SELECT statement.

sql
SELECT * FROM Proteins;

Advanced MySQL Operations:

1. Creating Index:

Indexes are used to retrieve data from the database more quickly. For example, you can create an index on the ProteinName column.

sql
CREATE INDEX idx_protein_name ON Proteins (ProteinName);

2. Altering Tables:

If you need to add a new column to the Proteins table, you can use the ALTER TABLE statement.

sql
ALTER TABLE Proteins ADD COLUMN GeneName VARCHAR(255);

3. Joining Tables:

If you have another table, and you want to perform a JOIN operation, you can do it as follows. Suppose there is another table named Diseases:

sql
CREATE TABLE Diseases (
DiseaseID INT PRIMARY KEY,
DiseaseName VARCHAR(255) NOT NULL,
UniProtID VARCHAR(15),
FOREIGN KEY (UniProtID) REFERENCES Proteins (UniProtID)
);

Now, if you want to join Proteins and Diseases on UniProtID, you can do the following:

sql
SELECT * FROM Proteins
JOIN Diseases ON Proteins.UniProtID = Diseases.UniProtID;

MySQL Maintenance and Optimization:

  1. Optimize Table: To reclaim unused space and to defragment the data file, you can use the OPTIMIZE TABLE statement.
    sql
    OPTIMIZE TABLE Proteins;
  2. Backup Database: It’s crucial to regularly back up your database.
    shell
    mysqldump -u root -p DiseaseDB > DiseaseDB_backup.sql
    • Enter your MySQL password when prompted.
  3. Restore Database: If needed, you can restore your database from a backup file.
    shell
    mysql -u root -p DiseaseDB < DiseaseDB_backup.sql
    • Enter your MySQL password when prompted.

This is a concise overview to help you get started with MySQL using a bioinformatics example. For more complex and advanced usage, you can refer to the official MySQL documentation or explore specific MySQL tutorials focused on advanced concepts and optimizations.

Step 1: Data Input

Once you have your tables set up, you’ll likely want to input additional data, or modify existing data. Here’s how you can do that, continuing with our bioinformatics example:

Insert Additional Data

sql
INSERT INTO Proteins (UniProtID, ProteinName, ProteinSequence, ExternalPDBID, ExternalPubMedID)
VALUES ('P34567', 'Protein Three', 'SDKJHSKJDHKSJHDKS', '3DEF', 456123789);

If you have many rows to insert, you can use batch insert:

sql
INSERT INTO Proteins (UniProtID, ProteinName, ProteinSequence, ExternalPDBID, ExternalPubMedID)
VALUES
('P11111', 'Protein Four', 'AIUEOAIUEOAIUEO', '4GHI', 112233445),
('P22222', 'Protein Five', 'BCDGEFBCDGEFBCDG', '5JKL', 556677889);

Update Data

If you want to update existing rows, you can use the UPDATE statement. For example, to add a GeneName to a row in the Proteins table:

sql
UPDATE Proteins SET GeneName = 'GeneOne' WHERE UniProtID = 'P12345';

Delete Data

To delete rows, use the DELETE statement. For example, to delete a protein with a specific UniProtID:

sql
DELETE FROM Proteins WHERE UniProtID = 'P67890';

Step 2: Complex Queries

MySQL allows you to perform complex queries to retrieve and manipulate your data.

Retrieve Specific Columns:

sql
SELECT UniProtID, ProteinName FROM Proteins WHERE ExternalPubMedID > 500000000;

Sorting

You can also sort the results using the ORDER BY clause:

sql
SELECT * FROM Proteins ORDER BY ProteinName DESC;

Aggregations

MySQL provides several aggregation functions like COUNT, AVG, SUM, MIN, and MAX to perform operations on your data.

sql
SELECT COUNT(*) FROM Proteins WHERE ExternalPubMedID > 500000000;

Step 3: User Management and Permissions

MySQL allows you to create different users and grant them specific permissions to secure your database.

Create User

To create a new user:

sql
CREATE USER 'bio_user'@'localhost' IDENTIFIED BY 'password';

Grant Permissions

To grant permissions to a user:

sql
GRANT SELECT, INSERT, UPDATE ON DiseaseDB.* TO 'bio_user'@'localhost';

Revoke Permissions

To revoke permissions from a user:

sql
REVOKE INSERT, UPDATE ON DiseaseDB.* FROM 'bio_user'@'localhost';

Step 4: Performance Optimization

The performance of the MySQL server can be optimized by analyzing and optimizing queries and indexing strategy.

Query Execution Plan

You can analyze the query execution plan using the EXPLAIN keyword:

sql
EXPLAIN SELECT * FROM Proteins WHERE ProteinName = 'Example Protein';

Profiling

MySQL has a built-in profiler that you can use to analyze the performance of your queries.

sql
SET profiling = 1;
SELECT * FROM Proteins WHERE ProteinName = 'Example Protein';
SHOW PROFILES;

Indexing Strategy

Based on the query execution plan and profiling results, you can decide which columns need indexing. For example, if you query the ProteinName column frequently, consider adding an index:

sql
CREATE INDEX idx_name ON Proteins (ProteinName);

Step 5: Database Maintenance

Database maintenance is crucial to ensure the stability and performance of your system.

Table Maintenance

MySQL provides various table maintenance statements like CHECK TABLE, REPAIR TABLE, and OPTIMIZE TABLE.

sql
CHECK TABLE Proteins;
REPAIR TABLE Proteins;
OPTIMIZE TABLE Proteins;

Update Statistics

MySQL uses stored key distribution statistics when deciding the order in which to join tables when you are performing a join on something other than a constant. Those statistics can be updated using the ANALYZE TABLE statement.

sql
ANALYZE TABLE Proteins;

Step 6: Backup and Restore

Finally, regularly back up your database to prevent data loss, and know how to restore it if needed.

Backup with mysqldump:

shell
mysqldump -u root -p DiseaseDB > DiseaseDB_backup.sql

Restore from Backup:

shell
mysql -u root -p DiseaseDB < DiseaseDB_backup.sql

This extensive tutorial will give you a deeper understanding of MySQL from beginner level to advanced, and it should serve as a comprehensive guide for database management in a bioinformatics context.

Step 7: Advanced Query Techniques

Beyond the basics, understanding advanced query techniques can be beneficial in extracting meaningful information from your bioinformatics database effectively.

a) Sub-Queries and Nested Selects:

Sub-queries allow you to perform operations on the output of another query.

sql
SELECT ProteinName
FROM Proteins
WHERE UniProtID IN (SELECT UniProtID FROM Diseases WHERE DiseaseName = 'DiseaseOne');

b) Group By and Having Clauses:

These clauses are essential for performing complex aggregations and filtrations.

sql
SELECT ExternalPDBID, COUNT(*)
FROM Proteins
GROUP BY ExternalPDBID
HAVING COUNT(*) > 1;

Step 8: Scripting and Automation

Often, you would need to automate your database interactions using scripting languages like Python.

a) Python MySQL Connection:

python
import mysql.connector

mydb = mysql.connector.connect(
host="localhost",
user="bio_user",
password="password",
database="DiseaseDB"
)

mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM Proteins")
myresult = mycursor.fetchall()

for x in myresult:
print(x)

b) Data Insertion through Python:

python
import mysql.connector

mydb = mysql.connector.connect(
host="localhost",
user="bio_user",
password="password",
database="DiseaseDB"
)

mycursor = mydb.cursor()

sql = "INSERT INTO Proteins (UniProtID, ProteinName, ProteinSequence, ExternalPDBID, ExternalPubMedID) VALUES (%s, %s, %s, %s, %s)"
val = ("P43210", "Protein Six", "HSTGVCYTK", "6MNO", 123789456)
mycursor.execute(sql, val)

mydb.commit()

print(mycursor.rowcount, "record inserted.")

Step 9: Error Handling and Logging

To ensure that your system is robust and secure, implement proper error handling and logging for your database interactions.

a) Error Handling in Python:

python
import mysql.connector
from mysql.connector import Error

try:
connection = mysql.connector.connect(
host="localhost",
user="bio_user",
password="password",
database="DiseaseDB"
)

if connection.is_connected():
cursor = connection.cursor(dictionary=True)
cursor.execute("SELECT * FROM Proteins")
records = cursor.fetchall()
for row in records:
print(row)

except Error as e:
print("Error while connecting to MySQL", e)

finally:
if (connection.is_connected()):
cursor.close()
connection.close()
print("MySQL connection is closed")

b) Logging in Python:

You can use the logging library in Python to log errors or any other messages.

python
import logging

logging.basicConfig(filename='database.log', level=logging.ERROR)

try:
# Your database interaction code here
pass
except Exception as e:
logging.error("Error occurred: %s", e)

Step 10: Regular Audits and Updates

Lastly, periodically audit your database for any anomalies, and ensure that your MySQL server and any related software are kept up to date with security patches and updates.

a) Audits:

  • Regularly review user permissions and revoke any unnecessary privileges.
  • Check for any unusual data entries or anomalies in your database.

b) Updates:

  • Regularly update your MySQL server and any connected software to the latest versions to benefit from the latest features and security updates.
  • Review and update your database schemas as required, to adapt to changing requirements and optimize performance.

By implementing these practices and continuously learning more about MySQL and database management, you can maintain a robust, secure, and efficient bioinformatics database system.

To connect the MySQL database using PHP and Apache, you will need to set up a local development environment with a software package like XAMPP, which includes Apache, MySQL, and PHP. Here’s a step-by-step tutorial:

Step 1: Install XAMPP

  1. Download XAMPP from the official website: XAMPP Download Page.
  2. Follow the installation instructions to install XAMPP on your laptop.
  3. Once installed, open the XAMPP Control Panel and start the Apache and MySQL services.

Step 2: Connect PHP to MySQL Database

  1. Open a text editor and create a new PHP file. You can name it index.php.
  2. Write PHP code to connect to the MySQL database.
php
<?php
$servername = "localhost";
$username = "bio_user"; // your MySQL username
$password = "password"; // your MySQL password
$dbname = "DiseaseDB"; // your MySQL database name

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);

// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}
echo "Connected successfully";
?>

  1. Save this index.php file in the htdocs directory inside the directory where XAMPP is installed. For example, if XAMPP is installed in C:\xampp, save the index.php file in C:\xampp\htdocs.

Step 3: Test the Connection

  1. Open your web browser.
  2. Type http://localhost/index.php in the address bar and press Enter.
  3. If everything is set up correctly, you should see “Connected successfully” displayed in your browser.

Step 4: Fetching and Displaying Data

Once you have confirmed that your PHP file can connect to MySQL, you can extend your PHP code to fetch and display data from your database.

Edit the index.php file and add the following PHP code:

php
<?php
$sql = "SELECT UniProtID, ProteinName, ExternalPubMedID FROM Proteins";
$result = $conn->query($sql);

if ($result->num_rows > 0) {
// Output data of each row
while($row = $result->fetch_assoc()) {
echo "<br> UniProt ID: ". $row["UniProtID"]. " - Name: ". $row["ProteinName"]. " - PubMed ID: " . $row["ExternalPubMedID"] . "<br>";
}
} else {
echo "0 results";
}

$conn->close();
?>

Step 5: Display the Result

  1. Save your updated index.php file.
  2. Refresh your browser with http://localhost/index.php.
  3. The page should display the rows from your Proteins table, or “0 results” if the table is empty.

Step 6: Create an Interface

If you want to create a user interface with forms for interacting with the database, you can further expand your PHP code to handle form inputs and perform various operations like inserting, updating, and deleting records in the database.

For example, a simple HTML form to add new proteins can be as follows:

html
<form method="post" action="insert_protein.php">
<label for="uniProtID">UniProt ID:</label>
<input type="text" name="uniProtID" required><br>

<label for="proteinName">Protein Name:</label>
<input type="text" name="proteinName" required><br>

<label for="pubMedID">PubMed ID:</label>
<input type="text" name="pubMedID" required><br>

<input type="submit" value="Add Protein">
</form>

And insert_protein.php would look something like:

php
<?php
$servername = "localhost";
$username = "bio_user";
$password = "password";
$dbname = "DiseaseDB";

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);

// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}

$uniProtID = $_POST['uniProtID'];
$proteinName = $_POST['proteinName'];
$pubMedID = $_POST['pubMedID'];

$sql = "INSERT INTO Proteins (UniProtID, ProteinName, ExternalPubMedID) VALUES ('$uniProtID', '$proteinName', '$pubMedID')";

if ($conn->query($sql) === TRUE) {
echo "New record created successfully";
} else {
echo "Error: " . $sql . "<br>" . $conn->error;
}

$conn->close();
?>

Remember that the above examples are simplistic and do not sanitize user input or use prepared statements, which are crucial for security. Make sure to validate and sanitize user inputs and use prepared statements or parameterized queries to avoid SQL injection attacks when developing production applications.

By now, you should have a local Apache server running PHP connected to your MySQL bioinformatics database and can continue to build upon this foundation to suit your specific needs in bioinformatics.

Step 7: Enhancing Security and Interactivity

When working with PHP and databases, especially when developing applications that might be used in a production environment, consider the following aspects:

a) Use Prepared Statements and Parameterized Queries:

To avoid SQL injection attacks, always use prepared statements or parameterized queries when interacting with the database. Here’s an example of using prepared statements in PHP:

php
<?php
// Prepare and bind
$stmt = $conn->prepare("INSERT INTO Proteins (UniProtID, ProteinName, ExternalPubMedID) VALUES (?, ?, ?)");
$stmt->bind_param("sss", $uniProtID, $proteinName, $pubMedID);

// Set parameters and execute
$uniProtID = $_POST['uniProtID'];
$proteinName = $_POST['proteinName'];
$pubMedID = $_POST['pubMedID'];
$stmt->execute();

echo "New record created successfully";

$stmt->close();
$conn->close();
?>

b) Input Validation and Sanitization:

Always validate and sanitize user input before using it, especially if it’s used in database queries or displayed on web pages.

php
$uniProtID = mysqli_real_escape_string($conn, $_POST['uniProtID']);
$proteinName = mysqli_real_escape_string($conn, $_POST['proteinName']);
$pubMedID = mysqli_real_escape_string($conn, $_POST['pubMedID']);

c) Session Management:

For applications that require user authentication, proper session management is crucial.

php
session_start(); // Start the session at the beginning of your script
$_SESSION["username"] = "bio_user"; // Store session data

d) Error Handling:

Incorporate proper error handling to ensure that the application can fail gracefully, and no sensitive information is exposed to the users.

php
ini_set('display_errors', '0'); // Do not display errors to users
error_reporting(E_ALL); // Report all errors
log_errors = "On"; // Log errors

Step 8: Developing Interactive UI

To make user interaction more interactive and user-friendly, you can use JavaScript along with HTML and CSS to create dynamic content and responsive designs.

a) Create Responsive Designs:

Use CSS frameworks like Bootstrap to create responsive and modern designs.

html
<link href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" rel="stylesheet">

b) Use JavaScript for Dynamic Content:

Use JavaScript to make your web pages more interactive and to perform client-side validations.

html
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>

Step 9: API Integration

For accessing external services or databases like PubMed, you might need to interact with their APIs.

a) Use PHP for API Calls:

PHP can be used to make HTTP requests to external APIs and process the response.

php
$response = file_get_contents("https://api.example.com/data?param=value");
$data = json_decode($response);

b) Integrate API Data:

You can integrate the data received from external APIs with your application’s data and display it to the users or store it in your local database.

Step 10: Application Deployment

Once the application is fully developed and tested, it can be deployed to a web server. This process involves transferring your application files and setting up the environment on a remote server. It’s crucial to have proper security configurations in place in a production environment to protect sensitive data.

Step 11: Continuous Improvement

After deployment, keep monitoring the application’s performance and user feedback to make necessary improvements. Keep learning about new technologies and best practices to improve your skills and your application over time.

Step 12: Advanced Feature Implementation

a) Pagination:

If your database grows large, displaying all results on one page is not practical. You can implement pagination to display a limited number of results per page.

php
$limit = 10; // Number of entries to show in a page.
// Look for a GET variable page if not found default is 1.
if (isset($_GET["page"])) {
$pn = $_GET["page"];
}
else {
$pn=1;
};
$start_from = ($pn-1) * $limit;
$sql = "SELECT * FROM Proteins LIMIT $start_from, $limit";

b) Search and Filter:

Implementing a search and filter feature will allow users to easily find specific records.

php
$searchTerm = mysqli_real_escape_string($conn, $_POST['search']);
$sql = "SELECT * FROM Proteins WHERE ProteinName LIKE '%$searchTerm%'";

c) Data Visualization:

Integrate libraries like Chart.js or D3.js to visualize data in various formats such as tables, charts, and graphs.

html
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>

d) User Authentication and Authorization:

Implement user login functionality, and define roles and permissions to control access to various parts of your application.

php
if ($_SESSION['user_role'] != 'admin') {
die('Unauthorized access');
}

Step 13: Documentation and Code Comments

Thoroughly document your code, including inline comments, to explain the logic and flow of your application. This will help others (or even yourself, in the future) understand and maintain the codebase.

php
/**
* This function connects to the database and returns the connection object.
* @return mysqli $conn - The database connection object.
*/

function connectToDatabase() {
// ... function logic ...
}

Step 14: Automated Testing and Deployment

Develop automated tests to verify the correctness of your code and ensure that changes do not introduce bugs. Set up continuous integration and continuous deployment (CI/CD) pipelines to automate the testing and deployment processes.

shell
phpunit --configuration phpunit.xml

Step 15: Performance Optimization

Optimize the performance of your application by profiling and identifying bottlenecks, optimizing database queries, and using caching mechanisms.

php
$result = $cache->get('my_query');
if (!$result) {
$result = $db->query('SELECT * FROM Proteins');
$cache->set('my_query', $result);
}

Step 16: User Feedback and Iteration

Collect user feedback regularly and make iterative improvements to your application based on the feedback and emerging requirements.

Step 17: Continuous Learning and Upgrading

Stay updated with the latest technologies, security best practices, and industry trends. Regularly upgrade your application to use the latest libraries and frameworks and to address deprecated features and security vulnerabilities.

Conclusion

By adopting these advanced steps, you will not only have a functional and user-friendly application but also a well-maintained, optimized, and scalable solution. Each step is crucial for the development lifecycle, ensuring the longevity and reliability of your bioinformatics application. Regular upgrades, user feedback, and continuous learning are essential for keeping the application relevant and useful in the ever-evolving field of bioinformatics.

Shares