bioinformatics lab

A Step-by-Step Tutorial to Analyzing and Interpreting Next-Generation Sequencing Data

September 27, 2023 Off By admin
Shares

Step 1: Downloading Sample Dataset

a. Download a Viral Genome

Visit NCBI’s Virus Database, and search for a simple viral genome, e.g., PhiX174. Download its genome in FASTA format and note the path.

Step 2: Setup Your Environment

a. Install the Required Software

shell
sudo apt-get update
sudo apt-get install -y bowtie2 samtools bedtools r-base

b. Create a Working Directory

shell
mkdir ngs_analysis
cd ngs_analysis

c. Move the Downloaded Genome to the Working Directory

shell
mv /path_to_your_downloaded_file/phiX174.fasta .

Step 3: Sequence Alignment

a. Build the Bowtie2 Index

shell
bowtie2-build phiX174.fasta phiX174_index

b. Download Sample Reads

For the tutorial’s purpose, we will assume you have sample reads in FASTQ format named sample.fastq. Move it to your working directory.

c. Run the Bowtie2 Alignment

shell
bowtie2 -x phiX174_index -U sample.fastq -S output.sam

Step 4: Process Alignment Files

a. Convert SAM to BAM

shell
samtools view -bS output.sam > output.bam

b. Sort and Index the BAM File

shell
samtools sort output.bam -o sorted_output.bam
samtools index sorted_output.bam

We will also go through the variant analysis in more detail and explore how to use Python or R to generate meaningful plots.

Step 5: Variant Analysis and Visualization

a. Variant Filtering

After calling variants using freebayes, the next step is to filter the variants in the VCF file. This can be done using vcffilter, a tool that comes with vcflib. Install it as follows:

shell
sudo apt-get install vcflib

Now, filter the variants:

shell
vcffilter -f "QUAL > 20" variants.vcf > filtered_variants.vcf

This command will keep only variants with a quality score above 20.

Step 6: Interpretation of Variants

a. Further Exploration of Variants

After annotating your variants, you should further explore them based on their annotations to understand their biological impact.

Using Python:
python
# Assuming you have a 'TYPE' column specifying the variant type
variant_types = df['TYPE'].value_counts()
print(variant_types)

# If you have an 'EFFECT' column specifying the effect of variants
variant_effects = df['EFFECT'].value_counts()
print(variant_effects)

Using R:
R
# Assuming vcf object holds the variant data
variantTypes <- table(elementMetadata(vcf)$TYPE)
print(variantTypes)

variantEffects <- table(elementMetadata(vcf)$EFFECT)
print(variantEffects)

Step 7: Advanced Visualization

a. Visualization of Variant Effects and Types

You can visualize the different types and effects of variants through bar plots or pie charts.

Using Python:
python
import matplotlib.pyplot as plt

# Visualizing variant types
variant_types.plot(kind='bar', color='teal', alpha=0.7)
plt.title('Variant Types')
plt.ylabel('Count')
plt.show()

# Visualizing variant effects
variant_effects.plot(kind='pie', autopct='%1.1f%%', startangle=140, shadow=True)
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Variant Effects')
plt.show()

Using R:
R
library(ggplot2)

# Visualizing variant types
ggplot(data=data.frame(variantTypes), aes(x=Var1, y=Freq)) +
geom_bar(stat='identity', fill='teal', alpha=0.7) +
labs(title="Variant Types", x="Type", y="Count") + theme_minimal()

# Visualizing variant effects
library(plotly)

effects_df <- as.data.frame(variantEffects)
plot_ly(effects_df, labels = ~Var1, values = ~Freq, type = 'pie') %>%
layout(title = 'Variant Effects', showlegend = T)

Step 8: Biological Interpretation

a. Pathway Analysis and Clinical Relevance:

i. DAVID:
  1. List your variant genes and upload them to DAVID for enrichment analysis.
  2. Study the enriched pathways or biological processes in your gene list, which might give insight into the biological context of your findings.
ii. ClinVar and gnomAD:
  • Use ClinVar to find out if any of your variants are known to be associated with any disease.
  • Use gnomAD to study the population frequency of your variants.

Step 9: Comprehensive Report Creation

a. Organize Your Findings:

  • Introduction: Background of your study, and the importance of the study.
  • Objective: Clearly define what you intend to achieve with this study.
  • Methods: Clearly describe every step, the tools used, and their parameters.
  • Results: Use tables, charts, graphs to represent your findings.
  • Discussion: Explain the importance of your findings and their implications.

b. Utilize Visual Aids:

  • Use appropriate visual aids like tables, charts, and graphs to represent your findings effectively.
  • Properly label each visual aid and reference them appropriately in the text.

c. Review and Edit:

  • Review the report multiple times to ensure clarity, coherence, and logical flow.
  • Edit ruthlessly to ensure brevity while maintaining essence and clarity.

Step 10: Review and Finalize Report

a. Peer Review:

  • Get colleagues or mentors to review your report.
  • Incorporate the feedback and finalize the report.

b. Submission-Ready:

  • Make sure the report is in the required format if you are aiming for publication.
  • Make necessary adjustments as per journal guidelines.

Step 11: Share Your Findings

a. Presentation:

  • Create a concise and clear presentation, highlighting the key aspects of your work.
  • Practice it to ensure smooth delivery.

b. Publications:

  • Select a suitable journal.
  • Prepare your manuscript as per the journal’s guideline and submit it.

c. Data Sharing:

  • Consider depositing your data and code in a public repository like Zenodo or GitHub.

Step 12: Reviewers’ Comments and Publication

a. Address Reviewers’ Comments:

  • Once submitted, you may receive comments and suggestions from peer reviewers.
  • Address all comments diligently, make necessary changes, and resubmit the manuscript.

b. Publication:

  • Once accepted, go through the proof carefully before it goes to publication.
  • Share your published work with your peers and promote it for maximum reach.

Example of Python Code for Visuals:

python
import matplotlib.pyplot as plt

# Example: Visualizing variant impact categories
impact_categories = df['IMPACT'].value_counts()
impact_categories.plot(kind='barh', color='skyblue')
plt.title('Variant Impact Categories')
plt.xlabel('Count')
plt.ylabel('Impact Category')
plt.show()

Example of R Code for Visuals:

R
library(ggplot2)

# Example: Visualizing variant impact categories
impactCategories <- table(elementMetadata(vcf)$IMPACT)
ggplot(data=as.data.frame(impactCategories), aes(x=Var1, y=Freq)) +
geom_bar(stat='identity', fill='skyblue') +
labs(title="Variant Impact Categories", x="Impact Category", y="Count") + theme_minimal()

Remember, the steps involved may vary depending on the specifics of the study and the requirements of the journal if you are aiming to publish your work.

Shares