How to Download FASTQ Files from ENA and Use Them with FastQC, Kallisto, and Salmon
November 20, 2023
Step 1: Access the European Nucleotide Archive (ENA) and Locate the Project
- Go to the ENA Browser page for the project: https://www.ebi.ac.uk/ena/browser/view/PRJEB31975
- Note the project's accession code (in this case, “PRJEB31975”).
Step 2: Use SRA Explorer to Obtain Download Links
- Open https://sra-explorer.info/ in your web browser.
- Enter the accession code (e.g., “PRJEB31975”) and press Enter.
- The page will display a list of files associated with the accession. Copy the FASTQ download URLs into a text file named “files.txt”, one URL per line, for use in the next step.
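If you prefer to build files.txt from the command line instead of SRA Explorer, the sketch below is one option. It assumes that ENA's Portal API `filereport` endpoint, queried with `result=read_run&fields=fastq_ftp&format=tsv`, returns a tab-separated table with a header row and a `fastq_ftp` column. It also assumes each entry in that column is a semicolon-separated list of host paths without a URL scheme. The helper name `ena_tsv_to_urls` is hypothetical; it prints one `ftp://` URL per line, the format the download step expects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an ENA filereport TSV on stdin and print one
# download URL per line. Assumes a header row containing a "fastq_ftp" column
# whose values are ';'-separated paths without a scheme, e.g.
#   ftp.sra.ebi.ac.uk/vol1/fastq/ERR.../ERR..._1.fastq.gz
ena_tsv_to_urls() {
  awk -F'\t' '
    { sub(/\r$/, "") }                  # tolerate CRLF line endings
    NR == 1 {
      # Locate the fastq_ftp column by name rather than by position
      for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i
      next
    }
    col && $col != "" {
      # Paired-end runs list two files separated by ";"
      n = split($col, paths, ";")
      for (j = 1; j <= n; j++) if (paths[j] != "") print "ftp://" paths[j]
    }'
}

# Example (requires network access; the query parameters are assumptions):
# curl -s "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=PRJEB31975&result=read_run&fields=fastq_ftp&format=tsv" \
#   | ena_tsv_to_urls > files.txt
```

Looking the column up by name keeps the helper working even if the report includes extra columns, such as the run accession, before `fastq_ftp`.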
Step 3: Download FASTQ Files Using wget and GNU Parallel (Unix/Mac)
- Open a terminal on your Unix/Mac system.
- Navigate to the directory where you want to download the files.
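- Use the following command to download the files in parallel:
```bash
# Download the URLs in files.txt (one per line) two at a time; wget -c resumes partial files
parallel -j 2 "wget -c {}" < files.txt
```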
- Adjust the value after `-j` (the number of simultaneous downloads) based on your internet connection and hard drive capabilities.
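Downloads of this size are often interrupted. Here is a minimal sketch of a hypothetical `missing_urls` helper that prints only the URLs in files.txt whose files are not yet present, so a re-run fetches just what is missing. It assumes each file is saved under the last path component of its URL, which is wget's default naming when `-O` is not given. A partially downloaded file still counts as present, so this check does not replace verifying file sizes or checksums.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print URLs from a list (one per line) whose target
# file, named after the last path component of the URL, does not exist yet
# in the given directory (default: current directory).
missing_urls() {
  local url_file=$1 dir=${2:-.} url
  # "|| [ -n "$url" ]" also handles a last line without a trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue                  # skip blank lines
    [ -e "$dir/${url##*/}" ] || printf '%s\n' "$url"
  done < "$url_file"
}

# Example: download only what is still missing, two files at a time
# missing_urls files.txt | parallel -j 2 "wget {}"
```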
Step 4: Check Total Data Size and Ensure Adequate Resources
- Visit the NCBI Traces page for the accession: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJEB31975&o=acc_s%3Aa
- Verify the total size of the data set (more than 580 GB) and make sure you have enough storage space and bandwidth before attempting to download it.
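You can also check free space from the terminal before starting. Here is a minimal sketch of a hypothetical `has_space_gb` helper. It reads the available-space column of POSIX-format `df -Pk` output (reported in 1024-byte blocks) and succeeds only if at least the requested number of gigabytes is free. Gigabytes are treated as GiB here. The 580 figure comes from the step above; leave extra headroom for FastQC reports and quantification output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if the file system holding
# DIR has at least REQUIRED_GB gibibytes available.
has_space_gb() {
  local required_gb=$1 dir=${2:-.} avail_kb
  # POSIX output format: line 2, column 4 is available space in 1024-byte blocks
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || return 1
  [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Example: refuse to start if less than 580 GB is free in the download directory
# has_space_gb 580 . || { echo "Not enough free space" >&2; exit 1; }
```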
Step 5: Understand the Tools – FastQC, Kallisto, and Salmon
- FastQC:
  - FastQC is a quality control tool for high-throughput sequence data.
  - Download FastQC from https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
  - Run FastQC on your downloaded FASTQ files to assess their quality.
- Kallisto:
  - Kallisto is a program for quantifying transcript abundances from RNA-Seq data.
  - Download Kallisto from https://pachterlab.github.io/kallisto/.
  - Build an index of your transcriptome (if you do not already have one) and run Kallisto to quantify transcript abundances, which can then be summarized to gene level.
- Salmon:
  - Salmon is a tool for quantifying transcript abundance from RNA-Seq data.
  - Find Salmon's documentation and installation instructions at https://salmon.readthedocs.io/.
  - Index your transcriptome (if you have not already) and use Salmon to estimate transcript-level abundances.
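Here is a minimal sketch of how the three tools fit together for one paired-end sample. It assumes ENA-style file names `<run>_1.fastq.gz` and `<run>_2.fastq.gz`. It also assumes the kallisto and salmon indexes were already built, for example with `kallisto index -i kallisto.idx transcripts.fa` and `salmon index -t transcripts.fa -i salmon_idx`. All file and directory names are hypothetical. With `DRY_RUN=1` the function only prints the commands, which is a cheap way to check them before running on hundreds of gigabytes.

```bash
#!/usr/bin/env bash
# Hypothetical driver for one paired-end run. Assumes:
#   - reads named <run>_1.fastq.gz and <run>_2.fastq.gz in the current directory
#   - kallisto index at kallisto.idx and salmon index at salmon_idx (built beforehand)
# Set DRY_RUN=1 to print the commands instead of executing them.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

quantify_run() {
  local run_id=$1
  local r1="${run_id}_1.fastq.gz" r2="${run_id}_2.fastq.gz"

  # 1. Quality control: FastQC needs an existing output directory
  run_cmd mkdir -p "qc/$run_id"
  run_cmd fastqc -o "qc/$run_id" "$r1" "$r2"

  # 2. Transcript quantification with kallisto
  run_cmd kallisto quant -i kallisto.idx -o "kallisto/$run_id" "$r1" "$r2"

  # 3. Transcript quantification with salmon; "-l A" lets salmon infer the library type
  run_cmd salmon quant -i salmon_idx -l A -1 "$r1" -2 "$r2" -o "salmon/$run_id"
}

# Example: preview the commands for a hypothetical run accession
# DRY_RUN=1 quantify_run ERR0000001
```

Running kallisto and salmon side by side on the same reads is mainly useful for comparison; most analyses pick one of the two.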
Step 6: Optional – Use nf-core/fetchngs
- Explore `nf-core/fetchngs` for a more streamlined data retrieval process.
- Follow the instructions provided by `nf-core/fetchngs` for efficient downloading of NGS data and its associated metadata.
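Here is a minimal sketch of launching nf-core/fetchngs for this project. It assumes Nextflow and Docker are installed, and that the pipeline's `--input` accepts a plain-text file with one accession per line. Check the pipeline's current documentation before running. The wrapper name `run_fetchngs` and the paths `ids.csv` and `results` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around nf-core/fetchngs. Assumes --input takes a text
# file with one accession per line and that Docker is available for -profile docker.
# Set DRY_RUN=1 to write the ID file and print the command without running it.
run_fetchngs() {
  local ids_file=$1 outdir=$2
  shift 2
  printf '%s\n' "$@" > "$ids_file"        # one accession per line
  local cmd=(nextflow run nf-core/fetchngs --input "$ids_file" --outdir "$outdir" -profile docker)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: DRY_RUN=1 run_fetchngs ids.csv results PRJEB31975
```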
Step 7: Post-Download Considerations
- Depending on your analysis needs, you can download each FASTQ file, process it, and delete it before fetching the next one to conserve disk space.
- Adapt your workflow for FastQC, Kallisto, and Salmon based on their specific requirements and your research goals.
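The download, process, delete approach can be sketched as a loop that handles one file at a time, so only one FASTQ file occupies disk space at any moment. This minimal sketch assumes single-file processing (such as FastQC) and wget for downloading. The function names `fetch` and `process_then_delete` are hypothetical. Paired-end quantification with kallisto or salmon needs both mates present, so for those tools you would adapt the loop to fetch a pair before processing.

```bash
#!/usr/bin/env bash
# Hypothetical download step; swap in curl or another client if preferred.
fetch() {
  wget -q -O "$2" "$1"
}

# Hypothetical loop: for each URL in URL_FILE, download the file, run the given
# command with the file path appended, and delete the file only if the command
# succeeds. Files whose processing fails are kept for inspection.
process_then_delete() {
  local url_file=$1
  shift
  local url f
  # Read URLs on file descriptor 3 so the processing command cannot consume them via stdin
  while IFS= read -r url <&3 || [ -n "$url" ]; do
    [ -n "$url" ] || continue
    f=${url##*/}
    if ! fetch "$url" "$f"; then
      echo "download failed: $url" >&2
      rm -f -- "$f"                       # drop any partial file
      continue
    fi
    if "$@" "$f"; then
      rm -f -- "$f"
    else
      echo "processing failed, keeping $f" >&2
    fi
  done 3< "$url_file"
}

# Example: run FastQC on each file, then delete it
# mkdir -p qc && process_then_delete files.txt fastqc -o qc
```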
By following these steps, you can efficiently download and use the FASTQ files from ENA for your bioinformatics analyses on your Unix/Mac system, utilizing tools like FastQC, Kallisto, and Salmon.