
How to Download FASTQ Files from ENA and Use Them with FastQC, Kallisto, and Salmon

November 20, 2023 By admin

Step 1: Access the European Nucleotide Archive (ENA) and Locate the Project

  1. Go to the ENA browser page for the project: https://www.ebi.ac.uk/ena/browser/view/PRJEB31975
  2. Note the project accession (in this case, “PRJEB31975”). You will need it in the next step.

Step 2: Use SRA Explorer to Obtain Download Links

  1. Open https://sra-explorer.info/ in your web browser.
  2. Enter the accession code (e.g., “PRJEB31975”) and press Enter.
  3. Select the runs you want, add them to your collection, and open the list of FastQ download links. Copy the URLs, one per line, into a text file named “files.txt” for later use.
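If you prefer the command line to SRA Explorer, ENA's Portal API can produce a similar list. The sketch below queries the `filereport` endpoint for the project's read runs and prints one URL per line. It assumes `curl` and `awk` are installed; the function names are hypothetical, and the `https://` prefix is an assumption (the `fastq_ftp` field returns host paths without a scheme, and ENA also serves the same paths over FTP).

```bash
#!/usr/bin/env bash
# Sketch: build files.txt from ENA's Portal API instead of SRA Explorer.
# Function names are hypothetical; requires curl and awk.

# Read ENA filereport TSV (columns: run_accession, fastq_ftp) on stdin and
# print one download URL per line. Paired-end runs list both mates in
# fastq_ftp, separated by ';'. Runs without FASTQ files are skipped.
tsv_to_urls() {
  awk -F'\t' 'NR > 1 && $2 != "" {
    n = split($2, paths, ";")
    for (i = 1; i <= n; i++) print "https://" paths[i]   # scheme is an assumption
  }'
}

# Query the filereport endpoint for every read run in a project.
ena_fastq_urls() {
  local accession="$1"
  curl -fsSL "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=${accession}&result=read_run&fields=run_accession,fastq_ftp&format=tsv" \
    | tsv_to_urls
}
```

For example, `ena_fastq_urls PRJEB31975 > files.txt` writes a list in the same one-URL-per-line shape that the download command in Step 3 expects.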

Step 3: Download FASTQ Files Using Wget and Parallel (Unix/Mac)

  1. Open a terminal on your Unix/Mac system.
  2. Navigate to the directory where you want to download the files.
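  3. Use the following command to download the files in parallel:
    ```bash
    cat files.txt | parallel -j 2 "wget -c {}"
    ```
    • Adjust the value after -j (the number of simultaneous downloads) to suit your internet connection and disk speed.
    • The -c option lets wget resume a partially downloaded file instead of starting over if a transfer is interrupted.
    • wget and GNU parallel are not installed on macOS by default; install them first (for example, with Homebrew).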
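Large transfers occasionally arrive truncated, so it is worth checking each file against the MD5 checksum that ENA publishes in its `fastq_md5` field. The sketch below is one way to do this. It assumes `curl`, `wget`, GNU `parallel` and GNU coreutils `md5sum` are installed (macOS ships `md5` instead), and the function names and the `checksums.txt` file are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: download the URLs in files.txt, then verify them against the
# MD5 checksums ENA publishes (fastq_md5). Function names are hypothetical.

# Convert ENA filereport TSV (run_accession, fastq_ftp, fastq_md5) into
# md5sum-style lines "<md5>  <file name>". Paired runs have two
# ';'-separated entries in each column, in the same order.
tsv_to_md5() {
  awk -F'\t' 'NR > 1 && $2 != "" {
    n = split($2, paths, ";")
    split($3, sums, ";")
    for (i = 1; i <= n; i++) {
      k = split(paths[i], parts, "/")   # last path component = file name
      print sums[i] "  " parts[k]
    }
  }'
}

# Fetch the checksums for every read run in a project.
ena_md5_list() {
  curl -fsSL "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=$1&result=read_run&fields=run_accession,fastq_ftp,fastq_md5&format=tsv" \
    | tsv_to_md5
}

# Download each URL in the list (-c resumes partial files), then check
# every downloaded file listed in checksums.txt; files not downloaded
# are skipped thanks to --ignore-missing.
download_and_verify() {
  local list="${1:-files.txt}" jobs="${2:-2}"
  parallel -j "$jobs" wget -c -q {} < "$list"
  md5sum -c --ignore-missing checksums.txt
}
```

For example, run `ena_md5_list PRJEB31975 > checksums.txt` and then `download_and_verify files.txt 2` in the download directory; `md5sum -c` prints `OK` or `FAILED` for each file, and a failed file can simply be downloaded again.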

Step 4: Check Total Data Size and Ensure Adequate Resources

  1. Open the NCBI SRA Run Selector page for the accession: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJEB31975&o=acc_s%3Aa
  2. Check the total size of the data set (more than 580 GB for this project) and make sure you have enough storage space and bandwidth before starting the downloads in Step 3.
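ENA also reports the size of every FASTQ file (`fastq_bytes`), so the total can be computed from the command line and compared with the free space on the target disk. A minimal sketch, assuming `curl`, `awk` and a POSIX `df`; the function names are hypothetical, and sizes are in decimal gigabytes (10^9 bytes).

```bash
#!/usr/bin/env bash
# Sketch: compare a project's total FASTQ size with free disk space.
# Function names are hypothetical; sizes are in GB (10^9 bytes).

# Sum column 2 (fastq_bytes, ';'-separated for paired runs) of an ENA
# filereport TSV read on stdin and print the total in GB.
sum_fastq_gb() {
  awk -F'\t' 'NR > 1 {
    n = split($2, b, ";")
    for (i = 1; i <= n; i++) total += b[i]
  } END { printf "%.1f\n", total / 1e9 }'
}

ena_total_gb() {
  curl -fsSL "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=$1&result=read_run&fields=run_accession,fastq_bytes&format=tsv" \
    | sum_fastq_gb
}

# Free space (GB) on the file system holding a directory; df -Pk reports
# available space in 1024-byte blocks in column 4.
free_gb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { printf "%.1f\n", $4 * 1024 / 1e9 }'
}

# Succeed only if the free space ($2) exceeds the required space ($1).
enough_space() {
  awk -v need="$1" -v have="$2" 'BEGIN { exit !(have + 0 > need + 0) }'
}

check_space() {
  local need have
  need=$(ena_total_gb "$1") have=$(free_gb "${2:-.}")
  if enough_space "$need" "$have"; then
    echo "OK: need ${need} GB, ${have} GB free"
  else
    echo "Not enough space: need ${need} GB, only ${have} GB free" >&2
    return 1
  fi
}
```

For example, `check_space PRJEB31975 /path/to/download/dir` prints whether the compressed FASTQ files would fit in that directory.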

Step 5: Understand the Tools – FastQC, Kallisto, and Salmon

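  1. FastQC:
    • FastQC is a quality-control tool that produces an HTML report for each FASTQ file, covering per-base quality, GC content, adapter content, duplication levels, and more.
    • Download FastQC from https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
    • Run it on your raw FASTQ files before quantification to spot low-quality or adapter-contaminated samples.
  2. Kallisto:
    • Kallisto quantifies transcript abundance from RNA-Seq reads using pseudoalignment, which avoids a full base-level alignment.
    • Download Kallisto from https://pachterlab.github.io/kallisto/.
    • Build a Kallisto index from your transcriptome FASTA, then use kallisto quant to estimate transcript-level abundances.
  3. Salmon:
    • Salmon is a tool for quantifying transcript abundance from RNA-Seq data.
    • Installation instructions are at https://salmon.readthedocs.io/.
    • Build a Salmon index of your transcriptome (once per reference, unless you already have one), then use salmon quant to estimate transcript-level abundances.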
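The tools above can be chained for paired-end data roughly as follows. This is a minimal sketch, assuming all three tools are on your PATH, gzipped reads named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` (ENA's usual naming for paired runs), and a transcriptome FASTA of your choice; the function names, index names, output folders and thread counts are placeholders, not recommendations.

```bash
#!/usr/bin/env bash
# Sketch: QC with FastQC, then transcript quantification with kallisto and
# salmon. Function, index and folder names and thread counts are placeholders.

# Strip the directory and the "_1.fastq.gz" suffix from a read-1 file name.
sample_from_r1() {
  local f="${1##*/}"
  printf '%s\n' "${f%_1.fastq.gz}"
}

# One-off: build both indices from a transcriptome FASTA.
build_indices() {
  local fasta="$1"
  kallisto index -i kallisto.idx "$fasta"
  salmon index -t "$fasta" -i salmon_index
}

# FastQC writes an HTML report and a zip per input file; -o must exist.
run_fastqc() {
  mkdir -p qc
  fastqc -t 2 -o qc "$@"
}

# Quantify one paired-end sample (files in the current directory).
quantify_sample() {
  local s="$1"
  local r1="${s}_1.fastq.gz" r2="${s}_2.fastq.gz"
  mkdir -p kallisto salmon
  kallisto quant -i kallisto.idx -o "kallisto/${s}" "$r1" "$r2"
  # -l A lets salmon infer the library type automatically.
  salmon quant -i salmon_index -l A -1 "$r1" -2 "$r2" -p 4 -o "salmon/${s}"
}
```

A typical session would be `build_indices transcripts.fa`, `run_fastqc *.fastq.gz`, and then `for r1 in *_1.fastq.gz; do quantify_sample "$(sample_from_r1 "$r1")"; done`. Kallisto writes its estimates to `abundance.tsv` and Salmon to `quant.sf` inside each sample's output folder.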

Step 6: Optional – Use nf-core/fetchngs

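  1. nf-core/fetchngs is a Nextflow pipeline that takes a list of database accessions (SRA, ENA, DDBJ, or GEO IDs), downloads the matching FASTQ files, and collects their metadata into a sample sheet. It can replace Steps 2 and 3 with a single command.
  2. Follow the usage instructions at https://nf-co.re/fetchngs to install Nextflow and run the pipeline.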
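As a sketch, assuming Nextflow and Docker are installed: the pipeline reads a plain file of accessions, one per line, through `--input`. The function names, the file name `ids.csv`, the output folder and the Docker profile are choices for illustration; other profiles such as `singularity` or `conda` can be used instead.

```bash
#!/usr/bin/env bash
# Sketch: fetch FASTQ files and metadata with nf-core/fetchngs.
# Assumes Nextflow and Docker; function, file and folder names are illustrative.

# Write one accession per line to ids.csv (no header).
make_ids_file() {
  printf '%s\n' "$@" > ids.csv
}

# Launch the pipeline; downloaded reads and sample sheets land in --outdir.
run_fetchngs() {
  local outdir="${1:-fetchngs_results}"
  nextflow run nf-core/fetchngs --input ids.csv --outdir "$outdir" -profile docker
}
```

For example, `make_ids_file PRJEB31975 && run_fetchngs` would request every run in this project.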

Step 7: Post-Download Considerations

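  1. If storage is limited, download one FASTQ file (or one read pair) at a time, process it, and delete it before fetching the next.
  2. Adapt your FastQC, Kallisto, and Salmon commands to your data and research goals. For example, paired-end runs must be passed to Kallisto and Salmon as read pairs (the _1 and _2 files together), whereas FastQC can check each file on its own.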
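The download-process-delete pattern in point 1 can be scripted so that only one FASTQ file sits on disk at a time. The sketch below does this for FastQC, which works file by file; it assumes `wget` and `fastqc` are on your PATH, and the function names are hypothetical. Quantifying paired-end runs needs both mates present, so a Kallisto or Salmon version would loop over read pairs instead.

```bash
#!/usr/bin/env bash
# Sketch: download, QC and delete FASTQ files one at a time to save space.
# Assumes wget and fastqc are installed; function names are hypothetical.

# File name that wget saves a URL under (the last path component).
fastq_name_from_url() {
  printf '%s\n' "${1##*/}"
}

qc_then_delete() {
  local list="${1:-files.txt}" url file
  mkdir -p qc
  # Read URLs on fd 3 so wget/fastqc cannot consume the list via stdin;
  # the || test also handles a last line without a trailing newline.
  while IFS= read -r url <&3 || [ -n "$url" ]; do
    [ -z "$url" ] && continue
    file=$(fastq_name_from_url "$url")
    if wget -c -q "$url" && fastqc -q -o qc "$file"; then
      rm -f "$file"   # keep the QC report, drop the raw reads
    else
      echo "Failed on $url; keeping $file for inspection" >&2
    fi
  done 3< "$list"
}
```

Running `qc_then_delete files.txt` leaves one FastQC report per file in `qc/` while never holding more than one FASTQ file at a time.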

By following these steps, you can download FASTQ files from ENA and analyze them on your Unix/Mac system with FastQC, Kallisto, and Salmon.
