
Step-by-Step Guide to Download Raw Sequence Data from GEO/SRA
December 27, 2024

This guide outlines how to download raw sequencing data from GEO/SRA using the NCBI SRA Toolkit. It also covers common errors and tips for efficient downloading.
Prerequisites
- Install the SRA Toolkit:
  - Download the latest version of the SRA Toolkit: NCBI SRA Toolkit.
  - Follow the installation instructions for your operating system.
- Install Entrez Direct (optional):
  - Entrez Direct simplifies querying SRA databases. Installation guide: Entrez Direct.
- Environment Setup:
  - Add the toolkit binaries (the `bin/` directory of the extracted toolkit) to your `PATH`.
- Verify Installation:
  - Run `vdb-config -i` to set up the default download directory, or change it to a directory with sufficient space.
  - Test the installation: run `fastq-dump --stdout -X 2 SRR390728`, which should print a few FASTQ records from a small public run to the terminal.
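Here is a minimal shell sketch of the environment setup plus a quick sanity check. It assumes you unpacked the toolkit to `$HOME/sratoolkit`. That path is hypothetical, since the real folder name usually includes the version and platform. The sketch also defines a hypothetical helper, `check_tools`, that reports any command missing from `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical install location: adjust to where you unpacked the toolkit.
SRA_BIN="${SRA_BIN:-$HOME/sratoolkit/bin}"

# Prepend the toolkit's bin/ directory to PATH unless it is already there.
case ":$PATH:" in
  *":$SRA_BIN:"*) ;;
  *) export PATH="$SRA_BIN:$PATH" ;;
esac

# check_tools (hypothetical helper): print each command that is not on PATH
# to stderr; return 0 only if every command was found.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

To make the change permanent, add the `export` line to `~/.bashrc` (or your shell's equivalent). After that, `check_tools prefetch fastq-dump vdb-config` should return silently.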
Steps to Download Data
1. Identify the Accession Number
- Go to GEO: GEO
- Search for your dataset (e.g., GSE48215).
- Navigate to the sub-series or sample page (e.g., GSM1173000).
- Find the corresponding SRA links (e.g., SRP026538 or SRX317818).
2. Use prefetch to Download .sra Files
- Run `prefetch` with a run accession (e.g., `prefetch <SRR accession>`) to download the corresponding `.sra` file.
  - By default, files are saved to `/home/<USER>/ncbi/public/sra`.
Note: If your home directory lacks space, update the cache location: run `vdb-config -i` and change the “Workspace Name” to a directory on a larger disk.
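The sketch below wraps step 2 in a reusable function. The helper names (`run_or_echo`, `fetch_runs`, `free_gib`) and the `DRY_RUN` switch are hypothetical. It assumes run accessions (SRR…) as input and uses `prefetch`'s `--output-directory` option to write somewhere other than the default location. `free_gib` uses `df` to show how much room a candidate directory has before you point downloads at it.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# fetch_runs OUTDIR ACC...: download each run accession into OUTDIR,
# stopping at the first failed download.
fetch_runs() {
  local outdir="$1" acc
  shift
  run_or_echo mkdir -p "$outdir"
  for acc in "$@"; do
    run_or_echo prefetch --output-directory "$outdir" "$acc" || return 1
  done
}

# free_gib DIR: free space (GiB, rounded down) on the filesystem holding DIR.
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}
```

For example, compare `free_gib "$HOME"` with `free_gib /mnt/bigdisk`. If the larger disk has more room, run `fetch_runs /mnt/bigdisk/sra <run accession>`. The `/mnt/bigdisk` path is only a placeholder.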
3. Convert .sra to .fastq
- Use `fastq-dump` to convert each `.sra` file to FASTQ:
  - The `--split-files` option writes the two mates of paired-end reads to separate files (e.g., `_1.fastq` and `_2.fastq`).
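Here is the conversion step as a small wrapper. It assumes the input is either a run accession or a path to a downloaded `.sra` file, both of which `fastq-dump` accepts. `--outdir` sets where the FASTQ files go, and the optional `--gzip` flag compresses them. `run_or_echo`, `dump_run` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# dump_run INPUT [OUTDIR] [gzip]: convert a run to FASTQ in OUTDIR (default: .),
# writing paired-end mates to separate _1/_2 files; pass "gzip" to compress.
dump_run() {
  local input="$1" outdir="${2:-.}"
  local args=(--split-files --outdir "$outdir")
  if [ "${3:-}" = gzip ]; then
    args+=(--gzip)
  fi
  run_or_echo fastq-dump "${args[@]}" "$input"
}
```

Use `DRY_RUN=1 dump_run <run accession> fastq gzip` to preview the exact command before running it for real.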
4. Download Multiple Runs
- Use `esearch` and `efetch` with `xargs` for batch downloads.
- Convert all downloaded `.sra` files to `.fastq`.
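The batch workflow can be sketched as follows. This assumes Entrez Direct is installed and that `efetch -format runinfo` returns CSV with the run accession in the first column. `runinfo_to_runs` and `find_sra_files` are hypothetical helpers, and the study accession is the example from step 1.

```shell
#!/usr/bin/env bash
# runinfo_to_runs: read SRA RunInfo CSV on stdin and print one run
# accession (SRR/ERR/DRR) per line; header and blank lines are skipped.
runinfo_to_runs() {
  awk -F',' '$1 ~ /^[SED]RR[0-9]+$/ { print $1 }'
}

# find_sra_files DIR: list .sra files under DIR, one per line, sorted.
# (prefetch may nest them as DIR/ACC/ACC.sra.)
find_sra_files() {
  find "$1" -type f -name '*.sra' | sort
}

# Typical use (needs network access, Entrez Direct and the SRA Toolkit):
#   esearch -db sra -query SRP026538 | efetch -format runinfo \
#     | runinfo_to_runs > runs.txt
#   xargs -n 1 prefetch < runs.txt
#   find_sra_files . | xargs -n 1 fastq-dump --split-files
```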
5. Using Docker for SRA Toolkit
- Use a pre-built Docker container instead of a local installation.
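This sketch uses the `ncbi/sra-tools` image from Docker Hub and assumes the image accepts the toolkit command as the container's arguments. The current directory is mounted at `/data` so downloaded files land on the host. `sra_docker`, `run_or_echo` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

SRA_IMAGE="${SRA_IMAGE:-ncbi/sra-tools}"

# sra_docker CMD [ARGS...]: run an SRA Toolkit command in a throwaway container,
# with the current directory mounted at /data and used as the working directory.
sra_docker() {
  run_or_echo docker run --rm -v "$PWD:/data" -w /data "$SRA_IMAGE" "$@"
}
```

For example, `sra_docker prefetch --output-directory /data <run accession>` followed by `sra_docker fastq-dump --split-files <run accession>`.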
6. Alternative: Direct Download from ENA
- If you encounter issues with the SRA Toolkit, download the files directly from ENA:
  - Visit: EBI ENA
  - Search for the dataset (e.g., SRX317818).
  - Download the `.fastq` files via FTP.
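Instead of browsing, you can get the FASTQ links from ENA's Portal API file report. This sketch assumes the `filereport` endpoint with `result=read_run` and `fields=fastq_ftp`. It also assumes the `fastq_ftp` column lists `;`-separated host/path entries without a scheme. `ena_filereport_url` and `fastq_urls` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# ena_filereport_url ACC: build the (assumed) ENA file-report URL for an accession.
ena_filereport_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp&format=tsv\n' "$1"
}

# fastq_urls: read the TSV report on stdin, locate the fastq_ftp column by
# its header name, and print one ftp:// URL per FASTQ file.
fastq_urls() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
    col && $col != "" {
      n = split($col, parts, ";")
      for (i = 1; i <= n; i++) print "ftp://" parts[i]
    }'
}
```

Usage, with network access: `curl -s "$(ena_filereport_url SRX317818)" | fastq_urls | xargs -n 1 wget`.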
7. Troubleshooting Common Errors
- UTF-8 Character Error:
  - Make sure file paths contain no special characters, then re-run the command.
- No Accession to Process:
  - Check that the `.sra` file exists and that the paths are correct.
- Command Not Found:
  - Verify that `PATH` includes the SRA Toolkit binaries.
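Here are two small checks for the errors above, as a sketch. `is_safe_path` flags paths that contain anything other than ASCII letters, digits, `.`, `_`, `-` and `/`. This is a deliberately conservative, assumed definition of "special characters". `find_sra` looks for an accession's `.sra` file under a directory. Both names are hypothetical.

```shell
#!/usr/bin/env bash
# is_safe_path P: succeed only if P consists of [A-Za-z0-9._/-] characters.
is_safe_path() {
  case "$1" in
    *[!A-Za-z0-9._/-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# find_sra DIR ACC: print the first ACC.sra found under DIR; fail if none.
# (prefetch may save it as DIR/ACC/ACC.sra or directly as DIR/ACC.sra.)
find_sra() {
  find "$1" -type f -name "$2.sra" -print | head -n 1 | grep .
}
```

For "Command Not Found", note that `command -v prefetch` prints nothing when the toolkit is not on `PATH`.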
8. Automate Download with a Script
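Below is a minimal batch script that chains steps 2 and 3. It assumes a hypothetical `accessions.txt` with one run accession per line; blank lines and `#` comments are skipped. It also assumes the newer `prefetch` layout `SRA_DIR/ACC/ACC.sra`. `run_or_echo`, `process_runs` and `DRY_RUN` are illustrative names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# process_runs LIST SRA_DIR FASTQ_DIR: prefetch each accession in LIST,
# then split it into FASTQ files under FASTQ_DIR.
# Assumes prefetch stores each run as SRA_DIR/ACC/ACC.sra.
process_runs() {
  local list="$1" sra_dir="$2" fastq_dir="$3" acc
  run_or_echo mkdir -p "$sra_dir" "$fastq_dir"
  while IFS= read -r acc || [ -n "$acc" ]; do
    acc="${acc%%[[:space:]]*}"                # keep the first field; drops a trailing CR
    case "$acc" in ''|'#'*) continue ;; esac  # skip blank and comment lines
    run_or_echo prefetch --output-directory "$sra_dir" "$acc"
    run_or_echo fastq-dump --split-files --outdir "$fastq_dir" "$sra_dir/$acc/$acc.sra"
  done < "$list"
}

# Example: process_runs accessions.txt sra fastq
```

Because of `set -e`, the script stops at the first failed download or conversion instead of silently continuing.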
9. Further Processing
- Align the `.fastq` files with an aligner (e.g., STAR, BWA, TopHat).
- Perform variant calling or other downstream analysis.
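Aligners such as BWA take the two mates of a paired-end run as separate files. This sketch pairs `ACC_1.fastq` with `ACC_2.fastq`, following the `--split-files` naming. `list_pairs` is a hypothetical helper. `ref.fa` is a placeholder reference that you must index first with `bwa index`.

```shell
#!/usr/bin/env bash
# list_pairs DIR: for each DIR/ACC_1.fastq whose mate DIR/ACC_2.fastq exists,
# print "R1 R2" on one line.
list_pairs() {
  local r1 r2
  for r1 in "$1"/*_1.fastq; do
    [ -e "$r1" ] || continue          # no matches: the glob stays literal
    r2="${r1%_1.fastq}_2.fastq"
    if [ -e "$r2" ]; then
      printf '%s %s\n' "$r1" "$r2"
    fi
  done
}

# Usage with BWA (paths without spaces assumed):
#   list_pairs fastq | while read -r r1 r2; do
#     bwa mem ref.fa "$r1" "$r2" > "${r1%_1.fastq}.sam"
#   done
```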
By following this guide, you can efficiently download and prepare sequencing data for downstream analysis.

















