Submitting High-Throughput Sequence Data to GEO (Gene Expression Omnibus)
December 3, 2024
This tutorial provides a step-by-step guide to submitting high-throughput sequence data to GEO (Gene Expression Omnibus), a public repository for gene expression data and other functional genomics datasets.
Step 1: Understand Submission Guidelines
- Visit the GEO Home Page: Navigate to GEO’s home page.
- Review Submission Guidelines: Go to the submission guidelines at www.ncbi.nlm.nih.gov/geo/info/seq.html.
- These guidelines provide details on required files and submission procedures.
- Required Components for Submission: a metadata spreadsheet, raw data files, and processed data files (covered in Steps 2 and 3).
Step 2: Prepare Metadata Spreadsheet
- Download the Metadata Template:
- Click on Metadata spreadsheet and then Download metadata spreadsheet on the submission guidelines page.
- Fill in Metadata:
- Metadata Tab: Provide detailed information for your study, samples, and protocols.
- Fields marked with an asterisk (*) are required. Ensure all required fields are completed.
- Use tips available in the spreadsheet by hovering over the field headers.
- Review Instructions:
- Use the Instructions tab in the spreadsheet for guidance.
- Refer to example worksheets for different experiment types.
- Special Notes:
- For paired-end fastq files, list both R1 and R2 files in the same row under the Samples section.
- Ensure filenames are unique and match submitted files exactly. Avoid whitespace or special characters in filenames.
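The filename rules above can be checked locally before upload. The following is a minimal sketch, not an official GEO tool: the function name `check_filenames` and the definition of a "safe" name (ASCII letters, digits, dot, underscore, hyphen) are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a "safe" filename uses only ASCII letters, digits, '.', '_' and '-'.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def check_filenames(names):
    """Return a list of human-readable problems found in the given filenames."""
    problems = []
    # Each filename should be listed only once in the metadata spreadsheet.
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate filename: {name!r} listed {count} times")
    # Flag whitespace and any character outside the assumed safe set.
    for name in sorted(set(names)):
        if not SAFE_NAME.match(name):
            problems.append(f"whitespace or special character in: {name!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical filenames as they might be typed into the Samples section.
    listed = ["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz", "sample B_R1.fastq.gz"]
    for problem in check_filenames(listed):
        print(problem)
```

Running the same check against the actual directory listing helps confirm that the spreadsheet entries and the files on disk match exactly.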
Step 3: Format Data Files
- Raw Data:
- Typically provided in fastq or bam formats.
- Check the SRA documentation for the full list of accepted formats.
- Processed Data:
- Include final quantified data (e.g., normalized read counts for RNA-seq studies).
- Do not submit only differentially expressed genes or read alignment files as processed data.
- Organize Files:
- Place raw and processed data files in a single directory for each experiment.
- Do not compress fastq or bam files into .tar or .zip archives.
- Avoid subdirectories with identically named files.
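The organization rules in this step can be audited with a short script before transfer. This is a rough sketch under stated assumptions: the function name `audit_submission_dir` is hypothetical, and the list of archive suffixes goes slightly beyond the .tar and .zip formats named above by also flagging .tar.gz and .tgz.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: these suffixes identify bundled archives that should not be uploaded.
ARCHIVE_SUFFIXES = (".tar", ".zip", ".tar.gz", ".tgz")


def audit_submission_dir(root):
    """Return (archives, duplicates) found anywhere under the directory root.

    archives   -- list of paths that look like .tar/.zip style bundles
    duplicates -- dict mapping a filename to every path that shares it
    """
    archives = []
    by_name = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.name.lower().endswith(ARCHIVE_SUFFIXES):
            archives.append(path)
        by_name[path.name].append(path)
    duplicates = {name: paths for name, paths in by_name.items() if len(paths) > 1}
    return archives, duplicates


if __name__ == "__main__":
    # "my_geo_submission" is a placeholder directory name.
    if Path("my_geo_submission").is_dir():
        found_archives, found_duplicates = audit_submission_dir("my_geo_submission")
        print("archives:", [str(p) for p in found_archives])
        print("identically named files:", sorted(found_duplicates))
```

Both results should be empty before the folder is transferred.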
Step 4: Transfer Files to GEO
- Set Up FTP Transfer:
- Log in to your My NCBI account.
- Go to the GEO FTP submission page at geo/info/submissionftp.
- Click the Transfer Files button to create your personalized upload space.
- Upload Files:
- Transfer your folder containing all data files to the FTP upload space.
- Follow the instructions provided on the GEO FTP page.
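For scripted transfers, the upload can be sketched with Python's standard `ftplib` module. Everything specific here is a placeholder: the host, user name, password, and remote directory must come from the instructions shown on your personalized GEO FTP page, and the helper names `files_to_upload` and `upload_folder` are hypothetical. The sketch also assumes the remote directory already exists and that all files sit directly in one local folder.

```python
from ftplib import FTP
from pathlib import Path


def files_to_upload(local_dir):
    """List the regular files directly inside local_dir, sorted by name."""
    return sorted(p for p in Path(local_dir).iterdir() if p.is_file())


def upload_folder(host, user, password, remote_dir, local_dir):
    """Upload every file in local_dir into remote_dir over FTP."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)  # assumption: this directory already exists
        for path in files_to_upload(local_dir):
            with path.open("rb") as handle:
                # Binary mode keeps compressed fastq and bam files intact.
                ftp.storbinary(f"STOR {path.name}", handle)
            print(f"uploaded {path.name}")


# Example call with placeholder values; replace them with the details
# from your own GEO FTP instructions before running.
# upload_folder("ftp.example.org", "user", "password",
#               "uploads/your_space", "my_geo_submission")
```

A dedicated FTP client works just as well; the script is mainly useful when the same upload has to be repeated or logged.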
Step 5: Upload Metadata File
- Access Metadata Upload Page:
- Use the Upload Metadata button to navigate to the metadata submission page.
- Select FTP Subfolder:
- Choose the FTP subfolder that contains your uploaded files; this links the raw and processed data files to the metadata file.
- Upload Metadata:
- Select and upload your local metadata file.
- Specify the release date for public access (maximum four years from the upload date).
- Provide additional comments if needed in the Comment to GEO staff box.
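The four-year limit on the release date is simple date arithmetic. The sketch below shows one way to compute the latest allowed date; the function names are hypothetical, and exactly how GEO counts the four years is an assumption here.

```python
from datetime import date


def latest_release_date(upload_date, max_years=4):
    """Return the date exactly max_years after upload_date."""
    try:
        return upload_date.replace(year=upload_date.year + max_years)
    except ValueError:
        # Upload on Feb 29 and the target year is not a leap year: use Feb 28.
        return upload_date.replace(year=upload_date.year + max_years, day=28)


def is_release_date_allowed(release_date, upload_date):
    """True if release_date falls between the upload date and the limit."""
    return upload_date <= release_date <= latest_release_date(upload_date)


if __name__ == "__main__":
    uploaded = date(2024, 12, 3)
    print(latest_release_date(uploaded))                          # 2028-12-03
    print(is_release_date_allowed(date(2026, 6, 1), uploaded))    # True
```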
Step 6: Validate Submission
- Metadata Validation:
- After submission, the metadata file is assessed for missing fields or errors.
- Address any identified issues by uploading a revised metadata file.
- Receive Confirmation:
- Upon successful validation, you’ll receive a confirmation message and an email summarizing your submission.
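A local pre-check for empty required fields can reduce the number of revised uploads. The sketch below is not GEO's validator: it assumes the metadata rows have already been read into dictionaries and that required fields are recognizable by a leading asterisk in the header. The field names and the function `missing_required` are hypothetical.

```python
def missing_required(rows):
    """Return (row_number, field) pairs where a required field is empty.

    Assumption: a header beginning with '*' marks a required field.
    """
    missing = []
    for number, row in enumerate(rows, start=1):
        for field, value in row.items():
            if field.startswith("*") and not str(value or "").strip():
                missing.append((number, field))
    return missing


if __name__ == "__main__":
    # Hypothetical sample rows; real headers come from the GEO template.
    rows = [
        {"*title": "Liver, replicate 1", "*organism": "Mus musculus", "description": ""},
        {"*title": "Liver, replicate 2", "*organism": "", "description": "re-run"},
    ]
    print(missing_required(rows))  # prints [(2, '*organism')]
```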
Step 7: Process and Access Records
- Processing:
- Your submission enters the GEO processing queue.
- Once processing is complete, you’ll receive a GEO accession letter via email with your accession numbers.
- Reviewer Access:
- The email will also include instructions for creating a reviewer access token.