Submitting High-Throughput Sequence Data to GEO (Gene Expression Omnibus)

December 3, 2024, by admin

This tutorial is a step-by-step guide to submitting high-throughput sequence data to GEO (Gene Expression Omnibus), a public repository for gene expression data and other functional genomics datasets.


Step 1: Understand Submission Guidelines

  1. Visit the GEO Home Page: Go to www.ncbi.nlm.nih.gov/geo/.
  2. Review Submission Guidelines: Go to the submission guidelines at www.ncbi.nlm.nih.gov/geo/info/seq.html.
    • These guidelines provide details on required files and submission procedures.
  3. Required Components for Submission:
    • Metadata: Includes study and sample descriptions and protocols.
    • Processed Data: Data used to draw conclusions in your study.
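    • Raw Data: The original sequence read files (e.g., fastq). You upload them to GEO, and NCBI staff deposit them into the Sequence Read Archive (SRA) on your behalf.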

Step 2: Prepare Metadata Spreadsheet

  1. Download the Metadata Template:
    • Click on Metadata spreadsheet and then Download metadata spreadsheet on the submission guidelines page.
  2. Fill in Metadata:
    • Metadata Tab: Provide detailed information for your study, samples, and protocols.
    • Fields marked with an asterisk (*) are required. Ensure all required fields are completed.
    • Hover over a field header in the spreadsheet to see tips for that field.
  3. Review Instructions:
    • Use the Instructions tab in the spreadsheet for guidance.
    • Refer to example worksheets for different experiment types.
  4. Special Notes:
    • For paired-end fastq files, list both R1 and R2 files in the same row under the Samples section.
    • Ensure filenames are unique and match submitted files exactly. Avoid whitespace or special characters in filenames.
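
The filename rules in the special notes above can be checked before you fill in the spreadsheet. The following is a minimal sketch, not a GEO tool: the function name `check_filenames`, the sample filenames, and the definition of a "safe" name (letters, digits, dot, underscore, hyphen) are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a "safe" filename uses only letters, digits, dot, underscore, hyphen.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def check_filenames(names):
    """Return a list of human-readable problems found in the given filenames."""
    problems = []
    # Filenames listed in the metadata must be unique.
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate filename: {name} (listed {count} times)")
    # Filenames should not contain whitespace or special characters.
    for name in sorted(set(names)):
        if not SAFE_NAME.match(name):
            problems.append(f"whitespace or special character in: {name!r}")
    return problems


# Hypothetical paired-end sample: R1 and R2 go in the same Samples row.
names = ["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz", "sample B_R1.fastq.gz"]
for problem in check_filenames(names):
    print(problem)
```

An empty result means the names pass these two checks only; it does not confirm that they match the files you actually upload.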

Step 3: Format Data Files

  1. Raw Data:
    • Typically provided in fastq or bam formats.
    • Check the SRA documentation for the full list of accepted file formats.
  2. Processed Data:
    • Include final quantified data (e.g., normalized read counts for RNA-seq studies).
    • A list of differentially expressed genes on its own is not sufficient, and read alignment files (e.g., bam) do not count as processed data.
  3. Organize Files:
    • Place raw and processed data files in a single directory for each experiment.
    • Do not compress fastq or bam files into .tar or .zip archives.
    • Avoid subdirectories with identically named files.
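
The organization rules above can be sketched as a small pre-upload check. This is an illustrative example under stated assumptions: `check_layout` is a hypothetical helper that takes file paths relative to the submission directory, and it only looks for `.tar` and `.zip` archives and for identical filenames in different folders.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Archive types the step above says to avoid.
ARCHIVE_SUFFIXES = {".tar", ".zip"}


def check_layout(relative_paths):
    """Return a list of layout problems for paths relative to the submission folder."""
    problems = []
    by_name = defaultdict(list)
    for rel in relative_paths:
        path = PurePosixPath(rel)
        by_name[path.name].append(rel)
        # Flags names such as reads.tar, reads.tar.gz, or reads.zip.
        if ARCHIVE_SUFFIXES & set(path.suffixes):
            problems.append(f"archive file: {rel}")
    # The same filename in two folders is ambiguous once files are flattened.
    for name, paths in by_name.items():
        if len(paths) > 1:
            problems.append(f"same filename in several folders: {name} -> {paths}")
    return problems


# Hypothetical submission folder contents.
print(check_layout(["run1/s1.fastq.gz", "run2/s1.fastq.gz", "counts.txt"]))
```

To run it on a real folder, the relative paths could be collected with `pathlib.Path(root).rglob("*")`, keeping only entries for which `is_file()` is true.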

Step 4: Transfer Files to GEO

  1. Set Up FTP Transfer:
    • Log in to your My NCBI account.
    • Go to the GEO FTP submission page at geo/info/submissionftp.
    • Click the Transfer Files button to create your personalized upload space.
  2. Upload Files:
    • Transfer your folder containing all data files to the FTP upload space.
    • Follow the instructions provided on the GEO FTP page.
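
If you prefer to script the transfer, the upload can be sketched with Python's standard `ftplib`. This is a hedged example, not GEO's documented procedure: the host, user name, password, and upload-space path are placeholders to be replaced with the values shown on your GEO FTP page, and `upload_folder` is a hypothetical helper.

```python
from ftplib import FTP, error_perm
from pathlib import Path


def upload_folder(ftp, local_dir, remote_dir):
    """Upload every regular file in local_dir into remote_dir on the server."""
    try:
        ftp.mkd(remote_dir)  # create the remote folder if it is missing
    except error_perm:
        pass  # assumption: the error means the folder already exists
    ftp.cwd(remote_dir)
    uploaded = []
    for path in sorted(Path(local_dir).iterdir()):
        if path.is_file():
            with path.open("rb") as handle:
                # STOR transfers the file in binary mode under its own name.
                ftp.storbinary(f"STOR {path.name}", handle)
            uploaded.append(path.name)
    return uploaded


def main():
    # Placeholders: copy the real values from your GEO FTP instructions.
    host, user, password = "FTP_HOST", "FTP_USER", "FTP_PASSWORD"
    with FTP(host) as ftp:
        ftp.login(user, password)
        done = upload_folder(ftp, "my_geo_submission", "UPLOAD_SPACE/my_geo_submission")
        print(f"uploaded {len(done)} files")
```

Calling `main()` starts the transfer. The sketch uploads only the files directly inside the folder, which matches the single-directory layout from Step 3.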

Step 5: Upload Metadata File

  1. Access Metadata Upload Page:
    • Use the Upload Metadata button to navigate to the metadata submission page.
  2. Select FTP Subfolder:
    • Choose the subfolder in your FTP upload space that holds the uploaded raw and processed data files, so that they are linked to the metadata file.
  3. Upload Metadata:
    • Select and upload your local metadata file.
    • Specify the release date for public access (maximum four years from the upload date).
    • Provide additional comments if needed in the Comment to GEO staff box.
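
The release-date limit above is simple date arithmetic. As a minimal sketch, assuming the four-year limit is counted in calendar years from the upload date (how GEO counts it exactly is not stated here), the latest allowed date can be computed like this; both function names are hypothetical.

```python
from datetime import date

MAX_YEARS = 4  # limit stated in the step above


def latest_release_date(upload_date):
    """Return the date MAX_YEARS calendar years after upload_date."""
    try:
        return upload_date.replace(year=upload_date.year + MAX_YEARS)
    except ValueError:
        # 29 February with no matching day in the target year: fall back to the 28th.
        return upload_date.replace(year=upload_date.year + MAX_YEARS, day=28)


def is_release_date_allowed(upload_date, release_date):
    """True if release_date is on or after upload and within the limit."""
    return upload_date <= release_date <= latest_release_date(upload_date)


print(latest_release_date(date(2024, 12, 3)))
```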

Step 6: Validate Submission

  1. Metadata Validation:
    • After submission, the metadata file is assessed for missing fields or errors.
    • Address any identified issues by uploading a revised metadata file.
  2. Receive Confirmation:
    • Upon successful validation, you’ll receive a confirmation message and an email summarizing your submission.
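
A local pre-check can catch missing required fields before GEO's validation does. The sketch below is an illustration only: it assumes each sample row has been read into a dictionary keyed by column header, that required headers start with an asterisk, and the field names shown are hypothetical rather than taken from the GEO template.

```python
def missing_required(rows):
    """Return (row_number, field) pairs for empty fields whose header starts with '*'."""
    missing = []
    for number, row in enumerate(rows, start=1):
        for field, value in row.items():
            # Treat None, empty strings, and whitespace-only values as missing.
            if field.startswith("*") and not str(value or "").strip():
                missing.append((number, field))
    return missing


# Hypothetical sample rows; the headers are illustrative only.
rows = [
    {"*title": "Control rep 1", "*organism": "Homo sapiens", "description": ""},
    {"*title": "", "*organism": "Homo sapiens", "description": "treated"},
]
print(missing_required(rows))
```

This does not replace GEO's own validation, which may apply checks beyond empty fields.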

Step 7: Process and Access Records

  1. Processing:
    • Your submission enters the GEO processing queue.
    • Upon processing, you’ll receive a GEO accession letter via email with accession numbers.
  2. Reviewer Access:
    • The email will also include instructions for creating a reviewer access token.
