Step-by-Step Guide to Mapping SNPs to Genes within Window

January 10, 2025 Off By admin

Mapping SNPs to nearby genes is a common task in genomics, especially when studying the potential impact of non-coding variants. This guide provides a detailed workflow for mapping SNPs to genes within a +/- 60KB window using tools like BEDTools, UCSC MySQL, and BioMart.

Table of Contents

Step 1: Prepare Your SNP Data

Ensure your SNP data is in a compatible format such as BED, VCF, or a simple tab-delimited file with chromosome, position, and SNP ID. For example:

chr1    100000  rs12345
chr2    200000  rs67890

Step 2: Using BEDTools

Step 2.1: Install BEDTools

Download and install BEDTools from the official website: BEDTools.

Step 2.2: Prepare Gene Annotation File

Download a gene annotation file in BED format. For example, you can use the refFlat table from UCSC or a GTF/GFF file.

Step 2.3: Run BEDTools

Use the windowBed command to find genes within +/- 60KB of your SNPs:

windowBed -a snps.bed -b genes.bed -w 60000 > snps_with_genes_60kb.bed

This will output a file (snps_with_genes_60kb.bed) with SNPs and their nearby genes.

Step 3: Using UCSC MySQL

Step 3.1: Connect to UCSC MySQL Server

Connect to the UCSC MySQL server using the following command:

mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19

Step 3.2: Run SQL Query

Run the following SQL query to find genes within +/- 60KB of your SNPs:

SELECT
    S.name AS rsID,
    K.geneName AS GeneSymbol,
    S.chrom AS Chromosome,
    S.chromStart AS SNP_Position,
    K.txStart AS Gene_Start,
    K.txEnd AS Gene_End
FROM
    snp150 AS S
LEFT JOIN
    knownGene AS K ON S.chrom = K.chrom
    AND NOT (K.txEnd + 60000 < S.chromStart OR S.chromEnd + 60000 < K.txStart)
WHERE
    S.name IN ("rs12345", "rs67890");

Replace rs12345 and rs67890 with your SNP IDs.

Step 4: Using BioMart

Step 4.1: Access BioMart

Go to the BioMart website.

Step 4.2: Configure Dataset

Select the Ensembl Genes dataset.
Choose the appropriate species (e.g., Human).
Under Filters, select Variation and input your SNP IDs or chromosomal regions.

Step 4.3: Add Attributes

Under Attributes, select Gene and choose fields like Gene Name, Chromosome Name, Gene Start, and Gene End.
Add Flanking Region and set it to 60KB.

Step 4.4: Download Results

Click Results to download the gene annotations for your SNPs.

Step 5: Using Python for Custom Mapping

Step 5.1: Install Required Libraries

Install the pandas library for data manipulation:

pip install pandas

Step 5.2: Write Python Script

Use the following Python script to map SNPs to genes within +/- 60KB:

import pandas as pd

# Load SNP and gene data
snps = pd.read_csv("snps.bed", sep="\t", header=None, names=["chrom", "start", "end", "rsID"])
genes = pd.read_csv("genes.bed", sep="\t", header=None, names=["chrom", "start", "end", "gene"])

# Find genes within +/- 60KB of SNPs
results = []
for _, snp in snps.iterrows():
    nearby_genes = genes[
        (genes["chrom"] == snp["chrom"]) &
        (genes["start"] <= snp["end"] + 60000) &
        (genes["end"] >= snp["start"] - 60000)
    ]
    for _, gene in nearby_genes.iterrows():
        results.append([snp["rsID"], gene["gene"], snp["chrom"], snp["start"], gene["start"], gene["end"]])

# Save results
results_df = pd.DataFrame(results, columns=["rsID", "Gene", "Chromosome", "SNP_Position", "Gene_Start", "Gene_End"])
results_df.to_csv("snps_with_genes_60kb.csv", index=False)

Step 6: Tips and Tricks

Filter Low-Quality SNPs: Remove low-quality or non-informative SNPs before mapping.
Use Multiple Tools: Cross-validate results using different tools (e.g., BEDTools and UCSC MySQL).
Customize Window Size: Adjust the +/- 60KB window based on your specific research needs.
Visualize Results: Use tools like IGV or UCSC Genome Browser to visualize SNP-gene relationships.

Conclusion

Mapping SNPs to genes within a +/- 60KB window is a crucial step in understanding the potential functional impact of genetic variants. By following this guide, you can efficiently perform this task using tools like BEDTools, UCSC MySQL, BioMart, and Python. Whether you prefer command-line tools or web-based interfaces, this guide provides a comprehensive approach to SNP-gene mapping.