
How To Convert List of Entrez IDs Into Gene Names

January 2, 2025 By admin

If you have a list of Entrez IDs and want to convert them into gene names, there are various methods using different programming languages and tools. Below are step-by-step instructions for converting Entrez IDs to gene names using multiple approaches.

Method 1: Using Bioconductor in R

  1. Install Necessary Packages
    Install and load the org.Hs.eg.db and annotate libraries for human gene data.

    R
    install.packages("BiocManager")
    BiocManager::install("org.Hs.eg.db")
    BiocManager::install("annotate")
    library(org.Hs.eg.db)
    library(annotate)
  2. Load Your Entrez ID List
    Read your list of Entrez IDs into R.
    R
    a <- read.csv("entrez_ids.csv", header = TRUE) # Replace with your file path
  3. Convert Entrez IDs to Gene Names
    Use the getSYMBOL function to map Entrez IDs to gene symbols. getSYMBOL expects a character vector, so convert the column first, because read.csv reads numeric IDs as integers.
    R
    gene_names <- getSYMBOL(as.character(a$EntrezID), data = "org.Hs.eg") # Replace 'EntrezID' with your column name
  4. Save Results
    Save the gene names to a new file. IDs that could not be mapped appear as NA.
    R
    write.csv(gene_names, "gene_names.csv")

Method 2: Using biomaRt in R

  1. Install and Load the biomaRt Package
    R
    install.packages("BiocManager")
    BiocManager::install("biomaRt")
    library(biomaRt)
  2. Set Up biomaRt
    Connect to the Ensembl human gene dataset, which biomaRt queries to map Entrez IDs to gene names.
    R
    mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
  3. Convert Entrez IDs to Gene Names
    Retrieve gene names by providing a list of Entrez IDs.
    R
    entrez_ids <- c("3815", "3816", "2341") # Example Entrez IDs
    # Recent Ensembl releases name this attribute "entrezgene_id" (formerly "entrezgene")
    gene_names <- getBM(attributes = c("entrezgene_id", "hgnc_symbol"),
                        filters = "entrezgene_id",
                        values = entrez_ids,
                        mart = mart)
  4. Save Results
    Save the results to a CSV file.
    R
    write.csv(gene_names, "gene_names_biomart.csv")

Method 3: Using Perl (BioPerl)

  1. Install the Necessary Module
    Install the BioPerl module Bio::DB::EntrezGene and load it in your script.
    perl
    use strict;
    use warnings;
    use Bio::DB::EntrezGene;
  2. Convert Entrez IDs
    Loop through the Entrez IDs and retrieve the gene names.
    perl
    my $db = Bio::DB::EntrezGene->new();
    my @entrez_ids = ('3815', '3816', '2341'); # Example Entrez IDs

    foreach my $id (@entrez_ids) {
        # get_Seq_by_id returns a Bio::Seq object for the gene record;
        # its display_id is expected to hold the gene symbol
        my $seq = $db->get_Seq_by_id($id);
        print $seq->display_id, "\n";
    }

Method 4: Using UniProt ID Mapping Tool

  1. Go to UniProt
  2. Upload Entrez IDs
    • Select “Entrez Gene” as your input and “Gene Name” as your output.
    • Upload a file with your list of Entrez IDs (CSV format).
  3. Download Results
    • Once the mapping is complete, download the results, which will include the corresponding gene names.
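
If your Entrez IDs sit in one column of a larger spreadsheet, it can help to extract a clean list before uploading. The following is a minimal sketch, not part of the UniProt workflow itself: the function name write_id_list, the file names, and the column name EntrezID are illustrative assumptions, and it assumes the tool accepts a plain list with one ID per line.

```python
import csv


def write_id_list(csv_path, column, out_path):
    """Write the unique, non-empty values of one CSV column, one per line.

    Returns the IDs in the order they first appear in the input file.
    """
    seen = set()
    ids = []
    with open(csv_path, newline="") as src:
        for row in csv.DictReader(src):
            value = (row.get(column) or "").strip()
            # Skip blanks and IDs that were already collected
            if value and value not in seen:
                seen.add(value)
                ids.append(value)
    with open(out_path, "w") as dst:
        dst.writelines(f"{entrez_id}\n" for entrez_id in ids)
    return ids


# Example call (hypothetical file and column names):
# write_id_list("entrez_ids.csv", "EntrezID", "entrez_ids.txt")
```

Removing duplicates up front keeps the upload small and makes the downloaded result easier to compare against your original list.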

Method 5: Using Online Tool – MatchMiner

  1. Go to MatchMiner
  2. Upload Entrez IDs
    • Upload a file containing the Entrez IDs.
  3. Convert and Download Results
    • The tool will convert the IDs into corresponding gene names (HUGO gene names).
    • Download the results as a CSV file.
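
Results downloaded from an online tool usually need to be joined back onto your original ID list, and some IDs may come back unmapped or mapped more than once. The sketch below shows one way to do that with the standard library. The function names and the column names passed to load_mapping are hypothetical, so check the header of the file you actually downloaded.

```python
import csv


def load_mapping(path, id_col, symbol_col):
    """Read a downloaded mapping CSV into {entrez_id: [symbol, ...]}."""
    mapping = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            key = (row.get(id_col) or "").strip()
            symbol = (row.get(symbol_col) or "").strip()
            # Ignore rows without an ID or without a symbol
            if key and symbol:
                symbols = mapping.setdefault(key, [])
                if symbol not in symbols:
                    symbols.append(symbol)
    return mapping


def annotate(ids, mapping):
    """Pair each input ID with its symbols, keeping the input order.

    Multiple symbols are joined with ';' and unmapped IDs get ''.
    """
    return [(i, ";".join(mapping.get(i, []))) for i in ids]


def unmapped(ids, mapping):
    """List the input IDs that have no symbol in the mapping."""
    return [i for i in ids if i not in mapping]


# Example call (hypothetical file and column names):
# mapping = load_mapping("mapping_results.csv", "From", "Gene Name")
# print(annotate(["3815", "3816", "2341"], mapping))
```

Checking unmapped() before further analysis makes it obvious when an ID was retired or simply not recognized by the tool.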

Method 6: Using Python (Biopython)

  1. Install Biopython
    bash
    pip install biopython
  2. Write Python Script to Convert Entrez IDs to Gene Names
    python
    from Bio import Entrez

    # Set your email
    Entrez.email = "your-email@example.com"

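    def get_gene_name(entrez_id):
        # Fetch the Entrez Gene record as XML and parse it
        handle = Entrez.efetch(db="gene", id=entrez_id, retmode="xml")
        records = Entrez.read(handle)
        handle.close()
        # The gene symbol is stored under Gene-ref -> Gene-ref_locus
        gene_name = records[0]["Entrezgene_gene"]["Gene-ref"]["Gene-ref_locus"]
        return gene_name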

    # Example Entrez IDs
    entrez_ids = ['3815', '3816', '2341']
    gene_names = {entrez_id: get_gene_name(entrez_id) for entrez_id in entrez_ids}

    print(gene_names)
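
The script above sends one request per ID, which gets slow for long lists. As a hedged sketch of an alternative, the IDs can be sent in batches, since efetch accepts a comma-separated list of IDs. The helper names (chunked, symbols_from_records, fetch_symbols), the batch size, and the pause between requests are assumptions chosen for illustration, not values prescribed by NCBI or Biopython.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def symbols_from_records(records):
    """Map gene ID -> symbol from parsed Entrez Gene XML records."""
    symbols = {}
    for rec in records:
        gene_id = str(rec["Entrezgene_track-info"]["Gene-track"]["Gene-track_geneid"])
        # Some records may lack a symbol; store None in that case
        symbols[gene_id] = rec["Entrezgene_gene"]["Gene-ref"].get("Gene-ref_locus")
    return symbols


def fetch_symbols(entrez_ids, email, batch_size=100, pause=0.4):
    """Fetch symbols for many IDs, one efetch request per batch."""
    # Imported here so the helpers above can be used without Biopython
    from Bio import Entrez

    Entrez.email = email
    symbols = {}
    for batch in chunked(list(entrez_ids), batch_size):
        handle = Entrez.efetch(db="gene", id=",".join(batch), retmode="xml")
        try:
            records = Entrez.read(handle)
        finally:
            handle.close()
        symbols.update(symbols_from_records(records))
        time.sleep(pause)  # stay polite to the NCBI servers
    return symbols


# Example call (requires network access and Biopython):
# print(fetch_symbols(["3815", "3816", "2341"], email="your-email@example.com"))
```

Keying the result by the gene ID found inside each record, rather than by position, avoids mismatches if a record is missing from the response.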

Conclusion

You can convert Entrez IDs into gene names using various programming languages and tools such as R (with Bioconductor and biomaRt), Perl, Python (Biopython), and online platforms like UniProt and MatchMiner. For large datasets, Bioconductor (which looks IDs up in a locally installed annotation package) or the online tools are usually the most efficient options, because they avoid sending one web request per ID.
