Step-by-Step Guide: Downloading the PAM50 Gene Set

January 10, 2025, by admin
The PAM50 gene set is a widely used panel of 50 genes for classifying breast tumors into intrinsic molecular subtypes (Luminal A, Luminal B, HER2-enriched, Basal-like, and Normal-like). The gene list is not always published in an easily parseable format, but there are several ways to obtain it. This step-by-step guide shows how to download and use the PAM50 gene set.


1. Obtain the PAM50 Gene List from Published Sources

The PAM50 gene list is often included in supplementary materials of research papers or available through specific resources.

Option 1: Extract from the Original Paper

The PAM50 gene list is included in the original paper by Parker et al. (2009):

Option 2: Use the UNC Lineberger Comprehensive Cancer Center Website

The PAM50 gene list, centroids, and R code are available on the UNC Lineberger website:


2. Download the PAM50 Gene List

If the above links are not accessible, you can manually extract the gene list from the provided sources or use the following list:

PAM50 Gene List:

ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T

Save as a Text File:

You can save the gene list as a text file for easy access.

Unix Command:

bash
# Write one gene symbol per line, the same layout the R export in step 3 produces
echo "ACTR3B ANLN BAG1 BCL2 BIRC5 BLVRA CCNB1 CCNE1 CDC20 CDC6 CDH3 CENPF CEP55 CXXC5 EGFR ERBB2 ESR1 EXO1 FGFR4 FOXA1 FOXC1 GPR160 GRB7 KIF2C KRT14 KRT17 KRT5 MAPT MDM2 MELK MIA MKI67 MLPH MMP11 MYBL2 MYC NAT1 NDC80 NUF2 ORC6L PGR PHGDH PTTG1 RRM2 SFRP1 SLC39A6 TMEM45B TYMS UBE2C UBE2T" | tr ' ' '\n' > pam50_genes.txt
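
Quick Check of the Saved List:

A copy-and-paste slip can easily drop or duplicate a gene, so it may be worth counting the entries before using the list. The sketch below is one possible check, not part of any official PAM50 tooling: it splits the comma-separated list from above into one symbol per line and counts total and unique entries, both of which should be 50. The scratch file name pam50_check.txt is made up for illustration.

```shell
# The comma-separated list from above, pasted into a variable.
genes="ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T"

# Turn commas and spaces into newlines (squeezing repeats) and drop empty lines.
# pam50_check.txt is a hypothetical scratch file name.
printf '%s\n' "$genes" | tr -s ', ' '\n\n' | sed '/^$/d' > pam50_check.txt

# Count all entries and distinct entries; tr strips the padding some wc versions add.
total=$(wc -l < pam50_check.txt | tr -d ' ')
unique=$(sort -u pam50_check.txt | wc -l | tr -d ' ')
echo "total=$total unique=$unique"
```

If the two numbers differ, the list contains a duplicate; if either is not 50, a symbol was lost or added along the way.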

3. Use the genefu R Package

The genefu package in Bioconductor provides the PAM50 gene list and tools for breast cancer subtype classification.

Install and Load the genefu Package:

R
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("genefu")

library(genefu)

Access the PAM50 Gene List:

R
data(pam50)
pam50_genes <- rownames(pam50$centroids)
print(pam50_genes)

Save the Gene List to a File:

R
write.table(pam50_genes, file = "pam50_genes.txt", row.names = FALSE, col.names = FALSE, quote = FALSE)
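
Compare Two Saved Lists:

Lists obtained from different sources may not agree symbol for symbol, because older resources sometimes use retired gene symbols (for example, KNTC2 is the former symbol for NDC80). The sketch below shows one way to compare two saved lists in the shell. The file names and the three-gene demo lists are made up for illustration and do not represent the actual contents of any source.

```shell
# Two small demo lists (illustrative only, not the full PAM50 set).
printf '%s\n' ESR1 NDC80 ORC6L > list_a.txt
printf '%s\n' ORC6L KNTC2 ESR1 > list_b.txt

# comm requires sorted input; LC_ALL=C keeps sort and comm on the same ordering.
LC_ALL=C sort -u list_a.txt > list_a.sorted
LC_ALL=C sort -u list_b.txt > list_b.sorted

# -23 keeps lines found only in the first file, -13 only in the second.
only_a=$(LC_ALL=C comm -23 list_a.sorted list_b.sorted)
only_b=$(LC_ALL=C comm -13 list_a.sorted list_b.sorted)

echo "only in list_a: $only_a"
echo "only in list_b: $only_b"
```

Any symbol that appears on only one side is worth looking up in a gene nomenclature database before deciding whether it is an alias or a real difference.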

4. Use the PAM50 Gene List in Your Analysis

Once you have the gene list, you can use it for various analyses, such as gene expression profiling or subtype classification.

Example: Filtering a Gene Expression Matrix in R

R
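# pam50_genes comes from step 3; to load the saved file from step 2 instead:
# pam50_genes <- scan("pam50_genes.txt", what = character(), quiet = TRUE)

# Load your gene expression data (a matrix with gene symbols as row names and samples as columns)
expression_data <- read.csv("expression_data.csv", row.names = 1)

# Filter for PAM50 genes
pam50_expression <- expression_data[rownames(expression_data) %in% pam50_genes, ]

# Report PAM50 genes absent from the data; a miss is often a symbol alias
# (for example, ORC6L is the older symbol for ORC6)
missing_genes <- setdiff(pam50_genes, rownames(expression_data))
if (length(missing_genes) > 0) message("Not found: ", paste(missing_genes, collapse = ", "))

# Save the filtered data
write.csv(pam50_expression, file = "pam50_expression_data.csv")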
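
Example: Filtering a Gene Expression Matrix in the Shell

The same filtering can be done without R. The sketch below is a minimal example that assumes a comma-separated matrix with a header row, gene symbols in the first column, and no quoted fields (files written by R's write.csv quote the row names, so they would need the quotes stripped first). The demo file names and expression values are made up so that the example runs on its own.

```shell
# Demo inputs (made-up values) so the sketch is self-contained.
printf '%s\n' ESR1 ERBB2 MKI67 > demo_genes.txt
cat > demo_expression.csv <<'EOF'
gene,sample1,sample2
ESR1,8.1,2.3
GAPDH,12.0,11.8
ERBB2,5.5,9.7
EOF

# First file (NR==FNR): remember each gene symbol.
# Second file: keep the header row and any row whose first field is a remembered symbol.
awk -F',' 'NR==FNR { keep[$1]; next } FNR==1 || ($1 in keep)' \
    demo_genes.txt demo_expression.csv > demo_pam50_expression.csv

# List symbols from the gene list that never appear in the matrix.
missing=$(awk -F',' 'NR==FNR { seen[$1]; next } !($1 in seen)' \
    demo_expression.csv demo_genes.txt)

cat demo_pam50_expression.csv
echo "missing: $missing"
```

With real data, replace the demo files with pam50_genes.txt (one symbol per line) and your own expression matrix.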


By following these steps, you can easily obtain and use the PAM50 gene set for your research. Whether you extract it manually, download it from a repository, or use an R package, the PAM50 gene list is a valuable resource for breast cancer subtype classification.