Step-by-Step Guide: Downloading the PAM50 Gene Set
January 10, 2025
The PAM50 gene set is a widely used panel of 50 genes for classifying breast cancer into intrinsic subtypes (Luminal A, Luminal B, HER2-enriched, Basal-like, and Normal-like). The gene list is not always available in a parseable format, but there are several ways to obtain it. Below is a step-by-step guide to help you download and use the PAM50 gene set.
1. Obtain the PAM50 Gene List from Published Sources
The PAM50 gene list is often included in supplementary materials of research papers or available through specific resources.
Option 1: Extract from the Original Paper
The PAM50 gene list is included in the original paper by Parker et al. (2009):
Paper: Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes
Figure 1: The gene list is provided in Figure 1 of the paper.
Option 2: Use the UNC Lineberger Comprehensive Cancer Center Website
The PAM50 gene list, centroids, and R code are available on the UNC Lineberger website:
Website: UNC Lineberger PAM50 Algorithms
Direct Link to Data and Code: PAM50 Data and R Code
2. Download the PAM50 Gene List
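If the above links are not accessible, you can manually extract the gene list from the provided sources or use the following list. Note that ORC6L is a legacy symbol: current HGNC annotation names this gene ORC6, so check which symbol your expression data uses.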
PAM50 Gene List:
ACTR3B, ANLN, BAG1, BCL2, BIRC5, BLVRA, CCNB1, CCNE1, CDC20, CDC6, CDH3, CENPF, CEP55, CXXC5, EGFR, ERBB2, ESR1, EXO1, FGFR4, FOXA1, FOXC1, GPR160, GRB7, KIF2C, KRT14, KRT17, KRT5, MAPT, MDM2, MELK, MIA, MKI67, MLPH, MMP11, MYBL2, MYC, NAT1, NDC80, NUF2, ORC6L, PGR, PHGDH, PTTG1, RRM2, SFRP1, SLC39A6, TMEM45B, TYMS, UBE2C, UBE2T
Save as a Text File:
You can save the gene list as a text file for easy access.
Unix Command:
echo "ACTR3B ANLN BAG1 BCL2 BIRC5 BLVRA CCNB1 CCNE1 CDC20 CDC6 CDH3 CENPF CEP55 CXXC5 EGFR ERBB2 ESR1 EXO1 FGFR4 FOXA1 FOXC1 GPR160 GRB7 KIF2C KRT14 KRT17 KRT5 MAPT MDM2 MELK MIA MKI67 MLPH MMP11 MYBL2 MYC NAT1 NDC80 NUF2 ORC6L PGR PHGDH PTTG1 RRM2 SFRP1 SLC39A6 TMEM45B TYMS UBE2C UBE2T" > pam50_genes.txt
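A quick sanity check after saving can catch a truncated or duplicated paste. The sketch below repeats the command above, rewrites the list with one symbol per line, and counts total and unique entries. The output file name pam50_genes_lines.txt is an assumption chosen for illustration, not part of any official distribution.

```shell
#!/bin/sh
# Recreate the space-separated list exactly as in the command above.
echo "ACTR3B ANLN BAG1 BCL2 BIRC5 BLVRA CCNB1 CCNE1 CDC20 CDC6 CDH3 CENPF CEP55 CXXC5 EGFR ERBB2 ESR1 EXO1 FGFR4 FOXA1 FOXC1 GPR160 GRB7 KIF2C KRT14 KRT17 KRT5 MAPT MDM2 MELK MIA MKI67 MLPH MMP11 MYBL2 MYC NAT1 NDC80 NUF2 ORC6L PGR PHGDH PTTG1 RRM2 SFRP1 SLC39A6 TMEM45B TYMS UBE2C UBE2T" > pam50_genes.txt

# Write one symbol per line (hypothetical output name); many tools expect this layout.
tr -s ' ' '\n' < pam50_genes.txt > pam50_genes_lines.txt

# Count total and unique symbols; a complete list should give 50 for both.
total=$(wc -l < pam50_genes_lines.txt | tr -d ' ')
unique=$(sort -u pam50_genes_lines.txt | wc -l | tr -d ' ')
echo "total=$total unique=$unique"
```

If the two counts differ, a symbol was duplicated. If either is below 50, part of the list was lost when copying.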
3. Use the genefu R Package
The genefu package in Bioconductor provides the PAM50 gene list and tools for breast cancer subtype classification.
Install and Load the genefu Package:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("genefu")
library(genefu)
Access the PAM50 Gene List:
data(pam50)
pam50_genes <- rownames(pam50$centroids)
print(pam50_genes)
Save the Gene List to a File:
write.table(pam50_genes, file = "pam50_genes.txt", row.names = FALSE, col.names = FALSE, quote = FALSE)
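Lists exported from different sources may not use the same gene symbols. Three PAM50 genes have been renamed since the original publication: CDCA1 is now NUF2, KNTC2 is now NDC80, and ORC6L is now ORC6. Whether a given export uses the older names depends on the source and version, so treat the input below as an assumption. The sketch maps legacy symbols to current ones in a one-symbol-per-line file. The file names exported_genes.txt and exported_genes_current.txt are hypothetical.

```shell
#!/bin/sh
# Stand-in for a list exported from another source (hypothetical content).
printf '%s\n' CDCA1 KNTC2 ORC6L ESR1 > exported_genes.txt

# Map legacy symbols to current HGNC symbols.
# Each rule is anchored to the whole line so partial matches are left alone.
sed -e 's/^CDCA1$/NUF2/' \
    -e 's/^KNTC2$/NDC80/' \
    -e 's/^ORC6L$/ORC6/' \
    exported_genes.txt > exported_genes_current.txt

cat exported_genes_current.txt
```

Run the mapping in whichever direction matches the symbols in your expression data. The point is that both sides use the same naming before any filtering.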
4. Use the PAM50 Gene List in Your Analysis
Once you have the gene list, you can use it for various analyses, such as gene expression profiling or subtype classification.
Example: Filtering a Gene Expression Matrix in R
# Load your gene expression data (e.g., a matrix with genes as rows and samples as columns)
expression_data <- read.csv("expression_data.csv", row.names = 1)
# Filter for PAM50 genes
pam50_expression <- expression_data[rownames(expression_data) %in% pam50_genes, ]
# Save the filtered data
write.csv(pam50_expression, file = "pam50_expression_data.csv")
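A membership filter like the one above silently drops any PAM50 gene that is absent from the matrix, for example when the matrix lists ORC6 but the gene list says ORC6L. The sketch below reports which requested genes are missing before you filter. It uses a tiny made-up CSV and a three-gene stand-in list, and the file names wanted_genes.txt, matrix_genes.txt, and missing_genes.txt are assumptions for illustration.

```shell
#!/bin/sh
# Use byte-wise ordering so sort and comm agree on line order.
export LC_ALL=C

# Tiny example matrix (made-up values): genes as rows, samples as columns.
printf 'gene,S1,S2\nESR1,5.1,2.0\nORC6,3.3,3.1\nTP53,1.0,1.2\nERBB2,7.7,0.4\n' > expression_data.csv

# Short stand-in for the full PAM50 list, one symbol per line.
printf '%s\n' ESR1 ERBB2 ORC6L | sort -u > wanted_genes.txt

# Extract row names: skip the header, keep the first CSV column.
tail -n +2 expression_data.csv | cut -d, -f1 | sort -u > matrix_genes.txt

# Lines only in the first file: genes requested but not present in the matrix.
comm -23 wanted_genes.txt matrix_genes.txt > missing_genes.txt
cat missing_genes.txt
```

An empty missing_genes.txt means every requested gene has a row in the matrix. Otherwise, check the listed symbols for aliases or platform gaps before classifying.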
5. Additional Resources
Bioconductor genefu Package Documentation: genefu Manual
PAM50 Algorithm Details: PAM50 Algorithm
Breast Cancer Subtyping Paper: Breast Cancer Molecular Profiling
By following these steps, you can easily obtain and use the PAM50 gene set for your research. Whether you extract it manually, download it from a repository, or use an R package, the PAM50 gene list is a valuable resource for breast cancer subtype classification.