
Adding Gene Names to a Volcano Plot from DESeq2
January 10, 2025
In this guide, we walk through adding gene names to a volcano plot generated from DESeq2 results. A volcano plot shows log2 fold change against statistical significance for every gene in a differential expression analysis, and labeling points with gene names makes the significant genes of interest easy to identify.
Step 1: Install and Load Required Libraries
First, install and load the necessary R packages. We will use ggplot2 for plotting and ggrepel to avoid overlapping labels.
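# Install necessary packages if not already installed
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("ggrepel", quietly = TRUE)) install.packages("ggrepel")
# Load libraries
library(ggplot2)
library(ggrepel)
Step 2: Prepare DESeq2 Results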
Assume you have already run DESeq2 and obtained the results object res. If not, here is a quick example of how to generate it:
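# Example DESeq2 workflow
# DESeq2 is distributed through Bioconductor, not CRAN:
#   install.packages("BiocManager"); BiocManager::install("DESeq2")
library(DESeq2)
# Create a DESeqDataSet object from raw counts and sample metadata
dds <- DESeqDataSetFromMatrix(countData = count_data,
                              colData = col_data,
                              design = ~ condition)
# Run DESeq2
dds <- DESeq(dds)
# Get results
res <- results(dds)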
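DESeqDataSetFromMatrix expects count_data to hold raw, non-negative integer counts with genes in rows and samples in columns, and col_data to have one row per sample in the same order as the count columns. A minimal sketch of toy inputs with that layout follows. The gene names, sample names, and simulated counts are invented for illustration, and six genes are far too few for a real DESeq() run.

```r
# Toy inputs with the layout DESeq2 expects.
# All names and values here are hypothetical, not real data.
set.seed(1)
n_genes <- 6
sample_ids <- c("ctrl_1", "ctrl_2", "ctrl_3", "trt_1", "trt_2", "trt_3")

# Genes in rows, samples in columns, non-negative integer counts
count_data <- matrix(
  rpois(n_genes * length(sample_ids), lambda = 50),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), sample_ids)
)

# One row per sample; the 'condition' column matches design = ~ condition
col_data <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 3)),
  row.names = sample_ids
)

# Sample order must agree between the two objects
inputs_aligned <- identical(colnames(count_data), rownames(col_data))
```

Checking inputs_aligned before building the DESeqDataSet catches a mismatched sample order early, before it can silently assign counts to the wrong condition.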
Step 3: Convert DESeq2 Results to a Data Frame
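The res object returned by results() is a DESeqResults object, not a base R data frame, and the gene identifiers are stored only as row names. Convert it to a data frame and copy the row names into a gene column so that ggplot2 can use them as labels.
# Convert DESeq2 results to a data frame
res_df <- as.data.frame(res)
# Add a column for gene names (taken from the row names)
res_df$gene <- rownames(res_df)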
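DESeq2 can report NA in the padj column, for example for genes removed by independent filtering, and those rows cannot be placed on a -log10(padj) axis. The sketch below uses a small made-up data frame standing in for res_df to show one way to count and drop such rows before plotting. Whether to drop them is a judgment call for your own analysis.

```r
# Hypothetical stand-in for res_df; the values are invented for illustration
toy_res <- data.frame(
  log2FoldChange = c(2.1, -1.8, 0.2, 3.0, -0.4),
  padj = c(0.001, 0.03, 0.80, NA, NA),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)
toy_res$gene <- rownames(toy_res)

# Count rows with no adjusted p-value, then keep only the plottable rows
n_missing <- sum(is.na(toy_res$padj))
toy_plot_df <- toy_res[!is.na(toy_res$padj), ]
```

Reporting n_missing alongside the plot makes it clear how many genes were left out.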
Step 4: Create a Volcano Plot with Gene Names
Now, create a volcano plot using ggplot2 and add gene names for significantly differentially expressed genes.
# Define significance thresholds
padj_threshold <- 0.05
log2FC_threshold <- 1
# Add a column to indicate significance
# (genes with NA padj are treated as not significant rather than left as NA)
res_df$sig <- ifelse(!is.na(res_df$padj) &
                       res_df$padj < padj_threshold &
                       abs(res_df$log2FoldChange) > log2FC_threshold,
                     "Significant", "Not Significant")
# Create the volcano plot
volcano_plot <- ggplot(res_df, aes(x = log2FoldChange, y = -log10(padj), color = sig)) +
  geom_point(size = 1) +
  # Name the colors so they do not depend on the order of the levels
  scale_color_manual(values = c("Not Significant" = "black", "Significant" = "red")) +
  theme_minimal() +
  ggtitle("Volcano Plot of Differential Expression") +
  xlab("log2 Fold Change") +
  ylab("-log10 Adjusted p-value")
# Add gene names for significant genes
volcano_plot <- volcano_plot +
  geom_text_repel(
    data = subset(res_df, padj < padj_threshold & abs(log2FoldChange) > log2FC_threshold),
    aes(label = gene),
    box.padding = 0.5,
    max.overlaps = Inf,
    size = 3
  )
# Display the plot
print(volcano_plot)
Step 5: Customize the Volcano Plot (Optional)
You can further customize the volcano plot by adjusting the number of labeled genes, changing colors, or modifying the plot theme.
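One common customization is to draw the significance cutoffs on the plot itself. The sketch below adds dashed guide lines at the two thresholds using geom_vline and geom_hline. It runs on a small invented data frame standing in for res_df, and the ggplot2 part is skipped if the package is not installed.

```r
# Hypothetical stand-in for res_df; the values are invented for illustration
toy_res <- data.frame(
  log2FoldChange = c(-2.5, -0.3, 0.1, 1.7, 3.2),
  padj = c(0.0004, 0.6, 0.9, 0.02, 0.00001)
)
padj_threshold <- 0.05
log2FC_threshold <- 1

# The horizontal cutoff sits at -log10 of the padj threshold
padj_cutoff_y <- -log10(padj_threshold)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  threshold_plot <- ggplot(toy_res, aes(x = log2FoldChange, y = -log10(padj))) +
    geom_point() +
    # Vertical lines at -threshold and +threshold on the fold-change axis
    geom_vline(xintercept = c(-log2FC_threshold, log2FC_threshold),
               linetype = "dashed") +
    # Horizontal line at the adjusted p-value cutoff
    geom_hline(yintercept = padj_cutoff_y, linetype = "dashed") +
    theme_minimal()
}
```

The same two layers can be added to volcano_plot directly, since it uses the same axes and threshold variables.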
Example: Label Top 20 Genes
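# Sort by adjusted p-value (NA values sort last) and select the top 20 genes
top_genes <- head(res_df[order(res_df$padj), ], 20)
# Add blue labels for the top 20 genes.
# Note: volcano_plot already carries the labels added in Step 4, so these are
# drawn in addition to them. To label only the top 20, skip the labeling layer
# in Step 4 before running this code.
volcano_plot <- volcano_plot +
  geom_text_repel(
    data = top_genes,
    aes(label = gene),
    box.padding = 0.5,
    max.overlaps = Inf,
    size = 3,
    color = "blue"
  )
# Display the plot
print(volcano_plot)
Step 6: Save the Volcano Plot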
Save the volcano plot to a file for later use or publication.
# Save the plot as a PDF
ggsave("volcano_plot.pdf", plot = volcano_plot, width = 10, height = 8)
# Save the plot as a PNG
ggsave("volcano_plot.png", plot = volcano_plot, width = 10, height = 8, dpi = 300)
Tips and Tricks
- Avoid Overlapping Labels: Use ggrepel::geom_text_repel to prevent overlapping gene labels.
- Adjust Significance Thresholds: Modify padj_threshold and log2FC_threshold to focus on genes of interest.
- Highlight Specific Genes: Manually add labels for specific genes by filtering the data frame.
- Customize Colors: Use scale_color_manual to customize the colors for significant and non-significant genes.
- Interactive Plots: Use plotly::ggplotly to create an interactive volcano plot.
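As an illustration of the "Highlight Specific Genes" tip, the sketch below filters a data frame down to a hand-picked list of genes and labels only those points. The data frame, the gene names, and the genes_of_interest vector are all invented for the example. The plotting part is skipped if ggplot2 or ggrepel is not installed.

```r
# Hypothetical stand-in for res_df; the values are invented for illustration
toy_res <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  log2FoldChange = c(2.1, -1.8, 0.2, 3.0, -0.4),
  padj = c(0.001, 0.03, 0.80, 0.0002, 0.5)
)

# Genes to highlight; "geneZ" is deliberately absent from the data
genes_of_interest <- c("geneB", "geneD", "geneZ")

# Keep only the requested genes, and note any that were not found
label_df <- toy_res[toy_res$gene %in% genes_of_interest, ]
not_found <- setdiff(genes_of_interest, toy_res$gene)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  highlight_plot <- ggplot2::ggplot(
    toy_res,
    ggplot2::aes(x = log2FoldChange, y = -log10(padj))
  ) +
    ggplot2::geom_point(color = "grey60") +
    # Label only the filtered rows
    ggrepel::geom_text_repel(
      data = label_df,
      ggplot2::aes(label = gene),
      size = 3
    )
}
```

Checking not_found catches misspelled or unmatched identifiers, which would otherwise simply be missing from the plot without any warning.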
By following this guide, you can create a volcano plot with gene names from DESeq2 results, making it easier to interpret and share your differential expression analysis.


















