FPKM vs Raw Counts vs RPKM: Step-by-Step Guide

January 3, 2025, by admin

This guide clarifies the differences between FPKM, raw counts, and RPKM in RNA-seq analysis and explains when and how to use each. It also covers how the normalized values are calculated and which tools produce them.

1. Definitions and Differences

Raw Counts: The number of reads (or fragments) assigned to each gene or transcript, with no normalization applied. Used as input for tools like DESeq2 and edgeR for differential expression analysis.
RPKM (Reads Per Kilobase of transcript per Million mapped reads): Normalizes raw counts by gene length and sequencing depth. Intended for single-end data, where one read represents one fragment.
FPKM (Fragments Per Kilobase of transcript per Million mapped reads): The paired-end counterpart of RPKM. The two mates of a read pair are counted as a single fragment, so a pair is not counted twice.

Key Considerations:

Use raw counts for statistical models like DESeq2, which handle normalization internally.
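Use FPKM/RPKM to compare the expression levels of different genes within one sample. Do not use them for cross-sample differential expression, because the per-million scaling depends on the overall composition of each sample.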
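
The two considerations above can be illustrated with a small sketch. The gene names, counts, and lengths are hypothetical toy values, and the helper `rpkm_per_gene` is an illustrative name that takes sequencing depth to be the sum of the counts shown (an assumption; real pipelines usually use the total number of mapped reads).

```python
def rpkm_per_gene(counts, lengths_bp):
    """RPKM for every gene in one sample.

    Assumption: depth is the sum of the counts passed in.
    """
    total = sum(counts.values())
    return {gene: count * 1e9 / (lengths_bp[gene] * total)
            for gene, count in counts.items()}


# Hypothetical toy data: gene lengths in bases, counts for two samples.
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}
sample1 = {"geneA": 100, "geneB": 100, "geneC": 100}
sample2 = {"geneA": 100, "geneB": 100, "geneC": 800}  # only geneC changes

rpkm1 = rpkm_per_gene(sample1, lengths)
rpkm2 = rpkm_per_gene(sample2, lengths)

# Within a sample: geneA and geneB have equal counts, but geneB is twice
# as long, so its RPKM is half that of geneA.
within_ratio = rpkm1["geneA"] / rpkm1["geneB"]

# Across samples: geneA has the same count in both samples, yet its RPKM
# falls in sample2 because geneC takes a larger share of the reads.
across_ratio = rpkm1["geneA"] / rpkm2["geneA"]

print(f"within-sample geneA/geneB ratio: {within_ratio:.2f}")
print(f"geneA RPKM, sample1 vs sample2 ratio: {across_ratio:.2f}")
```

In this toy case the drop in geneA's RPKM reflects a change in another gene, not in geneA itself. Count-based tools such as DESeq2 and edgeR are designed to account for this kind of effect through their own normalization.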

2. Calculations

FPKM Calculation (Python Script):

RPKM Calculation (R Script):
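
A minimal sketch of the RPKM formula, RPKM = reads × 10^9 / (gene length in bases × total mapped reads). It is written in Python rather than R so that all examples in this guide share one language. The function name `rpkm` and the numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """RPKM for one gene, with the library size passed in explicitly."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return (read_count * 1_000_000_000) / (gene_length_bp * total_mapped_reads)


# Hypothetical single-end example: 200 reads on a 2,000-base gene
# in a library of 10 million mapped reads.
single_end = rpkm(200, 2000, 10_000_000)

# The same formula applied to fragment counts gives FPKM. If every
# fragment contributes exactly two mapped reads (an idealized assumption),
# halving both the gene count and the library size leaves the value unchanged.
paired_as_fragments = rpkm(100, 2000, 5_000_000)

print(single_end, paired_as_fragments)
```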

3. Tools and Pipelines

Raw Counts:
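- HTSeq-count: Generates raw per-gene read counts from aligned reads in a BAM file.
- Command (-r pos declares a position-sorted BAM; -s no assumes an unstranded library):
  ```bash
  htseq-count -f bam -r pos -s no input.bam genes.gtf > counts.txt
  ```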
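
The resulting counts.txt can be loaded with a few lines of Python. This is a sketch that assumes the usual htseq-count layout: two tab-separated columns (feature ID, count), followed by summary rows whose names start with a double underscore, such as `__no_feature` and `__ambiguous`. The function name `read_htseq_counts` and the inline sample are hypothetical.

```python
import csv
import io


def read_htseq_counts(handle):
    """Split htseq-count output into gene counts and summary rows."""
    counts = {}
    summary = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        feature, value = row[0], int(row[1])
        if feature.startswith("__"):
            summary[feature] = value  # e.g. __no_feature, __ambiguous
        else:
            counts[feature] = value
    return counts, summary


# Hypothetical stand-in for the contents of counts.txt.
example = io.StringIO(
    "geneA\t120\n"
    "geneB\t0\n"
    "__no_feature\t35\n"
    "__ambiguous\t4\n"
)
gene_counts, summary_rows = read_htseq_counts(example)
print(gene_counts)
print(summary_rows)
```

For a real file, pass an open file handle instead of the `io.StringIO` sample. Keeping the summary rows separate prevents them from being treated as genes in downstream analysis.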
FPKM/RPKM Calculation:
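- StringTie: Reports both FPKM and TPM values for each gene and transcript.
- Command (expects a coordinate-sorted BAM; -A writes the gene-level abundance table):
  ```bash
  stringtie input.bam -G annotation.gtf -o output.gtf -A gene_abundances.txt
  ```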
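
The gene abundance table can then be read to rank genes by FPKM within the sample. This sketch assumes the table is tab-separated with a header row containing the columns `Gene ID`, `FPKM`, and `TPM`; check the header of your own file, since the exact column names are an assumption here. The function name and the sample rows are hypothetical.

```python
import csv
import io


def read_stringtie_abundance(handle):
    """Map gene ID -> (FPKM, TPM) from a StringTie gene abundance table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Gene ID"]: (float(row["FPKM"]), float(row["TPM"]))
            for row in reader}


# Hypothetical stand-in for gene_abundances.txt (header names assumed).
example = io.StringIO(
    "Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n"
    "g1\tABC1\tchr1\t+\t100\t2100\t12.5\t8.0\t10.0\n"
    "g2\tXYZ2\tchr1\t-\t5000\t9000\t40.0\t25.5\t31.9\n"
)
abundance = read_stringtie_abundance(example)

# Rank genes within this one sample by FPKM, highest first.
ranked = sorted(abundance, key=lambda gene: abundance[gene][0], reverse=True)
print(ranked)
```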
Differential Expression:
- DESeq2/edgeR: Model raw counts directly to test for differential expression.

4. Pros and Cons

Metric	Pros	Cons
Raw Counts	Input for robust statistical methods.	Requires normalization for interpretation.
RPKM/FPKM	Useful for within-sample expression ranking.	Not ideal for cross-sample comparisons.

5. Recent Tools and Resources

Salmon/kallisto: Fast, alignment-free (pseudo-alignment) quantification of RNA-seq data directly from the reads.
Bioconductor Workflow: Comprehensive guide for RNA-seq analysis (link).
HAROLD Blog: Insights into RNA-seq expression units (link).

By understanding the differences between FPKM, raw counts, and RPKM, and following these steps, you’ll be equipped to choose the appropriate metric for your RNA-seq analysis.