Understanding Samtools View Output
January 3, 2025
This step-by-step guide explains the output of samtools view. It covers the fields in the output, example scripts for processing it, and tools for interpreting the data.
Step 1: Command Overview
The samtools view command retrieves alignment information from BAM or SAM files for specified regions.
Example: a call of the form samtools view BAMFILE REGION, with REGION set to positions 1,000,000 to 2,000,000 on chromosome 2, extracts the alignments in BAMFILE that overlap that region.
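The region query described above can also be assembled from a script. The following is a minimal Python sketch, not a definitive recipe: the file name BAMFILE.bam and the contig name chr2 are placeholders (some references name the chromosome 2 rather than chr2), and the helper names are hypothetical. Region queries generally need a coordinate-sorted, indexed BAM file.

```python
import shlex


def region_string(contig, start, end):
    """Format a samtools region such as 'chr2:1000000-2000000' (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{contig}:{start}-{end}"


def view_command(bam_path, contig, start, end):
    """Return the argument list for a region-restricted `samtools view` call."""
    return ["samtools", "view", bam_path, region_string(contig, start, end)]


# "BAMFILE.bam" and "chr2" are assumptions made for illustration.
cmd = view_command("BAMFILE.bam", "chr2", 1_000_000, 2_000_000)
print(shlex.join(cmd))
```

When samtools is installed, the list in cmd can be handed to subprocess.run(cmd, capture_output=True, text=True) to collect the alignment rows as text.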
Step 2: Output Format Explanation
The output of samtools view is tab-delimited text in the SAM format. Each row represents a single alignment with the following columns:
- QNAME: Query name (e.g., read identifier).
- FLAG: Bitwise flag indicating alignment information (e.g., strand, paired-end, etc.).
- RNAME: Reference sequence name (e.g., chromosome).
- POS: Leftmost 1-based position of the alignment on the reference.
- MAPQ: Mapping quality (Phred scale).
- CIGAR: Encoded representation of alignment (e.g., matches, insertions, deletions).
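- MRNM: Mate reference sequence name (= if the same as RNAME); called RNEXT in the current SAM specification.
- MPOS: 1-based position of the mate (PNEXT in the current specification).
- ISIZE: Inferred insert size (TLEN, the observed template length, in the current specification).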
- SEQ: Read sequence as aligned (reverse-complemented if the read maps to the reverse strand).
- QUAL: ASCII-encoded base qualities (Phred score + 33), one character per base in SEQ.
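The columns listed above can be pulled apart with a few lines of standard-library Python. This is a sketch under stated assumptions: the record is made up for illustration, the helper names are hypothetical, and only the FLAG bits defined in the SAM specification are decoded.

```python
import re
from collections import namedtuple

# Field names follow the column list in this guide
# (MRNM/MPOS/ISIZE are RNEXT/PNEXT/TLEN in the current SAM specification).
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar mrnm mpos isize seq qual tags",
)

# FLAG bits as defined by the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_line(line):
    """Split one alignment row into its 11 mandatory columns plus raw tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line has at least 11 columns")
    return SamRecord(
        cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]), cols[5],
        cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10], cols[11:],
    )


def decode_flag(flag):
    """Return the names of all bits set in FLAG."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string such as '5M1I4M' into (length, operation) pairs."""
    if cigar == "*":
        return []
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def decode_qual(qual):
    """Convert a Phred+33 quality string into integer scores."""
    if qual == "*":
        return []
    return [ord(c) - 33 for c in qual]


# Made-up record for illustration only.
line = ("read1\t99\tchr2\t1000100\t60\t5M1I4M\t=\t1000300\t210\t"
        "ACGTACGTAC\tIIIIIIIIII\tNM:i:1\tRG:Z:grp1")
rec = parse_line(line)
print(rec.qname, rec.rname, rec.pos, decode_flag(rec.flag))
print(parse_cigar(rec.cigar), decode_qual(rec.qual)[:3])
```

Keeping the optional fields as raw strings in tags keeps this parser small; they can be decoded separately when needed.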
Optional fields (tags):
- RG: Read group.
- NM: Edit distance to the reference.
- OQ: Original base qualities (usually before recalibration).
- E2: Second most likely base calls.
- Additional tags may vary.
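Optional fields appear after the eleventh column in TAG:TYPE:VALUE form (for example NM:i:1). A minimal Python sketch of decoding them follows; the function names are hypothetical and the sample values are made up. Values of type A, Z, and H are left as text.

```python
def parse_tag(field):
    """Parse one TAG:TYPE:VALUE field into a (tag, value) pair."""
    # Split at most twice so that values containing ':' stay intact.
    tag, typ, value = field.split(":", 2)
    if typ == "i":
        return tag, int(value)
    if typ == "f":
        return tag, float(value)
    if typ == "B":
        # Numeric array: the first item names the element type (e.g. 'c', 'f').
        subtype, *items = value.split(",")
        convert = float if subtype == "f" else int
        return tag, [convert(x) for x in items]
    return tag, value  # A, Z, H: kept as text


def parse_tags(fields):
    """Turn a list of optional fields into a dict keyed by tag name."""
    return dict(parse_tag(f) for f in fields)


# Made-up optional fields for illustration only.
tags = parse_tags(["RG:Z:grp1", "NM:i:1", "OQ:Z:IIII:III"])
print(tags)
```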
Step 3: Script for Parsing SAM/BAM
Using Python (pysam):
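One possible sketch with pysam, counting mapped reads per reference in a region. It assumes pysam is installed and the BAM file is indexed; the function names and the min_mapq parameter are hypothetical. The counting logic is kept separate from file access so it can be tried on any objects that expose the same attributes as pysam reads.

```python
from collections import Counter


def summarize(reads, min_mapq=0):
    """Count mapped reads per reference name from pysam-style read objects."""
    counts = Counter()
    for read in reads:
        if read.is_unmapped or read.mapping_quality < min_mapq:
            continue
        counts[read.reference_name] += 1
    return counts


def summarize_region(bam_path, contig, start, end, min_mapq=0):
    """Open an indexed BAM file and summarize one region.

    start and end are 1-based and inclusive, as on the samtools command line.
    pysam's fetch() takes 0-based, half-open coordinates, hence start - 1.
    """
    import pysam  # third-party; imported here so summarize() works without it

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return summarize(bam.fetch(contig, start - 1, end), min_mapq)
```

A call such as summarize_region("BAMFILE.bam", "chr2", 1_000_000, 2_000_000) would then cover the region from Step 1, with the file and contig names being placeholders.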
Using Unix:
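On the Unix side, samtools view writes plain text to standard output, so its rows can be piped into any line-oriented program. The sketch below shows such a filter, written here in Python; it counts rows per reference name (column 3). The sample rows are made up, and the function name is hypothetical.

```python
import io
from collections import Counter


def count_by_reference(lines):
    """Count alignment rows per RNAME (column 3), skipping '@' header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[line.split("\t")[2]] += 1
    return counts


# Made-up rows standing in for piped `samtools view` output.
sample = io.StringIO(
    "read1\t99\tchr2\t1000100\t60\t10M\t=\t1000300\t210\tACGTACGTAC\tIIIIIIIIII\n"
    "read2\t0\tchr2\t1000200\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
)
counts = count_by_reference(sample)
for rname, n in sorted(counts.items()):
    print(f"{rname}\t{n}")
```

In a real pipeline, sys.stdin would replace the in-memory sample, and the script (saved under a hypothetical name such as count_refs.py) would sit at the end of a command like samtools view BAMFILE | python count_refs.py. Unmapped reads show up under the reference name *.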
Using R (Rsamtools):
Step 4: Online Tools for SAM/BAM Interpretation
- IGV (Integrative Genomics Viewer)
Visualize BAM/SAM files alongside the genome.
Website: IGV
- Galaxy
Offers tools for processing and interpreting BAM/SAM files.
Website: Galaxy Project
- SAMStat
Generates summaries and quality control metrics for SAM/BAM files.
Website: SAMStat
- BamTools
Comprehensive toolkit for BAM file analysis.
Website: BamTools