Step-by-step guide for BAM/SAM to FASTA conversion

December 27, 2024, by admin

This step-by-step guide to BAM/SAM to FASTA conversion is written for beginners and uses current, widely available tools.


Introduction

BAM/SAM files can be converted to FASTA with standard bioinformatics tools such as samtools, seqtk, BBMap, and BEDOPS. This guide walks through five methods, using Unix commands throughout and an optional Perl script for custom output.


Prerequisites

  1. Tools Required:
    • samtools (Version ≥ 1.3 recommended)
    • seqtk (if additional processing is needed)
    • Optional: Perl interpreter for custom scripts
  2. Installation:
    • Install samtools:
      bash
      sudo apt-get install samtools
    • Install seqtk:
      bash
      git clone https://github.com/lh3/seqtk.git
      cd seqtk
      make
      sudo cp seqtk /usr/local/bin/
  3. Input Files:
    • A BAM/SAM file (input.bam or input.sam)
    • Reference FASTA file (needed only for methods that pull sequence from the reference, such as Method 5)
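
Before starting, it can help to confirm that the tools are on your PATH and that samtools is recent enough. The following is a minimal sketch; check_tools and version_ge are hypothetical helper names introduced here for illustration, not part of any of the tools.

```shell
#!/bin/sh
# Report which of the named commands are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Succeed when dotted version $1 is greater than or equal to $2.
# Components are compared numerically, left to right.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = x[i] + 0; yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example: check the tools this guide relies on.
check_tools samtools seqtk perl || echo "Install the missing tools before continuing."

# Example: compare the installed samtools version with 1.3 (only if installed).
if command -v samtools >/dev/null 2>&1; then
    installed=$(samtools --version | awk 'NR == 1 { print $2 }')
    if version_ge "$installed" 1.3; then
        echo "samtools $installed is new enough"
    else
        echo "samtools $installed is older than 1.3"
    fi
fi
```

The version check assumes the first line of samtools --version has the form "samtools X.Y"; adjust the parsing if your build prints something different.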

Method 1: Using samtools fasta Command

Steps:

  1. Convert BAM to FASTA:
    bash
    samtools fasta input.bam > output.fasta
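    • samtools fasta writes each read's name and sequence as a FASTA record; a SAM file (input.sam) works the same way. Reads aligned to the reverse strand are converted back to their original orientation, and secondary and supplementary alignments are skipped by default.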
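  2. For strand-specific data, filter on the SAM flag with -F (exclude reads that have these bits set) or -f (keep only reads that have these bits set). For example:
    bash
    samtools view -b -F 16 input.bam | samtools fasta - > output.fasta
    • Here, -F 16 excludes reads aligned to the reverse strand, and -b keeps the stream in BAM format so that the header reaches samtools fasta.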
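
A quick sanity check after any conversion is to count the FASTA records that were written. The sketch below uses a small made-up FASTA file so that it runs without any BAM input; count_fasta_records is a hypothetical helper name.

```shell
#!/bin/sh
# Count FASTA records by counting header lines (those starting with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Tiny hypothetical FASTA file for demonstration.
demo=$(mktemp)
printf '>read1\nACGT\n>read2\nGGCC\n' > "$demo"
count_fasta_records "$demo"   # prints 2
```

On real data, the count for output.fasta can be compared with the number of primary alignments reported by samtools view -c -F 0x900 input.bam, assuming the default samtools fasta filtering was used.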

Method 2: Using samtools and seqtk

Steps:

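  1. Convert BAM to FASTQ with samtools fastq (bam2fq is an older alias for the same command) and pipe the result to seqtk, which writes FASTA:
    bash
    samtools fastq input.bam | seqtk seq -A - > output.fasta
    • The -A flag makes seqtk write FASTA, dropping the quality lines. The trailing - tells it to read from standard input.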
  2. For strand-specific processing, filter first and keep the stream in BAM format with -b:
    bash
    samtools view -b -F 16 input.bam | samtools fastq - | seqtk seq -A - > output.fasta
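
To make the FASTQ-to-FASTA step concrete, here is a rough awk equivalent of what seqtk seq -A does for simple input. It is a sketch that assumes every FASTQ record is exactly four lines; fastq_to_fasta is a hypothetical helper name, and seqtk itself handles more cases.

```shell
#!/bin/sh
# Convert 4-line FASTQ records on stdin to FASTA on stdout.
# Assumes each record is exactly four lines (no wrapped sequences).
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # "@name" becomes ">name"
         NR % 4 == 2 { print }                     # sequence line is kept unchanged
        '
}

# Two hand-written FASTQ records (hypothetical data).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | fastq_to_fasta
```

The third and fourth lines of each record (the separator and the quality string) are simply dropped, which is why FASTA output is smaller than the FASTQ it came from.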

Method 3: Using BBMap’s reformat.sh

Steps:

  1. Download and install BBMap (it is distributed as a .tar.gz archive and requires Java):
    bash
    wget -O BBMap.tar.gz https://sourceforge.net/projects/bbmap/files/latest/download
    tar -xzf BBMap.tar.gz
    cd bbmap
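  2. Convert BAM to FASTA:
    bash
    ./reformat.sh in=input.bam out=output.fasta
    • reformat.sh chooses the output format from the file extension. Reading BAM input relies on samtools being available on your PATH.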

Method 4: Using Perl Script for Custom BAM/SAM to FASTA Conversion

Steps:

  1. Create a Perl script (bam_to_fasta.pl):
    perl
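    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read SAM records from STDIN and print each one as a FASTA entry.
    while (my $line = <STDIN>) {
        chomp $line;
        next if $line =~ /^@/;                  # Skip header lines
        my @fields = split /\t/, $line;
        next if @fields < 11;                   # Skip malformed records
        my ($read_name, $flag, $sequence) = @fields[0, 1, 9];
        next if $sequence eq '*';               # No sequence stored for this record
        if ($flag & 16) {                       # Reverse-strand alignment:
            $sequence = scalar reverse $sequence;    # restore the original read
            $sequence =~ tr/ACGTacgt/TGCAtgca/;      # by reverse-complementing it
        }
        print ">$read_name\n$sequence\n";
    }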

  2. Save the script and make it executable:
    bash
    chmod +x bam_to_fasta.pl
  3. Run the script:
    bash
    samtools view input.bam | ./bam_to_fasta.pl > output.fasta
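
One detail matters for any custom converter: SAM/BAM stores reads that align to the reverse strand as their reverse complement, so a script that prints the SEQ field unchanged will not reproduce the original read for those records. The sketch below shows the reverse-complement operation on its own; revcomp is a hypothetical helper name.

```shell
#!/bin/sh
# Reverse-complement a DNA sequence given as the first argument.
revcomp() {
    printf '%s\n' "$1" | awk '{
        out = ""
        # Walk the sequence from its last base to its first.
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }' | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp AACGT   # prints ACGTT
```

Characters other than A, C, G, and T (for example N) are reversed but left uncomplemented, which is the usual treatment for ambiguous bases.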

Method 5: Using BEDOPS Toolkit

Steps:

  1. Install BEDOPS:
    bash
    sudo apt-get install bedops
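  2. Convert BAM to BED and then to FASTA:
    bash
    bam2bed < input.bam | bed2faidxsta.pl > output.fasta
    • bed2faidxsta.pl is a separate Perl helper script rather than part of the BEDOPS package, so it must be downloaded and placed on your PATH. It looks up each BED interval in an indexed reference FASTA, so the output is the reference sequence under each alignment rather than the read sequence itself.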

Tips for Strand-Specific Sequencing

  • When dealing with strand-specific sequencing data, ensure the correct strand is chosen:
    • -F 16 excludes reverse-strand reads.
    • -f 16 includes only reverse-strand reads.
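
The value 16 is one bit (0x10) of the SAM FLAG field, so a read's strand can be read off any flag value with a bitwise AND. A minimal sketch, with strand_of_flag as a hypothetical helper name:

```shell
#!/bin/sh
# Print "-" when bit 16 (0x10, read aligned to the reverse strand) is set
# in a SAM flag, and "+" otherwise.
strand_of_flag() {
    if [ $(( $1 & 16 )) -ne 0 ]; then
        printf '%s\n' '-'
    else
        printf '%s\n' '+'
    fi
}

strand_of_flag 0    # prints +
strand_of_flag 16   # prints -
strand_of_flag 83   # prints - (83 = 64 + 16 + 2 + 1)
strand_of_flag 99   # prints + (99 = 64 + 32 + 2 + 1)
```

This is the same test that -F 16 and -f 16 apply: the filter looks only at that one bit and ignores the rest of the flag.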

Best Practices

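  1. Index the BAM file if you plan to extract reads from specific regions; converting the whole file does not need an index. Indexing requires a coordinate-sorted BAM:
    bash
    samtools index input.bam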
  2. Use compressed output for large datasets:
    bash
    samtools fasta input.bam | gzip > output.fasta.gz
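
After writing compressed output, it is worth checking that the archive is intact and contains the expected number of records. The sketch below builds a small made-up .fasta.gz file so that it runs on its own; check_fasta_gz is a hypothetical helper name.

```shell
#!/bin/sh
# Check a gzip-compressed FASTA file: test integrity, then count records.
check_fasta_gz() {
    gzip -t "$1" || return 1            # non-zero exit if the archive is corrupt
    gzip -cd "$1" | grep -c '^>'        # number of FASTA header lines
}

# Small hypothetical compressed FASTA file for demonstration.
demo_dir=$(mktemp -d)
printf '>read1\nACGT\n>read2\nGGCC\n>read3\nTTAA\n' | gzip > "$demo_dir/demo.fasta.gz"
check_fasta_gz "$demo_dir/demo.fasta.gz"   # prints 3
```

On real data, replace the demo path with output.fasta.gz.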

Troubleshooting

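  1. samtools version mismatch:
    • The samtools fasta command is available from version 1.3 onward. Check your version with samtools --version and, if it is older, update to the latest version: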
      bash
      conda install -c bioconda samtools
  2. Missing seqtk:
    • Install seqtk from its GitHub repository as shown above.
  3. Custom FASTA header:
    • Modify the Perl script to include additional information in the FASTA header if needed.
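
As an alternative to editing the Perl script, a custom header can also be built with awk. The sketch below is one possible layout, not a standard format: the ref=, pos=, and flag= labels and the name sam_to_fasta_with_header are assumptions made for illustration. Unlike samtools fasta, it prints sequences exactly as stored in the SAM record, without reverse-complementing reverse-strand reads.

```shell
#!/bin/sh
# Convert SAM text on stdin to FASTA, putting the reference name,
# position, and flag into each header after the read name.
# Sequences are printed as stored (no reverse-complementing).
sam_to_fasta_with_header() {
    awk -F '\t' '
        /^@/       { next }                     # skip SAM header lines
        $10 == "*" { next }                     # skip records without a stored sequence
        { printf ">%s ref=%s pos=%s flag=%s\n%s\n", $1, $3, $4, $2, $10 }
    '
}

# One header line and one hand-written SAM record (hypothetical data).
printf '@HD\tVN:1.6\nread1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' | sam_to_fasta_with_header
```

With a real file, the function would be fed from samtools view, for example: samtools view input.bam | sam_to_fasta_with_header > output.fasta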

This guide has covered five ways to convert BAM/SAM files to FASTA format. Select the method that best fits your data and computational requirements.
