Step-by-Step Guide: Customizing BLAST Output

December 27, 2024 By admin

1. Understand BLAST Output Options

BLAST’s -outfmt parameter controls the format and content of the output file. The available formats range from pairwise alignments (the default, -outfmt 0) to tabular output, XML, and custom field selections.

Key Points:

Default tabular format (-outfmt 6) has 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore.
Custom tabular output can add, drop, or reorder fields by listing space-delimited format specifiers (e.g., qseq, sseq) after the format number, all inside one quoted string.

2. Basic BLAST Command

A typical blastn command looks like this:
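The command itself is missing from this copy of the guide, so the sketch below reassembles it from the options explained in the list that follows. BLASTDB, input.fa and output_file are the placeholder names used in that list; the shell variables and the `command -v` guard are additions for illustration, so that a missing installation is reported instead of producing a cryptic error.

```shell
# Placeholders taken from the option list; replace them with your own names.
DB=BLASTDB
QUERY=input.fa
OUT=output_file

if command -v blastn >/dev/null 2>&1; then
    blastn -db "$DB" \
           -query "$QUERY" \
           -out "$OUT" \
           -word_size 7 \
           -perc_identity 100 \
           -outfmt 6 \
           -max_target_seqs 2
else
    echo "blastn not found in PATH; install BLAST+ first" >&2
fi
```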

Here:

-db BLASTDB: Specifies the database.
-query input.fa: Specifies the input query file.
-out output_file: Specifies the output file name.
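-word_size 7: Sets the length of the initial exact match (word) that seeds an alignment to 7 nucleotides.
-perc_identity 100: Keeps only alignments with 100% identity.
-outfmt 6: Produces tabular output.
-max_target_seqs 2: Keeps hits from at most 2 database sequences per query; each of them can contribute more than one row, because every alignment (HSP) is reported on its own line.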

3. Adding the Target Sequence to the Output

To include the actual aligned subject sequence (sseq), you need to modify the -outfmt parameter.

Updated Command:
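The listing is missing from this copy, so the following is a reconstruction rather than the author's exact command. It assumes the 12 standard columns are kept in their default order and sseq is appended as column 13, which matches the field list and the $13 reference later in the guide. The OUTFMT variable and the `command -v` guard are illustrative additions.

```shell
# Twelve standard columns followed by sseq as column 13.
# The whole specifier list must reach blastn as ONE argument, hence the quotes.
OUTFMT="6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore sseq"

if command -v blastn >/dev/null 2>&1; then
    blastn -db BLASTDB -query input.fa -out output_file \
           -word_size 7 -perc_identity 100 \
           -outfmt "$OUTFMT" \
           -max_target_seqs 2
else
    echo "blastn not found in PATH; install BLAST+ first" >&2
fi
```

Leaving out the quotes is a common mistake: the shell would then pass each specifier to blastn as a separate argument.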

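Explanation of Fields (sseq is the only addition; the others are the standard tabular columns):

sseq: Aligned part of the subject (target) sequence, including any gap characters.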
qseqid: Query sequence ID.
sseqid: Subject sequence ID.
pident: Percentage of identical matches.
length: Alignment length.
mismatch: Number of mismatches.
gapopen: Number of gap openings.
qstart/qend: Start and end positions in the query.
sstart/send: Start and end positions in the subject.
evalue: Expect value.
bitscore: Bit score.

4. Extracting Specific Information

If you only want specific information from the output, use grep, awk, or cut commands in UNIX.

Example: Extract Query ID and Target Sequence
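A minimal sketch with awk, assuming the 13-column layout in which sseq is the last field. The two rows written to sample_blast.tsv are made-up values so the command can be tried without running BLAST; sample_blast.tsv and query_sseq.tsv are hypothetical file names.

```shell
# Two made-up rows: 12 standard columns followed by sseq.
printf 'q1\ts1\t100.000\t8\t0\t0\t1\t8\t5\t12\t1e-04\t16.4\tACGTACGT\n'  > sample_blast.tsv
printf 'q2\ts7\t100.000\t7\t0\t0\t3\t9\t40\t46\t2e-03\t14.4\tTTGACCA\n' >> sample_blast.tsv

# Print the query ID (column 1) and the aligned subject sequence (column 13),
# reading and writing tab-separated fields.
awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $13 }' sample_blast.tsv > query_sseq.tsv
cat query_sseq.tsv
```

On real results, replace sample_blast.tsv with the file given to -out. The same two columns can also be selected with `cut -f1,13`.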

Here:

$1 refers to qseqid (Query ID).
$13 refers to sseq (Subject Sequence) based on the -outfmt order.
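Because the column numbers depend entirely on the order of the specifiers, it can help to print the mapping before writing an awk or cut command. The sketch below assumes the format string with the 12 standard fields followed by sseq; OUTFMT and COLUMN_MAP are illustrative variable names.

```shell
OUTFMT="6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore sseq"

# Put each word on its own line, skip the leading format number,
# and print every specifier next to its column index.
COLUMN_MAP=$(echo "$OUTFMT" | tr ' ' '\n' | awk 'NR > 1 { printf "%d\t%s\n", NR - 1, $0 }')
echo "$COLUMN_MAP"
```

If you reorder or add specifiers, rerun this with the new string and adjust $1 and $13 accordingly.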

5. Using Perl for Further Customization

If additional post-processing is needed, Perl scripts can help parse and manipulate the BLAST output.

Example: Extract Query ID, Subject ID, and Target Sequence

Save this script as extract_sequences.pl and run it:

6. Verifying the Results

After running the command or script, inspect the output to ensure the desired columns are present.

Quick Inspection Commands:
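The commands are missing from this copy; the ones below are a plausible set rather than the original list. They assume tab-separated output with 13 columns. A single made-up row in sample_blast.tsv (a hypothetical file name) stands in for real results.

```shell
# One made-up 13-column row stands in for real BLAST output.
printf 'q1\ts1\t100.000\t8\t0\t0\t1\t8\t5\t12\t1e-04\t16.4\tACGTACGT\n' > sample_blast.tsv

# Look at the first rows.
head -n 5 sample_blast.tsv

# Count rows per number of columns; a clean table yields one line starting with 13.
FIELD_COUNTS=$(awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' sample_blast.tsv)
echo "$FIELD_COUNTS"

# Number of distinct queries that produced at least one hit.
cut -f1 sample_blast.tsv | sort -u | wc -l
```

More than one line in the column count usually means the table was produced with a different -outfmt string than expected.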

7. Automating the Workflow

To streamline the process, create a shell script for the BLAST execution and formatting.

Example: Shell Script (run_blast.sh)

Make the script executable and run it:
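The original script is missing from this copy, so the block below writes a sketch of run_blast.sh, marks it executable, and runs it. The three positional arguments, the PATH check, and the name of the reduced output file are choices made for this example; the blastn options are the ones used earlier in the guide.

```shell
# Write the wrapper script (the quoted 'EOF' keeps $variables literal).
cat > run_blast.sh <<'EOF'
#!/bin/sh
# Usage: ./run_blast.sh QUERY_FASTA BLAST_DB OUTPUT_TSV
set -eu

if [ "$#" -ne 3 ]; then
    echo "Usage: $0 QUERY_FASTA BLAST_DB OUTPUT_TSV" >&2
    exit 2
fi
QUERY=$1
DB=$2
OUT=$3

command -v blastn >/dev/null 2>&1 || { echo "blastn not found in PATH" >&2; exit 1; }

# 12 standard columns plus sseq as column 13.
blastn -db "$DB" -query "$QUERY" -out "$OUT" \
       -word_size 7 -perc_identity 100 -max_target_seqs 2 \
       -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore sseq"

# Reduce the table to query ID and aligned subject sequence.
awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $13 }' "$OUT" > "$OUT.query_sseq.tsv"
echo "Wrote $OUT and $OUT.query_sseq.tsv"
EOF

# Make it executable and run it; a failure is reported instead of ending the session.
chmod +x run_blast.sh
./run_blast.sh input.fa BLASTDB output_file || echo "run_blast.sh exited with an error (see message above)" >&2
```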

8. Troubleshooting Tips

Ensure BLAST+ is correctly installed and accessible in your PATH.
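If the sseq column is empty or looks wrong, check the BLAST database with blastdbcmd (for example, its -info option) to confirm that it is readable and contains the expected sequences.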
Validate your input FASTA file for proper formatting.
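The three checks above can be sketched as follows. BLASTDB and input.fa are the placeholder names used throughout this guide, and check_fasta is a hypothetical helper that only tests whether the first non-empty line is a header; it is not a full FASTA validator.

```shell
# 1. Is BLAST+ on the PATH? Print the version, or say so if it is missing.
if command -v blastn >/dev/null 2>&1; then
    blastn -version
    # 2. Summarise the database (title, number of sequences, total length).
    blastdbcmd -db BLASTDB -info || echo "could not read database BLASTDB" >&2
else
    echo "blastn not found in PATH" >&2
fi

# 3. Minimal FASTA check: the first non-empty line must start with '>'.
check_fasta() {
    awk 'NF { ok = ($0 ~ /^>/); exit } END { exit !ok }' "$1"
}

if [ -f input.fa ]; then
    if check_fasta input.fa; then
        echo "input.fa starts with a FASTA header"
    else
        echo "input.fa does not start with a '>' header line" >&2
    fi
fi
```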

By following this guide, you can efficiently customize BLAST output to include specific fields such as the aligned target sequence. Using UNIX tools and scripting enhances reproducibility and automation, making your analysis more robust and scalable.