Step-by-Step Guide: Customizing BLAST Output
December 27, 2024
1. Understand BLAST Output Options
BLAST’s -outfmt parameter controls the format and content of the output file. Available formats range from pairwise alignments to tabular output, XML, and custom configurations.
Key Points:
- The default tabular format (-outfmt 6) includes basic fields such as query ID, subject ID, and % identity.
- Custom tabular output can include additional fields: list space-delimited format specifiers (e.g., qseq, sseq) after the format number, inside the same quoted -outfmt string.
2. Basic BLAST Command
A typical blastn command looks like this:
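A minimal sketch of such a call, assembled from the options this section goes on to explain. BLASTDB, input.fa, and output_file are placeholders for your own database and file names.

```shell
# Placeholder names: BLASTDB (a nucleotide BLAST database), input.fa, output_file
blastn \
  -db BLASTDB \
  -query input.fa \
  -out output_file \
  -word_size 7 \
  -perc_identity 100 \
  -outfmt 6 \
  -max_target_seqs 2
```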
Here:
- -db BLASTDB: Specifies the database.
- -query input.fa: Specifies the input query file.
- -out output_file: Specifies the output file name.
- -word_size 7: Sets the word size (the length of the initial exact match that seeds an alignment).
- -perc_identity 100: Keeps only alignments with 100% identity.
- -outfmt 6: Produces tabular output.
- -max_target_seqs 2: Limits the number of hits to 2 per query.
3. Adding the Target Sequence to the Output
To include the actual aligned subject sequence (sseq), you need to modify the -outfmt parameter: list the standard tabular fields explicitly and append sseq.
Updated Command:
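One way to write the updated call, as a sketch: the -outfmt value becomes a single quoted string holding the format number followed by the field names. The 12 standard tabular fields are listed explicitly and sseq is appended as the 13th column. OUTFMT is only a helper variable introduced here for readability, and BLASTDB, input.fa, and output_file are placeholder names.

```shell
# Format 6 plus an explicit field list; sseq is appended as column 13
OUTFMT="6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore sseq"

# BLASTDB, input.fa and output_file are placeholder names
blastn \
  -db BLASTDB \
  -query input.fa \
  -out output_file \
  -word_size 7 \
  -perc_identity 100 \
  -outfmt "$OUTFMT" \
  -max_target_seqs 2
```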
Explanation of Fields:
- sseq: The aligned part of the subject (target) sequence; this is the added field.
- qseqid: Query sequence ID.
- sseqid: Subject sequence ID.
- pident: Percentage of identical matches.
- length: Alignment length.
- mismatch: Number of mismatches.
- gapopen: Number of gap openings.
- qstart/qend: Start and end positions of the alignment in the query.
- sstart/send: Start and end positions of the alignment in the subject.
- evalue: Expect value.
- bitscore: Bit score.
4. Extracting Specific Information
If you only want specific information from the output, use UNIX commands such as grep, awk, or cut.
Example: Extract Query ID and Target Sequence
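A minimal sketch with awk, assuming sseq was requested as the 13th column of the tabular output. The two printf lines only fabricate a small sample file so the snippet runs on its own; the rows are made up, and sample_blast.tsv is a hypothetical stand-in for your real output file.

```shell
# Made-up sample rows standing in for real BLAST output (tab-separated, 13 columns)
printf 'q1\ts1\t100.000\t20\t0\t0\t1\t20\t5\t24\t1e-05\t40.1\tACGTACGTACGTACGTACGT\n'  > sample_blast.tsv
printf 'q2\ts7\t100.000\t18\t0\t0\t3\t20\t11\t28\t2e-04\t36.2\tTTGACCGTAGGCTAACGT\n' >> sample_blast.tsv

# Print the query ID (column 1) and the aligned subject sequence (column 13)
awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $13 }' sample_blast.tsv > query_and_sseq.tsv

# The same selection with cut, which splits on tabs by default
cut -f1,13 sample_blast.tsv
```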
Here:
- $1 refers to qseqid (Query ID).
- $13 refers to sseq (Subject Sequence), based on the -outfmt field order.
5. Using Perl for Further Customization
If additional post-processing is needed, Perl scripts can help parse and manipulate the BLAST output.
Example: Extract Query ID, Subject ID, and Target Sequence
Save this script as extract_sequences.pl and run it:
6. Verifying the Results
After running the command or script, inspect the output to ensure the desired columns are present.
Quick Inspection Commands:
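A few commands that could serve here, sketched under the assumption of a 13-column tab-separated file. sample_blast.tsv and its made-up rows are stand-ins for your real output.

```shell
# Made-up sample rows standing in for real BLAST output (tab-separated, 13 columns)
printf 'q1\ts1\t100.000\t20\t0\t0\t1\t20\t5\t24\t1e-05\t40.1\tACGTACGTACGTACGTACGT\n'  > sample_blast.tsv
printf 'q2\ts7\t100.000\t18\t0\t0\t3\t20\t11\t28\t2e-04\t36.2\tTTGACCGTAGGCTAACGT\n' >> sample_blast.tsv

# Look at the first few rows
head -n 5 sample_blast.tsv

# Count rows per column count; a clean file yields one line: "<columns> <rows>"
awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' sample_blast.tsv

# Show the first row one field per line, numbered, to check column positions
head -n 1 sample_blast.tsv | tr '\t' '\n' | nl
```

If the awk command reports more than one column count, some rows are missing fields and the column positions used downstream may be off.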
7. Automating the Workflow
To streamline the process, create a shell script for the BLAST execution and formatting.
Example: Shell Script (run_blast.sh)
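A sketch of what run_blast.sh could contain, written here through a heredoc so the file can be created straight from the terminal. The variable names, the extracted.tsv file name, and the PATH check are illustrative choices rather than part of the original workflow; BLASTDB, input.fa, and output_file are placeholder defaults.

```shell
# Create run_blast.sh; the quoted 'EOF' keeps the shell from expanding $ signs now
cat > run_blast.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# Placeholder inputs; override them with positional arguments if desired
DB="${1:-BLASTDB}"
QUERY="${2:-input.fa}"
OUT="${3:-output_file}"

# Format 6 with the 12 standard fields plus sseq as column 13
OUTFMT="6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore sseq"

# Stop early if BLAST+ is not installed
command -v blastn >/dev/null 2>&1 || { echo "blastn not found in PATH" >&2; exit 1; }

blastn -db "$DB" -query "$QUERY" -out "$OUT" \
  -word_size 7 -perc_identity 100 \
  -outfmt "$OUTFMT" -max_target_seqs 2

# Keep only query ID, subject ID and aligned subject sequence
cut -f1,2,13 "$OUT" > extracted.tsv
echo "Wrote $OUT and extracted.tsv"
EOF
```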
Make the script executable and run it:
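Assuming the script was saved as run_blast.sh in the current directory, the two steps are:

```shell
# Grant execute permission, then run the script from the current directory
chmod +x run_blast.sh
./run_blast.sh
```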
8. Troubleshooting Tips
- Ensure BLAST+ is correctly installed and accessible in your PATH.
- Check the BLAST database with blastdbcmd if there are issues with sseq.
- Validate your input FASTA file for proper formatting.
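These checks could be scripted roughly as follows. check_fasta is a hypothetical helper written for this sketch, not part of BLAST+, and it only tests that the first non-blank line is a header and that no header is left without sequence. BLASTDB and input.fa are placeholders.

```shell
# 1. Is BLAST+ on the PATH?
if command -v blastn >/dev/null 2>&1; then
  blastn -version
else
  echo "blastn not found in PATH" >&2
fi

# 2. Can the database be opened? (-info prints a summary of the database)
if command -v blastdbcmd >/dev/null 2>&1; then
  blastdbcmd -db BLASTDB -info || echo "could not read database BLASTDB" >&2
fi

# 3. Rough FASTA sanity check (hypothetical helper, not a full validator)
check_fasta() {
  awk '
    /^[[:space:]]*$/ { next }                                   # ignore blank lines
    /^>/  { if (need_seq) bad = 1; need_seq = 1; seen = 1; next }  # header line
    !seen { bad = 1 }                                           # sequence before any header
          { need_seq = 0 }                                      # sequence line seen
    END   { if (!seen || need_seq || bad) exit 1 }
  ' "$1"
}

if [ -f input.fa ]; then
  if check_fasta input.fa; then
    echo "input.fa looks like FASTA"
  else
    echo "input.fa is malformed" >&2
  fi
fi
```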
By following this guide, you can efficiently customize BLAST output to include specific fields such as the aligned target sequence. Using UNIX tools and scripting enhances reproducibility and automation, making your analysis more robust and scalable.