Step-by-Step Guide: Convert Multiline FASTA to Single-Line FASTA
December 27, 2024
Introduction
FASTQ and FASTA are standard formats in bioinformatics. Sometimes it is necessary to convert multiline FASTA sequences to a single-line format, either to meet the requirements of specific software or to simplify manual inspection. This beginner-friendly guide gives step-by-step instructions and scripts for the conversion using UNIX commands (awk, sed) and Perl.
Prerequisites
- Basic understanding of the FASTA format:
  - A header line starting with > followed by a sequence description.
  - Sequence lines, which may span multiple lines.
- Access to a UNIX/Linux shell or Windows Subsystem for Linux (WSL). Alternatively, for the Perl-based options, a system with Perl installed.
- A sample FASTA file, e.g., input.fasta.
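If no FASTA file is at hand, a small multiline one can be created for experimenting. The sketch below is only an illustration: the record names and sequences are made up, and the command overwrites any existing input.fasta in the current directory, so change the file name if needed.

```shell
# Create a small two-record multiline FASTA file (made-up example data).
# Note: this overwrites input.fasta in the current directory.
cat > input.fasta <<'EOF'
>seq1 first example record
ACGTACGTAC
GTACGTACGT
ACGT
>seq2 second example record
GGGTTTAAAC
CC
EOF
```

After conversion, each record should occupy exactly two lines: the header, followed by one sequence line (for seq1, ACGTACGTACGTACGTACGTACGT).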
Step-by-Step Instructions
Option 1: Using awk (Linux/UNIX Shell)
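- Open a terminal and navigate to the directory containing the FASTA file.
- Run an awk command that prints each header on its own line and joins the sequence lines that follow it, redirecting the result to output.fasta.
  - Explanation:
    - ^> : matches lines starting with > (headers).
    - if (NR > 1) printf("\n"); : adds a newline before each new header (except the first), which ends the previous sequence line.
    - printf("%s", $0); : prints each sequence line without a trailing newline, so consecutive sequence lines are concatenated.
  - Output: a single-line FASTA file named output.fasta.
- Verify the output, for example by viewing the first few lines of output.fasta.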
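The awk step described above could look like the following sketch. It is a reconstruction built from the explained fragments, not necessarily the exact original command. The function name linearize_fasta_awk is a hypothetical helper introduced here, and the sketch assumes every header is followed by at least one sequence line.

```shell
# Hypothetical helper: print a single-line version of the FASTA file "$1".
linearize_fasta_awk() {
  awk '
    /^>/ {                          # header line
      if (NR > 1) printf("\n");     # end the previous sequence line
      printf("%s\n", $0);           # print the header on its own line
      next;
    }
    { printf("%s", $0); }           # sequence line: append without a newline
    END { if (NR > 0) printf("\n"); }   # terminate the last sequence line
  ' "$1"
}

# Usage:
#   linearize_fasta_awk input.fasta > output.fasta
#   head -n 4 output.fasta    # inspect the first two records
```

Wrapping the command in a function keeps the awk program readable. The body can just as well be run directly with the file name in place of "$1".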
Option 2: Using perl (One-Liner Script)
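- Open a terminal and navigate to the directory containing the FASTA file.
- Run a Perl one-liner that removes the newline from every sequence line and starts a new line at each header.
  - Explanation:
    - $. > 1 : true for every line after the first, so no newline is inserted before the first header.
    - /^>/ : detects header lines.
    - chomp : removes the trailing newline from sequence lines.
- Verify the output, for example by viewing the first few lines of output.fasta.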
Option 3: Using a Dedicated Perl Script
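- Create a new Perl script file, e.g., linearize_fasta.pl.
- Add script content that reads the FASTA file line by line, prints each header on its own line, and joins the sequence lines that follow it.
- Save the script and make it executable with chmod +x.
- Run the script on your input file and save the result, e.g., as output.fasta.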
Option 4: Using sed
For a quick conversion, sed can also be used directly on the command line.
- Explanation:
  - The sed command combines all sequence lines until the next header (>) is encountered.
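One way to express this with sed is sketched below. It is not necessarily the original command: it collects sequence lines in sed's hold space and joins them whenever a header or the end of the file is reached. The function name linearize_fasta_sed is a hypothetical helper, and only POSIX sed features are used.

```shell
# Hypothetical helper: print a single-line version of the FASTA file "$1".
# How it works:
#   - sequence lines are appended to the hold space (H)
#   - at each header, the collected sequence is joined (newlines removed),
#     printed if non-empty, and the hold space is cleared; then the header
#     itself is printed
#   - at the last line, the final collected sequence is joined and printed
linearize_fasta_sed() {
  sed -n \
    -e '/^>/{x;s/\n//g;/./p;s/.*//;x;p;d;}' \
    -e 'H' \
    -e '${x;s/\n//g;p;}' \
    "$1"
}

# Usage:
#   linearize_fasta_sed input.fasta > output.fasta
```

The sed version is the least readable of the options, so awk or Perl may be the better choice for anything that needs to be maintained.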
Optional: Quality Check
After converting the FASTA file:
- Check that each header is followed by a single line of sequence.
  - This confirms that no sequence is still split across multiple lines.
- Count the headers and confirm that the number matches the input file.
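Both checks can be scripted. The sketch below is one possible way to do it. The function names is_single_line_fasta and count_headers are hypothetical helpers, and the first check assumes the file contains no blank lines.

```shell
# Hypothetical helper: succeeds (exit status 0) only if the file strictly
# alternates header line, sequence line, header line, sequence line, ...
is_single_line_fasta() {
  awk '
    NR % 2 == 1 && !/^>/ { bad = 1 }   # odd lines must be headers
    NR % 2 == 0 &&  /^>/ { bad = 1 }   # even lines must be sequence
    END { if (NR % 2 != 0) bad = 1; exit bad + 0 }
  ' "$1"
}

# Hypothetical helper: print the number of header lines (records) in the file.
count_headers() {
  grep -c '^>' "$1"
}

# Usage:
#   is_single_line_fasta output.fasta && echo "OK: one sequence line per header"
#   [ "$(count_headers input.fasta)" -eq "$(count_headers output.fasta)" ] \
#     && echo "OK: header counts match"
```

Matching header counts show that no record was lost or merged during the conversion.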
Windows Users
- Use WSL to access a UNIX shell and follow the above steps.
- Alternatively, install Perl for Windows via Strawberry Perl.
Conclusion
Converting a multiline FASTA file into a single-line FASTA is simple and can be accomplished using awk, perl, or sed scripts. These approaches are efficient, flexible, and suitable for various operating systems.