Navigating the Complex Landscape of Metagenomics Assembly: A Guide to Strategies and Tools
September 14, 2023
Metagenomics Assembly – Introduction
Metagenomics is a fascinating subfield of genomics that enables us to explore the genetic material from complex microbial communities found in environments such as soil, water, and even the human gut. A crucial step in unraveling the mysteries of these diverse communities lies in the assembly process, where numerous sequencing reads are pieced together to form extended DNA segments known as contigs. In this blog post, we’ll dive into the nuances of metagenomic assembly, its different approaches, and the tools commonly employed for this critical task.
What is Metagenomic Assembly?
In essence, metagenomic assembly is all about joining short DNA sequence reads to form longer stretches, known as contigs. The combined sequence, or consensus sequence, for each contig can be derived either from the highest-quality nucleotide at each position across all reads or through a majority-rule principle.
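To make the majority-rule idea concrete, here is a minimal sketch in Python. It assumes the reads have already been aligned to one another and padded with `-` where a read does not cover a position. The function name `majority_consensus` and the toy reads are illustrative assumptions, not taken from any particular assembler.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Return the majority-rule consensus of equal-length, pre-aligned reads.

    Gap characters mark positions a read does not cover and are ignored.
    Columns with no coverage at all are reported as 'N'.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != gap)
        if not counts:
            consensus.append("N")
        else:
            # Pick the most frequent base; ties go to the base seen first.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment (hypothetical data): the fourth read has an error at position 3.
reads = [
    "ACGTAC----",
    "--GTACGT--",
    "----ACGTTA",
    "ACGAAC----",
]
print(majority_consensus(reads))  # ACGTACGTTA
```

The single mismatching base in the last read is outvoted by the two reads that agree, which is the whole point of majority rule. A quality-aware consensus would weight each base by its quality score instead of counting votes equally.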
The Two Pathways: Reference-Based and De Novo Assembly
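When it comes to assembling sequences in metagenomics, you typically have two routes: reference-based assembly (also called reference-guided assembly) and de novo assembly. Note that “co-assembly” is a different concept: it refers to pooling the reads from several samples into a single assembly, not to using a reference.
1. Reference-Based Assembly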
This approach is effective when you’re working with metagenomic samples that contain sequences closely related to known genomes. By leveraging these available genomes as a “reference,” the process becomes more straightforward. However, there are caveats. Significant differences between the genomes in your sample and the reference genome (large insertions, deletions, or divergent genetic markers, for example) can lead to a fragmented assembly or to unique regions being missed altogether.
2. De Novo Assembly
This approach doesn’t rely on any existing reference genomes. Instead, it reconstructs contigs directly from the overlaps among the reads, which typically demands more memory and compute time than reference-based assembly. De Bruijn graph-based assemblers, which break reads into short k-mers and link k-mers that overlap, are well suited to this task and can handle massive short-read datasets, though at the cost of higher machine requirements.
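The sketch below illustrates the de Bruijn graph idea on a toy scale. It is a simplified illustration under stated assumptions (error-free reads, a single strand, no coverage information), and the function names `build_de_bruijn` and `walk_unbranched` are hypothetical, not the API of any real assembler.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unbranched(graph, start):
    """Extend a contig from `start` until the path branches, merges, or loops."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for nxt in successors:
            in_degree[nxt] += 1

    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if in_degree[nxt] != 1 or nxt in visited:
            break
        contig += nxt[-1]  # each step adds exactly one new base
        visited.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads (hypothetical data).
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
print(walk_unbranched(graph, "ATG"))  # ATGGCGTGCAAT
```

Real assemblers add a great deal on top of this: reverse-complement handling, error correction, removal of low-coverage tips and bubbles, and, in metagenomic assemblers, logic for dealing with branches caused by closely related species.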
Tools of the Trade
Various software tools have been developed for sequence assembly, and some are specifically tailored for metagenomic applications. These specialized metagenomic assemblers come equipped with algorithms designed to differentiate between species, thereby reducing the formation of chimeric contigs. Moreover, they generally don’t assume even coverage (also called sequencing depth) when validating the assembled sequences. This is particularly crucial in metagenomics, where species may be present in very different abundances.
Year | Tool | Short Description | URL |
---|---|---|---|
2002 | Arachne | Arachne was designed for long Sanger-chemistry reads. | Arachne |
2004 | Celera | Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by whole-genome shotgun sequencing. | Celera Assembler |
2007 | PHRAP | phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets. | PHRAP |
2008 | Velvet | Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. | Velvet |
2010 | SOAPdenovo | SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for the human-sized genomes. The program is specially designed to assemble Illumina GA short reads. | SOAPdenovo |
2011 | Genovo | Genovo uses a probabilistic model that calculates different coverage values to assemble metagenomes. | Genovo |
2011 | Meta-IDBA | Meta-IDBA is an iterative De Bruijn Graph De Novo short read assembler specially designed for de novo metagenomic assembly. | Meta-IDBA |
2011 | Minimo | Minimo is designed to assemble small datasets and has been used for virome analyses. | AMOS |
2012 | MetaVelvet | MetaVelvet is an extension of the Velvet assembler to de novo metagenome assembly from short sequence reads. | MetaVelvet |
2012 | IDBA-UD | IDBA-UD is an iterative de Bruijn graph de novo assembler for short-read sequencing data with highly uneven sequencing depth. It is an extension of the IDBA algorithm. | IDBA-UD |
2012 | MAP | MAP is a de novo metagenomic assembly program for shotgun DNA reads. | MAP |
2012 | MOCAT | MOCAT is a metagenomics assembly and gene prediction toolkit. | MOCAT |
2012 | GeneStitch | GeneStitch is a novel way of using the de Bruijn graph assembly of metagenomes to improve the assembly of genes. | GeneStitch |
2012 | Ray Meta | Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. | Ray Meta |
2012 | VICUNA | VICUNA is a de novo assembly program targeting populations with high mutation rates. | VICUNA |
2013 | MetAMOS | MetAMOS is a modular and open source metagenomic assembly and analysis pipeline. | MetAMOS |
2013 | PRICE | PRICE (Paired-Read Iterative Contig Extension) is a de novo genome assembler implemented in C++. | PRICE |
2013 | Xgenovo | Xgenovo generates quality assemblies with paired-end reads. | Xgenovo |
2014 | GARM | GARM (Genome Assembler, Reconciliation and Merging) is a software pipeline to merge and reconcile assemblies from different algorithms or sequencing technologies. | GARM |
n/a | Newbler | Newbler is a software package for de novo DNA sequence assembly. It is designed specifically for assembling sequence data generated by the 454 GS-series of pyrosequencing platforms sold by 454 Life Sciences, a Roche Diagnostics company. | Newbler |
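To see why metagenomic assemblers cannot assume even coverage, it helps to run the numbers. The sketch below uses the standard approximation that average depth equals the bases sequenced from a genome divided by its length. The community composition, read count, and read length are all made-up values for illustration, and the calculation assumes reads are drawn in proportion to the stated fractions.

```python
def expected_coverage(total_reads, read_length, read_fraction, genome_size):
    """Average depth for one genome: bases sequenced from it / genome length."""
    return total_reads * read_length * read_fraction / genome_size


# Hypothetical community: name -> (fraction of all reads, genome size in bp).
community = {
    "abundant_species": (0.90, 5_000_000),
    "rare_species": (0.01, 5_000_000),
}
total_reads = 10_000_000  # assumed sequencing run size
read_length = 150         # assumed read length in bp

for name, (fraction, size) in community.items():
    depth = expected_coverage(total_reads, read_length, fraction, size)
    print(f"{name}: ~{depth:.0f}x")
```

With these assumed numbers the abundant species ends up at roughly 270x and the rare one at roughly 3x, within the same sample. An assembler tuned for a single genome might treat the low-depth region as error or the high-depth region as a repeat, which is why depth-aware metagenomic assemblers exist.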
Wrapping It Up
Metagenomics offers a captivating lens through which we can explore the vastly complex world of microbial communities. The assembly stage is a pivotal step in this journey, shaping the quality and scope of insights that can be derived. With an understanding of the different approaches and tools at your disposal, you’ll be better equipped to delve into the genetic mysteries of diverse microbial landscapes.
The following table shows the information contained in different lengths of genomic DNA.
So, whether you opt for reference-based or de novo assembly will depend on various factors such as available resources, the nature of your sample, and the specific questions you aim to answer. Choose wisely, and happy sequencing!