
Navigating the Complex Landscape of Metagenomics Assembly: A Guide to Strategies and Tools

September 14, 2023, by admin

Metagenomics Assembly – Introduction

Metagenomics is a fascinating subfield of genomics that enables us to explore the genetic material from complex microbial communities found in environments such as soil, water, and even the human gut. A crucial step in unraveling the mysteries of these diverse communities lies in the assembly process, where numerous sequencing reads are pieced together to form extended DNA segments known as contigs. In this blog post, we’ll dive into the nuances of metagenomic assembly, its different approaches, and the tools commonly employed for this critical task.

What is Metagenomic Assembly?

In essence, metagenomic assembly is all about joining short DNA sequence reads to form longer stretches, known as contigs. The combined sequence, or consensus sequence, for each contig can be derived either from the highest-quality nucleotide at each position across all reads or through a majority-rule principle.
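To make the two consensus rules concrete, here is a minimal Python sketch. It assumes the reads have already been aligned into equal-length rows, with `-` marking positions a read does not cover. The function names and the toy data are illustrative only and are not taken from any particular assembler.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Majority-rule consensus over equal-length aligned reads ('-' = not covered)."""
    consensus = []
    for column in zip(*aligned_reads):
        bases = [b for b in column if b != "-"]
        if not bases:
            consensus.append("N")  # no read covers this position
            continue
        consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


def quality_consensus(aligned_reads, qualities):
    """At each position, keep the base that carries the highest quality score."""
    consensus = []
    for pos in range(len(aligned_reads[0])):
        candidates = [
            (quals[pos], read[pos])
            for read, quals in zip(aligned_reads, qualities)
            if read[pos] != "-"
        ]
        consensus.append(max(candidates)[1] if candidates else "N")
    return "".join(consensus)


# Toy alignment (hypothetical data): the reads disagree at position 4.
reads = [
    "ACGTAC--",
    "-CGTTCGA",
    "--GTACGA",
]
quals = [
    [30, 30, 30, 30, 10, 30, 0, 0],
    [0, 30, 30, 30, 40, 30, 30, 30],
    [0, 0, 30, 30, 20, 30, 30, 30],
]

print(majority_consensus(reads))        # two of three reads say 'A' at position 4
print(quality_consensus(reads, quals))  # the single 'T' has the best quality there
```

The two rules can disagree: at position 4 the majority vote picks `A`, while the highest-quality rule picks the lone `T`. Real assemblers typically weigh both kinds of evidence.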

The Two Pathways: Reference-Based and De Novo Assembly

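When it comes to assembling sequences in metagenomics, you typically have two routes: reference-based assembly (also called reference-guided assembly) and de novo assembly. Note that “co-assembly” is not a synonym for reference-based assembly: it refers to pooling the reads from several samples into a single assembly.

1. Reference-Based Assembly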

This approach is effective when you’re working with metagenomic samples that contain sequences closely related to known genomes. Reads are aligned to these available genomes, which serve as a “reference,” so the process is more straightforward. However, there are caveats. Significant differences between the genomes actually present in your sample and the reference, such as large insertions, deletions, or other sequence divergence, can lead to a fragmented assembly, and regions that are absent from the reference can be missed entirely.
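The caveat about unique regions can be illustrated with a toy Python sketch. It places reads on a reference by exact string matching, which is a deliberate simplification: real read mappers tolerate mismatches and gaps. The sequences and function names are hypothetical.

```python
def map_reads(reference, reads):
    """Place each read at its first exact match; return (coverage, unmapped reads)."""
    coverage = [0] * len(reference)
    unmapped = []
    for read in reads:
        start = reference.find(read)
        if start == -1:
            unmapped.append(read)  # e.g. sequence that is absent from the reference
            continue
        for i in range(start, start + len(read)):
            coverage[i] += 1
    return coverage, unmapped


def uncovered_regions(coverage):
    """Return half-open (start, end) intervals of the reference with zero coverage."""
    regions, start = [], None
    for i, depth in enumerate(coverage):
        if depth == 0 and start is None:
            start = i
        elif depth > 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    return regions


reference = "ACGTTGCAAGGCTTAC"
reads = ["ACGTTG", "GTTGCA", "GCTTAC", "TTTTTT"]  # the last read has no counterpart

coverage, unmapped = map_reads(reference, reads)
print(uncovered_regions(coverage))  # gaps that would fragment the assembly
print(unmapped)                     # sample-specific sequence that is lost
```

Zero-coverage intervals split the result into separate pieces, and reads with no match in the reference never make it into the assembly at all.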

2. De Novo Assembly

This approach doesn’t rely on any existing reference genomes. Instead, it reconstructs contigs from the overlaps among the reads themselves, which generally demands more memory and compute time than reference-based assembly. Most short-read de novo assemblers are built on de Bruijn graphs, which break reads into k-mers and can handle massive datasets, at the cost of those higher machine requirements.
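As a rough illustration of the de Bruijn idea, the Python sketch below splits reads into k-mers, links each k-mer’s prefix to its suffix, and spells out the non-branching paths as contigs. It is a teaching sketch under strong assumptions (error-free reads, a single strand, no isolated cycles), not how any specific assembler is implemented. All names are illustrative.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extract_contigs(graph):
    """Spell out maximal non-branching paths (isolated cycles are ignored here)."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1
    nodes = set(graph) | set(indegree)

    def is_simple(node):
        # Exactly one edge in and one edge out: the path continues unambiguously.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # interior of a path; it is reached from the path's start node
        for succ in sorted(graph.get(node, ())):
            contig = node + succ[-1]
            while is_simple(succ):
                succ = next(iter(graph[succ]))
                contig += succ[-1]
            contigs.append(contig)
    return contigs


# Overlapping reads drawn from the made-up sequence ATGGCGTGCA.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]
contigs = extract_contigs(build_de_bruijn(reads, k=4))
print(contigs)
```

When two organisms share a stretch of sequence, the graph branches at the shared k-mers and the walk stops there. This is one reason metagenomic assemblies tend to be more fragmented than single-genome ones.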

Tools of the Trade

Various software tools have been developed for sequence assembly, and some are specifically tailored for metagenomic applications. These specialized metagenomic assemblers come equipped with algorithms that aim to distinguish between species, thereby reducing the formation of chimeric contigs (contigs that join sequence from different organisms). Moreover, they generally don’t assume uniform coverage (also called sequencing depth) when validating the assembled sequences. This is particularly crucial in metagenomics, where species are present at different abundances and are therefore sequenced to different depths.
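A quick back-of-the-envelope calculation shows why uniform coverage cannot be assumed. The Python sketch below uses the standard estimate depth = (number of reads × read length) / genome size. The community composition and the read counts are made-up numbers chosen for illustration.

```python
def expected_depth(n_reads, read_length, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def community_depths(read_length, community):
    """community maps a species name to (genome_size_bp, reads_from_that_species)."""
    return {
        name: expected_depth(n_reads, read_length, genome_size)
        for name, (genome_size, n_reads) in community.items()
    }


# Hypothetical sample: one million 150 bp reads, 95% of them from one species.
community = {
    "dominant_species": (5_000_000, 950_000),
    "rare_species": (5_000_000, 50_000),
}

depths = community_depths(150, community)
for name, depth in depths.items():
    print(f"{name}: {depth:.1f}x")
```

With these numbers the dominant genome is covered about 19 times more deeply than the rare one, even though both genomes are the same size. An assembler that expects a single coverage level could mistake the rare genome’s reads for sequencing errors, or the dominant genome’s sequence for repeats.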

| Year | Tool | Short description | Link |
|------|------|-------------------|------|
| 2002 | Arachne | Arachne was designed for long Sanger-chemistry reads. | Arachne |
| 2004 | Celera | Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by whole-genome shotgun sequencing. | Celera Assembler |
| 2007 | PHRAP | phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high-quality part; it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats; it constructs the contig sequence as a mosaic of the highest-quality read segments rather than a consensus; it provides extensive assembly information to assist in troubleshooting assembly problems; and it handles large datasets. | PHRAP |
| 2008 | Velvet | Velvet is a de novo genomic assembler specially designed for short-read sequencing technologies, such as Solexa or 454. | Velvet |
| 2010 | SOAPdenovo | SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. | SOAPdenovo |
| 2011 | Genovo | Genovo uses a probabilistic model that calculates different coverage values to assemble metagenomes. | Genovo |
| 2011 | Meta-IDBA | Meta-IDBA is an iterative de Bruijn graph de novo short-read assembler specially designed for de novo metagenomic assembly. | Meta-IDBA |
| 2011 | Minimo | Minimo is designed to assemble small datasets and has been used for virome analyses. | AMOS |
| 2012 | MetaVelvet | MetaVelvet is an extension of the Velvet assembler to de novo metagenome assembly from short sequence reads. | MetaVelvet |
| 2012 | IDBA-UD | IDBA-UD is an iterative de Bruijn graph de novo assembler for short-read sequencing data with highly uneven sequencing depth. It is an extension of the IDBA algorithm. | IDBA-UD |
| 2012 | MAP | MAP is a de novo metagenomic assembly program for shotgun DNA reads. | MAP |
| 2012 | MOCAT | MOCAT is a metagenomics assembly and gene prediction toolkit. | MOCAT |
| 2012 | GeneStitch | GeneStitch is a novel way of using the de Bruijn graph assembly of metagenomes to improve the assembly of genes. | GeneStitch |
| 2012 | Ray Meta | Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. | Ray Meta |
| 2012 | VICUNA | VICUNA is a de novo assembly program targeting populations with high mutation rates. | VICUNA |
| 2013 | MetAMOS | MetAMOS is a modular and open source metagenomic assembly and analysis pipeline. | MetAMOS |
| 2014 | GARM | GARM (Genome Assembler, Reconciliation and Merging) is a new software pipeline to merge and reconcile assemblies from different algorithms or sequencing technologies. | GARM |
| 2013 | PRICE | PRICE (Paired-Read Iterative Contig Extension) is a de novo genome assembler implemented in C++. | PRICE |
| 2013 | Xgenovo | Xgenovo generates quality assemblies with paired-end reads. | Xgenovo |
| n/a | Newbler | Newbler is a software package for de novo DNA sequence assembly. It is designed specifically for assembling sequence data generated by the 454 GS-series of pyrosequencing platforms sold by 454 Life Sciences, a Roche Diagnostics company. | Newbler |

Wrapping It Up

Metagenomics offers a captivating lens through which we can explore the vastly complex world of microbial communities. The assembly stage is a pivotal step in this journey, shaping the quality and scope of insights that can be derived. With an understanding of the different approaches and tools at your disposal, you’ll be better equipped to delve into the genetic mysteries of diverse microbial landscapes.

The following table shows the information contained in different lengths of genomic DNA.

A Primer of Metagenomics

 

So, whether you opt for reference-based or de novo assembly will depend on various factors such as available resources, the nature of your sample, and the specific questions you aim to answer. Choose wisely, and happy sequencing!
