Binning Methods in Metagenomics

September 14, 2023, by admin
Taxonomy-Dependent and Independent Binning Methods in Metagenomics

Organizing Metagenomic Data: Classification Techniques

A typical metagenomics workflow produces sequencing reads, assembled contigs, and predicted genes. Linking these elements back to the organisms they came from is crucial for understanding the ecological makeup of the sampled community. This task of assigning sequences to their source organisms, or to broader taxonomic groups, is commonly referred to as binning or classification.

In the context of shotgun sequencing, data analysis aims to detail both the taxonomic and functional diversity within a specific environment by evaluating DNA fragments from the native microbial population. Binning methods fall into two broad categories: those dependent on known taxonomies and those that aren’t.

Taxonomy-Dependent Methods:

Most existing techniques for sorting shotgun-sequencing data rely on known taxonomies. These approaches measure the similarity between sequence reads and either existing sequences in reference databases or models built from those databases. Depending on how this similarity is assessed, taxonomy-dependent techniques can be further divided into alignment-based methods (which align reads against reference sequences), composition-based methods (which compare compositional features such as k-mer frequencies), and hybrid methods (which combine both). In our research, we introduced MBMC (Metagenomic Binning by Markov Chains), a taxonomy-dependent method that doesn’t require alignment.
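To make the idea of alignment-free, composition-based classification more concrete, here is a minimal Python sketch of a Markov-chain classifier. It is a generic illustration and not the MBMC algorithm: the function names (`train_markov_model`, `log_likelihood`, `classify`), the chain order of 2, the pseudocount, and the toy reference sequences are all assumptions made for this example. Each reference is summarized by its transition probabilities, and a read is assigned to the reference whose model gives it the highest log-likelihood.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov_model(sequence, order=2, pseudocount=1.0):
    """Estimate log P(next base | previous `order` bases) from one reference."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(len(sequence) - order):
        context, nxt = sequence[i:i + order], sequence[i + order]
        if set(context + nxt) <= set(ALPHABET):  # skip N and other ambiguity codes
            counts[context][nxt] += 1
    model = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        # The pseudocount keeps unseen transitions from getting probability zero.
        total = sum(counts[context].values()) + pseudocount * len(ALPHABET)
        model[context] = {
            base: math.log((counts[context][base] + pseudocount) / total)
            for base in ALPHABET
        }
    return model


def log_likelihood(read, model, order=2):
    """Sum the log transition probabilities along the read."""
    score = 0.0
    for i in range(len(read) - order):
        context, nxt = read[i:i + order], read[i + order]
        if context in model and nxt in model[context]:
            score += model[context][nxt]
    return score


def classify(read, models, order=2):
    """Return the name of the reference model that best explains the read."""
    return max(models, key=lambda name: log_likelihood(read, models[name], order))


# Toy references (hypothetical, for illustration only).
references = {
    "AT-rich": "ATTATAATATTTAATA" * 20,
    "GC-rich": "GCCGCGGCGCCCGGCG" * 20,
}
models = {name: train_markov_model(seq) for name, seq in references.items()}
print(classify("TATAATATTA", models))
```

No alignment is performed at any point: the read is scored only against per-reference transition tables, which is what makes this kind of approach fast once the models are built. Real tools typically use higher-order or interpolated models trained on complete reference genomes.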

Taxonomy-Independent Methods:

These techniques cluster sequence reads based on inherent similarities, without the need for database comparisons.

Unsupervised models in this category usually sort reads based on a few key observations:
– The frequency of a k-mer across the reads of a genome generally correlates with the abundance of that genome.
– Sufficiently long w-mers are generally unique to each genome.
– The distributions of short q-mers in sufficiently long reads from the same or similar genomes tend to be similar.
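The last observation, that fragments from the same genome share a similar short q-mer distribution, can be illustrated with a small Python sketch. It compares tetranucleotide (q = 4) profiles using a Pearson correlation. The function names (`qmer_profile`, `pearson`, `toy_fragment`), the choice of q = 4, and the simulated fragments are assumptions for this example, not part of any specific tool.

```python
import math
import random
from collections import Counter
from itertools import product


def qmer_profile(seq, q=4):
    """Relative frequencies of all 4**q q-mers, in a fixed lexicographic order."""
    counts = Counter(seq[i:i + q] for i in range(len(seq) - q + 1))
    qmers = ["".join(p) for p in product("ACGT", repeat=q)]
    total = sum(counts[m] for m in qmers)  # q-mers containing N etc. are ignored
    if total == 0:
        return [0.0] * len(qmers)
    return [counts[m] / total for m in qmers]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Simulated fragments (hypothetical): two from an AT-rich "genome",
# one from a GC-rich "genome". Weights are given in the order A, C, G, T.
rng = random.Random(0)


def toy_fragment(weights, length=2000):
    return "".join(rng.choices("ACGT", weights=weights, k=length))


frag_a1 = toy_fragment([4, 1, 1, 4])
frag_a2 = toy_fragment([4, 1, 1, 4])
frag_b = toy_fragment([1, 4, 4, 1])

same = pearson(qmer_profile(frag_a1), qmer_profile(frag_a2))
cross = pearson(qmer_profile(frag_a1), qmer_profile(frag_b))
print(f"same genome: {same:.2f}, different genomes: {cross:.2f}")
```

In this toy setting the two fragments from the same source correlate much more strongly than fragments from different sources. A composition-based binner builds on this kind of signal by clustering the profiles instead of comparing them pair by pair.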

In this regard, we’ve developed a taxonomy-independent technique known as MBBC (Metagenomic Binning Based on Composition).
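The first observation above, that k-mer frequency tracks genome abundance, can also be sketched in a few lines of Python. This is not MBBC itself, only an illustration of the abundance signal. The genomes, coverage levels, k = 16, and all function names are assumptions for this example. Reads are drawn from two random toy genomes at roughly 20x and 2x coverage, and each read is summarized by the median count of its k-mers across the whole dataset.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count every k-mer over the full read set."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def read_abundance(read, counts, k):
    """Median dataset-wide count of the read's k-mers (read must be >= k long)."""
    return median(counts[m] for m in kmers(read, k))


def simulate_reads(genome, n_reads, read_len, rng):
    """Sample error-free reads uniformly from a genome."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


rng = random.Random(1)
genome_a = "".join(rng.choice("ACGT") for _ in range(2000))
genome_b = "".join(rng.choice("ACGT") for _ in range(2000))

reads_a = simulate_reads(genome_a, 400, 100, rng)  # about 20x coverage
reads_b = simulate_reads(genome_b, 40, 100, rng)   # about 2x coverage

k = 16
counts = count_kmers(reads_a + reads_b, k)
abund_a = median(read_abundance(r, counts, k) for r in reads_a)
abund_b = median(read_abundance(r, counts, k) for r in reads_b)
print(f"median k-mer count, abundant genome: {abund_a}, rare genome: {abund_b}")
```

Reads from the more abundant genome carry k-mers with clearly higher counts, so grouping reads by this value separates the two sources without any reference database. Real data adds sequencing errors and repeats shared between genomes, which is why abundance-based tools model k-mer counts statistically instead of using a single cutoff.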

 

The following table lists commonly used tools for metagenomic binning.

| Category | Year | Tool | Short description | Link |
|---|---|---|---|---|
| Taxonomy-dependent | 2012 | AmphoraNet | The webserver implementation of the AMPHORA2 pipeline for metagenomic analysis of shotgun sequencing data. | AmphoraNet |
| Taxonomy-dependent | 2008 | CARMA | A software pipeline for characterizing the taxonomic composition and genetic diversity of short-read metagenomes. | CARMA |
| Taxonomy-dependent | 2011 | ClaMS | A sequence composition-based classifier for metagenomic sequences | ClaMS |
| Taxonomy-dependent | 2010 | DiScRIBinATE | Distance Score Ratio for Improved Binning and Taxonomic Estimation. | DiScRIBinATE |
| Taxonomy-dependent | 2012 | Genometa | A Java based local bioinformatics program which allows rapid analysis of metagenomic short read datasets. | Genometa |
| Taxonomy-dependent | 2014 | KRAKEN | A system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. | KRAKEN |
| Taxonomy-dependent | 2013 | LMAT | Designed to efficiently assign taxonomic labels to as many reads as possible in very large metagenomic datasets and report the taxonomic profile of the input sample. | LMAT |
| Taxonomy-dependent | 2010 | MARTA | This java-based software blasts each sequence that you provide it, and then looks for a consensus taxon among the top-hits returned from blast. | MARTA |
| Taxonomy-dependent | 2012 | MetaPhlAn | A computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from 3,000 reference genomes. | MetaPhlAn |
| Taxonomy-dependent | 2011 | MetaPhyler | A taxonomic classifier for metagenomic shotgun reads, which uses phylogenetic marker genes as a taxonomic reference. | MetaPhyler |
| Taxonomy-dependent | 2010 | MG-RAST | An automated analysis platform for metagenomes providing quantitative insights into microbial populations based on sequence data. | MG-RAST |
| Taxonomy-dependent | 2010 | MLTreeMap | Analyzes DNA sequences and determines their most likely phylogenetic origin. | MLTreeMap |
| Taxonomy-dependent | 2014 | MyTaxa | A homology-based bioinformatics framework to classify metagenomic and genomic sequences. | MyTaxa |
| Taxonomy-dependent | 2012 | NBC | The Naïve Bayes Classification tool webserver for taxonomic classification of metagenomic reads. | NBC |
| Taxonomy-dependent | 2007 | PhyloPythia | Accurate phylogenetic classification of variable-length DNA fragments. | PhyloPythia |
| Taxonomy-dependent | 2012 | PhyloPythiaS | The Web Server for Taxonomic Assignment of Metagenome Sequences. | PhyloPythiaS |
| Taxonomy-dependent | 2009 | Phymm/PhymmBL | Phylogenetic Classification of Metagenomic Data with Interpolated Markov Models. | Phymm/PhymmBL |
| Taxonomy-dependent | 2010 | Pplacer | Places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. | Pplacer |
| Taxonomy-dependent | 2011 | ProViDE | A novel similarity based binning algorithm that uses a customized set of alignment parameter thresholds/ranges, specifically suited for the accurate taxonomic labelling of viral metagenomic sequences. | ProViDE |
| Taxonomy-dependent | 2011 | RAIphy | A semi-supervised metagenomic fragment classification program. | RAIphy |
| Taxonomy-dependent | 2012 | Sequedex | A signature-based method to classify the function and phylogeny of reads as short as 30 bp. | Sequedex |
| Taxonomy-dependent | 2009 | SOrt-ITEMS | Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. | SOrt-ITEMS |
| Taxonomy-dependent | 2011 | SPHINX | A hybrid binning approach that achieves high binning efficiency by utilizing both ‘compositional’ and ‘similarity’ features of the query sequence during the binning process. | SPHINX |
| Taxonomy-dependent | 2009 | TACOA | Software that can accurately predict the taxonomic origin of genomic fragments from metagenomic data sets by combining the advantages of the k-NN approach with a smoothing kernel function. | TACOA |
| Taxonomy-dependent | 2011 | TaxSOM | A tool for taxonomic classification of DNA fragments, as they are typically obtained in metagenome projects. | TaxSOM |
| Taxonomy-dependent | 2010 | Treephyler | A tool for fast taxonomic profiling of metagenomes. | Treephyler |
| Taxonomy-dependent | 2009 | WebCARMA | Taxonomic classification of metagenomic shotgun sequences. | WebCARMA |
| Taxonomy-dependent | 2013 | MEGAN5 | Interactively analyze and compare metagenomic and metatranscriptomic data, both taxonomically and functionally | MEGAN5 |
| Taxonomy-dependent | 2011 | ProViDE | A software tool for accurate estimation of viral diversity in metagenomic samples | ProViDE |
| Taxonomy-dependent | 2011 | PaPaRa | Parsimony-based Phylogeny-Aware Read alignment program | PaPaRa |
| Taxonomy-dependent | 2014 | MetaCluster-TA | A software for binning and annotating short paired-end reads. | MetaCluster-TA |
| Taxonomy-independent | 2011 | AbundanceBin | An abundance-based tool for binning metagenomic sequences, such that the reads classified in a bin belong to species of identical or very similar abundances. | AbundanceBin |
| Taxonomy-independent | 2008 | CompostBin | A DNA-composition-based binning algorithm for classifying metagenomic reads. | CompostBin |
| Taxonomy-independent | 2012 | MetaCluster 5.0 | MetaCluster5.0 is an unsupervised binning method. | MetaCluster 5.0 |
| Taxonomy-independent | 2004 | TETRA | The standalone-programs can be used to calculate, how well tetranucleotide usage patterns in DNA sequences correlate. | TETRA |

There is no standard for the taxonomic classification of metagenome sequences. Taxonomic sequence classification can also be error-prone, particularly for habitats with highly diverse communities or a high proportion of taxa that are as yet barely characterized.
