Decoding Metagenomic Data: A Guide to Pathway Reconstruction and Analysis
September 14, 2023
Mapping Functions in Metagenomics: An Overview of Analytical Approaches
The figure on the right illustrates the key stages and software tools needed to reconstruct metabolic pathways from metagenomic data. Each circled number corresponds to the specialized software or programs for that stage, listed on the right side of the figure. Curved brackets indicate the use of specialized databases for particular applications.
The analysis usually diverges at the first stage, depending on the research strategy. You can amplify specific marker genes, such as ribosomal RNA genes, by PCR and then sequence the amplicons, for example with Roche 454. Alternatively, you can fragment the DNA and prepare it for Illumina/SOLiD metagenomic sequencing. The simplest way to analyze the data is to align the short reads against existing databases: the Ribosomal Database Project for taxonomic classification of 16S sequences (Step 1), or the more general NCBI non-redundant database for environmental or gut microbiome studies (Step 2).
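To make the read-alignment route concrete, here is a minimal sketch of best-hit taxonomic assignment. It assumes the alignments are already available in BLAST tabular format (`-outfmt 6`, default column order). The read names, reference IDs, and the `subject_to_taxon` lookup are hypothetical toy data, not part of any real database.

```python
from collections import Counter

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-5):
    """Return {read_id: (subject_id, bitscore)}, keeping the top-scoring hit per read."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        row = dict(zip(BLAST_COLUMNS, line.split("\t")))
        evalue, bitscore = float(row["evalue"]), float(row["bitscore"])
        if evalue > max_evalue:
            continue  # discard weak alignments
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[1]:
            best[row["qseqid"]] = (row["sseqid"], bitscore)
    return best


def taxon_counts(hits, subject_to_taxon):
    """Count reads per taxon; subjects missing from the lookup are 'unclassified'."""
    return Counter(subject_to_taxon.get(sid, "unclassified") for sid, _ in hits.values())


# Toy alignments (hypothetical reads and reference IDs).
example = [
    "read1\tref_A\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180",
    "read1\tref_B\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t150",
    "read2\tref_B\t97.0\t100\t3\t0\t1\t100\t1\t100\t1e-45\t165",
    "read3\tref_C\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30",
]
subject_to_taxon = {"ref_A": "Bacteroides", "ref_B": "Escherichia"}

hits = best_hits(example)
counts = taxon_counts(hits, subject_to_taxon)
print(counts)
```

Best-hit assignment is only one possible strategy. Tools such as MEGAN instead consider several hits per read, so treat this as an illustration of the idea rather than a recommended classifier.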
Another route uses next-generation assemblers to stitch the short reads into longer contigs (Step 3). Longer contigs give gene-finding software more sequence context, which makes gene prediction more reliable (Step 4). Once you have identified the coding sequences, you can match them against functional databases that store their information as Hidden Markov Models or Position-Specific Scoring Matrices, both built from multiple sequence alignments (Step 5).
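As an illustration of how a Position-Specific Scoring Matrix is derived from a multiple sequence alignment and then used to scan a protein, consider the sketch below. It assumes a uniform amino acid background and a simple additive pseudocount. The alignment and query sequence are invented toy data. Production tools use more refined weighting and statistics.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build log2-odds scores per column from equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Gaps and non-standard residues are ignored in the column counts.
        column = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_segment(pssm, segment):
    """Sum the column scores; unknown residues contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, segment))


def best_window(pssm, sequence):
    """Slide the matrix along the sequence and return (best score, start index)."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    windows = [(score_segment(pssm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1)]
    return max(windows)


# Toy alignment of a short conserved motif (hypothetical data).
alignment = ["GDSGG", "GDSGG", "GDSGA", "GESGG"]
pssm = build_pssm(alignment)

query = "MKTAGDSGGPLV"
best_score, best_start = best_window(pssm, query)
print(best_start, query[best_start:best_start + len(pssm)])
```

A profile Hidden Markov Model extends the same idea with explicit insertion and deletion states, which this sketch leaves out.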
The first methodology provides immediate function identification that aids in detecting and evaluating metabolic pathways (Step 6). It then leverages statistical methods for sample profiling (Step 7). The second strategy offers insights into both taxonomic and functional profiles (Step 8), directly informing metabolic pathway determination (Step 9). This data can then be translated into stoichiometric models (Step 10), which are invaluable for predicting how individual microbes or entire communities may respond to environmental shifts.
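A stoichiometric model represents each reaction by the metabolites it consumes (negative coefficients) and produces (positive coefficients). The sketch below uses an invented four-reaction linear pathway, not one taken from the figure, to show how such a model checks whether a set of reaction fluxes leaves every metabolite balanced (the steady-state condition S·v = 0). The reaction and metabolite names are hypothetical.

```python
# Each reaction maps metabolite -> stoichiometric coefficient.
# Hypothetical toy pathway: (uptake) -> A -> B -> C -> (export)
reactions = {
    "uptake": {"A": 1},
    "r1": {"A": -1, "B": 1},
    "r2": {"B": -1, "C": 1},
    "export": {"C": -1},
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction names, S) with S[i][j] for metabolite i, reaction j."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix


def net_production(reactions, fluxes):
    """Compute S·v: the net production rate of every metabolite."""
    net = {}
    for name, stoich in reactions.items():
        for metabolite, coeff in stoich.items():
            net[metabolite] = net.get(metabolite, 0.0) + coeff * fluxes.get(name, 0.0)
    return net


def is_steady_state(reactions, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(reactions, fluxes).values())


balanced = {"uptake": 1.0, "r1": 1.0, "r2": 1.0, "export": 1.0}
unbalanced = {"uptake": 1.0, "r1": 1.0, "r2": 0.5, "export": 1.0}

metabolites, names, S = stoichiometric_matrix(reactions)
print(metabolites, names, S)
print(is_steady_state(reactions, balanced))    # every metabolite is balanced
print(net_production(reactions, unbalanced))   # B accumulates, C is depleted
```

Constraint-based methods such as flux balance analysis build on this representation by searching for flux vectors that satisfy the steady-state condition while optimizing an objective, a step that needs a linear programming solver and is omitted here.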
Comparative analyses between metagenomes (comparative metagenomics) can provide additional insight into the function of complex microbial communities and their role in host health. Here is a list of commonly used pipelines for the functional annotation and comparison of metagenomic data sets.
Year | Tool | Short Description | URL |
---|---|---|---|
2007 | CAMERA | The aim of this project is to serve the needs of the microbial ecology research community, and other scientists using metagenomics data, by creating a rich, distinctive data repository and a bioinformatics tools resource that will address many of the unique challenges of metagenomic analysis. | CAMERA |
2011 | CoMet | A web-server for fast comparative functional profiling of metagenomes. | CoMet |
2012 | HUMAnN | A pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data. | HUMAnN |
2014 | IMG/M | Provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in the context of a comprehensive set of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments. | IMG/M |
2014 | InterProScan | A tool that combines different protein signature recognition methods into one resource. | InterProScan |
2013 | MEGAN | Software for analyzing metagenomes. | MEGAN |
2011 | MetaPath | Identifies differentially abundant metabolic pathways in metagenomic datasets by combining metagenomic sequence data with prior metabolic pathway knowledge (from KEGG). | MetaPath |
2010 | METAREP | An open source tool for high-performance comparative metagenomics. | METAREP |
2010 | MG-RAST | An automated analysis platform for metagenomes providing quantitative insights into microbial populations based on sequence data. | MG-RAST |
2009 | RAMMCAP | Analysis and comparison of very large metagenomes with fast clustering and functional annotation. | RAMMCAP |
2009 | ShotgunFunctionalizeR | An R-package for functional comparison of metagenomes. | ShotgunFunctionalizeR |
2010 | SmashCommunity | A stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. | SmashCommunity |
2010 | STAMP | A software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results. | STAMP |
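To give a sense of the kind of statistical comparison these pipelines automate, here is a minimal sketch of a two-sided Fisher's exact test. It asks whether a pathway is differentially abundant between two metagenomes, given the number of reads assigned to that pathway in each sample. The read counts are invented for illustration, and the function is a plain standard-library implementation that assumes modest sample sizes. It does not reproduce any listed tool's method.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: reads in sample 1 assigned / not assigned to the pathway
    c, d: reads in sample 2 assigned / not assigned to the pathway
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x pathway reads in sample 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts: 30 of 1000 reads in sample 1 and 10 of 1000 reads
# in sample 2 map to the pathway of interest.
p_pathway = fisher_exact_two_sided(30, 970, 10, 990)
print(p_pathway)
```

When many pathways are tested at once, the resulting p-values should also be corrected for multiple testing, which this sketch does not do.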