History and Definition of Metagenomics – A Primer
September 13, 2023
Historical Background
Microorganisms are fundamental to life on Earth and thrive in nearly every natural habitat, including extreme environments such as polar regions, deserts, geysers, rocks, and the deep sea. They are significant for human health, ecology, and a range of applied fields:
Soil Health: Microbial communities are essential for nutrient recycling in soil, which is of great interest to farmers seeking to understand the soil’s composition.
Water Pollution: Microbial communities respond to the chemical composition of water, making them useful indicators of water quality.
Human Health and Nutrition: Microbial communities in the gut, nose/throat, skin, and vagina can serve as indicators of infection, health risks, and even potential biomarkers for diseases like cancer.
Paleobiology and Paleogenomics: DNA extracted from frozen mammoths and ancient human remains, such as the Iceman, has yielded insights into their diets and genetic ancestry.
Forensic Science: Microbial analysis has found applications in forensic investigations.
In the last few decades, the focus of microbiologists has shifted from studying cultured microorganisms to investigating uncultured ones, since by an often-cited estimate about 99% of microbes are not easily cultured. In 1985, Norman R. Pace and colleagues introduced a groundbreaking method that analyzed 5S and 16S rRNA gene sequences directly from environmental samples, transforming our understanding of microbial diversity. This marked the first step towards isolating and cloning bulk DNA from environmental samples.
This shift gave birth to the concept of “metagenomics.” The term was coined by the Jo Handelsman group in 1998 to describe function-based analysis of mixed environmental DNA species. The definition then broadened, particularly after two pivotal 2004 studies, one led by Tyson and one by Venter, which applied random whole-genome shotgun sequencing to microbial populations. These studies laid the groundwork for future metagenomic projects, and “metagenomics” became the widely accepted term.
At the same time, next-generation sequencing technologies brought higher throughput and lower sequencing costs, propelling the field forward.
Defining Metagenomics
Metagenomics is fundamentally distinct from genomics, as it involves analyzing genomic DNA from an entire community, rather than an individual organism or cell. Recently, metagenomics has been defined as “the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.”
Metagenomic data analysis aims to answer questions related to:
– The diversity and abundance of community members (“who is there”).
– The metabolic potential of the community and its members (“what they can do”).
– The ecological relationships among community members (“why they are there”).
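The first question, diversity and abundance, is often summarized with simple statistics over per-taxon read counts. The following is a minimal sketch, assuming reads have already been assigned to taxa; the taxon names and counts are hypothetical, and the Shannon index is only one of several diversity measures in use.

```python
import math

def relative_abundance(counts):
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    return {taxon: n / total for taxon, n in counts.items()}

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    fractions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in fractions if p > 0)

# Hypothetical counts: reads assigned to each taxon in one sample.
sample = {"Bacteroides": 50, "Escherichia": 30, "Lactobacillus": 20}
abundance = relative_abundance(sample)
diversity = shannon_index(sample)
print(abundance)
print(round(diversity, 3))
```

A community dominated by one taxon gives an index near zero, while an even community of many taxa gives a higher value.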
Approaches to Metagenomic Analysis
Metagenomics can be categorized into two primary research areas: environmental single-gene surveys and random shotgun studies of all environmental genes.
In single-gene surveys, a specific target, typically a marker gene such as the 16S rRNA gene, is amplified using polymerase chain reaction (PCR) and then sequenced, providing insight into the range of variants of that gene, and thus of the organisms carrying them, within a given community.
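After sequencing, one common early step in a single-gene survey is to collapse identical amplicon reads and count how often each distinct sequence occurs. The sketch below illustrates only that counting step, assuming the reads are already trimmed to the same amplified region; the `dereplicate` helper and the short example sequences are hypothetical.

```python
from collections import Counter

def dereplicate(reads):
    """Count identical amplicon sequences, most abundant first."""
    normalized = (r.strip().upper() for r in reads if r.strip())
    return Counter(normalized).most_common()

# Hypothetical marker-gene fragments; real amplicons are far longer.
reads = ["ACGTAC", "acgtac", "ACGTTC", "ACGTAC", "GGGTAC"]
variants = dereplicate(reads)
for sequence, count in variants:
    print(sequence, count)
```

Real surveys go further, for example by clustering near-identical sequences or correcting sequencing errors, which this sketch does not attempt.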
Random shotgun metagenomics involves isolating total DNA from a sample and sequencing it, offering a broad profile of the genes within the community. Genes are annotated and linked to the environment, facilitating the identification of proteins encoded by the metagenome.
Key steps in a metagenomic study include:
– Sampling: Collecting representative samples and recording metadata.
– Sequencing: Utilizing shotgun sequencing to reveal the genes present in environmental samples.
– Sequence Read Preprocessing: Preparing sequence reads for subsequent analysis.
– Assembly: Assembling reads into contiguous sequences (contigs) and, where coverage allows, draft genomes.
– Gene Prediction: Identifying genes, and where possible operons and functional networks, in fragmented metagenomic data.
– Binning: Grouping sequences into bins that correspond to individual taxa or Operational Taxonomic Units (OTUs), so that functions can be attributed to specific community members.
– Functional Annotation: Assessing the functional potential of microbial communities derived from metagenomic data.
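Two of the steps above, read preprocessing and binning, can be illustrated with a toy example. This is a sketch under stated assumptions rather than a real pipeline: quality strings are assumed to use the Phred+33 encoding common in FASTQ files, the thresholds are arbitrary, and GC content stands in for the richer composition and coverage features that binning tools typically use. The function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a quality line (Phred+33 encoding assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)

def filter_reads(records, min_mean_quality=20, min_length=5):
    """Keep (sequence, quality) pairs that pass both thresholds."""
    kept = []
    for seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_quality:
            kept.append((seq, qual))
    return kept

def gc_content(seq):
    """Fraction of G and C bases, a simple composition feature."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

# Hypothetical reads: one good, one low quality, one too short.
records = [
    ("ACGTACGT", "IIIIIIII"),  # 'I' encodes Phred 40
    ("ACGTACGT", "!!!!!!!!"),  # '!' encodes Phred 0
    ("ACG", "III"),
]
clean = filter_reads(records)
features = [gc_content(seq) for seq, _ in clean]
print(clean, features)
```

Only the first read survives filtering here. In practice, sequences with similar composition and coverage would then be grouped into the same bin.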
Metagenomics holds promise in solving practical challenges across diverse fields, including medicine, engineering, agriculture, sustainability, and ecology.
Processing metagenomic datasets, particularly those from complex microbiomes, is challenging because such data are prone to errors such as chimeric contigs and under-assembly, and because binning has limited resolution. Manual inspection and validation of results are therefore crucial to ensure accuracy.
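One simple check used during such validation is the N50 of an assembly: the contig length at which contigs of that size or longer cover at least half of the total assembled bases. The sketch below assumes the contig lengths are already available as a list of integers; the example values are hypothetical. A low N50 can hint at a fragmented assembly, but it says nothing about chimeric contigs, so it complements rather than replaces manual inspection.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Hypothetical contig lengths in base pairs.
lengths = [100, 80, 60, 40, 20]
assembly_n50 = n50(lengths)
print(assembly_n50)
```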