Mastering the Basics: Sampling and Sequencing Techniques in Metagenomics

September 14, 2023 By admin


Sampling and Sequencing

Sample Collection:

The first step in a metagenomics study is collecting a sample from the environment. The extracted DNA must be representative of all the cells present in the sample, and enough high-quality DNA must be obtained for the later stages of library construction and sequencing.

When the community is associated with a host organism, fractionation or selective lysis can help minimize the amount of host DNA recovered. Physically separating cells from the sample matrix can also be beneficial, either to maximize DNA yield or to avoid co-extracting substances such as humic acids that can inhibit later steps. Some samples yield only small amounts of DNA, whereas most sequencing methods require DNA in the high-nanogram to microgram range. DNA amplification may then be necessary, but it brings its own problems: reagent contamination, chimeric sequences, and bias in the resulting sequence data.
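As a rough illustration of the DNA quantity check described above, the sketch below computes the total DNA in an extract from its concentration and volume and flags extracts that fall short of a library-input requirement. The `DnaExtract` class, the sample names, and the 500 ng threshold are hypothetical; the real minimum depends on the library preparation protocol and sequencing platform.

```python
from dataclasses import dataclass


@dataclass
class DnaExtract:
    """One DNA extract. Field names are hypothetical, for illustration only."""
    sample_id: str
    concentration_ng_per_ul: float  # e.g. measured with a fluorometric assay
    volume_ul: float


def total_yield_ng(extract: DnaExtract) -> float:
    """Total DNA mass in the extract: concentration x volume."""
    return extract.concentration_ng_per_ul * extract.volume_ul


def needs_amplification(extract: DnaExtract, required_input_ng: float) -> bool:
    """True if the extract holds less DNA than the library protocol requires.

    required_input_ng is an assumed, protocol-specific value, not a universal one.
    """
    return total_yield_ng(extract) < required_input_ng


if __name__ == "__main__":
    # Hypothetical extracts checked against an assumed 500 ng input requirement.
    extracts = [
        DnaExtract("soil_A", concentration_ng_per_ul=25.0, volume_ul=50.0),
        DnaExtract("low_biomass_B", concentration_ng_per_ul=0.4, volume_ul=30.0),
    ]
    for e in extracts:
        print(e.sample_id, total_yield_ng(e), needs_amplification(e, 500.0))
```

A flagged extract is a candidate for amplification, with the contamination, chimera, and bias risks noted above.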

Figure: Environmental Shotgun Sequencing (ESS). (A) Sampling from the habitat; (B) filtering particles, typically by size; (C) lysis and DNA extraction; (D) cloning and library construction; (E) sequencing the clones; (F) sequence assembly.

Metadata, the ‘information about the information’, should record the sample’s geographic and environmental context in detail, along with the methods used to collect it. Standardized, thorough metadata is vital because it enables statistical analyses that link metagenomic data to environmental variables, opening the way to meaningful biological insights.
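A minimal sketch of keeping sample metadata machine-checkable, assuming each record is stored as a Python dictionary. The required field names and values are hypothetical; established standards such as the Genomic Standards Consortium’s MIxS checklists define their own terms, which should be used in practice.

```python
# Hypothetical required fields; NOT the official MIxS term names.
REQUIRED_FIELDS = (
    "sample_id",
    "collection_date",
    "latitude",
    "longitude",
    "environment",
    "collection_method",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def validate_coordinates(record: dict) -> bool:
    """Check that latitude and longitude are numeric and within valid ranges."""
    try:
        lat = float(record["latitude"])
        lon = float(record["longitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


if __name__ == "__main__":
    record = {  # hypothetical sample record
        "sample_id": "site3_core2",
        "collection_date": "2023-06-01",
        "latitude": "45.5",
        "longitude": "-122.6",
        "environment": "agricultural soil",
        "collection_method": "",
    }
    print(missing_fields(record))        # -> ['collection_method']
    print(validate_coordinates(record))  # -> True
```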

Directed sequencing and random shotgun sequencing each have advantages and disadvantages, summarized below:

Method	Advantages	Disadvantages
Directed sequencing	Can focus on any taxon of interest, regardless of its prevalence in the community; can determine linkage between large genome regions with confidence (within a single individual); good for highly diverse microbial communities	Does not reconstruct entire genomes; cannot identify novel types; sequence data focus on a single group, not the entire community
Random shotgun sequencing	Samples DNA from all genomes in the sample; can identify novel types; can assemble full genomes of dominant types; good for low-diversity communities and for communities with a few dominant species	Only dominant genomes are well represented; linkage between genome regions (contigs) can only be inferred; automated genome assembly is problematic, and some assemblies require manual checking
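The “only dominant genomes are well represented” trade-off of shotgun sequencing can be made concrete with a back-of-the-envelope calculation. This sketch assumes reads are drawn in proportion to each genome’s share of the total DNA, and it uses the Lander-Waterman (Poisson) approximation that a fraction 1 - e^(-C) of a genome at mean coverage C is hit by at least one read. The community, sequencing volume, and genome sizes are hypothetical.

```python
import math


def expected_coverage(total_bases: float, dna_fraction: float, genome_size_bp: float) -> float:
    """Mean coverage of one genome: sequenced bases attributed to it / genome length.

    Assumes reads are sampled in proportion to the genome's share of total DNA.
    """
    return total_bases * dna_fraction / genome_size_bp


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman/Poisson estimate of the fraction of bases hit by >= 1 read."""
    return 1.0 - math.exp(-coverage)


# Hypothetical run: 10 Gbp of sequence, three 5 Mbp genomes at different DNA shares.
TOTAL_BASES = 10e9
for name, share in [("dominant", 0.50), ("intermediate", 0.01), ("rare", 0.0001)]:
    c = expected_coverage(TOTAL_BASES, share, 5e6)
    print(f"{name:>12}: {c:8.1f}x coverage, ~{fraction_covered(c):.1%} of genome covered")
```

Under these assumptions the dominant genome receives about 1000x coverage, while the rare one receives only about 0.2x, so roughly 18% of its bases are expected to be covered by even a single read, far too little to assemble.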

DNA Sequencing Approaches:

Over the past decade, metagenomic sequencing has moved from traditional Sanger sequencing to next-generation sequencing (NGS) technologies. There are two main approaches to sequencing DNA from microbiome samples. Directed sequencing targets either particular functions or phylogenetic markers such as the 16S rRNA gene: clones carrying large DNA fragments are screened for the marker, and the DNA surrounding it is then sequenced. Shotgun sequencing takes a less biased route, sequencing random fragments of all the DNA in the sample and giving a broad overview of the genes and metabolic functions present in a microbiome.
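Screening large-insert clone libraries for a marker is normally done at the bench (for example by PCR or hybridization), but the idea can be sketched in silico: given clone sequences, keep those that carry a marker motif on either strand. The sketch handles IUPAC degenerate codes; the example motif is the widely used 515F 16S rRNA gene primer (GTGCCAGCMGCCGCGGTAA), which should be checked against your own primer reference, and the clone IDs and sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R=A/G pairs with Y=C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement, preserving degenerate IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif into a regex; raises KeyError on non-IUPAC letters."""
    return re.compile("".join(IUPAC[b] for b in motif.upper()))


def screen_clones(clones: dict[str, str], motif: str) -> list[str]:
    """Return IDs of clones that contain the motif on either strand."""
    forward = motif_to_regex(motif)
    reverse = motif_to_regex(reverse_complement(motif))
    hits = []
    for clone_id, seq in clones.items():
        s = seq.upper()
        if forward.search(s) or reverse.search(s):
            hits.append(clone_id)
    return hits


if __name__ == "__main__":
    clones = {  # hypothetical clone sequences
        "clone_1": "AAAAGTGCCAGCAGCCGCGGTAATTTT",
        "clone_2": "ACGTACGTACGTACGT",
    }
    print(screen_clones(clones, "GTGCCAGCMGCCGCGGTAA"))
```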

The quality of NGS output, in terms of read length, error rate, and coverage, determines how well we can explore the gene composition of naturally occurring microbial communities. Continued advances in sequencing technology will keep shaping metagenomics and make increasingly complex ecosystems accessible to study.
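To see how read length and error rate interact, consider a simple model that assumes errors occur independently at a constant per-base rate. The read lengths and error rates below are hypothetical round numbers, not measurements of any particular platform.

```python
def expected_errors(read_length: int, per_base_error: float) -> float:
    """Expected number of erroneous bases in one read."""
    return read_length * per_base_error


def prob_error_free(read_length: int, per_base_error: float) -> float:
    """Probability that every base in the read is correct (independent errors)."""
    return (1.0 - per_base_error) ** read_length


# Hypothetical profiles: short, accurate reads vs. long, error-prone reads.
for label, length, err in [("short/accurate", 150, 0.001), ("long/noisy", 10_000, 0.05)]:
    print(
        f"{label}: {expected_errors(length, err):.2f} expected errors, "
        f"P(error-free) = {prob_error_free(length, err):.3g}"
    )
```

Under this model almost every long, noisy read contains errors, so such reads typically need higher coverage or error correction before gene-level analysis, even though their length helps link distant genome regions.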

The following table compares the yield, fragment length, and run time of the different sequencers.

Preprocessing of Sequence Reads:

The preprocessing stage involves several steps:
– Base calling converts the raw signal from the sequencing instrument into DNA base calls. Common tools for Sanger trace data include phred, Paracel’s TraceTuner, and ABI’s KB Basecaller.
– Vector screening removes sequence that originates from the cloning vector, using tools such as cross_match, LUCY, and vector_clip.
– Quality trimming removes low-quality bases, typically from the ends of reads.
– Contaminant screening identifies and removes known sequence contaminants.
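The quality-trimming step can be sketched as follows, assuming FASTQ quality strings in the common Phred+33 encoding. Phred scores relate to the probability of an incorrect base call by Q = -10 log10(P). The end-trimming rule and the default Q20 threshold are simple illustrative choices, not the algorithm used by phred, LUCY, or any other specific tool.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P) to get P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_ends(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Trim bases with quality below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    seq, qual = "ACGTACG", "##IIII#"  # '#' encodes Q2, 'I' encodes Q40
    print(trim_ends(seq, phred_scores(qual)))  # -> ('GTAC', [40, 40, 40, 40])
```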

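Mistakes in any of these preprocessing steps can have more severe downstream consequences in metagenomic studies than in single-genome studies. Because reads from many organisms are mixed together and reference genomes are often unavailable, residual vector, contaminant, or low-quality sequence is harder to recognize and can propagate into assembly and annotation.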
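Because undetected vector or contaminant sequence is hard to recognize later in a metagenome, screening it out early matters. A crude sketch of contaminant screening flags any read that shares an exact k-mer with a known contaminant sequence on either strand. This exact-match approach is a simplified stand-in, not how alignment-based screeners such as cross_match work; the contaminant sequence, read IDs, and k value are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_contaminant_index(contaminants: list[str], k: int) -> set[str]:
    """Collect k-mers from both strands of every known contaminant sequence."""
    index: set[str] = set()
    for c in contaminants:
        index |= kmers(c, k)
        index |= kmers(reverse_complement(c), k)
    return index


def flag_reads(reads: dict[str, str], index: set[str], k: int, min_shared: int = 1) -> list[str]:
    """Return IDs of reads sharing at least min_shared k-mers with the index."""
    return [rid for rid, seq in reads.items() if len(kmers(seq, k) & index) >= min_shared]


if __name__ == "__main__":
    vector = "ATGCGTACCTTAGGCATCGA"  # hypothetical contaminant sequence
    index = build_contaminant_index([vector], k=8)
    reads = {"read_1": "GGGGGTACCTTAGGCACCCC", "read_2": "ATATATATATATATAT"}
    print(flag_reads(reads, index, k=8))
```

Raising `min_shared` or `k` makes the screen stricter and reduces chance matches, at the cost of missing short or divergent contaminant fragments.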