
Decoding the Metagenome: Strategies for Accurate Gene Prediction and Identification

September 14, 2023 By admin

Metagenome – Gene Prediction

Identifying Gene Components

Genes are the basic building blocks of a genome, and they can be part of larger functional systems such as operons or networks. Locating these elements in a DNA sample is called gene prediction. How well it works depends on several factors, including the type of input (assembled sequences, raw sequencing reads, or a mix of both) and the quality of that sequence data.
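In prokaryotic sequences, candidate genes are commonly enumerated as open reading frames (ORFs): stretches that run from a start codon to the next in-frame stop codon. The Python sketch below scans all six reading frames (three per strand) for such ORFs. It assumes the standard stop codons and, as a simplification, accepts only ATG as a start; `find_orfs`, `reverse_complement`, and the `min_len` threshold are hypothetical names chosen for illustration, not part of any tool mentioned here.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # simplification: prokaryotic gene finders also consider GTG/TTG

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 90) -> List[Tuple[int, int, str]]:
    """Scan all six reading frames for ATG...stop ORFs.

    Returns (start, end, strand) tuples in 0-based, end-exclusive coordinates
    on the forward strand. `min_len` is in nucleotides and includes the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first start after the previous stop: longest ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((start, end, "+"))
                        else:
                            # map reverse-strand coordinates back onto the forward strand
                            orfs.append((n - end, n - start, "-"))
                    start = None
    return sorted(orfs)
```

For example, `find_orfs("ATG" + "AAA" * 10 + "TAA", min_len=30)` returns `[(0, 36, '+')]`. An ORF is only a candidate; deciding which candidates are real genes is the job of the evidence-based or ab initio methods.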

There are two main strategies for identifying genes: ‘evidence-based’ and ‘ab initio.’
– Evidence-based methods compare DNA sequences against previously identified genes to find matches.
– Ab initio methods use intrinsic features of the DNA sequence itself to separate coding from non-coding regions, which allows the discovery of previously unknown genes. Most of these methods are built on statistical learning models, including Markov chain variants such as hidden Markov models (HMMs).
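The ab initio idea can be sketched with a fixed-order Markov chain: train one model on coding sequence and one on non-coding sequence, then score a new sequence by the log-likelihood ratio between the two. This is a simplified stand-in; tools such as GeneMark and Glimmer use richer models (for example, frame-dependent or interpolated Markov models, and HMMs). The function names, the default order of 2, and the pseudocount are assumptions for illustration.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable

ALPHABET = "ACGT"
Model = Dict[str, Dict[str, float]]  # context -> {next base: probability}


def train_markov(seqs: Iterable[str], order: int = 2, pseudocount: float = 1.0) -> Model:
    """Estimate P(next base | previous `order` bases) with additive smoothing."""
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            ctx, nxt = seq[i - order:i], seq[i]
            if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[ctx][nxt] += 1
    model: Model = {}
    # Every possible context gets a distribution, so unseen contexts fall back to uniform.
    for ctx in ("".join(p) for p in product(ALPHABET, repeat=order)):
        total = sum(counts[ctx][b] for b in ALPHABET) + 4 * pseudocount
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model


def log_likelihood(seq: str, model: Model, order: int) -> float:
    """Sum of log P(base | context); positions involving non-ACGT symbols are skipped."""
    seq = seq.upper()
    ll = 0.0
    for i in range(order, len(seq)):
        ctx, nxt = seq[i - order:i], seq[i]
        if ctx in model and nxt in ALPHABET:
            ll += math.log(model[ctx][nxt])
    return ll


def coding_score(seq: str, coding: Model, noncoding: Model, order: int = 2) -> float:
    """Log-odds score: > 0 means the sequence looks more like the coding training data."""
    return log_likelihood(seq, coding, order) - log_likelihood(seq, noncoding, order)
```

With toy training data such as `train_markov(["GCTGCTGCTGCTGCTGCT"])` for coding and `train_markov(["ATATATATATATATAT"])` for non-coding, a GCT-repeat query scores above zero and an AT-repeat query below zero. In practice the training sets come either from known genes or from the target sequence itself.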

To improve prediction accuracy, some tools train on pre-defined sets of known genes from related organisms. Others are self-training: they learn their model parameters from the target DNA sequence itself.
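Self-training can be sketched as an iterative loop over the target sequence alone. Long ORFs are unlikely to occur by chance, so they serve as a provisional training set; a codon-usage model learned from them rescores shorter candidates, and the accepted candidates are used to retrain. This is a hypothetical, heavily simplified illustration (forward strand only, ATG starts only, codon frequencies instead of a full Markov or HMM model), not the procedure of any specific tool; `self_train`, `seed_len`, and `rounds` are made-up names and defaults.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def candidate_orfs(seq: str, min_len: int = 60) -> List[str]:
    """Forward-strand ATG...stop ORFs of at least `min_len` nt (simplified: one strand only)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append(seq[start:i + 3])
                start = None
    return orfs


def codon_freqs(seqs: List[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """In-frame codon frequencies with additive smoothing (non-ACGT codons are ignored)."""
    counts = Counter(s[i:i + 3] for s in seqs for i in range(0, len(s) - 2, 3))
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def score(orf: str, coding: Dict[str, float], background: Dict[str, float]) -> float:
    """Log-odds that an ORF follows the learned codon usage rather than the background."""
    return sum(math.log(coding[orf[i:i + 3]] / background[orf[i:i + 3]])
               for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] in coding)


def self_train(seq: str, seed_len: int = 300, min_len: int = 60,
               rounds: int = 3) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Learn a codon-usage model from the target sequence itself."""
    seq = seq.upper()
    # Background: codon frequencies over all three forward frames of the whole sequence.
    background = codon_freqs([seq, seq[1:], seq[2:]])
    # Seed: long ORFs are treated as provisional genes.
    training = candidate_orfs(seq, seed_len)
    coding = codon_freqs(training) if training else dict(background)
    for _ in range(rounds):
        # Keep shorter candidates that the current model prefers, then retrain on them.
        training = [o for o in candidate_orfs(seq, min_len) if score(o, coding, background) > 0]
        if not training:
            break
        coding = codon_freqs(training)
    return coding, background
```

The design choice worth noting is the bootstrap: no external gene set is needed, which matters when the source organism is unknown, but a poor seed set (for example, very short contigs with no long ORFs) leaves the model close to the background.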

The following table lists commonly used gene prediction tools.

Year | Tool | Short description
--- | --- | ---
2007 | FGENESH | An application for finding (fragmented) genes in short reads.
2010 | FragGeneScan | An HMM-based gene structure prediction tool (multiple genes, both strands).
2005 | GeneMark | A family of gene prediction programs developed at the Georgia Institute of Technology.
2009 | GENSCAN | Predicts the locations and exon-intron structures of genes in genomic sequences from a variety of organisms.
2007 | Glimmer | A system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses.
2012 | Glimmer-MG | A system for finding genes in environmental shotgun DNA sequences.
2000 | HMMgene | Predicts genes in vertebrate and C. elegans sequences.
2007 | MED | An unsupervised gene prediction algorithm for bacterial and archaeal genomes.
2008 | MetaGeneAnnotator | A gene-finding program for prokaryotic and phage sequences.
2013 | MetaGUN | A gene prediction method for metagenomic fragments based on support vector machines (SVMs).
2013 | MGC | An application for finding complete and incomplete genes in metagenomic reads.
2009 | Orphelia | A metagenomic ORF finder that predicts protein-coding genes in short environmental DNA sequences of unknown phylogenetic origin.
2012 | MetaProdigal | The metagenomic mode of Prodigal; analyzes sequences even when the source organism is unknown.

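Gene predictions on microbial metagenome data sets are generally less accurate than predictions on sequenced genomes: fragments are short and often lack start or stop codons, sequencing errors introduce frameshifts, and the source organism is usually unknown, so organism-specific training is not possible. Common strategies to overcome at least some of these limitations are combining multiple gene finders, screening intergenic regions for overlooked genes, and using dedicated frameshift detectors.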
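The first two of these strategies can be sketched in a few lines. The sketch assumes each gene finder's output has already been parsed into `(start, end, strand)` tuples (0-based, end-exclusive, forward-strand coordinates). Calls from different tools are matched by their stop codon, since start positions are where tools most often disagree, and a gene is kept when at least `min_votes` tools agree. Stretches the consensus leaves uncovered are then reported for re-screening. The voting rule, the choice of the longest supported start, the tool names, and the `min_gap` threshold are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Gene = Tuple[int, int, str]  # (start, end, strand): 0-based, end-exclusive, forward-strand coordinates


def stop_key(gene: Gene) -> Tuple[int, str]:
    """Identify a gene call by the position of its stop codon."""
    start, end, strand = gene
    # On "+" the stop codon ends at `end`; on "-" it sits at the forward-strand `start`.
    return (end, strand) if strand == "+" else (start, strand)


def consensus(calls: Dict[str, List[Gene]], min_votes: int = 2) -> List[Gene]:
    """Keep genes predicted by at least `min_votes` tools, using the longest supported model."""
    votes: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    models: Dict[Tuple[int, str], List[Gene]] = defaultdict(list)
    for tool, genes in calls.items():
        for g in genes:
            k = stop_key(g)
            votes[k].add(tool)
            models[k].append(g)
    kept = [max(models[k], key=lambda g: g[1] - g[0])
            for k, tools in votes.items() if len(tools) >= min_votes]
    return sorted(kept)


def intergenic_gaps(genes: List[Gene], seq_len: int, min_gap: int = 300) -> List[Tuple[int, int]]:
    """Return uncovered stretches of at least `min_gap` nt to re-screen for missed genes."""
    gaps, pos = [], 0
    for start, end, _ in sorted(genes):
        if start - pos >= min_gap:
            gaps.append((pos, start))
        pos = max(pos, end)
    if seq_len - pos >= min_gap:
        gaps.append((pos, seq_len))
    return gaps
```

For example, with `calls = {"toolA": [(0, 300, "+"), (1000, 1300, "-")], "toolB": [(30, 300, "+"), (2000, 2300, "+")], "toolC": [(1000, 1270, "-")]}` (hypothetical tool names), `consensus(calls)` returns `[(0, 300, '+'), (1000, 1300, '-')]`, and `intergenic_gaps` on that result for a 3,000 nt contig returns `[(300, 1000), (1300, 3000)]`.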
