Bioinformatics for Metabolomics: Overview

July 22, 2019 By admin

BIOINFORMATICS FOR METABOLOMICS
Like other areas of functional genomics research, metabolomics generates large amounts of data. Handling, processing and analyzing these data is a clear challenge for researchers, as it requires specialized mathematical, statistical and bioinformatics tools. Functional genomics data sets are extensive and multi-dimensional, which makes organizing them in properly designed databases a necessity rather than an option. Analyzing metabolomics data sets is equally challenging, because established procedures and a selection of useful algorithms are still needed. Metabolomics also has bioinformatics needs of its own, in addition to those it shares with microarray and proteomics data. Further progress in metabolomics depends on developing bioinformatics in several major areas: data and information management, raw analytical data processing, metabolomics standards and ontology, statistical analysis and data mining, data integration, and mathematical modelling of metabolic networks within the framework of systems biology.

Processing raw metabolomics data is the most challenging and time-consuming part of data analysis. Processing a set of raw chromatograms involves noise reduction, spectrum deconvolution, peak detection and integration, chromatogram alignment, and compound identification and quantification. Metabolomics requires automated data processing, and AMDIS (Automated Mass Spectral Deconvolution and Identification System, http://chemdata.nist.gov/mass-spc/amdis/) is one such solution. AMDIS uses well-described algorithms and is very useful for processing GC–MS data, but its applicability to LC–MS or CE–MS is somewhat limited. ESI–LC–MS data can be processed using the component detection algorithm (CODA) or the ‘windowed mass selection method’ (WMSM). MZmine (http://mzmine.sourceforge.net/index.shtml) was developed as platform-independent software for processing LC–MS data in metabolomics and proteomics applications. It has a modular infrastructure that allows new algorithms and applications to be integrated. Another important feature is that it is vendor independent and can be extended to other types of mass spectral data, such as GC–MS and CE–MS.
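To make the first processing steps more concrete, here is a minimal Python sketch of noise reduction, peak detection and peak integration on a single intensity trace. It is not how AMDIS or MZmine work internally; the moving-average smoother, the threshold rule, the function names and the toy chromatogram are all assumptions chosen for illustration.

```python
def moving_average(signal, window=3):
    """Smooth a trace with a centered moving average (simple noise reduction)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def detect_peaks(signal, threshold):
    """Return indices of local maxima whose intensity exceeds the threshold."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] > threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            peaks.append(i)
    return peaks


def integrate_peak(signal, apex, baseline):
    """Sum baseline-corrected intensity around the apex until the trace drops to baseline."""
    left = apex
    while left > 0 and signal[left - 1] > baseline:
        left -= 1
    right = apex
    while right < len(signal) - 1 and signal[right + 1] > baseline:
        right += 1
    return sum(signal[i] - baseline for i in range(left, right + 1))


# Toy chromatogram (made-up intensities, one value per scan).
chromatogram = [1, 2, 1, 2, 10, 30, 12, 2, 1, 2, 8, 20, 9, 1, 2]
smoothed = moving_average(chromatogram, window=3)
peaks = detect_peaks(smoothed, threshold=5.0)
areas = [integrate_peak(smoothed, apex, baseline=2.0) for apex in peaks]
print(peaks)  # [5, 11]
```

Real chromatograms would also need deconvolution of overlapping peaks and alignment across runs, which this sketch leaves out.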

Common Software List for Metabolomic Analysis

A wide range of statistical and machine-learning algorithms have been applied to metabolomics data. They fall into two major classes: unsupervised and supervised algorithms. Unsupervised methods that have been routinely used in analysing metabolomics data include hierarchical clustering, principal component analysis (PCA) and self-organizing maps. Supervised methods include ANOVA, partial least squares (PLS) and discriminant function analysis (DFA).
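As an illustration of one unsupervised method, the sketch below computes the first principal component of a small samples-by-metabolites table. It uses power iteration on the covariance matrix so that it runs with the standard library only; this is a swapped-in simplification, and a real analysis would normally use an established numerical library. The data values and function names are invented for the example.

```python
import math


def center_columns(rows):
    """Subtract each column mean so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of mean-centered data (rows are samples)."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_principal_component(cov, iterations=200):
    """Approximate the dominant eigenvector of cov by power iteration.

    Assumes the start vector is not orthogonal to that eigenvector,
    which holds for the toy data below.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: 4 samples (rows) x 3 metabolites (columns); values are invented.
samples = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.0, 0.6],
    [4.0, 7.9, 0.5],
]
centered = center_columns(samples)
pc1 = first_principal_component(covariance_matrix(centered))
# Score of each sample: projection of its centered row onto PC1.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
```

In this toy table the first two metabolites rise together, so they dominate the loadings in `pc1`, and the four samples line up along that single component.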

Like other ‘omic’ data sets, metabolomics data sets are largely underdetermined: they contain many more variables than samples. A typical ‘omic’ experiment measures anywhere from several hundred to tens of thousands of variables, for example all the genes in a microarray experiment or hundreds of metabolites in a metabolomics study, but only a relatively small number of samples are collected to examine this high-dimensional space. For statistical analysis of these data, it is important to reduce the number of variables in order to obtain uncorrelated features. This can be achieved with significance tests such as ANOVA and t-tests, with linear combinations of variables as in PCA, or with evolutionary algorithms such as genetic algorithms or genetic programming. Evolutionary algorithms have been successfully applied to metabolomics data. They are run in combination with a second analysis algorithm, such as PLS or DFA, and search for the combinations of variables that are most effective in that secondary algorithm, guided by the principles of evolution and selection.
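The simplest of these variable-reduction routes, ranking variables by a significance statistic, can be sketched as follows. The example computes Welch's t-statistic for each metabolite between two groups and sorts the metabolites by its absolute value. The metabolite names, group labels and intensities are hypothetical, and the sketch stops at ranking: it does not compute p-values or correct for multiple testing, both of which a real study would need.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent groups.

    Assumes each group has at least two values and nonzero variance.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_variables(control, treated, names):
    """Return (name, t) pairs sorted by |t|, largest first."""
    scores = []
    for j, name in enumerate(names):
        a = [row[j] for row in control]
        b = [row[j] for row in treated]
        scores.append((name, welch_t(a, b)))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: rows are samples, columns follow `names`.
names = ["glucose", "citrate", "alanine"]
control = [[5.0, 1.0, 2.0], [5.2, 1.1, 2.4], [4.9, 0.9, 1.8]]
treated = [[8.1, 1.0, 2.1], [7.9, 1.2, 2.3], [8.3, 0.9, 1.9]]

ranked = rank_variables(control, treated, names)
top_variable = ranked[0][0]  # the variable that best separates the two groups
```

Keeping only the top-ranked variables shrinks the data set before a second method such as PLS or DFA is applied.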

Metabolomics requires database management systems that capture metadata as well as raw and processed experimental data. Storing metadata is important because it is needed to reproduce the experimental conditions and to compare results obtained in different laboratories. The metadata should cover the experimental design, the nature of the samples and their treatment prior to analysis, the analytical technique, and the data-processing details. These requirements are in part similar to those of the MIAME protocol for microarray data, but include a number of extra requirements beyond MIAME. It is also important to store and organize the raw data coming from the analytical instrument, as well as the subsequently processed and statistically analyzed data. Typically, the data for a single biological sample comprise several parallel streams from different instruments using different analytical techniques. Storing and archiving metabolomics data is therefore challenging, which underlines the need for properly designed databases. Reference databases are also needed to record the metabolites observed in each species, such as human, Arabidopsis, Drosophila and yeast. Ideally, a metabolomics database should be comprehensive and flexible enough to incorporate new data types arising from novel developments in technology, and to accommodate corresponding data from parallel ‘omics’ platforms, including transcriptomics and proteomics, which are often collected in the same experiment and share metadata.
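The kind of record such a database has to hold can be sketched as a small data structure. The class and field names below are hypothetical and do not follow ArMet, MIAME or any other published schema; they only mirror the categories of metadata listed above, and the sample values are invented.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SampleRecord:
    """Illustrative metadata for one biological sample (hypothetical schema)."""
    sample_id: str
    species: str
    treatment: str               # sample treatment prior to analysis
    analytical_technique: str    # e.g. GC-MS, LC-MS, CE-MS
    processing_steps: list = field(default_factory=list)
    raw_data_files: list = field(default_factory=list)


record = SampleRecord(
    sample_id="S001",
    species="Arabidopsis",
    treatment="control",
    analytical_technique="GC-MS",
    processing_steps=["noise reduction", "peak detection", "alignment"],
    raw_data_files=["S001_run1.raw"],
)

# Serialize to JSON for storage or exchange, then restore the record.
serialized = json.dumps(asdict(record), indent=2)
restored = SampleRecord(**json.loads(serialized))
```

Keeping the processing steps and raw file names next to the sample description is what lets another laboratory trace a processed value back to the run that produced it.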

A number of databases and data management, analysis and visualization tools are now publicly available. These include, among others, the metabolic pathway databases and pathway viewers KEGG (http://www.genome.ad.jp/kegg/), MetaCyc (http://metacyc.org/), AraCyc (http://www.Arabidopsis.org/tools/aracyc/), MapMan (http://gabi.rzpd.de/projects/MapMan/) and KaPPA-View (http://kpv.kazusa.or.jp/kappa-view/); ArMet (http://www.armet.org/), a data model for plant metabolomics experiments; and the functional genomics databases MetNet (http://metnet.vrac.iastate.edu/) and DOME (http://medicago.vbi.vt.edu). DOME, a database developed by the Mendes group at the Virginia Bioinformatics Institute, is an example of a comprehensive data management system for metabolomics as well as for other genomics data, and shows how a general bioinformatics database can serve metabolomics.