Introduction to Biological Pathway Databases

August 5, 2019 By admin

What is a Biological Pathway?

A biological pathway is a series of interactions among molecules in a cell that leads to a certain product or a change in the cell. Such a pathway can trigger the assembly of new molecules, such as a fat or protein. Pathways can also turn genes on and off, or spur a cell to move. Some of the most common biological pathways are involved in metabolism, the regulation of gene expression, and the transmission of signals. Pathways play a key role in advanced studies of genomics.

Pathway databases

Pathway information is available through a large number of databases. These range from high-quality databases built by professional curators to massive databases that cover a vast number of putative pathways and are generated by natural language processing and text mining of abstracts. Because the databases differ in size, quality, and characteristics, users need to choose the one that fits their purpose, whether that purpose is commercial or public.

Major Pathway Databases

Pathway databases are being created all around the world. Each database strongly reflects its builder’s intent and purpose. Some databases have detailed metabolic pathways, while others have detailed signaling pathways. Most databases are created by curators who read papers and extract pathway information, which is then organized in the database together with pathway diagrams. Others are created using natural language processing and text mining, which extract biological relations such as gene regulatory relations from papers and organize them into databases.

KEGG

KEGG (Kyoto Encyclopedia of Genes and Genomes) (http://www.kegg.jp/) is a series of databases developed by both the Bioinformatics Center of Kyoto University and the Human Genome Center of the University of Tokyo. As the name encyclopedia suggests, the database includes information necessary for a systems understanding of biology, such as genome sequences and chemical information. The “Pathway” section of KEGG consists mainly of metabolic pathways. The license is free for noncommercial use, while commercial licenses are sold by Pathway Solutions Inc. (http://www.pathway.jp/).
KEGG is unique for its focus on and coverage of yeast, mouse, and human metabolic pathways. Currently, signaling pathways for the cell cycle and apoptosis are being expanded. New pathways are created by professionals (curators) who read and summarize the relevant literature. The database is stored in an XML format called KGML (KEGG Markup Language). Since the pathways are then displayed as GIF files, the user cannot easily edit the pathway information.
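
Because the pathway maps are displayed as images, programmatic work with KEGG pathways usually starts from the XML representation (KGML) rather than from the picture. The following is a minimal sketch of reading entries and relations from such a file with Python's standard library. The XML fragment is hand-written and heavily simplified for illustration, and `parse_kgml` is a hypothetical helper name, not part of any KEGG tool.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily simplified KGML-like fragment (illustrative only;
# real KGML files carry many more attributes and elements).
KGML_SAMPLE = """\
<pathway name="path:hsa04110" org="hsa" number="04110" title="Cell cycle">
  <entry id="1" name="hsa:1029" type="gene">
    <graphics name="CDKN2A"/>
  </entry>
  <entry id="2" name="hsa:1019" type="gene">
    <graphics name="CDK4"/>
  </entry>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="inhibition" value="--|"/>
  </relation>
</pathway>
"""


def parse_kgml(xml_text):
    """Return (title, labels, edges) from a KGML-like document.

    Hypothetical helper: it assumes the simplified structure shown above.
    """
    root = ET.fromstring(xml_text)

    # Map each entry id to a readable label.
    labels = {}
    for entry in root.findall("entry"):
        graphics = entry.find("graphics")
        # Fall back to the KEGG identifier when no display name is present.
        label = graphics.get("name") if graphics is not None else entry.get("name")
        labels[entry.get("id")] = label

    # Turn each relation into (source label, target label, subtype names).
    edges = []
    for rel in root.findall("relation"):
        subtypes = [s.get("name") for s in rel.findall("subtype")]
        edges.append((labels[rel.get("entry1")], labels[rel.get("entry2")], subtypes))

    return root.get("title"), labels, edges


title, labels, edges = parse_kgml(KGML_SAMPLE)
print(title)
for source, target, subtypes in edges:
    print(source, "->", target, subtypes)
```

Once the relations are available as plain tuples, they can be filtered, merged with other sources, or loaded into a graph library, which is not possible with the rendered image alone.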

BioCyc

BioCyc is a pathway database provided by SRI International (http://www.biocyc.org/).
It is a high-quality database focused on metabolic pathways, originally formed by SRI International’s bioinformatics research group. Related to BioCyc are the EcoCyc, MetaCyc, and HumanCyc databases. Licenses are free for academic and nonprofit uses. Humans and E. coli are the major organisms covered, along with a variety of others. EcoCyc is mainly a database of E. coli metabolic pathways, whose reactions are shown in the form of chemical equations. EcoCyc also contains a small number of signaling pathways. Curators extracted the pathway knowledge from the literature. Pathways are described in a proprietary format.
In addition, gene regulatory information upstream of the metabolic pathways is also listed. In other words, there is a link from a metabolic pathway to the genes coding for its enzymes and to their regulators. The pathway map displays are separated into levels of detail. At the most detailed level, the metabolic products are shown in terms of the chemical equations.

Reactome

Reactome is a pathway database containing cell metabolic and signaling pathways (http://www.reactome.org/). Cold Spring Harbor Laboratory, the European Bioinformatics Institute, and the Gene Ontology Consortium (which specifies the Gene Ontology) are the main developers of the project. Although humans are the main organism catalogued, it has data for 22 other species such as mouse and rat. Reactome’s pathways and reactions can be viewed, but not edited, through a web browser. Though the storage format is proprietary, a large number of pathways can be obtained in multiple formats.

WikiPathways

WikiPathways is a community resource for contributing and maintaining content dedicated to biological pathways. Any registered WikiPathways user can contribute, and anybody can become a registered user. Contributions are monitored by a group of admins, but the bulk of peer review, editorial curation, and maintenance is the responsibility of the user community. WikiPathways is built using MediaWiki software, a custom graphical pathway editing tool (PathVisio) and integrated BridgeDb databases covering major gene, protein, and metabolite systems.

Commercial pathway databases

Ingenuity Pathways Knowledge Base

Ingenuity Pathways Knowledge Base (IPKB) is the pathway database created by Ingenuity Systems Inc. (http://www.ingenuity.com/). All licenses, including academic and nonprofit, require a fee. The database consists of gene regulatory and signaling pathways. Curators extract knowledge from the literature for this database, which currently contains human, mouse, and rat genetic information.

[Figure: overview of the Ingenuity platform. Labels: Ontology and Knowledge Infrastructure; Ingenuity Pathways Knowledge Base (about 1.8 million findings manually extracted from full text, about 160 curated metabolic and cell signaling pathways, chemical and drug information); Ingenuity Ontology (about 600,000 biological objects and processes in 12 major branches, with a synonym library); Client Solutions (Ingenuity Pathways Analysis, portal and enterprise search enablement, specialist analytics).]

ResNet

ResNet (http://www.ariadnegenomics.com/) is the pathway database created by Ariadne Genomics. Both academic and commercial licenses require a fee. ResNet consists mainly of gene regulatory and signaling pathways. Unlike the curated databases described above, ResNet is constructed mainly through computer analysis: its pathways and networks are created by natural language processing of the relevant literature, and MedScan is used for this procedure. The database is built mainly from abstracts in PubMed, although some entries make use of the full text, and a small number of entries are created by curators. The pathway data created by MedScan can be viewed through the viewing tool Pathway Studio. Like other databases, MedScan uses its own proprietary format.
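
To give a feel for what extracting relations from text involves, here is a deliberately naive pattern-matching sketch. It is not how MedScan works (real systems rely on full linguistic parsing and entity recognition); the verb table, the regular expression, and the function name `extract_relations` are all assumptions made for illustration.

```python
import re

# Hypothetical verb-to-relation table, chosen only for this example.
RELATION_VERBS = {
    "activates": "activation",
    "inhibits": "inhibition",
    "phosphorylates": "phosphorylation",
    "binds": "binding",
}

# Matches "<SYMBOL> <verb> <SYMBOL>", where a symbol is an upper-case token
# of 2 to 10 letters or digits. This is a crude stand-in for entity recognition.
PATTERN = re.compile(
    r"\b([A-Z][A-Z0-9]{1,9})\s+("
    + "|".join(RELATION_VERBS)
    + r")\s+([A-Z][A-Z0-9]{1,9})\b"
)


def extract_relations(text):
    """Return (source, relation type, target) triples found in the text."""
    return [
        (m.group(1), RELATION_VERBS[m.group(2)], m.group(3))
        for m in PATTERN.finditer(text)
    ]


abstract = (
    "MDM2 inhibits TP53 in unstressed cells. "
    "ATM phosphorylates CHEK2 after DNA damage."
)
for triple in extract_relations(abstract):
    print(triple)
```

A sketch like this misses passive constructions, negation, and multi-word protein names, which is one reason text-mined pathways are described as putative and tend to be noisier than curated ones.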

Meta-data databases

Meta-data databases consolidate the knowledge contained in the various databases. PathGuide (http://www.pathguide.org/), a comprehensive catalog of interaction and pathway related resources, currently lists over 702 resources in its meta-database. PathwayCommons and ConsensusPathDB are examples of databases that house integrated biological pathway data. The former, in particular, collects data from various providers and represents it in a standardized format. These meta-data databases are especially suited for analysing consolidated pathway information.
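
The core idea of consolidation can be sketched in a few lines: bring records from several sources into one common shape and keep track of which sources support each record. The example below is an assumption-laden toy, not the actual method of PathwayCommons or ConsensusPathDB; the source names `db_one`, `db_two`, and `db_three` are placeholders and the interaction lists are illustrative input.

```python
from collections import defaultdict


def consolidate(sources):
    """Merge undirected interactions from several sources, tracking provenance.

    `sources` maps a source name to a list of (gene_a, gene_b) pairs.
    Returns a dict mapping each normalized pair to the sorted source names.
    """
    support = defaultdict(set)
    for source_name, interactions in sources.items():
        for a, b in interactions:
            # Sort the pair so (A, B) and (B, A) count as the same interaction.
            key = tuple(sorted((a, b)))
            support[key].add(source_name)
    return {pair: sorted(names) for pair, names in support.items()}


# Placeholder sources with illustrative content.
toy_sources = {
    "db_one": [("TP53", "MDM2"), ("CDK4", "CCND1")],
    "db_two": [("MDM2", "TP53")],
    "db_three": [("CDK4", "CCND1"), ("MDM2", "TP53"), ("ATM", "CHEK2")],
}

merged = consolidate(toy_sources)

# List interactions with the most supporting sources first.
for pair, names in sorted(merged.items(), key=lambda item: (-len(item[1]), item[0])):
    print(pair, names)
```

Keeping the provenance alongside each interaction lets a user filter for interactions reported by several independent sources, at the cost of assuming that all sources already use the same identifiers.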

Limitations of biological databases

Though biological databases are crucial for data organization and storage, the challenges associated with them are manifold. Firstly, integration of database content is complicated by inconsistencies at the ontological level; non-standardized nomenclature requires a workaround in the form of mappings. Another major issue is that data may be incomplete or ambiguous, or may contain errors, redundancies, or inconsistencies with the literature. Regular updates may also be lacking as new knowledge arrives. Specialized databases have their own issues: pathway maps in pathway databases, for example, are often static in nature and represent only a snapshot of biology. It is therefore important that such variability be assessed and accounted for in some capacity, so that end users can benefit from these databases.
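
The nomenclature problem can be illustrated with a small mapping step applied before two gene lists are compared. This is a minimal sketch under stated assumptions: the synonym table is a tiny hand-picked sample, the database names in the variables are hypothetical, and a real project would load its mappings from a curated identifier resource.

```python
# Tiny hand-picked alias table (alias -> canonical symbol), for illustration only.
SYNONYMS = {
    "P53": "TP53",
    "HER2": "ERBB2",
    "P21": "CDKN1A",
}


def normalize(symbol):
    """Map a gene name to a canonical symbol.

    Upper-casing is a simplification that suits human gene symbols;
    other organisms use different capitalization conventions.
    """
    key = symbol.strip().upper()
    return SYNONYMS.get(key, key)


def compare_gene_sets(set_a, set_b):
    """Compare two gene lists after normalizing their names."""
    norm_a = {normalize(s) for s in set_a}
    norm_b = {normalize(s) for s in set_b}
    return {
        "shared": sorted(norm_a & norm_b),
        "only_a": sorted(norm_a - norm_b),
        "only_b": sorted(norm_b - norm_a),
    }


# Hypothetical versions of the same pathway from two databases.
pathway_in_db_a = ["p53", "MDM2", "HER2"]
pathway_in_db_b = ["TP53", "ERBB2", "CDKN1A"]

result = compare_gene_sets(pathway_in_db_a, pathway_in_db_b)
print(result)
```

Without the normalization step the two lists would appear to share no genes at all, even though two of the three entries refer to the same genes under different names.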

Conclusion

While there are many pathway databases, even an idealized unified version of them would still be far from comprehensive. Most database providers focus on a particular type of biological process, reflecting the research interest and expertise of a specific group. The databases vary greatly in their content, quality, and completeness. Furthermore, a lack of resources limits the ability of most database providers to offer up-to-date pathway knowledge, since the scientific literature to digest is very large and constantly accumulating. Currently, the information stored in pathway databases still falls behind the knowledge presented in scientific articles. An integrative approach seems to be a natural solution to these problems; yet it is hindered by issues such as heterogeneous data models and a lack of standardized data access methods. Various data exchange standards have been developed to assist the storage, organization, and exchange of pathway information. However, they are still at an early stage of development.