Introduction to Biological Pathway Databases
August 5, 2019
What is a Biological Pathway?
A biological pathway is a series of interactions among molecules in a cell that leads to a certain product or a change in a cell. Such a pathway can trigger the assembly of new molecules, such as a fat or protein. Pathways can also turn genes on and off, or spur a cell to move. Some of the most common biological pathways are involved in metabolism, the regulation of gene expression, and the transmission of signals. Pathways play a key role in advanced studies of genomics.
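One common way to work with a pathway computationally is to treat it as a directed graph in which molecules are nodes and each reaction is an edge labeled with the enzyme that catalyzes it. The sketch below is a minimal illustration under that assumption, using the first three steps of glycolysis; the dictionary layout and the function name find_route are hypothetical and not taken from any particular pathway database.

```python
from collections import deque

# Excerpt of glycolysis as a directed graph (illustrative layout, not a database schema):
# each substrate maps to a list of (product, enzyme) pairs.
GLYCOLYSIS_EXCERPT = {
    "glucose": [("glucose-6-phosphate", "hexokinase")],
    "glucose-6-phosphate": [("fructose-6-phosphate", "phosphoglucose isomerase")],
    "fructose-6-phosphate": [("fructose-1,6-bisphosphate", "phosphofructokinase")],
}


def find_route(graph, start, goal):
    """Breadth-first search from start to goal.

    Returns a list of (substrate, enzyme, product) steps, or None if goal is unreachable.
    """
    queue = deque([start])
    came_from = {start: None}  # node -> (previous node, enzyme), or None for the start
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for product, enzyme in graph.get(node, []):
            if product not in came_from:  # skip already-visited molecules
                came_from[product] = (node, enzyme)
                queue.append(product)
    if goal not in came_from:
        return None
    # Walk back from the goal to rebuild the ordered list of reaction steps.
    steps = []
    node = goal
    while came_from[node] is not None:
        prev, enzyme = came_from[node]
        steps.append((prev, enzyme, node))
        node = prev
    return list(reversed(steps))


for substrate, enzyme, product in find_route(
        GLYCOLYSIS_EXCERPT, "glucose", "fructose-1,6-bisphosphate"):
    print(f"{substrate} --[{enzyme}]--> {product}")
```

Breadth-first search returns a route with the fewest reaction steps. Real pathway graphs branch and contain cycles, which is why the search keeps track of molecules it has already visited.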
Pathway databases
Pathway information is available through a large number of databases, ranging from high-quality databases created by professional curators to massive databases covering a vast number of putative pathways, created through natural language processing and text mining of abstracts. Because these databases differ in size, quality, and other properties, users need to choose the database that fits their purpose, whether for commercial or public use.
Major Pathway Databases
Pathway databases are being created all around the world, and each one strongly reflects its builder’s intent and purpose. Some databases contain detailed metabolic pathways, while others focus on detailed signaling pathways. Most databases are built by curators who read papers, extract pathway information, and organize it in the database together with pathway diagrams. Others are built using natural language processing and text mining, which extract various biological relations, such as gene regulatory relations, from papers and organize them into databases.
KEGG
KEGG (Kyoto Encyclopedia of Genes and Genomes) (http://www.kegg.jp/) is a series of databases developed by the Bioinformatics Center of Kyoto University and the Human Genome Center of the University of Tokyo. As the word “encyclopedia” suggests, the database includes information needed for a systems-level understanding of biology, such as genome sequences and chemical information. The “Pathway” section of KEGG consists mainly of metabolic pathways. Licenses are free for noncommercial use, while commercial licenses are sold by Pathway Solutions Inc. (http://www.pathway.jp/).
KEGG is notable for its focus on, and coverage of, yeast, mouse, and human metabolic pathways. Signaling pathways, such as those for the cell cycle and apoptosis, are currently being expanded. New pathways are created by professional curators who read and summarize the relevant literature. Pathway data are distributed in an XML format called KGML (KEGG Markup Language). Because the pathway maps are displayed as static GIF images, users cannot easily edit the pathway information.
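KEGG distributes its pathway data in KGML (KEGG Markup Language), an XML format, so a pathway file can be read with any standard XML parser. The sketch below assumes a hand-written fragment that imitates the KGML element layout (pathway, entry, reaction, substrate, product); every identifier in it is a placeholder rather than real KEGG data, real files carry many more attributes (such as graphics), and the helper summarize_kgml is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written KGML-style fragment. Element names follow the KGML layout,
# but all identifiers are placeholders, not real KEGG entries.
KGML_SNIPPET = """<pathway name="path:example" org="hsa" number="00000" title="Example pathway">
  <entry id="1" name="hsa:GENE_A" type="gene"/>
  <entry id="2" name="cpd:COMPOUND_X" type="compound"/>
  <entry id="3" name="cpd:COMPOUND_Y" type="compound"/>
  <reaction id="1" name="rn:REACTION_1" type="irreversible">
    <substrate id="2" name="cpd:COMPOUND_X"/>
    <product id="3" name="cpd:COMPOUND_Y"/>
  </reaction>
</pathway>"""


def summarize_kgml(xml_text):
    """Return (title, entries, reactions) for a KGML-style document.

    entries maps entry id -> (name, type); reactions is a list of
    (reaction name, substrate names, product names) tuples.
    """
    root = ET.fromstring(xml_text)
    entries = {e.get("id"): (e.get("name"), e.get("type")) for e in root.findall("entry")}
    reactions = []
    for r in root.findall("reaction"):
        substrates = [s.get("name") for s in r.findall("substrate")]
        products = [p.get("name") for p in r.findall("product")]
        reactions.append((r.get("name"), substrates, products))
    return root.get("title"), entries, reactions


title, entries, reactions = summarize_kgml(KGML_SNIPPET)
print(title)
for name, subs, prods in reactions:
    print(name, " + ".join(subs), "->", " + ".join(prods))
```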
BioCyc
BioCyc is a pathway database provided by SRI International (http://www.biocyc.org/).
The database is a high-quality collection focused on metabolic pathways, originally developed by SRI International’s bioinformatics research group. Related databases include EcoCyc, MetaCyc, and HumanCyc. Licenses are free for academic and nonprofit use. Humans and E. coli are the major organisms covered, alongside a variety of others. EcoCyc is mainly a database of E. coli metabolic pathways, in which reactions are shown as chemical equations. EcoCyc also contains a small number of signaling pathways. Curators extracted the pathway knowledge from the literature, and pathways are described in a proprietary format.
In addition, gene regulatory information upstream of the metabolic pathways is also listed. In other words, each metabolic pathway links to the genes coding its enzymes and to their regulators. Pathway maps can be displayed at several levels of detail; at the most detailed level, metabolic reactions are shown as chemical equations.
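The link from a metabolic pathway down to the genes coding its enzymes and their regulators can be pictured as a simple nested data model. The sketch below is an illustration under stated assumptions, not BioCyc’s actual schema (which uses its own format): the classes Gene, Reaction, and Pathway are hypothetical, and the compound, gene, and regulator names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    # Names of genes or proteins that regulate this gene (hypothetical labels).
    regulators: list[str] = field(default_factory=list)


@dataclass
class Reaction:
    substrates: list[str]
    products: list[str]
    # Genes coding the enzyme(s) that catalyze this reaction.
    enzyme_genes: list[Gene]

    def equation(self) -> str:
        """Render the reaction as a simple chemical equation string."""
        return " + ".join(self.substrates) + " -> " + " + ".join(self.products)


@dataclass
class Pathway:
    name: str
    reactions: list[Reaction]

    def upstream_regulators(self) -> list[str]:
        """Follow pathway -> reactions -> enzyme genes -> regulators."""
        found: set[str] = set()
        for reaction in self.reactions:
            for gene in reaction.enzyme_genes:
                found.update(gene.regulators)
        return sorted(found)


# Hypothetical two-step pathway; all names are placeholders.
gene_a = Gene("geneA", regulators=["RegR"])
gene_b = Gene("geneB", regulators=["RegR", "RegS"])
pathway = Pathway("example pathway", [
    Reaction(["X", "ATP"], ["X-phosphate", "ADP"], [gene_a]),
    Reaction(["X-phosphate"], ["Y"], [gene_b]),
])

for reaction in pathway.reactions:
    print(reaction.equation())
print(pathway.upstream_regulators())  # ['RegR', 'RegS']
```

Keeping genes as objects shared between reactions means a regulator attached to one gene is visible from every reaction that gene’s enzyme catalyzes.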
Reactome
Reactome (http://www.reactome.org/) is a pathway database containing cellular metabolic and signaling pathways. Cold Spring Harbor Laboratory, the European Bioinformatics Institute, and the Gene Ontology Consortium (which maintains the Gene Ontology) are the main developers of the project. Although humans are the main organism catalogued, Reactome has data for 22 other species, such as mouse and rat. Reactome’s pathways and reactions can be viewed, but not edited, through a web browser. Although the storage format is proprietary, a large number of pathways can be downloaded in multiple formats.
WikiPathways
WikiPathways is a community resource for contributing and maintaining content dedicated to biological pathways. Any registered WikiPathways user can contribute, and anybody can become a registered user. Contributions are monitored by a group of administrators, but the bulk of peer review, editorial curation, and maintenance is the responsibility of the user community. WikiPathways is built using MediaWiki software, a custom graphical pathway editing tool (PathVisio), and integrated BridgeDb identifier-mapping databases covering major gene, protein, and metabolite systems.
Commercial pathway databases
Ingenuity Pathways Knowledge Base
Ingenuity Pathways Knowledge Base (IPKB) is the pathway database created by Ingenuity Systems Inc. (http://www.ingenuity.com/). All licenses, including academic and nonprofit, require a fee. The database consists of gene regulatory and signaling pathways. Curators extract knowledge from the literature for this database, which currently contains human, mouse, and rat genetic information.
ResNet
ResNet (http://www.ariadnegenomics.com/) is a pathway database created by Ariadne Genomics. Academic and commercial licenses require a fee. ResNet consists mainly of gene regulatory and signaling pathways. Unlike the curated databases above, ResNet is constructed through computer analysis: its pathways and networks are created by natural language processing of the relevant literature, using a tool called MedScan. The database is built mainly from PubMed abstracts, although some entries draw on the full text, and a small number of entries are created by curators. The pathway data created by MedScan can be viewed with the Pathway Studio viewing tool. Like several other databases, ResNet uses its own proprietary format.
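The general idea behind literature-derived networks such as ResNet, turning sentences into subject-relation-object triples, can be illustrated with a deliberately naive pattern matcher. This is not MedScan, whose methods are far more sophisticated; the verb patterns, the entity names, and the helper extract_relations are all hypothetical.

```python
import re

# Hypothetical verb -> relation-type table. Real NLP systems rely on full
# syntactic analysis rather than a handful of regular expressions.
RELATION_PATTERNS = {
    "activates": "activation",
    "inhibits": "inhibition",
    "phosphorylates": "phosphorylation",
}


def extract_relations(sentence, known_entities):
    """Return (subject, relation, object) triples from one sentence.

    Only 'ENTITY verb ENTITY' patterns are recognized, and both entities
    must appear in known_entities.
    """
    triples = []
    for verb, relation in RELATION_PATTERNS.items():
        pattern = rf"\b([A-Za-z0-9-]+)\s+{verb}\s+([A-Za-z0-9-]+)\b"
        for subj, obj in re.findall(pattern, sentence):
            if subj in known_entities and obj in known_entities:
                triples.append((subj, relation, obj))
    return triples


# Made-up abstract and entity dictionary, for illustration only.
abstract = ("Our results show that KinaseA phosphorylates FactorB. "
            "In turn, FactorB activates GeneC, whereas DrugX inhibits KinaseA.")
entities = {"KinaseA", "FactorB", "GeneC"}

network = []
for sentence in abstract.split("."):
    network.extend(extract_relations(sentence, entities))
print(network)
```

A matcher like this misses negation, passive voice, and synonyms, and it silently drops relations involving entities outside its dictionary (DrugX here), which hints at why text-mined networks tend to be large but less reliable than curated ones.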
Meta-data databases
Meta-data databases consolidate the knowledge contained in various databases. PathGuide (http://www.pathguide.org/), a comprehensive catalog of interaction and pathway related resources, currently lists 702 resources in its meta-database. Pathway Commons and ConsensusPathDB are examples of databases that house integrated biological pathway data. The former, in particular, collects data from various providers and represents it in a standardized format. These meta-data databases are especially suited for analysing consolidated pathway information.
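The core step a meta-database performs, converting each provider’s records into one standardized representation and merging them while remembering where each came from, can be sketched in a few lines. This is a minimal illustration assuming two hypothetical sources with differently shaped records; Pathway Commons itself uses the BioPAX standard, which is far richer than the simple tuple used here, and the function names are hypothetical.

```python
from collections import defaultdict

# Two hypothetical source databases reporting interactions in different shapes.
SOURCE_A = [  # tuples: (source gene, target gene, interaction type)
    ("GENE1", "GENE2", "activation"),
    ("GENE2", "GENE3", "inhibition"),
]
SOURCE_B = [  # dicts with different key names and upper-case types
    {"from": "GENE1", "to": "GENE2", "kind": "ACTIVATION"},
    {"from": "GENE3", "to": "GENE4", "kind": "ACTIVATION"},
]


def normalize_a(record):
    """Source A already matches the standard (source, target, type) tuple."""
    src, dst, kind = record
    return (src, dst, kind)


def normalize_b(record):
    """Rename fields and lower-case the interaction type for source B."""
    return (record["from"], record["to"], record["kind"].lower())


def consolidate(sources):
    """Merge several sources into one map:
    (source, target, type) -> sorted list of databases reporting it."""
    merged = defaultdict(set)
    for db_name, records, normalize in sources:
        for record in records:
            merged[normalize(record)].add(db_name)
    return {interaction: sorted(dbs) for interaction, dbs in merged.items()}


combined = consolidate([
    ("SourceA", SOURCE_A, normalize_a),
    ("SourceB", SOURCE_B, normalize_b),
])
for interaction, provenance in sorted(combined.items()):
    print(interaction, provenance)
```

Keeping provenance alongside each merged interaction lets users see which claims are supported by several providers and which come from a single source.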
Limitations of biological databases
Though crucial for data organization and storage, biological databases come with many challenges. First, integrating database content is complicated by inconsistencies at the ontological level; non-standardized nomenclature requires a workaround in the form of identifier mappings. Another major issue is that data may be incomplete or ambiguous, or may contain errors, redundancies, or inconsistencies with the literature. Databases may also not be updated regularly as new knowledge arrives. In pathway databases specifically, pathway maps are often static and represent only a snapshot of biology. It is therefore important to assess and account for this variability so that end users can fully benefit from these databases.
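The mapping workaround for non-standardized nomenclature can be illustrated with a small sketch: identifiers from different naming systems are translated to one canonical symbol, after which redundant records collapse into one and unmappable records are flagged for review. The mapping table is a tiny hand-written example (the entries are public identifiers for human TP53 and MDM2, but the table is far from complete), and the function harmonize is hypothetical; in practice a dedicated identifier-mapping resource such as BridgeDb would supply the table.

```python
# Small, illustrative mapping table: several identifier systems -> one canonical symbol.
ID_MAP = {
    "7157": "TP53",             # NCBI Entrez Gene ID
    "P04637": "TP53",           # UniProt accession
    "ENSG00000141510": "TP53",  # Ensembl gene ID
    "p53": "TP53",              # protein name as often written in papers
    "4193": "MDM2",             # NCBI Entrez Gene ID
    "Q00987": "MDM2",           # UniProt accession
    "MDM2": "MDM2",             # symbol already canonical
}


def harmonize(records, id_map):
    """Translate (source, target, type) records to canonical symbols.

    Returns (harmonized, unmapped): harmonized holds each distinct interaction
    once, in first-seen order; unmapped holds records with an ID missing from id_map.
    """
    seen = set()
    harmonized, unmapped = [], []
    for src, dst, kind in records:
        if src not in id_map or dst not in id_map:
            unmapped.append((src, dst, kind))
            continue
        key = (id_map[src], id_map[dst], kind)
        if key not in seen:  # drop redundant copies of the same interaction
            seen.add(key)
            harmonized.append(key)
    return harmonized, unmapped


# One interaction as three hypothetical sources might report it, plus one bad ID.
records = [
    ("4193", "7157", "inhibition"),
    ("Q00987", "P04637", "inhibition"),
    ("MDM2", "p53", "inhibition"),
    ("UNKNOWN_ID", "7157", "activation"),
]
harmonized, unmapped = harmonize(records, ID_MAP)
print(harmonized)  # [('MDM2', 'TP53', 'inhibition')]
print(unmapped)    # [('UNKNOWN_ID', '7157', 'activation')]
```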
Conclusion
While there are many pathway databases, even an idealized unified version of them would still be far from comprehensive. Most database providers focus on a particular type of biological process, reflecting the research interests and expertise of a specific group. The databases vary greatly in content, quality, and completeness. Furthermore, limited resources restrict the ability of most providers to offer up-to-date pathway knowledge, since the scientific literature to digest is very large and constantly growing. As a result, the information stored in pathway databases still lags behind the knowledge presented in scientific articles. An integrative approach seems to be a natural solution to these problems, yet it is hindered by issues such as heterogeneous data models and the lack of standardized data access methods. Various data exchange standards have been developed to support the storage, organization, and exchange of pathway information, but they are still at an early stage of development.