Using PubChem to explore compound-protein interactions

November 8, 2023, by admin

PubChem is the world’s largest collection of freely accessible chemical information. You can search for chemicals by name, molecular formula, structure, and other identifiers, and find information on chemical and physical properties, biological activities, safety and toxicity data, patents, literature citations, and more. Because it focuses primarily on chemical entities, it can be a bit challenging for biologists to navigate. I introduce it here because drug discovery and drug interactions are increasingly relevant to basic science research, and PubChem offers a wealth of information about chemical compounds. Keep in mind, however, that information on how these compounds interact with specific proteins is limited to the assays that have been published and shared.

I will not provide an in-depth tutorial here, but I will walk you through finding information for a single protein. Extensive tutorials are available through the link above if you think the site will be useful for your research.

Type “BRCA1” in the search box and click the search icon (magnifying glass) to the right of the search text box. The search returns many records grouped into categories, as shown in the figure. Click the “Genes” tab, where you should find a link to human BRCA1 at the top (Gene ID 672). Click it to open the gene record, which should resemble the one shown in the figure.

Using the Contents menu on the right, click “4 Chemicals and Bioactivities” to jump to that section of the record. There are two tables of interest: 4.1 Tested Compounds and 5.1 BioAssays. Together they list compounds that have been tested in assays involving BRCA1, along with the outcomes. Above each table, to the right of the column headers, you’ll find a Download link. It offers several formats; choose CSV, which can be imported directly into Excel. Note that the CSV file for Tested Compounds (named GeneID_672_bioactivity_gene.csv) contains over 350,000 records, so it may take some time to load into Excel.
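
If Excel struggles with a file that size, a short script can stream through it one row at a time instead. The Python sketch below tallies how many records fall under each activity outcome. The column name `activity` is an assumption based on the description of the file's columns; check the header row of your own download and pass a different `activity_col` if it differs.

```python
import csv
from collections import Counter
from typing import TextIO


def count_by_activity(handle: TextIO, activity_col: str = "activity") -> Counter:
    """Tally rows per activity outcome, reading the CSV one row at a time."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        # Assumed column name; normalize case and whitespace so that
        # "Active" and "active " end up in the same bucket.
        counts[(row.get(activity_col) or "").strip().lower()] += 1
    return counts
```

To use it on the download (assuming it is UTF-8 encoded), open the file with `open("GeneID_672_bioactivity_gene.csv", newline="", encoding="utf-8")` and print `count_by_activity(fh).most_common()`. Because `csv.DictReader` yields rows lazily, memory use stays small no matter how many records the file holds.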

“What would you do with this file? It includes columns for activity outcome (active, inactive, etc.), assay type, and activity value (reported in µM), among others. You could filter on activity to keep only active compounds, then sort by activity value (acvalue) and keep only those below roughly 1-2 µM. This narrows the list to the most potent compounds, which, along with structurally related compounds, may be of interest.

Table 5.1 BioAssays (downloaded as GeneID_672_bioassays.csv) contains only 11 rows, listing the different types of bioassays referenced in this gene record. It is not easy to read, because some cells contain whole paragraphs of text, but it describes the assays conducted with BRCA1, where they were performed, and which compounds were tested in them.

Again, PubChem is probably most useful to those working in labs that are actively trying to identify new therapeutic candidates. Even so, it can be useful to know what kinds of assays have been developed for your protein of interest.”
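
The filtering described above, and a way to skim the wordy bioassay table, can be sketched in Python as follows. This is a minimal sketch, not a PubChem tool: the column names `activity` and `acvalue` are taken from the description of the file, the outcome label "Active" and µM units are assumed, and the function names are hypothetical. Adjust them to match the headers in your own download.

```python
import csv
from typing import Dict, List, TextIO


def potent_actives(handle: TextIO, max_um: float = 1.0,
                   activity_col: str = "activity",
                   value_col: str = "acvalue") -> List[Dict[str, str]]:
    """Return active rows with an activity value below max_um, most potent first.

    Assumes values in value_col are in µM and that actives are labeled
    "Active" (case-insensitive) in activity_col.
    """
    hits = []
    for row in csv.DictReader(handle):
        if (row.get(activity_col) or "").strip().lower() != "active":
            continue
        try:
            value = float(row.get(value_col) or "")
        except ValueError:
            continue  # blank or non-numeric values cannot be ranked
        if value < max_um:
            hits.append((value, row))
    hits.sort(key=lambda pair: pair[0])  # lowest value (most potent) first
    return [row for _, row in hits]


def preview_rows(handle: TextIO, max_chars: int = 60) -> List[Dict[str, str]]:
    """Read every row and shorten long, multi-line text cells to one-line previews."""
    previews = []
    for row in csv.DictReader(handle):
        short = {}
        for key, value in row.items():
            if key is None:
                continue  # extra unnamed fields beyond the header row
            flat = " ".join((value or "").split())  # collapse newlines and repeated spaces
            short[key] = flat if len(flat) <= max_chars else flat[: max_chars - 3] + "..."
        previews.append(short)
    return previews
```

Open each file with `newline=""` (for example `open("GeneID_672_bioassays.csv", newline="", encoding="utf-8")`), as the Python `csv` documentation recommends; this lets paragraph-length cells that contain line breaks parse as single fields. Calling `potent_actives(fh, max_um=2.0)` then applies the looser 2 µM cutoff.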

Take-home points:

  1. There is a LOT of biomolecular data, and much of it is shared between databases.
  2. The various web tools/interfaces provide different approaches to viewing and interacting with this data. There is likely to be more than one way to answer whatever questions you might have. The most important thing is to document what tools you used, when you accessed them, and for genomic data, what assembly/version of the genome you accessed.
  3. This tutorial provides only a glimpse of the types of searches and questions you can ask. Take your time going through it and spend some extra time exploring what other features are available. All of the main websites will have readily available information and/or tutorials to explain what you are looking at.
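
Point 2 is easy to automate. In the minimal Python sketch below, the function names and the `.provenance.json` sidecar-file convention are hypothetical, not a PubChem feature. It records what was downloaded, from where, with which tool, and when, plus a SHA-256 checksum so you can later confirm you are still analyzing the same file.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(data_file: Path, source: str, tool: str, notes: str = "") -> Path:
    """Save a small JSON record next to data_file describing where it came from."""
    record = {
        "file": data_file.name,
        "sha256": sha256_of(data_file),
        "source": source,  # e.g. the database record you downloaded from
        "tool": tool,      # e.g. the web interface and export format used
        "accessed_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "notes": notes,    # e.g. genome assembly/version for genomic data
    }
    out = data_file.with_name(data_file.name + ".provenance.json")
    out.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out
```

For example, `write_provenance(Path("GeneID_672_bioactivity_gene.csv"), source="PubChem gene record, Gene ID 672", tool="PubChem web interface, CSV download")` writes `GeneID_672_bioactivity_gene.csv.provenance.json` alongside the data.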