Protein-Protein Interaction Prediction Tutorial

March 13, 2024 By admin

Using Protein-Protein Interaction Databases

Links:

STRING: https://string-db.org/

IntAct: https://www.ebi.ac.uk/intact/home

BioGRID: https://thebiogrid.org/

STRING: Database of known & predicted protein-protein interactions.

The STRING database is derived from high-throughput experiments, co-expression data, literature data and genomic context. It has data for more than 67 million proteins from more than 15,000 organisms. Its interaction network view is quite intuitive and offers several display options, including evidence, type of interaction, confidence and others.

Figure 1: STRING default network view of BRCA2

Go to the STRING homepage, click on the Search button, type BRCA2 in the protein name text box and select human from the organism drop-down menu. Click the GO button. The next page lists the organism choices for BRCA2, with the human one checked. Click the Continue button. You should see a network view like the one shown in Figure 1.

By default, the network will show only 10 interacting proteins. You can expand the Settings tab to change the maximum number of proteins shown from 10 to 20. You can also raise the minimum confidence score to show only high-confidence interactions. The network view summarizes the network of predicted associations for a particular group of proteins. The network nodes are proteins. Clicking a node opens a box with additional details about that protein (Figure 2).

Figure 2: More info about PALB2 protein.

You can change the Network type from full STRING network to physical network. The second option shows only interactions that are part of a physical complex.
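The same settings can also be expressed as a request to the STRING web API rather than clicks in the browser. The sketch below only builds a request URL for BRCA2 and downloads nothing. The helper name string_network_url is made up for this tutorial, and the parameter names (identifiers, species, add_nodes, required_score, network_type) and their mapping onto the Settings tab are assumptions based on the STRING API documentation, so check them against the current documentation before relying on them.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"

def string_network_url(identifier, species=9606, add_nodes=10,
                       required_score=400, network_type="functional",
                       output_format="tsv"):
    """Build a STRING API 'network' request URL (hypothetical helper)."""
    params = {
        "identifiers": identifier,
        "species": species,                # NCBI taxonomy ID; 9606 is Homo sapiens
        "add_nodes": add_nodes,            # assumed analogue of the max number of proteins shown
        "required_score": required_score,  # assumed 0-1000 scale, so 700 corresponds to 0.7
        "network_type": network_type,      # assumed values: "functional" (full) or "physical"
    }
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"

# BRCA2 with 20 added proteins, high confidence, physical network only.
url = string_network_url("BRCA2", add_nodes=20, required_score=700,
                         network_type="physical")
print(url)
```

Pasting the printed URL into a browser should return the interactions as tab-delimited text, if the assumptions above hold.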

The edges (lines between proteins) represent the predicted functional associations. The default view is evidence based. An edge may be drawn with up to 8 differently colored lines representing the types of evidence used in predicting the associations. Expand the Legend tab to see what the different colors represent.

After you’ve clicked on a protein for more information, you can also use a link within that popup window to re-center the network on the secondary protein. Within the popup window are links to the protein record in other databases (Figure 2).

Figure 3: PPI from down-regulated RSV proteins

Expand the Exports tab to save the image in a variety of formats and views. PNG images can be imported into Word or PowerPoint, and high-resolution images suitable for publications are also offered. Click the More button to expand the network.

The options for downloading lists of data have also been improved. You can download the list of proteins shown in the current network view using Export your current network -> protein annotations. You can also download the interactions using Export your current network -> as simple tabular text output. Either file can be opened in Excel. If you open the network text data, you can sort by the type of evidence and the score, which makes it easy to find, for example, the interactions based on experimental evidence. NOTE: the downloaded interactions do not necessarily include the protein you entered in the search box. This is a network, so it shows the interactions expanded out from the first protein. The STRING export does not include the evidence for an interaction beyond whether it is based on neighborhood, co-expression, experimental data, etc. That is, no references are included.
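If you would rather script the sort than do it in Excel, a minimal sketch follows. It reads a tab-delimited export and keeps the interactions with experimental support, highest score first. The column names (#node1, node2, experimentally_determined_interaction, combined_score) are assumptions about the export header, so compare them with the first line of your own file. The protein names and scores in the sample are invented placeholders.

```python
import csv
import io

# Invented excerpt of a tabular export; a real file has more evidence columns.
SAMPLE_EXPORT = (
    "#node1\tnode2\tcoexpression\texperimentally_determined_interaction\tcombined_score\n"
    "PROT_A\tPROT_B\t0.062\t0.981\t0.999\n"
    "PROT_A\tPROT_C\t0.105\t0.996\t0.999\n"
    "PROT_B\tPROT_C\t0.210\t0.000\t0.412\n"
)

EXPERIMENT_COLUMN = "experimentally_determined_interaction"  # assumed header name

def experimental_interactions(handle, min_score=0.4):
    """Return (protein1, protein2, score) tuples with experimental support."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = [
        (row["#node1"], row["node2"], float(row[EXPERIMENT_COLUMN]))
        for row in reader
        if float(row[EXPERIMENT_COLUMN]) >= min_score
    ]
    # Highest experimental score first, as you would sort the column in Excel.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits

hits = experimental_interactions(io.StringIO(SAMPLE_EXPORT))
for protein1, protein2, score in hits:
    print(protein1, protein2, score)
```

To run it on a real export, replace io.StringIO(SAMPLE_EXPORT) with an open file handle for the downloaded text file.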

Analyzing lists of proteins

Click on the Multiple Proteins link from the main page. This opens a text box where you can enter a list of proteins and look for PPI within a list of related protein-coding genes.

Click on the link PPItutorial_GeneIDs.xlsx on the Exercise 6 homepage. Paste the list of UniProt accession IDs into the text box and select human for the Organism. Initially the network will be quite large, with many disconnected nodes. To see only connected nodes and only the high-confidence interactions, click on the Settings button and change the minimum required score to highest confidence (0.9). Check the box next to Hide Unconnected nodes, then click the Update button. The result should look like the network shown in Figure 4.

Figure 4: High confidence network without disconnected nodes
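The two Settings changes above (a minimum required score of 0.9 and hiding unconnected nodes) amount to a simple filter on the edge list. The sketch below illustrates the idea on a toy network. It is not STRING's own code, and the protein names and scores are invented.

```python
def high_confidence_network(query, edges, min_score=0.9):
    """Keep edges at or above min_score, then drop proteins left without an edge."""
    kept_edges = [(a, b, score) for a, b, score in edges if score >= min_score]
    connected = {protein for a, b, _ in kept_edges for protein in (a, b)}
    # Preserve the order of the query list; unconnected proteins are hidden.
    kept_nodes = [protein for protein in query if protein in connected]
    return kept_nodes, kept_edges

# Toy data: four query proteins and three scored associations (all invented).
query = ["P1", "P2", "P3", "P4"]
edges = [("P1", "P2", 0.95), ("P2", "P3", 0.72), ("P1", "P4", 0.40)]

nodes, strong_edges = high_confidence_network(query, edges)
print(nodes)
print(strong_edges)
```

Only P1 and P2 remain here, because the other two proteins have no association that clears the 0.9 cutoff.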

The Analysis tab shows the network parameters, such as the number of nodes and edges, together with a statistical test of whether this number of PPI would be expected to occur by chance. It also includes enrichment for GO terms and KEGG pathways. You should see good overlap with similar analyses done using DAVID or WebGestalt, but there may be small differences.
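To give a feel for what a term enrichment test computes, here is a minimal sketch of the standard hypergeometric over-representation test. This is an assumption about the general approach, not a description of what STRING, DAVID or WebGestalt actually implement, and real tools also correct for testing many terms at once. The numbers in the example are invented.

```python
from math import comb

def enrichment_p_value(background, annotated, list_size, hits):
    """P(at least `hits` annotated genes) when drawing `list_size` genes at random.

    background: genes in the background set
    annotated:  background genes carrying the term (e.g. one GO term)
    list_size:  genes in your query list
    hits:       query genes carrying the term
    """
    upper = min(annotated, list_size)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, list_size)

# Invented example: 20000 background genes, 100 with the term,
# a query list of 50 genes, 5 of which carry the term.
p_value = enrichment_p_value(20000, 100, 50, 5)
print(f"{p_value:.3g}")
```

A small p-value means the term appears in the list more often than random sampling from the background would explain.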

Network for query list vs network for single protein in that list

The query list network does not show all the possible interactions for every protein in the query list. It only shows the interactions between the proteins in that list. Select one protein from the network you generated above. Double-click it to open the window for that protein, as shown in Figure 2. Within that pop-up window is an option to re-center the network on that node. Click on it and the network view should shift to showing only that protein. Now, in the Settings tab, change the max number of interactions to show in the 1st shell to 5 or 10. You should now see a network centered on the single protein. This network may or may not include proteins from the original query list.

Figure 5: Search options at IntAct

You can expand out networks and do analyses on the expanded networks as you did for a query list.
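The difference between the two views can be sketched on a toy edge list. The query list view keeps only edges whose two ends are both in the list, while the single-protein view keeps the top-scoring partners of one protein wherever they come from. This is a simplified illustration with invented names and scores. STRING's real first-shell view also draws the edges among the partners themselves.

```python
def query_list_network(edges, query):
    """Edges whose two proteins are both in the query list."""
    members = set(query)
    return [(a, b, s) for a, b, s in edges if a in members and b in members]

def first_shell_network(edges, center, max_interactors=5):
    """Top-scoring edges that touch `center`, whether or not the partner was queried."""
    touching = [(a, b, s) for a, b, s in edges if center in (a, b)]
    touching.sort(key=lambda edge: edge[2], reverse=True)
    return touching[:max_interactors]

# Q1 and Q2 are query proteins; X1, X2 and X3 are not in the query list.
edges = [("Q1", "Q2", 0.90), ("Q1", "X1", 0.99), ("Q1", "X2", 0.80), ("Q2", "X3", 0.70)]

list_view = query_list_network(edges, ["Q1", "Q2"])
single_view = first_shell_network(edges, "Q1", max_interactors=2)
print(list_view)
print(single_view)
```

In this toy case the list view contains only the Q1 to Q2 edge, while the view centered on Q1 brings in X1, which was never part of the query list.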

All in all, STRING is one of the better-designed, user-friendly sites for analyzing gene lists with respect to potential protein-protein interactions. The lack of references supporting the interactions is a drawback.

IntAct

In the text box below the Quick Search tab on the IntAct website, type TLR4 and wait a few seconds for the window to expand and list the possible matching interactors, as shown in Figure 5. Click on the TLR4 (O00206) link. O00206 is the UniProt accession for human TLR4. It will open a page with a network view followed by a table view of the interactions. You can change the options for the network view by using the filters in the purple menu above the network figure.

Figure 6: Network filter for IntAct

You can export an image file or a table of interactions using the menu to the right of the network filters. The Network export options are PNG or GraphML, and the Table export options are miTab or miXML. The miTab option exports a .txt file, which you should be able to import into Excel.
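The miTab file is plain tab-delimited text, so it can also be read with a few lines of code. The sketch below assumes the standard PSI-MI TAB column order (interactor IDs in columns 1 and 2, detection method in column 7, publication identifiers in column 9, interaction type in column 12, confidence in column 15). Check this against the header line of your own export. The sample row is invented, and its partner, publication and interaction identifiers are placeholders.

```python
# One invented miTab-style row (first 15 columns) preceded by a shortened header.
SAMPLE_MITAB = [
    "#ID(s) interactor A\tID(s) interactor B\t(remaining header columns omitted)\n",
    "\t".join([
        "uniprotkb:O00206", "uniprotkb:PLACEHOLDER", "-", "-", "-", "-",
        'psi-mi:"MI:0018"(two hybrid)', "Author et al. (2000)", "pubmed:0000000",
        "taxid:9606(human)", "taxid:9606(human)",
        'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
        "intact:EBI-0000000", "intact-miscore:0.40",
    ]) + "\n",
]

def parse_mitab(lines):
    """Pick a few columns out of miTab lines, skipping blank and header lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        records.append({
            "interactor_a": cols[0],
            "interactor_b": cols[1],
            "detection_method": cols[6],
            "publication_ids": cols[8],
            "interaction_type": cols[11],
            "confidence": cols[14],
        })
    return records

records = parse_mitab(SAMPLE_MITAB)
for record in records:
    print(record["interactor_a"], record["interactor_b"], record["detection_method"])
```

Unlike the STRING export, each row carries a publication identifier, so the supporting reference can be looked up directly.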

BioGRID (Biological General Repository for Interaction Datasets)

In the BioGRID Gene/Identifier Search box, type TLR4 and make sure Homo sapiens is the organism in the drop-down menu below it. Click the GO button. It should return results like those shown in Figure 7.

Figure 7: Results for TLR4 search of BioGRID

In this database, there are 43 interactors and 69 published interactions for TLR4. The Current Statistics window lists how many are physical versus genetic interactions and how many were determined by high-throughput versus low-throughput experiments. If you click on the blue bar “Download Curated Data for this Protein”, it will give you the option to download the data in multiple formats, with BioGRID TAB 3.0 as the default. This will download the protein-protein interaction data and the protein-chemical interaction data as a zip archive. The two interaction files are tab-delimited text files within the zip archive. On a Mac, you can open the archive by right-clicking and opening it with Archive Utility. On a PC, it may open automatically when you double-click it. I’ve put links to the interactions file in both formats on the Exercise 6 website so you can download them, import them into Excel and see how they present the data differently.
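You can also read the tab-delimited files straight out of the zip archive without unpacking it. The sketch below recounts the physical versus genetic and high versus low throughput split shown in the Current Statistics window. The column names Experimental System Type and Throughput are assumptions about the TAB 3.0 header, so check them against your download. The demo archive is built in memory from invented rows so the example runs on its own.

```python
import csv
import io
import zipfile
from collections import Counter

def summarize_biogrid_zip(source):
    """Count interactions by type and by throughput in a BioGRID zip download.

    `source` may be a file path or a binary file-like object.
    """
    by_type, by_throughput = Counter(), Counter()
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text, delimiter="\t")
                # Skip files without the assumed columns (e.g. the chemical file).
                if "Experimental System Type" not in (reader.fieldnames or []):
                    continue
                for row in reader:
                    by_type[row["Experimental System Type"]] += 1
                    by_throughput[row["Throughput"]] += 1
    return by_type, by_throughput

# Tiny in-memory archive with invented rows standing in for a real download.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr(
        "demo_interactions.tab3.txt",
        "Official Symbol Interactor A\tOfficial Symbol Interactor B\t"
        "Experimental System Type\tThroughput\n"
        "GENE_A\tGENE_B\tphysical\tLow Throughput\n"
        "GENE_A\tGENE_C\tphysical\tHigh Throughput\n"
        "GENE_A\tGENE_D\tgenetic\tLow Throughput\n",
    )
demo.seek(0)

by_type, by_throughput = summarize_biogrid_zip(demo)
print(dict(by_type))
print(dict(by_throughput))
```

For a real download, pass the path of the zip file to summarize_biogrid_zip instead of the demo object.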

If you return to the search result page and click on the Interactions tab, you can see the different interactions with the experimental evidence color coded by the type of assay. Open the link for one of the references in a new tab and it will list all the protein interactions found in that publication. This database of interactions has undergone substantial revision since last year and is quite user-friendly for finding interactions and the experimental evidence for those interactions.

Return to the search result page and click on the Network tab to show the interactions in a network mode. It also includes small molecules. You can filter by minimum evidence and export the image. It is also interactive, though not as full-featured as the STRING network view.

You can use either BioGRID or IntAct for the exercise.

This information is in the lecture, but below is a brief description of the source data for STRING, IntAct and BioGRID.

STRING:

IntAct:

BioGRID:

Physical interactions                   Genetic interactions
Affinity capture-MS or luminescence     Dosage growth defect
Affinity capture RNA or Western         Dosage lethality
Biochemical Activity                    Dosage rescue
Co-crystal Structure                    Negative genetic
Co-fractionation                        Phenotypic enhancement
Co-localization                         Phenotypic suppression
Co-purification                         Positive genetic
Far Western                             Synthetic growth defect
FRET                                    Synthetic haploinsufficiency
Proximity Label-MS                      Synthetic lethality
Reconstituted complex                   Synthetic rescue
Two hybrid

https://wiki.thebiogrid.org/doku.php/experimental_systems

 
