Protein Secondary Structure Prediction Made Simple: Essential Insights and Tools
August 8, 2024
Protein Secondary Structure Prediction in a Nutshell
Introduction
Proteins fulfill a wide variety of roles. They are the enzymes that rearrange chemical bonds, and they carry signals to and from the outside of the cell and within the cell. Proteins also transport small molecules and form many of the cellular structures. They regulate cell processes, turning them on and off and controlling their rates. (1)
Protein structure is essential for understanding protein function. To understand how proteins function at the molecular level, it is sometimes necessary to determine their 3D structure. Accurately and reliably predicting structures from protein sequences is one of the most challenging tasks in computational biology. Protein secondary structure prediction provides a significant first step toward tertiary structure prediction, as well as offering information about protein activity, relationships, and functions. Protein secondary structure refers to the local conformation of a protein's polypeptide backbone. (2)
Secondary structure refers to regular, recurring arrangements in space of adjacent amino acid residues in a polypeptide chain. It is maintained by hydrogen bonds between amide hydrogens and carbonyl oxygens of the peptide backbone. The major secondary structures are α-helices and β-structures.
Beta-helical structures merge features of the two motifs, containing two or three beta-sheet faces connected by loops or turns in a single protein. Beta-helical structures form the basis of proteins with diverse mechanical functions such as bacterial adhesins, phage cell-puncture devices, antifreeze proteins, and extracellular matrices(3). Alpha-helices are commonly found in cellular and extracellular matrix components, whereas beta-helices such as curli fibrils are more common as bacterial and biofilm matrix components. It is currently not known whether it may be advantageous to use one helical motif over the other for different structural and mechanical functions.
Here we give an overview of current trends and perspectives in protein secondary structure prediction. (1)
Evolution And Improvements Of Protein Secondary Structure Prediction Over The Years
The first attempts to predict secondary structure were made in the 1970s and involved only single sequences. Most early methods relied on a straightforward statistical analysis of the sequence composition underlying the three secondary structure elements. Prediction improved substantially in the 1990s through the use of evolutionary information taken from the divergence of proteins in the same structural family (4). More recently, the evolutionary information resulting from improved searches and larger databases again boosted prediction accuracy by more than four percentage points, to around 76% of all residues predicted correctly in one of the three states: helix, strand, and other.
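The accuracy figure quoted above is a per-residue, three-state score, commonly called Q3. A minimal sketch of how such a score could be computed is shown here, assuming predicted and observed states are supplied as equal-length strings over H (helix), E (strand), and C (other). The function name `q3` and the string encoding are illustrative choices, not part of any particular tool.

```python
def q3(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the two three-state strings agree.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Hypothetical prediction and reference assignment for an 8-residue peptide.
    print(q3("HHHEECCC", "HHHEEECC"))
```

Because the score is averaged over residues, a method can reach a respectable value while still misplacing the ends of helices and strands, so it is best read alongside segment-level inspection.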
The past year also brought successful new concepts to the field. These new methods may be particularly interesting in light of the improvements achieved by simply combining existing methods. Divergent evolutionary profiles contain enough information not only to substantially improve prediction accuracy, but also to correctly predict long stretches of identical residues observed in alternative secondary structure states depending on nonlocal conditions. (5)
An example is a method that automatically identifies structural switches and thus finds a remarkable connection between predicted secondary structure and aspects of function. Secondary structure predictions are increasingly becoming the workhorse for numerous methods aimed at predicting protein structure and function. (6)
Application And Benefits Of Protein Secondary Structure Prediction
It has been observed that secondary structure elements are formed early on during folding. Their subsequent assembly results in the protein's initial structural framework. As a consequence of this so-called framework model of protein folding, secondary structure prediction techniques are often implemented in methods that infer protein 3D structures. One of these approaches, called threading, aims at the identification of a template structure that most closely matches a given query sequence (7). Threading techniques thus follow the so-called inverse folding problem, where the question is not what three-dimensional structure a given protein sequence will adopt (the folding problem) but what sequence is compatible with a given three-dimensional structure. (6)
[Figure: Cartoon representation of a protein structure containing all three secondary structure elements: helices, strands (displayed as arrows) and coils (displayed as ropes). The PDB ID of the protein, a formyl transferase, is 1meo.]
In most threading implementations a database of tertiary structures (for instance, the Protein Data Bank (PDB)) is scanned, and for each fold a pseudo-energy is computed to determine whether it is a good match for the query sequence, often in conjunction with its predicted secondary structure (8). Ab initio prediction, where sequence information is used for de novo prediction of a 3D model, has also been shown to benefit significantly from reliably predicted secondary structure.
In addition to fold recognition, secondary structure prediction has also been successfully integrated into a number of further important bioinformatics tools. These include homology detection programs, multiple sequence alignment routines, and protein disorder prediction approaches. In all these cases, the common thread is that structure is more conserved than sequence. This applies particularly to more distantly related proteins, where evolutionary relatedness might no longer be discernible at the sequence level but can still be detected at the structural level.
In some applications, secondary structure information is used indirectly. For instance, a threading algorithm might not directly use information from secondary structure prediction but employ a technique for remote homology detection that incorporates secondary structure prediction.(9)
Current Methods In Secondary Structure Prediction
Recent methods also adopt the window approach, which uses a local stretch of amino acids around a central position to predict the secondary structure state at that position. Training algorithms, when properly applied, then help to decipher the prediction rules. Whereas early methods relied on a straightforward statistical analysis of the sequence composition underlying the three secondary structure elements, modern methods adopt more sophisticated machine learning protocols for gleaning the sequence signals associated with the secondary structure types.
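To make the window approach concrete, the sketch below cuts a sequence into one fixed-width window per residue, padding the ends so that every position has a full window. The function name `residue_windows`, the default half-width of 7, and the `-` padding character are assumptions chosen for illustration; real predictors differ in window size and in how they encode each window for the learning algorithm.

```python
def residue_windows(sequence: str, half_width: int = 7, pad: str = "-") -> list[str]:
    """Return one window per residue, centred on that residue.

    Each window holds `half_width` neighbours on either side of the central
    position; positions beyond the sequence ends are filled with `pad`.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


if __name__ == "__main__":
    # With half_width=1 each residue is seen together with its two neighbours.
    for window in residue_windows("ACDE", half_width=1):
        print(window)
```

Each window would then be converted to numbers (for example, one indicator per amino acid type per position) and passed to the classifier, which outputs the state of the central residue only.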
However, the powerful combination of a large number of crystallized protein structures for better training and the use of multiple sequence alignments has been of great advantage in recent prediction methods. The latter idea was first exploited by Zvelebil et al. in 1987, and its success was later confirmed by Levin et al. and Rost and Sander. As a consequence, nowadays all state-of-the-art methods (including those described below) use multiple alignment information to better incorporate the evolutionary signals of residue and secondary structure conservation.
Another crucial development in the field concerns the use of neural networks for secondary structure prediction. The earliest published method appeared when neural network computing was in its infancy, only 2 years after the initial publication by Rumelhart et al. After the first successful neural network implementation by Qian and Sejnowski, prediction algorithms based on other computational formalisms were developed, the most important of which include k-nearest-neighbour approaches, hidden Markov model (HMM) methods, and consensus approaches. Although each of these techniques has its own distinct advantages, over the past several years neural nets have turned out to be the most successful. As a result, alternative approaches have also converged on neural nets, for instance by merging them into their original strategy.
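One simple way to picture "multiple alignment information" is as a per-column residue frequency profile. The sketch below builds such a profile from an alignment given as equal-length strings with `-` for gaps. The function name `column_profile` and the plain-frequency scheme are assumptions for illustration; production methods typically use weighted, position-specific scoring matrices rather than raw frequencies.

```python
from collections import Counter


def column_profile(alignment: list[str]) -> list[dict[str, float]]:
    """Return, for each alignment column, the relative frequency of each residue.

    Gap characters ('-') are ignored; an all-gap column yields an empty dict.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({r: c / total for r, c in counts.items()} if total else {})
    return profile


if __name__ == "__main__":
    # Hypothetical three-sequence alignment.
    for position, freqs in enumerate(column_profile(["AC-", "AD-", "GC-"]), start=1):
        print(position, freqs)
```

A predictor would consume these per-position frequency vectors in place of a single residue identity, which is how conservation patterns across a family reach the learning algorithm.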
Methods Used In Secondary Structure Prediction
The early methods suffered from a lack of data. Predictions were performed on single sequences rather than families of homologous sequences, and there were relatively few known 3D structures from which to derive parameters. Probably the most famous early methods are those of Chou & Fasman and the GOR method; Jpred, also described below, is a later neural-network-based server. Although the authors of the early methods originally claimed quite high accuracies (70-80%), under careful examination the methods were shown to be only between 56 and 60% accurate. An early problem in secondary structure prediction was the inclusion of structures used to derive parameters in the set of structures used to assess the accuracy of the method.
Chou & Fasman
The Chou-Fasman algorithm for the prediction of protein secondary structure is one of the most widely used predictive schemes. This is because of its relative simplicity and its reasonably high degree of accuracy. It is based on knowledge of the potential of amino acid residues to form α-helical or β-sheet regions in proteins.
The Chou–Fasman algorithm is a statistical procedure based on assigning conformation potentials to all amino acid residues. Conformation potentials, one for each conformation state, are obtained from statistical analysis of proteins of known secondary structure. The folding mechanism is based on the assumption that nucleation for a particular conformation site starts at the region of maximum conformation potential and continues until a region of low conformation potential is reached.(10)
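The idea of conformation potentials can be sketched in a few lines. The code below is a deliberately simplified illustration, not the full Chou-Fasman procedure: it averages helix and strand potentials over a short window and labels each residue by whichever average is higher and at least 1.0, omitting the nucleation, extension, and overlap-resolution rules. The potentials are values commonly quoted for the 1978 Chou-Fasman parameter set and are included for illustration only; they should be checked against the original tables before any real use. The function names, window size, and threshold are assumptions.

```python
# Helix (alpha) and strand (beta) conformation potentials per residue.
# Illustrative values; verify against the published tables before relying on them.
P_ALPHA = {
    "A": 1.42, "R": 0.98, "N": 0.67, "D": 1.01, "C": 0.70, "Q": 1.11, "E": 1.51,
    "G": 0.57, "H": 1.00, "I": 1.08, "L": 1.21, "K": 1.16, "M": 1.45, "F": 1.13,
    "P": 0.57, "S": 0.77, "T": 0.83, "W": 1.08, "Y": 0.69, "V": 1.06,
}
P_BETA = {
    "A": 0.83, "R": 0.93, "N": 0.89, "D": 0.54, "C": 1.19, "Q": 1.10, "E": 0.37,
    "G": 0.75, "H": 0.87, "I": 1.60, "L": 1.30, "K": 0.74, "M": 1.05, "F": 1.38,
    "P": 0.55, "S": 0.75, "T": 1.19, "W": 1.37, "Y": 1.47, "V": 1.70,
}


def window_mean(values: list[float], i: int, half: int) -> float:
    """Mean of values in the window centred on i, truncated at the sequence ends."""
    lo, hi = max(0, i - half), min(len(values), i + half + 1)
    return sum(values[lo:hi]) / (hi - lo)


def predict_ss(seq: str, half: int = 2, threshold: float = 1.0) -> str:
    """Label each residue H, E, or C from windowed average potentials.

    Raises KeyError for characters outside the 20 standard amino acids.
    """
    seq = seq.upper()
    pa = [P_ALPHA[aa] for aa in seq]
    pb = [P_BETA[aa] for aa in seq]
    labels = []
    for i in range(len(seq)):
        a, b = window_mean(pa, i, half), window_mean(pb, i, half)
        if a >= threshold and a >= b:
            labels.append("H")
        elif b >= threshold and b > a:
            labels.append("E")
        else:
            labels.append("C")
    return "".join(labels)


if __name__ == "__main__":
    print(predict_ss("EEEEEEEEGGGGVVVVVVVV"))
```

The full algorithm instead searches for short nucleation sites of high potential and extends them in both directions until the local potential drops, which is the step this sketch leaves out.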
JPRED
Jpred is a web server that takes a protein sequence or a multiple alignment of protein sequences and from these predicts secondary structure using a neural network called Jnet. The prediction assigns each residue to one of three secondary structure states: alpha helix, beta sheet, or random coil. (11)
Jpred (http://www.compbio.dundee.ac.uk/jpred) is a secondary structure prediction server powered by the Jnet algorithm. Jpred performs over 1000 predictions per week for users in more than 50 countries. The recently updated Jnet algorithm provides a three-state (alpha-helix, beta-strand and coil) prediction of secondary structure at an accuracy of 81.5%. Given either a single protein sequence or a multiple sequence alignment, Jpred derives alignment profiles from which predictions of secondary structure and solvent accessibility are made (12). The predictions are presented as coloured HTML, plain text, PostScript, PDF and via the Jalview alignment editor to allow flexibility in viewing and applying the data. The new Jpred 3 server includes significant usability improvements that include clearer feedback of the progress or failure of submitted requests. Functional improvements include batch submission of sequences, summary results via email and updates to the search databases. A new software pipeline will enable Jnet/Jpred to continue to be updated in sync with major updates to SCOP and UniProt and so ensures that Jpred 3 will maintain high-accuracy predictions.(11)
GOR Method
The GOR method of protein secondary structure prediction was originally published by Garnier, Osguthorpe, and Robson in 1978 and was one of the first successful methods to predict protein secondary structure from amino acid sequence. It is one of the most popular secondary structure prediction schemes. Through the successive incorporation of observed frequencies of single residues, and then of pairs of residues, within a local sequence window of 17 residues, the accuracy of the GOR method has improved from about 55% to 64.4%. (13)
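The single-residue stage of this idea can be sketched as follows: for each position, every residue in the surrounding 17-residue window contributes a log-odds "information" value toward each state, and the state with the highest sum is chosen. This is a toy illustration under stated assumptions, not the published GOR parameters: the information values are estimated from whatever labelled examples are passed to `train`, with a pseudocount, and the function names and the tiny training set are hypothetical.

```python
import math
from collections import Counter

STATES = "HEC"
HALF = 8  # 8 neighbours on each side plus the centre gives the 17-residue window


def train(examples: list[tuple[str, str]], pseudo: float = 1.0):
    """Count how often residue r at offset m accompanies centre state s."""
    state_counts = Counter()
    triple_counts = Counter()  # keys are (state, offset, residue)
    for seq, states in examples:
        for j, s in enumerate(states):
            state_counts[s] += 1
            for m in range(-HALF, HALF + 1):
                if 0 <= j + m < len(seq):
                    triple_counts[(s, m, seq[j + m])] += 1
    return state_counts, triple_counts, pseudo


def info(model, s: str, m: int, r: str) -> float:
    """Log-odds of state s given residue r at offset m, relative to the prior of s."""
    state_counts, triple_counts, pseudo = model
    n = sum(state_counts.values())
    prior = (state_counts[s] + pseudo) / (n + len(STATES) * pseudo)
    total = sum(triple_counts[(t, m, r)] for t in STATES)
    cond = (triple_counts[(s, m, r)] + pseudo) / (total + len(STATES) * pseudo)
    return math.log(cond / prior)


def predict(model, seq: str) -> str:
    """Assign each residue the state with the largest summed information."""
    labels = []
    for j in range(len(seq)):
        scores = {
            s: sum(
                info(model, s, m, seq[j + m])
                for m in range(-HALF, HALF + 1)
                if 0 <= j + m < len(seq)
            )
            for s in STATES
        }
        labels.append(max(STATES, key=scores.get))
    return "".join(labels)


if __name__ == "__main__":
    # Hypothetical toy training data; real parameters come from solved structures.
    toy = [("AAAAAAAAAA", "HHHHHHHHHH"),
           ("VVVVVVVVVV", "EEEEEEEEEE"),
           ("GGGGGGGGGG", "CCCCCCCCCC")]
    print(predict(train(toy), "AAAAAVVVVV"))
```

Later versions of the method add terms for pairs of residues within the window, which is where the accuracy gains described above come from; the sketch stops at single-residue terms.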
Prediction of secondary structure provides a probabilistic template for identifying weakly homologous domains in the data base of known protein conformation. A domain or protein of known conformation is sought that has a similar pattern of secondary structure to that predicted. Secondly, the prediction of epitopic sites provides artificial peptides to serve as a basis for vaccines or diagnostics or biochemical probes. The “classic” Hopp-Woods procedure emphasizes the average polar character of segments of sequence, the most polar being the most promising as a sequence to be synthesized, linked to an immunogenic carrier, and used to raise antibodies.(14)
Moreover, methods based primarily on bit pattern recognition and secondary structure prediction have been developed that allow for this. They have turned out to be remarkably successful in studies to be reported elsewhere, yet the epitopes involved were not always highly polar. Finally, circular dichroism data are used to aid predictions, or rather predictions are used to assign the locations of secondary structures once the secondary structure content of the protein has been established experimentally (Garnier et al., 1978).
PSIPRED
PSI-BLAST based secondary structure prediction (PSIPRED) is a method used to investigate protein structure. It uses artificial neural network machine learning methods in its algorithm. It is a server-side program, with a website serving as a front-end interface, that can predict a protein's secondary structure (beta sheets, alpha helices and coils) from the primary sequence. PSIPRED is available as a web service and as software. The software is distributed as source code and is technically licensed as proprietary software: the license allows modification but enforces freeware provisions by forbidding for-profit distribution of the software and its results. (2)
The GOR and Chou-Fasman tools are convenient because they provide a summary of the prediction results. In both tools, the results are displayed as charts and a sequence annotation diagram, so it is easy for users to observe the predicted patterns of protein secondary structure; the tools also report the total number of residues in each state and their percentages. PSIPRED, by contrast, displays its results as a plot and a cartoon but does not provide the residue totals or their percentages.
All of these tools report results in some detail, with sequence annotation charts and diagrams alongside residue-level information. Among them, PSIPRED is the most fully featured, because its server provides a highly accurate secondary structure prediction method together with MEMSAT 2, a new version of a widely used transmembrane topology prediction method, and GenTHREADER, a sequence-profile-based fold recognition method.
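Where a tool does not report residue totals and percentages, they are easy to derive from the predicted state string. The sketch below assumes the prediction has already been obtained as a string over H, E, and C; the function name `state_summary` is hypothetical and the code does not read any tool's native output format.

```python
from collections import Counter


def state_summary(states: str) -> dict[str, tuple[int, float]]:
    """Return (count, percentage) for each of the three states H, E, and C."""
    counts = Counter(states)
    total = len(states)
    return {
        s: (counts[s], 100.0 * counts[s] / total if total else 0.0)
        for s in "HEC"
    }


if __name__ == "__main__":
    # Hypothetical 10-residue prediction.
    for state, (count, percent) in state_summary("HHHHEECCCC").items():
        print(f"{state}: {count} residues ({percent:.1f}%)")
```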
Involvement Of Bioinformatics In Protein Secondary Structure Prediction
Internet Computing Systems for Protein Structure Prediction
Access to a computing system designed for bioinformatics analysis is required. For example, a user account on a computer system running a Unix-flavored operating system (e.g., Solaris, Linux, or IRIX), sufficient memory, disk space, and applications (including an editor, a multiple sequence alignment program, a sequence similarity search program, and access to up-to-date biological sequence, structure, and bibliographic databanks) are required. Such facilities are available to registered users of the UK Human Genome Mapping Project Resource Centre (HGMP-RC) Bioinformatics facilities. An internet connection and use of a Web browser such as Netscape or Internet Explorer are needed. Search engines such as Google or Yahoo enable keyword searches to find web sites hosting bioinformatics programs and servers. Various databanks and analysis tools are available for protein database searches and secondary and tertiary structure prediction via the internet.(9)
Local Computing Systems for Protein Structure Prediction
Rasmol is a macromolecule viewer; the correct MIME types and helper applications need to be set in the browser's preferences to view structures. Rasmol was not designed to manipulate atomic stereochemistry. Software such as Composer, Modeller, WhatIf, SwissModel, and Naomi is of value in modeling protein structure to atomic detail.
In addition to building protein models, software packages are required to interactively visualize and monitor the building process. Commercial packages for molecular modeling, developed by Accelrys and Tripos, can provide these facilities. These commercial products have extensive graphical user interfaces and have been developed with emphasis on ease of use and project management and continuity. WhatIf also provides a continuing research environment for protein structure predictions and is one example of a noncommercial package.(15)
If you are affiliated with a nonprofit academic institute, access to many computational resources will be available free or at a lower cost than for commercial organizations. Having built a protein structure, you may like to investigate complex molecular recognition processes such as protein interaction networks and ligand-receptor binding, with the aim of designing therapeutics. If this is the case, control over visualizing complex macromolecules and building them accurately to atomic detail is important.
If you are interested in modeling reliable structures of proteins to atomic detail, then access to a bespoke molecular modeling computing system is required. This will provide modules to perform macromolecular editing and high-resolution interactive viewing capability and energy minimization facilities. The commercial software packages (SYBYL Tripos, Discover, and InsightII Accelrys) and the WhatIf suite provide such modules. Such programs need to be installed and maintained on local computer systems that can include (networked) Silicon Graphics workstations, plus a local copy of the PDB.(5)
Examine Structural and Local Environments in Protein Models by Using Joy
The program Joy annotates protein sequences with 3D structural features. Joy was designed to investigate properties of structural and local environments and conservation of amino acids in protein families. For instance, a solvent inaccessible side-chain hydrogen-bonded to a main-chain amide plays an important role in stabilizing the 3D structure and is well conserved in families of protein structure.
In a Joy display, this type of hydrogen bond is shown as a bold-faced letter in a formatted alignment. Also, solvent-inaccessible residues are represented as upper-case letters. Structural features like these are highly conserved in families of protein structures and should be monitored in model building. (Edwards & Cottage, 2003)
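The letter-case convention described above can be mimicked with a few lines of code. This sketch assumes that solvent accessibility has already been computed elsewhere and is supplied as one boolean per residue; the function name `case_by_accessibility` is hypothetical, and the code reproduces only the upper/lower-case convention, not Joy's actual output format or its bold-face annotations.

```python
def case_by_accessibility(sequence: str, accessible: list[bool]) -> str:
    """Render solvent-inaccessible residues in upper case and accessible ones in lower case."""
    if len(sequence) != len(accessible):
        raise ValueError("need exactly one accessibility flag per residue")
    return "".join(
        aa.lower() if is_accessible else aa.upper()
        for aa, is_accessible in zip(sequence, accessible)
    )


if __name__ == "__main__":
    # Hypothetical flags: the two central residues are buried.
    print(case_by_accessibility("ACDE", [True, False, False, True]))
```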
Conclusion
Present methods still cannot fully reveal the relationship between sequence and specific conformation. Prediction accuracy remains lower than desired, especially when a one-dimensional sequence is used to predict three-dimensional structure directly. Protein secondary structure prediction has therefore become a major issue in bioinformatics, and more reliable prediction methods are urgently needed.
References
- 1. Reeb J, Rost B. Secondary structure prediction. Encycl Bioinforma Comput Biol ABC Bioinforma. 2018 Jan 1;1–3:488–96.
- 2. Jiang Q, Jin X, Lee SJ, Yao S. Protein secondary structure prediction: A survey of the state of the art. J Mol Graph Model. 2017 Sep 1;76:379–402.
- 3. Zhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol [Internet]. 2008 Jun [cited 2021 Oct 21];18(3):342–8. Available from: https://pubmed.ncbi.nlm.nih.gov/18436442/
- 4. Wang Y, Mao H, Yi Z. Protein secondary structure prediction by using deep learning method. Knowledge-Based Syst. 2017 Feb 15;118:115–23.
- 5. Edwards YJK, Cottage A. Bioinformatics methods to predict protein structure and function. A practical approach. Mol Biotechnol [Internet]. 2003 Feb [cited 2021 Nov 15];23(2):139–66. Available from: https://pubmed.ncbi.nlm.nih.gov/12632698/
- 6. Smolarczyk T, Roterman-Konieczna I, Stapor K. Protein Secondary Structure Prediction: A Review of Progress and Directions. Curr Bioinform. 2019 Oct 17;15(2):90–107.
- 7. Tomii K, Akiyama Y. FORTE: A profile-profile comparison tool for protein fold recognition. Bioinformatics. 2004 Mar 1;20(4):594–5.
- 8. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999 Sep 17;292(2):195–202.
- 9. MacCarthy E, Perry D, KC DB. Advances in Protein Super-Secondary Structure Prediction and Application to Protein Structure Prediction. Methods Mol Biol [Internet]. 2019 [cited 2021 Oct 14];1958:15–45. Available from: https://link.springer.com/protocol/10.1007/978-1-4939-9161-7_2
- 10. Prevelige P, Fasman GD. Chou-Fasman Prediction of the Secondary Structure of Proteins. Predict Protein Struct Princ Protein Conform [Internet]. 1989 [cited 2021 Oct 27];391–416. Available from: https://link.springer.com/chapter/10.1007/978-1-4613-1571-1_9
- 11. Cole C, Barber JD, Barton GJ. The Jpred 3 secondary structure prediction server. Nucleic Acids Res [Internet]. 2008 [cited 2021 Oct 29];36(Web Server issue). Available from: https://pubmed.ncbi.nlm.nih.gov/18463136/
- 12. Drozdetskiy A, Cole C, Procter J, Barton GJ. JPred4: a protein secondary structure prediction server. Nucleic Acids Res [Internet]. 2015 Jul 1 [cited 2021 Nov 16];43(W1):W389–94. Available from: https://academic.oup.com/nar/article/43/W1/W389/2467870
- 13. Edwards YJK, Cottage A. Bioinformatics methods to predict protein structure and function: A practical approach. Appl Biochem Biotechnol – Part B Mol Biotechnol. 2003 Feb;23(2):139–66.
- 14. Garnier J, Robson B. The GOR Method for Predicting Secondary Structures in Proteins. Predict Protein Struct Princ Protein Conform [Internet]. 1989 [cited 2021 Oct 27];417–65. Available from: https://link.springer.com/chapter/10.1007/978-1-4613-1571-1_10
- 15. Finn R, Griffiths-Jones S, Bateman A. Identifying Protein Domains with the Pfam Database. Curr Protoc Bioinforma. 2003 Mar;1(1):2.5.1-2.5.19.
- Buchan, D. W. A., & Jones, D. T. (2019). The PSIPRED Protein Analysis Workbench: 20 years on. Nucleic Acids Research, 47(W1), W402–W407. https://doi.org/10.1093/nar/gkz297
- McGuffin, L. J., Bryson, K., & Jones, D. T. (2000). The PSIPRED protein structure prediction server. Bioinformatics, 16(4), 404–405. https://doi.org/10.1093/bioinformatics/16.4.404
- Protein Secondary Structure Prediction-Tips to improve prediction – Omics tutorials. (n.d.). Retrieved March 16, 2021, from https://omicstutorials.com/protein-secondary-structure-prediction-tips-to-improve-prediction/
- PSIPRED Workbench. (n.d.). Retrieved March 17, 2021, from http://bioinf.cs.ucl.ac.uk/psipred/
- UCL-CS Bioinformatics: PSIPRED Help. (n.d.). Retrieved March 17, 2021, from http://bioinf.cs.ucl.ac.uk/web_servers/psipred_server/psipred_help/
- UCL-CS Bioinformatics: PSIPRED overview. (n.d.). Retrieved March 17, 2021, from http://bioinf.cs.ucl.ac.uk/index.php?id=779