
In silico Protein Physicochemical Characterization: Tutorial

July 9, 2019 By admin

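Theory behind physicochemical characterization and its importance
Physicochemical characterization describes a protein through properties such as its isoelectric point, molecular weight, extinction coefficient, stability, and hydropathy. Knowing these values is important for identifying a specific protein and for planning how to separate, purify, quantify, and store it.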

Isoelectric point
Background theory:
The isoelectric point (pI) of a protein is the pH at which its net charge is zero. Isoelectric focusing (IEF) is a separation technique that separates proteins and peptides according to their pI, that is, according to the balance of acidic and basic residues they contain. A gel with a pH gradient is used as the medium. The pH gradient is made by adding polyampholytes (multi-charged polymers) with different pI values to the gel. The sample is then loaded onto the gel and a voltage is applied. Each protein migrates along the gel until it reaches the position where the pH equals its pI; there its net charge is zero, so it stops moving. A protein band that forms at a given pH can then be excised and analyzed further. This process can separate proteins that differ in net charge by one charge unit or more.

Isoelectric point (pI): the pH at which the net charge on the protein is zero. A protein rich in basic amino acids has a high pI, while an acidic protein has a lower pI. Isoelectric focusing is the first step of two-dimensional gel electrophoresis, in which proteins are first separated by their pI and then further separated by molecular weight using SDS-PAGE.
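The pI can also be estimated directly from the sequence. Below is a minimal Python sketch that models each ionizable group with the Henderson-Hasselbalch equation and uses bisection to find the pH at which the net charge is zero. The pKa values and the function names (`net_charge`, `isoelectric_point`) are illustrative assumptions; ProtParam uses its own published pK set, so its numbers will differ somewhat.

```python
# Illustrative pKa values (approximate textbook-style values). These are an
# assumption for this sketch and NOT the pK set ProtParam uses, so the pI
# values returned will differ slightly from ProtParam's output.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a sequence at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1  # one free amino terminus
    counts["Cterm"] = 1  # one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: the pI lies at a higher pH
        else:
            high = mid  # negatively charged: the pI lies at a lower pH
    return (low + high) / 2


print(round(isoelectric_point("KKKKRRK"), 2))   # basic peptide, high pI
print(round(isoelectric_point("DDDEEEDE"), 2))  # acidic peptide, low pI
```

Because the net charge decreases steadily as the pH rises, bisection converges to the single zero crossing between pH 0 and 14.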


Molecular weight
Background theory:
Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is a gel electrophoresis method that separates proteins based on their mass. Sodium dodecyl sulfate (SDS) is an anionic detergent that denatures proteins and coats them with negative charge roughly in proportion to their length, so that migration depends mainly on size rather than on native charge or shape. The proteins are dissolved in SDS and then subjected to electrophoresis. Small molecules move through the gel quickly, while larger molecules move more slowly and form bands closer to the top of the gel. Gel filtration (size-exclusion chromatography) can also be used to separate proteins by molecular weight at any point in a protein purification.


Figure: Example of protein separation based on molecular weight

Image adapted from: Gull, I., Samra, Z. Q., Aslam, M. S., & Athar, M. A. (2013). Heterologous expression, immunochemical and computational analysis of recombinant human interferon alpha 2b. SpringerPlus, 2(1), 264.
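Unlike SDS-PAGE, which estimates size from gel migration, sequence-based tools compute the molecular weight by summing the average masses of the residues and adding one water molecule for the free termini. Here is a minimal Python sketch of that arithmetic, assuming an unmodified linear chain of the 20 standard amino acids (the name `molecular_weight` is hypothetical):

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # average mass of H2O, added once for the free termini


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    seq = seq.upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("GG"), 2))  # glycylglycine
```

Post-translational modifications, disulfide bonds, and non-standard residues are ignored; an unknown letter raises `KeyError`.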

Extinction coefficients
Background theory:
The extinction coefficient indicates how much light a protein absorbs at a given wavelength. An estimate of this coefficient is useful for following a protein with a spectrophotometer during purification.

The theoretical extinction coefficient is calculated from the numbers of tryptophan, tyrosine, and cystine (disulfide-bonded cysteine) residues. Values can be computed at several wavelengths (276, 278, 279, 280, and 282 nm); 280 nm is favored because proteins absorb strongly there. The computed EC values help in the quantitative study of protein-protein and protein-ligand interactions in solution.

Instability index (II)
Background theory
The II provides an estimate of the stability of a protein in a test tube. A protein whose II is smaller than 40 is predicted to be stable, while a value above 40 predicts that the protein may be unstable. The instability index can therefore help when choosing storage conditions such as the solvent. For example, the insulin monomer (instability index = 43.05) is well known to be unstable and tends to aggregate macroscopically in aqueous solution during storage, causing loss of hormone biological activity, which is a major obstacle to developing long-term delivery formulations.

Aliphatic index
Background theory
The aliphatic index (AI) is a parameter for estimating the thermal stability of a protein. It is directly related to the mole fraction of aliphatic side chains (alanine, isoleucine, leucine, and valine) in the protein: a higher AI is associated with greater thermostability.

GRAVY
Background theory
The GRAVY value for a protein or a peptide is calculated by adding the hydropathy values (Kyte and Doolittle, 1982) of all its amino acid residues and dividing by the number of residues in the sequence (the sequence length). The GRAVY index is used as an indicator of solubility: increasingly positive scores indicate greater hydrophobicity, while a low (negative) GRAVY value suggests better interaction between the protein and water. Low values are therefore indicators of hydrophilic, soluble behavior.

Half-life
Background theory
The half-life is a prediction of the time it takes for half the amount of protein in a cell to disappear after its synthesis. ProtParam relies on the “N-end rule”, which relates the half-life of a protein to the identity of its N-terminal residue; the prediction is given for three model organisms (human, yeast, and Escherichia coli). The N-end rule originated from the observation that the identity of the N-terminal residue of a protein plays an important role in determining its stability in vivo. The rule was established from experiments that explored the metabolic fate of artificial beta-galactosidase proteins with different N-terminal amino acids engineered by site-directed mutagenesis.

Computational prediction of physicochemical characteristics of proteins
A protein's amino acid sequence contains the information needed to compute properties such as the isoelectric point (pI), molecular weight (Mw), extinction coefficient (Ec), instability index (II), aliphatic index (AI), and grand average of hydropathicity (GRAVY). Because these parameters are essential for studying a protein's physicochemical properties, they can be computed with ExPASy's ProtParam tool.

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence.

Input: Amino acid sequence
The protein can either be specified as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty) the complete sequence is analyzed.

How is it computed?
All of these physicochemical characteristics are computed from the amino acid composition of the sequence and, in the case of the instability index, from its dipeptide composition. Each calculation combines this composition with theoretical values or with parameters derived from experimental measurements. The explanations below show how each value is computed.

Extinction coefficients
It is possible to estimate the molar extinction coefficient of a protein from its amino acid composition. ProtParam calculates protein extinction coefficients using the Edelhoch method, but with the extinction coefficients for Trp and Tyr determined by Pace et al.
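Following the formula given in the ProtParam documentation, the extinction coefficient at 280 nm is a weighted count of absorbing groups: E = n(Trp) × 5500 + n(Tyr) × 1490 + n(cystine) × 125, in M^-1 cm^-1. ProtParam reports one value assuming all Cys pairs form cystines and one assuming all Cys residues are reduced. A minimal Python sketch of that calculation (the function name is illustrative):

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1) of the absorbing groups.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded Cys pair, not per free Cys


def extinction_coefficient_280(seq: str, all_cys_form_cystines: bool = True) -> int:
    """Estimated molar extinction coefficient at 280 nm from composition."""
    seq = seq.upper()
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    # Either pair up every two Cys into one cystine, or treat all Cys as
    # reduced (free Cys contribute nothing in this model).
    n_cystine = seq.count("C") // 2 if all_cys_form_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


print(extinction_coefficient_280("WYCC"))
print(extinction_coefficient_280("WYCC", all_cys_form_cystines=False))
```

A protein with no Trp, Tyr, or Cys gets a value of zero, which is why ProtParam-style estimates are unreliable for such sequences.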

Half-life
ProtParam applies the N-end rule described above: it looks up the N-terminal residue of the sequence and reports the corresponding estimated half-life for three model systems, mammalian reticulocytes (in vitro), yeast (in vivo), and E. coli (in vivo).
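Because the estimate depends only on the first residue, the calculation reduces to a table lookup. The Python sketch below shows the shape of that lookup. The table is deliberately incomplete and holds only the Met entry (the usual N-terminus of a newly synthesized protein); the other residues would have to be copied from the ProtParam documentation. The names `N_END_HALF_LIFE` and `estimated_half_life` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical, deliberately incomplete lookup table: only the Met entry is
# filled in. The other 19 residues must be added from the ProtParam docs.
N_END_HALF_LIFE: Dict[str, Dict[str, str]] = {
    "M": {
        "mammalian reticulocytes (in vitro)": "30 hours",
        "yeast (in vivo)": ">20 hours",
        "E. coli (in vivo)": ">10 hours",
    },
}


def estimated_half_life(seq: str) -> Optional[Dict[str, str]]:
    """Return N-end-rule half-life estimates for the sequence's first residue."""
    if not seq:
        return None
    # Only the N-terminal residue matters; the rest of the sequence is ignored.
    return N_END_HALF_LIFE.get(seq[0].upper())


print(estimated_half_life("MKVLAAGIV"))
```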

Instability index
Statistical analysis of 12 unstable and 32 stable proteins has revealed that there are certain dipeptides whose occurrence is significantly different in unstable proteins compared with stable ones. The authors of this method assigned a weight value of instability to each of the 400 different dipeptides (DIWV). Using these weight values it is possible to compute an instability index (II).

The instability index is defined as:

$$\mathrm{II} = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}(x_i x_{i+1})$$

where L is the length of the sequence and DIWV(x_i x_{i+1}) is the instability weight value for the dipeptide starting at position i.
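The formula translates directly into a sliding window over consecutive residue pairs. In this minimal Python sketch the published 400-entry DIWV table is not reproduced: `toy_diwv` is a hypothetical placeholder that weights every dipeptide 1.0 purely to show the arithmetic, and the threshold of 40 follows the stability cut-off described earlier. Function names are illustrative.

```python
from itertools import product
from typing import Mapping


def instability_index(seq: str, diwv: Mapping[str, float]) -> float:
    """II = (10 / L) * sum of DIWV over all L - 1 consecutive dipeptides."""
    seq = seq.upper()
    length = len(seq)
    total = sum(diwv[seq[i:i + 2]] for i in range(length - 1))
    return 10.0 / length * total


def is_predicted_stable(ii: float) -> bool:
    """Proteins with II below 40 are predicted to be stable."""
    return ii < 40.0


# Placeholder table (hypothetical): every dipeptide weighted 1.0. Replace it
# with the 400 published DIWV values before using real sequences.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
toy_diwv = {a + b: 1.0 for a, b in product(AMINO_ACIDS, repeat=2)}

print(instability_index("ACDE", toy_diwv))  # (10 / 4) * 3 dipeptides * 1.0
```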

Aliphatic index
The aliphatic index of a protein is calculated according to the following formula:

$$\text{Aliphatic index} = X(\mathrm{Ala}) + a \cdot X(\mathrm{Val}) + b \cdot \left( X(\mathrm{Ile}) + X(\mathrm{Leu}) \right)$$

where X(Ala), X(Val), X(Ile), and X(Leu) are the mole percents (100 × mole fraction) of alanine, valine, isoleucine, and leucine. The coefficients a and b are the relative volumes of the valine side chain (a = 2.9) and of the Leu/Ile side chains (b = 3.9) compared with the side chain of alanine.
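A minimal Python sketch of the aliphatic index formula, converting residue counts to mole percent before applying the coefficients a = 2.9 and b = 3.9 (the function name is illustrative):

```python
def aliphatic_index(seq: str, a: float = 2.9, b: float = 3.9) -> float:
    """AI = X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)), with X in mole percent."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        # 100 x mole fraction of one residue type in the sequence
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


print(aliphatic_index("AVIL"))
```

For "AVIL" each residue is 25 mole percent, so AI = 25 + 2.9 × 25 + 3.9 × 50 = 292.5 (the printed float may differ in the last digits due to rounding).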

GRAVY (Grand Average of Hydropathy)
The GRAVY value for a peptide or protein is calculated as the sum of hydropathy values of all the amino acids, divided by the number of residues in the sequence.
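The GRAVY calculation is a simple average over the Kyte-Doolittle hydropathy scale. A minimal Python sketch, assuming the sequence contains only the 20 standard one-letter codes (the function name is illustrative):

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(gravy("KD"))
```

For example, the dipeptide KD gives (-3.9 + -3.5) / 2 = -3.7, a strongly hydrophilic score.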

Case study
The following text is extracted from a journal article to serve as a case-study example: Kaur, G., & Pati, P. K. (2018). In silico physicochemical characterization and topology analysis of respiratory burst oxidase homolog (Rboh) proteins from Arabidopsis and rice. Bioinformation, 14(3), 93.

Aim of the study:
In this study, a total of 19 Rboh proteins (10 from Arabidopsis thaliana and 9 from Oryza sativa Japonica) were analyzed. The authors employed in silico approaches to compute physicochemical properties (molecular weight, isoelectric point, total number of negatively and positively charged residues, extinction coefficient, half-life, instability and aliphatic indices, grand average of hydropathicity, and amino acid percentage).

Methods
1. Go to ExPASy ProtParam: https://web.expasy.org/protparam/
2. Enter each UniProt accession number (or raw sequence) and click the compute button.


Results

Figure: Computed physicochemical properties of the Rboh proteins

Discussion:
It was observed that the calculated pI was > 7 for all 19 Rbohs, which indicates their basic nature. The basic nature and large size of these transmembrane proteins are consistent with the previous report inferring that membrane proteins are heavier and more basic than non-membrane proteins in bacteria, archaea, and eukaryotes.

Also, for the purification of a protein by isoelectric focusing methods, the pI value is useful for developing a buffer system. In addition to pI, the instability index (II) provides an estimate of the stability of the protein in vitro and in vivo. An instability index below 40 indicates a stable protein, and a value above 40 indicates an unstable protein. The lowest instability index was observed for AtRbohD, indicating its stability. Another measure of protein stability is the aliphatic index (AI); an increase in its value is reported to enhance the thermostability of globular proteins [25]. AI refers to the relative volume occupied by the aliphatic side chains of alanine (A), isoleucine (I), leucine (L), and valine (V). The lowest AI, found for OsRbohF, is indicative of its low thermal stability and hence of a more flexible structure compared with the other Rbohs. In addition, the extinction coefficients of the Rbohs were computed at 280 nm. The calculated ECs of Rbohs indicated the presence of high concentrations of tyrosine (Y) and tryptophan (W), and not of cysteine (C), which was observed in very low amounts in all Rbohs. This indicated that UV spectral methods couldn’t be employed to analyze Rbohs. However, the obtained EC values will aid in the study of protein-protein and protein-ligand interactions.

The GRAVY score is the sum of the hydropathy values of all amino acids in the protein divided by the number of residues. It lies in the range from -2 to +2, where a positive value represents a hydrophobic protein and a negative value a hydrophilic one [27]. It is also an indicator of whether a protein would be observed on 2-D gels, as proteins with GRAVY scores >0.4 do not lie in the solubility range and are hence difficult to detect [28]. In the case of Rbohs, the GRAVY score exhibited a very narrow range (-0.087 to -0.286), with negative values indicating low hydrophobicity and hence good solubility.

References (from the ExPASy documentation, for further reading)
[1] Pace, C.N., Vajdos, F., Fee, L., Grimsley, G., and Gray, T. (1995) How to measure and predict the molar absorption coefficient of a protein. Protein Sci. 4, 2411-2423. [PubMed: 8563639]

[2] Edelhoch, H. (1967) Spectroscopic determination of tryptophan and tyrosine in proteins. Biochemistry 6, 1948-1954. [PubMed: 6049437]

[3] Gill, S.C. and von Hippel, P.H. (1989) Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem. 182, 319-326. [PubMed: 2610349]

[4] Bachmair, A., Finley, D. and Varshavsky, A. (1986) In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186. [PubMed: 3018930]

[5] Gonda, D.K., Bachmair, A., Wunning, I., Tobias, J.W., Lane, W.S. and Varshavsky, A. (1989) Universality and structure of the N-end rule. J. Biol. Chem. 264, 16700-16712. [PubMed: 2506181]

[6] Tobias, J.W., Shrader, T.E., Rocap, G. and Varshavsky, A. (1991) The N-end rule in bacteria. Science 254, 1374-1377. [PubMed: 1962196]

[7] Ciechanover, A. and Schwartz, A.L. (1989) How are substrates recognized by the ubiquitin-mediated proteolytic system? Trends Biochem. Sci. 14, 483-488. [PubMed: 2696178]

[8] Varshavsky, A. (1997) The N-end rule pathway of protein degradation. Genes Cells 2, 13-28. [PubMed: 9112437]

[9] Guruprasad, K., Reddy, B.V.B. and Pandit, M.W. (1990) Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 4, 155-161. [PubMed: 2075190]

[10] Ikai, A.J. (1980) Thermostability and aliphatic index of globular proteins. J. Biochem. 88, 1895-1898. [PubMed: 7462208]

[11] Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132. [PubMed: 7108955]
