Amino acid and Pseudo-amino acid composition

July 11, 2019, by admin

Amino acids are the building blocks of proteins. Each amino acid contains a basic amino group (-NH2) and an acidic carboxyl group (-COOH). Peptides and proteins are chains of amino acids. Twenty naturally occurring amino acids take part in building proteins, and their order along the chain defines a protein's primary structure. The function of a protein is in turn determined by its amino acid sequence.

Amino acid structure
Every amino acid contains both a carboxyl group (COOH) and an amino group (NH2); amino acids differ only in their side chains (residual groups). Amino acids are bonded together by a covalent linkage called a peptide bond. In the resulting chain, the end with the free amino group is called the N-terminus and the end with the free carboxyl group is called the C-terminus. The core amino acid structure is:

H2N-CH(R)-COOH

Here R is the side chain that is unique to each amino acid. The peptide bond itself is rigid and planar because of its partial double-bond character; the flexibility of the backbone comes from rotation around the two single bonds on either side of each alpha carbon. Side-chain size also matters: small residues such as glycine leave the backbone more flexible, whereas bulky side chains restrict it. Together these properties allow the primary sequence to fold into its three-dimensional shape through regular secondary structures. An alpha helix is a single stretch of chain coiled into a helix, while a beta sheet is formed by two or more strands lying side by side in a parallel or antiparallel arrangement.

The core of a folded polypeptide is made up largely of hydrophobic amino acids such as phenylalanine, tyrosine and tryptophan. These three amino acids are aromatic and are among the largest amino acids. The hydrophobic amino acids that are not aromatic include proline, valine, isoleucine, leucine and methionine. A series of amino acids joined by peptide bonds forms a polypeptide chain, and each amino acid unit in a peptide is called a residue.

Grouping of amino acids
The common amino acids are grouped according to the chemistry of their side chains into four classes: basic, acidic, uncharged polar, and non-polar.

For basic side chains, the amino acids are: Lysine (K), Arginine (R) and Histidine (H).

For acidic side chains, the amino acids are: Aspartic acid (D) and Glutamic acid (E). These are the protonated forms of aspartate and glutamate.

For uncharged polar side chains, the amino acids are: Asparagine (N), Glutamine (Q), Serine (S), Threonine (T) and Tyrosine (Y).

For non-polar side chains, the amino acids are: Alanine (A), Valine (V), Leucine (L), Isoleucine (I), Proline (P), Phenylalanine (F), Methionine (M), Tryptophan (W), Glycine (G) and Cysteine (C).
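
The grouping above can be written down as a small lookup table. The following is only an illustrative sketch: the names `SIDE_CHAIN_GROUPS`, `RESIDUE_TO_GROUP` and `group_counts` are hypothetical, and the assignment of residues to groups simply follows the lists in this article (other textbooks classify some residues, such as cysteine or tyrosine, differently).

```python
# Side-chain groups as listed in this article (one-letter codes).
SIDE_CHAIN_GROUPS = {
    "basic": "KRH",
    "acidic": "DE",
    "uncharged polar": "NQSTY",
    "non-polar": "AVLIPFMWGC",
}

# Invert the table: residue letter -> group name.
RESIDUE_TO_GROUP = {
    residue: group
    for group, residues in SIDE_CHAIN_GROUPS.items()
    for residue in residues
}


def group_counts(sequence):
    """Count how many residues of a sequence fall in each side-chain group."""
    counts = {group: 0 for group in SIDE_CHAIN_GROUPS}
    for residue in sequence.upper():
        group = RESIDUE_TO_GROUP.get(residue)
        if group is None:
            raise ValueError(f"not one of the 20 standard residues: {residue!r}")
        counts[group] += 1
    return counts


# A short made-up peptide, used only for illustration.
print(group_counts("MKTAYDE"))
```

Counting residues per group in this way gives a coarser, four-component view of a sequence than counting each of the 20 residue types separately.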

Amino acid composition
The amino acid composition is the number of amino acids of each type, normalized by the total number of residues and expressed as a percentage. It is defined as

Comp(i) = (n_i / N) × 100

where i runs over the 20 amino acid types, n_i is the number of residues of type i, counted over all the residues in the protein under consideration, and N is the total number of residues.
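
As a minimal sketch of this calculation, the percentage of each residue type can be computed from a one-letter sequence string. The function name `amino_acid_composition` and the example sequence are made up for illustration, and the code assumes the input contains only the 20 standard one-letter codes.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical order of one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each of the 20 residue types in a sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(sequence)   # n_i for every residue type present
    total = len(sequence)        # N, the total number of residues
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Toy four-residue sequence, chosen only so the arithmetic is easy to check.
composition = amino_acid_composition("AAGK")
print(composition["A"], composition["G"], composition["K"])
```

The result always has 20 entries, one per residue type, and the entries sum to 100 regardless of the length of the protein.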

Why amino acid composition?
With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop automated methods for efficiently identifying their various attributes. The most straightforward way to represent a protein when developing such algorithms is its entire amino acid sequence.

Figure: An example of amino acid composition analysis, showing the amino acid compositional difference between globular and β-barrel membrane proteins.

Pseudo amino acid composition
The entire-sequence model fails, however, when the query protein has no significant homology to proteins of known characteristics. Thus, various non-sequential, or discrete, models were proposed. The simplest discrete model is the amino acid (AA) composition. When it is used to represent a protein, however, all the sequence-order information is lost. To cope with this dilemma, the concept of pseudo amino acid (PseAA) composition was introduced. Its essence is to keep using a discrete model to represent a protein, yet without completely losing its sequence-order information. Therefore, in a broad sense, the PseAA composition of a protein is a set of discrete numbers that is derived from its amino acid sequence, that differs from the classical AA composition, and that is able to harbour some sort of sequence-order or pattern information.

The pseudo amino acid composition is commonly abbreviated PseAAC. In contrast with the conventional amino acid composition (AAC), which contains 20 components, each reflecting the occurrence frequency of one of the 20 native amino acids in a protein, the PseAAC contains more than 20 discrete factors. The first 20 represent the components of the conventional amino acid composition, while the additional factors incorporate some sequence-order information via various pseudo components.

The additional factors are a series of rank-different correlation factors along a protein chain, but they can also be any combination of other factors, as long as they reflect some sort of sequence-order effect. The essence of PseAAC is therefore that it covers the AA composition while also containing information beyond it, and hence can better reflect the features of a protein sequence through a discrete model.
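
To make the idea of correlation factors concrete, here is a hedged sketch in the spirit of the original formulation by Chou (2001). It is not the implementation used by any particular server. Several choices are assumptions made only for illustration: a single physicochemical property (the Kyte-Doolittle hydropathy index) is used, the property is standardised to zero mean and unit standard deviation, the number of correlation tiers `lam` defaults to 3, and the weight factor `weight` defaults to 0.05. The function names are hypothetical.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. Assumption: used here as the only
# physicochemical property; published variants combine several properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardise(values):
    """Rescale property values to zero mean and unit standard deviation."""
    numbers = list(values.values())
    mu, sigma = mean(numbers), pstdev(numbers)
    return {aa: (v - mu) / sigma for aa, v in values.items()}


PROPERTY = standardise(KYTE_DOOLITTLE)


def pseudo_amino_acid_composition(sequence, lam=3, weight=0.05):
    """Return 20 composition components followed by lam sequence-order components."""
    sequence = sequence.upper()
    n = len(sequence)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")
    if set(sequence) - set(AMINO_ACIDS):
        raise ValueError("sequence contains non-standard residues")
    # Conventional composition as fractions (these sum to 1).
    freqs = [sequence.count(aa) / n for aa in AMINO_ACIDS]
    # Rank-k correlation factor: mean squared property difference between
    # residues that are k positions apart along the chain.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(PROPERTY[sequence[i]] - PROPERTY[sequence[i + k]]) ** 2
                 for i in range(n - k)]
        thetas.append(sum(diffs) / (n - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Two toy sequences with identical composition but different residue order.
vec_a = pseudo_amino_acid_composition("AAKK", lam=1)
vec_b = pseudo_amino_acid_composition("AKAK", lam=1)
print(len(vec_a), vec_a[-1] < vec_b[-1])
```

The two toy sequences have the same conventional composition, yet their extra component differs because the neighbouring residues differ. This is the sequence-order information that the plain amino acid composition cannot capture.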

Ever since the concept of PseAAC was introduced, it has been widely used to study various problems in proteins and protein-related systems, such as protein structural class, protein secondary structure content, protein quaternary structure, protein homo-oligomer types, classification of amino acids, protein subcellular localization, protein subnuclear localization, G-protein-coupled receptor (GPCR) type classification, protein submitochondria localization, conotoxin superfamily classification, membrane protein type, transmembrane protein region, apoptosis protein subcellular localization, enzyme functional classification, cell wall lytic enzyme, protein fold pattern, cofactors of oxidoreductases, lipase type, protein-protein interactions, DNA-binding proteins, signal peptide, and other protein-related systems.

How to create amino acid and pseudo amino acid composition

Tool to create amino acid composition
https://webs.iiitd.edu.in/raghava/copid/nterm_std_search.html

Tool to create pseudo amino acid composition
http://www.csbio.sjtu.edu.cn/bioinf/PseAAC/

References
1. Berg, J. M., Tymoczko, J. L., & Stryer, L. (2012). Biochemistry.
2. Heim, M., Römer, L., & Scheibel, T. (2010). Hierarchical structures made of proteins. The complex architecture of spider webs and their constituent silk proteins. Chemical Society Reviews, 39(1), 156-164.
3. Chou, K. C. (2001). Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Structure, Function, and Bioinformatics, 43(3), 246-255.
4. Chou, K. C. (2011). Some remarks on protein attribute prediction and pseudo amino acid composition. Journal of Theoretical Biology, 273(1), 236-247.
5. Chou, K. C. (2009). Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Current Proteomics, 6(4), 262-274.
6. Shen, H. B., & Chou, K. C. (2008). PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition. Analytical Biochemistry, 373(2), 386-388.