Converting DNA to Protein Sequence
August 10, 2021
When a DNA sequence is known, we can use the genetic code to convert it into the corresponding protein sequence. The cell uses the same mechanism to generate proteins, and the process is referred to as DNA-to-protein translation.
How do you convert DNA to protein sequence?
What is the ‘Central Dogma’?
Before answering that, you must grasp the concept of the ‘Central Dogma’, which refers to the process by which DNA instructions are transformed into a functional product.
The central dogma of molecular biology explains the flow of genetic information from DNA to RNA to a functional product, a protein. It holds that DNA contains the information needed to make all of our proteins, and that RNA is a messenger that carries this information to the ribosomes. The ribosomes serve as factories in the cell where the information is ‘translated’ from a code into the functional product. The nucleotide sequence encoding a polypeptide, from its start codon to its stop codon, is referred to as the CoDing Sequence (CDS). Depending on the biological system (eukaryote or prokaryote), the translation mechanism is slightly different, and this needs to be considered when designing gene expression vectors. The process by which the DNA instructions are converted into the functional product is called gene expression.
Transcription and Translation
Gene expression has two key stages – transcription and translation.
In transcription, the information in the DNA of every cell is converted into small, portable RNA messages.
During translation, these messages travel from where the DNA is in the cell nucleus to the ribosomes where they are ‘read’ to make specific proteins.
The information that originally was in the genome, enshrined in DNA, then gets transcribed into messenger RNA. And then that information is translated from the messenger RNA to a protein. So we’re taking the same information, but it’s going from one form to another; a nucleic acid code to an amino acid code in a protein.
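The first step of this information flow can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and real transcription involves promoters, processing and much more than a letter swap.

```python
def transcribe(coding_strand: str) -> str:
    """mRNA for a coding (sense) strand written 5'->3': same letters, U instead of T."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand: str) -> str:
    """mRNA for a template (antisense) strand written 5'->3'.

    The mRNA is complementary to the template, so we complement each base
    (writing U for the partner of A) and reverse to get a 5'->3' mRNA.
    """
    rna_complement = str.maketrans("ACGT", "UGCA")
    return template_strand.upper().translate(rna_complement)[::-1]


print(transcribe("ATGGTGCAC"))           # AUGGUGCAC
print(transcribe_template("GTGCACCAT"))  # AUGGUGCAC
```

Both calls describe the same short stretch of double-stranded DNA, once from the coding strand and once from the template strand, so they give the same message.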
Comparing Eukaryotic and Prokaryotic Translation
By definition, prokaryotes do not possess a subcellular compartment isolating the chromosomal DNA from the cytosol. Therefore, transcription of DNA to mRNA and translation of mRNA to peptide chains can occur simultaneously. Indeed, translation is active on mRNAs that are still being transcribed from the genomic DNA.
Another specificity of prokaryotic genes is that many CDS are grouped in polycistronic operons. This means that several CDS are transcribed in a single mRNA molecule, each of them being preceded by a ribosome binding site (RBS), a sequence located just upstream of the start codon.
The translation process is very similar in prokaryotes and eukaryotes. Although different elongation, initiation, and termination factors are used, the genetic code is generally identical. As previously noted, in bacteria, transcription and translation take place simultaneously, and mRNAs are relatively short-lived. In eukaryotes, however, mRNAs have highly variable half-lives, are subject to modifications, and must exit the nucleus to be translated; these multiple steps offer additional opportunities to regulate levels of protein production, and thereby fine-tune gene expression.
Genetic code
The genetic code is a set of rules defining how the four-letter code of DNA (A, T, G, C) is translated into the 20-letter code of amino acids, which are the building blocks of proteins. The genetic code is a set of three-letter combinations of nucleotides called codons, each of which corresponds to a specific amino acid or stop signal.
There are 64 possible three-letter sequences that can be made from the four nucleotides (4 × 4 × 4 = 64). Of these 64 codons, 61 represent amino acids, and three are stop signals. Although each codon is specific for only one amino acid (or one stop signal), the genetic code is described as degenerate, or redundant, because a single amino acid may be coded for by more than one codon. It is also important to note that the genetic code does not overlap, meaning that each nucleotide is part of only one codon; a single nucleotide cannot be part of two adjacent codons. Furthermore, the genetic code is nearly universal, with only rare variations reported. For instance, mitochondria have an alternative genetic code with slight variations.
The sequence of the bases A, C, G and T in DNA determines our unique genetic information and provides the instructions for producing molecules in the body. The cell reads the DNA code in groups of three bases. Each triplet of bases, also called a codon, specifies which amino acid will be added next during protein synthesis. There are 20 different amino acids, which are the building blocks of proteins. Different proteins are made up of different combinations of amino acids. This gives them their own unique 3D structure and function in the body. Only 61 of the 64 codons are used to specify which of the 20 amino acids is next to be added. The three remaining codons don’t code for an amino acid. These codons mark the end of the protein and stop the addition of amino acids to the end of the protein chain.
The genetic code comprises the instructions in a gene that tell the cell how to make a specific protein. A, C, G, and T are the “letters” of the DNA code; they stand for the chemicals adenine (A), cytosine (C), guanine (G), and thymine (T), respectively, that make up the nucleotide bases of DNA. Each gene’s code combines the four chemicals in various ways to spell out three-letter “words” that specify which amino acid is needed at every step in making a protein. In the genetic code, each three nucleotides in a row count as a triplet and code for a single amino acid. Proteins are made up of sometimes hundreds of amino acids, so the code that would make one protein could have hundreds, sometimes even thousands, of triplets contained in it.
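The numbers quoted above (64 codons, 61 sense codons, 3 stops, 20 amino acids) can be checked with a short Python sketch. It assumes the standard genetic code, written as the usual 64-letter string in T, C, A, G order; the variable names are only for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))          # 64 codons in total
print(stop_codons)               # ['TAA', 'TAG', 'TGA']
print(sum(degeneracy.values()))  # 61 codons that specify an amino acid
print(len(degeneracy))           # 20 amino acids
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])  # 6 1 1
```

The last line illustrates degeneracy: leucine is encoded by six codons, while methionine and tryptophan have only one codon each.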
START and STOP Codons
Translation begins with a START codon. AUG is the most common start codon; it codes for methionine in eukaryotes and for formylmethionine in prokaryotes.
STOP codons signal the end of the polypeptide chain during protein synthesis. Also called nonsense or termination codons, the STOP codons are UAG, UGA, and UAA, named amber, opal, and ochre, respectively. STOP codons trigger the ribosome to release the new polypeptide chain, since no tRNA anticodons complement these codons.
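As a minimal sketch of how START and STOP codons frame a translation, the Python function below reads a DNA sequence from the first ATG to the first in-frame stop codon. It assumes the standard genetic code and an unambiguous sequence (only A, C, G, T); `translate_cds` is a name chosen for this example, not a function of any particular tool.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_cds(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first in-frame stop."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # TAA, TAG or TGA: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ccATGGTGCACTAAgg"))  # MVH
```

Note that the first ATG in a sequence is not necessarily the true start codon; the sketch only shows the mechanics.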
Translation and Open Reading Frame Search
Regions of DNA that encode proteins are first transcribed into messenger RNA and then translated into protein. By examining the DNA sequence alone we can determine the sequence of amino acids that will appear in the final protein. In translation, codons of three nucleotides determine which amino acid will be added next to the growing protein chain. It is therefore important to decide at which nucleotide translation starts and where it stops; the stretch between these two points is called an open reading frame.
Once a gene has been sequenced it is important to determine the correct open reading frame (ORF). Every region of DNA has six possible reading frames, three in each direction. The reading frame that is used determines which amino acids will be encoded by a gene. Typically only one reading frame is used in translating a gene (in eukaryotes), and this is often the longest open reading frame. One common use of open reading frames (ORFs) is as one piece of evidence to assist in gene prediction. Long ORFs are often used, along with other evidence, to initially identify candidate protein-coding regions or functional RNA-coding regions in a DNA sequence. Once the open reading frame is known the DNA sequence can be translated into its corresponding amino acid sequence.
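A simple ORF search on one strand can be sketched as follows. The sketch assumes the standard genetic code, defines an ORF as running from an ATG to the next in-frame stop codon, and scans only the three forward frames; the function name, the toy sequence and the `min_codons` parameter are made up for this example, and real gene finders use far more evidence than ORF length.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def find_orfs(dna, min_codons=1):
    """Yield (frame, start, end, protein) for each ATG...stop ORF on the given strand.

    frame is 1, 2 or 3; start and end are 0-based indices, end exclusive
    and including the stop codon.
    """
    dna = dna.upper()
    for frame in range(3):
        start, protein = None, []
        for i in range(frame, len(dna) - 2, 3):
            amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
            if start is None:
                if amino_acid == "M":          # open a new ORF at ATG
                    start, protein = i, ["M"]
            elif amino_acid == "*":            # close the ORF at the stop codon
                if len(protein) >= min_codons:
                    yield frame + 1, start, i + 3, "".join(protein)
                start = None
            else:
                protein.append(amino_acid)


toy = "CATGAAATAGATGCCCGGGTTTTGA"  # made-up sequence for illustration
orfs = list(find_orfs(toy))
print(orfs)                             # [(2, 1, 10, 'MK'), (2, 10, 25, 'MPGF')]
print(max(orfs, key=lambda o: len(o[3])))  # (2, 10, 25, 'MPGF')
```

To cover all six frames, the same search would be repeated on the reverse complement of the sequence.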
Six-frame translation
Since DNA is interpreted in groups of three nucleotides (codons), a DNA strand has three distinct reading frames. The double helix of a DNA molecule has two anti-parallel strands; with the two strands having three reading frames each, there are six possible frame translations.
For example, the following sequence of DNA can be read in six reading frames: three in the forward and three in the reverse direction. The three reading frames in the forward direction are shown with the translated amino acids below each DNA sequence. Frame 1 starts with the “a”, Frame 2 with the “t” and Frame 3 with the “g”. Stop codons are indicated by an “*” in the protein sequence. The longest ORF is in Frame 1.
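A six-frame translation is easy to sketch in Python. The example below assumes the standard genetic code and uses a short made-up sequence rather than the one from the figure; the frame labels (+1 to +3 for the given strand, -1 to -3 for its reverse complement) are a convention chosen for this sketch.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the opposite strand, written 5'->3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate every complete codon starting at offset; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(offset, len(dna) - 2, 3))


def six_frame(dna):
    """Return a dict mapping frame label to its translation."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq, offset)
    return frames


for label, protein in six_frame("atggtgcactaa").items():
    print(label, protein)
# +1 MVH*
# +2 WCT
# +3 GAL
# -1 LVHH
# -2 *CT
# -3 SAP
```

Here frame +1 starts with the “a”, +2 with the “t” and +3 with the “g”, matching the frame numbering used in the text.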
Webtools for DNA to protein translation
TRANSLATION: DNA -> PROTEIN
SITES: A number of excellent sites exist all of which permit translation in all six reading frames. I would recommend “ORF Finder” because of its visuals and Pipeline or GeneMark if you are seriously interested in identifying genes within your sequence. The latter two programs permit the analysis of long sequences (submit by attachment not in the box).
Frameshift errors:
AMIGene
path :: protein back-translation and alignment – addresses the problem of finding distant protein homologies where the divergence is the result of frameshift mutations and substitutions. Given two input protein sequences, the method implicitly aligns all the possible pairs of DNA sequences that encode them, by manipulating memory-efficient graph representations of the complete set of putative DNA sequences for each protein. (Reference: Gîrdea M et al. 2010. Algorithms for Molecular Biology 5)
Simple translation tools – DNA to protein sequences:
Open Reading Frame Finder (NCBI) – searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF finder to search newly sequenced DNA for potential protein encoding segments, verify predicted protein using newly developed SMART BLAST or regular BLASTP.
Six-frame Translations can be done at Tuebingen, Russia, Bioline, and Science Launcher.
EMBOSS Sixpack (EMBL-EBI) – reads a DNA sequence and outputs the three forward and (optionally) three reverse translations in a visual manner. Alternatively use EMBOSS Transeq
MBS Translator (JustBio Tools) – An excellent new site since one can translate specifically from ATG and the results are presented with the nucleotide sequence overlaying the amino acid sequence. Ideal for Cut/Paste into a manuscript. You need to register to use this free tool. Other quick translation tools are here and here.
Translator (fr33.net, France) or DNA to protein translation
Translate (ExPASy, Switzerland) – is a tool which allows the translation of a nucleotide (DNA/RNA) sequence to a protein sequence.
Transcription and Translation Tool (Attotron Biosensor Corporation)
DNA to protein translation (University of the Basque Country, Spain) and here.
Translation of multiple sequences:
Virtual Ribosome (Reference: R. Wernersson. 2006. Nucl. Acids Res. 34 (web Server Issue): W385-388) – I find that the output from the first two sites is optimal for translating multiple DNA sequences.
RevTrans 1.4 Server (CBS, Danish Technical University)
TranslatorX – is a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. (Reference: Abascal F, et al. (2010) Nucleic Acids Res. 38: W7-13).
Backtranslation i.e. taking a protein sequence and defining it as DNA sequence:
Back Translation – part of The Sequence Manipulation Suite; limited choice of codon usage (E. coli and H. sapiens)
Protein to DNA reverse translation – includes a wide range of genetic codes
Reverse translation of amino acid sequences – probably the best in that it includes the genetic codes of seven organisms (E. coli and 6 eukaryotes), and provides consensus and detailed output of results in RNA or DNA.
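Because the genetic code is degenerate, back-translation has no single answer. The Python sketch below, which assumes the standard genetic code, counts how many DNA sequences encode a given peptide and produces one of them. The codon choice used here (the alphabetically first codon) is an arbitrary placeholder for illustration; the tools listed above pick codons from organism-specific codon usage instead. `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import prod

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Invert the table: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, amino_acid in CODON_TABLE.items():
    CODONS_FOR.setdefault(amino_acid, []).append(codon)


def count_encodings(protein):
    """Number of distinct DNA sequences that translate to this protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein.upper())


def back_translate(protein, choose=min):
    """Return one possible DNA sequence; `choose` picks a codon from each list."""
    return "".join(choose(CODONS_FOR[aa]) for aa in protein.upper())


print(count_encodings("MVH"))  # 8  (1 codon for M x 4 for V x 2 for H)
print(back_translate("MVH"))   # ATGGTACAC
```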
Identification of open-reading frames:
StarORF – facilitates the identification of the protein(s) encoded within a DNA sequence. Using StarORF, the DNA sequence is first transcribed into RNA and then translated into all the potential ORFs (Open Reading Frame) encoded within each of the six translation frames (3 in the forward direction and 3 in the reverse direction). This allows students to identify the translation frame that results in the longest protein coding sequence.
TICO – Translation Initiation site COrrection – provides an interface for direct post-processing of the predictions obtained from GLIMMER to improve the accuracy of annotated Translation Initiation Sites (TIS). (Reference: M. Tech et al. 2005. Bioinformatics 21: 3568-3569)
GeneMark Homepage (M. Borodovsky, Georgia Institute of Technology Atlanta, U.S.A.) offers a family of programs for ORF analysis. This site links one to a growing number of programs for modeling phage, bacterial, and eukaryotic data. Extensive control is possible with the data output, i.e. one can request the nucleotide and protein sequence of the ORFs. Two programs to consider are GeneMarkS (Reference: Besemer J et al. 2001. Nucleic Acids Research; 29:2607-2618) or GeneMarkS-2 and Heuristic Approach for Gene Prediction (Reference: Besemer J & Borodovsky M. 1999. Nucleic Acids Research; 27:3911-3920). For metagenomic analysis use MetaGeneMark (Reference: Zhu, W. et al. 2010. Nucleic Acids Research; 38: e132).
EasyGene (Technical University of Denmark; Reference: T.S. Larsen and A. Krogh. 2003. EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformatics 4:21) – produces a list of predicted genes given a sequence of prokaryotic DNA. Each prediction is attributed with a significance score (R-value) indicating how likely it is to be just a non-coding open reading frame rather than a real gene. The user needs only to specify the organism hosting the query sequence. If you are interested in the analysis of existing bacterial genomes consult EasyGene 1.2.
AMIGene – (Reference: Bocs, S. et al. 2003. Nucl. Acids Res. 31: 3723-3726)
FgenesB (SoftBerry) – fast Pattern/Markov chain-based bacterial operon and gene prediction. Somewhat limited range of model bacteria & archaea.
ZCURVE is an ab initio program for gene finding in bacterial or archaeal genomes and its latest version is 3.0. Based on cross validations of 422 prokaryotic genomes, ZCURVE 3.0 has slightly higher accuracy than Glimmer 3.02. (Reference: Hua, Z-G. 2015. Nucl. Acids Res.).
FramePlot 2.3.2 (National Institute of Health, Japan) – This site permits one to select the minimal size of the ORF and the start codon (ATG or GTG being the most common). While the presentation (a series of coloured arrows) is somewhat confusing, clicking on any arrow lets one view the DNA and protein sequence. These can be used in homology (BLASTN & BLASTP) searches. (Reference: Ishikawa, J. & Hotta, K. 1999. FEMS Microbiol. Lett. 174: 251-253).
ExPASy – Translate tool (ExPASy, University of Geneva, Switzerland). I find this site useful if I have a gene which begins with an alternative start codon. An alternative site is Translate Nucleic Acid Sequence Tool (University of Massachusetts Medical School, U.S.A.) which permits choice of reading frame(s) and genetic code.
Third Position GC Skew Display (The Institute for Genomic Research, U.S.A.) predicts genes by comparing possible open reading frames (variety of initiation codon options) to a third position GC plot. This tool is apparently most effective for genomes with a high G+C content.
Programmed frameshifting:
FSFinder2 (Frameshift Signal Finder) – Programmed ribosomal frameshifting is involved in the expression of certain genes from a wide range of organisms such as viruses, bacteria and eukaryotes including humans. In programmed frameshifting, the ribosome switches to an alternative frame at a specific site in response to a special signal in a messenger RNA. Programmed frameshifting plays a role in viral particle morphogenesis, autogenous control, and alternative enzymatic activities. The most common frameshift is a -1 frameshift, in which the ribosome shifts a single nucleotide in the upstream direction. The major elements of -1 frameshifting consist of a slippery site, where the ribosome changes reading frames, and a stimulatory RNA structure such as a pseudoknot or stem-loop located a few nucleotides downstream. +1 frameshifts are much less common than -1 frameshifts but are observed in diverse organisms.
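The effect of a -1 frameshift on the protein product can be illustrated with a toy Python model. This is a sketch under strong simplifying assumptions: the standard genetic code, a made-up sequence, and a slip position supplied by hand through the hypothetical `slip_index` parameter. It does not detect slippery sites or RNA structures, which is what a tool like FSFinder2 is for.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_slip(seq, slip_index, shift=-1):
    """Read frame 0 up to slip_index, then continue from slip_index + shift.

    slip_index is the 0-based index just after the last codon read in frame 0
    (a multiple of 3). shift=-1 moves the ribosome back one nucleotide, so one
    base is read twice. Translation ends at the first stop codon, if any.
    """
    seq = seq.upper()

    def read(start, end):
        return [CODON_TABLE.get(seq[i:i + 3], "X") for i in range(start, end - 2, 3)]

    residues = read(0, slip_index) + read(slip_index + shift, len(seq))
    return "".join(residues).split("*")[0]


toy = "ATGTTTAAACGTTGA"  # made-up sequence: ATG TTT AAA CGT TGA
print(translate_with_slip(toy, 9, shift=0))   # MFKR   (no frameshift)
print(translate_with_slip(toy, 9, shift=-1))  # MFKTL  (-1 frameshift after AAA)
```

After the slip, the downstream codons are read in a different frame, so the protein has a different C-terminal part and bypasses the original stop codon.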
Expasy
Many web tools are available to translate DNA into protein sequence. One of them is Expasy. Expasy (Expert Protein Analysis System) is a bioinformatics resource portal operated by the Swiss Institute of Bioinformatics (SIB). Expasy was the first website of the life sciences. It is an extensible and integrative portal for accessing many scientific resources, databases and software tools.
Exercise 1
You will learn how to:
Translate a DNA sequence into a sequence of amino acids.
Look for mutations in this sequence and determine how they cause inherited human diseases
We will use Expasy tools for translation. Clicking on it will open a new window so you can return to this window for instructions and to copy your sequence.
A) Translating a DNA sequence
Points to remember: The genetic code is a triplet code; groups of 3 letters (bases) code for amino acids. The coding sequence (CDS) of a gene generally begins with the start codon ATG. ATG encodes the amino acid methionine (Met, M). There are three stop codons (TAA, TGA, TAG), which indicate the end of proteins. Let us try this exercise by hand.
Consider the following DNA sequence:
CDS of human beta globin
atggtgcacctgactcctgaggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttgctggtggtctacccttggacccagaggttctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaaggtgaaggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggacaacctcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggatcctgagaacttcaggctcctgggcaacgtgctggtctgtgtgctggcccatcactttggcaaagaattcaccccaccagtgcaggctgcctatcagaaagtggtggctggtgtggctaatgccctggcccacaagtatcactaa
1) What is the nucleotide in position 20 in this sequence (HBB)?
Answer: a
2) Translate the first 27 bases of the sequence: atg gtg cac ctg act cct gag gag aag
Answer: M V H L T P E E K (refer to the codon table)
Computers are generally quicker and more accurate.
- Copy onto the clipboard the sequence (only) given above under “CDS of human beta globin”
- Go to the ExPasy translate tool at: https://web.expasy.org/translate/
- Paste the sequence in the box
- Go to “Output format:” (under the box) and choose “Includes nucleotide sequence”
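As a scripted alternative to the web tool, the two by-hand answers of this exercise can be checked with a few lines of Python. The sketch assumes the standard genetic code and uses only the first 30 bases of the beta globin CDS given above. The last step, an A to T substitution at position 20, is a hypothetical edit added here to show how a single base change alters the protein.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


hbb_start = "atggtgcacctgactcctgaggagaagtct"  # first 30 bases of the HBB CDS

# Question 1: position 20 (1-based) is index 19 in Python.
print(hbb_start[19])              # a

# Question 2: translate the first 27 bases (9 codons).
print(translate(hbb_start[:27]))  # MVHLTPEEK

# Hypothetical substitution: replace the a at position 20 with t (gag -> gtg).
variant = hbb_start[:19] + "t" + hbb_start[20:]
print(translate(variant[:27]))    # MVHLTPVEK
```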
Exercise 2
Translating an Unknown DNA Sequence
One of the most basic exercises in bioinformatics is determining whether a nucleic acid sequence actually codes for a protein. This is complicated by the fact that we generally know neither which strand is the coding strand (i.e. whether the sequence itself or its complementary strand will be transcribed into mRNA) nor the correct reading frame (whether the sequence should be read three bases at a time starting with the first nucleotide, the second or the third). We resolve both questions by translating both strands in all three reading frames and looking for the frame that gives the longest amino acid sequence before a stop codon is encountered. Since three of the 64 codons are stop signals, we expect a stop codon to appear on average about once every 21 codons (64/3) if we are reading a sequence in an incorrect frame. However, things are not always that clear cut, and it is possible for an out-of-frame translation to extend to over 100 amino acids before a stop codon is reached.
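The stop codon estimate can be made concrete with a back-of-the-envelope calculation in Python. It assumes, purely for illustration, that all 64 codons are equally likely and independent in a wrong frame; real sequences have biased base composition, so the true numbers differ.

```python
# Probability that a random codon is one of the three stop codons.
p_stop = 3 / 64

# Expected position of the first stop codon (mean of a geometric distribution).
expected_codons_to_stop = 1 / p_stop

# Probability that 100 consecutive random codons contain no stop codon.
p_100_without_stop = (1 - p_stop) ** 100

print(round(expected_codons_to_stop, 1))  # 21.3
print(round(p_100_without_stop, 4))       # 0.0082
```

Under these assumptions, a stop-free run of 100 codons in a wrong frame is unlikely (a little under 1%) but far from impossible, which is why ORF length alone is only one piece of evidence.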
In the exercise below you will be given an unknown DNA sequence and asked to use a web tool to translate the sequence into an amino acid sequence and hopefully identify the proper reading frame. You will then save this amino acid sequence to a word processing program.
Obtaining your sequence
In the lab, this might be obtained by sequencing a clone from a cDNA library or by isolating an amplified DNA fragment from a PCR amplification. Often, when we sequence such a product we find we have an unexpected fragment of DNA which we need to analyze. Here we will provide a partial sequence below to analyse.
Nucleotide sequence
CTTCTCAGTGAGGCTCCTCAAGTTCTCCCGGGAGAAGAAAGCGGCCAAAACGCTGGGCATCGTGGTCGGCTGCTTCGTCCTCTGCTGGCTGCCTTTTTTCTTAGTCATGCCCATTGGGTCTTTCTTCCCTGATTTCAAGCCCTCTGAAACAGTTTTTAAAATAGTATTTTGGCTCGGATATCTAAACAGCTGCATCAA
Translating the Sequence
Several sites on the web perform a translation of an input sequence. Clicking on the Expasy link below will open a new window giving you access to a translation tool. Translating the DNA sequence is done by reading the nucleotide sequence three bases at a time and then looking at a table of the genetic code to arrive at an amino acid sequence. This program examines the input sequence in all six possible frames (i.e. reading the given strand and its reverse complement, each starting with nt 1, nt 2 and nt 3). What we typically look for in identifying the proper translation is the frame that gives the longest amino acid sequence before a stop codon is encountered. Since there are 64 codons and three code for nonsense, we expect a stop codon to appear on average about once every 21 codons if we simply read a sequence “out of frame”. However, “on average” is just that, and it is possible for an incorrect reading frame to give an extended sequence with no stop codons. The next exercise will address that problem.
- Go to Expasy tool: https://web.expasy.org/translate/
- Select the sequence, copy it and then paste it into the translate sequence window in the ExPasy link.
- Under Output format select “Compact”. This gives the amino acid sequence as one-letter codes with stop codons indicated by a hyphen. (The “Verbose” output shows start codons (ATG) in bold as Met and writes stop codons out in full, so it is an easy way to scan the outputs.)
- Click on Translate Sequence
- Often only one reading frame will give you a translation with no stop codons, but this is not always the case. If you get multiple possible reading frames, one way to determine which is most likely the true frame is to use the BLAST program to determine if the sequence corresponds to any known protein sequence.
- Using the “Compact output” to get one letter sequences, copy the one letter sequence of the best reading frame (i.e. one with no stop codons) and paste it into the window below labelled “Best Guess”.
- Copy the longest amino acid sequence (i.e. no hyphens) of one of the other reading frames to the window below labelled “Second Best”. If you have two reading frames without a stop codon, simply copy each to the boxes below.
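The same analysis can be sketched offline in Python, using the nucleotide sequence given for this exercise. The sketch assumes the standard genetic code and marks stop codons with “*” rather than the hyphen used by the web tool; the frame labels and helper names are choices made for this example. It prints, for each of the six frames, the length of the longest stop-free stretch.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

UNKNOWN = (
    "CTTCTCAGTGAGGCTCCTCAAGTTCTCCCGGGAGAAGAAAGCGGCCAAAACGCTGGGCAT"
    "CGTGGTCGGCTGCTTCGTCCTCTGCTGGCTGCCTTTTTTCTTAGTCATGCCCATTGGGTC"
    "TTTCTTCCCTGATTTCAAGCCCTCTGAAACAGTTTTTAAAATAGTATTTTGGCTCGGATA"
    "TCTAAACAGCTGCATCAA"
)


def reverse_complement(dna):
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset):
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(offset, len(dna) - 2, 3))


def six_frame(dna):
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq, offset)
    return frames


frames = six_frame(UNKNOWN)
for label, protein in frames.items():
    longest = max(protein.split("*"), key=len)  # longest stretch between stops
    status = "no stop codon" if "*" not in protein else "contains stop codon(s)"
    print(label, len(longest), status)

print(frames["+2"])  # the forward frame starting at the second nucleotide
```

For this sequence, forward frame 2 reads through without meeting a stop codon, whereas forward frame 1 runs into a TGA. As the text notes, a BLAST search of the candidate translation is a good way to confirm the choice.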
Exercise 3
To determine the amino acid sequence encoded by the normal and mutant mRNAs,
one could use the genetic code and manually decode the sequences, or use programs
such as ExPASy.
Wild-type cDNA
atgtccactgcggtcctggaaaacccaggcttgggcaggaaactctctga
ctttggacaggaaacaagctatattgaagacaactgcaatcaaaatggtg
ccatatcgctgatcttctcactcaaagaagaagttggtgcattggccaaa
gtattgcgcttatttgaggagaatgatgtaaacctgacccacattgaatc
tagaccttctcgtttaaagaaagatgagtatgaatttttcacccatttgg
ataaacgtagcctgcctgctctgacaaacatcatcaagatcttgaggcat
gacattggtgccactgtccatgagctttcacgagataagaagaaagacac
agtgccctggttcccaagaaccattcaagagctggacagatttgccaatc
agattctcagctatggagcggaactggatgctgaccaccctggttttaaa
gatcctgtgtaccgtgcaagacggaagcagtttgctgacattgcctacaa
ctaccgccatgggcagcccatccctcgagtggaatacatggaggaaggaa
agaaaacatggggcacagtgttcaagactctgaagtccttgtataaaacc
catgcttgctatgagtacaatcacatttttccacttcttgaaaagtactg
tggcttccatgaagataacattccccagctggaagacgtttctcagttcc
tgcagacttgcactggtttccgcctccgacctgtggctggcctgctttcc
tctcgggatttcttgggtggcctggccttccgagtcttccactgcacaca
gtacatcagacatggatccaagcccatgtatacccccgaacctgacatct
Mutant cDNA
atgtccactgcggtcctggaaaacccaggcttgggcaggaaactctctga
ctttggacaggaaacaagctatattgaagacaactgcaatcaaaatggtg
ccatatcgctgatcttctcactcaaagaagaagttgatgcattggccaaa
gtattgcgcttatttgaggagaatgatgtaaacctgacccacattgaatc
tagaccttctcgtttaaagaaagatgagtatgaatttttcacccatttgg
ataaacgtagcctgcctgctctgacaaacatcatcaagatcttgaggcat
gacattggtgccactgtccatgagctttcacgagataagaagaaagacac
agtgccctggttcccaagaaccattcaagagctggacagatttgccaatc
agattctcagctatggagcggaactggatgctgaccaccctggttttaaa
gatcctgtgtaccgtgcaagacggaagcagtttgctgacattgcctacaa
ctaccgccatgggcagcccatccctcgagtggaatacatggaggaaggaa
agaaaacatggggcacagtgttcaagactctgaagtccttgtataaaacc
catgcttgctatgagtacaatcacatttttccacttcttgaaaagtactg
tggcttccatgaagataacattccccagctggaagacgtttctcagttcc
- Click on the “ExPASy” link in your gene document to be directed to the ExPASy translation tool. The Translate Tool is used as an example here.
- Paste the normal cDNA sequence into the Translate Tool, but don’t include the FASTA header. If the header is included, the program will think that the first line is also sequence.
- Click the Translate Sequence button.
- Selecting a Reading Frame
There are potentially 3 different reading frames (from the first M to the first stop)
for the mRNA. In this example, the first reading frame (starting with the first base)
is most likely to be real since it is the longest reading frame. (The other two reading frames have multiple stops throughout them.)
a. Select the longest reading frame (click on the header above that frame).
Note: Look over all of the reading frames generated from your sequence and
select the longest one from the first M to the first stop ( yours might not be
Reading Frame 1 as in the example).
b. To select where the reading frame should start, left-click the initial methionine (M) of the reading frame.
This will generate a virtual translation of your cDNA sequence which is stored
in the Swiss-Prot database.
c. You can then analyze your virtual translation in a number of ways, but for
right now, you want to copy the translated sequence (in this example, starting with MSTA … ) and paste it into a new Word document.
d. Create a FASTA formatted file in your Word document by inserting a greater-than sign
and a description (>normal) in the first line as shown below.
>normal
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVL
RLFEENDVNLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATV
HELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRAR
RKQFADIAYNYRHGQPIPRVEYMEEGKKTWGTVFKTLKSLYKTHACYEYNH
IFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGFRLRPVAGLLSSRDFLGGLAF
RVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIGLASLG
APDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSEKPKLL
PLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYT
QRIEVLDNTQQLKILADSINSEIGILCSALQKIK
e. Repeat steps a-d for the mutant gene, pasting and translating the mutant
cDNA sequence (remember this corresponds to the mutant mRNA sequence, with
T in place of U) with the Translate Tool (without FASTA header). Select the
copy of your sequence with the longest reading frame, then copy and paste the
translated amino acid sequence into your Word document. Add a FASTA
header (>mutant).
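Once both translations are available, the mutation can be located by comparing them position by position. The Python sketch below does this for the first 150 bases of the wild-type and mutant cDNAs shown above (the two sequences share their first 100 bases). It assumes the standard genetic code and that both sequences are in the same frame starting at the first base; the function names are made up for this example.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def protein_differences(wild_type_cdna, mutant_cdna):
    """List (position, wild-type residue, mutant residue), positions 1-based."""
    wt, mut = translate(wild_type_cdna), translate(mutant_cdna)
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(wt, mut)) if a != b]


shared = (  # bases 1-100, identical in both cDNAs
    "atgtccactgcggtcctggaaaacccaggcttgggcaggaaactctctga"
    "ctttggacaggaaacaagctatattgaagacaactgcaatcaaaatggtg"
)
wild_type = shared + "ccatatcgctgatcttctcactcaaagaagaagttggtgcattggccaaa"
mutant = shared + "ccatatcgctgatcttctcactcaaagaagaagttgatgcattggccaaa"

print(translate(wild_type))  # MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAK
print(protein_differences(wild_type, mutant))  # [(46, 'G', 'D')]
```

In this excerpt the only difference is at residue 46, where the codon ggt (glycine) in the wild-type cDNA becomes gat (aspartic acid) in the mutant. The positional comparison works for substitutions; insertions or deletions would require an alignment instead.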
Conclusion
You have now been introduced to the use of a translation program to identify the most probable reading frame and to translate an unknown sequence. What if none of the six possible reading frames gives an extended amino acid sequence? This could be due to errors in your sequence (you need to sequence both strands to ensure an accurate sequence). Or you may have isolated a non-coding region of DNA: for example, the 5′ and 3′ ends of most genes do not code for protein but serve regulatory functions, and there are many other untranslated regions of DNA (introns, pseudogenes, etc.). In that case, further analysis with the BLAST tool is needed.