Sanger sequencing basics
December 15, 2024
Sanger sequencing
Sanger sequencing relies on DNA polymerization and on the incorporation of dideoxynucleotides, which terminate the reaction. Modern Sanger sequencing uses a modified PCR with fluorophore-labeled dideoxynucleotides, and the products are analyzed by capillary electrophoresis. The most widely used system is the one developed by Applied Biosystems.
Chromatogram.
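The chain-termination idea can be illustrated with a toy model. The sketch below is not a simulation of the real chemistry: it assumes that every position of the newly synthesized strand yields one terminated fragment, labeled with the dye of its last base. The function names are hypothetical and chosen only for this example.

```python
def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Return (length, terminal base) for every possible terminated product.

    Toy assumption: a labeled ddNTP is incorporated at each position in
    some molecule, so each prefix of the strand appears once.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length, as electrophoresis does, and read the labels."""
    return "".join(base for _, base in sorted(fragments))


fragments = terminated_fragments("GATTACA")
# The fragments come out of the reaction as a mixture; sorting by size
# recovers the order of the bases.
sequence = read_by_size(list(reversed(fragments)))
```

Reading the terminal label of each fragment in order of increasing size is what the chromatogram displays as a series of colored peaks.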
Chromatograms
The data produced by automated sequencers are saved in binary files.
In addition to the processed chromatogram (trace), these files may include the raw data read by the sequencer, the nucleotide sequence, and the quality values.
To date, the market for automated Sanger sequencers has been dominated by Applied Biosystems.
Applied Biosystems sequencers write their data in abi format.
These abi files can be read with several programs: Chromas (Win), Sequence Scanner (Win), or trev from the Staden analysis package (Mac, PC, Linux).
Chromatograms can also be saved in scf, an open format that is not controlled by any company.
scf only includes the processed chromatogram and the sequence.
Some of the chromatograms obtained in sequencing projects are deposited at the NCBI.
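Because both formats are binary, a quick way to tell them apart is to look at the first bytes of the file. The sketch below assumes the usual magic numbers, `ABIF` for abi files and `.scf` for scf files, and also checks offset 128 because some older abi files carry a 128-byte MacBinary header. The function names are hypothetical.

```python
def detect_trace_format(data: bytes) -> str:
    """Guess the chromatogram format from the leading bytes of a file."""
    if data[:4] == b"ABIF":
        return "abi"
    # Assumption: an optional 128-byte MacBinary header precedes the marker.
    if data[128:132] == b"ABIF":
        return "abi"
    if data[:4] == b".scf":
        return "scf"
    return "unknown"


def detect_trace_file(path: str) -> str:
    """Read just enough bytes from a file on disk to classify it."""
    with open(path, "rb") as handle:
        return detect_trace_format(handle.read(132))
```

This only identifies the container. To extract the trace, sequence, and qualities, a full parser is needed; Biopython's `SeqIO`, for example, reads abi files under the format name `"abi"`.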
Basecalling
The nucleotide sequence has to be derived from the chromatogram.
The programs that read chromatograms do this automatically, but the result should be reviewed manually, because base assignment often fails.
The sequence proposed by the sequencer software almost always has a long stretch at the end that needs to be removed.
Quality
All sequencing systems estimate the probability that each sequenced nucleotide is wrong; this parameter is often called quality. The error estimate is specific to each technology and is calculated by the instrument software. To make results easier to analyze and interpret, these values are usually converted to a standard scale shared by all sequencing technologies, the Phred scale. Phred was originally a basecalling program, but it is now best known for its quality scale, which is defined as:
Phred score = -10 × log10(error probability)
This makes the error probability easy to read from the score: every 10 points reduce the error probability tenfold.
Phred score | Probability that the base call is wrong |
10 | 1/10 |
20 | 1/100 |
30 | 1/1000 |
40 | 1/10000 |
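The conversion in both directions follows directly from the definition above. This is a minimal sketch; the function names are chosen for the example and do not come from any particular library.

```python
import math


def phred_from_error(p_error: float) -> float:
    """Phred score for a given error probability (0 < p_error <= 1)."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)


def error_from_phred(score: float) -> float:
    """Error probability implied by a Phred score."""
    return 10 ** (-score / 10)


def expected_errors(scores: list[float]) -> float:
    """Expected number of wrong bases in a read, given per-base scores."""
    return sum(error_from_phred(q) for q in scores)
```

For instance, a read of 700 bases that all have a score of 30 would be expected to contain about 0.7 wrong bases.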
Interpretation of sequence chromatograms
The examples are taken from the website of the University of Michigan sequencing service.
Noise
Ideally a chromatogram shows clean, well-separated peaks, but in practice this is not always the case.
A good chromatogram.
Chromatogram with some noise.
Chromatogram with a lot of noise.
Possible causes of background noise are contamination with another DNA or with a second primer (usually the reverse primer used in the PCR).
Basecalling errors
Sometimes the sequencer software cannot place the peaks of the different bases at even intervals. This happens frequently, for example, at GA dinucleotides.
Poor spacing, correctly interpreted.
Poor spacing that introduces an extra N.
Poor spacing and background noise that introduce an extra base.
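Uneven spacing can be flagged automatically once peak positions are known. The sketch below is only an illustration, not how any particular basecaller works: it assumes a list of peak positions in scan units and flags gaps that deviate from the median gap by more than a chosen fraction. The function name and the default tolerance are assumptions.

```python
from statistics import median


def flag_irregular_spacing(peak_positions: list[float],
                           tolerance: float = 0.5) -> list[int]:
    """Return indices i where the gap between peak i and peak i+1 is unusual.

    A gap is flagged when it differs from the median gap by more than
    `tolerance` times the median gap.
    """
    gaps = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    if not gaps:
        return []
    typical = median(gaps)
    return [i for i, gap in enumerate(gaps)
            if abs(gap - typical) > tolerance * typical]


# Peaks roughly 12 scans apart, with two peaks crowded around position 40.
suspect = flag_irregular_spacing([0, 12, 24, 36, 40, 48, 60])
```

Flagged positions are candidates for manual review in the chromatogram; they are not necessarily miscalls.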
Heterozygotes
Basecalling programs often call heterozygous positions as N.
There are specialized programs that detect these double peaks and label them properly.
C/T heterozygote.
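Labeling a double peak properly usually means using the IUPAC ambiguity code for the two bases, for example Y for C/T. The sketch below is a simplified illustration under stated assumptions: it takes the four peak heights at one position and reports a heterozygote when the second-highest peak reaches a chosen fraction of the highest. The threshold and the function name are hypothetical, and real tools use more elaborate criteria.

```python
# IUPAC codes for single bases and for pairs of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(heights: dict[str, float], min_ratio: float = 0.5) -> str:
    """Call one position from the peak heights of the A, C, G and T channels."""
    ranked = sorted(heights.items(), key=lambda item: item[1], reverse=True)
    (top_base, top), (second_base, second) = ranked[0], ranked[1]
    if top <= 0:
        return "N"  # no signal in any channel
    if second >= min_ratio * top:
        # Two comparable peaks: report the ambiguity code for the pair.
        return IUPAC[frozenset((top_base, second_base))]
    return top_base


het = call_position({"A": 5, "C": 800, "G": 10, "T": 650})
hom = call_position({"A": 900, "C": 20, "G": 15, "T": 30})
```

The sketch ignores a third high peak, which in practice would point to noise or a mixed template rather than a heterozygote.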
Loss of resolution
Even good sequences lose resolution as the read progresses, because the electrophoretic separation becomes less able to resolve fragments that differ by a single base as they get longer. This is one of the reasons why reads are no longer than 700-800 bp.
Good resolution.
Acceptable resolution.
Bad resolution.
Problems during the sequencing reaction
Sometimes problems in the sequencing reaction prevent a good read. The problem has to be diagnosed before it can be fixed.
Possible causes of sequence problems.
No signal:
  There was no template DNA.
  There was no primer.
  The primer did not anneal to the template.
The signal is very weak:
  There was too little template DNA.
  The primer annealed poorly to the template.
There is signal, but the resolution is bad from the beginning:
  There may be a contaminant in the sample that affects the electrophoretic separation.
The signal and resolution are good, but there are several peaks at each position:
  There are several templates in the reaction.
  The primer anneals at several positions on the template.
  Several products are being sequenced.
  The primers of the original PCR have not been removed.
Gradual loss of signal at larger fragment sizes:
The signal is good at first, but it decreases rapidly:
  Excess salts in the template.
  Excess template DNA.
  A contaminant that inhibits the polymerase.
A large peak interrupts the sequence at a specific point:
  ddNTPs were poorly removed from the sequencing reaction.
The signal is good up to a specific point and then drops abruptly:
  Secondary structure in the template DNA.
Chromatogram display
Download the file with the chromatograms.
Open the downloaded chromatograms one by one with trev and check their quality.
Ask trev to display the confidence of each base call (view -> display confidence).
Mark the poor quality regions at the beginning and end of each sequence (edit -> left quality, edit -> right quality).
For each chromatogram, evaluate:
Is there a signal? Do you see peaks, or is it all noise?
Where does the good quality region start? Where does it end? Does the good quality region end abruptly?
What is the likely diagnosis of any problems?
Are any bases miscalled by the basecaller? Is there a lot of background noise?
Save the resulting sequence as plain text.
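The trimming and saving steps of the exercise can also be sketched in code. This is not what trev does internally: the sketch assumes that per-base Phred scores are available and keeps the segment with the highest total of (score - cutoff), a common maximum-sum approach. The function names and the cutoff of 20 are assumptions.

```python
def trim_by_quality(qualities: list[int], cutoff: int = 20) -> tuple[int, int]:
    """Return (start, end) of the best segment; end is exclusive.

    Each base contributes (quality - cutoff); the segment with the
    highest total is kept. Returns (0, 0) if no base exceeds the cutoff.
    """
    best_sum, best = 0, (0, 0)
    run_sum, run_start = 0, 0
    for i, quality in enumerate(qualities):
        run_sum += quality - cutoff
        if run_sum <= 0:
            # The running segment is no longer worth keeping; restart.
            run_sum, run_start = 0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as plain-text FASTA with wrapped lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


bases = "NNACGTNN"
scores = [5, 8, 30, 40, 40, 35, 10, 6]
start, end = trim_by_quality(scores)
record = to_fasta("read1", bases[start:end])
```

The string in `record` can be written to a text file with the usual file operations. The trimmed coordinates should still be compared with the regions marked by eye in trev.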