Sanger sequencing basics

December 15, 2024 By admin


Sanger sequencing

Sanger sequencing relies on DNA polymerization and the incorporation of dideoxynucleotides (ddNTPs), which lack the 3'-OH group needed for chain extension and therefore act as reaction terminators. Modern Sanger sequencing is based on a modified PCR (cycle sequencing) that uses fluorophore-labeled dideoxynucleotides, and the resulting fragments are separated and detected by capillary electrophoresis. The most widely used system is the one developed by Applied Biosystems.
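To make the termination principle concrete, here is a toy simulation in Python. It is a sketch under simplifying assumptions, not a model of real reaction kinetics: the input string is taken to be the strand being synthesized (the complement of the template), and each extension step has a fixed, hypothetical chance (`ddntp_fraction`) of incorporating a ddNTP. Sorting the terminated fragments by length, which is what electrophoresis does, recovers the sequence from the terminal dyes.

```python
import random


def sanger_fragments(synthesized: str, ddntp_fraction: float = 0.05,
                     n_molecules: int = 20000, seed: int = 0) -> dict:
    """Simulate chain termination; return {fragment length: terminal base}."""
    rng = random.Random(seed)
    observed = {}
    for _ in range(n_molecules):
        # Each molecule is extended one nucleotide at a time from the primer.
        for length, base in enumerate(synthesized, start=1):
            if rng.random() < ddntp_fraction:
                # A ddNTP was incorporated: no 3'-OH, so extension stops here
                # and the fragment carries the dye of this terminal base.
                observed[length] = base
                break
    return observed


def read_sequence(fragments: dict) -> str:
    """Order fragments by length (as electrophoresis does) and read the dyes."""
    return "".join(fragments[length] for length in sorted(fragments))


fragments = sanger_fragments("ATGCGTACGTTA")
print(read_sequence(fragments))
```

With enough molecules, every position is terminated at least once, so each length appears and the dye order spells out the synthesized strand.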

Chromatogram.

Chromatograms

The data produced by automatic sequencers are saved in binary files.

In addition to the processed chromatogram (trace), these files may include the raw data read by the automatic sequencer, the nucleotide sequence and the quality values.

To date, the market for automatic sequencing based on the Sanger method has been dominated by Applied Biosystems.

Applied Biosystems sequencers save their data in the ABI format (.ab1 files).

These ABI files can be read with several programs: Chromas (Windows), Sequence Scanner (Windows) or trev, part of the Staden analysis package (Mac, PC, Linux).
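As an illustration of what an ABI file holds, the sketch below walks the file's tag directory using only the standard library. It assumes the ABIF layout as we understand it from Applied Biosystems' published format description (big-endian, 28-byte directory entries, payloads of 4 bytes or less stored inline) and reads the tags `PBAS2` (base calls) and `PCON2` (quality values), which, as far as we know, are the tags Biopython's ABI reader relies on. `build_minimal_abif` is a hypothetical helper that fabricates a tiny file for demonstration; for real files, a maintained reader such as Biopython's `SeqIO.read(path, "abi")` is the safer choice.

```python
import struct

# One 28-byte big-endian ABIF directory entry: name, number, element type,
# element size, number of elements, data size, data offset, data handle.
ENTRY = struct.Struct(">4sihhiiii")
OFFSET_FIELD = 20  # byte position of the data-offset field inside an entry


def read_abif_directory(data: bytes) -> dict:
    """Map (tag name, tag number) -> raw payload bytes for every entry."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The root entry follows the 4-byte magic and the 2-byte version number.
    root = ENTRY.unpack_from(data, 6)
    n_entries, dir_offset = root[4], root[6]
    tags = {}
    for i in range(n_entries):
        start = dir_offset + i * ENTRY.size
        name, number, _, _, _, size, offset, _ = ENTRY.unpack_from(data, start)
        if size <= 4:
            # Small payloads live inside the offset field itself.
            payload = data[start + OFFSET_FIELD:start + OFFSET_FIELD + size]
        else:
            payload = data[offset:offset + size]
        tags[(name.decode("ascii"), number)] = payload
    return tags


def bases_and_qualities(data: bytes):
    """Base calls (PBAS2) and per-base quality values (PCON2)."""
    tags = read_abif_directory(data)
    return tags[("PBAS", 2)].decode("ascii"), list(tags[("PCON", 2)])


def build_minimal_abif(items) -> bytes:
    """Hypothetical demo helper: pack (name, number, payload) items as char arrays."""
    header_size = 128
    payloads, entries = bytearray(), bytearray()
    for name, number, payload in items:
        if len(payload) <= 4:
            offset = int.from_bytes(payload.ljust(4, b"\0"), "big", signed=True)
        else:
            offset = header_size + len(payloads)
            payloads += payload
        entries += ENTRY.pack(name.encode("ascii"), number, 2, 1,
                              len(payload), len(payload), offset, 0)
    dir_offset = header_size + len(payloads)
    root = ENTRY.pack(b"tdir", 1, 1023, ENTRY.size, len(items),
                      len(entries), dir_offset, 0)
    header = (b"ABIF" + struct.pack(">h", 101) + root).ljust(header_size, b"\0")
    return header + bytes(payloads) + bytes(entries)
```

The directory-based design is what lets an ABI file carry raw data, processed traces, base calls and qualities side by side: each item is just another tagged entry.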

Chromatograms are also saved in SCF (Standard Chromatogram Format), an open format not controlled by any company.

SCF files store the processed chromatogram and the sequence, but not the raw data.
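Since ABI and SCF files are both binary, a program usually tells them apart by their first four bytes: ABI files start with `ABIF` and SCF files with `.scf`. A minimal sketch of that check (the function name is ours):

```python
def chromatogram_format(header: bytes) -> str:
    """Guess the trace file format from its leading magic bytes."""
    if header.startswith(b"ABIF"):
        return "abi"
    if header.startswith(b".scf"):
        return "scf"
    return "unknown"
```

For a file on disk, pass the first four bytes, for example the result of `fh.read(4)` on a file opened in binary mode.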

Some of the chromatograms obtained in sequencing projects are deposited at NCBI, in its Trace Archive.

Basecalling

The nucleotide sequence must then be derived from the chromatogram; this step is called basecalling.

The programs that read chromatograms do this automatically, but the result should be reviewed manually because bases are often assigned incorrectly.
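The core idea of basecalling can be sketched as follows: at each peak position, call the dye channel with the strongest signal, and emit N when the signal is too weak or two channels are nearly equal. This is a deliberately naive illustration; the trace layout (one intensity list per base plus a list of peak indices) and both thresholds are hypothetical, and real basecallers such as Phred use far more elaborate peak models.

```python
def naive_basecall(trace: dict, peaks: list,
                   min_signal: float = 50.0, ambiguity_ratio: float = 0.8) -> str:
    """Call one base per peak index; trace maps 'A','C','G','T' to intensities."""
    calls = []
    for pos in peaks:
        # Rank the four channels by their intensity at this peak.
        ranked = sorted(((trace[base][pos], base) for base in "ACGT"), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        if top < min_signal or second >= ambiguity_ratio * top:
            calls.append("N")  # too weak, or two peaks of similar height
        else:
            calls.append(base)
    return "".join(calls)


trace = {"A": [200, 0, 10, 5], "C": [0, 180, 10, 0],
         "G": [10, 0, 12, 150], "T": [0, 170, 5, 20]}
print(naive_basecall(trace, [0, 1, 2, 3]))
```

The second and third peaks come out as N for different reasons (two similar peaks, then a weak signal), which is exactly the kind of call that manual review should check.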

The sequence proposed by the sequencer software almost always ends with a long, low-quality stretch that needs to be trimmed.
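The text does not say how the low-quality end is removed. One common approach, shown here as an assumption rather than the method of any particular sequencer software, is the modified Mott algorithm: each base scores `cutoff - P(error)`, with P derived from its Phred quality (described in the next section), and the maximal-scoring contiguous segment is kept. The default `cutoff` of 0.05 is a hypothetical choice.

```python
def mott_trim(quals: list, cutoff: float = 0.05) -> tuple:
    """Return (start, end) of the segment to keep, end exclusive."""
    best_start = best_end = 0
    best_score = 0.0
    run_start, run_score = 0, 0.0
    for i, q in enumerate(quals):
        # Good bases (P(error) < cutoff) add to the score, bad ones subtract.
        run_score += cutoff - 10 ** (-q / 10)
        if run_score <= 0:
            # The running segment became worthless: restart after this base.
            run_start, run_score = i + 1, 0.0
        elif run_score > best_score:
            best_start, best_end, best_score = run_start, i + 1, run_score
    return best_start, best_end


quals = [5, 8, 30, 35, 40, 38, 30, 12, 6, 4]
start, end = mott_trim(quals)
print(start, end, quals[start:end])
```

Because low-quality bases subtract from the score, a single good base inside a bad tail is not enough to extend the kept segment.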

Quality

All sequencing systems estimate the probability that each sequenced nucleotide is wrong; this parameter is often called quality. The error estimate is specific to each technology and is calculated by the instrument software. To facilitate the analysis and interpretation of results, these values are usually converted to a standard scale shared by all sequencing technologies: the Phred scale. Phred was originally a basecalling program, but the name is now mostly known for its quality scale, which is defined as:

Phred score: Q = -10 log10(P), where P is the estimated probability that the base call is wrong.

Because the scale is logarithmic, each 10-point increase means a tenfold lower error probability, which makes the values easy to interpret.

Phred score (Q)	Probability that the base call is wrong	Base call accuracy
10	1/10	90%
20	1/100	99%
30	1/1000	99.9%
40	1/10000	99.99%
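The conversion in both directions is a one-liner; this Python sketch reproduces the table above (the function names are ours):

```python
import math


def phred_from_error(p_error: float) -> float:
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Inverse conversion: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    p = error_from_phred(q)
    print(q, p, f"{1 - p:.2%}")
```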

Interpretation of sequence chromatograms

The following examples were taken from the website of the University of Michigan sequencing service.

Noise

Ideally a chromatogram would always be clean, but in practice this is not always the case.

A good chromatogram.

Chromatogram with some noise.

Chromatogram with a lot of noise.

Possible causes of background noise include contamination with another DNA template and contamination with another primer (usually the reverse primer used in the PCR).
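A rough way to quantify background noise, assuming a hypothetical trace layout (one intensity list per base and a list of peak indices), is to compare the second-strongest channel with the strongest one under each peak. The 0.2 threshold is an arbitrary illustration. This metric cannot tell the causes listed above apart, and a heterozygous position also raises it locally.

```python
from statistics import median


def secondary_ratios(trace: dict, peaks: list) -> list:
    """Second-strongest / strongest channel intensity at each called peak."""
    ratios = []
    for pos in peaks:
        signals = sorted((trace[base][pos] for base in "ACGT"), reverse=True)
        if signals[0] > 0:
            ratios.append(signals[1] / signals[0])
    return ratios


def noise_summary(trace: dict, peaks: list, threshold: float = 0.2):
    """Return (median ratio, fraction of peaks whose ratio exceeds threshold)."""
    ratios = secondary_ratios(trace, peaks)
    if not ratios:
        return 0.0, 0.0
    noisy = sum(ratio > threshold for ratio in ratios)
    return median(ratios), noisy / len(ratios)
```

A clean read keeps the median ratio close to zero; a read with contaminating template or primer shows a consistently high ratio along its whole length.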

Basecalling errors

Sometimes the automatic sequencer cannot space the peaks of consecutive bases evenly. This happens frequently at GA dinucleotides, for example.

Poorly spaced peaks, correctly called.

Poor spacing that introduces an extra N.

Poor spacing plus background noise, introducing an extra base.
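Uneven spacing can be flagged automatically by comparing each gap between consecutive peak positions with the median gap. In this sketch, the peak positions are a hypothetical list of trace indices and the 50% tolerance is arbitrary:

```python
from statistics import median


def spacing_anomalies(peaks: list, tolerance: float = 0.5):
    """Flag gaps between consecutive peaks that deviate from the median gap.

    Returns (median gap, [(index, gap), ...]) where index i refers to the gap
    between peak i and peak i + 1.
    """
    gaps = [right - left for left, right in zip(peaks, peaks[1:])]
    typical = median(gaps)
    flagged = [(i, gap) for i, gap in enumerate(gaps)
               if abs(gap - typical) > tolerance * typical]
    return typical, flagged


print(spacing_anomalies([10, 22, 34, 46, 70, 82, 87, 100]))
```

Here the median gap is 12; the 24-point gap may hide a missed base, while the 5-point gap suggests an extra call, such as the spurious N in the examples above.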

Heterozygotes

Basecalling programs often call heterozygous positions, which show two overlapping peaks, as N.

Specialized programs can detect these double peaks and label them correctly.
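One simple strategy for labeling double peaks, sketched under assumptions: when the second-strongest channel under a peak reaches a hypothetical fraction (`min_ratio`) of the strongest, emit the IUPAC ambiguity code for the two bases instead of N (for example Y for C/T). The trace layout (one intensity list per base plus peak indices) is hypothetical, and real heterozygote detectors use more careful criteria.

```python
# IUPAC ambiguity codes for two-base combinations.
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def call_with_heterozygotes(trace: dict, peaks: list, min_ratio: float = 0.4) -> str:
    """Call single bases, or an IUPAC code where two peaks overlap."""
    calls = []
    for pos in peaks:
        ranked = sorted(((trace[base][pos], base) for base in "ACGT"), reverse=True)
        (top, b1), (second, b2) = ranked[:2]
        if top > 0 and second / top >= min_ratio:
            calls.append(IUPAC[frozenset(b1 + b2)])  # double peak
        else:
            calls.append(b1)
    return "".join(calls)


trace = {"A": [200, 0, 0], "C": [0, 150, 0], "G": [0, 0, 180], "T": [10, 140, 20]}
print(call_with_heterozygotes(trace, [0, 1, 2]))
```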

C/T heterozygote.

Loss of resolution

Even good sequences lose resolution as the read progresses, because capillary electrophoresis separates long fragments that differ by a single nucleotide less and less sharply. This is one of the reasons why reads rarely exceed 700-800 bp.

Good resolution.
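A simplified way to see why resolution drops, assuming separation depends mainly on the relative size difference between neighboring fragments: a fragment of n bases differs from one of n + 1 bases by only 1/n, which shrinks steadily along the read.

```python
def relative_size_difference(length: int) -> float:
    """Fractional size difference between fragments of length n and n + 1."""
    return 1 / length


for n in (100, 300, 500, 800):
    print(f"{n} bp vs {n + 1} bp: {relative_size_difference(n):.3%} larger")
```

At 100 bp the neighbors differ by 1%, but at 800 bp by only 0.125%, which is why peaks broaden and merge late in the read.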