
Sanger sequencing basics

December 15, 2024, by admin
Sanger sequencing

Sanger sequencing relies on DNA polymerization and on the incorporation of dideoxynucleotides, which terminate the extension reaction. Modern implementations use a modified PCR-like cycling reaction with fluorophore-labeled dideoxynucleotides, and the products are separated and read by capillary electrophoresis. The most widely used system is the one developed by Applied Biosystems.

_images / sequencing_sanger.png

Chromatogram.

_images / seq_error.jpg

Chromatograms

The data produced by automatic sequencers are saved in binary files.

Besides the processed chromatogram (trace), these files may include the raw data read by the automatic sequencer, the nucleotide sequence and the quality values.

To date, the market for automatic Sanger sequencing has been dominated by Applied Biosystems.

Applied Biosystems sequencers write their data in the abi format.

These abi files can be read with several programs: Chromas (Win), Sequence Scanner (Win) or trev, from the Staden analysis package (Mac, PC, Linux).

Another format used to store chromatograms is scf, an open format that is not controlled by any company.

An scf file includes only the processed chromatogram and the sequence.

Some of the chromatograms obtained in sequencing projects are deposited at the NCBI.
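As a small illustration of the two formats, the sketch below guesses whether a trace file is abi or scf from its first four bytes. It assumes the signatures given in the respective format specifications (`ABIF` for abi files and `.scf` for scf files); the function names are hypothetical, and real-world files may carry extra headers that this check does not handle.

```python
# Signatures expected at the very start of each file type
# (assumption taken from the ABIF and SCF format specifications).
MAGIC = {
    b"ABIF": "abi",
    b".scf": "scf",
}


def sniff_trace_format(first_bytes: bytes) -> str:
    """Return 'abi', 'scf' or 'unknown' from the first bytes of a trace file."""
    return MAGIC.get(first_bytes[:4], "unknown")


def sniff_trace_file(path: str) -> str:
    """Read the first four bytes of a file and guess its chromatogram format."""
    with open(path, "rb") as handle:
        return sniff_trace_format(handle.read(4))
```

To actually parse the traces rather than just identify the file, a library is the safer route; Biopython's `Bio.SeqIO`, for example, accepts `"abi"` as a format name.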

Basecalling

The nucleotide sequence has to be derived from the chromatogram.

The programs that read chromatograms do this automatically, but the result should be reviewed by hand because base assignment fails quite often.

The sequence proposed by the sequencer software almost always has a long stretch at the end that needs to be removed.

Quality

All sequencing systems estimate the probability that each sequenced nucleotide is wrong; this parameter is usually called quality. The error estimate is specific to each technology and is calculated by the instrument's software. To make results easier to analyze and interpret, these values are usually converted to a standard scale shared by all sequencing technologies, the Phred scale. Phred was originally a basecalling program, but it is now best known for its quality scale, which is defined as:

Phred score = -10 × log10(error probability)

With this scale the error probability is easy to read off from the score:

Phred score    Probability that the base call is wrong
10             1/10
20             1/100
30             1/1000
40             1/10000
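The conversion between Phred scores and error probabilities can be sketched in a few lines of Python, taking the logarithm in base 10. The function names are illustrative and do not belong to any particular package.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


def expected_errors(qualities) -> float:
    """Expected number of wrong bases in a read, given its Phred scores."""
    return sum(phred_to_error_prob(q) for q in qualities)
```

For instance, `phred_to_error_prob(30)` gives one error in a thousand, and summing the probabilities over a read gives a rough idea of how many miscalled bases to expect in it.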

Interpretation of sequence chromatograms

The examples below were taken from the website of the University of Michigan sequencing service.

Noise

In theory a chromatogram should be clean, but in practice it often is not.

_images / no_noise.gif

A good chromatogram.

_images / some_noise.gif

Chromatogram with some noise.

_images / bad_noise.gif

Chromatogram with a lot of noise.

Possible causes of background noise are contamination with another DNA or with a second primer (usually the reverse primer used in the PCR).

Basecalling errors

Sometimes the automatic sequencer cannot space the peaks of the different bases evenly. This happens frequently, for example, at the GA dinucleotide.

_images / GA_space.gif

Poor spacing, correctly interpreted.

_images / gap_n.gif

Poor spacing that introduces an extra N.

_images / bgnd_g.gif

Poor spacing and background noise that introduce an extra base.

Heterozygotes

Basecalling programs often call heterozygous positions as N.

There are specialized programs that detect these double peaks and label them properly.

_images / het_n.gif

C/T heterozygote.
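The idea behind labeling a double peak can be sketched as follows: if the second-highest peak at a position is a large enough fraction of the highest one, the position is reported with the IUPAC ambiguity code for the two bases instead of N. This is only an illustration; the `call_position` function and the 0.5 ratio are assumptions, and real tools use more elaborate criteria.

```python
# IUPAC ambiguity codes for every combination of two bases.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def call_position(heights: dict, min_ratio: float = 0.5) -> str:
    """Call one position from the peak heights of the four channels.

    heights maps each base (A, C, G, T) to its peak height at the position.
    min_ratio is a hypothetical threshold: the second peak must reach this
    fraction of the main peak for the position to be called heterozygous.
    """
    ranked = sorted(heights.items(), key=lambda item: item[1], reverse=True)
    (base1, height1), (base2, height2) = ranked[0], ranked[1]
    if height1 <= 0:
        return "N"  # no signal in any channel
    if height2 / height1 >= min_ratio:
        return IUPAC_TWO_BASE[frozenset((base1, base2))]
    return base1
```

With heights such as `{"A": 5, "C": 900, "G": 10, "T": 700}` the function returns `Y`, the IUPAC code for C or T, which matches the kind of position shown in the figure above.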

Loss of resolution

Even good sequences lose resolution as the read progresses, because of the limits of the electrophoretic separation. This is one of the reasons why reads are rarely longer than 700-800 bp.

_images / normal3730.gif

Good resolution.

_images / fair3730.gif

Acceptable resolution.

_images / late3730.gif

Bad resolution.

Problems during the sequencing reaction

Sometimes problems in the sequencing reaction prevent a good sequence from being obtained. The problem has to be diagnosed before it can be fixed.

Possible causes of sequencing problems.

No signal:

There was no template DNA.

There was no primer.

The primer did not recognize the template.

The signal is very weak:

There was too little template DNA.

The primer did not recognize the template well.

There is a signal, but the resolution is bad from the beginning:

_images/okres.gif

There may be a contaminant in the sample that affects the electrophoresis.

The signal and resolution are good, but there are several peaks at each position:

_images / mixed.gif

There are several templates in the reaction.

The primer anneals at several positions on the template.

Several products are being sequenced.

The primers from the original PCR have not been removed.

Gradual loss of signal at larger fragment sizes:

_images / skislope.gif

The signal is good at first, but it decreases rapidly:

Excess salts in the template.

Excess template DNA.

A contaminant that inhibits the polymerase.

A large peak breaks the sequence at a specific point:

_images/blob.gif

The ddNTPs were not completely removed from the sequencing reaction.

The signal is good up to a specific point and then drops abruptly:

_images/secstruct.gif

Secondary structure in the template DNA.

Chromatogram display

Download the chromatogram file.

Open the downloaded chromatograms one by one with trev and check their quality.

Ask trev to show the quality of each base call (view -> display confidence).

Mark the poor quality regions at the beginning and end of each sequence (edit -> left quality, edit -> right quality).

Evaluate for each of them:

Do you have a signal? Do you see bands or is it all noise?

Where does the good quality region start? Where does it end? Does the poor quality region end abruptly?

Possible diagnosis of problems.

Are there bases miscalled by the basecaller? Is there a lot of background noise?

Save the obtained sequence in plain text.
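The trimming and saving steps of this exercise can also be sketched in code. The example below is not what trev does internally; it is one simple approach, assumed here for illustration, that keeps the stretch of bases whose Phred scores add up to the largest total above a chosen threshold and then formats the result as FASTA text. The function names, the threshold of 20 and the read name are all hypothetical.

```python
def trim_by_quality(qualities, min_q=20):
    """Return (start, end) of the best-quality stretch, end exclusive.

    Each base scores (quality - min_q); the stretch with the largest
    total score is kept. Returns (0, 0) if no base exceeds min_q.
    """
    best_sum, best_start, best_end = 0, 0, 0
    run_sum, run_start = 0, 0
    for i, q in enumerate(qualities):
        run_sum += q - min_q
        if run_sum <= 0:
            # The current stretch is not worth keeping; restart after this base.
            run_sum, run_start = 0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


def to_fasta(name, sequence, width=60):
    """Format a sequence as FASTA text with lines of at most `width` bases."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Usage with made-up data: low-quality bases at both ends are trimmed away.
sequence = "NNACGTACGTNN"
qualities = [5, 8, 30, 35, 40, 40, 38, 36, 30, 25, 9, 4]
start, end = trim_by_quality(qualities)
fasta_text = to_fasta("read1", sequence[start:end])
```

The resulting text can be written to a file with an ordinary `open(..., "w")` call. Choosing a stricter or looser `min_q` moves the trimming points, so it is worth comparing the result with the regions marked by hand in trev.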

 
