Sequence data evaluation: filename.ab1 and filename.seq files

For each sample, two files are generated: a filename.ab1 file and a filename.seq file.
The filename.seq file is a text file containing the called sequence and can be used for BLAST searches.
The filename.ab1 file contains the sample annotation, the raw data trace and the analysed
electropherogram. Basecalling and analysis algorithms are applied to the raw data to create the
analysed data trace. When evaluating or trouble-shooting sequence data, it is important to look at
the raw and analysed data traces in addition to the data values (signal strength and start/end points)
displayed in the annotation file.
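Because the .seq file is plain text, it can be turned into a FASTA record before pasting it into a BLAST search. The sketch below assumes the file holds the called bases as text, possibly preceded by a header line starting with ">"; `seq_to_fasta` and the record name are hypothetical, so check the layout of your own files first.

```python
def seq_to_fasta(seq_text: str, name: str, width: int = 60) -> str:
    """Wrap the text of a .seq file as one FASTA record named `name`."""
    # Assumption: any line starting with ">" is a header and is replaced by ours.
    lines = [line for line in seq_text.splitlines() if not line.startswith(">")]
    # Keep letters only (A, C, G, T, N and other IUPAC codes); drop spaces and digits.
    bases = "".join(ch for ch in "".join(lines).upper() if ch.isalpha())
    # Break the sequence into fixed-width lines, as FASTA files usually are.
    wrapped = [bases[i:i + width] for i in range(0, len(bases), width)]
    return "\n".join([f">{name}"] + wrapped) + "\n"


print(seq_to_fasta(">old header\nACGT NNAC\ngt\n", "sample_A01", width=4))
```

N calls are kept, since BLAST accepts them as ambiguous bases.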
Software available to open the filename.ab1 file and view the data includes:
• Sequence Scanner (for Windows) – http://www.appliedbiosystems.com/sequencescanner
• ApE (for Windows and Macintosh) – http://www.biology.utah.edu/jorgensen/wayned/ape
Fig. 1 Analysed and raw data views of a LongRead control sample opened in Sequence Scanner
• The analysed data trace should show sharp, evenly spaced peaks across the read and a clear baseline.
• Tabs allow you to view the raw, analysed, annotation or sequence views of the data.
• Vertical QV bars indicate the probability of error for each base call. Blue = QV > 20 = Pe < 1% (refer to Figure 2).
• The raw data should show an even distribution of peaks across the read and no residual dyes.
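The QV-to-error pairing in Fig. 1 (QV 20 corresponds to a 1% chance of error) is the Phred scale, QV = -10 log10(Pe). The sketch below assumes the basecaller's QVs follow that scale and converts in both directions; the function names are illustrative only.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Phred relation: Pe = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(pe: float) -> float:
    """Inverse relation: QV = -10 * log10(Pe)."""
    return -10 * math.log10(pe)


for qv in (10, 15, 20, 30):
    # QV 20 -> 0.0100 (1%), QV 30 -> 0.0010 (0.1%)
    print(qv, f"{qv_to_error_probability(qv):.4f}")
```

Each 10-point rise in QV cuts the error probability tenfold, which is why QV 15 (about 3%) and QV 20 (1%) behave quite differently in practice.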
Figure 2 Quality Value Chart indicates the probability of error for each base call
• The analysis program is set to call an “N” when the QV for a base call is below 15.
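The N-calling rule in Figure 2 can be mimicked on exported base calls and QVs, for example to check how a different cut-off would change a read. This is a hypothetical helper, not the analysis software's own code; it assumes one QV per base, in read order.

```python
from typing import List


def mask_low_quality(bases: str, qvs: List[int], threshold: int = 15) -> str:
    """Replace each base whose QV is below `threshold` with 'N'."""
    if len(bases) != len(qvs):
        raise ValueError("bases and qvs must be the same length")
    return "".join(b if q >= threshold else "N" for b, q in zip(bases, qvs))


print(mask_low_quality("ACGTA", [40, 14, 15, 8, 22]))  # ANGNA
```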
Figure 3 Annotation of the sample and expected values for signal strength and start/end points
• The base call start indicates the scan point at which the read commences and should be ~600 to 800.
• The end point should be ~13,000 to 14,000, or at the end of the read.
• The average signal-to-noise ratio indicates labelling efficiency and should be 100 to 750.
• The number of bases with QV >= 20 should be ~950 to 1000 (fewer for shorter PCR fragments).
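The expected values in Figure 3 lend themselves to a quick checklist when many samples need reviewing. In the sketch below the dictionary keys are hypothetical names for values read off the annotation view, and the ranges are the approximate ones quoted above; a short PCR product will legitimately fall below the QV20 range and may end before scan 13,000.

```python
# Approximate expected ranges from Figure 3 (inclusive bounds).
EXPECTED = {
    "start_point": (600, 800),
    "end_point": (13_000, 14_000),
    "avg_signal_to_noise": (100, 750),
    "qv20_bases": (950, 1000),
}


def check_annotation(values: dict) -> list:
    """Return a message for every value that is missing or outside its range."""
    problems = []
    for key, (low, high) in EXPECTED.items():
        value = values.get(key)
        if value is None:
            problems.append(f"{key}: missing")
        elif not low <= value <= high:
            problems.append(f"{key}: {value} outside {low}-{high}")
    return problems


good = {"start_point": 700, "end_point": 13500, "avg_signal_to_noise": 320, "qv20_bases": 980}
print(check_annotation(good))  # []
print(check_annotation({**good, "start_point": 1200}))  # ['start_point: 1200 outside 600-800']
```

A late start point flagged here is one of the signs of delayed migration (section 8).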
Figure 4 Printed electropherogram
• Filename and plate position
• Start and end points
• Average signal-to-noise ratio (indicates labelling efficiency; should be 100 to 750)
• HiSQV indicates the number of bases called that have QV>20
• Client name and unique laboratory number
• Instrument name
• Date run commenced and finished
• Plate name
Trouble-shooting (Figure 5)
For each sample, approximately 1000 bases of sequence can be read; however, the accurate read
length depends on a variety of factors. The analysed electropherogram may display:
1. No recognisable sequence
2. Poor data and weak signal
3. Top-heavy sequence
4. Abrupt signal loss
5. Multiple sequences
6. Repeat sequences
7. Slippage after homopolymer regions
8. Delayed migration
9. Excess dye peaks
10. Pull-up peaks and very strong signal
1. No recognisable sequence
Failed reactions are characterised by:
• The absence of clearly defined peaks in the raw data trace
• The absence of base calls in the analysed electropherogram
• Very low signal-to-noise ratios S/N G:<25 A:<25 T:<25 C:<25
The cause of a failed reaction may be due to:
• Insufficient or poor quality template and/or primer
• Absence of primer annealing site or mutation in primer binding site
• Failed sequencing reaction or clean-up
Recommended actions include:
• Check template and/or primer concentrations and quality
• Check primer binding site and primer design
• Check ethanol concentrations and centrifugation speed and times
Figure 6 No analysed data is present because the signal-to-noise level is below the threshold for
bases to be called
• Only background noise is present in the analysed trace
• Peaks are absent from the raw data profile
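One of the checks recommended for failed reactions is the primer binding site. If the template sequence is known, a simple sketch is to count exact matches of the primer and of its reverse complement: zero suggests a missing or mutated site, and more than one points to the multiple priming sites discussed in section 5. This is exact matching only (an assumption that ignores mismatch-tolerant annealing), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_priming_sites(template: str, primer: str) -> int:
    """Count exact (possibly overlapping) primer matches on either strand."""
    template, primer = template.upper(), primer.upper()
    count = 0
    # A set avoids counting a palindromic primer twice at the same site.
    for probe in {primer, reverse_complement(primer)}:
        start = template.find(probe)
        while start != -1:
            count += 1
            start = template.find(probe, start + 1)
    return count


print(count_priming_sites("TTGATTCAAAGAATCTT", "GATTC"))  # 2 (one site per strand)
```

In a real check the 3′ end of the primer matters most, so a near-match with a 3′ mismatch may still fail to prime.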
2. Poor data and weak signal
Reactions displaying low signal strength are characterised by:
• Very low peak height in the raw data trace and the presence of dye blobs
• In the analysed electropherogram, base calls fade off before the end of the read
• Very low signal-to-noise ratios S/N G:<50 A:<50 T:<50 C:<50
Low signal strength may be the result of:
• Insufficient or poor quality template and/or primer
• Poor primer design (low Tm) or mutation in primer binding site
• Inferior reagents used or poor clean-up
Recommended actions include:
• Check template and/or primer concentrations and quality
• Check primer binding site and primer design
• Check sequencing reagents and clean-up protocol
Figure 7 Poor data, weak signal and the presence of dye blobs are often a result of low template
concentration
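"Base calls fade off before the end of the read" can be located from the exported QVs with a sliding-window mean. The sketch below is a hypothetical heuristic, not a feature of the analysis software: the 20-base window and the QV 20 cut-off are assumptions, and the low-quality bases at the very start of every read are ignored by requiring a good window first.

```python
from typing import List, Optional


def fade_off_position(qvs: List[int], window: int = 20, min_qv: float = 20.0) -> Optional[int]:
    """Return the start index of the first window whose mean QV falls below
    `min_qv` after at least one good window, or None if the read never fades."""
    seen_good = False
    for start in range(len(qvs) - window + 1):
        mean = sum(qvs[start:start + window]) / window
        if mean >= min_qv:
            seen_good = True  # the read has reached good-quality sequence
        elif seen_good:
            return start      # quality has dropped after a good stretch
    return None  # never fades (or never becomes good, as in a failed read)


qvs = [5] * 10 + [40] * 100 + [5] * 50
print(fade_off_position(qvs))  # 102
```

The returned index is the start of the first poor window, so the actual drop lies within the following 20 bases.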
3. Top-heavy sequence
Reactions displaying top-heavy sequence are characterised by:
• Very high peaks in the raw data trace that fade off abruptly
• In the analysed electropherogram, base calls fade off before the end of the read
• An excess of short fragments is generated, and these are preferentially injected into the capillary
Top-heavy sequence may be the result of:
• Too much template used in the sequencing reaction
• Too much primer used in the sequencing reaction
Recommended actions include:
• Check template concentration
• Check primer concentration (use 3.2 pmol of primer)
Figure 8 Sample set up with too much template. Template and primers are exhausted at the
beginning of cycle sequencing, creating an excess of short fragments
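Getting 3.2 pmol of primer into a reaction is a one-line calculation, because 1 µM equals 1 pmol/µL: volume (µL) = amount (pmol) / stock concentration (µM). The stock concentrations below are illustrative assumptions, not values from this guide.

```python
def primer_volume_ul(target_pmol: float = 3.2, stock_um: float = 3.2) -> float:
    """Volume (uL) of primer stock that delivers `target_pmol`.

    1 uM = 1 pmol/uL, so volume = amount / concentration.
    """
    if stock_um <= 0:
        raise ValueError("stock concentration must be positive")
    return target_pmol / stock_um


for stock in (1.6, 3.2, 10.0):
    print(f"{stock} uM stock -> {primer_volume_ul(stock_um=stock):.2f} uL")
# 1.6 uM stock -> 2.00 uL
# 3.2 uM stock -> 1.00 uL
# 10.0 uM stock -> 0.32 uL
```

Volumes much below 1 µL are hard to pipette accurately, which is one reason to dilute a concentrated stock first.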
4. Abrupt signal loss
Abrupt signal loss is often characterised by:
• Very high peaks in the raw data trace that stop abruptly
• In the analysed electropherogram, base calls suddenly stop before the end of the read
Abrupt signal loss may be the result of:
• Secondary structure in the template
• High GC content
• Primer dimer contamination
Recommended actions include:
• Sequence complementary strand
• Use a primer that anneals at a different position
• Incubate the reaction at 96 degrees C for 10 minutes before cycling
• Increase the extension temperature by 2 to 3 degrees C
• Increase denaturation temperature to 98 degrees C
• Add DMSO to a final concentration of 5%
• Double all reaction components and incubate at 98 degrees C for 10 minutes before cycling
• Linearise the DNA with a restriction enzyme
• Shear the insert into smaller fragments (<200 bp) and subclone
• Redesign primer to avoid primer dimer formation
Figure 9 Sample displays abrupt signal loss due to the presence of secondary structure
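Adding DMSO to a final concentration of 5% follows from C1 × V1 = C2 × V2. The sketch assumes neat (100%) DMSO and that it replaces an equal volume of water so the total reaction volume is unchanged; the 10 µL and 20 µL reaction volumes are illustrative assumptions.

```python
def additive_volume_ul(total_ul: float, final_pct: float = 5.0, stock_pct: float = 100.0) -> float:
    """C1*V1 = C2*V2: volume of additive stock giving `final_pct` in `total_ul`."""
    return final_pct * total_ul / stock_pct


for total in (10, 20):
    print(f"{total} uL reaction -> {additive_volume_ul(total):.2f} uL DMSO")
# 10 uL reaction -> 0.50 uL DMSO
# 20 uL reaction -> 1.00 uL DMSO
```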
5. Multiple sequences
Reactions displaying multiple sequences are characterised by:
• Lower peaks in the raw data trace
• More than one sequence trace in the analysed data trace
• More than one sequence commencing after base 50 to 100 (at the multiple cloning site, MCS)
Multiple sequences may be the result of:
• Mixed plasmid preparation
• Multiple PCR products
• Frameshift mutation
• Primer-dimer contamination
• Multiple priming sites
• Multiple primers in reaction
• Primer with N-1 contamination
• Slippage after homopolymer or repeat regions in the template
Recommended actions include:
• Re-isolate the DNA from a pure colony and re-sequence
• Check PCR template on gel for single band
• Use a different primer after the mutation or sequence the complementary strand
• Optimise PCR amplification or redesign primer
• Make sure primer only has one priming site
• Ensure only one primer has been used
Figure 10 Multiple sequences – plasmid template. In this example overlapping sequences start
after the multiple cloning region in the vector because more than one colony was purified
Figure 11 Multiple sequences – PCR template. The presence of more than one PCR template in a
reaction will result in overlapping sequences being generated
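Overlapping sequences show up as a second, clearly visible peak under many base calls. A rough way to flag such positions from exported peak heights is to compare the two tallest dye channels at each call. The sketch is hypothetical: the input layout (one dict of G/A/T/C heights per base call) and the 0.33 ratio are assumptions chosen for illustration, not instrument settings.

```python
from typing import Dict, List


def mixed_positions(peaks: List[Dict[str, float]], ratio: float = 0.33) -> List[int]:
    """Return indices where the second-highest channel is at least `ratio`
    of the highest, a rough sign of overlapping sequence."""
    flagged = []
    for i, heights in enumerate(peaks):
        top, second = sorted(heights.values(), reverse=True)[:2]
        if top > 0 and second >= ratio * top:
            flagged.append(i)
    return flagged


peaks = [
    {"G": 30, "A": 1000, "T": 20, "C": 50},  # clean call
    {"G": 20, "A": 600, "T": 10, "C": 500},  # two strong peaks: mixed
    {"G": 0, "A": 0, "T": 0, "C": 0},        # no signal: skipped
]
print(mixed_positions(peaks))  # [1]
```

A run of flagged positions starting after base 50 to 100 matches the plasmid pattern in Figure 10; flags scattered from the start of the read fit a mixed PCR template (Figure 11).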
6. Repeat sequences
Reactions displaying repeat sequences are characterised by:
• The gradual decrease of peak height in the raw data trace after the repeat region
• In the analysed electropherogram, base calls fade off after the repeat region
Recommended actions include:
• Sequence the complementary strand
• Use a primer that anneals at a different position
Figure 12 Sample displays signal loss due to the presence of a repetitive sequence
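Before choosing a primer that anneals at a different position, it helps to know where short tandem repeats lie in the template. The sketch below uses a regular expression with a backreference to find a 2 to 6 base unit repeated at least `min_copies` times; the function name and the default of four copies are assumptions. Long single-base runs also match (as a two-base unit), and those are covered in section 7.

```python
import re


def find_tandem_repeats(seq: str, min_copies: int = 4):
    """Return (start, unit, copies) for each tandem repeat of a 2-6 base unit."""
    # Lazy {2,6}? tries the shortest unit first; \1 requires exact repeats of it.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq.upper())
    ]


print(find_tandem_repeats("GGT" + "CA" * 8 + "TTG"))  # [(3, 'CA', 8)]
```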
7. Homopolymer regions
Sequence data containing homopolymer regions display:
• Overlapping sequence following a homopolymer region due to slippage of the enzyme
Recommended actions include:
• Sequence the complementary strand
• Use a primer that anneals at a different position
• Use an anchored primer (i.e., a poly-T sequencing primer with an A, C or G base at its 3’ end,
designed to anneal to a poly-A region). The 3’ base anchors the primer in place at the
end of the homopolymer region
Figure 13 Long homopolymer T regions (or A regions) can cause problems due to enzyme
slippage. For example, in a 20-base “T” homopolymer region, 21 or 22 “T” bases may be incorporated
as well as 20, causing overlapping sequence after the homopolymer region
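Homopolymer runs long enough to cause slippage can be located in a known template before deciding on an anchored primer or on sequencing the complementary strand. The sketch below uses `itertools.groupby` to collapse runs of identical bases; the function name and the minimum run length of 8 are assumptions.

```python
import itertools


def find_homopolymers(seq: str, min_length: int = 8):
    """Return (start, base, length) for each run of one base >= min_length."""
    runs, pos = [], 0
    for base, group in itertools.groupby(seq.upper()):
        length = len(list(group))
        if length >= min_length:
            runs.append((pos, base, length))
        pos += length
    return runs


print(find_homopolymers("GC" + "T" * 20 + "AG"))  # [(2, 'T', 20)]
```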
8. Delayed migration
Sequence data displaying delayed migration show:
• Peaks commence after the usual start point of 600 to 800 in the raw data trace
• Peaks are not evenly spaced in the raw and analysed traces
• Poor base calls in the analysed electropherogram
Delayed migration may be the result of:
• Contaminating negative ions (salts or other contaminants) in the sample being
injected in preference to the labelled fragments
• Heavily overloaded samples (excess template used during sequencing)
Recommended actions include:
• Diluting the sample in deionised formamide and rerunning the sample can often correct
this problem and yield good data
Figure 14 Delayed samples often result from an excess of salt in the sample
9. Excess dye peaks at beginning of sequence
Sequence data displaying excess dye peaks have:
• Peaks of excess dye present in the raw data trace
• Dye blobs in the analysed data trace at positions 80, 120 and 190
• Low signal to noise ratios S/N G:<50 A:<50 T:<50 C:<50
The presence of dye blobs in the data may be the result of:
• Incorrect estimation of template concentration (i.e. insufficient used)
• Poor removal of unincorporated dye terminators
Recommended actions include:
• Check template concentration by agarose gel
• Use fresh ethanol and sodium acetate (at room temperature) at the correct concentrations
• With microfuge tubes, aspirate the supernatant rather than decanting
• Do not use denatured alcohol
• Do not leave reactions precipitating overnight
Figure 15 Incomplete removal of excess dyes during the post cycle sequencing cleanup can
obscure data at the beginning of the sequence
10. Pull-up peaks and very strong signal
Sequence data displaying pull-up peaks are characterised by:
• Very high peaks in the raw data trace
• Very high peaks in the analysed data trace with pull-up peaks and poor base calls
• Very high signal-to-noise ratios S/N G:>750 A:>750 T:>750 C:>750
Pull-up peaks may be the result of:
• Incorrect estimation of template concentration (i.e. too much used)
Recommended actions include:
• Diluting the sample in deionised formamide and rerunning the sample can often correct
this problem and yield good data
• Reduce the amount of template used in sequencing
Figure 16 Pull-up peaks and very high signal may result from use of too much template during
cycle sequencing
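The signal-to-noise thresholds scattered through this guide (below 25 for failed reactions, below 50 for weak signal or excess dye, 100 to 750 as the expected average, above 750 for pull-up) can be combined into a first-pass triage. This is a sketch under the assumption that a category applies only when all four dyes meet its threshold; the category names are hypothetical, and the traces should still be inspected.

```python
def classify_signal(sn: dict) -> str:
    """Triage per-dye S/N values (keys "G", "A", "T", "C") with this guide's thresholds."""
    values = [sn[dye] for dye in "GATC"]
    if all(v < 25 for v in values):
        return "failed"      # section 1: no recognisable sequence
    if all(v < 50 for v in values):
        return "weak"        # sections 2 and 9: weak signal or excess dye
    if all(v > 750 for v in values):
        return "too_strong"  # section 10: expect pull-up peaks
    mean = sum(values) / len(values)
    if 100 <= mean <= 750:
        return "ok"          # Figure 3: average S/N of 100 to 750
    return "inspect"         # falls between categories: look at the traces


print(classify_signal({"G": 200, "A": 300, "T": 250, "C": 220}))  # ok
print(classify_signal({"G": 10, "A": 12, "T": 8, "C": 15}))      # failed
```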