Melissa Kramer: Illumina Pipeline Overview

Casava Pipeline v1.8.2
Consensus Assessment of Sequence And VAriation
New ISAAC software
Much of this information comes
from the Illumina documentation
Introduction
Figure 1 Sequencing Data Analysis Workflow
Firecrest – Image Analysis
tiff image files
RTA allows on-the-fly image
processing and base calling.
Bustard – Base calling, quality calibration,
filtering and statistics
intensity files
sequence files
RTA output format: .bcl files, binary files that contain all basecalled
sequence reads and quality scores.
CASAVA – demultiplexing,
alignment and more statistics
Align reads with phageAlign or ELAND
ELAND output - export files
ELAND
- Efficient Large-Scale Alignment of Nucleotide Databases
eland_pair
-  align each read separately, then use uniquely aligning reads
to estimate read orientation and distance
- perl script picks the best read pair
-  anomaly file is created for reads that do not map in the
expected orientation or size range
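The eland_pair idea above (learn the expected orientation and distance from uniquely aligning pairs, then flag pairs that do not fit) can be sketched as follows. This is a toy illustration, not ELAND's code: the tuple layout, the function names, and the median plus/minus k x MAD window are all assumptions made for the example.

```python
from collections import Counter
from statistics import median

def estimate_pair_model(unique_pairs, k=3.0):
    """Estimate orientation and distance window from uniquely aligned pairs.

    unique_pairs: list of (pos1, strand1, pos2, strand2) tuples (hypothetical layout).
    """
    # Most common strand combination is taken as the expected orientation.
    orientations = Counter((s1, s2) for _, s1, _, s2 in unique_pairs)
    expected = orientations.most_common(1)[0][0]
    # Distances are measured only on pairs with the expected orientation.
    distances = [abs(p2 - p1) for p1, s1, p2, s2 in unique_pairs
                 if (s1, s2) == expected]
    centre = median(distances)
    mad = median(abs(d - centre) for d in distances)
    spread = k * max(mad, 1)  # assumed window width, not ELAND's rule
    return expected, (centre - spread, centre + spread)

def is_anomalous(pair, model):
    """True if the pair has the wrong orientation or falls outside the window."""
    expected, (low, high) = model
    p1, s1, p2, s2 = pair
    return (s1, s2) != expected or not (low <= abs(p2 - p1) <= high)

pairs = [(100, "+", 400, "-"), (1000, "+", 1310, "-"),
         (5000, "+", 5290, "-"), (7000, "+", 7300, "-"),
         (9000, "-", 9300, "+")]
model = estimate_pair_model(pairs)
```

Pairs for which `is_anomalous` returns True correspond to the reads that would be written to the anomaly file.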
eland_rna
-abundant sequences files: mitochondrial DNA, ribosomal region
sequences, 5S RNA (optional), and other contaminants
-splice junction set files
New ELAND v2e
•  Multi-seed and gapped alignment (since
CASAVA 1.6)
•  Improved repeat resolution (multiple
overlapping seeds to anchor into unique)
•  Orphan alignment
–  Try to align orphaned mate with defined
window (default ~450bp)
Run time improvements
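Orphan alignment can be pictured as a local search: one read of the pair is anchored, and the unaligned mate is looked for only inside a window next to it. The sketch below is a minimal illustration under assumptions, not ELAND v2e's algorithm: it handles only a forward-strand anchor, counts mismatches without gaps, and `rescue_orphan`, `window` and `max_mismatches` are hypothetical names.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def rescue_orphan(reference, anchor_pos, anchor_len, mate_seq,
                  window=450, max_mismatches=2):
    """Search downstream of a forward-strand anchor for the orphaned mate.

    Returns (position, mismatches) of the best hit inside the window, or None.
    """
    target = revcomp(mate_seq)  # mate is expected on the opposite strand
    start = anchor_pos + anchor_len
    end = min(len(reference), anchor_pos + window)
    best = None
    for pos in range(start, end - len(target) + 1):
        candidate = reference[pos:pos + len(target)]
        mismatches = sum(a != b for a, b in zip(candidate, target))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best

reference = "A" * 50 + "GATTACAGATTACC" + "T" * 100 + "CCGGAATTCGCGTA" + "A" * 50
mate = revcomp("CCGGAATTCGCGTA")
hit = rescue_orphan(reference, anchor_pos=50, anchor_len=14, mate_seq=mate)
```

Shrinking `window` below the true fragment length makes the search return None, which is the case where the mate stays an orphan.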
Basic Sample Sheet
Generating the Sample Sheet
The user generated sample sheet (SampleSheet.csv file) describes the samples and
projects in each lane, including the indexes used. The sample sheet should be located in
the BaseCalls directory of the run folder. You can create, open, and edit the sample sheet
in Excel.
The sample sheet contains the following columns:
Column Header    Description
FCID             Flow cell ID
Lane             Positive integer, indicating the lane number (1-8)
SampleID         ID of the sample
SampleRef        The reference sequence for the sample
Index            Index sequence
Description      Description of the sample
Control          Y indicates this lane is a control lane, N means sample
Recipe           Recipe used during sequencing
Operator         Name or ID of the operator
SampleProject    The project the sample belongs to
You can generate it using Excel or another text editing tool that allows .csv files to be
saved. Enter the columns specified above for each sample, and save the Excel file in the
.csv format. If the sample you want to specify does not have an index sequence, leave
the Index field empty.
••ŽŠ• ‘Š›ŠŒŽ›œ
›˜“ŽŒ Š— œŠ–™•Ž —Š–Žœ ’— ‘Ž œŠ–™•Ž œ‘ŽŽ ŒŠ——˜ Œ˜—Š’— ’••ŽŠ• Œ‘Š›ŠŒŽ›œ —˜
Š••˜ Ž ‹¢ œ˜–Ž ’•Ž œ¢œŽ–œǯ ‘Ž Œ‘Š›ŠŒŽ›œ —˜ Š••˜ Ž Š›Ž ‘Ž œ™ŠŒŽ Œ‘Š›ŠŒŽ› Š—
‘Ž ˜••˜ ’—DZ
" > @ ?
! A _ )LOHV
— ‘Ž ŠœŽŠ••œ ˜•Ž› ‘Ž›Ž ’œ Š—˜‘Ž› Œ˜—’ǯ¡–• ’•Ž Œ˜—Š’—’— ‘Ž –ŽŠȬ’—˜›–Š’˜—
Š‹˜ž ‘Ž ‹ŠœŽ ŒŠ••Ž› ›ž—œǯ
Run Folder Structure
110608_Nirvana_0063_AB0ABTABXX/Data/Intensities/BaseCalls
RUN (or user specified location)
    Unaligned
        Project_DNA_A
            Sample_NA10831
                NA10831_ATCACG_L002_R1_001.fastq.gz
                NA10831_ATCACG_L002_R1_002.fastq.gz
                NA10831_ATCACG_L002_R1_003.fastq.gz
                NA10831_ATCACG_L002_R2_001.fastq.gz
                NA10831_ATCACG_L002_R2_002.fastq.gz
                NA10831_ATCACG_L002_R2_003.fastq.gz
                NA10831_ATCACG_L003_R1_001.fastq.gz
                NA10831_ATCACG_L003_R1_002.fastq.gz
                NA10831_ATCACG_L003_R2_001.fastq.gz
                NA10831_ATCACG_L003_R2_002.fastq.gz
            Sample_NA108??
                NA108??_CGATGT_L002_R1_001.fastq.gz
                NA108??_CGATGT_L002_R1_002.fastq.gz
                NA108??_CGATGT_L002_R2_001.fastq.gz
                NA108??_CGATGT_L002_R2_002.fastq.gz
        Project_RNA_?
            Sample_NA108?1
                NA108?1_CGATGT_L003_R1_001.fastq.gz
                NA108?1_CGATGT_L003_R2_001.fastq.gz
            Sample_NA108?0
                NA108?0_TTAGGC_L003_R1_001.fastq.gz
                NA108?0_TTAGGC_L003_R2_001.fastq.gz
        Project_Control
Fastq files
... produced by the DCC at EBI and NCBI, they represent the most popular format used in
projects where the NIH is involved. We will use the same approach. The FASTQ files are
gzip compressed to minimize storage. Many popular aligners are able to directly read
compressed FASTQ files.
A sample entry is provided and explained below:
    @EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    +
    BBBBCCCC?<A?BC?7@@???????DBBA@@@@A@@
The first line is prefixed by the "@" symbol and contains the read name. These names are
parsed until the first encountered whitespace. Due to this behavior, adding additional tags to
the header line is not problematic for extant FASTQ parsers.
The second line contains the sequence bases.
The third line is prefixed by a + symbol and sometimes repeats the read name. The read name
is omitted in the minimal FASTQ case.
The fourth line contains the base qualities where BQ + 33 = ASCII value shown in the base
quality string.
The header line is interpreted as follows:
    @<instrument-name>:<run ID>:<flowcell ID>:<lane-number>:<tile-number>:<x-pos>:<y-pos> <read number>:<is filtered>:<control number>:<barcode sequence>
Note the space between <y-pos> and <read number>. In a paired end run, read 1 and read 2
will be in different FASTQ files, but we want them to have matching template names. The
name up to the space will also be used as the read name in the final BAM file.
<read number> will typically be 1 or 2, but the field can support other values. (For example,
certain indexing formats lead to 3 reads.)
<is filtered> is Y if the read is filtered, N otherwise.
<control number> is 0 when none of the control bits are on, otherwise it is an even number.
<barcode sequence> represents the USE_BASES masked barcode sequence, empty otherwise.
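Both the FASTQ file names in the Unaligned folder and the read headers inside the files follow fixed layouts, so they are easy to take apart in a script. The sketch below is an illustration under assumptions: the file-name pattern `<SampleID>_<index>_L<lane>_R<read>_<set>.fastq.gz` is inferred from the example tree, the function and key names are hypothetical, and the record is the documentation example entry.

```python
import re

# Assumed file-name layout: <SampleID>_<index>_L<lane>_R<read>_<set>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>\d)_(?P<set>\d{3})\.fastq\.gz$"
)

def parse_fastq_name(filename):
    """Split a FASTQ file name into sample, index, lane, read and set number."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected FASTQ file name: {filename}")
    fields = match.groupdict()
    for key in ("lane", "read", "set"):
        fields[key] = int(fields[key])
    return fields

def parse_header(line):
    """Split a header line at the first space, then at the colons."""
    name, _, description = line.rstrip("\n")[1:].partition(" ")
    instrument, run_id, flowcell, lane, tile, x_pos, y_pos = name.split(":")
    read_number, is_filtered, control_number, barcode = description.split(":")
    return {
        "template_name": name,  # shared by read 1 and read 2 of a pair
        "instrument": instrument, "run_id": run_id, "flowcell": flowcell,
        "lane": int(lane), "tile": int(tile),
        "x_pos": int(x_pos), "y_pos": int(y_pos),
        "read_number": int(read_number),
        "is_filtered": is_filtered == "Y",
        "control_number": int(control_number),
        "barcode": barcode,
    }

def phred33(quality_string):
    """Base quality = ASCII value - 33."""
    return [ord(char) - 33 for char in quality_string]

record = [
    "@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG",
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "+",
    "BBBBCCCC?<A?BC?7@@???????DBBA@@@@A@@",
]
name_fields = parse_fastq_name("NA10831_ATCACG_L002_R1_001.fastq.gz")
header = parse_header(record[0])
qualities = phred33(record[3])
```

Keeping `template_name` separate mirrors the note above: everything before the space is what read 1 and read 2 have in common.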
Demultiplex Stats file (Indexing)
QC metrics (Summary file)
For aligned paired reads, the summary file shows read
orientation and average insert size for each lane.
ISAAC Aligner and Variant Caller
(processing in less than half the time compared to Casava)
Aligner
-  Sort reference index by 32mers
-  Find candidate mappings by
32bp seed search
-  Select best mapping (3’ LQ
and adapter trimming)
-  Assign alignment scores
(use base quality and
position of mismatches)
-  Output is sorted de-duped
BAM
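The first two aligner steps (a reference index sorted by k-mer, then a seed search that proposes candidate mappings) can be illustrated with a toy example. This is not the ISAAC implementation: it uses exact seeds only, a plain sorted Python list with binary search, and k=4 instead of 32 so the example stays readable; all function names are hypothetical.

```python
from bisect import bisect_left

def build_sorted_index(reference, k=32):
    """List of (k-mer, position) pairs, sorted by k-mer."""
    return sorted((reference[i:i + k], i)
                  for i in range(len(reference) - k + 1))

def seed_hits(index, seed):
    """Binary-search the sorted index for every position of an exact seed."""
    i = bisect_left(index, (seed,))
    hits = []
    while i < len(index) and index[i][0] == seed:
        hits.append(index[i][1])
        i += 1
    return hits

def candidate_mappings(read, index, k=32):
    """Collect the read start positions implied by non-overlapping seeds."""
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        for hit in seed_hits(index, read[offset:offset + k]):
            candidates.add(hit - offset)
    return sorted(candidates)

reference = "TTGACCAGTACGGATCCTAAGC"
index = build_sorted_index(reference, k=4)
candidates = candidate_mappings("CAGTACGG", index, k=4)
```

Both seeds of the read point to the same start position, so a single candidate survives; choosing the best mapping and scoring it would follow as separate steps.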
Variant Caller
-  Call SNVs and small indels
(<50bp)
-  Bayesian SNP caller
computes probability of each
genotype
-  Filters are applied (quality,
depth, etc)
-  Reads are realigned around
indels (Bayesian indel caller)
-  gVCF output
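"Computes probability of each genotype" can be made concrete with a deliberately simple model. The sketch below is not the ISAAC variant caller: it assumes a diploid site, a binomial read-count likelihood with a single error rate, and made-up priors; the function and genotype labels are hypothetical.

```python
from math import comb

def genotype_posteriors(ref_count, alt_count, error_rate=0.01, priors=None):
    """Posterior probability of ref/ref, ref/alt and alt/alt from read counts."""
    if priors is None:
        # Assumed priors for illustration only.
        priors = {"ref/ref": 0.998, "ref/alt": 0.001, "alt/alt": 0.001}
    # Expected fraction of alt-supporting reads under each genotype.
    alt_fraction = {"ref/ref": error_rate, "ref/alt": 0.5,
                    "alt/alt": 1.0 - error_rate}
    total_reads = ref_count + alt_count
    joint = {}
    for genotype, p in alt_fraction.items():
        likelihood = comb(total_reads, alt_count) * p ** alt_count * (1 - p) ** ref_count
        joint[genotype] = priors[genotype] * likelihood
    normaliser = sum(joint.values())
    return {genotype: value / normaliser for genotype, value in joint.items()}

posteriors = genotype_posteriors(ref_count=10, alt_count=10)
best_call = max(posteriors, key=posteriors.get)
```

Filters on quality and depth would then be applied to calls like `best_call`; they are not part of this sketch.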
Sequencing Analysis Viewer (SAV)
Excellent Quality Metrics
The figure below shows a screen shot from SAV displaying a run with excellent quality
metrics. Note the trend of high Q-scores (%>=Q30) across each cycle (left side) and the
cumulative distribution of %>=Q30 among the reads (right side).
Figure 3 Screenshot Showing Excellent Quality Metrics
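The per-cycle Q30 percentage plotted by the viewer can be approximated from FASTQ quality strings. This is a rough sketch, not how SAV computes its metrics (SAV reads the run's own metrics files): it assumes Phred+33 encoding, equal-length reads, and counts scores of 30 or higher; the function name is hypothetical.

```python
def percent_q30_per_cycle(quality_strings, threshold=30, offset=33):
    """Percentage of bases with Q >= threshold at each cycle (read position)."""
    cycles = len(quality_strings[0])
    percentages = []
    for cycle in range(cycles):
        scores = [ord(quals[cycle]) - offset for quals in quality_strings]
        passing = sum(score >= threshold for score in scores)
        percentages.append(100.0 * passing / len(scores))
    return percentages

# 'I' is Q40, '+' is Q10, '5' is Q20 in Phred+33.
per_cycle = percent_q30_per_cycle(["II5", "I+5"])
```

A good run shows values that stay high across cycles and fall off only slowly toward the end of the read.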
Save thumbnail images
SAV quality metrics charts
SAV check images
SAV Summary
Illumina VariantStudio
Interface Commands
The VariantStudio interface is an interactive view of genes and variants in a selected
sample. Use the interface commands to import variants, sort data, apply filters, and export
data to a report.
Figure 3 VariantStudio Interface
•  Import vcf files
•  Annotate, filter, and
classify variants
•  Filter based on family
structure and disease
model
•  Somatic variants/
COSMIC
•  Generate reports with
histograms and charts
A Menu and commands: Contains commands for managing the project, annotating
variants, and reporting results. Commands are organized in four tabs: Home,
Annotation and Classification, Reports, and Help.
B Filters pane: Provides options for filtering data using any combination of filters.
C Filter history: Opens the history panel that shows all filters applied to the project.
D Table tabs: Navigation between the Variants table, Genes table, and No-Call Regions
table.
E Gene view: Shows a graphical representation of the selected gene.
F Table views: View of data shown in the Variants table, Genes table, and No-Call
Regions table. Use the table tabs to toggle between table views.
MiSeq Reporter
MiSeq Reporter Workflows
Instrument Control Software (MCS)
RTA
Images
Base calls & Quality Scores
MiSeq Reporter
Resequencing
Amplicon
Library QC
Small RNA
Denovo Assembly
16S Metagenomics
Limited Visualization via HTTP interface
MiSeq Workflows
•  Library QC
•  Resequencing
•  Amplicon (up to 384 loci in 96 samples)
•  De Novo Assembly (<20Mb)
•  Small RNA
•  Metagenomics (16S rRNA)
MiSeq PhiX Validation Run
Paired end 100 cycle run
Illumina Control Library (PE 151 MiSeq run)
Coverage report: coverage depth, mismatches, quality score (avg), variant call score