Marburgvirus - St. Olavs Hospital

We propose to develop a system for identifying and tracking all microbial pathogens in real
time, using genome sequencing of infected material from patients at hospitals. Data will be
generated on a daily basis using Oxford Nanopore MinION chips (or similar technology), from
clinical isolates, as outlined in the figure below. The preliminary work done in this proposal will
be done as a collaboration between ORNL, Emory University and the Centers for Disease Control
in Atlanta, Georgia. This can be expanded later (with additional external funding) to include
other regional hospitals, and eventually form a network of reporting hospitals across the nation to
allow for the monitoring and tracking of epidemiological outbreaks in ‘real-time’, within hours
of the patient’s visit to the hospital.
Third Generation Sequencing for Rapid Biosurveillance
[Figure labels: Clinical Sample; Genome Sequences; Monitor; What is it? How do we treat? Have we seen this?]
Principal Investigator: David W. Ussery
UT-Battelle Business Sensitive
NTNU course MOL8013 - Bacterial Genomics, NTNU / St. Olavs Hospital, Trondheim, Norway
Dave Ussery
Thursday, 21 May, 2015
http://www.genomicepidemiology.org/
Benchmarking methods for Genomic Taxonomy
Journal of Clinical Microbiology, 52:1529-1539, (2014).
Traditional diagnosis: Months
Source: REVIEWS, Nature Reviews Genetics, Volume 13, September 2012, p. 602; doi:10.1038/nrg3226; published online 7 August 2012. Correspondence to D.W.C. (derrick.crook@ndcls.ox.ac.uk).

Figure 1 | Principles of current processing of bacterial pathogens. A schematic representation of the current workflow for processing samples for bacterial pathogens is presented, showing high complexity and a typical timescale of a few weeks to a few months. The schematic is an approximation that highlights the principal steps in the workflow; it is not intended to be a comprehensive or precise description. Samples that are likely to be normally sterile are often cultured on a rich medium that will support the growth of any culturable organism. Samples that are contaminated with colonizing flora present a challenge for growing the infecting pathogen. Many types of culture media (referred to as selective media) are used to favour the growth of the suspected pathogen; this approach is particularly important for culturing pathogens from faeces. Boxes A to H arbitrarily represent the many different media for culture. The medium H represents a medium designed for growing mycobacteria that have specific growth requirements. When an organism is growing, the morphological appearance and density of growth are properties that need specialist knowledge for deciding whether it is likely to be pathogenic. The likely pathogens are then processed through a complex pathway that has many contingencies to determine species and antimicrobial susceptibility. Broadly, there are two approaches. One approach uses matrix-assisted laser desorption/ionization–time of flight (MALDI–TOF) mass spectrometry for species identification before setting up susceptibility testing. The other uses Gram staining followed by biochemical testing to determine species; susceptibility testing is often set up simultaneously with doing biochemical tests. Categorization of pathogens into groups of species is needed to choose the appropriate susceptibility-testing panel. Finally, depending on the species and perceived likelihood of an outbreak, a small subset of isolates may be chosen for further investigation using a wide range of typing tests that are often only provided by reference laboratories. The dashed lines and question marks are positioned arbitrarily to indicate that the further investigation is varied and happens in only a small number of cases.

[Figure 1 labels: Samples, on a scale from Sterile to Contaminated (Blood, Urine, Pus, Surface swabs, Sputum, Faeces); Media for culture (A to H); Rapid growers, 1–2 days; Mycobacterium spp., 1–3 weeks; Growth and subtyping; Susceptibility; Typing; 1–2 days; 1–6 weeks; e.g. Legionella spp.; All Mycobacterium tuberculosis.]

Article text visible on the slide (right-hand margin cut off; gaps marked [...]):

Clinical microbiology is a discipline that focuses on rapidly characterizing pathogen samples to [...] management of individual infected patients (diagnostic microbiology) and to monitor the epidemiology of infectious disease (public health microbiology). Applications in epidemiology include detecting outbreaks, monitoring trends in infection and identifying the emergence of new threats. Ongoing developments in DNA-sequencing technologies are likely to [...] diagnosis and monitoring of all pathogens, including viruses, bacteria, fungi and parasites, but for this [...] we focus on bacterial pathogens to demonstrate the likely changes that arise from the adoption of whole-genome sequencing.

Bacterial pathogens account for much of the worldwide burden of infection. For patients with bacterial infections, the crucial steps are to grow an isolate [...] specimen, to identify its species, to determine its pathogenic potential and to test its susceptibility to antimicrobial drugs. Together, this information facilitates specific and rational treatment of patients. For public health purposes, knowledge also needs to [...] about the relatedness of the pathogen to other [...] of the same species to investigate transmission [...] and to allow the recognition of outbreaks.

Current clinical microbiology
The principles behind diagnostic bacteriology have changed little over the past 50 years. Most of the output from a microbiological laboratory is dependent on isolating a viable organism. [...] The cardinal steps in processing a sample are isolating a pathogen, determining the species, testing antimicrobial susceptibility and virulence and, in specific settings, intra-species typing. The first three steps are crucial for the optimal management of an infected patient, and the last step is valuable for identifying outbreaks and surveillance.

Escherichia coli: A common inhabitant of the guts of many animals, but some strains can cause serious food poisoning, as reminded by the 2011 outbreak in Germany.
Mycobacterium tuberculosis: The causative agent of tuberculosis, it infects approximately one-third of the human population and claims over one million lives per year, making it the most deadly bacterial pathogen of humans.

Genome-based diagnosis
Genome-based diagnosis: Weeks

Figure 2 | Hypothetical workflow based on whole-genome sequencing. A schematic representation of the workflow anticipated after adoption of whole-genome sequencing is shown, with an expected timescale that could fit within a single day. The culture steps would be the same as those that are currently used in a routine microbiology laboratory. Some types of sample might be directly sequenced (see ‘Future directions’, not shown here). When a sample or likely pathogen is ready for sequencing, DNA will be extracted. This procedure is becoming simpler, as the input [...] (BOX 1) could enable sequencing without preparation. Therefore, bacterial genome sequencing in hours and possibly even minutes is a realistic prospect. After sequencing, the main processes for yielding information will be computational. The development of software and databases is a major challenge to overcome before pathogen sequencing can be deployed in clinical microbiology. Automated sequence assembly algorithms will be necessary to process the raw sequence data (BOX 1). This assembled sequence would then be analysed by modular software to determine species, relationship to other isolates of the same species, antimicrobial resistance profile and virulence gene content. Results of this analysis will be reported through hospital information systems. All of the results will also be used for outbreak detection and infectious diseases surveillance. These developments will require a new large database and other informatics technology and will take time to develop. In particular, it will need ‘intelligent systems’, which will incorporate elements of machine learning to allow automatic updating of key knowledge bases for species identification, antimicrobial resistance determination and virulence detection. Formal evaluation of such a solution will also need robust testing to ensure that it performs at least as well as current methods.

[Figure 2 labels: Samples, on a scale from Sterile to Contaminated (Blood, Urine, Pus, Surface swabs, Sputum, Faeces); Media for culture (A to H); Rapid growers, 1–2 days; Mycobacterium spp., 1–3 weeks; All isolates; Sequencer; Read processing and assembly, 1–12 hours; Database of all sequences, metadata, analysis results and visual analytics; Species knowledge base; Resistance knowledge base; Virulence; Relatedness; Local hospital clinical record system; National and international databases.]

Article text visible on the slide (gaps marked [...]):

[...] typing. Methods that are commonly used now include PCR (for example, multiple-locus variable number of tandem repeats (MLV–VNTR) analysis) [22], restriction fragment length polymorphisms (for example, pulsed field gel electrophoresis) [23] or fractional sequencing (for example, multi-locus sequence typing (MLST)) [24]. Through substantial investment in monitoring and reference facilities, turnaround time for these methods can be reduced to a few days. However, because most locations do not benefit from such facilities, typing typically contributes little to the immediate control of an outbreak.

[...] even though it is not always yet known how to interpret these data. However, the genome also includes vast amounts of additional data that are currently unavailable from routine processing, thus opening the prospect for large-scale research into pathogen genotype–phenotype associations from routinely collected data. The hurdles to implementing whole-genome sequencing in clinical and public health laboratories are substantial, as widespread adoption would require incorporating the knowledge from more than a century of characterizing pathogens (currently delivered by a skilled workforce) into an entirely new framework of mainly computer-[...]

Meta-Genome based diagnosis
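The Figure 2 caption describes "modular software" that takes an assembled sequence and reports species, resistance profile and related results. A minimal Python sketch of that idea follows. Everything in it is an assumption made for illustration: the function names, the toy sequences, and the choice of k-mer Jaccard similarity and exact substring matching are hypothetical stand-ins, not the methods of any real clinical pipeline (which would also handle reverse complements, sequencing errors and much larger reference databases).

```python
from dataclasses import dataclass


def kmers(seq: str, k: int) -> set:
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify_species(assembly: str, references: dict, k: int = 4) -> tuple:
    """Module 1: pick the reference whose k-mer set is most similar."""
    query = kmers(assembly, k)
    scores = {name: jaccard(query, kmers(ref, k)) for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def find_markers(assembly: str, markers: dict) -> list:
    """Module 2: report marker sequences found verbatim in the assembly."""
    target = assembly.upper()
    return sorted(name for name, seq in markers.items() if seq.upper() in target)


@dataclass
class Report:
    species: str
    similarity: float
    resistance_markers: list


def analyse(assembly: str, references: dict, markers: dict, k: int = 4) -> Report:
    """Run the modules in turn and collect one report per isolate."""
    species, similarity = identify_species(assembly, references, k)
    return Report(species, similarity, find_markers(assembly, markers))


# Toy data, invented for illustration only (not real genomes or genes).
REFERENCES = {
    "species_A": "ATGCGTACGTTAGC",
    "species_B": "TTTTGGGGCCCCAAAA",
}
MARKERS = {"marker_X": "GGATCC", "marker_Y": "AAAAAAAA"}

report = analyse("ATGCGTACGTTAGCGGATCC", REFERENCES, MARKERS)
print(report)
```

Each module only needs the assembled sequence and its own knowledge base, which is what would let the knowledge bases be updated independently, as the caption anticipates.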
Growth in GenBank, 1982-2014
[Figure: bases deposited over time, with series GenBank bases, WGS bases and SRA bases, compared with Moore’s law. Y-axis: 0,0E+00 to 3,0E+15 bases; x-axis: 12/1/82 to 4/1/14. Annotations: H. influenzae; 10 years; 20 years; 2nd Gen. Seq.; 3rd Gen. Seq.; MinION, ~2 hours (!).]
500,000,000,000,000 bp of DNA sequence
Box 1 | Sequencing platforms for clinical microbiology (Nature Reviews Genetics, Volume 13, September 2012, p. 602)

Released in 2005 with reads of ~110 b, the first next-generation sequencers, from Roche-454, could sequence bacterial genomes in a single run [69]. Initial applications were focused on diversity discovery. Later versions of the 454 platform have increased read length (~500 b) to approach that of Sanger sequencing but at a much lower cost and so have retained a role in producing high-contiguity assemblies of bacterial genomes.

Initially launched in 2006 with short (36 b) reads, Illumina Genome Analysers have captured the bulk of the sequencing market for both microbiology and larger organisms. With incrementally increasing capacity and read length, the current standard configuration (at the end of 2011) delivers ~300 Gb of raw data per eight-lane flow cell in the form of 100 b paired reads. Tagging each sample with its own 6–8 b index sequence allows at least 96 samples to be sequenced simultaneously in each lane. This approach makes the Illumina HiSeq platform useful and cost-effective for large bacterial sample collections.

It is clear that for most uses in microbiology, fast, compact bench-top machines will be preferred to the large, high-capacity machines designed for human sequencing. Two such platforms, the Ion PGM and the Illumina MiSeq, both of which use established chemistries that involve library preparation and amplification as the first steps in sequencing, are becoming popular among [...]

[The paragraph on the Oxford Nanopore platform is cut off at the left margin; legible fragments mention a commercial release in 2012, a single-use, USB-connected device and a capacity of ~150 Mb.]

[Figure: timeline 2004, 2005, 2011, 2012 with three panels, listed below. Second values in parentheses are as printed; the partly legible legend states that some dots are projections.]

Turnaround time: bacterial genome
Sanger: 60 days
454 Genome Sequencer: 2 days
Illumina HiSeq2000: 14 days (2 days)
Illumina MiSeq: 1 day (<1 day)
Ion PGM: 1 day (<1 day)
(Oxford Nanopore: 2 hours)

Raw daily output
Sanger: 1 Mb
454 Genome Sequencer: 30 Mb
Illumina HiSeq2000: [...] (100 Gb)
Illumina MiSeq: 1.4 Gb (7 Gb)
Ion PGM: 1 Gb (2 Gb)
(Oxford Nanopore: 15 Gb)

Cost per Mb assembled sequence
Sanger: $10,000
454 Genome Sequencer: $1,000
Ion PGM: $600 ($100)
Illumina MiSeq: $120 ($80)
Illumina HiSeq2000: $25 ($25)
(Oxford Nanopore: $25)
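To put the "Cost per Mb assembled sequence" figures from this slide in perspective, the short Python calculation below multiplies them by a genome size. The 5 Mb genome size is an assumption chosen as a round number typical of a bacterial genome, and only the first value listed for each platform is used (the parenthesised values and the Oxford Nanopore entry are left out because they appear to be projections). The function and variable names are hypothetical.

```python
# Cost per Mb of assembled sequence in US dollars, first value per platform
# as listed on the slide.
COST_PER_MB_USD = {
    "Sanger": 10_000,
    "454 Genome Sequencer": 1_000,
    "Ion PGM": 600,
    "Illumina MiSeq": 120,
    "Illumina HiSeq2000": 25,
}

GENOME_MB = 5.0  # assumed genome size in Mb, not a value from the slide


def genome_cost(genome_mb: float, cost_per_mb: float) -> float:
    """Cost of one assembled genome at a given per-Mb price."""
    return genome_mb * cost_per_mb


costs = {name: genome_cost(GENOME_MB, price) for name, price in COST_PER_MB_USD.items()}
fold_reduction = COST_PER_MB_USD["Sanger"] / COST_PER_MB_USD["Illumina HiSeq2000"]

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f} per {GENOME_MB:g} Mb genome")
print(f"Sanger to HiSeq2000: {fold_reduction:g}-fold cheaper per Mb")
```

Under these assumptions the per-genome cost falls from tens of thousands of dollars to roughly a hundred dollars, which is the point the slide makes about routine sequencing becoming affordable.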
Data Throughput & Generation Speed
MinION (3rd Gen. Seq.)
● Single molecule sequencing!
● Fast
● Cheap
● High Throughput
Metagenomics - no cultivation needed!
Emory University, 21 August, 2014
Meta-Genomic Species (MGS)
Fecal bacteria are difficult to classify.
Pathogens?
>400 metagenome samples, 10 Tbytes, 9 Million genes
[Figure: panels for gene richness (0 to 8e+05), gene-rich MGS per sample (50 to 300) and taxonomical breakdown of MGS per sample (0% to 100%); x-axis 1 to 741. Legend: Taxonomy unknown, Spirochaetes, Synergistetes, Euryarchaeota, Fusobacteria, Lentisphaerae, Verrucomicrobia, Proteobacteria, Actinobacteria, Bacteroidetes, Firmicutes. Sample annotations: Enterotype (R, P, B); Crohn’s disease.]
Even phyla are uncertain (using BLAST) for most species.
H. Bjørn [...]
Nature Biotechnology, 32:822-828, (2014).
Who is there?
The genome sequence is the best ‘unique identifier’….
[Figure: phylogenetic tree. BACTERIA: Chlamydia, Cyanobacterium, Chlorobi, Flavobacteria, Planctomycetes, Actinobacteria, Proteobacteria, Bacteroidetes, Spirochaetes, Clostridium, Bacillus, Firmicutes, Chloroflexi, Thermus, Acidobacteria, Deinococcus, Thermotoga, Aquificae. ARCHAEA: Euryarchaeota, Crenarchaeota. EUCARYA: Giardia, Slime mold, Saccharomyces, Babesia, Trypanosoma (unicellular eukaryotes, protozoans); Animals, Plants (macro-organisms).]
Fig. 1.1 A phylogenetic tree displaying the genetic distances between members of the three super-kingdoms of life: Bacteria, Archaea, and Eucarya. The represented bacterial genera will appear in [...]
dtree.ornl.gov
~3000 Complete Genomes
AAI (average amino acid identity) tree
[Figure: AAI tree; labels include Escherichia coli, Escherichia fergusonii, O104:H4, O157:H7 and Shigella.]
CoZEE Zoonoses Network Autumn Conference, 11 November, 2014
Comparison of 2084 E. coli genomes
Pansize: 89,003 unique protein families
coresize: 3,188 for a cutoff of 0.95 for core
coresize: 304 for a cutoff of 1.00 for core
Comparison of 88 E. coli O157 genomes
Pansize : 12,136 unique protein families
coresize: 4,108 for a cutoff of 0.95 for core
coresize: 3,042 for a cutoff of 1.00 for core
E. coli O157 genomes range from 4.9 to 6.2 Mbp
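The pan- and core-genome sizes quoted for the E. coli comparisons depend on a cutoff: the fraction of genomes in which a protein family must be present to count as "core". The Python sketch below shows one way such numbers could be computed from a presence/absence table. It is a sketch under stated assumptions: the function name, the toy table and the use of "greater than or equal to" for the cutoff are hypothetical, and the way protein families were clustered for the slides is not given in the source.

```python
def pan_core_sizes(presence: dict, n_genomes: int, core_cutoff: float) -> tuple:
    """Return (pan-genome size, core-genome size).

    presence maps each protein family to the set of genome ids that carry it.
    A family is counted as core when it occurs in at least core_cutoff
    (a fraction between 0 and 1) of the n_genomes genomes.
    """
    pan_size = len(presence)
    core_size = sum(
        1 for genomes in presence.values()
        if len(genomes) / n_genomes >= core_cutoff
    )
    return pan_size, core_size


# Toy presence/absence table for four hypothetical genomes g1..g4.
PRESENCE = {
    "famA": {"g1", "g2", "g3", "g4"},  # present everywhere
    "famB": {"g1", "g2", "g3"},        # missing from one genome
    "famC": {"g2"},                    # accessory, single genome
    "famD": {"g1", "g2", "g3", "g4"},  # present everywhere
}

strict = pan_core_sizes(PRESENCE, n_genomes=4, core_cutoff=1.00)
relaxed = pan_core_sizes(PRESENCE, n_genomes=4, core_cutoff=0.75)
print("cutoff 1.00:", strict)
print("cutoff 0.75:", relaxed)
```

Under this definition, a cutoff of 0.95 over 2084 genomes would require a family to be present in at least 1980 of them (0.95 × 2084 = 1979.8), while a cutoff of 1.00 requires all 2084. A single incomplete or misannotated genome is then enough to remove a family from the strict core, which is consistent with the large drop in core size between the two cutoffs on the 2084-genome slide.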
Salmonella enterica Typhimurium DT104
315 S. Typhimurium DT104 isolates sampled from 1969 to 2012 from six continents.
297 MDR S. Typhimurium DT104 isolates.
Nature Genetics, submitted 2014.
Dissemination of S. enterica Typhimurium DT104.
Nature Genetics, submitted 2014.
V. cholerae BLAST atlas, with 140 genomes
PNAS | November 20, 2012 | vol. 109 | no. 47
Haiti non-O1 strains (14 strains, Rita Colwell)
Haiti O1 strains (11 strains, Rita Colwell)
Nepal O1 strains (24 strains, Paul Keim)
Haiti O1 strains (104 strains, CDC)
Finished genomes are important!
“Viruses have gotten a bad rap,” said Ken Cadwell, an immunologist at New York University School of Medicine. “They don’t always cause disease.”
The New York Times, 20 November, 2014

DNA VIRUSES
Routine sequencing of DNA viruses has produced a large number of viral genomes that highlight the remarkable variability of viruses. The differences between the genomes of laboratory strains and clinical isolates of the same virus can be substantial, underscoring the need to routinely sequence clinical isolates [21].

…60–99% of the sequences generated in different viral metagenomic studies are not homologous to known viruses.
Mokili et al. 2012

Viral Detection and Research: A review of publications featuring Illumina® Technology (current as of 30 August 2013)
http://res.illumina.com/documents/products/research_reviews/viral_detection_research_review.pdf
Viruses
[Figure: the Fig. 1.1 phylogenetic tree of Bacteria, Archaea and Eucarya, shown again with the added label "Viruses".]
RefSeq viruses
4050 genomes
[Figure: circular tree of RefSeq virus genomes; labelled groups include Potyviridae, Papillomaviridae, Anelloviridae, Caudovirales (phages), Geminiviridae, Reoviridae, Herpesvirales, Marburgvirus and Ebolavirus.]
family Filoviridae
Marburgvirus
Zaire Ebolavirus genomes
1976
1995 / 1996
2007 / 2008
2014
family Filoviridae
Marburgvirus
Ebolavirus - NC_002549, Marburgvirus - NC_024781
[Figure: genome map. Genes: NP, VP35, VP40, GP, VP30, VP24, L. InterPro domains shown: IPR008986, IPR008609, IPR014023-IPR026890, IPR014459, IPR002561, IPR002953, IPR009433.]
Cuevavirus - NC_016144
[Figure: genome map. Genes: NP, VP35, VP40, GP4, GP5, GP7, VP30, VP24, L. Domains shown: IPR014023-IPR026890-PF14314, IPR008986, IPR008609, IPR002561, IPR002953, IPR014459, IPR009433.]
[Figure: two Venn diagrams comparing Marburgvirus, Cuevavirus and Ebolavirus, one for Functional Domains and one for Full length protein alignments; counts shown include 2, 3, 4, 6 and 7.]
Finished genomes are important! Context matters!
Center for Biological Sequence Analysis
http://www.cbs.dtu.dk/
GENOME ATLAS: Z. ebolavirus, 18,959 bp
[Figure: linear genome atlas, 0k to 17.5k, Resolution: 8. Lanes: A) Th weak and B) Th strong (T helper); C) CTL weak and D) CTL strong (Cytotoxic T Lymphocytes); E) Variation (genome seq. variation); F) Annotations: CDS + (NP, VP35, VP40, GP, VP30, VP24, L); G) Percent AT.]
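One lane of the genome atlas is labelled "Percent AT". A minimal Python sketch of how such a lane could be computed with a sliding window is given below. The window and step sizes, the function name and the toy sequence are assumptions for illustration; the source does not say how the atlas software smooths or scales this lane.

```python
def percent_at(seq: str, window: int, step: int) -> list:
    """Return (start position, percent A+T) for each full window along seq."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        at_count = chunk.count("A") + chunk.count("T")
        values.append((start, 100.0 * at_count / window))
    return values


# Toy sequence: an AT-only stretch followed by a GC-only stretch.
lane = percent_at("AATTGGCC", window=4, step=4)
print(lane)
```

Plotting the second element of each pair against the first gives a lane of the kind shown in the atlas, where AT-rich stretches stand out against the genome average.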
Center for Biological Sequence Analysis
http://www.cbs.dtu.dk/
GENOME ATLAS: Z. ebolavirus, 18,959 bp
Range: 5850 .. 8350
ZMapp binding
Qiu et al., Nature 2014
[Figure: zoomed genome atlas of the GP region, 6k to 8.25k, Resolution: 1. Lanes: A) Th weak and B) Th strong (T helper); C) CTL weak and D) CTL strong (Cytotoxic T Lymphocytes); E) Variation (genome seq. variation); F) and G) Annotations (CDS +: GP; domain annotations: Filo_glycop, HR1B, HR1D, HR2, Ebola-like_HR1-HR2); H) Percent AT.]
Summary (so far!)
Finished genomes are important! Context matters!
● Monitor - bacteria? virus?
● Identification
● Sequences
● Treat (e.g., antibiotic resist?)
● Follow / map outbreaks
[Background proposal text, partly visible: "[...] sample air and water supplies for possible biological pathogens. We will use the ORNL Compute and Data Environment for Science (CADES) infrastructure to manage data from this project."]
Questions:
● How can metagenomics be used in clinical diagnosis?
● Is Ebola ‘rapidly evolving’?
● Why is vaccine development for viruses difficult?