Third Generation Sequencing for Rapid Biosurveillance
Dave Ussery
NTNU kurset MOL8013 - Bacterial Genomics, NTNU / St. Olavs Hospital, Trondheim, Norway
Thursday, 21 May, 2015

We propose to develop a system for identifying and tracking all microbial pathogens in real time, using genome sequencing of infected material from patients at hospitals. Data will be generated on a daily basis using Oxford Nanopore MinION chips (or similar technology), from clinical isolates, as outlined in the figure below. The preliminary work in this proposal will be done as a collaboration between ORNL, Emory University and the Centers for Disease Control in Atlanta, Georgia. This can be expanded later (with additional external funding) to include other regional hospitals, and eventually form a network of reporting hospitals across the nation to allow for the monitoring and tracking of epidemiological outbreaks in 'real-time', within hours of the patient's visit to the hospital.

[Figure labels: Clinical Sample; Genome Sequences; What is it? How do we treat? Have we seen this?; Monitor]
Principal Investigator: David W. Ussery
UT-Battelle Business Sensitive

http://www.genomicepidemiology.org/

Benchmarking methods for Genomic Taxonomy
Journal of Clinical Microbiology, 52:1529-1539, (2014).

Traditional diagnosis
[Slide shows a page from a Review in Nature Reviews Genetics, Volume 13, September 2012, page 602 (doi:10.1038/nrg3226; published online 7 August 2012; correspondence to D.W.C., e-mail: derrick.crook@ndcls.ox.ac.uk). The recoverable text follows; "[...]" marks text cut off in the source.]

Clinical microbiology is a discipline that focuses on rapidly characterizing pathogen samples to direct the management of individual infected patients (diagnostic microbiology) and to monitor the epidemiology of infectious disease (public health microbiology). Applications in epidemiology include detecting outbreaks, monitoring trends in infection and identifying the emergence of new threats. Ongoing developments in DNA-sequencing technologies are likely to affect the diagnosis and monitoring of all pathogens, including viruses, bacteria, fungi and parasites, but for this Review we focus on bacterial pathogens to demonstrate the likely changes that arise from the adoption of whole-genome sequencing.

Bacterial pathogens account for much of the worldwide burden of infection. For patients with bacterial infections, the crucial steps are to grow an isolate from a specimen, to identify its species, to determine its pathogenic potential and to test its susceptibility to antimicrobial drugs. Together, this information facilitates specific and rational treatment of patients. For public health purposes, knowledge also needs to be obtained about the relatedness of the pathogen to other strains of the same species to investigate transmission and to allow the recognition of outbreaks. Each of the steps in this process of characterizing the pathogen depends on many specialized, species-specific methodologies that have been developed over decades and require the extensive knowledge base of clinical microbiologists who apply labour-intensive, complex and slow techniques to yield the relevant information.

Current clinical microbiology
The principles behind diagnostic bacteriology have changed little over the past 50 years. Most of the output from a microbiological laboratory is dependent on isolating a viable organism. [...] experimentation has led to the development of a wide repertoire of methods for isolating bacterial pathogens. After culture, [...] characterization depends on a wide range of testing (FIG. 1), many aspects of which are species-specific [...]. The cardinal steps in processing a sample are isolating a pathogen, determining the species, testing antimicrobial susceptibility and virulence and, in specific settings, intra-species typing. The first three steps are crucial for the optimal management of an infected patient, and the last step is valuable for identifying outbreaks and surveillance.

Escherichia coli: A common inhabitant of the guts of many animals, but some strains can cause serious food poisoning, as reminded by the 2011 outbreak in Germany.

Mycobacterium tuberculosis: The causative agent of tuberculosis, it infects approximately one-third of the human population and claims over one million lives per year, making it the most deadly bacterial pathogen of humans.

Figure 1 | Principles of current processing of bacterial pathogens. A schematic representation of the current workflow for processing samples for bacterial pathogens is presented, showing high complexity and a typical timescale of a few weeks to a few months. The schematic is an approximation that highlights the principal steps in the workflow; it is not intended to be a comprehensive or precise description. Samples that are likely to be normally sterile are often cultured on a rich medium that will support the growth of any culturable organism. Samples that are contaminated with colonizing flora present a challenge for growing the infecting pathogen. Many types of culture media (referred to as selective media) are used to favour the growth of the suspected pathogen; this approach is particularly important for culturing pathogens from faeces. Boxes A to H arbitrarily represent the many different media for culture. The medium H represents a medium designed for growing mycobacteria that have specific growth requirements. When an organism is growing, the morphological appearance and density of growth are properties that need specialist knowledge for deciding whether it is likely to be pathogenic. The likely pathogens are then processed through a complex pathway that has many contingencies to determine species and antimicrobial susceptibility. Broadly, there are two approaches. One approach uses matrix-assisted laser desorption/ionization–time of flight (MALDI–TOF) mass spectrometry for species identification before setting up susceptibility testing. The other uses Gram staining followed by biochemical testing to determine species; susceptibility testing is often set up simultaneously with doing biochemical tests. Categorization of pathogens into groups of species is needed to choose the appropriate susceptibility-testing panel. Finally, depending on the species and perceived likelihood of an outbreak, a small subset of isolates may be chosen for further investigation using a wide range of typing tests that are often only provided by reference laboratories. The dashed lines and question marks are positioned arbitrarily to indicate that the further investigation is varied and happens in only a small number of cases.

[Figure 1 labels: Samples (Blood, Urine, Pus, Surface swabs, Sputum, Faeces; Sterile, Contaminated); Media for culture (A to H); Rapid growers, 1–2 days; Mycobacterium spp., 1–3 weeks; Susceptibility, 1–2 days; Typing, 1–6 weeks; MALDI–TOF; e.g. Legionella spp. and Mycobacterium spp.; All Mycobacterium tuberculosis; Months]
Genome-based diagnosis

[Figure 2 labels: Samples (Blood, Urine, Pus, Surface swabs, Sputum, Faeces; Sterile, Contaminated); Media for culture (A to H); Rapid growers, 1–2 days; Mycobacterium spp., 1–3 weeks; All isolates; Sequencer; Read processing and assembly; Database of all sequences, metadata, analysis results and visual analytics; Species; Resistance knowledge base; Virulence knowledge base; Relatedness; 1–12 hours; Local hospital clinical record system; National and international databases; Weeks]

Figure 2 | Hypothetical workflow based on whole-genome sequencing. A schematic representation of the workflow anticipated after adoption of whole-genome sequencing is shown, with an expected timescale that could fit within a single day. The culture steps would be the same as those that are currently used in a routine microbiology laboratory. Some types of sample might be directly sequenced (see 'Future directions', not shown here). When a sample or likely pathogen is ready for sequencing, DNA will be extracted. This procedure is becoming simpler, as the input [...] (BOX 1) could enable sequencing without preparation. Therefore, bacterial genome sequencing in hours and possibly even minutes is a realistic prospect. After sequencing, the main processes for yielding information will be computational. The development of software and databases is a major challenge to overcome before pathogen sequencing can be deployed in clinical microbiology. Automated sequence assembly algorithms will be necessary to process the raw sequence data (BOX 1). This assembled sequence would then be analysed by modular software to determine species, relationship to other isolates of the same species, antimicrobial resistance profile and virulence gene content. Results of this analysis will be reported through hospital information systems. All of the results will also be used for outbreak detection and infectious diseases surveillance. These developments will require a new large database and other informatics technology and will take time to develop. In particular, it will need 'intelligent systems', which will incorporate elements of machine learning to allow automatic updating of key knowledge bases for species identification, antimicrobial resistance determination and virulence detection. Formal evaluation of such a solution will also need robust testing to ensure that it performs at least as well as current methods.

[Text visible on the same page of the Review:]

[...] typing. Methods that are commonly used now include PCR (for example, multiple-locus variable number of tandem repeats (MLV–VNTR) analysis) [22], restriction fragment length polymorphisms (for example, pulsed field gel electrophoresis) [23] or fractional sequencing (for example, multi-locus sequence typing (MLST)) [24]. Through substantial investment in monitoring and reference facilities, turnaround time for these methods can be reduced to a few days. However, because most locations do not benefit from such facilities, typing typically contributes little to the immediate control of an outbreak.

[...] even though it is not always yet known how to interpret these data. However, the genome also includes vast amounts of additional data that are currently unavailable from routine processing, thus opening the prospect for large-scale research into pathogen genotype–phenotype associations from routinely collected data. The hurdles to implementing whole-genome sequencing in clinical and public health laboratories are substantial, as widespread adoption would require incorporating the knowledge from more than a century of characterizing pathogens (currently delivered by a skilled workforce) into an entirely new framework of mainly computer-[...]
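The Figure 2 caption describes modular software that would determine the species of an assembled sequence. One common way to approach that step is to compare shared k-mers between the query and reference genomes. The following is a minimal sketch under stated assumptions: the toy sequences, the names `kmers`, `jaccard` and `identify_species`, and the choice of k = 4 are all hypothetical and are not taken from the Review or from any specific tool.

```python
def kmers(seq, k):
    """Return the set of all overlapping k-mers in a DNA string."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def identify_species(query, references, k=4):
    """Score the query against each reference and return the best match.

    `references` maps a (hypothetical) species name to a reference sequence.
    """
    query_kmers = kmers(query, k)
    scores = {
        name: jaccard(query_kmers, kmers(ref, k))
        for name, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data, invented for illustration only.
references = {
    "species_A": "ATGGCGTACGTTAGCCGATAGGCTAACGT",
    "species_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCA",
}
query = "GCGTACGTTAGCCGATAGG"  # a fragment taken from species_A

best, scores = identify_species(query, references)
print(best, scores)
```

Real genomes would call for a much larger k and a compact k-mer index, but the structure (one function per question: species, resistance, relatedness) mirrors the modular design the caption describes.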
Meta-Genome based diagnosis
MinION ~ 2 hours (!)

Growth in GenBank 1982-2014
[Figure: number of bases over time, x-axis from 12/1/82 to 4/1/14, y-axis from 0,0E+00 to 3,0E+15. Series: GenBank bases, WGS bases, SRA bases, Moore's law. Annotations: H. influenzae; 2nd Gen. Seq.; 3rd Gen. Seq.; 10 years; 20 years; 500,000,000,000,000 bp of DNA sequence]

Box 1 | Sequencing platforms for clinical microbiology
[From the same Review in Nature Reviews Genetics; "[...]" marks text cut off in the source.]

Released in 2005 with reads of ~110 b, the first next-generation sequencers, from Roche-454, could sequence bacterial genomes in a single run [69]. Initial applications were focused on diversity discovery. Later versions of the 454 platform have increased read length (~500 b) to approach that of Sanger sequencing but at a much lower cost and so have retained a role in producing high-contiguity assemblies of bacterial genomes.

Initially launched in 2006 with short (36 b) reads, Illumina Genome Analysers have captured the bulk of the sequencing market for both microbiology and larger organisms. With incrementally increasing capacity and read length, the current standard configuration (at the end of 2011) delivers ~300 Gb of raw data per eight-lane flow cell in the form of 100 b paired reads. Tagging each sample with its own 6–8 b index sequence allows at least 96 samples to be sequenced simultaneously in each lane. This approach makes the Illumina HiSeq platform useful and cost-effective for large bacterial sample collections.

It is clear that for most uses in microbiology, fast, compact bench-top machines will be preferred to the large, high-capacity machines designed for human sequencing. Two such platforms, the Ion PGM and the Illumina MiSeq, both of which use established chemistries that involve library preparation and amplification as the first steps in sequencing, are becoming popular among [...]

[The Box 1 passage on Oxford Nanopore is cut off in the source. Legible fragments mention commercial release in 2012, data collected in real time, on-board electronics, ~2 Gb of data an hour, and a single-use, USB-connected device with a capacity of ~150 Mb.]

[Box 1 figure, time axis: 2004, 2005, 2011, 2012. Values in parentheses are reproduced as shown on the slide.]

Turnaround time: bacterial genome
Sanger: 60 days
454 Genome Sequencer: 2 days
Illumina HiSeq2000: 14 days (2 days)
Illumina MiSeq: 1 day (<1 day)
Ion PGM: 1 day (<1 day)
(Oxford Nanopore: 2 hours)

Cost per Mb assembled sequence
Sanger: $10,000
454 Genome Sequencer: $1,000
Ion PGM: $600 ($100)
Illumina MiSeq: $120 ($80)
Illumina HiSeq2000: $25 ($25)
(Oxford Nanopore: $25)

Raw daily output
Sanger: 1 Mb
454 Genome Sequencer: 30 Mb
Ion PGM: 1 Gb (2 Gb)
Illumina MiSeq: 1.4 Gb (7 Gb)
Illumina HiSeq2000: [...] (100 Gb)
(Oxford Nanopore: 15 Gb)

Data Throughput & Generation Speed
MinION: single molecule sequencing! (3rd Gen. Seq.)
● Fast
● Cheap
● High Throughput
● Metagenomics - no cultivation needed!
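The Box 1 figures above lend themselves to a quick back-of-the-envelope comparison. The sketch below multiplies the first listed cost per Mb by a genome size and compares turnaround times against Sanger sequencing. The 5 Mb genome size is an assumption chosen for illustration (roughly the size of an E. coli genome), and the dictionary and function names are hypothetical; only the per-platform numbers come from the slide.

```python
# Cost per Mb of assembled sequence (USD), first value listed per platform.
COST_PER_MB_USD = {
    "Sanger": 10_000,
    "454 Genome Sequencer": 1_000,
    "Ion PGM": 600,
    "Illumina MiSeq": 120,
    "Illumina HiSeq2000": 25,
}

# Turnaround time for a bacterial genome (days), first value listed.
TURNAROUND_DAYS = {
    "Sanger": 60,
    "454 Genome Sequencer": 2,
    "Illumina HiSeq2000": 14,
    "Illumina MiSeq": 1,
    "Ion PGM": 1,
}

GENOME_MB = 5  # assumption: an E. coli-sized genome, not a value from Box 1


def genome_cost(platform, genome_mb=GENOME_MB):
    """Cost in USD to produce an assembled genome of `genome_mb` megabases."""
    return COST_PER_MB_USD[platform] * genome_mb


def speedup_vs_sanger(platform):
    """How many times faster the platform is than Sanger sequencing."""
    return TURNAROUND_DAYS["Sanger"] / TURNAROUND_DAYS[platform]


for name in COST_PER_MB_USD:
    print(f"{name}: ${genome_cost(name):,} per {GENOME_MB} Mb genome, "
          f"{speedup_vs_sanger(name):.1f}x faster than Sanger")
```

Under these assumptions a 5 Mb genome drops from $50,000 with Sanger sequencing to $125 on the HiSeq2000 figure, which is the scale of change the slide is pointing at.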
(Emory University, 21 August, 2014)

Fecal bacteria are difficult to classify.
741 Meta-Genomic Species. Pathogens?
[Figure: gene richness (0 to 8e+05) across the 741 MetaGenomic Species (MGS); panel b: taxonomical breakdown of MGS per sample (gene-rich MGS per sample, 50 to 300). Taxonomy legend: Taxonomy unknown, Spirochaetes, Synergistetes, Euryarchaeota, Fusobacteria, Lentisphaerae, Verrucomicrobia, Proteobacteria, Actinobacteria, Bacteroidetes, Firmicutes. Annotations: Enterotype: R P B; Crohn's disease]
Even phyla are uncertain (using BLAST) for most species.
>400 metagenome samples, 10 Tbytes, 9 Million genes
H. Bjørn [...], Nature Biotechnology, 32:822-828, (2014).

Who is there?
The genome sequence is the best 'unique identifier'….
[Figure: phylogenetic tree of the three superkingdoms. Legible labels include BACTERIA (Planctomycetes, Actinobacteria, Proteobacteria, Bacteroidetes, Spirochaetes, Clostridium, Firmicutes, Bacillus), ARCHAEA, and EUCARYA (Giardia, slime mold, Saccharomyces, Babesia, Trypanosoma; unicellular eukaryotes, protozoans, animals, plants, macro-organisms)]
Fig. 1.1 A phylogenetic tree displaying the genetic distances between members of the three superkingdoms of life: Bacteria, Archaea, and Eucarya. The represented bacterial genera will appear in [...]

dtree.ornl.gov

~3000 Complete Genomes, AAI tree
[Figure labels: Escherichia coli O104:H4; Escherichia coli O157:H7; Shigella; other labels not legible]
(CoZEE Zoonoses Network Autumn Conference, 11 November, 2014)

Comparison of 2084 E. coli genomes
Pansize: 89,003 unique protein families
coresize: 3,188 for a cutoff of 0.95 for core
coresize: 304 for a cutoff of 1.00 for core

Comparison of 88 E. coli O157 genomes
Pansize: 12,136 unique protein families
coresize: 4,108 for a cutoff of 0.95 for core
coresize: 3,042 for a cutoff of 1.00 for core
E. coli O157 genomes range from 4.9 to 6.2 Mbp

Salmonella enterica Typhimurium DT104
315 S. Typhimurium DT104 isolates sampled from 1969 to 2012 from six continents.

297 MDR S. Typhimurium DT104 isolates.
Nature Genetics, submitted 2014.

Dissemination of S. enterica Typhimurium DT104.
Nature Genetics, submitted 2014.

V. cholerae BLAST atlas, with 140 genomes
PNAS | November 20, 2012 | vol. 109 | no. 47
● Haiti non-O1 strains (14 strains, Rita Colwell)
● Haiti O1 strains (11 strains, Rita Colwell)
● Nepal O1 strains (24 strains, Paul Keim)
● Haiti O1 strains (104 strains, CDC)
Finished genomes are important!
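The E. coli comparisons above report a pan-genome size and two core-genome sizes, one for a cutoff of 0.95 and one for 1.00. A plausible reading, assumed here and not confirmed by the slides, is that a protein family counts as core when the fraction of genomes carrying it is at least the cutoff. The sketch below applies that definition to invented toy data; `pan_and_core` and the family names are hypothetical.

```python
from collections import Counter


def pan_and_core(presence, cutoff):
    """Return (pan size, core size) for a genome -> protein-family mapping.

    pan  = number of distinct families seen in any genome
    core = number of families present in at least `cutoff` (a fraction
           between 0 and 1) of the genomes; this definition is an assumption.
    """
    n_genomes = len(presence)
    counts = Counter(
        family for families in presence.values() for family in set(families)
    )
    pan = len(counts)
    core = sum(1 for c in counts.values() if c / n_genomes >= cutoff)
    return pan, core


# Toy data: 20 genomes. "fam_all" is in every genome, "fam_19" is missing
# from one genome, and "fam_rare" occurs in a single genome.
presence = {f"genome_{i}": {"fam_all"} for i in range(20)}
for i in range(19):
    presence[f"genome_{i}"].add("fam_19")
presence["genome_0"].add("fam_rare")

print(pan_and_core(presence, 0.95))  # relaxed core
print(pan_and_core(presence, 1.00))  # strict core
```

The toy case shows why the strict cutoff shrinks the core so sharply with thousands of genomes: a family missing from a single genome (through real loss or an incomplete assembly) falls out of the 1.00 core but stays in the 0.95 core.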
"Viruses have gotten a bad rap," said Ken Cadwell, an immunologist at New York University School of Medicine. "They don't always cause disease."
The New York Times, 20 November, 2014

DNA VIRUSES
Routine sequencing of DNA viruses has produced a large number of viral genomes that highlight the remarkable variability of viruses. The differences between the genomes of laboratory strains and clinical isolates of the same virus can be substantial, underscoring the need to routinely sequence clinical isolates [21].

…60–99% of the sequences generated in different viral metagenomic studies are not homologous to known viruses.
Mokili et al. 2012

Viral Detection and Research: A review of publications featuring Illumina Technology
http://res.illumina.com/documents/products/research_reviews/viral_detection_research_review.pdf

[Figure: tree of life (Fig. 1.1: Bacteria, Archaea, Eucarya) with Viruses added]

RefSeq viruses: 4050 genomes
[Figure labels: Potyviridae, Papillomaviridae, Anelloviridae, Caudovirales (phages), Geminiviridae, Reoviridae, Herpesvirales, Marburgvirus, Ebolavirus]

family Filoviridae
[Figure labels: Marburgvirus]

Zaire Ebolavirus genomes
1976; 1995 / 1996; 2007 / 2008; 2014

family Filoviridae
[Figure labels: Marburgvirus]
Ebolavirus - NC_002549, Marburgvirus - NC_024781
[Figure: gene map with InterPro functional domains. Genes: NP, VP35, VP40, GP, VP30, VP24, L. Domains shown: IPR008609, IPR002953, IPR008986, IPR002561, IPR014459, IPR009433, IPR014023-IPR026890]

Cuevavirus - NC_016144
[Figure: gene map with functional domains. Genes: NP, VP35, VP40, GP4, GP5, GP7, VP30, VP24, L. Domains shown: IPR008609, IPR002953, IPR008986, IPR002561, IPR014459, IPR009433, IPR014023-IPR026890-PF14314]

[Figure: comparison of Ebolavirus, Marburgvirus and Cuevavirus by Functional Domains and by Full length protein alignments; the counts on the figure are not legible]

Finished genomes are important! Context matters!

GENOME ATLAS: Z. ebolavirus, 18,959 bp
[Lanes: A) T helper, weak; B) T helper, strong; C) Cytotoxic T Lymphocytes, weak; D) Cytotoxic T Lymphocytes, strong; E) genome seq. variation; F) Annotations: CDS +; G) Percent AT. Genes along the genome: NP > VP35 > VP40 > GP > VP30 > VP24 > L >. Axis: 0k to 17.5k. Resolution: 8]
Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/

GENOME ATLAS: Z. ebolavirus, 18,959 bp, Range: 5850 .. 8350
ZMapp binding, Qiu et al., Nature 2014
[Lanes: A) T helper, weak; B) T helper, strong; C) Cytotoxic T Lymphocytes, weak; D) Cytotoxic T Lymphocytes, strong; E) genome seq. variation; F) Annotations: CDS +; G) Annotations: domain annotations (Filo_glycop, HR1B, HR1D, HR2, Ebola-like_HR1-HR2); H) Percent AT. Gene: GP. Axis: 6k to 8.25k. Resolution: 1]
Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/
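The Genome Atlas figures above include a "Percent AT" lane drawn along the genome at a stated resolution. One simple way to produce such a lane is a sliding window over the sequence. The sketch below is an illustration under assumptions: the function name `percent_at`, the window and step sizes, and the toy sequence are hypothetical, and this is not the actual Genome Atlas implementation.

```python
def percent_at(seq, window, step):
    """Return (start, percent A+T) for each full window along the sequence."""
    seq = seq.upper()
    lane = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        at_count = chunk.count("A") + chunk.count("T")
        lane.append((start, 100.0 * at_count / window))
    return lane


# Toy sequence: an AT-only stretch followed by a GC-only stretch.
toy = "AAAATTTTGGGGCCCC"
for start, pct in percent_at(toy, window=8, step=4):
    print(start, pct)
```

A larger window smooths the lane and a smaller step gives finer positions, which is the same trade-off as the "Resolution" setting shown on the two atlases (8 for the whole genome, 1 for the zoomed GP region).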
[...] sample air and water supplies for possible biological pathogens. We will use the ORNL Compute and Data Environment for Science (CADES) infrastructure to manage data from this project.

Finished genomes are important! Context matters!

Summary (so far!)
● Sequences
● Identification - bacteria? virus?
● Treat (e.g., antibiotic resist?)
● Follow / map outbreaks
● Monitor

Questions:
● How can metagenomics be used in clinical diagnosis?
● Is Ebola 'rapidly evolving'?
● Why is vaccine development for viruses difficult?
© Copyright 2024