Sequence-Based Approaches for Uncultivated Microbes

Susannah Green Tringe
DOE Joint Genome Institute
Metagenome Program Lead
Department of Energy: Mission areas
Bioenergy
Carbon Cycle
DOE Joint Genome Institute
Walnut Creek, CA since 1999
Mission: To advance genomics in support of the DOE mission
Biogeochemistry
Uncultivated microbial
communities
Wetland
carbon cycling
Root-associated
communities
JGI Programs & Infrastructure
[Diagram labels: Bioenergy, Carbon Cycling, Biogeochemistry; Plants, Fungi, Microbes, Metagenomes; DNA Sequencing, Genomic Technologies, Computational Analysis, Synthesis]
What is metagenomics?
Isolate (pure culture)
Microbial community
Genomics
Metagenomics
tame
wild
≠
Most bacteria don’t grow on plates:
“the great plate count anomaly”
16S ribosomal RNA as a phylogenetic marker
21 proteins
16S rRNA
30S
70S Ribosome
subunits
50S
34 proteins
5S rRNA
23S rRNA
Escherichia coli 16S rRNA
Primary and Secondary Structure
Falk Warnecke
16S-based phylogeny
Carl Woese 1928-2012
Woese Microbiol Rev 1987
Norm Pace
16S rRNA in environmental microbiology
Falk Warnecke
Bacterial phylogenetic tree expansion
cultured
uncultured
Modified from Baker et al 2013 (Microbe)
The situation is similar for archaea
Baker et al 2013 (Microbe)
Why metagenomics?
Industrial
enzymes
Most microbes are uncultured!
Antibiotics
Greenhouse gas cycling
Sulzenbacher et al 1997; www.chm.bris.ac.uk
Functional metagenomics
Gillespie 2002
Turbomycin synthesis genes
isolated from a soil metagenome library
Schloss & Handelsman 2003
Discovery of proteorhodopsin
de la Torre, J. R. et al. (2003) Proc. Natl. Acad. Sci. USA 100, 12830-12835
Rhodopsin-like gene –
never before seen in bacteria!
Copyright ©2003 by the National Academy of Sciences
Bacterial rRNA operon
Shotgun metagenomics
Shotgun sequencing:
Could it be done?
Schloss & Handelsman 2003
Metagenome assembly - like putting together several jigsaw puzzles
Falk Warnecke
. . . with some pieces missing
Falk Warnecke
Can we still reconstruct?
Falk Warnecke
One approach: a simple community
[Figure: EBPR reactor schematic with labels: influent, anaerobic, aerobic, sedimentation tank, effluent, return sludge, waste sludge; image labels: EUB, PAO]
Enhanced Biological Phosphate Removal (EBPR) reactors are dominated by Candidatus Accumulibacter phosphatis (CAP), but it cannot be grown in pure culture
Crocetti et al., 2001
Proposed biological model for polyP-accumulating organisms (PAOs)
[Figure: PAO cell in the anaerobic zone and in the aerobic zone; labels: acetate, acetyl-CoA, PHA, glycogen, polyP, PO4, ATP/ADP, NAD/NADH, TCA cycle, CO2]
CAP metabolic reconstruction
Anaerobic Phase
Aerobic Phase
Garcia Martin et al. 2006
What about more complex communities?
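[Figure: log-scale axis of species complexity (1, 10, 100, 1000, 10000) with EBPR sludge, Sargasso Sea and Soil placed in order of increasing complexity; one position marked "?"]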
A
Adaptive gene for habitat A
Adaptive gene for habitat B
Essential gene
B
Environmental Gene Tags
(EGTs)
Comparative metagenomics
COG5524:
Bacteriorhodopsin
COG1292: Choline-glycine betaine transporter
COG3459:
Cellobiose phosphorylase
Tringe et al 2005
Metagenome sequence output
[Figure: JGI sequenced bases and metagenome bases by fiscal year, 2004 to 2015; y-axis: sequence output (Tb), 0 to 100; callouts: 40 Gb, 30 Tb, 100 Tb]
meta-GENOMEs
Hess 2012: 15
rumen genomes
Wrighton 2012:
3 uncultivated phyla
49 genomes
Iverson 2012:
Marine Group A
Annotation of metagenomes
Shotgun
data
Singlet fasta
ORF prediction
Contig fasta
Assembly
RNA prediction
Contig stats
Jill Banfield, UCB
Chongle Pan, ORNL
Contig    Depth   G+C
Contig1   10.1    0.49
Contig2    5.6    0.35
Contig3   19.8    0.38
Genome binning,
extraction, improvement
ORF prediction
RNA prediction
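The contig statistics above (read depth and G+C fraction) are the kind of signal commonly used to group contigs into genome bins. The following is a minimal sketch, assuming a simple nearest-centroid assignment. The bin centroids, the axis scaling, and the `assign_bin` helper are hypothetical and are not taken from the JGI pipeline.

```python
import math

# Contig statistics from the example table: (depth, G+C fraction).
contigs = {
    "Contig1": (10.1, 0.49),
    "Contig2": (5.6, 0.35),
    "Contig3": (19.8, 0.38),
}

# Hypothetical bin centroids (depth, G+C), chosen only for illustration.
centroids = {
    "bin_low_depth": (6.0, 0.36),
    "bin_high_depth": (20.0, 0.40),
    "bin_high_gc": (10.0, 0.50),
}

def assign_bin(depth, gc, centroids):
    """Return the name of the closest centroid in (log depth, G+C) space."""
    def distance(centroid):
        c_depth, c_gc = centroid
        # Log-transform depth so a 2x change counts the same at any coverage;
        # scale G+C by 10 (an arbitrary choice) so both axes are comparable.
        return math.hypot(math.log(depth) - math.log(c_depth), 10 * (gc - c_gc))
    return min(centroids, key=lambda name: distance(centroids[name]))

bins = {name: assign_bin(d, gc, centroids) for name, (d, gc) in contigs.items()}
for name, assigned in bins.items():
    print(name, "->", assigned)
```

Real binning tools typically add further signals such as tetranucleotide frequencies and coverage across multiple samples; two features are used here only because they are the two shown in the table.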
The single-cell approach: how it works
isolation
lysis
+
polymerase
+
random hexamers
dNTPs
+
template DNA
MDA
Tanja Woyke
Single cell genomics: key challenges
CHALLENGE
isolation
lysis
Sample contamination
(‘hitchhiker’ DNA)
No universal lysis for all taxa
Chimerism
MDA
Reagent contamination
MDA bias
Tanja Woyke
JGI single-cell sequencing pipeline
Whole genome
amplification
Single cell
isolation
Sample
Community
profiling
16S rRNA gene
identification
Genome
sequencing
Tanja Woyke
Draft genome
Assembly
Data QC
Annotation
Data curation
Current limitations: 100 cells ≠ 100 SAGs
Whole genome
amplification
Single cell
isolation
Sample
Community
profiling
???
16S rRNA gene
identification
Genome
sequencing
Draft genome
o Not every cell can be isolated
o Not every cell can be lysed and amplified by whole genome amplification (WGA)
o Not every cell can be identified by its 16S rRNA gene
Tanja Woyke
Assembly
Data QC
Annotation
Data curation
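The inequality "100 cells ≠ 100 SAGs" follows from multiplying the success rates of the individual steps. A small worked sketch, assuming the steps succeed independently; the function name and all three rates are hypothetical placeholders, not measured values:

```python
def expected_sags(cells, p_isolated, p_amplified, p_identified):
    """Expected number of identified SAGs if each step succeeds independently."""
    return cells * p_isolated * p_amplified * p_identified

# Hypothetical success rates, for illustration only.
yield_estimate = expected_sags(100, p_isolated=0.9, p_amplified=0.5, p_identified=0.6)
print(f"{yield_estimate:.0f} SAGs expected from 100 sorted cells")
```

Because the losses compound, even moderate per-step failure rates leave far fewer genomes than sorted cells.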
Recovered diversity: 16S tags vs SAGs
Tanja Woyke
Modified from Clingenpeel et al 2014 (Frontiers in Microbiol)
marker genes
culture: draft/complete genomes
single cell: partial draft genomes, complete genomes (rarely)
target cell enrichment: draft genomes, complete genomes
metagenomics: unassembled data, genome bins, complete genomes
Tanja Woyke
culture
Tanja Woyke
single cell
target cell enrichment
metagenomics
metagenomic
approach
single-cell
approach
Tanja Woyke
marker genes
Uncultivated microbial
communities
Wetland
carbon cycling
Root-associated
communities
Wetland restoration
Wetlands: 9% of global land area
Carbon stored by wetlands: 35% of all terrestrial carbon
CH4
CO2
Wetland restoration
What determines if a wetland serves as a greenhouse gas source or a greenhouse gas sink?
CH4
CO2
San Francisco Bay wetlands
PAST
Wetland
PRESENT
Farmland
Salt pond
www.wrmp.org
Subsidence and carbon loss
1850s
Elevation loss
Levee failure
Mount and Twiss 2005
Twitchell Island restored wetland
Est. 1997
Peat accretion: ~4 cm/year
Net GHG budget: -494 g CO2 eq/m2
Shaomei He
Mark Waldrop
Lisamarie Windham-Myers
Twitchell
wetland
Sampling site gradients
[Figure: sampling sites A, B and C/L between the water inlet and the water outflow, with gradients in methane flux, peat accretion, and oxygen, nitrate and sulfate]
Do the microbial communities reflect these changes in geochemistry?
What controls methane?
CH4
Abundance
Activity
Species composition
Relative Gene Family Abundances
Samples with more methanogenesis genes have fewer genes in denitrification, dissimilatory sulfate reduction, and metal reduction.
Methane oxidation genes were more abundant in rhizomes.
Methanogen abundance
DNA abundance
Methanogen marker
genes from
metagenome
RNA abundance
(÷2)
CH4
CH4
- Methane emissions correlated with methanogen ABUNDANCE and ACTIVITY
Methanogen diversity
Hydrogenotrophic Methanogenesis:
CO2 + 4 H2 → CH4 + 2 H2O
Acetoclastic Methanogenesis:
CH3COOH → CH4 + CO2
[Figure: methanogen composition per sample on a 0% to 100% axis labeled Hydrogenotrophic and Acetoclastic; samples: Aug_C, Feb_L, Aug_B, Feb_B, Aug_A, Feb_A]
Legend ([H] = hydrogenotrophic, [A] = acetoclastic):
Methanomicrobiales (order) [H]
Methanosaetaceae;Methanosaeta [A]
Methanobacteriaceae;Methanobacterium [H]
Methanobacteriales (order) [H]
Methanosarcinaceae;Methanosarcina [A]
Methanocellaceae;Methanocella [H]
Methanomicrobiaceae;Methanofollis [H]
MSBL1;SAGMEG-1 [H]
Methanospirillaceae [H]
Methanococcales (order) [H]
Methanosarcinaceae;Methanomethylovorans [A]
Methanobacteriaceae;Methanobrevibacter [H]
Methanobacteriaceae;Methanosphaera [H]
pMC1;pMC1FA [H]
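The [H] and [A] labels in the legend allow each sample to be summarized as a hydrogenotrophic versus acetoclastic fraction. A minimal sketch, assuming per-taxon counts are available; the pathway labels for these genera are copied from the legend, while the `pathway_fractions` helper and the counts are hypothetical:

```python
# Pathway labels as given in the figure legend:
# H = hydrogenotrophic, A = acetoclastic.
PATHWAY = {
    "Methanosaeta": "A",
    "Methanosarcina": "A",
    "Methanomethylovorans": "A",
    "Methanobacterium": "H",
    "Methanocella": "H",
    "Methanofollis": "H",
    "Methanobrevibacter": "H",
    "Methanosphaera": "H",
}

def pathway_fractions(counts):
    """Return the fraction of labeled methanogen counts assigned to each pathway."""
    totals = {"H": 0, "A": 0}
    for taxon, n in counts.items():
        totals[PATHWAY[taxon]] += n
    total = sum(totals.values())
    return {pathway: n / total for pathway, n in totals.items()}

# Hypothetical marker-gene counts for one sample, for illustration only.
sample = {"Methanosaeta": 30, "Methanobacterium": 50, "Methanocella": 20}
fractions = pathway_fractions(sample)
print(fractions)
```

The sketch assumes at least one labeled count per sample; an empty sample would need separate handling.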
Bay / Delta Salinity Gradient
Historic and restored wetlands sampled along a salinity gradient
Methane flux varies with salinity and wetland age
Salinity (ppt)
Average seawater
Susie Theroux
Uncultivated microbial
communities
Wetland
carbon cycling
Root-associated
communities
Rhizosphere Grand Challenge
1) Rhizosphere / endophyte microbes
-provide nutrients
-protect from pathogens and stress
-influence growth
-sequester carbon
2) Challenges
-soil microbial communities are notoriously complex
-plant genomes are complex
-crosses multiple disciplines and programs
-statistical rigor requires high sample numbers
Arabidopsis rhizosphere project
Rhizosphere
Endosphere
Soil
Are there unique communities
in each compartment?
Does the plant control access?
Rhizosphere Grand Challenge
Identifying the major determinants of microbial community assembly
Host factors
Root-associated
microbial
communities
Full factorial design
1117 samples
16S pyrotag profiles
Variables investigated:
Soil type – Mason Farm vs. Clayton
Sample fraction – Bulk soil vs. rhizosphere vs. endophyte
Plant age – bolting (young) vs. senescent (old)
Genotype – 8 ecotypes
Individual – Aim for 10 individuals per condition
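The variables above can be crossed to see how quickly a full factorial design grows. A minimal sketch, assuming a naive full cross of every factor level; the ecotype names are placeholders, and the real design presumably differs (for example in how bulk soil is handled), so the result is illustrative and is not expected to reproduce the 1117 samples reported.

```python
from itertools import product

# Factor levels as listed on the slide; ecotype names are placeholders.
soils = ["Mason Farm", "Clayton"]
sample_fractions = ["bulk soil", "rhizosphere", "endophyte"]
ages = ["bolting", "senescent"]
genotypes = [f"ecotype_{i}" for i in range(1, 9)]
replicates = 10  # target number of individuals per condition

# Every combination of one level from each factor is one condition.
conditions = list(product(soils, sample_fractions, ages, genotypes))
planned_samples = len(conditions) * replicates
print(len(conditions), "conditions,", planned_samples, "planned samples")
```

Even with only four factors the cross reaches close to a thousand samples, which is why the statistical rigor listed among the challenges requires high sample numbers.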
Jeff Dangl
The Arabidopsis microbiome
The endophyte community is unique and reproducible and similar across soil types
OTUs
Lundberg et al., Nature 2012
Sample type
Rhizosphere/
Soil 1
Rhizosphere/
Soil 2
Endophyte
The Arabidopsis Microbiome
Cultured isolates
Single cells
“Plate scrape” metagenomes
Flow-sorted
“mini-metagenomes”
Sphingobacteriales
OTU 2324
Streptomyces
OTU 14834
Pseudonocardiaceae
OTU 13797
Dangl lab, UNC
Woyke lab, JGI
An endophyte genome catalog
Actinobacteria
99 isolates and 130 SAGs
fully sequenced
>50% of target OTUs
Firmicutes
Alphaproteobacteria
Cyanobacteria
Single Cells
Isolates
Bacteroidetes
Chloroflexi
Betaproteobacteria
S. Clingenpeel
Recolonization and RNA-seq
Full phosphate
No bacteria full P
+ Bacteria full P
Low phosphate
No bacteria low P
Sterile
+ Bacteria low P
Colonized
Inoculation with root-associated microbes reverses low P phenotype
Conclusions
• Most microorganisms cannot be readily grown in the lab
• Nucleic acid sequencing provides a means to study uncultivated organisms via their genomes
• Next-gen sequencing is opening up new opportunities in metagenomics and single-cell genomics
• These techniques are enabling advances in diverse areas of science
Acknowledgments
• EBPR sludge project
– Trina McMahon, UWM
– Phil Hugenholtz
• Wetlands project
– Shaomei He
– Lisamarie Windham-Myers, USGS
– Mark Waldrop, USGS
– Susie Theroux
– Wyatt Hartman
– Dennis Baldocchi, UCB
• Rhizosphere project
– Jeff Dangl, UNC
– Scott Clingenpeel
– Tanja Woyke
JGI: Next Generation Genome Science User Facility
• Community Science Program
• JGI-EMSL Collaborative Science Program
• Emerging Technologies Opportunity Program (ETOP)
• Visiting Scientist Program
• Distinguished Postdoctoral Fellow in Genomics
Questions?
Contact: [email protected]
Genomic signatures of plant association
Taxon               # PA genomes   # NPA genomes   # total genes
Bacillus                      28             141         852,695
Burkholderiales               60             144       1,072,509
Microbacteriaceae             18              27         140,326
Micrococcaceae                13              22         117,775
Nocardiaceae                  13              37         315,053
Paenibacillus                 11              29         226,631
Pseudomonas                  152              77       1,242,490
Rhizobiales                  170             152       1,717,437
Sphingomonas                   7              12          73,946
Streptomyces                   9              65         513,229
Xanthomonadaceae              93              19         433,488
Comparing genomes of phylogenetically related plant-associated (PA) and non-plant-associated (NPA) bacteria to identify gene families overrepresented in PA organisms
1300 genomes (6.5M genes) included in analysis
Recurrent plant-associated functions include chemotaxis, certain transposases, carbohydrate transport and metabolism, type III secretion systems, and nodulation
Asaf Levy
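One common way to ask whether a gene family is overrepresented in PA genomes within a taxon is a one-sided hypergeometric (Fisher-type) test. The sketch below is an assumption about how such a comparison could be set up, not the method used in the analysis; the genome counts for Sphingomonas come from the table, while the gene-family presence counts and the function name are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n_pa, n_npa, n_with_family):
    """One-sided P(X >= k): the probability that at least k of the genomes
    carrying a gene family are plant-associated, if the carriers were drawn
    at random from all genomes of the taxon."""
    total = n_pa + n_npa
    denom = comb(total, n_with_family)
    upper = min(n_pa, n_with_family)
    # math.comb returns 0 when more items are requested than exist,
    # so impossible splits contribute nothing to the sum.
    tail = sum(
        comb(n_pa, i) * comb(n_npa, n_with_family - i)
        for i in range(k, upper + 1)
    )
    return tail / denom

# Sphingomonas: 7 PA and 12 NPA genomes (from the table).
# Hypothetical gene family found in 6 genomes, 5 of them PA.
p_value = hypergeom_enrichment_p(k=5, n_pa=7, n_npa=12, n_with_family=6)
print(f"p = {p_value:.4f}")
```

This test treats genomes as independent draws, which closely related genomes are not, so a real analysis would need to account for phylogeny and for testing many gene families at once.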