Big data meets Darwin’s “entangled bank” – Changing perspectives on plant distributions

Robert Peet
US-NVC
SEEK
BIEN
IAVS & EVS

Site data: e.g. climate, soils, topography, etc.

Taxon attribute data: e.g. phylogeny,
distribution, life-history, functional
attributes, etc.

Occurrence data: attributes of individuals
and taxa that occur at a site.
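Here is a minimal Python sketch of the three data types above. It assumes each type is held as a simple in-memory table, with site and taxon attributes keyed by site ID and taxon name, and occurrences linking the two. The table contents, field names, and the `join_occurrences` helper are hypothetical placeholders. They are not drawn from any of the projects named in this talk.

```python
"""Hypothetical sketch: linking site, taxon-attribute, and occurrence data."""

# Site data: one record per site (e.g. climate, soils, topography). Made-up values.
SITES = {
    "site-1": {"mean_annual_temp_c": 15.0, "soil_ph": 5.5},
    "site-2": {"mean_annual_temp_c": 9.0, "soil_ph": 6.8},
}

# Taxon attribute data: one record per taxon (e.g. life history, functional traits).
TAXA = {
    "Taxon A": {"growth_form": "tree"},
    "Taxon B": {"growth_form": "herb"},
}

# Occurrence data: which taxon was recorded at which site, plus record-level attributes.
OCCURRENCES = [
    {"site": "site-1", "taxon": "Taxon A", "cover_pct": 20.0},
    {"site": "site-1", "taxon": "Taxon B", "cover_pct": 5.0},
    {"site": "site-2", "taxon": "Taxon A", "cover_pct": 40.0},
    {"site": "site-3", "taxon": "Taxon B", "cover_pct": 1.0},  # no matching site record
]


def join_occurrences(occurrences, sites, taxa):
    """Attach site and taxon attributes to each occurrence (an inner join).

    Occurrences whose site or taxon key has no matching record are dropped.
    """
    joined = []
    for occ in occurrences:
        site = sites.get(occ["site"])
        taxon = taxa.get(occ["taxon"])
        if site is None or taxon is None:
            continue
        joined.append({**occ, **site, **taxon})
    return joined


if __name__ == "__main__":
    for row in join_occurrences(OCCURRENCES, SITES, TAXA):
        print(row)
```

Every analysis that relates occurrences to environment or traits depends on these keys matching. That dependence is why inconsistent taxon names, discussed next, are so costly.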

Ecoinformatics vision
• Mobilization of biodiversity data
• Examples over 15 years:
  - VegBank and the US-NVC (1997-2015)
  - IAVS (2004-2015)
  - SEEK (2003-2006)
  - BIEN (2008-2015)

Taxonomic database challenge
The well-known problem:
Integration of data from different
times & places, by multiple investigators using
varied taxonomic standards.
The well-known solution:
Identifications to taxon concepts that have mapped
relationships to related concepts.
[Figure: Abies lasiocarpa distribution by region (AZ NM; CO WY MT AB eBC; wBC WA OR). USDA-ITIS recognizes Abies lasiocarpa var. arizonica and Abies lasiocarpa var. lasiocarpa; Flora North America recognizes Abies bifolia and Abies lasiocarpa. Minimal concepts: A, B, C.]
Andropogon virginicus complex
9 elemental units; 17 concepts, 27 scientific names
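One way to make concept relationships computable is to treat each concept as a set of elemental units, as in the Andropogon counts above, and derive the relationship between two concepts from set inclusion. The sketch below assumes that representation and uses RCC-5-style relation names (congruent, includes, is included in, overlaps, excludes). The concepts, authorities, and units are hypothetical. They are not the actual Abies or Andropogon treatments.

```python
def relate(a: frozenset, b: frozenset) -> str:
    """Relationship of concept a to concept b, each given as a set of elemental units."""
    if a == b:
        return "congruent"
    if a > b:  # proper superset
        return "includes"
    if a < b:  # proper subset
        return "is included in"
    if a & b:  # shared units, but neither contains the other
        return "overlaps"
    return "excludes"


# Hypothetical concepts: (name, authority) -> elemental units the concept covers.
CONCEPTS = {
    ("X", "authority 1"): frozenset({"A", "B", "C"}),
    ("X var. x1", "authority 1"): frozenset({"A"}),
    ("X var. x2", "authority 1"): frozenset({"B", "C"}),
    ("Y", "authority 2"): frozenset({"A", "B"}),
    ("X", "authority 2"): frozenset({"C"}),
}


def relationship_table(concepts):
    """Yield (concept, relation, concept) for every ordered pair from different authorities."""
    for key_a, units_a in concepts.items():
        for key_b, units_b in concepts.items():
            if key_a[1] != key_b[1]:
                yield key_a, relate(units_a, units_b), key_b


if __name__ == "__main__":
    for (name_a, auth_a), rel, (name_b, auth_b) in relationship_table(CONCEPTS):
        print(f"{name_a} sec. {auth_a} {rel} {name_b} sec. {auth_b}")
```

In this toy example the name "X" is attached to two non-congruent concepts. A record labelled only "X" therefore cannot be placed until the authority it followed is known.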






1993: ESA Vegetation Panel & FGDC Mandate
1997: FGDC Standard, Version 1
1997: ESA Vegetation Panel Data Committee
2000: VegBank (ESA) & BIOTICS (NatureServe)
2008: FGDC Standard, Version 2
2013: MOU with USFS, USGS, ESA, NatureServe
www.vegbank.org
VegBank – the ESA Plot database
• The ESA Vegetation Panel maintains
VegBank (www.vegbank.org)
as a public vegetation plot archive.
• VegBank is expected to function in a
manner analogous to GenBank.
• Primary data will be deposited for reference,
novel synthesis, and reanalysis.
• Globally, many other plot databases exist.
Core elements of VegBank
[Figure: Project, Plot, Plot Observation, Taxon / Individual Observation, Taxon Interpretation, Plot Interpretation]
The basis for VegX, the new international XML data exchange standard
Interpretation
Plants
• Taxon interpretation
• Taxon Alt
• Linkage to concept database
Communities
• Class event
• Multiple interpretations
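The core VegBank elements can be pictured as a nested record structure. In that structure, taxon and community interpretations are stored alongside the original field data rather than replacing it. The Python dataclasses below are a hypothetical sketch of that idea. Class names, field names, and the `interpret` method are illustrative only and are not the actual VegBank or VegX schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaxonInterpretation:
    concept_id: str   # hypothetical key into a taxon-concept database
    interpreter: str
    date: str         # ISO date string
    current: bool = False


@dataclass
class TaxonObservation:
    author_name: str  # name exactly as the plot author recorded it
    cover_pct: Optional[float] = None
    interpretations: List[TaxonInterpretation] = field(default_factory=list)

    def interpret(self, concept_id: str, interpreter: str, date: str) -> TaxonInterpretation:
        """Add a new interpretation; earlier ones are kept but marked not current."""
        for old in self.interpretations:
            old.current = False
        new = TaxonInterpretation(concept_id, interpreter, date, current=True)
        self.interpretations.append(new)
        return new

    def current_interpretation(self) -> Optional[TaxonInterpretation]:
        return next((i for i in self.interpretations if i.current), None)


@dataclass
class CommunityInterpretation:
    community_concept_id: str  # hypothetical key into a community classification
    classifier: str
    date: str


@dataclass
class PlotObservation:
    date: str
    taxa: List[TaxonObservation] = field(default_factory=list)
    communities: List[CommunityInterpretation] = field(default_factory=list)


@dataclass
class Plot:
    code: str
    latitude: float
    longitude: float
    observations: List[PlotObservation] = field(default_factory=list)


@dataclass
class Project:
    name: str
    plots: List[Plot] = field(default_factory=list)
```

The sketch keeps every interpretation instead of overwriting the recorded name. That lets a record be re-read against a newer concept database without losing the original identification, which is one plausible reading of "multiple interpretations".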






European Vegetation Survey
GIVD
VegX
sPlot
Arctic Vegetation Archive
Wiser, Spencer, De Cáceres, Kleikamp, Boyle & Peet. 2011. J. Vegetation Science 22: 598-609.
>2,200,000 plot records in >125 databases
Dengler, Peet, et al. 2011. J. Vegetation Science 22: 582-597.
Science Environment for Ecological Knowledge (SEEK)
Multidisciplinary project to create:
• Scientific-workflow system (Kepler)
  - Design, reuse, and execute scientific analyses
• Distributed data network (EcoGrid)
  - Environmental, ecological, and systematics data
• Knowledge representation (KR) & semantic mediation
  - Discover, integrate, and compose hard-to-relate data and services via ontologies
• Taxonomic concept services
  - Resolve taxon ambiguities
Collaborators (the SEEK team):
NCEAS, UNM, SDSC/UCSD, U Kansas,
Vermont, Napier, ASU, UNC

Benefits of the Taxonomic Object Service
• Allows integration of ecological datasets
• Allows taxonomists to author new ideas and make new connections
• Allows all researchers to see previous taxonomic opinions
• Provides a stable identification system to reference taxon concepts

Document and manage taxon concepts from multiple sources

Document and manage concept relationships from multiple sources

Import data files as txt, xls, mdb, or TCS-XML

Export data as txt, mdb, or TCS-XML
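As a rough illustration of plain-text import and export, the sketch below writes concept relationships to a tab-delimited file and reads them back. It uses a name qualified by its defining reference ("name sec. reference") as the concept identifier. The column layout and the `concept_id` convention are assumptions for illustration. This is not the Taxonomic Object Service's actual file format, and it is not TCS-XML.

```python
import csv
import io

# Hypothetical column layout; not the TCS-XML schema.
FIELDS = ["from_concept", "relation", "to_concept", "asserted_by"]


def concept_id(name: str, reference: str) -> str:
    """Hypothetical stable identifier: a name qualified by the reference that defines it."""
    return f"{name} sec. {reference}"


def export_tsv(rows) -> str:
    """Serialize relationship rows (dicts keyed by FIELDS) as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def import_tsv(text: str):
    """Parse tab-delimited text back into rows, rejecting files that lack a required column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [dict(row) for row in reader]


if __name__ == "__main__":
    rows = [{
        "from_concept": concept_id("Taxon X", "Author 1990"),
        "relation": "includes",
        "to_concept": concept_id("Taxon X", "Author 2005"),
        "asserted_by": "hypothetical example",
    }]
    print(export_tsv(rows))
```

Because the identifier carries the reference, two uses of the same name under different authors stay distinct after a round trip through a flat file.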
Case study: Southeast US
1. Regional floras are obsolete and incomplete; an updated atlas of the flora of the Southeast is needed.
2. Datasets with inconsistent taxonomic concepts have defied integration.
3. 65,000 concept relationships & 2,000,000+ taxon occurrences
http://www.herbarium.unc.edu/seflora/firstviewer.htm
[Map: occurrences of Carya carolinae-septentrionalis from NCU, RAB, USDA, and CVS]

According to Radford 1968, USDA PLANTS v 4.0, & Weakley 2008:
• Carya carolinae-septentrionalis
• Carya ovata

According to Stone 1997 in FNA:
• Carya ovata var. australis
• Carya ovata var. ovata






Weakley 2005 – Reference concepts
Radford 1968 – Concepts mapped
NC Heritage Program – Weakley concepts
CVS – Weakley concepts (mostly)
USDA – Kartesz 1999 concepts (mostly)
NCU & NCSC – Nominal concepts only
Most museum collection identifications must be interpreted as nominal concepts! To do otherwise would introduce false positives.
Some nominal occurrences might or
might not represent the taxon
Carya carolinae-septentrionalis
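The nominal-concept problem can be expressed with the same set-of-units idea. This sketch assumes, following the pairing on the Carya slide, that Carya ovata var. australis sec. Stone 1997 corresponds to C. carolinae-septentrionalis. It also assumes that var. ovata corresponds to the narrow C. ovata of Radford 1968. The unit labels (`ova`, `car`) and the helper names are hypothetical. A bare name is matched to every concept that uses it. The answer is "maybe" whenever those concepts disagree about the target taxon.

```python
# Hypothetical elemental units:
#   "ova" = Carya ovata in the narrow sense
#   "car" = the plants treated as C. carolinae-septentrionalis
CONCEPTS = {
    "Carya carolinae-septentrionalis sec. Radford 1968": frozenset({"car"}),
    "Carya ovata sec. Radford 1968": frozenset({"ova"}),
    "Carya ovata sec. Stone 1997": frozenset({"ova", "car"}),
    "Carya ovata var. australis sec. Stone 1997": frozenset({"car"}),
    "Carya ovata var. ovata sec. Stone 1997": frozenset({"ova"}),
}


def candidate_concepts(nominal_name, concepts=CONCEPTS):
    """Unit sets of every concept, under any authority, that a bare name could denote."""
    return [units for cid, units in concepts.items()
            if cid.split(" sec. ")[0] == nominal_name]


def represents(nominal_name, target_units, concepts=CONCEPTS):
    """Can a record identified only by name be assigned to the target taxon?"""
    candidates = candidate_concepts(nominal_name, concepts)
    if not candidates:
        return "unknown"
    if all(units <= target_units for units in candidates):
        return "yes"
    if all(not (units & target_units) for units in candidates):
        return "no"
    return "maybe"  # candidates disagree; assigning the record risks a false positive


if __name__ == "__main__":
    target = CONCEPTS["Carya carolinae-septentrionalis sec. Radford 1968"]
    for name in ["Carya ovata", "Carya carolinae-septentrionalis", "Carya ovata var. ovata"]:
        print(name, "->", represents(name, target))
```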




Choice of primary authority not available
Versioning by date not available
Many collections and observations not digital
Most collections still identified as nominals
BIEN Working Group (2008–2014)
Ecologists, Informaticians, Plant Taxonomists
Botanical Information
and Ecology Network
2012
Principal Investigators
Brad Boyle, U Arizona
Richard Condit, STRI
Steven Dolins, Bradley U
Brian Enquist, U Arizona
Robert Peet, U North Carolina
Mark Schildhauer, NCEAS
Barbara Thiers, NY Bot Garden
Core Participants
John Donoghue, U Arizona
Peter Jorgensen, Missouri Bot Garden
Nathan Kraft, U Maryland
Aaron Marcuse-Kubitza, NCEAS
Brian McGill, U Maine
Naia Morueta-Holme, Aarhus U, DK
Martha Narro, iPlant
Bill Piel, Yale U
Jim Regetz, NCEAS
Brody Sandel, Aarhus U, DK
Irena Šímová, Charles U, CZ
Nick Spencer, Landcare, NZ
Jens C. Svenning, Aarhus U, DK
Cyrille Violle, CNRS, FR
Susan Wiser, Landcare, NZ


Goal: Understand and predict the
occurrence and co-occurrence of
plant taxa in the New World.
Approach: Collect in one database all known plant
occurrence data.
> 14,000,000 plant occurrence records,
representing ~100,000 taxa.
[Figure: BIEN cyberinfrastructure. Components: data sources (plot and trait data; specimen data); exchange schema; taxonomic and phylogenetic intelligence; data scrubbing and correcting; data standardization & feedback tools; Database BIEN 2.0; data discovery; BIEN 3.0 confederated resource; science & deliverables.]
Use of raw
biodiversity
observation data
will yield seriously
erroneous results
BIEN Plant Species
Richness (Pre-scrubbing)
http://tnrs.iplantcollaborative.org/
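The scrubbing step can be illustrated with a deliberately small Python sketch. Names are normalized and fuzzy-matched against a reference list with `difflib`, and coordinates are screened for obvious errors. This is not how the TNRS service or the BIEN pipeline actually work. The reference list, the 0.85 similarity cutoff, and the specific checks are assumptions chosen only to show the idea.

```python
import difflib

# Hypothetical reference list of accepted names; a real pipeline would consult
# an authoritative name-resolution service rather than a hard-coded list.
ACCEPTED = [
    "Abies bifolia",
    "Abies lasiocarpa",
    "Carya carolinae-septentrionalis",
    "Carya ovata",
]


def normalize(name: str) -> str:
    """Collapse whitespace; capitalize the genus and lower-case the remaining words.

    Simplification: authorship strings (e.g. "L.") are not handled.
    """
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve_name(raw: str, cutoff: float = 0.85):
    """Return (accepted name or None, match type)."""
    name = normalize(raw)
    if name in ACCEPTED:
        return name, "exact"
    matches = difflib.get_close_matches(name, ACCEPTED, n=1, cutoff=cutoff)
    if matches:
        return matches[0], "fuzzy"
    return None, "unresolved"


def coordinates_plausible(lat, lon) -> bool:
    """Reject missing, out-of-range, and (0, 0) placeholder coordinates."""
    if lat is None or lon is None:
        return False
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False
    return not (lat == 0 and lon == 0)
```

Fuzzy matches are reported separately from exact ones so that they can be reviewed rather than silently accepted. Silent acceptance is one route by which scrubbing itself introduces errors.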
BIEN 2.0 New World Summary
(post Geo + Taxonomic scrubbing)
Plot Data
- CTFS
- FIA (conservative)
- Madidi plots
- VegBank
- TEAM
- SALVIAS
- Many others . . .
Herbarium and Observation Data
- GBIF
- MOBOT
- NYBG
- CRIA (Brazil collections)
- Arizona
- UNC, NCS etc.
- REMIB (Mexico)
- Utrecht
- Many others . . .
Specimens = 9,345,197
Species = 92,788
Observations = 12,171,014
Plots = 329,741
Plant Traits
- Numerous literature sources
- BIEN researcher data
Traits = 27
Trait observations = 109,659
bien.nceas.ucsb.edu/bien/
BIEN 2 data and deliverables (geographic range maps & phylogeny) are available for use!
[Map: BIEN plant species richness from geographic range overlap; legend: number of plant species]
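Mapping richness from range overlap amounts to stacking rasterized range maps and counting, for each grid cell, how many species' ranges cover it. The sketch below assumes each range has already been converted to a set of (row, col) grid cells. The toy ranges and helper names are hypothetical.

```python
from collections import Counter


def richness_by_cell(ranges):
    """Count how many species' ranges cover each grid cell.

    `ranges` maps a species name to the set of (row, col) cells its range occupies.
    """
    counts = Counter()
    for cells in ranges.values():
        counts.update(set(cells))  # set() guards against a cell listed twice
    return counts


def as_text_grid(counts, n_rows, n_cols):
    """Render richness counts as rows of numbers (0 where no range overlaps)."""
    return "\n".join(
        " ".join(str(counts.get((r, c), 0)) for c in range(n_cols))
        for r in range(n_rows)
    )


if __name__ == "__main__":
    toy_ranges = {  # hypothetical, already-rasterized ranges
        "species 1": {(0, 0), (0, 1), (1, 1)},
        "species 2": {(0, 1), (1, 1), (1, 2)},
        "species 3": {(1, 1)},
    }
    print(as_text_grid(richness_by_cell(toy_ranges), 2, 3))
```

Richness computed this way inherits every error in the underlying ranges. Unscrubbed names or coordinates would show up directly as spurious cells.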
BIEN 3.0 Schema
A perpetual motion
machine?
Individual data sources—or the
entire database—can be loaded
and re-loaded rapidly, allowing
updates as data sources are
modified or grow, or new sources
are acquired.
Aaron Marcuse-Kubitza
NCEAS
Brad Boyle, UofA
BIEN 3.0 results soon to come!
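The rapid per-source reload described above can be sketched with SQLite. Every row is tagged with its source, and one source's rows are replaced inside a single transaction, so a failed load leaves the previous data for that source untouched. The table layout and function names are hypothetical. They do not describe the actual BIEN 3.0 schema or loader.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical occurrence table keyed by (source, record_id)."""
    with conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS occurrence (
                   source    TEXT NOT NULL,
                   record_id TEXT NOT NULL,
                   taxon     TEXT,
                   latitude  REAL,
                   longitude REAL,
                   PRIMARY KEY (source, record_id)
               )"""
        )


def reload_source(conn: sqlite3.Connection, source: str, records) -> int:
    """Replace all rows from one data source atomically; return the number loaded."""
    rows = [(source, r["id"], r["taxon"], r["lat"], r["lon"]) for r in records]
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM occurrence WHERE source = ?", (source,))
        conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


def count_by_source(conn: sqlite3.Connection) -> dict:
    """Return {source: row count} for a quick check after a reload."""
    return dict(conn.execute("SELECT source, COUNT(*) FROM occurrence GROUP BY source"))
```

Because a reload deletes and reinserts only one source's rows, running it twice with the same input gives the same table. Other sources are never touched.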


We are pleased to acknowledge
the support and cooperation of
Gap Analysis Program