Challenges of Big Data in Health

Challenges of Big Data in Health
Toshiba – JH Collaboration
Scott L. Zeger
Professor and Director
Johns Hopkins inHealth Initiative
© 2014 Hopkins inHealth
• 40 year old man, no family history, tests “positive”
in a cancer screening test
• What is his cancer state?
Data from population of “similar” people
True cancer
status
Test result
Yes
No
Total
Positive
15
985
1,000
Negative
5
8,995
9000
Total
20
9,980
10,000
© 2014 Hopkins inHealth
© 2014 Hopkins inHealth
“Big Data” – large, complex data sets that challenge
current data science knowledge/skills that hold out the
promise of hidden truths –> “data mining” for “gold”
© 2014 Hopkins inHealth
Boole - a - Bayes
Thomas Bayes 1701-1761
George Boole 1815-1864
© 2014 Hopkins inHealth
Radar to Land Aircraft Safely at BWI
N
Measurements
Symbol of type
of aircraft
Smoothed
track updates
Initial Radar
Measurements
(radar echoes)
Filtered (smoothed)
“track” updates
Indicates
expected
future
direction
and speed
© 2014 Hopkins inHealth
Tracking
through
changing
trajectory
Display of
altitude
Data:
Track No. 1234
Position (x,y,z)
Speed and direction
Acceleration
Identification
Track State Analogy
location
genotype
velocity
phenotype
acceleration
environment/behavior
accuracy
uncertainties
biomarkers/labs/
images
identification
identification
© 2014 Hopkins inHealth
Clinical and
Public
Health
Discovery
Improved
Health at
More
Affordable
Costs
Scale
and
replicate
Scale
and
replicate
Biohealth Pilot Projects
Cancer screening
OncoSpace in
Radiation Oncology
Population Health Demonstration
Cancer screening and
early diagnosis
Obesity and Diabetes
Children’s asthma
prevention and
control
Cardiovascular
disease diagnosis and
treatment
Management of
autoimmune diseases
Cardiovascular
disease
Genomics of
cystic fibrosis
Myostatin in
sarcopenia
Age-related cognitive
loss
Telomere biology
and chronic diseases
Methodology Cores
Health
measurement
Bioethics
Big Data and
software
solutions
Statistical
design and
analysis
© 2014 Hopkins inHealth
Behavior
change and
dissemination
Finance –
organization
models
A data system designed to
individualize radiation therapy
Todd McNutt, Kim Evans, Joe Moore, Harry Quon, Joseph Herman,
Andrew Sharabi, Wuyang Yang, John Wong, Theodore DeWeese
Disclosure:
Funding from Elekta and Philips
© 2014 Hopkins inHealth
Sample Automated Radiation Plan
Original
plan
Automated plan
30% reduction in dose to parotids
1
0.9
0.8
Auto plan
Original Plan
Dot: right
No-dot: left
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
10
20
30
40
Dose(Gy)
50
60
70
10
© 2014 Hopkins inHealth
80
© 2014 Hopkins inHealth
Model Framework for Infections (I) and Pathogen
Blood LytA PCR
Measurements (M)
measure, M
B,PCR
External to
the lung
Blood
infection, IB
A/C antibodies
measure MSERA
NP measure
MNP, PCR
NP INP
NP measure
Mnp, CX
Blood culture
measure, MB,CX
Sputum ISP
Lung aspirate culture and PCR
measure*, ML,CX and ML,PCR
Sputum measure
MSP, CX
Gastric aspirate*§
MGA, TB CX
Pleural fluid culture and PCR
measure*, MPF, CX and MPF, PCR
* collected on a subset of cases
§ TB testing only
Sputum measure
MSP, PCR
© 2014 Hopkins inHealth
Measurement of pathogen/evidence of infection
Pathogen direction of travel
Potential non-pneumonia causes of pathogen detection
Sample collected in both cases and controls
Unknown
regression
coefficient γ
Covariates for
adjustment
(X)
Population
Pathogen
Distribution
Unobserved
Lung Infection
(I)
Infection
Measurements
(M)
Prob(M|Case)
Prob(M|Control)
© 2014 Hopkins inHealth
Clinical
Pneumonia
(Y)
DRAFT
© 2014 Hopkins inHealth
DRAFT
© 2014 Hopkins inHealth
DRAFT
© 2014 Hopkins inHealth
All that glitters….
© 2014 Hopkins inHealth
Beware the Texas Sharpshooter
© 2014 Hopkins inHealth
Congratulations to Toshiba
Welcome to Johns Hopkins!
Thank You
© 2014 Hopkins inHealth