What kind of questions can we ask of data and how to answer them
Dr. Hulya Farinas
@emirfarinas
© Copyright 2013
2014 Pivotal. All rights reserved.
The Quantified Patient
• Medical History
• Genetics
• Family History
• Imaging
• Clinical Narratives
• Medications
• Molecular Diagnostics
• Lab tests
• Environment
• Sensors & Mobile
Integrative patient phenotype
• Medical history
• Genetics
• Imaging
• Clinical narratives
• Labs
Targeting for Breast Cancer Treatment
Getting the whole picture improves predictive power
• More data from different sources
• Provides a more complete view
• Improves statistics and inference
[Figure: ROC curves (true positive rate vs. false positive rate) for models using Medical History; Medical History + Clinician Notes; Medical History + Clinician Notes + Genetics; Medical History + Clinician Notes + Genetics + Imaging]
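ROC curves like these are often summarized by the area under the curve (AUC). The sketch below makes a few assumptions. Each model outputs one risk score per patient, the labels are binary, and the score lists are made-up toy values rather than results from this deck. It computes AUC from the Mann-Whitney rank statistic, which equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. This gives one number for comparing models built on progressively richer feature sets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative).

    Ties count as half a win (Mann-Whitney U normalized by n_pos * n_neg).
    O(n_pos * n_neg); fine for illustration, use a sort-based method at scale.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy scores for the same six patients from two models.
    labels = [1, 1, 0, 0, 1, 0]
    history_only = [0.6, 0.4, 0.5, 0.3, 0.7, 0.45]
    history_plus_notes = [0.8, 0.6, 0.4, 0.3, 0.9, 0.5]
    print(f"Medical History only:      {roc_auc(labels, history_only):.3f}")
    print(f"+ Clinician Notes:         {roc_auc(labels, history_plus_notes):.3f}")
```

Because AUC depends only on the ranking of scores, models with differently calibrated outputs can still be compared directly.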
Is more data always better?
ICD-9-CM 428.0: Congestive Heart Failure
Can we trust this diagnosis?
Demographics
• 25 year old
• Female
Lab Orders
• no Complete Blood Count
• no electrolytes (Na, K, Mg)
• no Serum Creatinine
• no Blood Urea Nitrogen
Patient Medical History
• No known history of cardiovascular diseases
• No known respiratory illnesses
Medications Given
• no anticoagulants
• no antiplatelet agents
• no beta blockers
• no diuretics
• no statins
Having access to multiple sources of data allows us to interrogate and clean the data, and to employ smart imputation methods when data completeness is an issue.
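Cross-checking a coded diagnosis against other sources, as in the example above, can be automated as a plausibility rule. The sketch below is built on hypothetical pieces: the `PatientRecord` fields, the evidence sets, and the `min_support` threshold are all illustrative assumptions, not a validated clinical rule. It flags a congestive heart failure code (ICD-9-CM 428.x) when no independent source (lab orders, medications, history) corroborates it.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one encounter; field names are illustrative."""
    diagnosis_codes: set[str]
    lab_orders: set[str] = field(default_factory=set)
    medications: set[str] = field(default_factory=set)
    history: set[str] = field(default_factory=set)


# Illustrative evidence lists mirroring the slide; a real rule set would be
# defined together with clinicians, not taken from this sketch.
CHF_LABS = {"CBC", "electrolytes", "serum creatinine", "BUN"}
CHF_MEDS = {"anticoagulant", "antiplatelet", "beta blocker", "diuretic", "statin"}
CHF_HISTORY = {"cardiovascular disease", "respiratory illness"}


def chf_support_score(rec: PatientRecord) -> int:
    """Count how many independent evidence sources corroborate a CHF code."""
    sources = [
        bool(rec.lab_orders & CHF_LABS),
        bool(rec.medications & CHF_MEDS),
        bool(rec.history & CHF_HISTORY),
    ]
    return sum(sources)


def flag_suspect_chf(rec: PatientRecord, min_support: int = 1) -> bool:
    """True when a 428.x code is present but too few other sources back it up."""
    has_chf_code = any(code.startswith("428") for code in rec.diagnosis_codes)
    return has_chf_code and chf_support_score(rec) < min_support


if __name__ == "__main__":
    # The 25-year-old from the slide: CHF code, but no labs, meds, or history.
    print(flag_suspect_chf(PatientRecord(diagnosis_codes={"428.0"})))
```

A flagged record is a candidate for review or imputation, not an automatic correction.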
What questions to ask?
Question areas: Clinical Decision Support, Resource Allocation, Patient Behavior, Operations
• Diagnosis with patient history, medical images, gene expression, etc.
• Risk stratification based on predicted health care spend
• Identify usage patterns, e.g., ER Frequent Flyers
• Physician Benchmarking
• Comparative effectiveness research
• Risk stratification for mortality and hospital readmissions
• Adherence to medication and exercise regimens
• Mining charts for reporting compliance
• Mining biomedical literature
• Risk stratification for hospital-acquired infections
• Behavioral Risk Assessment for Surgical Outcomes
• Revenue Assurance: Claims processing
• Mining the EHR to drive current visit outcomes
• Predict Emergency Wait Times
• Alert for known adverse events and discover new ones
• Predict Disease Prevalence/Outbreaks
• Fraud Detection in Accounts Payable
• Identifying Gaps in Care/Proactive Care
Where to start?*
PEOPLE: Hospital Admins, Nurses/PA, Patients, Physicians
APPS and MODELS: Hospital Staffing, Scheduling, Discharge, Gaps in Care, Surface Relevant Literature, Coordinate Care, Healthcare Cost, Readmission, Length of Stay, Mortality, Chronic Disease Mgmt, Therapy Recommender, Adverse Events, Differential Diagnosis, Risk Assessment, EDIP, Engagement, Comparative Effectiveness, Personalized Medicine, Biomedical Lit. Matching, Surface Relevant EMR, Compliance, Hospital Census, Bed Mgmt, Intervention Design, Pharmacovigilance, Molecular Diagnosis
DATA: Encounter, Procedures, Lab Results, Diagnoses, Medication, Bed History, Monitor Feeds, Orders, Medical Images, Clinician Notes, Sequence Data, Telemetry, Literature, Environment, Self Reporting Data
*Not a comprehensive list
How to solve these problems?
1. Acquire the Right Technology
2. Hire Data Scientists
3. Establish the Path for Operationalization
Step 1 - Acquire the Right Technology
REQUIREMENTS
• Able to deal with structured and unstructured data sources
• Scalable
– Anticipate all the questions we may one day ask of the data
• Computation needs to be closer to the data
• Vendor should have data science services that can help with technology adoption
[Figure: Data Modeler]
Technology Adoption Journey of a Major Healthcare Provider
• Prove that better technology can speed up discovery
• Prove that better technology can improve model quality
• Prove that technology is accessible to my clinicians and researchers
• Prove that data science can help in areas other than clinical analytics
• Prove that, once trained, our scientists can get to insights as quickly as the Pivotal DS team
Projects along the way:
• Length of Stay Modeling
• Fraud Detection for Accounts Payable
• EDIP Modeling in 4 days
• Comorbidity Feature Generation App
• Code-a-thon
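The deck mentions a comorbidity feature generation app without describing how it works. The sketch below assumes diagnoses arrive as ICD-9-CM code strings and turns a patient's codes into binary comorbidity flags. The prefix table `COMORBIDITY_PREFIXES` is a hypothetical, deliberately tiny subset. Published indices such as Charlson or Elixhauser use much longer and more specific code lists.

```python
# Hypothetical, tiny subset of ICD-9-CM category prefixes per comorbidity.
# NOT a complete or validated comorbidity definition.
COMORBIDITY_PREFIXES = {
    "chf": ("428",),                                    # heart failure
    "diabetes": ("250",),                               # diabetes mellitus
    "copd": ("491", "492", "496"),                      # chronic bronchitis, emphysema, chronic airway obstruction
    "hypertension": ("401", "402", "403", "404", "405"),  # hypertensive disease
}


def comorbidity_flags(icd9_codes):
    """Map an iterable of ICD-9-CM codes (e.g. '428.0') to 0/1 comorbidity flags."""
    flags = {name: 0 for name in COMORBIDITY_PREFIXES}
    for code in icd9_codes:
        normalized = code.replace(".", "")  # '428.0' -> '4280'
        for name, prefixes in COMORBIDITY_PREFIXES.items():
            if normalized.startswith(prefixes):  # str.startswith accepts a tuple
                flags[name] = 1
    return flags


if __name__ == "__main__":
    print(comorbidity_flags(["428.0", "250.00", "V45.81"]))
```

Emitting a fixed set of keys for every patient, even when a flag is 0, keeps the output ready to be stacked into a feature table.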
Step 2 - Hire Data Scientists
(or Teach New Tricks to Your Data Analysts/Scientists)
• The data analysts' approach at a provider is typically hypothesis-driven:
Problem Definition → Interview Clinicians for Potential Features → Interview Data Owners to Locate Data → Try Out Different Model Forms → Validate Findings with Experts
• Data-driven approach:
Problem Definition → Build Hundreds of Features
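In the data-driven approach, the feature set is not hand-picked through interviews but enumerated systematically. The sketch below shows one way to do that. It assumes each patient's history is a list of `(timestamp, event_type, code)` tuples, and both that format and the look-back windows are hypothetical choices. It generates one count feature for every combination of event type, code, and window that occurs, which quickly yields hundreds of candidate features.

```python
from collections import Counter
from datetime import datetime, timedelta


def window_count_features(events, index_time, windows_days=(30, 180, 365)):
    """Count features per (event_type, code, look-back window).

    events: iterable of (timestamp, event_type, code) tuples for one patient
            (hypothetical format).
    Only events strictly before index_time are used, so no future data leaks in.
    Returns a dict like {"dx:428.0:30d": 1, ...}.
    """
    features = Counter()
    for ts, event_type, code in events:
        if ts >= index_time:
            continue  # skip anything not yet known at the index time
        age_days = (index_time - ts).days
        for days in windows_days:
            if age_days < days:
                features[f"{event_type}:{code}:{days}d"] += 1
    return dict(features)


if __name__ == "__main__":
    t0 = datetime(2014, 1, 1)
    events = [
        (t0 - timedelta(days=10), "dx", "428.0"),
        (t0 - timedelta(days=200), "dx", "428.0"),
        (t0 - timedelta(days=5), "lab", "BUN"),
        (t0 + timedelta(days=1), "lab", "BUN"),  # after index time: ignored
    ]
    print(window_count_features(events, t0))
```

Most of the generated columns will be sparse; a downstream model with regularization or feature selection is expected to sort out which ones matter.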
Problem Statement
• Motivation
– The cost of excessive length of stay (LOS) is estimated at 5-6% of hospital budgets
– Knowing when a patient will be ready for discharge enables better resource allocation decisions
• Business Objective
– Predict the length of stay conditioned on information known at the time of admission
• Scope
– Patients whose admission diagnosis is acute myocardial infarction (AMI)
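Setting up this problem starts with a cohort and a target. The sketch below assumes encounters are dicts with hypothetical keys `admit_dx`, `admit_time`, and `discharge_time`. It keeps encounters whose admission diagnosis is in ICD-9-CM category 410 (acute myocardial infarction) and computes LOS in hours. It also adds a binary label for LOS above 120 hours, mirroring the 120-hour split used in the LOS modeling framework.

```python
from datetime import datetime

AMI_PREFIX = "410"  # ICD-9-CM 410.x: acute myocardial infarction


def los_hours(admit: datetime, discharge: datetime) -> float:
    """Length of stay in hours."""
    return (discharge - admit).total_seconds() / 3600.0


def build_ami_cohort(encounters):
    """Filter to AMI admissions and attach the LOS target columns.

    encounters: iterable of dicts with hypothetical keys
    'admit_dx', 'admit_time', 'discharge_time'.
    """
    cohort = []
    for enc in encounters:
        if not enc["admit_dx"].startswith(AMI_PREFIX):
            continue
        row = dict(enc)  # copy so the input is not mutated
        row["los_hours"] = los_hours(enc["admit_time"], enc["discharge_time"])
        row["los_gt_120h"] = row["los_hours"] > 120
        cohort.append(row)
    return cohort
```

Features for this target should then be drawn only from data timestamped at or before `admit_time`, in line with the business objective.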
Framework for LOS Modeling
• Level 1 LOS Modeling: conditioned on patient demographics, medical history, hospital characteristics, and admission type/source/time
• Level 2 LOS Modeling: + admission diagnosis
• Level 3 LOS Modeling: + monitor feeds and procedures data collected during hospitalization
Ensemble model:
• Classifier for discharge type
• Classifier for LOS > 120 hrs
• Conditional LOS by hour
• The class probabilities and the conditional LOS prediction feed a linear model that produces the final prediction
Model Fit
[Figure: conditional LOS by hour for LOS <= 120 hrs; discharge-type classification, predicted probability of transfer vs. actual discharge = transfer]
Sample insights from the modeling
[Figure: Admissions by hour; Discharges by hour; Variance explained when a category is excluded, for the categories Current Admission, Lab, Med. History, Demog, Hospital]
LOS for AMI has Markov-like qualities: recent observations are more predictive than the patient's medical history.
LOS is not only biology. It is also operations! Admission time, day of week, and the hospital's size and experience with cardiology all matter.
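The "variance explained when a category is excluded" chart corresponds to a feature-group ablation: refit the model without one category of features and measure how much R² drops. The deck does not say which model was used, so this sketch assumes ordinary least squares with an intercept. It uses only the standard library to stay self-contained; in practice NumPy or scikit-learn would do the fitting. The column and category names in the usage are hypothetical.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system (collinear features?)")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def r_squared(rows, y):
    """R^2 of an OLS fit with intercept; rows is a list of per-sample feature lists."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, xi))) ** 2 for xi, yi in zip(X, y))
    return 1.0 - ss_res / ss_tot


def variance_explained_drop(data, y, groups):
    """For each feature category, R^2(all features) - R^2(all but that category).

    data: dict column -> list of values; groups: dict category -> list of columns.
    """
    cols = [c for cs in groups.values() for c in cs]

    def rows_for(use):
        return [[data[c][i] for c in use] for i in range(len(y))]

    full = r_squared(rows_for(cols), y)
    return {g: full - r_squared(rows_for([c for c in cols if c not in cs]), y)
            for g, cs in groups.items()}
```

A large drop means the category carries information that the other categories cannot substitute for. A small drop can also mean the category is redundant with another one, not that it is irrelevant.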
Step 3 - Operationalization
• 83,767* articles on sepsis
• 9,644* articles on readmissions
• 362,710* articles on drug-drug interaction
• 560,442* articles on cancer treatment
When we visit our doctors, we should all expect to benefit from the latest discoveries without requiring our physician to be a superhero.
Disseminating insights is as important as extracting them from the data.
*Based on an NCBI PubMed search in April 2014
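Counts like these can be retrieved with NCBI's E-utilities ESearch endpoint, where `rettype=count` returns only the number of matching records. The sketch below assumes plain topic strings as search terms; the exact queries behind the deck's figures are not stated. Today's counts will also differ from the April 2014 numbers. The network call is kept separate so that URL building and response parsing can be checked offline.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_count_url(term: str) -> str:
    """Build an ESearch URL that asks PubMed only for the hit count."""
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return f"{ESEARCH_URL}?{urllib.parse.urlencode(params)}"


def parse_count(xml_text: str) -> int:
    """Extract the top-level <Count> from an eSearchResult XML document."""
    root = ET.fromstring(xml_text)
    count = root.find("Count")
    if count is None or count.text is None:
        raise ValueError("no <Count> element in response")
    return int(count.text)


def pubmed_count(term: str) -> int:
    """Live query (network access required); results change as PubMed grows."""
    with urllib.request.urlopen(esearch_count_url(term), timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for topic in ["sepsis", "readmissions", "drug-drug interaction", "cancer treatment"]:
        print(topic, pubmed_count(topic))
```

NCBI's usage guidelines limit request rates, so a loop over many topics should pause between calls or use an API key.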
Rules of Operationalization
• Understand
– Target user groups
– Their attitudes towards data science
– What decisions they make
– When they make these decisions
• As much as possible, integrate analytics seamlessly into users' workflow
• Iterate!!!
– Living models
– Living/evolving applications
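One concrete reading of "living models" is a deployed model whose live error is tracked so that it can be retrained when performance degrades. The sketch below is a minimal monitor built on that interpretation. The class name, the window size, and the tolerance ratio are hypothetical tuning choices, not something the deck specifies. It keeps a rolling mean absolute error (for example, of predicted vs. actual LOS in hours) and signals retraining once that error exceeds the validation baseline by a set ratio.

```python
from collections import deque


class LivingModelMonitor:
    """Track recent absolute errors of a deployed model and flag retraining.

    baseline_mae: error measured at validation time.
    window, tolerance: hypothetical tuning knobs.
    """

    def __init__(self, baseline_mae: float, window: int = 200, tolerance: float = 1.2):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # oldest errors fall off automatically

    def record(self, predicted: float, actual: float) -> None:
        self.errors.append(abs(predicted - actual))

    def rolling_mae(self) -> float:
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_retraining(self) -> bool:
        # Wait for a full window so a handful of unusual cases cannot trigger it.
        if len(self.errors) < self.errors.maxlen:
            return False
        return self.rolling_mae() > self.tolerance * self.baseline_mae
```

In a living application, a `True` result would typically trigger a retraining job and a review, rather than silently replacing the model.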
What Matters: Apps. Data. Analytics.
Apps power businesses, and
those apps generate data
Analytic insights from that data
drive new app functionality,
which in turn drives new data
The faster you can move
around that cycle, the faster
you learn, innovate & pull
away from the competition
BUILT FOR THE SPEED OF BUSINESS