Ebola: Nonparametric Survival Analysis Without Life Data November 16, 2014

• P[Time from case report to death > k]
• P[Death in week k | Survived k-1 weeks]
• Extrapolated case reports
• Health-care workers' CFR?
• Standard deviations
• Caseload forecasts
• 21 days incubation?
Why?
• Speculations: 70% die? Need 262 beds in Guinea
by Dec. 1? 1000 new cases per week by end of
year? 200,000–250,000 cases by Jan. 20? 21-day
incubation?
– How long to onset? death? CFR and distributions
– How long to release? Empirical distribution [NEJM]
– How many cases in treatment each week?
• Forecast case reports and caseloads to justify
$$$, equipment, and personnel
• Compare countries? Treatments? Health-Care
workers?
Data
• http://www.who.int/csr/disease/ebola/en
– Cumulative case reports (confirmed + probable +
suspected) and death counts
– “The total number of cases is subject to change due to
reclassification, retrospective investigation, consolidation of
cases and laboratory data, and enhanced surveillance.”
• Used weekly counts to smooth corrections, unsuccessfully
– Cumulative death counts
• Days: incubation, onset, hospitalization, release…
[http://www.nejm.org/doi/pdf/10.1056/NEJMoa1411100, appendix]
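Turning the WHO cumulative totals into weekly counts is a simple differencing step, and reclassification shows up as a negative weekly count. A minimal sketch, assuming the input is one cumulative total per week; the function name weekly_from_cumulative is hypothetical, not taken from the workbooks.

```python
import numpy as np


def weekly_from_cumulative(cumulative):
    """Hypothetical helper: difference weekly cumulative totals into weekly counts.

    The first week's count is the first cumulative value. Weeks whose weekly
    count is negative (the cumulative total dropped after reclassification)
    are returned by index so they can be reviewed.
    """
    cum = np.asarray(cumulative, dtype=float)
    weekly = np.diff(cum, prepend=0.0)
    corrected_weeks = np.flatnonzero(weekly < 0)
    return weekly, corrected_weeks


# Synthetic example (not WHO data): the total drops from 40 to 38 in week index 3
weekly, flagged = weekly_from_cumulative([10, 25, 40, 38, 60])
# weekly -> 10, 15, 15, -2, 22; flagged -> index 3
```

How to treat the flagged weeks (spread the correction back, or drop it) is a separate choice the slides leave open.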
Methods
• Regress weekly case reports on time: linear, logarithmic,
and piecewise versions of both
• Nonparametric ccdf (survivor function) estimates of time
from case report to death from WHO counts
– Maximum likelihood assuming nonstationary Poisson case
reports [George and Agrawal]
– Constrained least squares [Harris and Rattner, Gang, George]
• $\min \sum_t \left|\text{observed weekly deaths}_t - \text{estimated weekly deaths}_t\right|^2$ [Gang]
• Subject to one or both of…
– $\sum_t \text{observed deaths}_t = \sum_t \text{estimated deaths}_t$
– $P[\text{Time to death} \le \text{Now} - \text{first case date}] \ge \text{deaths}/\text{cases}$
– Constrained maximum entropy: maximize $-\sum_t p(t)\ln p(t)$ [Tribus]
• Incubation time, Length-of-Stay (LoS in hospital) conditional
empirical cdfs, from NEJM article appendix
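The constrained least-squares idea can be sketched as follows, under stated assumptions: expected deaths in week t are the convolution Σ_k cases[t−k]·p[k], where p[k] is the probability of death k weeks after the case report, and Σp = CFR ≤ 1 because not every case dies. This sketch swaps in SciPy's SLSQP solver (the slides do not say which solver the workbooks use), imposes only the total-deaths constraint plus Σp ≤ 1, and uses hypothetical helper names.

```python
import numpy as np
from scipy.optimize import minimize


def convolution_matrix(cases, n_lags):
    """A[t, k] = case reports in week t - k, so (A @ p)[t] = expected deaths in week t."""
    T = len(cases)
    A = np.zeros((T, n_lags))
    for t in range(T):
        for k in range(min(t + 1, n_lags)):
            A[t, k] = cases[t - k]
    return A


def nplse_ccdf(cases, deaths, n_lags):
    """Hypothetical constrained least-squares estimate of p[k] = P[death k weeks after report].

    Minimizes sum_t (observed deaths_t - estimated deaths_t)^2 subject to
    p[k] >= 0, sum(p) <= 1, and sum_t estimated deaths = sum_t observed deaths.
    Returns p and ccdf[k] = 1 - (p[0] + ... + p[k]), the probability of not
    having died by week k (cases that never die are included).
    """
    A = convolution_matrix(np.asarray(cases, float), n_lags)
    d = np.asarray(deaths, float)
    constraints = [
        {"type": "eq", "fun": lambda p: np.sum(A @ p) - d.sum()},  # total deaths match
        {"type": "ineq", "fun": lambda p: 1.0 - np.sum(p)},        # sum(p) <= 1
    ]
    res = minimize(lambda p: np.sum((d - A @ p) ** 2),
                   x0=np.full(n_lags, 0.5 / n_lags),
                   method="SLSQP",
                   bounds=[(0.0, 1.0)] * n_lags,
                   constraints=constraints,
                   options={"ftol": 1e-12, "maxiter": 500})
    p = res.x
    return p, 1.0 - np.cumsum(p)


# Synthetic check (not WHO data): true p = [0.3, 0.2, 0.1], so CFR = 0.6
cases = np.array([5, 8, 12, 20, 30, 45, 60, 80, 100, 120, 150, 180], float)
true_p = np.array([0.3, 0.2, 0.1])
deaths = convolution_matrix(cases, 3) @ true_p
p_hat, ccdf = nplse_ccdf(cases, deaths, n_lags=3)
```

With noise-free synthetic deaths the estimate recovers p, and the ccdf levels off near 1 − CFR = 0.4; real, revised WHO counts will fit far less cleanly.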
Weekly Case Reports
• Linear and logarithmic regression of weekly
case reports Y on T, days. Heteroskedastic!
– Notice R² values for alternative models?
Country        Linear: Y = mT + b                 Logarithmic: Y = b·m^T
               m       b         R²     seY       m        b       R²     seY
Guinea         3.12    -4.34     0.36   44.66     1.0570   9.89    0.18   1.27
Liberia        18.97   -144.97   0.35   268.62    1.2884   0.20    0.71   1.70
Sierra Leone   23.25   -78.85    0.47   184.68    1.1342   23.39   0.56   0.84
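Both regressions can be reproduced with ordinary least squares. A minimal sketch, assuming the logarithmic model Y = b·m^T is fitted by regressing ln(Y) on T (so its R² and seY are on the log scale) and that every weekly count is positive; the function names are hypothetical.

```python
import numpy as np


def fit_linear(T, Y):
    """Least-squares fit of Y = m*T + b; returns m, b, R^2, and seY (residual std. error)."""
    T, Y = np.asarray(T, float), np.asarray(Y, float)
    m, b = np.polyfit(T, Y, 1)
    resid = Y - (m * T + b)
    r2 = 1.0 - (resid @ resid) / np.sum((Y - Y.mean()) ** 2)
    se = np.sqrt((resid @ resid) / (len(Y) - 2))
    return m, b, r2, se


def fit_logarithmic(T, Y):
    """ASSUMPTION: fit Y = b * m**T by regressing ln(Y) on T (requires Y > 0).

    Returns m = exp(slope), b = exp(intercept), and R^2, seY on the log scale.
    """
    T, Y = np.asarray(T, float), np.asarray(Y, float)
    slope, intercept, r2, se = fit_linear(T, np.log(Y))
    return np.exp(slope), np.exp(intercept), r2, se


# Synthetic series (not the WHO data)
T = np.arange(1, 11)
lin = fit_linear(T, 3 * T + 2)              # m = 3, b = 2, R^2 = 1
log = fit_logarithmic(T, 2 * 1.1 ** T)      # m = 1.1, b = 2
```

Weeks with zero case reports would need special handling before taking logs; the slides do not say how those were treated.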
Piecewise Regression
• Weekly case report rates = slope
• Rates increase after knot points
– Not splined; knot points chosen to minimize SSE
– Slope coefficients = case reports/week!
Country        Segment              Slope (case reports/week)   R², linear   R², logarithmic
Guinea         Before, 7/20/2014    -1.36                       0.17         0.36
Guinea         After, 7/27/2014     5                           0.17         0.13
Liberia        Before, 8/3/2014     3.31                        0.58         0.4
Liberia        After, 8/10/2014     8.9                         0.01         0.03
Sierra Leone   Before, 8/10/2014    8.7                         0.33         0.10
Sierra Leone   After, 8/17/2014     33                          0.26         0.23
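Choosing a knot "for min SSE" without splining amounts to fitting two separate lines and scanning candidate split weeks. A minimal sketch, assuming a plain grid search over split indices with at least three weeks on each side (the minimum segment length is an assumption); the function names are hypothetical.

```python
import numpy as np


def sse_line(T, Y):
    """Fit Y = m*T + b by least squares; return (SSE, m, b)."""
    m, b = np.polyfit(T, Y, 1)
    r = Y - (m * T + b)
    return float(r @ r), m, b


def best_knot(T, Y, min_points=3):
    """Hypothetical grid search: fit separate (unsplined) lines before and after
    each candidate split and return the split index with the smallest total SSE,
    plus the slopes before and after it."""
    T, Y = np.asarray(T, float), np.asarray(Y, float)
    best = None
    for j in range(min_points, len(T) - min_points + 1):
        sse1, m1, _ = sse_line(T[:j], Y[:j])
        sse2, m2, _ = sse_line(T[j:], Y[j:])
        if best is None or sse1 + sse2 < best[0]:
            best = (sse1 + sse2, j, m1, m2)
    _, j, m1, m2 = best
    return j, m1, m2


# Synthetic series (not WHO data): slope 1 for weeks 0-5, then a jump and slope 5
T = np.arange(12)
Y = np.where(T < 6, T, 5 * T - 10)
knot, slope_before, slope_after = best_knot(T, Y)   # knot = 6, slopes 1 and 5
```

Because the two lines are not joined, each slope reads directly as case reports per week in its segment, as on the slide.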
Cumulative Case Reports, Forecasts,
and 95% Upper Prediction Limits
• Extrapolations are averages and 95% upper prediction
limits from 20 bootstraps of 36 weeks of Guinea data
[Figure: Guinea cumulative case reports, 3/16/2014 to 12/11/2014 (0 to 3500), with linear and logarithmic extrapolations and their 95% upper prediction limits]
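The slides do not say how the 20 bootstraps were drawn, so the following is only one plausible scheme, labeled as an assumption: resample (week, count) pairs, refit the linear trend, add resampled residuals to the predicted future weekly counts, and accumulate them onto the observed total. The average and 95th percentile across replicates give the forecast and upper limit. The function name is hypothetical.

```python
import numpy as np


def bootstrap_cumulative_forecast(weeks, weekly, horizon, n_boot=20, seed=0):
    """ASSUMED pairs-plus-residual bootstrap of a linear trend in weekly case reports.

    Each replicate resamples (week, count) pairs with replacement, refits
    Y = m*T + b, adds resampled residuals to the predicted weekly counts for the
    next `horizon` weeks, and accumulates them onto the observed cumulative total.
    Returns the replicate mean and 95th percentile for each future week.
    """
    rng = np.random.default_rng(seed)
    T, Y = np.asarray(weeks, float), np.asarray(weekly, float)
    future = T[-1] + np.arange(1, horizon + 1)
    paths = np.empty((n_boot, horizon))
    for i in range(n_boot):
        idx = rng.integers(0, len(T), size=len(T))
        m, b = np.polyfit(T[idx], Y[idx], 1)
        resid = Y[idx] - (m * T[idx] + b)
        noisy = m * future + b + rng.choice(resid, size=horizon)
        paths[i] = Y.sum() + np.cumsum(np.maximum(noisy, 0.0))  # no negative weeks
    return paths.mean(axis=0), np.percentile(paths, 95, axis=0)


# Synthetic, noise-free series (not WHO data): every replicate gives the same forecast
weeks = np.arange(1, 37)
mean, upper = bootstrap_cumulative_forecast(weeks, 2 * weeks + 5, horizon=4)
```

With only 20 replicates, the 95th percentile is a coarse estimate; more replicates would steady it.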
“Survivor” Function Estimates:
CCDF of Weeks from Case Report to Death
[Figure: ccdf (0 to 1) vs. weeks from case report to death (0 to 36); Guinea, Liberia, Sierra Leone]
Survivor Function Estimates
for Health-Care Workers
[Figure: R(t) (0 to 1) vs. weeks from case report to death (0 to 36); health-care workers in Guinea, Liberia, Sierra Leone]
Compare Estimates
All (left) vs. Health-Care Workers
[Figure, two panels: left, ccdf for all cases; right, R(t) for health-care workers; Guinea, Liberia, Sierra Leone; weeks from case report to death, 0 to 36]
Actuarial Weekly Death Rates
Conditional on Survival
[Figure: actuarial weekly death rate (0 to 0.50) vs. weeks from case report to death (0 to 36); Guinea, Liberia, Sierra Leone]
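The actuarial weekly death rate is the conditional probability P[Death in week k | Survived k-1 weeks], which follows directly from a ccdf estimate S(k): h(k) = (S(k−1) − S(k)) / S(k−1). A minimal sketch, assuming S(−1) = 1 at the case report; the function name is hypothetical.

```python
import numpy as np


def actuarial_death_rates(ccdf):
    """Hypothetical helper: h[k] = (S[k-1] - S[k]) / S[k-1], with S[-1] = 1.

    ccdf[k] = S[k] = P[time from case report to death > k weeks]. Because
    survivors never die, S levels off at 1 - CFR and late-week rates fall to 0.
    """
    S = np.asarray(ccdf, float)
    S_prev = np.concatenate(([1.0], S[:-1]))
    return (S_prev - S) / S_prev


# Illustrative ccdf values (not estimates from the slides)
rates = actuarial_death_rates([0.7, 0.56, 0.504])   # 0.3, 0.2, 0.1
```

Read in reverse, these rates rebuild the ccdf as S(k) = Π_{j≤k} (1 − h(j)), which is how conditional rates can drive week-by-week forecasts.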
CFR ≈ Deaths/Case Reports
Country        All-cases CFR   Standard deviation   Health-care workers CFR
Guinea         62.9%           0.9%                 54%
Liberia        41.4%           9.8%                 41%
Sierra Leone   30.3%           26.1%                80%
Deaths/Case Reports is Biased Low
• P[Time to death ≤ 32 weeks] > Deaths/Case
Reports!
– because some recently reported cases haven’t died yet
• Empirical standard deviation estimates are
from March-April, -May, -June, -July, -August, -September, -October, and total cohorts
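A small worked example shows why deaths/case reports runs low while an epidemic is growing: the most recent cases have not yet had time to die. The pmf p and the case series are made-up illustrative numbers, and the function name is hypothetical.

```python
import numpy as np


def naive_cfr(cases, p):
    """Hypothetical helper: deaths-to-date / cases-to-date when each case dies
    k weeks after report with probability p[k] (sum(p) = true CFR)."""
    cases = np.asarray(cases, float)
    deaths_by_week = np.convolve(cases, p)[: len(cases)]  # deaths already reported
    return deaths_by_week.sum() / cases.sum()


p = [0.3, 0.2, 0.1]                  # true CFR = 0.6
growing = [10, 20, 40, 80, 160]      # weekly case reports, still doubling
ratio = naive_cfr(growing, p)        # 130 / 310, about 0.42, well below 0.6
```

The faster the growth, the larger the share of cases reported in the last few weeks, and the further the ratio falls below the eventual CFR.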
P[Death time ≤ 2 weeks] and
P[LoS ≤ 4 weeks|Survive]
• Death time = Death – Case Report
• LoS (Length of Stay) = Release – Hospitalization
Country        P[Death time ≤ 2 weeks]   Std. Dev.   P[LoS ≤ 4 weeks|Survive]
Guinea         49.2%                     5.6%        100%
Liberia        36.9%                     24.1%       92.5%
Sierra Leone   20.5%                     13.1%       100%
P[Incubation time > 21 days]
• Supplementary Appendix 1, N Engl J Med.
DOI: 10.1056/NEJMoa1411100
• Eyeballed data and computed empirical cdf
– P[inc. time > 21 days|one-day exposure] = 3.5%
– P[inc. time > 21 days|Multi-day exposure] = 10.1%
– ~534 and ~154 cases respectively
• Why do people think, “If you don’t have
symptoms after 21 days, you won’t get it”?
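The empirical tail probability is just the fraction of observed incubation times above 21 days. Given the approximate sample sizes on the slide, a binomial standard error shows how precise the 3.5% and 10.1% figures might be. That standard error assumes independent cases and is an addition for illustration, not something the slides compute; the function names are hypothetical.

```python
import math

import numpy as np


def empirical_exceedance(samples, threshold):
    """Empirical P[X > threshold]: fraction of observations above threshold."""
    x = np.asarray(samples, float)
    return float(np.mean(x > threshold))


def binomial_se(p, n):
    """ASSUMPTION: independent cases, so se = sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)


se_one_day = binomial_se(0.035, 534)     # about 0.008
se_multi_day = binomial_se(0.101, 154)   # about 0.024
```

Because the counts were read from graphs, these standard errors understate the real uncertainty.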
Length-of-Stay: P[LoS > t|Survive]
• New England Journal of Medicine article
appendix fit gamma distributions to data
• Mark II eyeballs read hospitalization-to-release
data from NEJM graphs. Used empirical cdf of LoS
– Sample sizes are small
• Use actuarial death rates and empirical cdf of LoS
to forecast caseloads
– Assume each case report enters the caseload in the first week
after the report
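A caseload forecast of this kind is a convolution: this week's caseload is the sum over past weeks of case reports times the probability that a case reported k weeks ago is still in care. A minimal sketch, assuming that probability is P[not dead by k] × P[not released by k]. This treats death and release times as independent, which is a simplification and not stated on the slides. The function name is hypothetical, and lag 0 means in care, matching the first-week assumption above.

```python
import numpy as np


def forecast_caseload(weekly_cases, ccdf_death, ccdf_los):
    """Hypothetical helper: caseload[t] = sum_k cases[t-k] * in_care[k].

    ASSUMPTION: in_care[k] = P[not dead by lag k] * P[not released by lag k],
    i.e. death and release times are treated as independent.
    """
    in_care = np.asarray(ccdf_death, float) * np.asarray(ccdf_los, float)
    cases = np.asarray(weekly_cases, float)
    return np.convolve(cases, in_care)[: len(cases)]


# Illustrative inputs (not estimates from the slides)
load = forecast_caseload([100, 100, 100], [1.0, 0.7, 0.6], [1.0, 0.5, 0.0])
# load -> 100, 135, 135
```

To forecast ahead, `weekly_cases` would be extended with extrapolated case reports before convolving.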
Caseloads vs. Required ETU Beds
• WHO Dec. 1, 2014 required ETU beds
– http://apps.who.int/iris/bitstream/10665/137091/1/roadmapsitrep22Oct2014_eng.pdf?ua=1
• Forecast is from distribution of time from case
report to either release or death
Country        Existing ETU beds   WHO required ETU beds   WHO ratio % (existing/required)   Forecast caseloads
Guinea         160                 260                     61%                               1034
Liberia        620                 2690                    23%                               5535
Sierra Leone   346                 1198                    29%                               5494
Caseload Estimates and Forecasts
(NOT cumulative!)
[Figure: weekly caseloads, 3/19/2014 to 12/14/2014 (0 to 8,000); Guinea, Liberia, Sierra Leone]
Analyses: append file name to
http://pstlarry.home.comcast.net/
• Guinea: EbolaGna.xlsm
• Liberia: EbolaLib.xlsm
• Sierra Leone: EbolaSL.xlsm
• Regression and summary: EbolaSIR.xlsx
• Health-Care workers: EbolaHCW.xlsm
Workbook Spreadsheets
• *.xlsm workbooks contain tabs: Data, npmle, nplseSummary, MaxEntropy, Recovery, nplse of subsets
– npmle didn’t fit well, partly due to data revisions
– nplse and MaxEntropy agreed tolerably
– VBA convolution for actuarial forecasts
– EbolaHCW.xlsm contains nplse ccdf estimates
• EbolaSIR.xlsx contains country tabs, GuineaBootstrap, SurvivalAnalysis, CaseLoad, Incubation
– The country tabs contain regression analyses of case
reports and Guinea bootstrap case prediction limits
Conclusions: If survive week 1,…
• Survivors may need care to prevent subsequent death due
to secondary causes: liver damage, eyesight, and ???
• Do accounting corrections mess up the statistics?
– Liberia reduced death counts
– Sierra Leone’s recently reported deaths affect estimates: weekly
estimates disagree with estimates from the original case report and
death counts
– Walter Shewhart’s rule #1: “Preserve all relevant information in
data”
• Exponential growth in cumulative case reports (=> R0 > 1) seems
close to linear. Let’s hope so.
• Caseloads ~8000 in Liberia and Sierra Leone by end of year.
Can’t treat unreported cases.
Questions
• Case confirmations and adjustments cause problems
– Why do estimates differ? Daily vs. weekly
• Country vs. country, health-care workers vs. populations, npmle vs.
nplse vs. max. entropy
– Standard deviation estimates could be reduced
– How to compensate for corrections?
• Use statistics for planning, resource allocation, and
service levels? SIR, SEIR?
• Would WHO and Imperial College please share their
individual case data?
– Should countries support WHO and West Africa without
data, estimates, and quantifiable uncertainty?
Next Steps
• Forecast prediction limits on caseloads for
specified service levels
– SIR by country or county?
– Contact [email protected] if you would like more
analyses
– Updated weekly or whenever I get data
• Requested case data from WHO, Imperial College,
and CDC to validate estimates and represent
dependence
• Adjust data or adjust estimates to compensate for
corrections?
References
George, L. L. and Avinash C. Agrawal, “Estimation of a hidden service distribution of an
M/G/∞ system,” Naval Research Logistics, 20: 549–555, doi: 10.1002/nav.3800200314,
http://pstlarry.home.comcast.net/MGinfi1.docx
George, L. L., “Field Reliability Without Life Data,” SPES/QP News, vol. 5, no. 2, Dec. 1999, pp.
13-14, http://www.amstat-online.org/sections/qp/1299newsletter.pdf
Harris, Carl M., Edward Rattner, and Clifton Sutton, “Forecasting the extent of the
HIV/AIDS epidemic,” Socio-Economic Planning Sciences, vol. 26, no. 3, 1992, pp. 149-168
Ibid., “Estimating and Projecting Regional HIV/AIDS Cases and Costs, 1990-2000: A Case
Study,” Interfaces, Vol. 29, No. 5, Sept.-Oct. 1997, pp. 38-53
Cheng, Gang, “The nonparametric least-squares method for estimating monotone functions
with interval censored observations,” PhD thesis, University of Iowa, 2012,
http://ir.uiowa.edu/etd/2839
Jewell, Nicholas et al., “Estimation of the Case Fatality Ratio with Competing Risks Data: An
Application to Severe Acute Respiratory Syndrome (SARS),” UC Berkeley Div. of Biostat.
Working Paper Series, no. 176, 2005
WHO Ebola Response Team, “Ebola Virus Disease in West Africa—the First 9 Months of the
Epidemic and Forward Projections,” N Engl J Med. DOI: 10.1056/NEJMoa1411100, appendix
Majumder, Maimuna, “Mathematical Modeling of the 2014 Ebola Outbreak,” MIT, Sept. 26,
2014, http://maimunamajumder.wordpress.com/2014/09/26/mathematical-modeling-of-the-2014-ebola-outbreak/