Statistical Methods Highlights for Health Economists, 2015

Lecture notes
Randy Ellis, March 16, 2015
Problems common to Health Economics
• Spending is highly skewed, with many zeros
• Randomized controlled trials are rare, so observational studies are the norm
• Policies often implemented on non‐random sets of enrollees. E.g., diabetes intervention.
• Many outcomes of interest are discrete
• Often collaborating with biostatisticians/epidemiologists, who use other approaches
• Policies implemented geographically or over time, so hard to distinguish from other possible causes
• Huge samples?
• Many fixed effects => unobserved covariates
• Many endogenous variables
| Design | When to use | Advantages | Disadvantages |
|---|---|---|---|
| Randomization | Whenever feasible; when there is variation at the individual or community level | Gold standard; most powerful | Not always feasible; not always ethical |
| Randomized Encouragement Design | When an intervention is universally implemented | Provides exogenous variation for a subset of beneficiaries | Only looks at subgroup of sample; power of encouragement design only known ex post |
| Regression Discontinuity | If an intervention has a clear, sharp assignment rule | Project beneficiaries often must qualify through established criteria | Only looks at subgroup of sample; assignment rule in practice often not implemented strictly |
| Difference-in-Differences | If two groups are growing at similar rates; baseline and follow-up data are available | Eliminates fixed differences not related to treatment | Can be biased if trends change; ideally have 2 pre-intervention periods of data |
| Propensity Score Matching | When other methods are not possible | Overcomes observed differences between treatment and comparison | Assumes no unobserved differences (often implausible) |
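To make the difference-in-differences row concrete, here is a minimal sketch of the 2x2 calculation from four group means. The function name and the spending figures are hypothetical, chosen only to show the arithmetic; they do not come from the lecture.

```python
def did_estimate(treat_pre, treat_post, comp_pre, comp_post):
    """Difference-in-differences from four group means (hypothetical helper)."""
    change_treated = treat_post - treat_pre      # change in the treated group
    change_comparison = comp_post - comp_pre     # change in the comparison group
    return change_treated - change_comparison    # treated change net of the common trend


# Hypothetical mean spending before and after a policy
effect = did_estimate(treat_pre=100.0, treat_post=130.0,
                      comp_pre=90.0, comp_post=110.0)
print(effect)  # 10.0

# A fixed level difference between the groups drops out of the estimate
shifted = did_estimate(treat_pre=150.0, treat_post=180.0,
                       comp_pre=90.0, comp_post=110.0)
print(shifted)  # 10.0
```

The second call shifts the treated group by a constant in both periods and returns the same estimate, which is the sense in which the design eliminates fixed differences. It gives no protection if the two groups' trends differ.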
Propensity Score Cites
• Very useful link to sources on using propensity scores in different software packages (Johns Hopkins University School of Public Health):
• http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html
Stata
• psmatch2 http://ideas.repec.org/c/boc/bocode/s432001.html
– Leuven, E. and Sianesi, B. (2003). psmatch2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing.
– Allows k:1 matching, kernel weighting, Mahalanobis matching
– Includes built-in diagnostics
– Includes procedures for estimating ATT or ATE
• pscore http://www.lrz-muenchen.de/~sobecker/pscore.html
– Becker, S.O. and Ichino, A. (2002). Estimation of average treatment effects based on propensity scores. The Stata Journal 2(4): 358-377.
– k:1 matching, radius (caliper) matching, and stratification (subclassification)
– For estimating the ATT
• match http://www.economics.harvard.edu/faculty/imbens/software_imbens
– Abadie, A., Drukker, D., Herr, J. L., and Imbens, G. W. (2004). Implementing matching estimators for average treatment effects in Stata. The Stata Journal 4(3): 290-311.
– Primarily k:1 matching (with replacement)
– Allows estimation of ATT or ATE, including robust variance estimators
• cem http://gking.harvard.edu/cem/
– Iacus, S.M., King, G., and Porro, G. (2008). Matching for Causal Inference Without Balance Checking.
– Implements coarsened exact matching
SAS
• SAS usage note: http://support.sas.com/kb/30/971.html
• Local and global optimal propensity score matching
– Coca-Perraillon, M. (2007). Local and global optimal propensity score matching. In SAS Global Forum 2007. Paper 185-2007.
– Variety of matching methods. No built-in diagnostics. Assumes propensity score already estimated.
• Greedy matching (1:1 nearest neighbor)
– Parsons, L. S. (2001). Reducing bias in a propensity score matched-pair sample using greedy matching techniques. In SAS SUGI 26, Paper 214-26.
– Parsons, L.S. (2005). Using SAS software to perform a case-control match on propensity score in an observational study. In SAS SUGI 30, Paper 225-25.
– Kosanke, J., and Bergstralh, E. (2004). gmatch: Match 1 or more controls to cases using the GREEDY algorithm. http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/software/locally-written-sas-macros
• 1:1 Mahalanobis matching within propensity score calipers
– Feng, W.W., Jun, Y., and Xu, R. (2005). A method/macro based on propensity score and Mahalanobis distance to reduce bias in treatment comparison in observational study. www.lexjansen.com/pharmasug/2006/publichealthresearch/pr05.pdf
• Weighting
– Leslie, S. and Thiebaud, P. (2006). Using propensity scores to adjust for treatment selection bias. http://www.lexjansen.com/wuss/2006/Analytics/ANL-Leslie.pdf
• Variable ratio matching, optimal matching algorithm
– Kosanke, J., and Bergstralh, E. (2004). Match cases to controls using variable optimal matching. http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/software/locally-written-sas-macros
R
• MatchIt http://gking.harvard.edu/matchit
– Ho, D.E., Imai, K., King, G., and Stuart, E.A. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Journal of Statistical Software 42(8). http://www.jstatsoft.org/v42/i08
– Two-step process: does matching, then user does outcome analysis (integrated with Zelig package for R)
– Wide array of estimation procedures and matching methods available: nearest neighbor, Mahalanobis, caliper, exact, full, optimal, subclassification
– Built-in numeric and graphical diagnostics
• Matching http://sekhon.berkeley.edu/matching
– Sekhon, J. S. (2011). Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. Journal of Statistical Software 42(7). http://www.jstatsoft.org/v42/i07
– Uses automated procedure to select matches, based on univariate and multivariate balance diagnostics
– Primarily 1:M matching (where M is a positive integer), allows matching with or without replacement, caliper, exact
– Includes built-in effect and variance estimation procedures
• twang http://cran.r-project.org/web/packages/twang/index.html
– Ridgeway, G., McCaffrey, D., and Morral, A. (2006). twang: Toolkit for weighting and analysis of nonequivalent groups.
– Functions for propensity score estimating and weighting, nonresponse weighting, and diagnosis of the weights
– Primarily uses generalized boosted regression to estimate the propensity scores
• cem http://gking.harvard.edu/cem/
– Iacus, S.M., King, G., and Porro, G. (2008). Matching for Causal Inference Without Balance Checking.
– Implements coarsened exact matching
– Can also be implemented through MatchIt
• optmatch http://cran.r-project.org/web/packages/optmatch/index.html
– Hansen, B.B., and Fredrickson, M. (2009). optmatch: Functions for optimal matching.
– Variable ratio, optimal, and full matching
– Can also be implemented through MatchIt
• PSAgraphics http://cran.r-project.org/web/packages/PSAgraphics/index.html
– Helmreich, J.E. and Pruzek, R.M. (2009). PSAgraphics: An R Package to Support Propensity Score Analysis. Journal of Statistical Software 29(6).
– From webpage: "A collection of functions that primarily produce graphics to aid in a Propensity Score Analysis (PSA). Functions include: cat.psa and box.psa to test balance within strata of categorical and quantitative covariates, circ.psa for a representation of the estimated effect size by stratum, loess.psa that provides a graphic and loess based effect size estimate, and various balance functions that provide measures of the balance achieved via a PSA in a categorical covariate."
• Synth
– Abadie, A., Diamond, A., and Hainmueller, H. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software 42(13). http://www.jstatsoft.org/v42/i13
– Implements weighting approach to creating synthetic control groups
– Useful when there is a single treated unit, such as a state or country. Main idea is to form a weighted average of comparison units that, when weighted, looks like the treated unit.
Matching Methods
Selected slides posted online by an unidentified professor at UC Berkeley.
Propensity‐Score Matching (PSM)
Propensity score matching: match treated and untreated observations on the estimated probability of being treated (propensity score). Most commonly used.
• Match on the basis of the propensity score
P(X) = Pr(D = 1 | X)
– D indicates participation in the project
– Instead of attempting to create a match for each participant with exactly the same value of X, we can match on the probability of participation.
PSM: Key Assumptions
1. No unobserved variables affecting outcomes
• Participation is independent of outcomes conditional on Xi
– This is false if there are unobserved variables affecting both participation and outcomes
• Matching on the score balances the whole distribution of observed characteristics across treatment and control, not just the mean
2. Common support
• Non-zero probability of being in the treatment or control group for all observations, conditional on X
3. For diff-in-diff models, parallel trends is also needed, which is related to Assumption 1.
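In standard potential-outcomes notation, Assumptions 1 and 2 are commonly written as below. The symbols Y(0) and Y(1), the outcomes without and with participation, are an assumption introduced here for illustration; the slides do not define them.

```latex
\bigl(Y(0),\, Y(1)\bigr) \perp D \mid X
\qquad\text{and}\qquad
0 < P(X) = \Pr(D = 1 \mid X) < 1 .
```

The first condition is conditional independence given the observables X; the second is common support.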
Common support is key
[Figure: density of propensity scores for participants and for nonparticipants, plotted against the propensity score from 0 to 1. The overlap of the two densities is the region of common support; scores near 1 indicate a high probability of participating given X.]
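One simple way to operationalize the region of common support is a min-max rule: keep only observations whose score lies inside the overlap of the two groups' score ranges. The sketch below assumes that rule, which is only one of several in use, and the function names and scores are hypothetical.

```python
def common_support(scores_treated, scores_comparison):
    """Return (low, high), the overlap of the two groups' score ranges."""
    low = max(min(scores_treated), min(scores_comparison))
    high = min(max(scores_treated), max(scores_comparison))
    if low > high:
        raise ValueError("the two groups' score ranges do not overlap")
    return low, high


def on_support(scores, low, high):
    """Indices of observations whose score falls inside [low, high]."""
    return [i for i, p in enumerate(scores) if low <= p <= high]


# Hypothetical estimated propensity scores
treated = [0.35, 0.60, 0.80, 0.95]
comparison = [0.05, 0.20, 0.40, 0.70]

low, high = common_support(treated, comparison)
print(low, high)                          # 0.35 0.7
print(on_support(treated, low, high))     # [0, 1]
print(on_support(comparison, low, high))  # [2, 3]
```

Participants with scores above the highest comparison score have no plausible match and are dropped, so the estimate applies only to the overlapping subgroup.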
Steps in Score Matching
1. Need representative and comparable data for both treatment and comparison groups
2. Use a logit (or other discrete choice model) to estimate program participation as a function of observable characteristics
3. Use predicted values from logit to generate propensity score p(xi) for all treatment and comparison group members
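Steps 2 and 3 can be sketched in a few lines. This is a minimal illustration using only the Python standard library and plain gradient ascent on the logit log-likelihood; in practice one would use a statistics package. The data, function names, step size, and iteration count are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(X, d, steps=5000, lr=0.1):
    """Fit Pr(D=1|X) by gradient ascent on the mean log-likelihood.

    X: list of covariate rows (no intercept column); d: list of 0/1 indicators.
    Returns [intercept, slope_1, ..., slope_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, di in zip(X, d):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            resid = di - p                      # score contribution of this row
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted participation probabilities p(x_i) for every observation."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            for xi in X]


# Hypothetical data: one standardized covariate and a participation indicator
X = [[-1.5], [-1.0], [-0.5], [0.0], [0.2], [0.5], [1.0], [1.5]]
d = [0, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logit(X, d)
scores = propensity_scores(X, beta)
```

The scores are computed for all treatment and comparison group members, as step 3 requires, and feed directly into the matching step.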
Calculating Impact using PSM
4. Match pairs:
– Restrict sample to common support (as in Figure)
– Need to determine a tolerance limit: how different can control individuals or villages be and still be a match?
• Nearest neighbors, nonlinear matching, multiple matches
5. Once matches are made, we can calculate impact by comparing the means of outcomes across participants and their matched pairs
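Steps 4 and 5 can be sketched as 1:1 nearest-neighbor matching with replacement, using a caliper as the tolerance limit. The caliper value, the function name, and the data are hypothetical; this is one of several matching rules, not necessarily the one the slides have in mind.

```python
def att_nearest_neighbor(scores, d, y, caliper=0.1):
    """Mean outcome difference between participants and their matched pairs.

    Each participant (d == 1) is matched, with replacement, to the comparison
    unit with the closest propensity score; pairs further apart than the
    caliper are dropped. Returns (estimate, number of matched participants).
    """
    comparison = [i for i, di in enumerate(d) if di == 0]
    diffs = []
    for i, di in enumerate(d):
        if di != 1:
            continue
        j = min(comparison, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            diffs.append(y[i] - y[j])
    if not diffs:
        raise ValueError("no participant has a match within the caliper")
    return sum(diffs) / len(diffs), len(diffs)


# Hypothetical scores, participation indicators, and outcomes
scores = [0.30, 0.54, 0.80, 0.28, 0.50, 0.60, 0.10]
d = [1, 1, 1, 0, 0, 0, 0]
y = [12.0, 15.0, 20.0, 10.0, 11.0, 13.0, 9.0]

estimate, n_matched = att_nearest_neighbor(scores, d, y)
print(estimate, n_matched)  # 3.0 2
```

The participant with score 0.80 has no comparison unit within the caliper and is left unmatched, so the estimate describes only the two matched participants.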
PSM vs Randomization
• Randomization does not require the untestable assumption of independence conditional on observables
• PSM requires large samples and good data:
1. Ideally, the same data source is used for participants and non-participants
2. Participants and non-participants have access to similar institutions and markets, and
3. The data include X variables capable of identifying program participation and outcomes.
Lessons on Matching Methods
• Typically used when randomization, RD, and other quasi-experimental options are not possible
– Case 1: no baseline. Can do ex-post matching
– Dangers of ex-post matching:
• Matching on variables that change due to participation (i.e., endogenous)
• What are some variables that won’t change?
• Matching helps control only for OBSERVABLE differences, not unobservable differences
More Lessons on Matching Methods
• Matching becomes much better in combination with other techniques, such as:
– Exploiting baseline data for matching and using difference‐in‐difference strategy
– If an assignment rule exists for project, can match on this rule
• Need good quality data
– Common support can be a problem if two groups are very different
• What to match on? Levels? Trends? Variance?
LINK BETWEEN PAY FOR PERFORMANCE INCENTIVES AND PHYSICIAN PAYMENT MECHANISMS: EVIDENCE FROM THE DIABETES MANAGEMENT INCENTIVE IN ONTARIO
JASMIN KANTAREVIC (a) and BORIS KRALJ (b)
(a) Ontario Medical Association, Canada
(b) University of Toronto, Canada
HEALTH ECONOMICS
Health Econ. 22: 1417–1439 (2013)
Analytical framework
Social problem can then be written as
Solving yields
Alternative uses of Propensity Score matching
• Weighting by inverse probabilities (no longer favored)
• Nearest neighbor matching
– With or without replacement?
• Caliper matching
• Conventional kernel estimator
• Local linear kernel estimator (their preferred specification)
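As an illustration of the conventional kernel estimator in the list above, the sketch below builds each participant's counterfactual as a kernel-weighted average of comparison outcomes, with weights that fall as the propensity scores move apart. The Epanechnikov kernel, the bandwidth, the function names, and the data are assumptions made for this example; it is not the specification used in the paper.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def att_kernel(scores, d, y, bandwidth=0.2):
    """Kernel-matching estimate of the effect on participants (d == 1)."""
    comparison = [i for i, di in enumerate(d) if di == 0]
    diffs = []
    for i, di in enumerate(d):
        if di != 1:
            continue
        weights = [epanechnikov((scores[j] - scores[i]) / bandwidth)
                   for j in comparison]
        total = sum(weights)
        if total == 0.0:
            continue  # no comparison unit within the bandwidth
        counterfactual = sum(w * y[j] for w, j in zip(weights, comparison)) / total
        diffs.append(y[i] - counterfactual)
    if not diffs:
        raise ValueError("no participant has comparison units within the bandwidth")
    return sum(diffs) / len(diffs)


# Hypothetical data: one participant and three comparison units
scores = [0.5, 0.4, 0.6, 0.9]
d = [1, 0, 0, 0]
y = [15.0, 10.0, 14.0, 100.0]

estimate = att_kernel(scores, d, y)  # approximately 3.0
```

The comparison unit with score 0.9 lies outside the bandwidth and gets zero weight, so its extreme outcome does not affect the estimate. The local linear variant replaces the weighted average with a weighted regression of outcomes on the score.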