The Trouble with Instruments: Re-Examining Shock

The Trouble with Instruments:
Re-Examining Shock-Based IV Designs
Vladimir Atanasov
College of William and Mary
Mason School of Business
Bernard Black
Northwestern University
Law School and Kellogg School of Management
(draft March 2015)
European Corporate Governance Institute
Finance Working Paper 2014/xx
Northwestern University School of Law
Law and Economics Research Paper Number 14-xx
Available on SSRN at:
http://ssrn.com/abstract=2417689
Vladimir Atanasov*
College of William and Mary
Bernard Black**
Northwestern University
Abstract: Credible causal inference in accounting and finance research often comes from “natural”
experiments. These natural experiments generate “shocks” which can be exploited using various research
designs, including difference-in-differences (DiD), instrumental variables based on the shock (shock-based
IV), and regression discontinuity (RD). There is much to be said for shock-based designs. Moreover, if one
must use IV, shock-based IV designs are highly likely to be preferred to non-shock IV designs. But shock-based IV designs remain problematic. Often, a near-equivalent DiD design is available, and is usually preferable. We
illustrate the problems with shock-based IV by re-analyzing three recent, high-quality papers. None of the IVs
turn out to be valid. For Desai and Dharmapala’s (REStat 2009) study of the interaction between tax shelter
opportunities and corporate governance, their first stage fails when we impose a balanced sample of firms with
data both before and after the shock. For Duchin, Matsusaka and Ozbas’s (DMO) (JFE 2010) study of the
effect of board independence on firm performance, their first stage also fails when we balance treated and
control firms on the pre-shock proportion of independent directors. For Iliev’s (JF 2010) RD/IV study of the
cost of compliance with SOX § 404, we use combined DiD/RD and principal strata methods, and find cost
estimates somewhat below his RD estimate, and well below his RD/IV estimate. The principal problem is that
Iliev’s IV does not, for subtle reasons, satisfy the core “only through” condition (exclusion restriction) for a
valid instrument. We discuss common themes that emerge from our re-analysis, including the fragility of IV
compared to other shock-based designs; the need for covariate balance between treated and control firms; and
the difficulty in satisfying the only-through condition. Our results suggest that even for shock-based designs,
the scope for IV methods is very limited.
Keywords: Instrumental variables; shock-based research design; exclusion restriction; covariate balance
JEL codes: C26, G34, G38
* Mason School of Business, College of William and Mary, P.O. Box 8795, Williamsburg, VA 23187,
[email protected], 757-221-2954. We owe special thanks to Mihir Desai and Dhammika Dharmapala;
Peter Iliev; and Ran Duchin, John Matsusaka, and Oguzhan Ozbas for their willingness to share their datasets and
statistical code with us, which made this project possible. We also owe strong thanks to Dhammika Dharmapala, Peter
Iliev, and John Matsusaka for reviewing a draft of this paper and discussing with us our reinterpretation of their results.
We also thank [*to come] and participants in finance workshops at Emory University (accounting department), Rice
University (finance department), Rutgers University (finance department), [*others to come] for comments and
suggestions; [to come] for research assistance; and the Searle Center on Law, Regulation and Economic Growth at
Northwestern Law School for financial support.
** Corresponding author. Nicholas J. Chabraja Professor at Northwestern University, Law School and Kellogg School
of Management. Tel. 312-503-2784, email: [email protected].
1. Introduction
Accounting and finance scholarship is moving toward greater stress on “identification” of
causal effects. That has led to increased use of “natural” experiments, which exploit “shocks” that
plausibly satisfy the core “as-if random assignment to treatment” and “only through” conditions for
credible causal inference.1 For a recent survey, see our related paper (Atanasov and Black, 2015;
below, AB-2015), on which we build here.
Shocks can be exploited in a number of ways, including difference-in-differences (DiD), event
studies (ES), regression discontinuity (RD), and instrumental variable (IV) designs, as well as
combined designs such as DiD/RD (DiD on a sample limited to firms in a bandwidth around an RD
threshold). We focus here on IV designs where the instrument is, or is based on, an underlying shock.
“Shock-based IV” designs generally rest on a much sounder basis than the non-shock IV designs that
are often used in accounting and finance research. But they remain vulnerable to a number of threats
to validity. We discuss those threats, some responses, and ways to improve shock-based IV designs.
We illustrate the fragility of shock-based IV designs by re-analyzing three recent, high-quality
papers by strong authors: Desai and Dharmapala (Review of Economics and Statistics, 2009); Duchin,
Matsusaka and Ozbas (Journal of Financial Economics, 2010); and Iliev (Journal of Finance, 2010).
Desai and Dharmapala (2009, below D&D) study how corporate governance mediates the effect of
tax shelter opportunities on firm value. Their shock is 1996 Treasury regulations that simplified
taxation for small private firms. As an unintended side effect, these rules increased tax shelter
1
The only-through condition is often called an “exclusion restriction.” E.g., Angrist and Pischke (2009), § 4.1. We
use the phrase “only through condition,” to clarify what the exclusion restriction is excluding.
opportunities for multinational firms. D&D use this shock, interacted with measures of the firm’s
need to shelter income, as instruments for “book-tax gap” (a proxy for tax sheltering). They find that
greater sheltering opportunities increase firm value, but only for firms with high institutional
ownership (a proxy for corporate governance).
Duchin, Matsusaka and Ozbas (2010, below, DMO) study the effect of board independence
on firm value and profitability. Their instrument for a change in board independence is whether a
firm had to add independent directors to its audit committee to meet a 1999 New York Stock Exchange
(NYSE) and NASDAQ requirement that audit committees consist entirely of independent directors
(“Audit Committee Shock”). DMO find that a higher proportion of independent directors is value-neutral overall, but positive (negative) for firms with low (high) information costs. Over 2000-2005,
firms in the top quartile of information cost that increase board independence by 10% (the amount
predicted by their instrument) suffer a 3.0% drop in ROA relative to bottom-quartile firms; a 24%
relative drop in Tobin’s q; and 31% lower cumulative share returns.
Iliev (2010) studies the cost of compliance with § 404 of the Sarbanes-Oxley Act (SOX) for
firms near the compliance threshold (public float of $75M), using a combined regression discontinuity
(RD) and IV design. His RD design exploits the discontinuity at $75M in float between firms which
do (don’t) need to comply with SOX § 404. Iliev finds that some firms manipulate their float to stay
below the $75M threshold, and uses IV to address this manipulation. His clever IV is whether a firm
had float > $75M in 2002. This instrument relies on an SEC rule, adopted in 2003, which required
compliance by firms with float > $75M in 2002 (before the $75M rule was known). Iliev estimates
a mean increase in ln(audit fees) of 0.744 (110% increase) with RD alone, and 0.983 (167% increase)
with combined RD/IV.
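The percentage figures quoted alongside the ln(audit fees) coefficients follow from the usual conversion exp(b) - 1. A minimal sketch of that arithmetic is below; the function name is ours, chosen for illustration, and is not taken from Iliev (2010).

```python
import math


def log_points_to_percent(delta_ln):
    """Convert a change in ln(y) into the implied percentage change in y."""
    return (math.exp(delta_ln) - 1.0) * 100.0


# Coefficients on ln(audit fees) as quoted in the text above.
for label, coef in [("RD only", 0.744), ("RD/IV", 0.983)]:
    pct = log_points_to_percent(coef)
    print(f"{label}: {coef:.3f} log points -> {pct:.0f}% increase in fees")
```

Because the conversion is convex, the gap between two estimates looks larger in percentage terms than in log points.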
We count ourselves as enthusiastic participants in the move toward stronger research designs,
often shock-based. We look for shocks and exploit them in our own work when we can.2 We began
this project expecting to illustrate how the strategies for shock-based causal inference discussed in
AB-2015 could be used to improve already strong papers, and perhaps lead to different insights. For
example, DiD is similar to shock-based IV. It provides an “intent-to-treat” estimate of the effect of a
shock on all firms to which the shock applies. Shock-based IV, using the same shock, provides an
estimate for “compliers” (firms whose behavior is changed by the shock). What would we learn by
applying DiD to a shock-based IV paper? What would change if we attended closely to “covariate
balance” (the need for treated and control firms to be similar on pre-treatment covariates)? If we used
“balancing methods,” adapted from pure observational studies, to improve covariate balance and
ensure “common support” (reasonable overlap between treated and control firms on all covariates)?
If we applied principal strata thinking (Frangakis and Rubin, 1999, 2002), which generalizes the
“causal IV” concepts of always takers, never takers, compliers and defiers (Angrist, Imbens, and
Rubin, 1996)? If we used a combined DiD/RD design where feasible?
To choose these three papers, we began with the eight shock-based IV papers in the AB-2015
sample.3 We put aside Bennedsen et al. (2007), who rely on a truly random shock, and picked what
we saw as the next three strongest IV papers (as many as one paper could reasonably re-analyze with
care). An Appendix discusses our concerns with IV validity for the remaining papers. In brief, we
chose papers we liked, from authors we know and respect, that illustrated different uses of shock-based
IV designs. All three papers begin with plausible, clearly exogenous instruments. All are
careful in many ways. They address important issues, and are deservedly published in top journals.
The authors were generous enough to share their datasets and code (and went to considerable trouble
to do so). We thought our re-analysis would show some differences in inference, but did not know,
and had no priors, whether those differences would be large or small.
2
See, for example, Atanasov et al. (2010) (exploiting a corporate governance shock in Bulgaria); Black, Jang and Kim
(2006), Black and Kim (2012) (exploiting a governance shock in Korea).
3
In AB-2015, we surveyed the research designs used in 863 empirical corporate governance papers, published from
2001-2011 in 22 major journals. Of these, 285 use IV (either directly or an IV-based Heckman selection model), but only
8 papers use a shock-based IV (not counting Black, Jang and Kim, 2006, who use a “fuzzy RD” design).
Some projects, however, turn out differently than you expect. For D&D, we apply both DiD
and IV to a “balanced before-after sample” of firms which appear in their dataset in both 1996 (just
before the shock) and 1997 (just after it). We find no evidence that the 1996 rule change affects book-tax gap for this balanced sample. Instead, their IVs, already “weak,” become insignificant predictors
of book-tax gap. The first stage of their two-stage least squares (2SLS) analysis fails. We also discuss
why their instruments are not exogenous (despite an exogenous shock). Moreover, even with the
balanced before-after sample, there is substantial covariate imbalance between treated and control
firms, and evidence of non-parallel trends between treated and control firms. One would need to
address these issues if the results had otherwise survived.
For DMO, we apply a combination of DiD and balancing methods. The treated firms in their
sample (which had to change their audit committees) had, on average, far fewer independent directors
than control firms (which already had 100% independent audit committees). If we compare treated
firms to control firms with similar initial proportions of independent directors, the Audit Committee
Shock no longer predicts a meaningful change in the proportion of independent directors. In effect,
the first stage of DMO’s 2SLS analysis fails. Their IVs also likely violate the only through condition.
And their main 2SLS regression specification violates the standard advice to never include an
interaction variable without including the non-interacted components. IV technology makes it easy
to miss that flaw, which becomes apparent when we use DiD instead.
For Iliev, his core RD design is sound, though we would prefer a combined DiD/RD design.
In re-assessing his combined RD/IV design, we apply a principal strata approach in which we divide
firms into strata based on growth over 2002-2004. We estimate that SOX § 404 compliance increases
ln(audit fees) for firms near the $75M threshold by around 0.60 (an 80% increase in fees), with smaller
estimates for narrow bandwidths around the threshold. In contrast, Iliev’s RD-only estimate is a
110% increase and his preferred RD/IV estimate is a 167% increase in fees. Iliev’s higher RD
estimate, and much higher RD/IV estimate, flow from a subtle violation of the covariate balance that
a valid RD design should achieve, and an even subtler but much larger violation of covariate balance
between the “compliers” with his IV and control firms, and thus a violation of the “only through”
condition for a valid instrument (the instrument must predict the outcome only through the
instrumented variable). We show that the concern which led Iliev to use IV (that firms which manipulated
their float to avoid complying with SOX § 404 might have higher SOX compliance costs than the
firms that let their float grow) is not a significant issue for his sample.
Three strong papers, yet in all three, the instruments fail, for different reasons! And yet
Bennedsen et al. (2007) aside, these papers appeared to be the stronger shock-based IV papers that
we found in AB-2015. We have no reason to expect the IVs in the other four shock-based IV papers
to survive similar scrutiny. We discuss these papers briefly in the Appendix. Indeed, even before
any re-examination, only one of these four papers reports statistically significant results (at the 5%
level) for their shock-based IVs. Nor is it likely that any of the IVs in the 276 non-shock IV papers
are valid.4 Clearly, finding a valid instrument is a tricky business.
4
AB-2015 explain why none of these non-shock IVs in their study are likely to be valid. Larcker and Rusticus (2010)
similarly conclude that none of the non-shock IVs in their review of accounting papers are likely to be valid.
Several common themes emerge from our analysis. These include: (i) the crucial role of
covariate balance for shock-based designs; (ii) the importance of using extensive covariates (in part
to check for covariate balance); (iii) the need for common support; (iv) the frequent need to use
balancing methods, including sample trimming, to ensure covariate balance (including common
support); (v) the value of exploiting a shock using more than one research design, using combined
research designs where feasible, and assessing whether the same shock leads to similar results across
designs; and (vi) use of principal strata analysis to clarify for what subsample one can estimate a
“local average treatment effect (LATE).” Many of these steps can be completed in a “design phase”
of the analysis, with outcomes hidden, to ensure that design decisions are not affected by knowledge
of which approach will produce stronger results (Rubin, 2008).
We skip the usual literature review, because the use of shock-based IVs in accounting and
finance research is relatively new, and there is little to review. We build on the AB-2015 survey of
shock-based research designs in corporate finance. Larcker and Rusticus (2010) and Roberts and
Whited (2013) discuss the difficulties in finding valid IVs in accounting and finance research, but do
not address shock-based instruments. Karpoff and Wittry (2015) and Catan and Kahan (2015) reassess and criticize DiD studies of state adoptions of antitakeover laws. We are not aware of use of
principal strata methods in accounting or finance.
Do our results imply that finance and accounting researchers should abandon efforts to find
even shock-based IVs? Not quite. Consider Bennedsen et al. (2007), who use biological chance,
which determines the gender of first-born children of CEOs of family-run firms, as an instrument for
within-family CEO succession. In the AB-2015 survey, this was our favorite among the 77 74 shock-based papers. (Iliev (2010) was our second-favorite, and remains so, despite our re-analysis here.)
We wrote that Bennedsen et al. “have, in effect, a randomized experiment, with an encouragement
design. This is a beautiful paper.” In the AB-2015 sample, Black, Jang and Kim (2006) use a fuzzy
RD design: they exploit a legal shock to the board structure of large Korean firms (assets > 2 trillion
Korean won), with no similar change for smaller firms. Yet “fuzzy RD is IV,” under a different
name (Angrist and Pischke, 2009, § 6.2). In AB-2015, we classified this paper as RD, rather than IV
(or both). Viewed as a shock-based IV design, Black, Jang, and Kim have a plausible instrument.
Other fuzzy RD designs could be reasonably valid as well.
There will also be times when applying both DiD in a primary analysis, and IV based on the
same shock in a secondary analysis, provides insight into how change in a shocked variable affects
the outcome. Still, fuzzy RD aside, valid shock-based IVs will be rare, and situations where one
should use shock-based IV alone (not just as part of a DiD-primary study) will be rarer still.
Do our results imply, more broadly, that the current stress on shock-based designs in finance
and accounting research is misguided? Not at all. We believe in exploiting shocks, when they can
be found. DiD and RD designs will often be more robust than shock-based IV. And an imperfect
shock-based paper will often still be more convincing than the non-shock alternatives. We share
neither the view of some researchers, which can be caricatured as “endogeneity is everywhere, one
can never solve it, so let’s stop worrying about it”; nor that of the “endogeneity police”, who believe
that “if causal inference isn’t (nearly) perfect, a research design is (nearly) worthless.”
2. Background on Shock-Based Research Designs and Shock-Based IV
We offer here a condensed review of shock-based research designs. We assume readers are
generally familiar with the reverse causation and omitted variable bias risks that plague much
corporate finance and accounting research, and how shock-based designs can respond to those risks.
We focus on firm-level analyses; assume a binary shock (w = 1 if firm i is “treated” (subject to the
shock), and 0 otherwise); discuss the qualitative aspects of shock-based design; and refer readers to
AB-2015 for details, regression mechanics, and citations to the causal inference literature.
2.1. Shock-based Designs in General
Firm-level causal analyses typically seek to estimate the causal effect τi (for firm i) of
treatment on an outcome yi, where τi is the value of yi if firm i is treated, minus the value of yi if firm
i is not treated:
τi = yi(wi = 1) – yi(wi = 0), or, more compactly: τi = yi1 – yi0
The “fundamental problem of causal inference” (Holland, 1986) is that we observe only one of the
two potential outcomes, yi1 and yi0. The usual response is to impute the missing potential outcome
for the treated firms from the control firms. The central challenge to imputation is “selection bias”:
the treated and control firms may differ in one or more ways, perhaps unobserved, which will bias
the estimated treatment effects.
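A small simulation can make the selection-bias problem concrete. The sketch below is purely illustrative: the data-generating process, the constant treatment effect of 1.0, and all variable names are our assumptions, not anything estimated in the papers we re-examine. Each simulated firm has both potential outcomes, but only one is observed, and we impute the missing one by comparing treated and control means.

```python
import random

random.seed(0)

N = 20000
TRUE_EFFECT = 1.0  # hypothetical constant treatment effect (tau_i = 1 for every firm)


def mean(values):
    return sum(values) / len(values)


def treated_minus_control(randomized):
    """Return mean(y | w = 1) - mean(y | w = 0) for one simulated sample."""
    treated, control = [], []
    for _ in range(N):
        quality = random.gauss(0, 1)               # unobserved firm characteristic
        y0 = 2.0 * quality + random.gauss(0, 1)    # potential outcome if untreated
        y1 = y0 + TRUE_EFFECT                      # potential outcome if treated
        if randomized:
            w = random.random() < 0.5              # coin-flip assignment
        else:
            w = quality + random.gauss(0, 1) > 0   # better firms select into treatment
        # Only one of the two potential outcomes is observed for each firm.
        if w:
            treated.append(y1)
        else:
            control.append(y0)
    return mean(treated) - mean(control)


naive_gap = treated_minus_control(randomized=False)
experimental_gap = treated_minus_control(randomized=True)
print(f"self-selected sample: {naive_gap:.2f}")
print(f"randomized sample:    {experimental_gap:.2f}")
```

Under these assumptions the self-selected comparison overstates the effect, because treated firms differ from controls on the unobserved characteristic, while the randomized comparison should land close to the assumed effect.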
A randomized experiment addresses selection bias by ensuring that treated and control firms
have similar expected values for both observed and unobserved covariates. But randomized
experiments are rarely available in corporate finance and accounting research. Shock-based designs
are a second-best alternative. A shock-based design “works” only if, and to the extent, it creates
conditions that come close to those one would achieve from a true randomized experiment.
Different shock-based designs (including DiD, RD, and IV) appear to rely on different
assumptions. However, all rely on a “good shock” – one which permits credible causal inference.
AB-2015 state five conditions for a good shock. To summarize:
(1) Shock Strength: The shock should be strong enough to significantly change firm behavior
or incentives.
(2) Exogenous Shock. The shock came from “outside” the system one is studying. Treated
firms did not choose whether to be treated, could not change their behavior to anticipate the
shock, the shock is expected to be permanent, and there is no reason to believe that which
firms were treated depends on unobserved firm characteristics.
(3) “As If Random” Assignment: The shock must separate firms into treated and controls in
a manner which is close to random. One often needs to allow an exception for the forcing
variable which determines which firms are affected by the shock and, in some studies, a
variable which is changed by the shock.
(4) Covariate balance. The forcing and forced variables aside, the shock should produce
reasonable covariate balance between treated and control firms, including “common support”
(reasonable overlap between treated and control firms on all covariates). Somewhat imperfect
balance can be addressed with balancing methods, but severe imbalance undermines shock
credibility.
(5) Only-Through Condition(s): The apparent effect of the shock on the outcome must come
only through the shock. There must be no other shock, at around the same time, that could
affect treated firms differently than control firms. If, as in an IV analysis, one expects the
shock to affect outcomes through a particular instrumented variable, the shock must affect the
outcome only through that variable. In IV analysis, this is often called an “exclusion
restriction”; we prefer the term “only-through condition.”
2.2. Shock-Based IV and Alternatives
Conditions (1), (2) and (5) are well-known for “causal IV” (e.g., Angrist and Pischke, 2009,
§ 4.1). The exogeneity [as phrased in condition (2)] and only-through conditions are implicit in the
formal “exogeneity” requirement, stated in econometrics texts, that Cov(z, ε) = 0, where ε is the
unobservable true error from regressing the outcome y on the instrument z and other “exogenous”
covariates x. But standard discussions of causal IV do not discuss the need for as-if random
assignment or its corollary, covariate balance. Indeed, severe imbalance on core covariates will be
central to our re-examination of DMO and Iliev.
Given a shock, one can often either run DiD based on the shock, or use the shock as an IV.
The DiD design lets the researcher remain agnostic about the channels through which the shock
affects the outcome. In contrast, the IV design forces the researcher to assume that the shock affects
the outcome only through the instrumented variable. That assumption is not testable, and is often
suspect.
Let us borrow from the language of randomized experiments with partial compliance. If the
instrumented variable is binary (or can be made so by binning), one can see DiD as an “intent-to-treat” design, in which one estimates the average effect of the shock on all firms which were subject
to the shock (assigned to treatment). In contrast, the IV design provides a “local average treatment
effect” (LATE) for compliers (those firms which complied with the assignment to treatment by
changing their instrumented variable in the predicted direction), at the cost of making an additional
only-through assumption. One goal of this paper is to highlight the similarity between DiD and shock-based IV. In our view, most uses of shock-based IV will benefit if the researchers also report intent-to-treat DiD results. Below, we illustrate the similarities between the two designs in our re-examination of D&D and DMO.
Following AB-2015, let q be the outcome, z be the instrumental variable, and gov (for
governance) be the instrumented variable. In 2SLS, the instrument z substitutes for the instrumented
variable; and we make the only-through assumption that the power of the instrument to predict the
outcome reflects the true power of the instrumented variable, here gov. This assumption is reflected
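in the 2SLS estimate of the coefficient on gov. Without covariates, this estimate is:
$$\hat{\beta}_{2SLS} = \frac{\mathrm{Cov}(z, q)}{\mathrm{Cov}(z, gov)} \qquad (1)$$
The 2SLS coefficient $\hat{\beta}_{2SLS}$ can also be expressed in terms of the intent-to-treat DiD coefficient $\hat{\delta}_{DiD}$:
$$\hat{\beta}_{2SLS} = \frac{\hat{\delta}_{DiD}}{\hat{\beta}_{1S}} = \frac{\text{effect of shock on } q}{\text{effect of shock on } gov} \qquad (2)$$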
Here $\hat{\beta}_{1S}$ is the coefficient on z from the first-stage regression of gov on z. Eqn. (2) is known as a
Wald estimate. If we add covariates, the DiD and 2SLS estimators will diverge slightly, but should
be quite similar. Statistical strength should be similar as well. If the first stage is strong, so that the
estimate of $\hat{\beta}_{1S}$ is precise, the t-statistics for $\hat{\beta}_{2SLS}$ and $\hat{\delta}_{DiD}$ will be similar. If the first stage is not
strong, this will be reflected in a lower t-statistic for $\hat{\beta}_{2SLS}$ than for $\hat{\delta}_{DiD}$, which reflects the combined
uncertainty in estimating both $\hat{\delta}_{DiD}$ and $\hat{\beta}_{1S}$.
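The equivalence between the covariance-ratio form of the 2SLS estimate and the Wald ratio can be checked numerically. The sketch below uses simulated data with a binary instrument and no covariates; the parameter values and helper functions are our illustrative assumptions. In this simple cross-section, the "effect of the shock" on each variable is just the treated-minus-control difference in means.

```python
import random

random.seed(1)


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Population covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def diff_in_means(outcome, z):
    """Mean of outcome for shocked firms (z = 1) minus mean for other firms (z = 0)."""
    shocked = [o for o, zi in zip(outcome, z) if zi == 1]
    others = [o for o, zi in zip(outcome, z) if zi == 0]
    return mean(shocked) - mean(others)


N = 5000
BETA = 0.5  # hypothetical true effect of gov on q
z, gov, q = [], [], []
for _ in range(N):
    zi = 1 if random.random() < 0.5 else 0
    u = random.gauss(0, 1)                       # unobserved confounder of gov and q
    gi = 0.4 * zi + 0.5 * u + random.gauss(0, 1)
    qi = BETA * gi + u + random.gauss(0, 1)      # z affects q only through gov
    z.append(zi)
    gov.append(gi)
    q.append(qi)

beta_2sls = cov(z, q) / cov(z, gov)   # covariance-ratio form of the estimate
delta_itt = diff_in_means(q, z)       # intent-to-treat effect of the shock on q
beta_1s = diff_in_means(gov, z)       # first stage: effect of the shock on gov
wald = delta_itt / beta_1s            # Wald ratio

print(f"covariance ratio: {beta_2sls:.4f}")
print(f"Wald ratio:       {wald:.4f}")
```

The two numbers agree up to floating-point error because, for a binary instrument, each covariance with z is proportional to the corresponding difference in means, and the proportionality factor cancels in the ratio.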
If the first stage is not strong, the IV coefficient is prone to a “blowup” problem: the 2SLS
coefficient will often be much larger than the non-IV coefficient one would obtain by regressing the
outcome on the instrumented variable. Unless the IV is “perfect” (fully satisfies the only-through
condition), the large 2SLS coefficient can be spurious, and will mostly reflect the direct effect of z
on q, rather than the effect of gov. In our experience, a high (2SLS coefficient/non-IV coefficient)
ratio is a strong warning sign for likely violation of the only-through condition.
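The blowup can be seen with simple arithmetic. Suppose, as an illustrative assumption, that q = beta*gov + d*z + error and gov = pi*z + error, with z uncorrelated with both errors. The effect of the shock on q is then beta*pi + d, so the Wald ratio converges to beta + d/pi rather than beta. The sketch below evaluates this limit for hypothetical parameter values.

```python
def wald_limit(beta, direct_effect, first_stage):
    """Large-sample Wald/2SLS coefficient when z also affects q directly.

    beta:          true effect of gov on q (hypothetical)
    direct_effect: effect of z on q that bypasses gov (only-through violation)
    first_stage:   effect of z on gov
    """
    reduced_form = beta * first_stage + direct_effect
    return reduced_form / first_stage


BETA = 0.5      # hypothetical true effect of gov on q
DIRECT = 0.05   # hypothetical, small direct effect of z on q

for first_stage in (1.0, 0.2, 0.05):
    limit = wald_limit(BETA, DIRECT, first_stage)
    print(f"first stage {first_stage:.2f}: 2SLS limit {limit:.2f} (true beta {BETA})")
```

With a strong first stage the small violation barely matters; with a weak first stage the same violation is divided by a small number and dominates the estimate.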
Shock-based IV with multiple instruments (where the shock is interacted with other firm
attributes, as in the D&D study) is also subject to the classical “weak instruments” problem: Even
if the only-through condition is completely satisfied, standard errors can be downward-biased if one
uses multiple instruments which do not strongly predict the instrumented variable in the first-stage
regression. A common rule of thumb is that one wants a first-stage F-statistic, for the instruments
taken together, of at least 10 to have reasonable comfort that this bias will be small (Stock, Wright,
and Yogo, 2002).
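To illustrate the rule of thumb, the sketch below computes a joint first-stage F-statistic for three simulated instruments. It is a simplified illustration under our own assumptions: the instruments are generic random variables rather than shock-by-covariate interactions, there are no other covariates, and the F-statistic is the textbook homoskedastic version rather than one based on robust or clustered standard errors. The helper names are hypothetical.

```python
import random

random.seed(2)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def first_stage_f(instruments, endog):
    """Homoskedastic F-statistic for H0: all instrument coefficients are zero."""
    n, k = len(endog), len(instruments[0])
    x = [[1.0] + list(row) for row in instruments]  # add an intercept
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * y for r, y in zip(x, endog)) for i in range(k + 1)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in x]
    ybar = sum(endog) / n
    ssr = sum((y - f) ** 2 for y, f in zip(endog, fitted))  # residual sum of squares
    sst = sum((y - ybar) ** 2 for y in endog)               # total sum of squares
    return ((sst - ssr) / k) / (ssr / (n - k - 1))


def simulate_first_stage(strength, n=2000, k=3):
    """Simulate k instruments that each move the endogenous variable by `strength`."""
    instruments = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    endog = [strength * sum(row) + random.gauss(0, 1) for row in instruments]
    return first_stage_f(instruments, endog)


f_strong = simulate_first_stage(0.30)
f_weak = simulate_first_stage(0.02)
print(f"strong instruments: F = {f_strong:.1f}")
print(f"weak instruments:   F = {f_weak:.1f}")
```

In applied work one would normally rely on the first-stage diagnostics reported by the estimation software, computed with the same standard errors used in the main analysis.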
If a shock does not directly produce covariate balance between treated and control firms,
balance can often be improved through balancing methods developed for pure observational studies,
including trimming to common support, matching, and inverse propensity weighting. We use some
of these methods below, but it is beyond our scope to discuss the many available methods and how
to choose among them.5
5
See generally Imbens and Rubin (2014). For trimming to common support, see Crump et al. (2009); for
matching, see Rosenbaum (2009); for inverse propensity weighting, see Busso, DiNardo and McCrary (2014).
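As a rough illustration of covariate balance and common support, the sketch below computes a normalized difference in means for a single pre-shock covariate and then trims both groups to the range where they overlap. The data are invented for illustration (hypothetical pre-shock shares of independent directors), and the overlap rule is a deliberately simple one on a single covariate, not the propensity-score trimming rule of Crump et al. (2009).

```python
import math
import statistics


def normalized_difference(treated, control):
    """Difference in means scaled by the square root of the average group variance."""
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def trim_to_common_support(treated, control):
    """Keep only observations inside the range covered by both groups."""
    low = max(min(treated), min(control))
    high = min(max(treated), max(control))
    keep = lambda values: [v for v in values if low <= v <= high]
    return keep(treated), keep(control)


# Hypothetical pre-shock proportions of independent directors.
treated = [0.30, 0.40, 0.45, 0.50, 0.55, 0.60]
control = [0.50, 0.55, 0.60, 0.70, 0.80, 0.90]

before = normalized_difference(treated, control)
trimmed_treated, trimmed_control = trim_to_common_support(treated, control)
after = normalized_difference(trimmed_treated, trimmed_control)

print(f"normalized difference before trimming: {before:.2f}")
print(f"normalized difference after trimming:  {after:.2f}")
print(f"firms kept: {len(trimmed_treated)} treated, {len(trimmed_control)} control")
```

Trimming improves balance at the cost of sample size and of changing the population to which the estimate applies, which is why this step is best done in a design phase with outcomes hidden.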
2.3. Some Regression Details
Unless otherwise specified, all cross-sectional regressions in this paper use robust standard
errors; all panel regressions use standard errors clustered on firm. D&D, DMO, and Iliev vary in
whether they report standard errors or t-statistics; we report t-statistics throughout.
3. Re-Examination of Desai and Dharmapala (2009)
The first shock-based IV paper we examine is Desai and Dharmapala (2009). The authors
examine the effect of tax avoidance on the value of U.S. firms, and whether corporate governance
mediates this effect. They find evidence that greater tax sheltering opportunities increase Tobin’s q,
but only for firms with high institutional ownership (which, they posit, are better governed).
3.1. Research Design
D&D rely for causal inference on a legal rule adopted in late 1996 (“check-the-box” rules for
pass-through taxation of non-public U.S. firms) as a legal shock. These rules were intended to
simplify reporting for small, private companies. As an unintended byproduct, they also reduced tax
sheltering costs for large public firms that had, or could create, offshore subsidiaries. They
hypothesize that: (i) the difference between “book” income reported to shareholders, and taxable
income reported on income tax returns (book-tax gap, below “BTG”) is a good (or at least best
available) measure of a firm’s tax sheltering activities; (ii) firms with low BTG before the 1996 law
change would gain more sheltering opportunities than firms with higher BTG; and (iii) at firms with
high institutional ownership, greater sheltering will increase firm value (proxied by tax-adjusted
Tobin’s q). In contrast, at worse-governed firms, insiders will appropriate the additional value, so
Tobin’s q will not rise.
BTG, however, is endogenous – it can be affected by many firm attributes, some unobserved.
D&D address that issue by constructing shock-based instruments for BTG: they interact the 1996 shock
with three firm attributes that predict the firm’s need to shelter its income: net operating loss
carryforwards (NOLs), short-term debt, and long-term debt, each scaled by total assets. These
variables are endogenous too. The D&D idea is that interacting them with an exogenous legal shock
will make the instruments effectively exogenous as well. They address the only-through condition
for instrument validity by including NOLs, short-term debt, and long-term debt as separate covariates
in their IV analysis. It is then plausible that the interacted variables will affect Tobin’s q only
indirectly through BTG, and not directly, nor indirectly through an omitted variable.
This is wonderfully clever. The research design can fail in many ways. BTG might be a poor
proxy for tax sheltering. High BTG firms (which already engage in tax sheltering) might be better at
exploiting the new check-the-box opportunities than low-BTG firms. Institutional ownership might
be a poor proxy for firm governance. The IV strategy treats firm financial characteristics, including
the ones they use as instruments, as exogenous when we know they are not. And the IVs might not
be strong enough. But the authors begin with an exogenous shock and carefully defend exogeneity.
Assuming they find results, and they do, the research design is reasonably convincing. Or so we
thought when we first looked at their study. The editor and reviewers at REStat, a major journal,
were also convinced.
D&D report results with firm and year fixed effects, and use the following covariates: total
accruals/assets, ratio of option to total compensation, sales, implied volatility of share price,
NOLs/assets, short-term debt/assets, long-term debt/assets, |foreign income or loss|/assets, and R&D
expenditures/assets. Their principal dependent variable is tax-adjusted Tobin’s q. We use the same
variables; see D&D for details and summary statistics. Both they and we use standard errors with
firm clusters.6
3.2. Creating a “Pre-Post Balanced” Dataset
The D&D dataset includes 862 firms observed at least once over 1993-2001. Of these, 100
are observed only once, so effectively drop out with firm FE, leaving an effective sample of 762 firms.
A general theme of this paper is the importance of sample selection, including ensuring covariate
balance. We therefore begin by assessing which firms belong in the sample. Consistent with good
practice in a natural experiment, we do so in a “design” stage of our analysis, with outcomes hidden
(Rubin, 2008; Rosenbaum, 2009).
D&D’s central empirical method is shock-based IV, using instruments that interact a post-1996
dummy with NOLs, short-term debt, and long-term debt (below, “BTG predictors”). As they
recognize, this design relies on the shock for exogeneity.7 There is no basis for causal inference for
firms that are in the sample only “pre-1996” (including 1996) or only post-1996. Yet, of the 762
firms in the effective D&D sample, only 510 appear both before and after the 1996 shock. Of the
other 252 firms, 80 appear only pre-1996. Including these firms can affect the coefficients on
covariates, but will not directly affect IV estimates, because the shock*covariate IVs are zero in 1996
and earlier years. The 172 firms that appear only post-1996 are more troublesome. For these firms,
the IVs are identical to the covariates that D&D use to predict BTG and are equally endogenous. Thus,
one should limit the sample to, at most, the 510 firms that appear both before and after the shock.
6
D&D use individual firm dummies. Each absorbs a degree of freedom. One can avoid the loss of degrees of freedom
and obtain slightly smaller standard errors using the xtivreg2.ado module for Stata. For consistency with their results, we
also use individual firm dummies.
7
See D&D at 542 (“In order to address [endogeneity] concerns, an exogenous source of variation in firms’ opportunities
for tax avoidance is required.”)
We next consider the remaining 510 firms. Of these, 487 have data in both 1996 and 1997; 8 have data
at least once over 1993-1995, but not in 1996 (the last pre-shock year); and 15 firms have
data at least once over 1998-2001, but not in 1997 (the first post-shock year). There may be
something odd about the 23 firms which pop into and out of the sample. We confirm in unreported
regressions that they are very different from the other 487 firms on several covariates.8 Moreover,
one goal of our paper is to compare IV and DiD results using the same shock. For DiD analysis, we
require data in 1996 because we define treated and control groups based on 1996 covariates.
Given the small loss in sample size, the differences between “popin” and other firms, and the
desire to use a similar sample for DiD and IV analyses, we use a “pre-post balanced sample” of 487
firms with data in both 1996 and 1997 in our principal analyses. Below, we first replicate the D&D
results with their sample and then switch to the pre-post balanced sample.
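The sample-selection rules in this section can be sketched in a few lines. The following is a minimal illustration, assuming the data arrive as (firm, year) pairs; the function name, group labels, and toy panel are hypothetical and are not taken from the D&D dataset.

```python
from collections import defaultdict

SHOCK_YEAR = 1996  # last pre-shock year; 1997 is the first post-shock year


def classify_firms(observations):
    """Group firm ids by when they are observed relative to the shock.

    observations: iterable of (firm_id, year) pairs, one per firm-year.
    """
    years_by_firm = defaultdict(set)
    for firm_id, year in observations:
        years_by_firm[firm_id].add(year)

    groups = {"pre_only": set(), "post_only": set(),
              "both_sides": set(), "pre_post_balanced": set()}
    for firm_id, years in years_by_firm.items():
        has_pre = any(y <= SHOCK_YEAR for y in years)
        has_post = any(y > SHOCK_YEAR for y in years)
        if has_pre and has_post:
            groups["both_sides"].add(firm_id)
            # Stricter rule: data in both the last pre-shock year
            # and the first post-shock year.
            if SHOCK_YEAR in years and SHOCK_YEAR + 1 in years:
                groups["pre_post_balanced"].add(firm_id)
        elif has_pre:
            groups["pre_only"].add(firm_id)
        else:
            groups["post_only"].add(firm_id)
    return groups


# Hypothetical toy panel, for illustration only.
toy_panel = [
    ("A", 1995), ("A", 1996), ("A", 1997),  # pre-post balanced
    ("B", 1994), ("B", 1998),               # both sides, but a "popin" firm
    ("C", 1993), ("C", 1996),               # pre-1996 only
    ("D", 1998), ("D", 1999),               # post-1996 only
]
groups = classify_firms(toy_panel)
```

In this sketch, firms in "both_sides" but not in "pre_post_balanced" correspond to the popin firms discussed above. The sketch does not separately drop firms observed only once; such a firm falls into the pre-only or post-only group.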
3.3. Firm FE Results (D&D, Table 3)
In Table DD-1 we replicate the pooled OLS results in D&D, Table 3. The dependent variable
is tax-adjusted Tobin’s q, as defined by D&D. Following D&D, we estimate two models: Model 1
uses BTG as the principal independent variable of interest; Model 2 uses three main independent
variables -- BTG, institutional ownership, and – of core interest for D&D -- an interaction between
BTG and institutional ownership. We estimate each model separately with the original sample and
the pre-post balanced sample.
8
For each covariate, we regress the covariate on a “popin” dummy (=1 for these 23 firms; 0 otherwise), year dummies
(omitting 1996), and constant term. The constant term then gives the mean for non-popin firms in 1996. The coefficients
on the popin dummy are often large relative to the constant term. For NOLs, the coefficient on the constant term is near zero
at 0.011; the coefficient on the popin dummy is 20 times larger, at 0.231, although statistically insignificant (t = 1.037). Thus
the popin firms are much more likely to have NOLs than other firms. For sales, the constant term coefficient equals 3.409
vs. the popin dummy coefficient of -1.647 (t-stat = -1.965), suggesting that popin firms have average sales almost
50% lower than the other firms. For R&D, the coefficient on the constant term is 0.043; the coefficient on the popin dummy
is similar in magnitude at 0.039 and marginally significant (t = 1.93).
Consider first Model (1). The results for the original and pre-post balanced samples are
similar. In both, the coefficient on BTG is positive, at around 0.6, but not statistically significant.
The differences between the two samples are more pronounced when estimating Model (2). With the
original sample, the coefficients on the interaction between BTG and institutional ownership are
positive and statistically significant. With the pre-post balanced sample, the coefficient is somewhat
smaller and only marginally significant. But there are large differences between the original and pre-post balanced samples in the coefficients on BTG and institutional ownership, which suggest that the
additional firms in the full sample differ from those in the balanced sample.
3.4. DiD and DiDiD Results
Given our interest in using multiple research designs based on the same shock, we next
consider a DiD and then a DiDiD framework, and report results in Table DD-2. D&D hypothesize that:
(i) the 1996 check-the-box rules expand tax sheltering opportunities more for firms which had done
less sheltering (measured by BTG) than for higher-BTG firms; and (ii) additional sheltering adds
value only for firms with high institutional ownership.
In our DiD model, we define treated firms as firms with below-median BTG in 1996
(lowBTG96 dummy = 1) and control firms as firms with above-median BTG in 1996 (lowBTG96
dummy = 0). The median BTG for the pre-post balanced sample in 1996 is 0.001. We report two
specifications – one with lowBTG96 dummy as the only independent variable, and the other including
time-varying controls. In both specifications, the coefficients on lowBTG96 dummy are close to zero.
There is no evidence that firms with low BTG in 1996 achieve higher Tobin’s q relative to high BTG
firms, following the 1996 rule change.
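To make the DiD comparison concrete, the sketch below computes the simplest version: the mean pre-to-post change in Tobin’s q for below-median-BTG firms, minus the same change for above-median firms. It assumes that every firm is observed in the same years and that no covariates are included; under those assumptions this difference in mean changes matches the coefficient on the treated*post interaction in a regression with firm and year fixed effects. The data layout and the numbers are hypothetical.

```python
from statistics import mean, median


def did_estimate(firms, shock_year=1996):
    """2x2 DiD: (post minus pre change for treated) minus (same for controls).

    firms: dict firm_id -> {"btg96": float, "q": {year: tobins_q}}
    Treated firms have below-median BTG in 1996 (lowBTG96 = 1).
    Firms exactly at the median are assigned to the control group here.
    """
    cutoff = median(f["btg96"] for f in firms.values())
    changes = {True: [], False: []}
    for f in firms.values():
        pre = [q for year, q in f["q"].items() if year <= shock_year]
        post = [q for year, q in f["q"].items() if year > shock_year]
        low_btg96 = f["btg96"] < cutoff
        changes[low_btg96].append(mean(post) - mean(pre))
    return mean(changes[True]) - mean(changes[False])


# Hypothetical toy data, for illustration only.
toy_firms = {
    "F1": {"btg96": 0.00, "q": {1996: 1.0, 1997: 1.2}},
    "F2": {"btg96": -0.01, "q": {1996: 1.5, 1997: 1.6}},
    "F3": {"btg96": 0.02, "q": {1996: 2.0, 1997: 2.1}},
    "F4": {"btg96": 0.03, "q": {1996: 1.8, 1997: 1.9}},
}
did = did_estimate(toy_firms)
```

With unbalanced panels or time-varying controls, a regression with firm and year fixed effects is needed instead; this sketch only shows what the interaction coefficient measures.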
In our DiDiD model, we define the third difference as firms with above-median institutional
ownership in 1996 (highInstOwn96 = 1) versus below-median institutional ownership
(highInstOwn96 = 0). The median institutional ownership in 1996 is 0.58. We add highInstOwn96
dummy and its interaction with lowBTG96 dummy to our DiD models. D&D predict a positive
coefficient on the interaction term. We find, however, that the coefficients on the interaction term
are small, with t-statistics well below one, and with mixed sign -- positive without controls, but
negative with controls.
To assess whether our non-results reflect the binary nature of our low BTG and high
institutional ownership variables, we consider continuous versions of these variables in unreported
results. In a “DiD-continuous” model, the coefficient on BTG is negative (consistent with Table DD-2) but not statistically significant.
We then estimate three “DiDiD-continuous” models: (1) continuous BTG and binary institutional ownership; (2) continuous institutional ownership and binary
BTG; and (3) continuous BTG and continuous institutional ownership. The coefficients on the triple
interaction term are insignificant in all specifications, and are positive with binary BTG but negative
with continuous BTG.
In sum, our DiD/DiDiD analysis provides no evidence that firms with low tax shields prior to
the tax law change realized higher Tobin’s q after the change, nor evidence for the D&D hypothesis
that only firms with both low tax shields and high institutional ownership gained from the law change.
This is remarkable. The core results from a careful paper by major scholars, published in a top journal
known for attention to empirical methods, vanish when we use the pre-post balanced sample and
switch from IV to a DiDiD design. They will not reappear below, when we apply their IV analysis
to the pre-post balanced sample. We conduct that replication next, and then explore why their results
went away.
3.5. Shock Strength (D&D, Table 4)
We next replicate the first-stage IV results in D&D, Table 4, first with the D&D original sample
and then with our pre-post balanced sample. We present results in Table DD-3. With the original
D&D specification, the instruments are statistically significant but “weak.” They have F-statistics of
3.33 (model (1)) or 3.00 (model (2)), well below the F > 10 rule of thumb for avoiding the classical
weak instruments problem.
With the pre-post balanced sample, the F-statistics drop to only
1.48 (model (1)) or 1.05 (model (2)), and are statistically insignificant. The instruments no longer
predict the instrumented variables strongly enough to be usable. The greater (though still modest)
strength of the instruments with the original sample turns out to be driven by post-1996-only firms,
for which the instruments are clearly endogenous.
To further explore instrument strength, we recast the first-stage IV regression in a DiD/DiDiD
framework and present results in Table DD-4. We use a similar strategy as in Table DD-2 to define
treated and control firms, but instead of using BTG, we construct an index based on the three
instruments (NOL, short-term debt, and long-term debt) as of 1996. We rank each firm in 1996 on
each instrument, and sum the ranks. We use the sum of ranks to define a dummy variable,
lowTaxShield96 = 1 for firms with above-median sum of ranks; 0 otherwise. We treat the above-median firms as treated and the below-median firms as controls. For the DiDiD model, we again define the
third difference as above vs. below median institutional ownership in 1996.
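The sum-of-ranks index can be sketched as follows. The text does not state the direction of the ranking, so this sketch assumes that rank 1 goes to the firm with the largest value of each tax-shield measure, which makes a high sum of ranks correspond to low existing tax shields. That assumption, the tie rule (tied firms share the average rank), and the toy numbers are all hypothetical.

```python
from statistics import median


def ranks_descending(values):
    """Rank 1 = largest value; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def low_tax_shield_dummy(nol, short_debt, long_debt):
    """Return (sum of ranks, lowTaxShield96 dummy) per firm, using 1996 values."""
    per_measure = [ranks_descending(col) for col in (nol, short_debt, long_debt)]
    rank_sum = [sum(r) for r in zip(*per_measure)]
    cutoff = median(rank_sum)
    return rank_sum, [1 if s > cutoff else 0 for s in rank_sum]


# Hypothetical 1996 values for four firms (all scaled by assets).
nol = [0.30, 0.10, 0.05, 0.00]
short_debt = [0.20, 0.10, 0.05, 0.02]
long_debt = [0.40, 0.30, 0.10, 0.05]
rank_sum, low_tax_shield96 = low_tax_shield_dummy(nol, short_debt, long_debt)
```

Summing ranks rather than raw values keeps any one measure from dominating the index because of its scale.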
None of the DiD/DiDiD coefficients in Table DD-4 are statistically significant. Thus, there
is no evidence that the tax rule shock led firms with below-median need for tax shields to take
advantage of the check-the-box rules and increase their BTG, relative to firms with above-median
need.
The analyses in Tables DD-3 and DD-4 are consistent. The opportunity for sheltering created
by the tax rule may well lead some firms to engage in more sheltering, but it does not do so
differentially for firms with high versus low apparent need for sheltering. This weak shock cannot
identify a causal connection between sheltering need (proxied by BTG) and firm value, whether or
not mediated by institutional ownership.
3.6. 2SLS Estimates (D&D, Table 5)
One might stop here, but for completeness, we replicate the second-stage IV estimates from
D&D Table 5, models (1)-(3). We report results with the original and pre-post balanced samples in
Table DD-5. The coefficients on instrumented BTG*institutional ownership decrease by more than
50% when we switch from the original to the pre-post balanced sample and become insignificant.
We again see that including firms observed only post-1996 was central to the original results.
3.7. Intent-to-Treat DiD and DiDiD Estimates
We turn next to additional steps one would want to take, for a shock-based IV study which
had robust 2SLS results. In the spirit of this paper, in which we seek to use both DiD and IV designs
to study the same shock, we present Intent-to-Treat DiD and DiDiD models in Table DD-6. In the
DiD analysis in regressions (1) and (2), we regress tax adjusted Tobin’s q on the interaction between
lowTaxShield96 and a post-shock dummy (plus constant term and firm and year FE). Regression (1)
omits covariates, regression (2) adds them. We call these “intent-to-treat” models (using language
borrowed from IV methods) because greater need in 1996 to shelter income can be seen as
encouraging firms to take advantage of the check-the-box rules (as D&D hypothesize), but does not
require them to do so. The coefficient on the interaction term is positive and marginally significant
in regression (1) but falls in magnitude and becomes insignificant when we add covariates.
For the DiDiD analysis, we proceed similarly to Table DD-2. We add as additional variables
highInstOwn96 * post and the triple interaction lowTaxShield96 * highInstOwn96 * post. The
coefficients on the triple interaction are positive in both specifications but not close to being
statistically significant. This is consistent with the encouragement being too weak to induce much of
a differential response by firms with both higher sheltering need and high institutional ownership.
3.8. Covariate Balance
In randomized trials and pure observational studies, it is customary to assess covariate balance
between the treated and control firms. Reporting covariate balance is not the norm in DiD and shock-based IV studies, but in our judgment, should be. In Table DD-7, we present limited results for the
pre-post balanced sample. We use three definitions of treated and control firms. In Panel A, we
divide firms into treated and control based on BTG in 1996. Treated firms have below-median BTG
(lowBTG96 = 1); control firms have above-median BTG. In Panel B, we divide the same firms based on
institutional ownership in 1996. Treated firms have above-median institutional ownership
(highInstOwn96 = 1). In Panel C, we divide firms based on our sum-of-ranks based measure of
overall need to shelter taxable income. Treated firms are those with greater sheltering needs in 1996
(lowTaxShield96 = 1).
In each panel we report means for the treated and control groups for 1996 for the covariates
used in prior tables. We also report absolute values of t-statistics from a two-sample t-test for difference in means
and a measure of “normalized differences,” suggested by Imbens and Rubin (2015), which is
independent of sample size. Unlike t-values, the normalized difference does not increase with sample
size. A fuller check for covariate balance might include assessing whether the treated and control
groups show similar dispersion around the mean, visual inspection of kernel density plots for each
covariate, and running a Kolmogorov-Smirnov test for similarity of the full distributions.
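The contrast between the two balance measures can be illustrated with a short sketch. It uses the Imbens and Rubin normalized difference, the difference in means divided by the square root of the average of the two sample variances, alongside an unequal-variance two-sample t-statistic. The toy data are hypothetical; repeating the same values many times mimics a larger sample with the same distribution.

```python
from math import sqrt
from statistics import mean, variance


def normalized_difference(treated, control):
    """Difference in means scaled by the average of the two sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def t_statistic(treated, control):
    """Two-sample t-statistic allowing unequal variances."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


# Hypothetical covariate values; the second sample is 100 times larger.
base_treated, base_control = [1, 2, 3, 4], [2, 3, 4, 5]
small_t, small_c = base_treated * 10, base_control * 10
large_t, large_c = base_treated * 1000, base_control * 1000

nd_small = normalized_difference(small_t, small_c)
nd_large = normalized_difference(large_t, large_c)
t_small = t_statistic(small_t, small_c)
t_large = t_statistic(large_t, large_c)
```

Here the normalized difference is nearly the same in both samples, while the t-statistic grows roughly with the square root of the sample size. This is why a large t-statistic alone says little about how serious an imbalance is.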
In Panels A and B, treated and control firms are relatively balanced on most covariates.
However, low BTG firms have significantly more share price volatility, and firms with high
institutional ownership grant more options, as a proportion of total compensation. Balance is
substantially worse in Panel C, when we classify firms based on overall need to shelter income,
proxied by lowTaxShield96. Treated firms have much lower Tobin’s q, higher sales, and lower share
price volatility, and do less R&D. This imbalance would raise concerns for any shock-based design.
If the D&D results had survived to this stage, one would want to address this imbalance using
balancing methods. We illustrate that process below, in our re-examination of DMO.
3.9. Non-Parallel Trends
In our judgment, in any DiD or shock-based IV study, one should check for parallel trends
on the outcome variable between treated and control firms during the pre-treatment period. Checks
for parallel trends are sometimes conducted in DiD studies, but rarely in IV studies (DMO is an
exception). Non-parallel trends are a major worry sign for both designs.
In Figure DD-1, we perform such a check. We show mean Tobin’s q by year for “treated”
firms with below-median tax shields in 1996 (lowTaxShield96 = 1), hence greater need to shelter
income, versus “control” firms with above-median tax shields (lowTaxShield96 = 0). The high-tax
shield controls have much higher Tobin’s q in all years. This is consistent with the large difference
in means we found in Table DD-7, Panel C, and provides further evidence that they are not a suitable
control group for the treated firms.
Moreover, there is clear evidence of non-parallel trends. Mean Tobin’s q rises for the control
firms, relative to treated firms, in 1996, just prior to treatment. Mean Tobin’s q for the controls rises
again in 1999, relative to treated firms, well after the shock, and then falls in 2000 and 2001. The
changes over 1999-2001 suggest differing reaction to the “tech bubble” of the late 1990s, which
“popped” in 2000 and 2001. Given these non-parallel changes at times unrelated to the shock, even
if an effect had been found for (say) 1997 versus 1996, one could not ascribe it with confidence to
the shock.
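A numerical counterpart to the visual check is to track the gap in mean outcomes between treated and control firms by year, and then look at year-to-year changes in that gap. Under parallel trends, the gap should be roughly constant in years unrelated to the shock. The sketch below assumes that every year contains both treated and control firms; the data are hypothetical and do not reproduce Figure DD-1.

```python
from collections import defaultdict
from statistics import mean


def gap_by_year(rows):
    """rows: iterable of (year, treated_flag, outcome).

    Returns {year: mean outcome of treated minus mean outcome of controls}.
    """
    cells = defaultdict(list)
    for year, treated, outcome in rows:
        cells[(year, bool(treated))].append(outcome)
    years = sorted({year for year, _ in cells})
    return {y: mean(cells[(y, True)]) - mean(cells[(y, False)]) for y in years}


def gap_changes(gaps):
    """Year-over-year change in the treated-minus-control gap."""
    years = sorted(gaps)
    return {later: gaps[later] - gaps[earlier]
            for earlier, later in zip(years, years[1:])}


# Hypothetical mean Tobin's q, one observation per group-year.
toy_rows = [
    (1994, True, 1.0), (1994, False, 1.5),
    (1995, True, 1.0), (1995, False, 1.5),
    (1996, True, 1.0), (1996, False, 1.8),  # gap widens before the shock
    (1997, True, 1.1), (1997, False, 1.9),
]
changes = gap_changes(gap_by_year(toy_rows))
```

In the toy data the gap is flat in 1995, moves by -0.3 in 1996, and is flat again in 1997. A movement of this kind in a pre-shock year is the pattern that undermines a DiD or shock-based IV design.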
Non-parallel pre-treatment trends (or, as in D&D, non-parallel post-treatment trends at a time
not consistent with response to the shock), severely undermine the credibility of one’s results. One
sometimes sees researchers addressing non-parallel trends by adding linear time trends to a regression
specification. In the D&D study, one would add an interaction f_i*t (where f_i are firm dummies and t
is year) to a panel data specification with firm and year fixed effects. But adding linear trends will
rescue the design only if (i) any non-parallel effects in the pre-treatment period were linear; and (ii)
the linear trend would have continued in the post-treatment period, but for the shock. Yet a trend
without known cause might also stop, or even reverse, in the post-shock period. Non-parallel pre-treatment trends can sometimes be addressed through careful balancing of the treated and control
groups, or adding covariates that absorb the non-parallelism. Short of that, no robust shock-based
specification is available.
3.10. Principal Strata Approach (Optional)
Our analysis suggests a possible way to strengthen the D&D design. In the intent-to-treat
analysis in Table DD-6, the coefficients have the predicted signs and nontrivial magnitudes. They
are just not statistically significant. Perhaps the authors’ hypothesis is right, but their instruments are
not strong enough. If one could find stronger proxies for firms’ need to shelter their income, the
results might strengthen as well.
Perhaps too, some firms respond to tax sheltering opportunities, while others do not. If one
could isolate and study a subsample of responsive firms, the results might strengthen. This idea can
be seen as an adaptation of principal strata analysis, an approach that we develop below in our re-examination of Iliev. Assume that there are latent strata of responsive and non-responsive firms,
analogous to compliers and noncompliers in a standard causal IV analysis, or to shrinkers and modest
growers in our discussion below of Iliev. The tax reform shock will be a weak shock for the non-responsive
firms, but should be stronger for the responsive firms, perhaps enough so to rescue the
design, and let D&D investigate their interesting hypothesis. This approach is related to methods for
strengthening an instrument which affects some units more than others, by excluding from the sample
units for which the instrument is weak (Baiocchi et al., 2010; Small and Rosenbaum, 2008; Keele and
Morgan, 2013). However, separating firms into responsive and non-responsive strata would require
tax expertise, which we lack.
4. Re-Examination of Duchin, Matsusaka, and Ozbas (2010)
4.1. Overview
DMO examine two research questions: (1) does a change in the proportion of independent
directors on company boards causally affect three outcomes (Tobin’s q, ROA, and share returns), and
(2) how does any effect of independent directors on these outcomes depend on the firm’s
informational environment. DMO hypothesize that adding independent directors can improve
performance in firms with low information acquisition costs, yet be counterproductive for firms with
high information acquisition costs. This is an interesting and plausible hypothesis, and the authors
develop a creative way to test it.
DMO recognize that firms endogenously choose board composition. They rely for causal
inference on a legal shock to audit committee independence, which could lead some firms to add
independent directors to their boards, to staff the audit committee. The shock is a 1999 change in
NYSE and Nasdaq listing rules that forced listed firms to have 100% independent directors on the
audit committee. Previously, a majority had been permitted. These listing requirements were later
included in SOX. DMO argue that this rule change will affect only firms that lack a fully independent
audit committee in 2000 (the rule was adopted in late 1999 and became effective after the spring
shareholder meeting season, when most firms elected their 2000 boards). Firms which already had
100% independent audit committees serve as controls.
This can be seen as an “encouragement” design, where the law change encourages firms with
less-than-100%-independent audit committees to add independent directors to their boards. One
could exploit the shock using DiD, but DMO choose instead to use IV. Their core independent
variable is the change in percentage of independent directors from 2000 to 2005 (δIndep). Their
instrument for δIndep is the shock -- a dummy variable that equals 1 for firms without a fully
independent audit committee in 2000 (“non-comply dummy”). They use 2SLS together with a first-differences specification, using 2005-minus-2000 differences in the outcomes and their covariates.
They use board size, leverage, firm age, ln(market capitalization), and industry fixed effects (for the
48 Fama-French industries) as covariates.
Instrumented δIndep is DMO’s first variable of interest. They find, consistent with prior
literature, that δIndep alone is not a significant predictor of their outcome variables. Their main
contribution, besides testing the effect of board independence in a causal setting, is to hypothesize
(plausibly) that the effect of adding independent directors depends on the firm’s information
environment: if a DMO-constructed information cost index (“Info Cost,” which combines measures
of the number of analysts, analyst forecast dispersion, and analyst forecast error) is low (high), adding
independent directors will add (subtract) value. DMO find strong negative coefficients on the
interaction between δIndep and Info Cost for all three outcome variables.
4.2. One Instrument or Two?
DMO instrument for δIndep in a first-stage regression of δIndep on the instrument (non-comply dummy) and covariates. In their core regressions, in which they interact δIndep with Info
Cost, they use what one might call a “quasi-instrument” for δIndep*Info Cost, defined as (predicted
value of δIndep from first-stage regression) * Info Cost. This is technically incorrect, and is
sometimes called a “forbidden regression” (Wooldridge, 2010, § 9.5.2; Angrist and Pischke, § 4.6.1).
The resulting estimator is inconsistent; the standard errors are also incorrect. One should instead use
two instruments. In our re-analysis, we use two instruments for the two endogenous variables δIndep
and δIndep * Info Cost: the instruments are non-comply dummy and non-comply dummy * Info
Cost.
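The mechanics of the two-instrument approach can be sketched on simulated data. Each endogenous regressor (δIndep and δIndep * Info Cost) is projected on the full set of exogenous variables and instruments, and the outcome is then regressed on those projections. This differs from multiplying the fitted value of δIndep by Info Cost, which is the forbidden-regression shortcut. Everything below is hypothetical: the data-generating process, the coefficient values, and the helper functions, which rely only on the Python standard library. The sketch returns point estimates only and does not compute 2SLS standard errors.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def fitted(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def tsls(y, endog, exog, instruments):
    """2SLS point estimates; each argument after y is a list of columns.

    Returns coefficients ordered as exog columns, then endog columns.
    """
    n = len(y)
    Z = [[col[i] for col in exog + instruments] for i in range(n)]
    endog_hat = [fitted(Z, ols(Z, col)) for col in endog]  # one first stage each
    X_hat = [[col[i] for col in exog + endog_hat] for i in range(n)]
    return ols(X_hat, y)


# Simulated data with assumed true coefficients 0.5 (d) and -1.0 (d * info).
random.seed(0)
n = 5000
const = [1.0] * n
info = [random.gauss(0, 1) for _ in range(n)]            # stand-in for Info Cost
z = [float(random.random() < 0.5) for _ in range(n)]     # stand-in for non-comply dummy
u = [random.gauss(0, 1) for _ in range(n)]               # unobserved confounder
d = [2.0 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous regressor
d_info = [di * wi for di, wi in zip(d, info)]
z_info = [zi * wi for zi, wi in zip(z, info)]
y = [0.5 * di - 1.0 * dwi + 0.3 * wi + 2.0 * ui + random.gauss(0, 1)
     for di, dwi, wi, ui in zip(d, d_info, info, u)]

beta_2sls = tsls(y, endog=[d, d_info], exog=[const, info], instruments=[z, z_info])
beta_ols = ols([[1.0, wi, di, dwi] for wi, di, dwi in zip(info, d, d_info)], y)
```

In this simulation, the 2SLS coefficients on d and d * info should land near their assumed true values, while the OLS coefficient on d is pushed upward by the confounder.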
4.3. Covariate Balance
A strength of the DMO paper is that they report on covariate balance. They compare the 2000
characteristics of treated and control firms, in their Table 2. We present a similar covariate balance
analysis in Table DMO-1.9 We add several other variables that are in the DMO dataset, but were not
used in their analysis. We report both normalized differences (Imbens and Rubin, 2015) and the t-statistics reported by DMO.
Overall, treated and control firms are fairly similar on the outcomes, on the covariates that
DMO used, and on the additional potential covariates listed at the bottom of Table DMO-1. There is
a moderate imbalance in the overall information cost index (Info Cost) and one of its components –
dispersion of analyst forecasts. There is huge imbalance in the percentage of independent directors
on the audit committee. This is by construction of their treated and control groups.
9
Table DMO-1 differs slightly from DMO Table 2, because we restrict the sample for this comparison to firms
used in the Tobin’s q regressions (with non-missing values for all covariates and δQ). We impose the same restriction in
reporting first-stage results in Tables DMO-3 to DMO-5 and in creating Figures DMO-1 to DMO-3. The usable sample varies
slightly depending on the outcome variable.
More troubling, there is a large imbalance in the percentage of independent directors on the
board in the base year of 2000 (“PctIndep”). Control firms average 70% independent directors, versus
53% for treated firms (t = 15.70, normalized diff. = 0.59). Although it’s not surprising that firms
without a fully independent audit committee have fewer independent directors generally, this
imbalance is a cause for concern. PctIndep could directly predict δIndep (we will see below that it
does) and the outcome variables, leading to bias. Thus, careful attention to balance on PctIndep is
crucial for proper research design. A core weakness in the DMO study is their failure to address this
imbalance. We pursue this concern in detail below.
4.4. Replicating and Extending DMO’s Original Results: One IV or Two?
We next replicate and then correct and extend DMO’s results. We pursue three main
corrections: (i) replacing their forbidden regression with a permitted one; (ii) controlling for Info
Cost; and (iii) controlling for pre-treatment levels of PctIndep. As we do so, their results
progressively weaken. Once we control for PctIndep, the DMO results disappear entirely.
In Table DMO-2, columns (1)-(4), we replicate the first and second stage of the simple DMO
IV model, which allows for a direct effect of δIndep on their outcomes, without as yet interacting
δIndep with Info Cost. We also report, in columns (5)-(7), corresponding intent-to-treat DiD
regressions, in which we directly use the non-comply dummy to predict the outcomes. The first stage
of the IV specification appears very strong. Non-comply dummy takes a coefficient of 11.4, implying
that treated firms, on average, increase the percentage of independent directors on their boards by
11.4 percentage points more than control firms over 2000-2005. The t-statistic is 9.40, which easily satisfies any
concerns about instrument strength. The coefficient on instrumented δIndep is basically zero for all
three outcomes, in both the IV and DiD specifications.
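The first-stage t-statistic can be related to the F > 10 rule of thumb used earlier. With a single excluded instrument, the first-stage F-statistic for that instrument equals the square of its t-statistic, provided both use the same variance estimator. The helper names below are hypothetical; the only input taken from the text is the reported t-statistic of 9.40.

```python
def first_stage_f_single_instrument(t_stat):
    """F-statistic for one excluded instrument, given its first-stage t-statistic."""
    return t_stat ** 2


def passes_rule_of_thumb(t_stat, threshold=10.0):
    """True if the implied first-stage F exceeds the conventional threshold."""
    return first_stage_f_single_instrument(t_stat) > threshold


f_simple_model = first_stage_f_single_instrument(9.40)  # about 88.4
```

Put differently, with one instrument a first-stage t-statistic must exceed roughly 3.16 in absolute value to clear F > 10. The shortcut does not apply to specifications with two instruments, where a joint F-test is needed.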
We note, however, a concern with the DMO specification: The variables with covariate
imbalance are Info Cost (moderately) and PctIndep (strongly). Both variables should therefore be
included as covariates; but neither is. We indicate this by adding rows for these variables, which
indicate that they were not included in the regressions.
We turn in Table DMO-3 to the core DMO specification, in which they include both predicted
δIndep (from a separate first-stage regression) and predicted δIndep interacted with Info Cost. In
columns (1)-(3), we replicate their results. They find an economically large and statistically strong
negative coefficient on (predicted δIndep)*Info Cost for all three outcome variables. A plausible
summary measure of statistical strength is the average t-statistic for the three outcomes, which is (3.10
+ 7.90 + 4.98)/3 = 5.33.
In the remaining columns of Table DMO-3, we report first and second-stage results from
conventional 2SLS, in which we use non-comply dummy and (non-comply dummy * Info Cost) as
separate instruments, and instrument for both δIndep and δIndep*Info Cost. The first stage remains
respectable, but weakens noticeably. The coefficient on non-comply dummy, as a predictor of δIndep,
remains large at 12.13. But the first stage t-statistic falls to 4.43, implying larger overall 2SLS
standard errors. In columns (5)-(7), the second-stage 2SLS coefficients on instrumented (δIndep*Info
Cost) are similar to those that DMO report on (predicted δIndep)*Info Cost. The t-statistics fall,
slightly for δROA and mean return, and more sharply for δQ, but remain significant. The average t-statistic falls from 5.33 to 4.25. Still, in another study, the forbidden regression used by DMO could
make a larger difference in coefficients or statistical significance (see Wooldridge, 2010, § 9.5.2, for
an example).
4.5. Adding Info Cost as a Covariate
DMO do not include Info Cost as a covariate in their main results (they include this variable
later as a robustness check). Omitting this variable from their main results is a clear error. Any
regression which includes an interaction term should include the non-interacted components. The
omitted non-interacted components will normally correlate with the interaction term. If the non-interacted components also predict the outcome, omitting them will lead to omitted variable bias. In
Table DMO-4, columns (1)-(5) we report 2SLS results, adding Info Cost as a covariate. The first stage weakens again, as a predictor of δIndep. The coefficient is still large, at 10.45, but the first
stage t-statistic falls to 3.33. In the second stage, the coefficients weaken for δROA and (more
strongly) for δQ. The t-statistics again fall for all three outcomes; the coefficient in the δROA
regression is now only marginally significant. The average t-statistic is now only 2.24. In columns
(6)-(8), we report the intent-to-treat DiD results that correspond to the IV results. The statistical
strength of the IV and DiD results is similar, as expected.
4.5. Imbalance on PctIndep: Graphical Evidence
Thus far, the reader’s reaction might be muted. DMO made two technical errors, which a
referee might have caught. But their results largely survive, with both 2SLS and intent-to-treat DiD
specifications. This muted view would overlook an important concern with IV designs. Technical
errors often have much larger consequences in IV than in OLS. First, these errors can substantially inflate
first-stage t-statistics, and make an instrument appear much stronger than it is. Second, IV results are
vulnerable to “blowup”: IV mechanics, in attributing the effect of the instrument on the outcome to
the instrumented variable, magnify, often greatly, any bias in estimating that effect.
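The “blowup” mechanism is easy to see with one instrument and one instrumented variable. In that case the IV estimate equals the reduced-form effect of the instrument on the outcome, divided by the first-stage effect of the instrument on the instrumented variable. Any bias in the reduced form is therefore divided by the first-stage coefficient, so it grows as the first stage weakens. The numbers below are hypothetical.

```python
def wald_iv(reduced_form, first_stage):
    """IV estimate with one instrument: reduced form divided by first stage."""
    return reduced_form / first_stage


true_reduced_form = 0.20      # hypothetical effect of the instrument on the outcome
bias_in_reduced_form = 0.05   # hypothetical bias in estimating that effect

bias_in_iv = {}
for first_stage in (1.0, 0.5, 0.1):
    clean = wald_iv(true_reduced_form, first_stage)
    biased = wald_iv(true_reduced_form + bias_in_reduced_form, first_stage)
    bias_in_iv[first_stage] = biased - clean  # equals the bias divided by first_stage
```

In this example the same reduced-form bias of 0.05 produces an IV bias of 0.05, 0.10, and 0.50 as the first-stage coefficient shrinks from 1.0 to 0.5 to 0.1.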
Also, there are larger problems to come. We saw above that treated and control firms have
very different means for percentage of independent directors (PctIndep). This percentage is an
important, endogenous firm characteristic that strongly predicts δIndep, and may also predict
outcomes. Yet, DMO omit PctIndep from their regressions.
We illustrate graphically the importance of PctIndep to DMO’s first-stage results in Figure
DMO-1. We show a scatter plot of PctIndep in 2000 versus δIndep, with (green) circles for control
firms, which met the audit committee rule in 2000, and orange triangles for treated firms, which did
not. Several features are apparent. First, there is a strong negative correlation between PctIndep and
δIndep (r = -0.69). Second, treated (control) firms mostly have low (high) PctIndep. Thus, Figure
DMO-1 confirms our initial concerns that PctIndep is both strongly imbalanced between treated and
control firms and strongly predicts δIndep. A first-stage regression that omits PctIndep will wrongly
attribute the change in board independence to the instrument, rather than to PctIndep.
Third, there is a serious lack of overlap problem. There are almost no control firms with
PctIndep < 25%, and almost no treated firms with PctIndep > 80%. Simply adding PctIndep to the
2SLS equations won’t solve that problem. Instead regression coefficients will be affected by
extrapolation beyond the region of common support. One response to this problem is to trim the
sample to a region of reasonably thick common support. Sensible bounds might be PctIndep ∈
[0.25, 0.80]. Any results one might find with the full sample, but not the trimmed sample, would be highly
suspect, because they rely on extrapolation beyond common support.
Fourth, if we look within the common support region, among firms with similar PctIndep, the
triangles and circles are well mixed. If the instrument was strongly inducing firms to add independent
directors to their boards, the triangles should be systematically above the circles, holding constant
PctIndep. They are not.
We investigate the overlap problem further in Figure DMO-2. We divide the sample into 20
bins based on PctIndep. The bins cover [0-5%], (5-10%], (10-15%], and so on. We show the number
of treated and control firms in each bin, side by side. Figure DMO-2 shows the imbalance on PctIndep
in a different way. There are no control firms with PctIndep < 15%, and only a handful with PctIndep
< 25%. Trimming the sample at 25% PctIndep excludes 24 treated firms, but only 3 control firms;
trimming at 80% excludes only 6 treated firms, but 153 control firms. This confirms our judgment
that one should trim the sample to firms with PctIndep ∈ (25%, 80%].
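The binning and trimming steps can be sketched as follows, assuming PctIndep is measured on a 0-100 scale and each firm carries a treated flag. The function names and the toy firms are hypothetical; the bin edges follow the [0-5%], (5-10%], ... convention described above.

```python
import math
from collections import Counter


def bin_index(pct_indep, width=5.0):
    """Bins are [0-5], (5-10], (10-15], ...; the first bin has index 0."""
    return max(0, math.ceil(pct_indep / width) - 1)


def counts_by_bin(firms):
    """firms: iterable of (pct_indep, treated_flag).

    Returns {bin index: (number treated, number control)}.
    """
    treated, control = Counter(), Counter()
    for pct, is_treated in firms:
        (treated if is_treated else control)[bin_index(pct)] += 1
    return {b: (treated[b], control[b]) for b in sorted(set(treated) | set(control))}


def trim(firms, low=25.0, high=80.0):
    """Keep firms with low < PctIndep <= high."""
    return [(pct, is_treated) for pct, is_treated in firms if low < pct <= high]


# Hypothetical firms: (PctIndep in 2000, treated flag).
toy_firms = [(10, True), (20, True), (30, True), (50, True),
             (50, False), (70, False), (85, False), (90, False)]
counts = counts_by_bin(toy_firms)
trimmed = trim(toy_firms)
```

Bins in which one of the two counts is zero or close to zero mark regions where a regression can only extrapolate; the trimming bounds are chosen to exclude them.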
In Figure DMO-3, we examine shock strength. We trim the sample to PctIndep ∈ (25%, 80%]
and then plot mean δIndep separately for treated and controls, within each 5% bin for PctIndep.
Dashed lines show the overall means for δIndep for treated and controls. The difference in overall means is 7%
(well below the 10-12% difference implied by the first stages in Tables DMO-2 to DMO-4). Within
each bin, however, the differences are small. The mean difference of 7% is driven by a combination
of: (i) the strong tendency, even within the trimmed sample, for firms that don’t initially comply with
the audit committee rule to have fewer independent directors; and (ii) the general tendency for firms
with low PctIndep in 2000 to increase their independent directors by 2005.
The within-bin comparison of means in Figure DMO-3 confirms the impression from Figure DMO-1 that, if one
controls for PctIndep, the audit committee shock loses much of its power to predict δIndep.
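The contrast between an overall gap and within-bin gaps can be illustrated with a small, entirely hypothetical example. In the made-up data below, treated firms are concentrated in the low-PctIndep bin, where all firms add more independent directors; the function names are ours.

```python
from collections import defaultdict
from statistics import mean

def overall_gap(rows):
    """rows: (bin label, treated flag, change in independence). Treated mean minus control mean."""
    treated = [d for _, is_treated, d in rows if is_treated]
    control = [d for _, is_treated, d in rows if not is_treated]
    return mean(treated) - mean(control)

def within_bin_gaps(rows):
    """Treated mean minus control mean, computed separately within each bin."""
    groups = defaultdict(lambda: {True: [], False: []})
    for label, is_treated, d in rows:
        groups[label][is_treated].append(d)
    return {label: mean(g[True]) - mean(g[False])
            for label, g in sorted(groups.items()) if g[True] and g[False]}

# Hypothetical data: three of four "low" firms are treated, one of four "high" firms is.
rows = [("low", True, 20.0), ("low", True, 20.0), ("low", True, 20.0), ("low", False, 20.0),
        ("high", True, 5.0), ("high", False, 5.0), ("high", False, 5.0), ("high", False, 5.0)]
```

Here the overall gap is 7.5 while every within-bin gap is zero: the raw difference reflects where treated firms sit in the PctIndep distribution, not an effect of treatment.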
4.6. Addressing Imbalance on PctIndep
The audit committee shock is clearly much weaker, once one controls for PctIndep. Is it still
strong enough to be usable? We take a first pass at that question in Table DMO-5. We limit the
sample to PctIndep ∈ (25%, 80%], include PctIndep and Info Cost as covariates in the 2SLS models
and report both stages. Columns (1)-(4) report results for a simple specification, similar to Table
DMO-2, with one instrument (non-comply dummy) and one instrumented variable (δIndep). In
column (1), non-comply dummy takes a coefficient of 2.28 (t = 2.34) – still significant at the 5%
level, but likely too weak to be usable. The second-stage coefficients are insignificant, as expected.
In columns (5)-(9), we report results instrumenting for both δIndep and δIndep * Info Cost. As
predictors of δIndep, the two instruments have an F-statistic of 2.58 (p = .09), well below the F > 10
rule of thumb for avoiding bias due to weak instruments. In column (2), the instruments are still
significant predictors of the interaction term δIndep * Info Cost. But the F-statistic is well below 10,
and we would be uncomfortable relying on this specification, without a good story for why the dual
instruments predict the interaction term, even though non-comply dummy only weakly predicts
δIndep, once we control for PctIndep. If we swallow those doubts and continue with the second stage,
the coefficients on the interaction term are insignificant for all three outcomes. Thus, the original
DMO results critically depend on not controlling for PctIndep in the regression.
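One way to see why a first-stage t-statistic of 2.34 is weak by the usual standard: with a single excluded instrument, the first-stage F-statistic for that instrument is the square of its t-statistic (when both use the same variance estimator). The sketch below applies that identity to the column (1) figure and compares the result, and the reported two-instrument F of 2.58, to the F > 10 rule of thumb. The function names are hypothetical.

```python
def first_stage_f_single_instrument(t_stat: float) -> float:
    """With one excluded instrument, the first-stage F equals the squared t-statistic."""
    return t_stat ** 2

def passes_rule_of_thumb(f_stat: float, threshold: float = 10.0) -> bool:
    """Rule-of-thumb check for instrument strength (F above the threshold)."""
    return f_stat > threshold

f_single = first_stage_f_single_instrument(2.34)  # t-statistic reported for column (1)
f_dual = 2.58                                     # F-statistic reported for the two instruments
```

Both values fall well short of 10, which is consistent with treating the trimmed-sample first stage as too weak to support second-stage inference.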
If we did not trim the sample, the instruments would remain weak. In column (9), δIndep *
Info Cost would significantly predict mean return (coeff. = -0.120; t = 2.76); but would be
insignificant for the other outcomes. We view the one significant result as spurious: its significance
depends on assuming that the linear regression model holds outside common support.
4.7. Summary for DMO and Further Comments
There are several problems with the DMO study. First, they use a “forbidden regression”
instead of two separate IVs. Their results weaken somewhat, but remain significant. Still, using two
separate instruments could be important in another study. Second, and more importantly, they omit
a non-interacted variable, Info Cost, in their main results, which involve instrumenting for Info
Cost*δIndep. Their results weaken substantially when we control for Info Cost. Most critically, they
have extreme imbalance on a core pre-shock covariate (PctIndep). That imbalance drives instrument
strength. Once we control for PctIndep, the nudge toward higher board independence provided by
their audit committee shock is too weak to be usable. Second-stage IV and (not reported) intent-to-treat DiD estimates also become insignificant.
Further steps. Imagine, though, that the DMO results had survived. What else might one do
to strengthen inference? Omitted variable bias remains a major concern. After all, non-comply
dummy is an endogenous firm choice, just as much as PctIndep. Both DiD and its shock-IV cousin
rely on a parallel trends assumption -- that treated and control firms would have evolved similarly,
but for the shock, even though they made different pre-shock choices, for only partly observed
reasons. The defenses for this assumption include: (i) checking for covariate balance; (ii) using
extensive covariates (which also allows a more sensitive test for covariate balance); (iii) improving
balance (including trimming to common support); and (iv) showing parallel trends in the pre-treatment period. DMO provide a covariate balance table and show pre-treatment trends for ROA
(one of their three outcomes), in their Figure 3. These steps strengthen the paper, and are part of why
we chose it for replication. But their covariates are thin (only board size, leverage, firm age, and
ln(market cap)). And while ROA trends are reasonably parallel over 1996-2000, they are not perfectly
so – there is a possible divergence between the controls and the high-Info-cost treated firms over
1998-2000. And no similar graphs are provided for Tobin’s q and share returns.
We would want to see many more covariates (which should be available), and graphs for all
three outcome variables, for a longer pre-treatment period. In our experience, mild non-parallel trends
that are apparent with, say, 6-7 pre-treatment periods, can be hard to detect with only 3-4 periods.
Given the great power of PctIndep in predicting δIndep, we would also worry about whether one can
assume a linear relation between PctIndep and δIndep, as we did by adding PctIndep as a regressor.
Figure DMO-3 suggests that any non-linearity is minor. Still, avoiding this assumption seems
warranted. Here is one balancing method that we pursued in unreported results. Our approach loosely
follows Imbens and Rubin, 2015, ch. 17, but the literature on pure observational studies includes
many others (e.g., Rosenbaum, 2009; Imbens, 2014). Trim the sample (as above). Use all covariates
to estimate the propensity to have a non-compliant audit committee in 2000. Then divide the sample
into blocks based on the propensity score (we use 5 blocks); run 2SLS (or intent-to-treat DiD) within
each block, and sum across blocks to obtain overall coefficients and t-statistics. In unreported results,
we find that the most important contributor to the propensity score is PctIndep, but other variables like Info
Cost and market cap are important. All intent-to-treat DiD and 2SLS coefficients are insignificant,
both within each block and summed across blocks.
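The blocking and combining steps can be sketched as follows. This is not the code behind our unreported results: the equal-sized rank-based blocks, the block-size weights, the treatment of blocks as independent, and the function names are all assumptions made for illustration, and the propensity scores and within-block estimates are taken as given.

```python
import math
from typing import List, Sequence, Tuple

def assign_blocks(scores: Sequence[float], n_blocks: int = 5) -> List[int]:
    """Assign each firm to one of n_blocks equal-sized blocks by propensity-score rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = [0] * len(scores)
    for rank, i in enumerate(order):
        blocks[i] = rank * n_blocks // len(scores)
    return blocks

def combine_blocks(estimates: Sequence[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Combine (coefficient, standard error, block size) triples using block-size weights.

    Assumes the within-block estimates are independent of one another.
    """
    total = sum(n for _, _, n in estimates)
    coef = sum(b * n / total for b, _, n in estimates)
    se = math.sqrt(sum((s * n / total) ** 2 for _, s, n in estimates))
    return coef, se

# Hypothetical propensity scores for ten firms.
example_blocks = assign_blocks([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.4, 0.6, 0.8, 0.95])
```

The overall t-statistic is then the combined coefficient divided by the combined standard error.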
IV Validity and the Blowup Problem. We also have serious doubts about the only-through
condition. The only-through condition requires that an increase in audit committee independence
affects the DMO outcome variables only through board independence. For this condition to hold,
one must believe that the shock to the audit committee has no direct effect on ROA, Tobin’s q, or
share returns. But why should this be true? Fully independent audit committees may affect firms’
accounting choices and hence reported profits. A fully independent audit committee could also reduce
fraud risk, and thus lead investors to pay more for the same reported earnings, which would affect
Tobin’s q and share returns. The other two papers we study, D&D and Iliev, address the only-through
condition with care. In contrast, DMO simply note near the end of their paper the possibility of other
channels. We believe every IV design should include a careful defense of the only-through condition.
Intent-to-treat DiD is usually an alternative to shock-based IV, which avoids this problem. DMO
could have assessed whether the audit committee change caused a change in their outcomes, while
remaining agnostic on the channel: a direct effect of the audit committee change, an indirect effect
through board independence, or a third channel (perhaps appointing directors with financial expertise,
to staff the audit committee).
As we noted in Part 2, coefficient blowup, with 2SLS coefficients much larger than OLS
coefficients, is a strong warning of a likely violation of the only-through condition. So are results
that are significant in 2SLS but not in OLS (or intent-to-treat DiD). DMO report OLS results (as
every IV paper should), and they have both problems – their 2SLS coefficients are 4-9 times the OLS
coefficients (depending on the outcome variable), and in OLS, the δIndep * Info Cost interaction term
is significant only with ln(q) as the outcome. They comment only that the differences between OLS
and their IV-like results “suggest that endogeneity of board composition may be a significant
problem.” (DMO at 205). In our view, any IV paper with these issues needs a careful defense of why
endogeneity should lead to coefficient blowup, stronger t-statistics in 2SLS, or both.
5. Re-analysis of Iliev (2010)
5.1. Overview
SOX § 404, as implemented by the SEC and later amended by the Dodd-Frank Act, requires
firms with “public float” (the market value of shares not held by insiders) > $75M to have their
auditors certify that the firm has adequate internal controls, beginning in 2004. Complying with SOX
§ 404 surely increases firms’ auditing costs, but one would like to know by how much.
For larger public firms, there is no control group, so the best available research design is
interrupted time series (ITS), in which one would estimate the time trend in audit fees, before and
after 2004, and look for an unusual jump in 2004, relative to that trend. But audit fees might have
changed from 2003-2004 for reasons other than SOX § 404.
In a careful and clever paper, Iliev (2010) seeks to do better, for firms near the $75M (in free
float, which we omit below) threshold for SOX § 404 to apply. His central research design is RD, in
which he compares firms just above this threshold to firms just below it, and estimates a mean increase
in ln(audit fees) of 0.744 (t = 7.39). The corresponding percentage increase is e^0.744 – 1 = 110%,
implying that audit fees more than doubled. See Table Iliev-1, regression (3).
Iliev finds evidence that firms manipulate their free float to avoid complying with SOX § 404
(below, “SOX compliance”). He argues that these firms might have especially high SOX compliance
costs, which would bias his estimate of average costs downward. He addresses this concern with an
IV strategy. The SEC adopted the rule containing the $75M threshold for SOX compliance in 2003.
The rule required firms to comply in 2004 if their float in 2002, 2003, or 2004 exceeded $75M. Firms
could not go back in time to manipulate their 2002 float, and in 2002, they had no reason to expect
$75M in float to become a magic number in the future. Iliev therefore uses (2002 float > $75M) as an
instrument for SOX compliance. His two-stage-least-squares (2SLS) estimated increase in ln(fees)
is 0.983 (t = 3.65). The corresponding percentage increase is e^0.983 – 1 = 167%, well above his OLS
estimate. See Table Iliev-1, regression (3).
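The conversion from a coefficient on ln(audit fees) to a percentage increase is exp(coefficient) – 1. A short check of the figures quoted in this section, with a hypothetical helper name:

```python
import math

def pct_increase(log_coef: float) -> float:
    """Percentage increase in fees implied by a coefficient on ln(fees)."""
    return 100.0 * (math.exp(log_coef) - 1.0)

rd_pct = pct_increase(0.744)  # RD estimate
iv_pct = pct_increase(0.983)  # RD/IV (2SLS) estimate
```

Rounded to whole percentages, these give 110% and 167% respectively.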
Iliev is a state-of-the-art finance paper. In many ways it is better than state-of-the-finance-art.
In AB-2014, we review 863 empirical corporate governance papers published in 22 major journals
over 2001-2011, and find 77 shock-based papers. Iliev is one of only two RD papers in our sample.
His RD design is carefully done, including checking for covariate balance, including a flexible control
for the running variable (float or, more generally, firm size), varying the bandwidth around the
discontinuity, checking for evidence that firms manipulate their float to avoid complying with SOX
§ 404, and running an array of placebo and robustness checks. He finds evidence of threshold
manipulation and addresses it with a clever and plausible instrument. Of the 74 shock-based papers
we studied in AB-2015, this is our second favorite, after only Bennedsen et al. (2007).
Iliev’s RD-only design is credible, even though we would supplement it by using DiD/RD
combined. His article is one of the very first finance papers to use an RD design, and one of only two
papers in the AB-2015 sample to do so.
And yet, Iliev’s 2SLS estimate is much too high. With our combined RD/DiD design, we
estimate a coefficient on a SOX-compliance dummy of 0.586 (t = 4.59). This corresponds to an
e^0.586 – 1 = 80% increase, or only about half of Iliev’s RD/IV estimate of 167%. We show below that his RD
design, for subtle reasons, does not produce covariate balance on firm growth. That alone does not
matter much. Controlling for growth modestly reduces the coefficient on a SOX-compliance dummy
from 0.744 to 0.706.
Even more subtly, Iliev’s IV design leads to gross imbalance on growth and to violation of
the only-through condition. In effect, his IV predicts audit fees both through the instrumented variable
(SOX 404 compliance) and through omitted growth variables. The “IV-compliers” (who comply
with SOX § 404 only because they are forced to do so by his instrument) are also a small, and likely
unrepresentative, subsample of all SOX-complier firms. We also explain why Iliev’s IV design is
unnecessary (firms that manipulate their float to avoid SOX compliance do not have significantly
higher compliance costs than other firms).
5.2. Data Manipulations
We start with Iliev’s dataset, which includes 1,492 unique firms over 2002-2004 (the period
we focus on here); of which 815 firms have full data on float, audit fees, and covariates, and 281 of
these firms are within his [$50M, $100M] bandwidth for 2004 float.10
Iliev mostly uses a fixed bandwidth around the $75M threshold of [$50M, $100M] measured in
2004.11 In our re-analysis, we vary bandwidth systematically, in both 2002 and 2004. To do so, we
define a bandwidth parameter b, and a corresponding bandwidth of [$75M / b, $75M × b]. In our main
analysis we use b = 1.5, which implies a bandwidth of [$50M, $112.5M], similar but not identical to
Iliev’s.
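The bandwidth definition can be written as a one-line function. A minimal sketch, with float measured in $ millions and a hypothetical function name:

```python
from typing import Tuple

def bandwidth(b: float, threshold: float = 75.0) -> Tuple[float, float]:
    """Return (lower, upper) float bounds in $ millions for bandwidth parameter b > 1."""
    if b <= 1.0:
        raise ValueError("b must exceed 1 for a non-empty bandwidth")
    return threshold / b, threshold * b

main_band = bandwidth(1.5)  # the main-analysis choice of b
```

Because the bounds are obtained by dividing and multiplying by the same factor, the interval is symmetric around the threshold in logs rather than in dollars.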
5.3. Replication and Extension of Iliev RD Results
We begin our re-analysis by replicating and extending Iliev’s RD results. In Table Iliev-1,
10
Iliev (2010) also includes results through 2007. Following Iliev, we require that firms have non-missing data on
public float, audit fees, and covariates for 2002 and 2004. In unreported robustness checks, we obtain very similar results
if we also insist that this data be non-missing for 2003. To determine which firms must comply with SOX § 404, the SEC
uses float at the end of the second fiscal quarter (“SEC float”). Iliev’s dataset flags firm-year observations where float is
reported at a different date (generally year-end). In our extensions, to avoid dropping these observations, we assume that
reported float = SEC float, but drop two firms for which this assumption is incorrect – they reported float > $75M in 2002,
yet did not comply with SOX § 404. This implies that their SEC float for 2002 was < $75M.
11
Iliev finds similar (unreported) results with bandwidths of [$60M, $90M] and [$40M, $110M].
regression (1), we replicate Iliev’s core RD result, with no covariates other than a cubic in float and
industry fixed effects (“FE”), from his Table II. SOX compliance predicts an 0.866 (t = 7.57) increase
in ln(2004 audit fees). This estimate, however, is biased upward. Within the [$50M, $100M]
bandwidth, larger firms likely pay larger fees. To develop a better estimate, one must control for firm
size. Iliev does so with an admirably flexible functional form in his (and our) regression (2). He
controls for ln(sales), ln(assets), ln(market value) and a cubic in public float, all measured in 2004,
along with other covariates (leverage, receivables/assets, dummy for Big 4 auditor, number of
business segments, and number of geographic segments). His coefficient estimate falls to 0.744,
implying a 110% increase in fees (e^0.744 = 2.10).
It is customary for an RD design to control flexibly for the “running” or forcing variable, as
Iliev does, and common to find that doing so changes the estimated jump at the threshold. The theory
behind RD implies, however, that (i) firms near the threshold should be similar on everything but the
running variable; and (ii) adding additional covariates should not greatly affect treatment effect
estimates. We verify differences in firm size, and similarity on Iliev’s other control variables, in two
ways. First, in Table Iliev-1, we include an additional regression (1A), in which we add only size-related covariates to regression (1). The coefficient on the SOX compliance dummy is 0.749. Adding
the non-size variables in regression (2) changes the estimate only trivially, to 0.744.
Second, we assess covariate balance in Table Iliev-2, both in 2002 (before $75M in float
became a magic number) and in 2004 (when Iliev builds his sample). In Panel A, we compare “future
treated” firms, with 2002 float ∈ [$75M, $100M], to “potential future control” firms with 2002 float
∈ [$50M, $75M]. The two groups are similar on all of Iliev’s control variables except size and related
variables (number of geographic segments is related to size). As Panel B shows, in 2004, within
Iliev’s [$50M, $100M] bandwidth, treated firms are tolerably similar to control firms on Iliev’s
covariates. So far, so good.
But are the SOX-complier and control firms really similar? In both panels, we include
variables for growth from 2002 to 2004 in float, market capitalization, sales, and assets.12 In 2002,
firms below the future $75M threshold are similar to firms above the threshold, as expected. But in
Panel B, SOX compliers grow more slowly than controls, significantly so for float and market cap.
These differences arise because the SEC rule is asymmetric with respect to how changes in float affect
whether a firm must comply with SOX § 404: firms can grow into compliance, but cannot shrink out of compliance.
We discuss this asymmetry below.
A core RD assumption is violated. Iliev missed this. So did we, until we carefully thought
about which firms were the compliers, always-takers, and never-takers for his IV. Does omitting
growth covariates matter? The OLS answer is yes, at least somewhat. In Table Iliev-1, regression
(2A), we add growth variables to his regression. The treatment effect estimate drops from 0.744 to
0.706. More centrally, our confidence in the RD design also falls. Linear controls for growth may
be imperfect. Also, if the treatment and control groups are not balanced on growth, they may not be
balanced on unobserved variables. As we will see below, the IV-compliers (the SOX-compliers who
comply only because they have 2002 float > $75M) are much more unbalanced, versus the controls,
than the full set of SOX-compliers. A core task for our re-analysis will be to construct treatment and
control groups that are balanced on growth, and thus more likely to be balanced on unobservables.
One lesson from imbalance on growth, which will generalize to DMO, D&D, and many other
finance papers: Black et al. (2014) stress the importance of using extensive firm-level covariates,
12
Iliev’s dataset starts in 2002, so we cannot compute growth from earlier years through 2002.
even in studies with firm fixed effects (FE). Shock-based research designs should similarly include
far more covariates than is the norm today. Had Iliev done so, he might well have found the growth
imbalance and addressed it in some way, even if not by using the principal strata approach we develop
below.
5.4 Replication and Extension of Iliev’s RD/IV Results
We next replicate and extend Iliev’s combined RD/IV analysis, in Table Iliev-3. We show
the first stage and second stage of his 2SLS regressions in separate columns. In the first stage, he
uses the instrument (float > $75M in 2002), plus covariates, to predict the instrumented variable (SOX
compliance in 2004). We begin with regression (3) from Iliev’s Table II, which includes no covariates
other than industry FE. In this RD/IV specification, Iliev omits the cubic in float that he includes in
his RD regressions.13 This is an odd choice – an RD design should control flexibly for the running
variable, as Iliev did in his straight RD regressions. In regression (3A), we therefore add the float
cubic. Adding the float cubic reduces the first stage coefficient from 0.466 to 0.325, but the instrument
remains reasonably strong (first-stage t-stat = 6.48). More problematically, adding the float cubic
increases the second-stage coefficient, already uncomfortably high at 1.171, even further to 1.332,
implying a near quadrupling of audit fees (278% increase). SOX § 404 compliance was expensive,
but no one, to our knowledge, thought it was that expensive. As we discuss in AB-2014, an
implausibly large second-stage coefficient is a strong warning sign that one’s instrument fails the
only-through condition.
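With one instrument and one instrumented variable, and the same controls in both stages, the 2SLS coefficient equals the reduced-form coefficient (outcome regressed on the instrument) divided by the first-stage coefficient. The sketch below uses that identity to back out the reduced-form coefficients implied by the figures quoted above. The implied values are our own arithmetic for illustration, not numbers reported in Iliev or in our tables, and the function names are hypothetical.

```python
def wald_iv(reduced_form: float, first_stage: float) -> float:
    """Just-identified IV estimate: reduced-form coefficient over first-stage coefficient."""
    return reduced_form / first_stage

def implied_reduced_form(first_stage: float, iv_coef: float) -> float:
    """Reduced-form coefficient implied by a first-stage and a 2SLS coefficient."""
    return first_stage * iv_coef

rf_without_cubic = implied_reduced_form(0.466, 1.171)  # regression (3)
rf_with_cubic = implied_reduced_form(0.325, 1.332)     # regression (3A)
```

Because the first-stage coefficients are well below one, any reduced-form difference in fees is scaled up by a factor of roughly two to three. A modest direct link between the instrument and fees, running through something other than SOX compliance, can therefore produce a large bias in the second stage.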
We next report the first and second stage for Iliev Table II, regression (4), which includes
13
In his Table II, Iliev says in one place that both stages use the “same controls” but in another that the first-stage, but
not the second, includes “public float terms.” The latter would be an incorrect use of 2SLS if true, but based on our
replication, Iliev excluded float terms in both stages.
covariates. The coefficient on instrumented SOX compliance falls to 0.983, which still implies a
167% increase in fees (e^0.983 = 2.67). We then extend Iliev’s results by adding a cubic in float, in
regression (4A), and further adding growth variables in regression (4B). The first-stage becomes
progressively weaker, with a coefficient in regression (4B) of 0.237. At the same time, the coefficient
on instrumented SOX compliance rises to 1.179, implying an increase in audit fees of 225% as a
result of SOX § 404 compliance.
5.5. Impact of Growth Imbalance on Coefficient Estimates
We next investigate the effect of imbalance in growth on estimates of SOX § 404 compliance
costs, relying primarily on graphical analysis. Note first that the IV-compliers – the firms which
comply with SOX § 404 only because they had float > $75M in 2002 – are a very particular subgroup
of the SOX-compliers (all firms which complied with SOX § 404 in 2004). The IV-compliers must
have suffered a drop in float to < $75M in each of 2003 and 2004. Could their growth trajectory over
2002-2004 explain their 2004 audit fees, at least in part?
We provide evidence that the answer is yes in Figure Iliev-1. We plot the relation between
ln(audit fees) and float in 2004 for the 281 firms used in Iliev’s OLS and IV analysis, but split them
into four groups:
(1) 41 “shrinker-compliers” (shown with orange triangles), defined as firms that are SOX-compliers, but have float < $75M in 2004, implying that their float exceeded $75M in 2002
or 2003;
(2) 80 grower-compliers (shown with green diamonds), defined as SOX-compliers with 2002
float < $75M but 2004 float > $75M;
(3) 117 small control firms (shown with black dots), defined as firms with 2004 float < $75M,
that do not comply with SOX § 404 in 2004; and
(4) 43 “large-compliers” (also shown with black dots), firms with float > $75M in both 2002
and 2004.
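The four-way split above can be expressed as a simple classification rule. A minimal sketch, with float in $ millions; the function name is hypothetical, and treating float exactly equal to $75M as above the threshold is our assumption, since the definitions above use strict inequalities.

```python
def classify_firm(float_2002: float, float_2004: float, complies: bool,
                  threshold: float = 75.0) -> str:
    """Assign a firm to one of the four groups plotted in Figure Iliev-1."""
    if not complies:
        # Non-compliers with float at or above the threshold fall outside the four groups.
        return "small control" if float_2004 < threshold else "unclassified"
    if float_2004 < threshold:
        return "shrinker-complier"   # complies despite 2004 float below the threshold
    if float_2002 < threshold:
        return "grower-complier"     # grew across the threshold between 2002 and 2004
    return "large-complier"          # at or above the threshold in both years
```

For example, a complying firm with 2002 float of $90M and 2004 float of $60M is a shrinker-complier, while a complying firm moving from $60M to $90M is a grower-complier.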
We also add two regression lines showing predicted ln(2004 fees). The first line, at the lower
left, is for control firms, ends at 2004 float of $75M, and comes from a regression of ln(fees) on float
and constant term. The second line, which covers the full range of 2004 float, is for SOX-compliers;
we regress ln(fees) on float, constant term, and a “large dummy” (=1 if 2004 float > $75M). This
specification lets shrinker-compliers have a different constant term than other SOX-compliers.
The scatter plot and predicted fee lines show that, controlling for float, shrinkers have much
higher audit fees than other SOX-compliers (the coefficient on the shrinker dummy is 0.53). If we
use the regression lines to compare the audit fees of control firms to shrinker-compliers at 2004 float
just below $75M, the difference in predicted ln(fees) is 1.07, which is similar to Iliev’s IV estimate.
In contrast, if we compare the same control firms to other SOX-compliers with float just above $75M,
the difference in predicted fees is only 0.54. This is quite close to the DiD/RD estimate we develop
below.
We can now see how failing to control for growth led Iliev astray. Shrinker-compliers had much
higher audit fees than other SOX-compliers, pre-SOX. This is plausible – these firms used to be
larger, and may retain the higher fees of larger firms. Some may also incur higher fees because they
have shrunk due to business troubles that might call for more intensive auditing. Iliev’s IV estimate
compares shrinkers (more precisely, shrinkers with 2002 float > $75M) to control firms. This leads
to a large upward bias in estimating the cost of SOX compliance.
Figure Iliev-2 illustrates the large differences in growth between IV-compliers and control
firms. The figure shows a scatter plot with 2002 float on the x-axis and 2004 float on the y-axis.
Vertical lines show 2002 float of $50M, $75M, and $100M; horizontal lines are similar for 2004. A
45-degree line indicates no change in float from 2002 to 2004. The IV-compliers are the 22 shrinkers
in the red-bordered box (2002 float > $75M but 2004 float < $75M).14 All fall below – often far
below – the 45-degree dotted line. The 117 control firms are in the green-bordered box (2002 and
2004 float both < $75M). Almost all (110 of 117) are above the 45-degree line, often far above it.
There are 10 control firms whose float shrank from 2002 to 2004; three of these have reported 2002
float > $75M. We drop these three firms because their reported 2002 float was not the float used by
the SEC to determine SOX 404 compliance. The largest negative change in ln(float) among the
remaining 7 control firms is -0.20. The other SOX-compliers are in the black-bordered box, above
the $75M line for 2004.
The IV-compliers are very different from the controls on growth. There was already
imbalance on growth between the full set of SOX-compliers and the controls, as we saw in Table
Iliev-2, Panel B. That imbalance is far worse for IV-compliers versus controls. We show that
difference numerically in Table Iliev-2, Panel C. The mean change in ln(float) is -0.57 for IV-compliers versus +0.96 for controls. Across all four growth measures, controls grow, while IV-compliers shrink.
Standard advice in the causal inference literature is to avoid using regression to extrapolate
beyond the “common support” of the data – the area for which treated and control firms have
overlapping values (e.g., Imbens and Rubin, 2015, ch. 14). The great strength of RD is that it should
lead to covariate balance and nearly complete overlap, on both observed and unobserved covariates.
14
In Figure Iliev-2, we drop [*xx] shrinker-compliers with float in both 2002 and 2003 > $75M, but 2004 float <
$75M, in order to focus on the IV-compliers.
In this respect, Iliev’s IV design grossly departs from the RD design he began with. In trying
creatively to address one problem (some firms manipulated their 2004 float to avoid complying with
SOX), Iliev stumbled into a larger problem with covariate imbalance. That imbalance was pernicious,
because growth in float strongly predicts audit fees.
How much overlap is there between IV-compliers and controls on growth in ln(float)? Not
much. Figure Iliev-3 shows a histogram of the number of IV-compliers and controls within different
ranges for change in ln(float). The only overlap is for change in ln(float) ∈ [-0.25, 0]. That region
of common support includes only 7 control firms and 6 IV-compliers. Iliev’s IV/RD estimate already
rests on only 22 IV-complier firms. If one insists on overlap on growth in ln(float), the number of
IV-compliers for which one could estimate a treatment effect is down to 6 firms -- manifestly too
small to support a credible estimate.
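An overlap check of this kind can be sketched as follows. The range-based definition of common support used here is a crude stand-in for the histogram comparison in Figure Iliev-3, the function names are hypothetical, and the values are made up rather than taken from Iliev’s data.

```python
from typing import Optional, Sequence, Tuple

def common_support(treated: Sequence[float],
                   control: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Interval where the observed ranges of the two groups overlap, or None if they do not."""
    lo = max(min(treated), min(control))
    hi = min(max(treated), max(control))
    return (lo, hi) if lo <= hi else None

def count_inside(values: Sequence[float], interval: Tuple[float, float]) -> int:
    """Number of observations that fall inside the closed interval."""
    lo, hi = interval
    return sum(lo <= v <= hi for v in values)

# Hypothetical changes in ln(float) from 2002 to 2004.
iv_compliers = [-1.20, -0.60, -0.20, -0.10]
controls = [-0.15, 0.40, 0.90, 1.50]
support = common_support(iv_compliers, controls)
```

Counting the firms of each type inside the overlap region shows how few observations would remain if one refused to extrapolate.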
In sum, Iliev’s IV is fatally undermined by imbalance between IV-compliers and control firms
on growth. His RD design is also infected by imbalance on growth, but less severely. One would be
better off staying with an RD design, recognizing that SOX-avoidance by some firms could lead to
an underestimate of the average treatment effect, assessing how large that underestimate might be,
and estimating the cost of SOX 404 compliance for a sample of treated and control firms that is balanced
on growth. We turn to those tasks next.
But first, some remarks. Iliev’s IV fails due to severe imbalance on growth. It’s tempting to
think – why didn’t he see this? In fact, finding the imbalance is quite subtle. Iliev wasn’t looking at
2002 – he ran cross-sectional regressions in 2004. He had an RD design, which normally produces
covariate balance on all but the running variable, confirmed balance on levels in 2004, and likely
never suspected there could be imbalance in changes. We convinced ourselves, early on, that growth
was an important covariate, and that his IV estimate was much too high. But for a long time, we
didn’t understand why. After all, if we control for growth in the RD regression in Table Iliev-1,
regression (2A), the compliance cost estimate falls, but not by that much (from 0.744 to 0.706). And
in the IV regression (4B), in Table Iliev-3, the coefficient estimate increases. For us, the takeaway
messages from the failure of Iliev’s IV include:
(i) ensuring IV validity is a quite tricky business. An exogenous instrument (which Iliev has)
is necessary but not sufficient;
(ii) it’s crucial to use an extensive set of control variables, and check for covariate balance on
all of them;
(iii) it’s crucial to think carefully about who are the IV-compliers, and how they might differ
from the controls;15
(iv) subtle violations of the only-through condition can lead to large biases in IV results; and
(v) using regression to extrapolate beyond common support is dangerous.
5.6. Our Preferred Analysis: DiD/RD within Principal Strata
We start our own analysis in 2002, with a sample of firms with 2002 float ∈ [$50M, $112.5M]
(using a bandwidth parameter b = 1.5, which we will later vary). We address the imbalance on growth
between all SOX-compliers and all controls by isolating subsamples of firms with similar change in
float from 2002 to 2004. Our first principal strata, shrinkers, are defined as firms with 2002 float > 2004
float.16 The shrinkers group includes two strata: shrinker-compliers with 2002 float > $75M,
who must comply with SOX § 404 regardless of their 2004 float, and “shrinker-controls”, firms with
2002 float < $75M. Since compliance is mandatory and no firms comply with SOX § 404 unless
required to, both strata are fully observed. Moreover, since firms cannot avoid SOX by shrinking,
15
[*analogy to come to Angrist quarter-of-birth instrument; and Buckles and Hungerman (ReStat 2011 or so) critique
for lack of balance [winter babies are different].
16
[*note to come explaining the concept of principal strata and how it generalizes the causal IV categories of always
takers, never takers, instrument compliers, and instrument defiers].
there is no reason to expect either stratum to include SOX-avoiders. Thus, the worry that SOX-avoiders might have higher compliance costs than other firms does not apply to the shrinkers strata.17
We show the division of shrinkers into shrinker-compliers and shrinker-controls in Figure
Iliev-4. The average change in ln(float) from 2002 to 2004 (the average distance from each point to
the 45 degree dashed line) is similar for the two strata; we confirm balance on growth and other
covariates, except size covariates, in unreported regressions. A DiD comparison of the ln(audit fees)
of these two groups can provide an estimate of the effect of SOX on audit fees that is free of potential
bias due to lack of balance on growth in float.
The second pair of strata we define is for modest growers. These are firms that experience
growth in float but either start above $75 Million in 2002 and do not go above $112.5 Million in 2004
(these will be SOX complier firms), or start below $75 Million in 2002 and do not cross the $75
Million threshold in 2004 (these will be SOX non-complier firms). The modest grower stratum and
its two substrata of complier and non-complier firms are shown on Figure Iliev-5. One can easily
verify that the two substrata are well balanced on mean change of ln(float), which suggests another
DiD estimate of the effect of SOX compliance on audit fees, which will be free of the growth
imbalance concerns.
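The modest-grower definition above can be written as a small classification rule. The sketch below is only an illustration: the function name is hypothetical, floats are in $ millions, and the handling of firms exactly at a threshold is an assumption, since the text does not specify it.

```python
def classify_modest_grower(float_2002, float_2004):
    """Classify a firm within the modest-grower stratum (floats in $ millions).

    Returns "complier", "non-complier", or None if the firm is not a
    modest grower. Boundary handling at exactly 75 or 112.5 is assumed.
    """
    if float_2004 <= float_2002:
        return None  # float did not grow, so the firm is not a grower
    if float_2002 > 75 and float_2004 <= 112.5:
        return "complier"      # starts above $75M, stays at or below $112.5M
    if float_2002 < 75 and float_2004 < 75:
        return "non-complier"  # starts below $75M, never crosses $75M
    return None                # e.g. crosses $75M, or grows past $112.5M

examples = {
    (80, 100): classify_modest_grower(80, 100),
    (60, 70): classify_modest_grower(60, 70),
    (60, 90): classify_modest_grower(60, 90),
    (80, 120): classify_modest_grower(80, 120),
}
print(examples)
```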
In Table Iliev-4, we report the results from DiD regressions comparing the log(audit fees) of
complier and non-complier firms in the shrinker strata, modest grower strata, and both strata
combined. All regressions include firm and year fixed effects. For each stratum we report three
specifications: 1) no controls; 2) the control variables included in Iliev; and 3) Iliev controls
17
In unreported results, we confirm covariate balance between the shrinker-compliers and shrinker-controls on all
variables except size-related variables, including balance on growth variables.
plus cubic terms of float. The estimated treatment effects on log(audit fees) of SOX 404 compliance
range from 0.507 to 0.645. Taking the model with the most control variables, the estimated effect for
shrinkers is 0.606; for modest growers it is 0.507; and for both strata combined it is 0.586. This 0.586
estimate translates into an 80% increase in audit fees attributable to SOX 404 compliance (exp(0.586) - 1 ≈ 0.80), which
equals roughly half of the 167% estimate from Iliev’s IV specification. The combined-strata
coefficient is also a fair bit below Iliev’s RD-only estimate of 0.744, although our DiD/RD estimate
is within the 95% CI for Iliev’s estimate, and vice versa.
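The conversion from a coefficient on log(audit fees) to a percentage change can be checked with a few lines. The helper name is hypothetical; the three coefficients are the ones reported in the text.

```python
import math

def log_points_to_pct(coef):
    """Percentage change implied by a coefficient on a log outcome."""
    return 100.0 * (math.exp(coef) - 1.0)

# Coefficients from the specification with the most controls (see text).
estimates = {"shrinkers": 0.606, "modest growers": 0.507, "combined": 0.586}
pct = {name: log_points_to_pct(coef) for name, coef in estimates.items()}

for name, value in pct.items():
    print(f"{name}: {value:.1f}%")
```

The combined-strata coefficient of 0.586 gives roughly 80%, matching the figure quoted above.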
5.7. DiD within Strata -- Further Tests
The principal strata framework also allows us to examine the extent of potential selection bias
that firms with higher compliance costs will manipulate their float to opt out of SOX 404 compliance.
Correcting for this bias was the main reason for Iliev’s IV design. In Figure Iliev-6 we show the stratum
of growers and its two substrata: 1) growers-forced-compliers, which include firms that start with
2002 float > $75 Million; and 2) growers-voluntary-compliers, consisting of firms with 2002 float < $75
Million that grow their float above $75 Million in 2004. If Iliev’s selection bias concern is valid, one
would expect the audit fees of growers-voluntary-compliers to be lower than the fees of growers-forced-compliers. We report the DiD comparisons of ln(audit fees) of these two substrata in Table
Iliev-5. We find small, insignificant differences between growers-voluntary-complier (treated) and
growers-forced-complier (control) firms. So, there is no evidence that the growers-voluntary-compliers have significantly lower cost of SOX 404 than the mix of grower-compliers and grower-avoiders (if they could) in the control group. This suggests that the IV analysis in Iliev is addressing
a source of potential bias that is limited in practice.
For our main tests we chose a bandwidth of $50M-$112.5M in 2002 float. As our last
robustness test of the within-strata DiD analysis we systematically vary this bandwidth. Define a
bandwidth parameter b. For a given b, we calculate a 2002 float bandwidth as [$75M/b, $75M * b].
The original bandwidth choice corresponds to b = 1.5. We vary b from 1.10 to 3.00 and show the
estimated treatment effects as a function of b for the shrinker and modest grower strata in Figure 7,
Panels A and B, respectively. To preserve degrees of freedom for narrower bandwidths, which have
significantly fewer treated and control firms, we present the estimated coefficients only
for the specification without covariates. The estimated treatment effects are relatively stable around
0.6 for large ranges of b, except when b approaches 1.1. The drop in treatment effect for narrower
bands could be driven by increased audit fees of firms that are just below the threshold and
prepare preemptively for expected SOX 404 compliance.
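The bandwidth rule [$75M/b, $75M * b] is simple to reproduce. The sketch below uses a hypothetical function name and an assumed grid step of 0.05 for b; the text states only the range 1.10 to 3.00.

```python
def float_bandwidth(b, threshold=75.0):
    """Return the 2002 float bandwidth [threshold / b, threshold * b] in $ millions."""
    if b <= 1.0:
        raise ValueError("b must exceed 1 for a non-empty bandwidth")
    return threshold / b, threshold * b

# b = 1.5 reproduces the main-test bandwidth of $50M to $112.5M.
low, high = float_bandwidth(1.5)
print(low, high)

# Assumed grid: b from 1.10 to 3.00 in steps of 0.05 (the step is not given in the text).
b_grid = [round(1.10 + 0.05 * i, 2) for i in range(39)]
bandwidths = [float_bandwidth(b) for b in b_grid]
```

Because the band is multiplicative, it is symmetric around the threshold in ln(float) rather than in dollars.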
Notes on Iliev: We get very different results for covariates than he does, within our 3 comparison
groups. This suggests that there are meaningful differences between our three strata.
Ln(sales) is insignificant for Iliev’s base RD analysis. For us, it is significant and positive for
the rapid growers, insignificant for the shrinkers and modest growers.
No. of geographic segments is significant and positive for Iliev. For us, it is insignificant for
all three groups with varying coefficients and sign.
Iliev and we both find big auditors charge higher fees. But his point estimate is 0.37, while
our estimates differ among groups: 0.38 for modest growers; 0.55 or so for grower-compliers
and shrinkers.
6. Guidance: What One Needs for Reliable Shock-Based IV
6.1. An Extended IV Validity Checklist
As we discussed in Section 2, the modern checklist for instrument validity requires authors to
verify three conditions: 1) instrument exogeneity, 2) instrument strength, and 3) only-through. When
we started this study, we viewed the instruments in the three shock-based papers we reexamine
as valid according to the modern IV checklist. Our reexamination identifies settings in which apparently
valid shock-based instruments fail one or more of the IV validity conditions. First, constructing an
instrument as an interaction of the shock with post-shock covariates can violate the exogeneity
condition, as in the D&D case. Second, an instrument can appear strong only because of lack of
balance. Once covariate balance is improved, the instrument can fail the strength requirement, as in
the DMO case. Third, an instrument can fail the only-through condition due to lack of balance on
post-shock covariates, as in the Iliev case.
Our analysis suggests the following instrument validity checklist that extends and refines the
three conditions for valid instruments.
Condition 1: Instrument Exogeneity
The key starting point for credible shock-based IV is the choice of a credibly exogenous shock.
The exogeneity of a shock is not statistically testable. Authors would need to argue that a shock is
exogenous from the point of view of the business entities in their sample. Such arguments can be
based on theory or institutional knowledge. For the three papers in our analysis, the shock in Iliev is
clearly exogenous because the SOX 404 rule was passed in 2004 and used free float in 2002, which is
non-manipulable by firms. D&D carefully defend shock exogeneity with the argument that the change
in tax rules was designed to help small companies and had only unintended consequences. The
variables they interact with the shock depend on unobserved firm characteristics, but the shock itself
does not. DMO are on more slippery ground here: before their 1999 shock, firms chose whether to
have < 100% independent audit committees. This is not fatal for DiD, but puts stress on controlling for
a wide array of pre-shock covariates.
1. Instrument exogeneity
a. Construct instrument directly as a shock dummy or as interaction of shock dummy
and pre-shock firm characteristics only. Adding firm fixed effects will then ensure
automatically that firms have observations both pre and post (D&D lesson)
b. Thus, of our three papers, two satisfy exogeneity; the third does not, but one can
hope to limit the damage.
c. The vast majority of non-shock-IV papers in AB2015 and Larcker and Rusticus
(2010) do not.
2. Instrument strength
a. Reiterate general advice about weak instruments
b. If instrument appears weak, can perhaps find subsample with covariate balance, for
which instrument is both strong and plausibly satisfies only-through (DD as
example). More formally, one could use Keele and Morgan-like methods to select an
optimal subsample where the instrument is stronger (but then see concerns about
lack of balance below).
c. If instruments appear strong, perform extensive covariate balance analysis of
variables in the first-stage regression, because lack of balance on particular covariate
can generate false instrument strength (DMO lessons).
d. If lack of balance is detected, use matching/balancing methods from the observational
studies literature.
3. Only-through condition
a. Briefly reiterate general discussion about the only-through condition (DD talk about
it, DMO don’t, Iliev does as well)
b. Analyze only-through argument made by DD and see if it holds
c. Then discuss the subtle violation of this condition in Iliev, mentioning collider
variables from Pearl and other analogies with famous IV papers e.g. Angrist & X
study using birth month as instrument for education
d. The Iliev violation is detectable if researchers can identify the subsample of
compliers and perform covariate balance tests comparing this sub-sample with the
remaining sample.
e. If imbalance is found, IV is suspect. Could use balancing methods to restore IV
validity, but the resulting sample will likely be very small (7 vs 6 firms in the Iliev
case).
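The covariate balance checks recommended in this checklist can be sketched with the two statistics defined in the notes to Table DD-7: the normalized difference and the two-sample t-statistic with unequal variances. The function name and the toy data are hypothetical; sample standard deviations (ddof=1) are an assumption.

```python
import numpy as np

def balance_stats(x_treated, x_control):
    """Return (normalized difference, |t|) for one covariate.

    ND  = |mean_t - mean_c| / sqrt((s_t^2 + s_c^2) / 2)
    |t| = |mean_t - mean_c| / sqrt(s_t^2 / N_t + s_c^2 / N_c)
    """
    xt = np.asarray(x_treated, dtype=float)
    xc = np.asarray(x_control, dtype=float)
    diff = abs(xt.mean() - xc.mean())
    var_t = xt.var(ddof=1)  # sample variance (assumed)
    var_c = xc.var(ddof=1)
    nd = diff / np.sqrt((var_t + var_c) / 2.0)
    t_stat = diff / np.sqrt(var_t / len(xt) + var_c / len(xc))
    return nd, t_stat

# Hypothetical pre-shock covariate values for treated and control firms.
nd, t_stat = balance_stats([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(round(nd, 3), round(t_stat, 3))
```

Unlike the t-statistic, the normalized difference does not grow mechanically with sample size, which is why both are worth reporting side by side.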
6.2. Exploiting the Same Shock with Both IV and Other Methods
Our second piece of advice: when doing causal inference, authors should not exploit exogenous shocks
using solely IV. Instead, they should use other shock-based methods like DiD, ES, and RD, at least
first, and often last as well! For shock-based IV in which the shock is used directly as the IV, intent-to-treat DiD will provide similar statistical strength with weaker assumptions. If one gets results with
DiD, and there is a strong case to be made that the causal effects from the shock to the outcome
variables flow through a single channel, then IV can make sense as a secondary analysis.
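One way to see why intent-to-treat and IV carry similar statistical information when the shock itself is the instrument: with a single binary instrument, the IV estimate is the intent-to-treat effect divided by the first-stage effect. In Table DMO-2, for example, the first-stage coefficient (11.399) times the second-stage δq coefficient (-0.252) is about -2.87, close to the intent-to-treat coefficient of -2.874. The sketch below illustrates the ratio on hypothetical data, using simple differences in means without covariates or differencing; all names and values are assumptions.

```python
import numpy as np

def itt_and_wald(y, d, z):
    """Intent-to-treat effect, first stage, and their ratio for a binary instrument z."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    z = np.asarray(z)
    itt = y[z == 1].mean() - y[z == 0].mean()          # effect of shock on outcome
    first_stage = d[z == 1].mean() - d[z == 0].mean()  # effect of shock on treatment
    return itt, first_stage, itt / first_stage

# Hypothetical data: the shock z raises the treatment rate from 25% to 75%,
# and treatment d raises the outcome by 2.
z = np.array([1, 1, 1, 1, 0, 0, 0, 0])
d = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y = 2.0 * d

itt, first_stage, wald = itt_and_wald(y, d, z)
print(itt, first_stage, wald)
```

The division step is where the only-through assumption enters: the intent-to-treat estimate is valid without it, while the ratio is interpretable as a treatment effect only with it.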
For IV designs where the instrument is constructed as an interaction of the shock and a pre-shock covariate, if the covariate is binary, you are back to DiD with a treated group and a control
group defined using the covariate. If the covariate is continuous, you have, in effect, a DiD-continuous
design (for discussion of such designs, see Atanasov and Black, 2015), and should start
with that.
If the causal question of interest involves an interaction between two variables of interest,
one can instead start with a version of DiDiD design. If both variables are binary, the design is true
DiDiD; if one variable is binary and the other continuous (as in the DMO case, in which the authors
interact a rule compliance dummy and a pre-treatment covariate measuring information costs), the
design is DiDiD-continuous; if both variables are continuous (as in the D&D case, in which the
interacted variables are a measure of governance and measures of need for tax shields), one would
have a DiDiD-double-continuous design. As with any DiD-continuous or DiDiD-continuous design, it is
often useful to turn the continuous variable into a binary one as one part of the analysis. We
illustrate this approach above for D&D, but one could do this for DMO as well.
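Turning a continuous pre-shock covariate into a binary treatment indicator can be done with a median split, as with the below-median and above-median dummies used in the D&D tables. The sketch below is illustrative only: the function name and data are hypothetical, and treating values exactly at the median as control is an assumption.

```python
import numpy as np

def below_median_dummy(pre_shock_values):
    """1 if the firm's pre-shock value is strictly below the sample median, else 0."""
    v = np.asarray(pre_shock_values, dtype=float)
    return (v < np.median(v)).astype(int)

# Hypothetical pre-shock (1996) values of a continuous covariate for four firms.
covariate_1996 = [0.1, 0.4, 0.2, 0.9]
low_dummy = below_median_dummy(covariate_1996)
print(low_dummy.tolist())

# The DiD regressor is then the dummy interacted with a post-shock indicator.
post = np.array([0, 1])                            # one pre and one post period
treated_post = low_dummy[:, None] * post[None, :]  # firms x periods
```

Because the dummy is built from pre-shock values only, it stays fixed within firm and is absorbed by firm fixed effects; only the interaction with post is estimated.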
Another lesson from Iliev: RD on cross-sectional data can be improved by a combined DiD +
RD design using panel data.
7. Conclusion
Larcker and Rusticus (2010) remind us that efforts to use “bad” (non-shock-based)
instruments will rarely succeed. This paper provides evidence that apparently “good” (shock-based)
instruments will also rarely succeed. Researchers who begin with a plausible shock should verify
that it meets the conditions for a good shock, outlined above, and that IV is a sensible use of the
shock. Much of this work can be done in a design stage of the research, with outcomes hidden.
[*more to come]
References
Adams, Renee B., and Joao A.C. Santos, 2006, Identifying the Effect of Managerial Control on Firm Performance, Journal
of Accounting and Economics 41: 55-85.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin (1996), Identification of Causal Effects Using Instrumental
Variables, Journal of the American Statistical Association 91: 444-455.
Angrist, Joshua D., and Jorn-Steffen Pischke (2009), Mostly Harmless Econometrics: An Empiricist’s Companion.
Atanasov, Vladimir, and Bernard Black (2015), Shock-Based Causal Inference in Corporate Finance and Accounting
Research, Critical Finance Review, forthcoming, working paper at http://ssrn.com/abstract=1718555.
Atanasov, Vladimir, Bernard Black, Conrad Ciccotello, and Stanley Gyoshev (2010), How Does Law Affect Finance?
An Examination of Equity Tunneling in Bulgaria, Journal of Financial Economics 96: 155-173.
Baiocchi, Mike, Dylan S. Small, Scott Lorch, and Paul R. Rosenbaum (2010), Building a Stronger Instrument in an
Observational Study of Perinatal Care for Premature Infants, 105 Journal of the American Statistical Association
1285-1296.
Bennedsen, Morten, Kasper Meisner Nielsen, Francisco Perez-Gonzalez, and Daniel Wolfenzon, 2007, Inside the Family
Firm: The Role of Families in Succession Decisions and Performance, 122 Quarterly Journal of Economics 647-691.
Black, Bernard, Hasung Jang, and Woochan Kim (2006), Does Corporate Governance Affect Firms' Market Values?
Evidence from Korea, Journal of Law, Economics and Organization 22: 366-413.
Black, Bernard and Woochan Kim (2012), The Effect of Board Structure on Firm Value: A Multiple Identification
Strategy Approach Using Korean Data, Journal of Financial Economics 103: 203-226.
Black, Bernard, Antonio Gledson de Carvalho, Vikramaditya Khanna, Woochan Kim and B. Burcin Yurtoglu (2014),
Methods for Multicountry Studies of Corporate Governance: Evidence from the BRIKT Countries, Journal of
Econometrics (forthcoming), working paper at http://ssrn.com/abstract=2219525.
Busso, Matias, John DiNardo, and Justin McCrary (2014), New Evidence on the Finite Sample Properties of Propensity
Score Reweighting and Matching Estimators, 96 Review of Economics and Statistics 885-897.
Catan, Emiliano, and Marcel Kahan (2014), The Law and Finance of Anti-Takeover Statutes, working paper, at
http://ssrn.com/abstract=2517594.
Dharmapala, Dhammika, Fritz Foley, and Kristin Forbes, 2011, Watch What I Do, Not What I Say: The Unintended
Consequences of the Homeland Investment Act, 66 Journal of Finance 753-787.
Desai, Mihir, and Dhammika Dharmapala (2009), Corporate Tax Avoidance and Firm Value, 91 Review of Economics
and Statistics 537-546.
Duchin, Ran, John Matsusaka, and Oguzhan Ozbas (2010), When Are Outside Directors Effective?, 95 Journal of
Financial Economics 195-214.
Frangakis, Constantine E., and Donald B. Rubin (2002), Addressing Complications of Intention-to-Treat Analysis in the
Combined Presence of All-or-None Treatment-Noncompliance and Subsequent Missing Outcomes, 96
Biometrika 365-379.
Frangakis, Constantine E., and Donald B. Rubin (2002), Principal Stratification in Causal Inference, 58 Biometrics 21-29.
Giannetti, Mariassunta, and Luc Laeven, 2009, Pension Reform, Ownership Structure and Corporate Governance:
Evidence from a Natural Experiment, 22 Review of Financial Studies 4092-4127.
Guner, Burak, Ulrike Malmendier and Geoffrey Tate, 2008, Financial Expertise of Directors, 88 Journal of Financial
Economics 323-354.
Holland, Paul (1986), Statistics and Causal Inference, Journal of the American Statistical Association 81: 945-960.
Iliev, Peter (2010), The Effect of SOX Section 404: Costs, Earnings Quality, and Stock Prices, 65 Journal of Finance
1163-1196.
Imbens, Guido W. (2014), Matching Methods in Practice: Three Examples, working paper, at
http://ssrn.com/abstract=2417602.
Imbens, Guido W., and Donald B. Rubin (2015), An Introduction to Causal Inference in Statistics, Biomedical and Social
Sciences.
Karpoff, Jonathan M., and Michael D. Wittry (2014), Test identification with legal changes: The case of state antitakeover
laws, Working paper, at http://ssrn.com/abstract=2493913.
Keele, Luke, and Jason Morgan (2013), Stronger Instruments by Design, working paper, at
http://ssrn.com/abstract=2280347.
Larcker, David F., and Tjomme O. Rusticus (2010), On the Use of Instrumental Variables in Accounting Research, 49
Journal of Accounting and Economics 186-205.
Roberts, Michael R., and Toni M. Whited (2013), Endogeneity in Empirical Corporate Finance, in George M.
Constantinides, Milton Harris, and Rene M. Stulz, eds., Handbook of the Economics of Finance, vol. 2A, 493-572.
Rosenbaum, Paul R., 2009, Design of Observational Studies.
Rubin, Donald B., 2008, For Objective Causal Inference, Design Trumps Analysis, 2 Annals of Applied Statistics, 808-840.
Small, Dylan, and Paul R. Rosenbaum (2008), War and Wages: The Strength of Instrumental Variables and Their
Sensitivity to Unobserved Biases, 103 Journal of the American Statistical Association, 924-933.
Stock, James H., Jonathan H. Wright, and Motohiro Yogo (2002), A Survey of Weak Instruments and Weak Identification
in Generalized Method of Moments, 20 Journal of Business and Economic Statistics 518-529.
Wooldridge, Jeffrey M. (2010), Econometric Analysis of Cross Section and Panel Data.
Table D&D-1. OLS Results (Table 3 in D&D) with Original and Balanced Samples
Firm and year fixed effects regressions of tax adjusted Tobin’s q (defined in text) on indicated variables over 1993-2001.
Model 1: Regression of tax-adjusted Tobin’s q on BTG (book-tax-gap, defined in text) and covariates. Odd-numbered
regressions use D&D sample (862 firms; 762 with two or more observations, 4,392 effective observations). Even-numbered regressions use “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. Model
2 adds institutional ownership and interaction between BTG and institutional ownership. Covariates are NOLs (net
operating losses), total accruals, Long term debt, and current debt, all divided by assets; R&D dummy, foreign losses
dummy, ratio of option compensation to total compensation for top 5 executives, sales ($ millions), and implied Black-Scholes share price volatility. t-stats clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%,
and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable: tax-adjusted Tobin’s q

Model                                     Direct Effect             Mediated by Inst. Ownership
                                          (1)          (2)          (3)          (4)
Sample                                    original     balanced     original     balanced
BTG                                       0.578        0.645        -2.166       0.131
                                          (1.03)       (1.06)       (1.45)       (0.10)
BTG * Inst. Ownership                                               5.669*       0.873
                                                                    (1.70)       (0.36)
Institutional Ownership                                             0.682*       0.771**
                                                                    (1.89)       (2.34)
Total Accruals                            1.327***     0.891***     1.269***     0.810**
                                          (3.48)       (2.62)       (3.51)       (2.45)
Ratio of option to total compensation     0.439***     0.465***     0.437***     0.446***
                                          (3.64)       (3.76)       (3.64)       (3.65)
Sales                                     0.044*       0.047*       0.060***     0.063***
                                          (1.78)       (1.88)       (2.65)       (2.78)
Implied volatility                        -2.114***    -1.557***    -1.947***    -1.401***
                                          (3.16)       (2.81)       (3.01)       (2.62)
Net operating losses                      0.236        0.177        0.237        0.206
                                          (0.73)       (0.47)       (0.75)       (0.56)
Foreign losses                            5.545***     5.489***     5.169***     5.338***
                                          (3.40)       (3.85)       (3.34)       (3.79)
Long term debt                            -2.317***    -2.214***    -2.250***    -2.158***
                                          (5.82)       (5.47)       (5.58)       (5.14)
Current debt                              -2.446***    -2.576***    -2.472***    -2.459***
                                          (4.28)       (4.92)       (4.63)       (4.56)
R&D                                       -0.089       -0.551       -0.107       -0.432
                                          (0.04)       (0.37)       (0.05)       (0.29)
No. of firms                              862          487          862          487
No. of observations                       4,492        3,466        4,492        3,466
Table DD-2. DiD and DiDiD Analysis
Difference-in-difference (DiD) and triple difference (DiDiD) regressions of tax adjusted Tobin’s q on indicated variables,
with firm and year fixed effects, over 1993-2001. “Treated” firms in DiD analysis have lowBTG96 =1 (below-median
BTG in 1996). Treated firms in DiDiD analysis have lowBTG96 = 1 and highInstOwn96 =1 (above-median institutional
ownership in 1996); post = 1 for 1997 and after. Noninteracted dummies are absorbed by firm and year effects. Covariates
are same as in Table DD-1. Sample is “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997.
t-stats clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant
results (at 5% or better) are in boldface.

Dependent variable: tax-adjusted Tobin’s q

Model                                     DiD: Direct Effect        DiDiD: Mediated by Inst. Ownership
                                          (1)          (2)          (3)          (4)
lowBTG96 * post                           -0.006       -0.003       -0.039       0.033
                                          (0.05)       (0.03)       (0.26)       (0.25)
highInstOwn96 * post                                                0.051        0.133
                                                                    (0.35)       (1.01)
lowBTG96 * highInstOwn96 * post                                     0.076        -0.054
                                                                    (0.35)       (0.27)
Total Accruals                                         1.019***                  1.009***
                                                       (3.10)                    (3.07)
Ratio of option to total compensation                  0.461***                  0.456***
                                                       (3.76)                    (3.71)
Sales                                                  0.060***                  0.060***
                                                       (2.67)                    (2.64)
Implied volatility                                     -1.592***                 -1.622***
                                                       (2.82)                    (2.87)
NOLs                                                   0.083                     0.081
                                                       (0.22)                    (0.22)
Foreign losses                                         5.433***                  5.461***
                                                       (3.83)                    (3.86)
LT debt                                                -2.263***                 -2.261***
                                                       (5.73)                    (5.72)
Current debt                                           -2.668***                 -2.653***
                                                       (5.32)                    (5.34)
R&D                                                    -1.047                    -1.108
                                                       (0.80)                    (0.85)
Table DD-3. First-Stage IV (Table 4 in D&D) with Original and Balanced Samples
First-stage instrumental variable regressions of BTG (book-tax gap, defined in text) on indicated instrumental variables,
with firm and year fixed effects, over 1993-2001. Model 1: Instruments are NOL, Long Term Debt and Current Debt,
each interacted with post dummy. Model 2: Adds as additional instruments, the interactions between these instruments
and institutional ownership. Covariates are same as in Table DD-1, variables are defined in Table DD-1. Noninteracted
post dummies are absorbed by firm and year effects. Odd-numbered regressions use D&D sample (862 firms; 762 with
two or more observations, 4,392 effective observations). Even-numbered regressions use “balanced” sample of 487 firms
(3,466 observations) with data in both 1996 and 1997. t-stats clustered on firm in parentheses. *, **, *** indicate
significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable: BTG

Model                                       Direct Effect             Mediated by Inst. Ownership
                                            (1)          (2)          (3)          (4)
Sample                                      original     balanced     original     balanced
NOL*post                                    -0.080*      -0.016       -0.092       -0.035**
                                            (1.74)       (0.45)       (0.83)       (2.51)
LongTermDebt*post                           -0.013       -0.010       -0.047       -0.045
                                            (0.80)       (0.67)       (1.23)       (1.07)
Current Debt*post                           -0.088*      -0.047       -0.241       -0.310***
                                            (1.73)       (1.27)       (1.34)       (2.59)
NOL*post*inst. ownership                                              -0.058       0.037
                                                                      (0.43)       (0.19)
LongTermDebt*post*inst. ownership                                     0.055        0.055
                                                                      (0.96)       (0.87)
CurrentDebt*post*inst. ownership                                      0.334        0.415**
                                                                      (1.24)       (2.52)
Covariates                                  Y            Y            Y            Y
No. of firms                                862          487          862          487
No. of obs.                                 4,492        3,466        4,492        3,466
F-test (joint significance of instruments)  1.48         1.05         3.33**       3.00**
(p-value)                                   (0.22)       (0.39)       (0.02)       (0.01)
Table DD-4. DiD/DiDiD Assessment of Instrument Strength
DiD/DiDiD regressions, with firm and year fixed effects, of BTG (book-tax gap, defined in text), on indicated variables.
In Models (1), (2), (5) and (6) treated firms are defined as firms with Low Tax Shield96 =1 and HighInstOwn96=1.
LowTaxShield96 = firm has below-median sum of ranks for NOLs, long term debt, and current debt in 1996.
HighInstOwn96 is defined in Table DD-2. Noninteracted LowTaxShields96, high Inst. Own96, and post are absorbed by
firm and year effects. Even-numbered regressions include covariates; In Models (3), (4), (7) and (8) treated firms are
defined as firms with lowNOL96 =1, and HighInstOwn96=1. Low NOL96 = firm that has below-median NOLs in 1996.
Sample is “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997. t-stats clustered on firm
in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are
in boldface.
Dependent variable
BTG
Model
DiD: Direct Effect
DiDiD: Mediated by Inst. Ownership
lowTaxShield96*post
(1)
0.003
(0.72)
(2)
-0.002
(0.52)
lowNOLl96*post
(3)
(4)
-0.002
(0.36)
-0.000
(0.09)
-0.005
(0.66)
0.002
(0.20)
highInstOwn96*post
lowTaxShield96*highInstOwn
96*post
lowNol96*highInst
Own96*post
Total Accruals
Ratio of option to total
compensation
Sales
Implied volatility
NOLs
Foreign losses
Long Term debt
Current debt
R&D
(5)
0.002
(0.42)
0.200***
(5.69)
-0.007
(1.06)
0.000
(0.19)
-0.056**
(2.29)
-0.147***
(6.38)
-0.086
(1.20)
-0.076***
(3.93)
-0.144***
(3.00)
-0.770***
(8.30)
0.200***
(5.69)
-0.007
(1.04)
0.000
(0.19)
-0.056**
(2.29)
-0.147***
(6.37)
-0.087
(1.20)
-0.076***
(3.91)
-0.143***
(2.99)
-0.770***
(8.28)
(6)
-0.001
(0.21)
0.002
(0.37)
-0.002
(0.28)
0.197***
(5.57)
-0.008
(1.21)
0.000
(0.41)
-0.050**
(2.18)
-0.146***
(6.37)
-0.091
(1.27)
-0.073***
(3.79)
-0.138***
(2.86)
-0.762***
(8.18)
(7)
(8)
-0.004
(0.69)
-0.008
(0.84)
0.002
(0.33)
0.004
(0.59)
0.004
(0.43)
-0.004
(0.49)
0.196***
(5.56)
-0.007
(1.17)
0.000
(0.38)
-0.050**
(2.20)
-0.146***
(6.39)
-0.092
(1.28)
-0.073***
(3.79)
-0.137***
(2.86)
-0.762***
(8.18)
Table DD-5. Second-Stage IV (Table 5 in D&D) with Original and Balanced Samples
Second stage instrumental variables regression of tax-adjusted Tobin's q or market/book ratio on instrumented BTG,
instrumented (BTG * institutional ownership), and covariates, with firm and year fixed effects. Instruments for BTG in
regression (1) are NOLs*post, long term debt * post, and current debt * post. Additional instruments in regressions (2)
and (3) are these instruments interacted with institutional ownership. Variables are defined in Table DD-1. Odd-numbered regressions use D&D sample (862 firms; 762 with two or more observations, 4,392 effective observations).
Even-numbered regressions use “balanced” sample of 487 firms (3,466 observations) with data in both 1996 and 1997.
t-stats clustered on firm in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results
(at 5% or better) are in boldface.

Dependent variable                    tax-adjusted Tobin’s q                            Market/Book
Model                                 Direct Effect           Mediated by Inst. Ownership
                                      (1)         (2)         (3)         (4)         (5)         (6)
Sample                                Original    Balanced    Original    Balanced    Original    Balanced
Instrumented BTG                      14.523      3.623       -5.871      -6.464      -6.931      -8.712
                                      (1.18)      (0.48)      (1.14)      (1.17)      (1.41)      (1.19)
Instrumented BTG * inst. ownership                            32.820**    14.255      31.446**    16.230
                                                              (2.52)      (1.34)      (2.45)      (1.17)
Institutional ownership                                       1.033**     0.925**     1.059**     1.157**
                                                              (2.36)      (2.24)      (2.49)      (2.55)
Total accruals                        -2.831      0.295       -1.359      0.742       -0.445      1.344
                                      (0.78)      (0.19)      (0.57)      (0.50)      (0.19)      (0.98)
Ratio of option to total comp.        0.349       0.485***    0.484**     0.461***    0.553***    0.494***
                                      (1.13)      (3.49)      (2.26)      (3.40)      (2.88)      (3.62)
Sales                                 0.043       0.059***    0.047*      0.063***    0.058**     0.067***
                                      (1.57)      (2.63)      (1.79)      (2.76)      (2.37)      (2.71)
Implied volatility                    -1.022      -1.391**    -1.210      -1.459**    -1.220      -1.541**
                                      (1.04)      (2.17)      (1.57)      (2.37)      (1.55)      (2.25)
NOLs                                  1.879       0.616       1.191       0.195       1.019       0.031
                                      (1.29)      (0.51)      (1.42)      (0.18)      (1.30)      (0.03)
Foreign losses                        6.519***    5.747***    4.570**     5.143***    3.730**     4.461***
                                      (2.87)      (3.55)      (2.50)      (3.54)      (2.09)      (3.20)
Long Term debt                        -1.093      -1.988***   -1.401*     -2.198***   -2.341***   -3.150***
                                      (0.88)      (2.96)      (1.66)      (3.39)      (2.71)      (5.01)
Current debt                          0.998       -2.150*     -1.042      -2.568**    -2.604      -4.104***
                                      (0.34)      (1.84)      (0.61)      (2.46)      (1.55)      (4.76)
R&D                                   11.035      1.741       6.327       -0.963      4.791       -2.549
                                      (1.06)      (0.29)      (0.97)      (0.17)      (0.72)      (0.48)
Table DD-6. Intent-to-Treat Estimates
Difference-in-difference (DiD) and triple difference (DiDiD) regressions of tax adjusted Tobin’s q on indicated variables,
with firm and year fixed effects, over 1993-2001. LowTaxShield96 is defined in Table DD-4; HighInstOwn96 is defined
in Table DD-2. Noninteracted LowTaxShields96, high InstOwn96, and post are absorbed by firm and year effects. Even-numbered regressions include same covariates as in Table DD-1. Sample is “balanced” sample of 487 firms (3,466
observations) with data in both 1996 and 1997. t-stats clustered on firm in parentheses. *, **, *** indicate significance
at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.

Dependent variable: tax-adjusted Tobin’s q

Model                                     DiD: Direct Effect        DiDiD: Mediated by Inst. Ownership
                                          (1)          (2)          (3)          (4)
lowTaxShield96 * post                     0.035        -0.016       0.063        -0.007
                                          (0.31)       (0.16)       (0.42)       (0.05)
highInstOwn96 * post                                                0.117        0.119
                                                                    (0.60)       (0.70)
lowTaxShield96 * highInstOwn96 * post                               -0.057       -0.023
                                                                    (0.26)       (0.11)
Total Accruals                                         1.022***                  1.010***
                                                       (3.10)                    (3.06)
Ratio of option to total compensation                  0.460***                  0.455***
                                                       (3.75)                    (3.69)
Sales                                                  0.060***                  0.060***
                                                       (2.66)                    (2.64)
Implied volatility                                     -1.596***                 -1.622***
                                                       (2.83)                    (2.86)
NOLs                                                   0.082                     0.079
                                                       (0.22)                    (0.22)
Foreign losses                                         5.435***                  5.460***
                                                       (3.83)                    (3.86)
LT debt                                                -2.262***                 -2.255***
                                                       (5.73)                    (5.76)
Current debt                                           -2.675***                 -2.660***
                                                       (5.29)                    (5.34)
R&D                                                    -1.049                    -1.111
                                                       (0.80)                    (0.85)
Table DD-7. Covariate Balance
Summary statistics on covariate balance for “balanced” sample of 487 firms (3,466 observations) with data in both 1996
and 1997. Table shows means for three possible ways to define treated and control firms. Panel A. treated if lowBTG96
(defined in Table DD-2) =1. Panel B. treated if highInstOwn96 (defined in Table DD-2) = 1. Panel C. treated if
lowTaxShield96 (defined in Table DD-4) = 1. Covariates are defined in Table DD-1 and measured in 1996. Table shows t-test for differences in
covariates $x_j$ (indexed by j),
$|t_j| = |\bar{x}_{jt} - \bar{x}_{jc}| / (s_{jt}^2/N_t + s_{jc}^2/N_c)^{1/2}$, where $s_{jt}$ and $s_{jc}$ are standard deviations for
treated and control groups. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Table also shows absolute
values of “normalized differences”, suggested by Imbens and Rubin (2014),
$ND_j = |\bar{x}_{jt} - \bar{x}_{jc}| / [(s_{jt}^2 + s_{jc}^2)/2]^{1/2}$.

Panel A. Treated = Below Median BTG in 1996; Control = Above-Median
Variable                  Mean (Controls)   Mean (Treated)    Norm. Difference   t-statistic
Tax-adjusted q            2.406             2.311             0.040              0.62
Total Accruals            -0.031            -0.039            0.078              1.23
Options to Total Comp.    0.355             0.352             0.007              0.10
Sales                     3.753             3.017             0.076              1.19
Implied Volatility        0.312             0.365             0.252              4.07***
Foreign Losses            0.030             0.034             0.097              1.53
R&D                       0.039             0.049             0.120              1.89*
No. Firms                 248               239

Panel B. Treated = Above Median Institutional Ownership in 1996; Control = Below-Median
Variable                  Mean (Controls)   Mean (Treated)    Norm. Difference   t-statistic
Tax-adjusted q            2.357             2.361             0.002              0.03
Total Accruals            -0.039            -0.032            0.073              1.14
Options to Total Comp.    0.323             0.384             0.173              2.74***
Sales                     3.613             3.169             0.046              -0.72
Implied Volatility        0.350             0.326             0.119              1.86*
Foreign Losses            0.030             0.034             0.077              1.21
R&D                       0.042             0.045             0.035              0.54
No. Firms                 244               243

Panel C. Treated = Below Median Tax Shields in 1996; Control = Above-Median
Variable                  Mean (Controls)   Mean (Treated)    Norm. Difference   t-statistic
Tax-adjusted q            2.774             1.954             0.329              5.46***
Total Accruals            -0.032            -0.038            0.054              0.85
Options to Total Comp.    0.368             0.339             0.083              1.29
Sales                     2.171             4.587             0.247              3.97***
Implied Volatility        0.366             0.311             0.265              4.28***
Foreign Losses            0.033             0.030             0.063              0.98
R&D                       0.054             0.033             0.250              4.03***
No. Firms                 241               246
Table DMO-1. Comparison of Compliers and Non-Compliers as of 2000
Treated (control) firms are firms which lack (have) 100% independent directors on the audit committee as of 2000.
Sample is 905 firms included in the Tobin’s q regressions in Table DMO-3. Table shows two-sample t-test for differences
in means and “normalized differences”, suggested by Imbens and Rubin (2014), $ND_j = |\bar{x}_{jt} - \bar{x}_{jc}| / [(s_{jt}^2 + s_{jc}^2)/2]^{1/2}$,
where $\bar{x}_{jt}$ and $\bar{x}_{jc}$ are means for treated and control firms, and $s_{jt}$ and $s_{jc}$ are the corresponding standard deviations. *, **,
*** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface. Amounts
in $ millions.

Variable                                        Mean (Controls)   Mean (Treated)    Norm. Difference   t-statistic
Core Variables
Pct. Independent Directors                      69.69             53.04             -0.586             15.70***
Pct. Independent Directors on Audit Committee   100.00            63.57             -0.903             53.60***
Information Cost Index                          0.486             0.459             -0.099             2.03**
Number of Analysts                              16.076            16.683            0.038              0.77
Analyst dispersion                              0.085             0.067             -0.120             2.35**
Analyst Forecast Error                          0.162             0.152             -0.026             0.56
Pre-treatment outcome variables
ROA                                             0.149             0.152             0.028              0.58
Q                                               2.157             2.365             0.064              1.41
Annual Return                                   0.013             0.012             -0.024             0.51
Covariates used by DMO
MV of equity                                    8,471             12,451            0.081              1.83*
Assets                                          12,096            15,205            0.038              0.84
Board Size                                      9.657             9.989             0.079              1.68*
Book leverage                                   0.412             0.325             -0.061             1.41
Firm Age                                        26.9              26.30             -0.029             0.61
Other potential covariates (in DMO dataset)
Annualized std. dev. of returns                 0.152             0.145             -0.073             1.53
Intangible assets                               0.716             0.700             -0.049             1.04
Market/book ratio                               3.506             3.752             0.031              0.63
Number of business segments                     2.909             2.855             -0.020             0.40
Table DMO-2. DMO Table 3, cols. (1)-(4) and Intent-to-Treat DiD
Dependent variables δROA and δQ are the differences between 2005 and 2000 values of ROA and Q, respectively. Dependent
variable mean return equals the average monthly return from 2000 to 2005. Non-comply dummy = 1 if firm did not have
100% independent audit committee in 2000. The reported first-stage regression corresponds to Model (3) (δQ). DMO’s
reported first stage differs slightly (coefficient = 11.383, s.e. = 1.021) because they include in their first-stage regression
observations with missing second-stage dependent variable. FF Industry Dummies = dummies for each of the 48 Fama-French industries. t-statistics, clustered on industry, in parentheses. *, **, *** indicate significance at the 10%, 5%, and
1% levels. Significant results (at 5% or better) are in boldface.
Dependent variable
Non-comply dummy
First-Stage
IV
δIndep.
Directors
(1)
11.399***
(9.40)
δIndep
Info Cost
PctIndep
Board Size
Book Leverage
Age
Market Cap
FF Industry Dummies
Number of obs.
R2
not incl.
not incl.
-0.186
(0.95)
0.149
(0.30)
-0.073*
(2.00)
0.275
(1.42)
Yes
990
0.14
Second-Stage IV
Intent-to-Treat DiD
δROA
δq
(2)
(3)
mean
return
(4)
0.001
(0.03)
not incl.
not incl.
-0.021
(0.17)
0.967**
(2.53)
0.010
(0.54)
-0.366***
(2.63)
Yes
983
0.06
-0.252
(1.20)
not incl.
not incl.
1.393*
(1.93)
5.090***
(4.26)
0.495***
(4.08)
-13.873***
(7.33)
Yes
990
0.33
0.005
(0.93)
not incl.
not incl.
0.002
(0.10)
0.045
(0.77)
0.002
(0.81)
-0.361***
(7.65)
Yes
880
0.29
δROA
δq
(5)
0.011
(0.03)
(6)
-2.874
(1.13)
mean
return
(7)
0.057
(0.91)
not incl.
not incl.
-0.021
(0.16)
0.967**
(2.44)
0.010
(0.51)
-0.365**
(2.52)
Yes
983
0.06
not incl.
not incl.
1.440*
(1.89)
5.053***
(4.41)
0.513***
(3.88)
-13.942***
(6.99)
Yes
990
0.33
not incl.
not incl.
0.001
(0.07)
0.046
(0.79)
0.002
(0.72)
-0.361***
(7.36)
Yes
880
0.29
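With a single instrument, and the same covariates and sample in every stage, the 2SLS coefficient equals the intent-to-treat (reduced-form) coefficient divided by the first-stage coefficient. The sketch below applies that identity to the coefficients in Table DMO-2. The function name is hypothetical, and the assignment of coefficients to columns is our reading of the flattened table. Because the reported first stage is estimated on the δQ sample, the identity is exact only for δQ and approximate for the other outcomes.

```python
def indirect_least_squares(reduced_form_coef, first_stage_coef):
    """Just-identified IV estimate as reduced form divided by first stage."""
    if first_stage_coef == 0:
        raise ValueError("first-stage coefficient is zero; ratio is undefined")
    return reduced_form_coef / first_stage_coef


# Coefficients on the non-comply dummy, as read from Table DMO-2 (assumed layout).
first_stage = 11.399                      # column (1), dependent variable: delta Indep
intent_to_treat = {
    "delta_ROA": 0.011,                   # column (5)
    "delta_Q": -2.874,                    # column (6)
    "mean_return": 0.057,                 # column (7)
}

for outcome, coef in intent_to_treat.items():
    ratio = indirect_least_squares(coef, first_stage)
    print(outcome, round(ratio, 3))
```

The ratios line up with the second-stage IV coefficients in columns (2)-(4) (0.001, -0.252, and 0.005). The IV design therefore rescales the intent-to-treat DiD estimates without adding information.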
Table DMO-3. DMO Model versus IV Model without Control for Information Cost
Results for DMO model and analogous 2SLS regressions. Dependent variables δROA and δQ are differences between 2005 and 2000 values of ROA and Q,
respectively. Mean return is average monthly return from 2000 through 2005. Non-comply dummy = 1 if firm lacks 100% independent audit committee in 2000;
0 otherwise. Reported first-stage regressions are for sample with δQ as outcome variable. δIndep is predicted in a separate regression in the DMO model, and
instrumented in our IV results. FF Industry Dummies = dummies for 48 Fama-French industries. t-statistics, with industry clusters, in parentheses. *, **, ***
indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.
0
(1)
δROA
DMO Model
(2)
δQ
(3)
Mean return
0.269***
(2.72)
1.918***
(5.82)
0.056***
(6.15)
Non-comply dummy
Non-comply dummy * Info Cost
Predicted δIndep
First Stage IV
(4)
(5)
δIndep
δIndep*Info
Cost
-1.714*
12.126***
(1.79)
(4.43)
-0.548
15.436***
(0.13)
(6.29)
Instrumented δIndep
(Predicted δIndep) * Info Cost
-0.587***
(3.10)
-4.714***
(7.90)
Book Leverage
Age
Market Cap
FF Industry Dummies
1st stage F-statistic
Number of obs.
R2
not incl.
not incl.
-0.000
(0.00)
1.000***
(2.93)
0.011
(0.49)
-0.442***
(2.95)
Yes
-897
not incl.
not incl.
1.307*
(1.83)
5.167***
(8.18)
0.562***
(3.89)
-14.985***
(6.83)
Yes
-905
Second Stage IV
(7)
(8)
δQ
Mean return
0.258***
(2.72)
1.754***
(4.90)
0.062***
(5.59)
-0.556***
(2.79)
not incl.
not incl.
-0.041
(0.29)
0.935**
(2.58)
0.022
(1.04)
-0.447***
(2.73)
Yes
-897
0.03
-4.281***
(5.38)
not incl.
not incl.
1.010
(1.42)
4.673***
(4.47)
0.647***
(4.95)
-15.035***
(8.47)
Yes
-905
0.28
-0.113***
(4.59)
not incl.
not incl.
-0.014
(0.72)
0.030
(0.49)
0.006*
(1.76)
-0.391***
(8.34)
Yes
-805
0.20
-0.103***
(4.98)
Instrumented (δIndep * Info Cost)
Info Cost
PctIndep
Board Size
(6)
δROA
not incl.
not incl.
-0.003
(0.17)
0.045
(0.67)
0.004
(1.07)
-0.384***
(7.42)
Yes
-897
not incl.
not incl.
-0.118
(0.58)
0.225
(0.48)
-0.073*
(1.88)
0.242
(1.03)
Yes
50.94
905
0.15
not incl.
not incl.
-0.129
(1.33)
0.054
(0.23)
-0.015
(0.82)
0.032
(0.22)
Yes
43.26
905
0.17
Table DMO-4. Adding Information Cost as Control: IV and Intent-to-Treat DiDiD-Continuous
Results for 2SLS and analogous intent-to-treat DiD regressions. Dependent variables, instruments, instrumented variables, and sample are same as in Table DMO-3. Covariates are same, except we add Info Cost as a covariate. Reported first-stage regressions are for sample with δQ as outcome variable. t-statistics, with
industry clusters, in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.
Dependent Variable:
Non-comply dummy
Non-comply dummy * Info Cost
First Stage IV
(1)
(2)
δIndep
δIndep*Info
Cost
-0.299
10.490***
(0.21)
(3.23)
2.869
12.482***
(0.46)
(3.49)
Instrumented δIndep
Instrumented δIndep * Info Cost
Info Cost
PctIndep
Board Size
Book Leverage
Age
Market Cap
FF Industry Dummies
1st stage F-statistic
Number of obs.
R2
-4.025
(0.87)
not incl.
-0.127
(0.64)
0.237
(0.54)
-0.073*
(1.89)
0.191
(0.78)
Yes
49.08
905
0.16
3.480
(1.42)
not incl.
-0.122
(1.18)
0.044
(0.17)
-0.015
(0.81)
0.076
(0.55)
Yes
45.38
905
0.17
Second Stage IV
(4)
(5)
δQ
Mean return
(3)
δROA
0.234*
(1.72)
-0.507*
(1.72)
-0.914
(0.29)
not incl.
-0.039
(0.27)
0.940***
(2.67)
0.021
(0.97)
-0.456***
(3.07)
Yes
-897
0.04
1.025
(1.63)
-2.758**
(2.15)
-28.171*
(1.89)
not incl.
1.060
(1.49)
4.835***
(5.83)
0.617***
(4.95)
-15.263***
(7.85)
Yes
-905
0.33
0.063***
(3.20)
-0.116***
(2.85)
0.049
(0.11)
not incl.
-0.014
(0.72)
0.029
(0.49)
0.006*
(1.69)
-0.390***
(8.57)
Yes
-805
0.19
(6)
δROA
Intent-to-Treat DiDiD
(7)
(8)
δQ
Mean return
2.573*
(1.76)
-5.596*
(1.83)
11.575*
(1.81)
-31.487**
(2.53)
0.613***
(3.03)
-1.109**
(2.57)
-3.639
(1.68)
not incl.
-0.004
(0.14)
0.973***
(0.36)
0.011
(0.02)
-0.457***
(0.15)
Yes
-897
0.08
-41.892***
(4.84)
not incl.
1.266*
(0.74)
4.956***
(0.67)
0.584***
(0.15)
-15.277***
(2.23)
Yes
-905
0.38
-0.660**
(2.30)
not incl.
-0.005
(0.02)
0.042
(0.06)
0.003
(0.00)
-0.388***
(0.05)
Yes
-805
0.32
Table DMO-5. Adding Pct. Independent Directors as Control: IV Results
2SLS regressions. Sample is trimmed to PctIndep ∈ (0.25, 0.80]. Columns (1)-(4) report first- and second-stage results using non-comply dummy to instrument
for δIndep. Columns (5)-(9) report results using non-comply dummy and non-comply dummy * Info Cost to instrument for δIndep and δIndep * Info Cost. In these columns, dependent variables, instruments, and instrumented
variables are same as in Table DMO-4, and covariates are the same, except we add PctIndep. Reported first-stage regressions are for sample with δQ as outcome
variable. t-statistics, with industry clusters, in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better)
are in boldface.
Sample
PctIndep ∈ (0.25, 0.80]
Dependent Variable:
Non-comply dummy
First Stage IV
(1)
δIndep
(2)
δROA
Second Stage IV
(3)
(4)
δQ
Mean return
2.277**
(2.34)
Non-comply dummy *
Info Cost
-0.037
(-0.16)
-2.514*
(-1.74)
0.001
(0.03)
-1.487
(-0.52)
-0.589***
(-13.30)
0.064
(0.35)
0.569**
(2.41)
0.050
(1.34)
0.495
(1.64)
Yes
-6.371***
(-3.05)
-0.042
(-0.29)
-0.005
(-0.04)
1.300***
(3.70)
0.006
(0.22)
-0.355
(-1.29)
Yes
-53.387***
(-5.18)
-1.542*
(-1.73)
1.651*
(1.93)
6.723***
(4.99)
0.752***
(4.32)
-14.998***
(-7.48)
Yes
-1.109***
(-3.12)
-0.003
(-0.09)
-0.005
(-0.17)
-0.018
(-0.42)
0.004
(0.75)
-0.371***
(-8.62)
Yes
719
0.33
712
0.10
719
0.01
638
0.30
Instrumented δIndep
First Stage IV
(5)
(6)
δIndep
δIndep*
Info Cost
-1.263
-3.346***
(0.61)
(2.90)
7.519*
10.083***
(1.82)
(3.45)
Instrumented δIndep *
Info Cost
Info Cost
PctIndep
Board Size
Book Leverage
Age
Market Cap
FF Industry Dummies
1st stage F-stat
(p-value)
Number of obs.
R2
-4.654
(1.38)
-0.586***
(13.46)
0.054
(0.31)
0.573**
(2.45)
0.048
(1.27)
0.488
(1.65)
Yes
4.09
(.023)
5.574***
(2.74)
-0.278***
(11.41)
-0.045
(0.39)
0.267*
(1.91)
0.045**
(2.58)
0.173
(1.18)
Yes
6.14
(.0043)
(7)
δROA
0.412
(0.77)
-0.696
(1.06)
1.161
(0.17)
0.026
(0.15)
-0.060
(0.37)
1.227***
(3.61)
0.016
(0.56)
-0.441
(1.32)
Yes
--
Second Stage IV
(8)
(9)
δQ
Mean return
-1.017
(0.36)
-2.433
(0.74)
-27.261
(0.75)
-1.346
(1.31)
1.477
(1.63)
6.507***
(4.55)
0.791***
(5.01)
-15.294***
(7.75)
Yes
--
0.170
(0.94)
-0.256
(1.39)
1.641
(0.74)
0.026
(0.40)
-0.030
(0.64)
-0.056
(0.80)
0.007
(1.18)
-0.403***
(7.95)
Yes
--
Table Iliev-1. Extended Version of Iliev (2010) Table 2
Ordinary least squares (OLS) regressions of ln(audit fees in 2004) on dummy for SOX § 404 compliance, indicated
covariates (measured for, or at end of, fiscal 2004), 10 industry dummies, and constant term. Sample is 281 firms
with 2004 free float ∈ [$50M, $100M]. Non-growth variables are defined in Iliev (2010); growth variables are
measured from 2002-2004. The first-stage regressions have the same controls and fixed effects as the second stage.
t-statistics, with heteroskedasticity-consistent standard errors, are in brackets. 95% confidence interval (CI) for
compliance dummy shown below t-statistic. *, **, and *** denote significance at the 10%, 5%, and 1% levels,
respectively. Significant results (at 5% or better) in boldface. Amounts in $M.
Dependent variable: ln(2004 audit fees)

Our regression                  (1)            (1A)           (2)            (2A)
Iliev regression                (1)                           (2)
Compliance dummy                0.866***       0.749***       0.744***       0.706***
                                [7.57]         [7.36]         [7.39]         [6.40]
95% CI                          [0.64, 1.09]   [0.55, 0.95]   [0.55, 0.94]   [0.49, 0.92]
Implied % increase in fees      138%           112%           111%           103%
Cubic in float                  Yes            Yes            Yes            Yes
Ln(market cap)                                 0.020          0.050          0.088
                                               [0.21]         [0.51]         [0.87]
Ln(sales)                                      0.042          0.031          0.047
                                               [1.52]         [1.09]         [1.32]
Ln(assets)                                     0.354***       0.235***       0.183**
                                               [5.43]         [3.35]         [2.33]
Leverage                                                      0.612***       0.653***
                                                              [2.62]         [2.69]
Receivables/assets                                            0.086          0.105
                                                              [0.35]         [0.43]
Big auditor                                                   0.370***       0.361***
                                                              [3.94]         [3.80]
No. of business segments                                      0.040          0.043
                                                              [1.45]         [1.53]
No. of geographic segments                                    0.070***       0.072***
                                                              [2.91]         [3.03]
Growth variables
Change in ln(float)                                                          0.012
                                                                             [0.55]
Change in ln(market cap)                                                     -0.096*
                                                                             [-1.95]
Change in ln(sales)                                                          -0.020
                                                                             [-0.45]
Change in ln(assets)                                                         0.078
                                                                             [0.85]
F-stat for growth variables                                                  10.93***
Industry dummies, constant      Yes            Yes            Yes            Yes
Sample                          281            281            281            275
R2                              0.32           0.49           0.55           0.56
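The “Implied % increase in fees” row can be checked from the compliance-dummy coefficients. Because the dependent variable is in logs, a coefficient b on a dummy implies a proportional change of exp(b) - 1. The sketch below assumes this is the conversion used; it reproduces the reported percentages up to rounding of the displayed coefficients. The function name is hypothetical.

```python
import math


def implied_pct_increase(log_coef):
    """Percentage change implied by a dummy coefficient in a log-outcome regression."""
    return (math.exp(log_coef) - 1) * 100


# Compliance-dummy coefficients from Table Iliev-1, columns (1) and (2A).
for coef in (0.866, 0.706):
    print(coef, round(implied_pct_increase(coef)))
```

The same conversion applies to the Treated * Post coefficients in Table Iliev-4. For example, a coefficient of 0.616 corresponds to roughly an 85% increase.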
Table Iliev-2. Summary Statistics for Covariate Balance
Panel A. Covariate balance in 2002, for 180 firms in Iliev dataset with 2002 free float ∈ [$50M, $75M] versus 110
firms with 2002 free float ∈ [$75M, $100M]. Panel B. Covariate balance in 2004, for 164 SOX compliers and 117
controls with 2004 free float ∈ [$50M, $100M]. Panel C. Covariate balance in 2004, for 22 shrinkers (IV-compliers, with 2002 float > $75M and 2004 float < $75M) and 117 controls.
All panels. Growth variables are measured from 2002-2004. Tables show t-statistic for differences in means between
treated and control group and “normalized difference” (see Imbens and Rubin, 2014), defined as
\( ND_j = |\bar{x}_{jt} - \bar{x}_{jc}| / [(s_{jt}^2 + s_{jc}^2)/2]^{1/2} \), where x_j is a covariate, \( \bar{x}_{jt} \) and \( \bar{x}_{jc} \) are its means, and \( s_{jt} \) and \( s_{jc} \) are its standard deviations for the treated
and control groups. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant differences (at 5%
level or better) are in boldface.
Panel A. Covariate Balance in 2002: Free Float ∈ [$50M, $75M] versus [$75M, $100M]
Variable                        Mean             Mean             Norm.         t-statistic
                                (Float < $75M)   (Float > $75M)   Difference
Size variables
  ln(float)                     4.474            4.769            0.194         2.14**
  ln(market cap)                4.819            5.027            0.277         3.34***
  ln(sales)                     4.173            4.527            0.107         1.28
  ln(assets)                    4.646            4.982            0.238         2.86***
Iliev’s other covariates
  Leverage                      0.135            0.159            0.086         1.02
  Receivables/assets            0.257            0.298            0.132         1.57
  Big auditor                   0.783            0.791            0.013         0.15
  No. of business segments      1.739            1.845            0.057         0.68
  No. of geographic segments    1.600            2.145            0.229         2.77***
Growth variables
  Change in ln(float)           0.353            0.315            -0.025        0.28
  Change in ln(market cap)      0.252            0.124            -0.117        1.30
  Change in ln(assets)          0.121            0.158            0.057         0.68
  Change in ln(sales)           0.134            0.159            0.026         0.31
Panel B. Covariate Balance in 2004: SOX-compliers versus Controls
Variable                        Mean             Mean             Norm.         t-statistic
                                (Controls)       (SOX-compliers)  Difference
Size variables
  ln(float)                     4.128            4.386            0.762         13.43***
  ln(market cap)                4.385            4.783            0.533         7.42***
  ln(sales)                     4.221            4.138            -0.027        0.31
  ln(assets)                    4.467            4.701            0.154         1.855*
Iliev’s other covariates
  Leverage                      0.177            0.152            -0.085        1.00
  Receivables/assets            0.304            0.261            -0.141        1.67*
  Big auditor                   0.744            0.829            0.146         1.75*
  No. of business segments      2.154            1.774            -0.182        2.19**
  No. of geographic segments    1.974            1.976            0.000         0.01
Growth variables
  Change in ln(float)           0.960            0.303            -0.331        4.33***
  Change in ln(market cap)      0.508            0.159            -0.272        3.25***
  Change in ln(assets)          0.142            0.073            -0.113        1.30
  Change in ln(sales)           0.121            0.008            -0.089        1.00
Table Iliev-2. (Cont.)
Panel C. Covariate Balance in 2004: Shrinkers (IV-compliers) versus Controls
Variable
Size variables
ln(float)
ln(market cap)
ln(sales)
ln(assets)
Iliev’s other covariates
Leverage
Receivables/ assets
Big auditor
No. of business segments
No. of geographic segments
Growth variables
Change in ln(float)
Change in ln(market cap)
Change in ln(assets)
Change in ln(sales)
Controls
Means
SOX-compliers
Norm. Difference
T-Test Value
4.128
4.385
4.221
4.467
4.125
4.616
3.653
4.818
-0.015
0.353
-0.138
0.242
0.09
2.18**
1.12
1.36
0.177
0.304
0.744
2.154
1.974
0.090
-0.224
0.298
-0.223
0.153
0.589
-1.295
1.697*
-1.333
0.954
0.09
0.22
0.30
0.22
0.15
0.960
0.508
0.142
0.121
-0.566
-0.582
-0.105
-0.245
-0.652
-0.700
-0.348
-0.144
4.19***
5.76***
2.61**
1.43
Table Iliev-3. Extended Version of Iliev (2010) IV Analysis in His Table 2
Two-stage least squares (2SLS) regressions of ln(audit fees in 2004) on dummy for SOX § 404 compliance, indicated covariates (measured for, or at end of, fiscal
2004), 10 industry dummies, and constant term. Sample is 281 firms with 2004 free float ∈ [$50M, $100M]. Non-growth variables are defined in Iliev (2010);
growth variables are measured from 2002-2004. First-stage regressions have the same covariates as the second stage. t-statistics, with heteroskedasticity-consistent
standard errors, are in brackets. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively. Significant results (at 5% or better) in boldface.18
Stage
Dep. variable
Our regression
Iliev regression
IV (2002 float > $75M)
First
(3)
(3)
0.466***
[7.28]
First
First
First
SOX Compliance
(3A)
(4)
(4A)
(4)
0.325***
0.376***
0.287***
[6.48]
[6.10]
[5.88]
First
Second
(4B)
(3)
(3)
Log market cap
Log sales
Log assets
Leverage
Receivables/assets
Big auditor
No. of business segments
No. of geographic segments
No
Yes
No
0.328***
[5.59]
-0.002
[0.13]
0.030
[0.72]
-0.063
[0.38]
-0.165
[1.15]
-0.014
[0.22]
-0.029
[1.51]
0.004
[0.31]
Yes
0.148***
[3.03]
-0.007
[-0.43]
0.024
[0.74]
0.054
[0.39]
-0.230**
[-2.11]
0.009
[0.18]
-0.031**
[-2.30]
0.006
[0.60]
Second
Second
Ln(2004 audit fees)
(3A)
(4)
(4A)
(4)
Second
(4B)
0.237***
[4.53]
Instrumented SOX compliance
95% CI
Implied % increase in fees
Cubic in float
Second
Yes
0.156***
[3.09]
0.004
[0.25]
-0.001
[-0.02]
0.095
[0.65]
-0.237**
[-2.21]
0.002
[0.05]
-0.031**
[-2.35]
0.008
[0.86]
1.171***
[4.95]
[0.71,1.64]
223%
No
1.332***
[3.98]
[0.67,1.99]
279%
Yes
0.983***
[3.65]
[0.45,1.51]
167%
No
-0.052
[-0.38]
0.034
[1.09]
0.218***
[2.79]
0.647***
[2.75]
0.129
[0.55]
0.373***
[3.86]
0.047*
[1.71]
0.069***
[2.77]
1.070***
[3.11]
[0.39,1.75]
192%
Yes
-0.006
[-0.05]
0.036
[1.14]
0.218***
[2.78]
0.598**
[2.51]
0.160
[0.64]
0.364***
[3.81]
0.051*
[1.82]
0.069***
[2.71]
1.179**
[2.45]
[0.23,2.12]
225%
Yes
0.001
[0.01]
0.048
[1.34]
0.180**
[2.17]
0.598**
[2.32]
0.208
[0.80]
0.357***
[3.69]
0.057*
[1.90]
0.067**
[2.58]
Growth variables
-0.029***
[-2.91]
-0.030
Change in ln(float)
Change in ln(market cap)
18
0.038
[1.14]
-0.073
We replicated Iliev’s first- and second-stage coefficients, and his second-stage, but not first-stage standard errors. For regression (3), he reports t = 10.15 (we
find 7.28); for regression (4), he reports t = 7.74 (we find 6.10).
Stage
Dep. variable
Our regression
Iliev regression
First
First
(3)
(3)
(3A)
First
SOX Compliance
(4)
(4)
First
First
Second
(4A)
(4B)
(3)
(3)
Yes
Yes
281
0.60
[-1.00]
-0.015
[-0.30]
-0.022
[-0.98]
3.96***
Yes
Yes
275
0.62
Change in ln(sales)
Change in ln(assets)
F-test for growth vars.
Free float cubic
Industry dummies, constant
Observations
R2
No
Yes
281
0.21
Yes
Yes
281
0.56
No
Yes
281
0.54
No
Yes
281
0.28
Second
Second
Second
Ln(2004 audit fees)
(3A)
(4)
(4A)
(4)
Yes
Yes
281
0.28
No
Yes
281
0.54
Yes
Yes
281
0.54
Second
(4B)
[-1.26]
0.081
[0.84]
-0.009
[-0.17]
9.99***
Yes
Yes
275
0.53
Table Iliev-4. DiD Analysis Using Strata (sample with 2002 float ∈ [$50M, $112.5M])
Firm fixed effects regressions of ln(audit fees) on SOX compliance dummy and indicated covariates, using combined DiD/RD research design (DiD for firms with
2002 float within a [$50M, $112.5M] bandwidth around the SOX compliance threshold of $75M). Sample period is 2002-2004. For the Shrinkers strata (regressions
(1)-(3)), sample is firms with ln(2004 float/2002 float) ∈ [-1.3, 0] (maximum drop in float of 73%). Treated firms are firms with 2002 float > $75M; control firms
have 2002 float < $75M. In the Modest Grower strata comparison, we focus only on modest growers (firms with higher float in 2004 than 2002) and exclude firms
with 2002 float < $75M but 2004 float > $75M and firms with 2002 float > $75M and 2004 float > $112.5M. Treated firms are modest growers with 2002 float >
$75M. Control firms are firms with 2002 float < $75M. Some control firms may have limited 2004 float to avoid SOX § 404 compliance. In the combined strata
comparison, we pool the treated and control firms from the Shrinker and Modest Grower strata. Standard errors clustered on firm in parentheses. *, **, *** indicate
significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are in boldface.
Dependent variable
Regression
Cubic Float Terms
SOX Compliance Dummy
(Treated * Post)
95% CI
Implied % increase in fees
(1)
No
0.616***
[3.53]
[0.27,0.97]
85%
Log sales
Log assets
Log market size of equity
Leverage
Receivables/ total assets
Big auditor
No. of business segments
No. of geogr. segments
No. Obs.
No. Firms
No. Treated Firms
197
67
39
Shrinker Strata
(2)
No
0.590***
[3.17]
[0.22,0.96]
80%
0.128**
[2.49]
0.108
[0.61]
0.023
[0.29]
0.162
[0.52]
-0.228
[-0.27]
0.484***
[4.09]
-0.024
[-0.22]
0.036
[0.47]
197
67
39
(3)
Yes
0.606***
[3.25]
[0.23,0.98]
83%
0.141**
[2.11]
0.130
[0.71]
0.015
[0.18]
0.130
[0.45]
-0.326
[0.35]
0.465***
[3.51]
-0.021
[0.19]
0.038
[0.50]
197
67
39
Ln(audit fees)
Modest Grower Strata
(4)
(5)
(6)
No
No
Yes
0.645***
0.632***
0.507**
[3.48]
[3.75]
[2.29]
[0.34,1.03]
[0.29,0.97]
[0.06,0.96]
91%
88%
66%
-0.304
-0.184
[-0.98]
[-0.49]
0.996**
0.960**
[2.54]
[2.41]
-0.078
-0.090
[-0.79]
[-0.89]
-0.450
-0.342
[-0.93]
[-0.70]
0.981
0.898
[1.52]
[1.28]
0.379***
0.363**
[2.84]
[2.64]
0.024
0.051
[0.25]
[0.51]
0.096
0.083
[0.53]
[0.42]
120
120
120
40
40
40
20
20
20
(7)
No
0.627***
[4.97]
[0.38,0.88]
87%
317
107
59
Both Strata
(8)
No
0.578***
[4.43]
[0.32,0.84]
78%
0.105
[1.37]
0.270*
[1.67]
0.010
[0.14]
-0.038
[-0.14]
0.247
[0.44]
0.434***
[4.77]
-0.021
[-0.22]
0.045
[0.65]
317
107
59
(9)
Yes
0.586***
[4.58]
[0.33,0.84]
80%
0.120
[1.46]
0.285*
[1.72]
0.001
[0.02]
-0.056
[0.21]
0.177
[0.29]
0.424***
[4.57]
-0.018
[0.18]
0.048
[0.70]
317
107
59
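The strata definitions in the caption to Table Iliev-4 can be written as a small classification rule. This is a sketch under stated assumptions, not the authors' code. The function and constant names are hypothetical, the 2002 float window is treated as closed at both ends, and firms with 2002 float exactly at $75M are left unclassified because the caption uses strict inequalities.

```python
import math

THRESHOLD = 75.0           # SOX 404 free-float threshold, in $M
LOW, HIGH = 50.0, 112.5    # 2002 float window, in $M (assumed closed interval)
MAX_LOG_DROP = -1.3        # shrinkers: ln(float_2004 / float_2002) in [-1.3, 0]


def classify(float_2002, float_2004):
    """Return (stratum, group), or None if the firm falls outside both strata."""
    if not (LOW <= float_2002 <= HIGH) or float_2002 == THRESHOLD:
        return None
    group = "treated" if float_2002 > THRESHOLD else "control"
    log_change = math.log(float_2004 / float_2002)

    if MAX_LOG_DROP <= log_change <= 0:
        return ("shrinker", group)
    if log_change > 0:
        # Drop control firms that grow across the threshold and
        # treated firms that grow beyond the upper end of the window.
        if group == "control" and float_2004 > THRESHOLD:
            return None
        if group == "treated" and float_2004 > HIGH:
            return None
        return ("modest_grower", group)
    return None  # float fell by more than the allowed maximum


print(classify(90.0, 60.0))    # treated firm whose float shrinks
print(classify(60.0, 70.0))    # control firm with modest growth
print(round(1 - math.exp(MAX_LOG_DROP), 2))  # maximum proportional drop in float
```

Treatment status depends only on 2002 float, which is measured before the compliance requirement takes effect; the stratum depends on the subsequent float trajectory.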
Table Iliev-5. Selection Bias Check: Grower-compliers vs. Growers Forced to Comply
DiD model of log(2004 audit fees) for grower-complier firms versus growers forced to comply. Sample period is 2002-2004. We start with a sample of firms with 2002 float between $50M and $112.5M. We focus only on growers (firms
with higher float in 2004 than 2002) and exclude firms with an increase in ln(float) from 2002 to 2004 greater than 1.3
(increase of 267%). “Grower-compliers” are firms that move from 2002 float < $75M to 2004 float > $75M. “Growers forced
to comply” are firms with 2002 float > $75M and 2004 float > $112.5M. Models 1 and 2 use firms with data in
2002 and 2004; Model 3 uses a balanced sample of firms with data in all three years. Standard errors clustered on firm
in parentheses. *, **, *** indicate significance at the 10%, 5%, and 1% levels. Significant results (at 5% or better) are
in boldface.
Dependent Variable
Cubic Float Terms
growerComplier * Post
Ln(Audit Fees)
No
-0.041
[0.53]
0.124***
[2.64]
0.117
[1.05]
-0.069
[1.30]
0.198
[0.67]
-0.009
[0.02]
0.557***
[3.56]
0.028
[0.39]
0.043
[1.02]
546
187
106
No
-0.064
[-0.81]
Log sales
Log assets04
Log market size of equity
Leverage
Receivables scaled by total assets
Big auditor
No. of business segments
No. of geogr. segments.
No. Obs.
No. Firms
No. Treated Firms
546
187
106
Yes
-0.037
[0.48]
0.124***
[2.62]
0.116
[1.01]
-0.070
[1.28]
0.198
[0.66]
-0.005
[0.01]
0.556***
[3.54]
0.028
[0.40]
0.043
[1.02]
516
172
93
Figure DD-1. Average Tobin’s q of Treated vs. Control firms through Time
Average Tobin’s q for treated and control firms over the 1993-2001 period. Treated firms have below-median LowTaxShield96 (defined in Table DD-4),
indicating greater need to shelter taxable income. Control firms have above-median values for this variable.
Figure DMO-1. Scatter Plot of PctIndep versus δIndep
Scatter plot of percentage of independent directors in 2000 (PctIndep) and change in this percentage from 2000 to 2005 (δIndep). Complier (non-complier) firms
are firms that do (do not) have 100% independent audit committee in 2000. Compliers (non-compliers) are shown with green circles (orange triangles). We add
vertical lines at PctIndep=25% and 80% to highlight the limited overlap between treated and control firms outside these bounds. Sample = 905 firms used in
Tobin’s q regressions in Table DMO-3. Correlation between PctIndep and δIndep is r = -0.67.
Figure DMO-2. Number of Treated and Control Firms Within PctIndep Bins
Histogram plot of number of complier (control) and non-complier (treated) firms, by percentage of independent directors in 2000 (PctIndep). We add vertical
lines at PctIndep=25% and 80% to highlight the limited overlap between treated and control firms outside these bounds. Firms with exactly 25% independent
directors are included in 20-25% bin, and similar for other bins. Sample is same as Figure DMO-1.
Figure DMO-3. δIndep for Compliant and Non-Compliant Firms by Bins of Percent Independent Directors in 2000
Figure shows mean δIndep, separately for complier (control) and non-complier (treated) firms, within bins for percentage of independent directors in 2000
(PctIndep). Firms with exactly 30% independent directors are included in 25-30% bin, and similar for other bins. Sample is 711 firms with PctIndep ∈ (0.25,
0.80].
Figure Iliev-1. Audit Fees vs. Free Float and SOX 404 Compliance in 2004
Natural logarithm of 2004 audit fees versus 2004 free float. Graph shows four groups: (i) 41 “shrinker-complier” firms which shrink from float > $75M (in 2002
or 2003) to < $75M in 2004 [red triangles]; (ii) 80 “grower-complier” firms which grow from float < $75M in 2002 to > $75M in 2004 [green diamonds]; (iii) 117
control firms with float < $75M over 2002-2004 [black circles]; and (iv) 42 “large-complier” firms with float > $75M in 2004 [black hollow circles]. Red line is
from regression, for the 163 SOX-complier firms, of ln(2004 audit fees) on 2004 float, large dummy (=1 if 2004 float > $75M), and constant term. Equation is:
12.25 + .013 * Float2004 -0.527 [t= 2.14] * (large dummy). Green line ending at float of $75M is from regression for 117 control firms of ln(2004 fees) on 2004
float and constant term. Equation is: 12.23 + .006 * Float2004. For firms with 2004 float just below (above) $75M, predicted difference in ln(2004 fees) between
shrinker-compliers (other SOX-compliers) and control firms = 1.07 (0.54).
Figure Iliev-2. Growth Trajectories for Instrument-compliers, Other SOX-compliers, and Control Firms
Scatter plot of free float in 2002 and 2004 and representation of three groups of firms included in Iliev’s study: 22 IV-compliers with 2002 float > $75M but
float in 2003 and 2004 < $75M, shown with red triangles and red border; 119 control firms, with 2002 float and 2004 float < $75M, shown with green
diamonds and green border; and 142 SOX-complier firms with 2002 float > $75M, shown with black circles and black border. The remaining firms
represented with black dots are firms included in the dataset provided by Iliev, but fall outside his $50-$100M 2004 Free Float band. We truncate the sample of
firms not used by Iliev at 2002 and 2004 free float between $20M and $250M. Dotted line is 45-degree line; firms above (below) the line have an increase
(decrease) in float from 2002 to 2004.
Figure Iliev-3. Change in Ln(Float) for IV Compliers versus Controls
Histogram of number of firms within indicated ranges for change in ln(float) from 2002-2004, for 22 “IV-complier” firms (2002 float > $75M but 2004 float <
$75M) and 116 SOX-exempt firms (float < $75M over 2002-2004). We drop three of the 119 SOX-exempt firms in the Iliev sample, because they report float in
2002 > $75M. There is overlap only in the [-0.25, 0] bin of change in ln(float). This bin includes 7 IV-compliers and 6 control firms.
Figure Iliev-4. Ln(Float) in 2004 vs 2002 for Shrinkers Strata
Scatter plot of free float in 2002 and 2004 and visual representation of the treated and control firms included in the DiD analysis of shrinkers reported in Table
Iliev-4. The sample is confined to firms with 2002 Float between $50 and 112.5M. Treated firms, represented by red triangles, have 2002 Float>$75M and 2002
Float>2004 Float. Control firms, represented by green diamonds, have 2002 Float <$75M and 2002 Float>2004 Float. The red and green areas represent the
additional imposed filter that the difference between ln(float) 2004 and Log(Float) 2002 should be higher than -1.3 (less than 73% drop in float). We truncate the
sample of remaining firms represented by black dots at 2002 and 2004 free float between $20M and $250M.
Figure Iliev-5. Log(Float) 2004 vs log(Float) 2002 and the Modest Grower Strata
Visual representation of the treated and control firms included in the DiD analysis of modest growers reported in Table Iliev-4. The sample is confined to firms
with 2002 float between $50M and $112.5M. Treated firms, represented by red triangles, have 2002 float > $75M and 2004 float < $112.5M. Control firms, represented
by green diamonds, have 2002 float <$75M and 2004 float < $75M. We truncate the sample of remaining firms represented by black dots at 2002 and 2004 free
float between $20M and $250M.
Figure Iliev-6. Log(Float) 2004 vs log(Float) 2002: Grower-Compliers Versus Similar Growers Forced to Comply
Visual representation of the treated and control firms included in the DiD analysis of grower-compliers vs. growers forced to comply in Table Iliev-5. Sample is
limited to firms with $50M < 2002 Float < $112.5M. Grower-compliers represented by red triangles, have 2002 Float < $75M and 2004 Float > $75M. Growers
forced to comply, represented by green triangles, have 2002 float >$75M and 2004 float > $112.5M. Red and green shaded areas represent additional filter that
ln(2004 float) - ln(2002 float) < 1.3 (< 267% increase in float). We truncate the sample of remaining firms represented by black dots at 2002 and 2004 free float
between $20M and $400M.
Figure Iliev-7. Effect of Bandwidth on Estimated Treatment Effects
Panel A. Shrinkers strata. Graph presents the estimated coefficient on ln(2004 audit fees) for the shrinkers strata, for different bandwidth parameters b (b > 1).
We estimate Regression (1) of Table Iliev-4 for firms with 2002 free float ∈ [$75M/b, $75M*b] (with b incremented by 0.1) and record the coefficient on Treated *
Post. Panel B. Modest growers strata. Similar to panel A, except sample is modest growers; we estimate Regression (3) of Table Iliev-4. Both panels.
Dotted lines indicate upper and lower 90% confidence bounds.
Panel A: Shrinkers
Figure Iliev-7. (Cont.)
Panel B. Modest Growers
Figure Iliev-7. (Cont.)
Panel C. Shrinkers and Modest Growers Combined
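The bandwidth exercise in Figure Iliev-7 uses a multiplicative window around the $75M threshold. The sketch below shows how the window endpoints follow from the bandwidth parameter b. The function names are hypothetical, and the range of b values is an assumption; the caption states only that b > 1 and is incremented by 0.1.

```python
def float_window(b, threshold=75.0):
    """2002 free-float window [threshold / b, threshold * b], in $M, for b > 1."""
    if b <= 1:
        raise ValueError("bandwidth parameter b must exceed 1")
    return (threshold / b, threshold * b)


def bandwidth_grid(start=1.1, stop=2.0, step=0.1):
    """Grid of bandwidth parameters (start and stop are assumed, not from the paper)."""
    n_steps = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


for b in bandwidth_grid():
    low, high = float_window(b)
    print(f"b = {b:.1f}: 2002 float in [{low:.1f}, {high:.1f}]")
```

Setting b = 1.5 gives the [$50M, $112.5M] window used for the main estimates in Table Iliev-4.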
Figure Iliev-8. Treatment Effects over Time
Top panel shows the mean change from 2002 in ln(audit fees) for SOX-compliers vs. noncompliers within shrinkers
strata (defined in Figure Iliev-4) in 2003 and 2004, together with 90% confidence interval for specifications with and
without covariates. Bottom panel is similar, for modest growers strata (defined in Figure Iliev-5).
Appendix. Other Shock-Based IV Papers in the AB-2015 Sample
AB-2015 found eight shock-based IV papers, in their sample of 863 empirical corporate
governance papers over 2001-2011, published in major journals. This appendix explains why we
chose to re-examine D&D, DMO, and Iliev, rather than other papers.
We excluded Bennedsen et al. (2007), who rely on a truly random shock. Their data would
likely have been proprietary in any case. We spoke with Dhammika Dharmapala, who was a coauthor
of two of these papers, and he recommended that we review D&D rather than Dharmapala, Foley,
and Forbes (2011), for which the data was proprietary. We discuss here the remaining three papers,
and why we did not select them for review: Adams and Santos (2006); Giannetti and Laeven (2009);
and Guner, Malmendier and Tate (2008). Of note: In none of these three papers were the IV results
statistically significant at the conventional 5% level.
Adams and Santos (2006)
Adams and Santos (2006) is the first shock-based IV paper in our sample (by time of
publication). It is a strong and careful paper in many ways. The authors study whether the “wedge”
between managers’ voting rights and their cash flow rights (wedge) affects Tobin’s q. They study
banks with trust departments that hold the bank’s own shares. The bank’s managers control the voting
of these shares, but have no associated cash flow rights. The main empirical design is cross-sectional
OLS regressions of Tobin’s q on bank voting rights in 1966 (the year for which they were able to
obtain ownership data, from a U.S. House of Representatives report). The authors use a shock-based
IV design in a secondary analysis, intended to address concerns about possible endogeneity between
the bank’s holdings of its own shares in fiduciary accounts and Tobin’s q. They use, as instruments
for wedge, dummy variables for the four types of state laws that regulate whether and how a bank
trust department can vote the bank’s own shares. These laws range from no restriction on voting to
a ban on voting. These laws were in place well before 1966.
One might question whether these laws affect Tobin’s q only through managerial voting of
trust shares. For example, tight regulation of voting might well be associated with tight regulation of
banks generally, which might predict Tobin’s q. The authors are aware of this issue, and discuss why
the only through condition is plausibly satisfied.
Adams and Santos (2006) use two different measures of wedge. For the first measure, they
report that the F-statistics for their four instruments are quite low and, in most specifications, are not
statistically significant. They therefore instrument only for the second measure. The F-statistics for
this measure in the first-stage regressions range from 3.02 to 6.94, which suggests that the
instruments, although jointly significant, remain vulnerable to a weak instruments problem. The
authors do not report their 2SLS results in a table, but do provide limited results in a footnote. The
coefficients on instrumented wedge are much smaller than in the OLS specification and are not
statistically significant. The authors run a Durbin-Wu-Hausman test for endogeneity, which does not
reject the null of no endogeneity, in which case OLS is the preferred specification.
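As a rough illustration of the weak-instruments concern, the reported first-stage F-statistics can be compared with a commonly used rule-of-thumb threshold of 10. The threshold is an assumption introduced here for illustration; this appendix does not state which cutoff should apply, and the function name is hypothetical.

```python
RULE_OF_THUMB_F = 10.0  # assumed cutoff; a common rule of thumb, not taken from the paper


def is_potentially_weak(first_stage_f, threshold=RULE_OF_THUMB_F):
    """Flag a first-stage F-statistic that falls below the chosen threshold."""
    return first_stage_f < threshold


# Range of first-stage F-statistics reported for the second wedge measure.
reported_range = (3.02, 6.94)
print(all(is_potentially_weak(f) for f in reported_range))
```

Both ends of the reported range fall below this cutoff, which is consistent with the concern that the instruments, although jointly significant, may be weak.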
We did not select this paper to replicate because the 2SLS results were statistically
insignificant, not reported in text, and not the focus of the paper. We also had concerns about the
only through condition.
Giannetti and Laeven (2009)
Giannetti and Laeven (2009) examine the effect of pension fund ownership on firm market
value. They use pension reform in Sweden as a shock to ownership of public companies by public
pension funds. The reform proceeded in two phases. In the first phase, the one public pension fund
with significant equity holdings was required to divest most of these holdings. In the second phase,
four other public pension funds, which had previously invested primarily in debt, were required to
increase their equity holdings. IV is their main research design. The authors use two sets of
instruments for changes in ownership by pension funds – one for the first phase of the reform; the
other for the second phase. We focus here on the first phase, which (potentially) has a clean shock-based IV; the second phase does not. The instruments are a dummy indicating whether the
divesting fund held shares in the company pre-shock, plus three non-shock instruments: the cash
flow rights of the other public pension funds in the firm, the cash flow rights of private pension funds,
and the firm’s market capitalization. The authors do not discuss the only through condition. We treat
this as a shock-based IV paper because the divestment shock is the strongest of the four instruments
in their first stage.
We did not select this paper to replicate because the authors combined a shock-based IV with
several non-shock IVs, and because in the second stage, the instrumented variable (ownership by the
divesting fund) is only marginally significant. The IV results are significant for the second reform
phase, but for this phase, the authors lack a clean shock, because the other public funds could choose
which companies to invest in.
Guner, Malmendier and Tate (2008)
Guner, Malmendier, and Tate (2008) examine the effect of bank-affiliated directors on the
sensitivity of investment to cash flow over 1988-2001. Their main design is panel data with firm
fixed effects. They use shock-based IV as a robustness test. The shock is a commercial banking
crisis in the US in the late 1970s and early 1980s. The instrument for bank-affiliated directors is the total
number of directors hired during 1976-1985. The authors argue that bank failures during the crisis
reduced the availability of bank-affiliated directors. They show that, in contrast, the overall rate of
new director appointments did not change. The authors discuss the only through condition with care.
They are aware that their instrument could generate imbalance on other firm characteristics,
specifically board turnover, and state that in unreported regressions, their main results are robust to
adding board turnover as a control variable. They do not, however, report on covariate balance. They
report that a placebo test using number of directors hired during 1966-1975 “fails to replicate the
results”, but do not specify how.
We did not select this paper to replicate because the 2SLS coefficient of principal interest, on
instrumented (no. of bank-affiliated directors * cash flow), is only marginally significant with industry
fixed effects. The authors do not report 2SLS results with firm fixed effects, likely because the results
were statistically insignificant. Also, the coefficient on instrumented (no. of bank-affiliated directors
* cash flow) is -0.820, almost 10 times the -0.085 coefficient from the non-instrumented specification with firm fixed effects. In our
experience, this level of “blowup” of the 2SLS coefficient is a strong warning sign for violation of
the only through condition.
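The size of the "blowup" can be checked directly from the two coefficients quoted above. The short sketch below only reproduces that arithmetic; the variable names are ours.

```python
# Coefficients quoted in the text for (no. of bank-affiliated directors * cash flow).
coef_2sls = -0.820           # 2SLS estimate with industry fixed effects
coef_firm_fe = -0.085        # estimate with firm fixed effects

# Ratio of the instrumented to the non-instrumented coefficient.
blowup_ratio = coef_2sls / coef_firm_fe
print(f"2SLS coefficient is {blowup_ratio:.2f} times the firm fixed effects coefficient")
```

The ratio is a little under 10, which matches the "almost 10 times" description.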