Regression Discontinuity Designs
Based on Population Thresholds:
Pitfalls and Solutions
Andrew C. Eggers – University of Oxford
Ronny Freier – DIW Berlin and FU Berlin
Veronica Grembi – Copenhagen Business School
Tommaso Nannicini – Bocconi University
First Version: October 2013
This version: April 2015
Abstract
In many countries, council size, mayoral salaries, and various other aspects of
municipal policy depend discontinuously on a city’s population. Several papers
have used a regression discontinuity design (RDD) to measure the effects of these
threshold-based policies on political and economic outcomes. Using evidence from
France, Germany, and Italy, we highlight two common pitfalls that arise in exploiting population-based policy changes (confounded treatment and sorting) and
provide guidance for detecting and addressing these pitfalls. Even when these
problems are severe, policies based on population thresholds offer an appealing
opportunity to study the effects of policies and political institutions.
Thanks to Janne Tukiainen for providing data.
I. Introduction
Researchers attempting to estimate the effects of political institutions (or more generally,
policies) face serious endogeneity problems: it is usually impossible to run an experiment in
which consequential policies are randomized, and in most observational data it is difficult to
locate or construct valid counterfactuals given the various strategic and contextual factors
that affect policy choices. In recent years, many researchers have attempted to address these
problems by exploiting cases in which policies at the subnational (usually municipal) level depend discontinuously on population thresholds. By studying systems where important policies
(e.g. the salary of mayors, the electoral system used in local governments, the extent of fiscal
transfers) are imposed on municipalities according to arbitrary population cutoffs rather than
chosen by the municipalities themselves, researchers can sidestep the central challenge of measuring the effects of policies, which is that political units that adopt different policies typically
differ not just in the policy of interest but also in other ways that may be difficult to control
for in a regression. In applying the regression discontinuity design to policies determined by a
population threshold, researchers both in economics and political science attempt to isolate a
comparison where cities that are identical in expectation nevertheless adopt a different policy
(examples include Pettersson-Lidbom 2006, 2012; Fujiwara 2011; Hopkins 2011; Gagliarducci
and Nannicini 2013). We survey existing studies in this vein in table 4 in the appendix and
provide a detailed discussion of the literature in Section II.
Although population-based policies offer an unusual chance to study the effects of policies
and political institutions, the opportunity comes with two pitfalls that (based on our study of
France, Germany, and Italy) are likely to plague many such studies. The first pitfall is that the
same population threshold is often used to determine multiple policies, which makes it difficult
to interpret the results of RDD estimation as measuring the effect of any one particular policy.
We show the extent of confounded treatment in the three countries we study, emphasizing that
extensive institutional background research is necessary before one can interpret the results
of a population threshold-based RDD as evidence of the effect of a particular policy. We
also highlight several approaches to estimating the effects of individual policies in settings
with various kinds of confounded treatment, all of which rely on assumptions of additive
separability.
The second pitfall is sorting, i.e. the tendency of municipalities to strategically manipulate their official population (whether through recruiting/repelling residents or overcounting/undercounting residents) in order to fall on the desired side of a consequential population
threshold. Again focusing on France, Germany, and Italy, we show conclusive evidence of
sorting and highlight the ways in which this sorting could invalidate the identifying assumptions of the RDD. We emphasize that even when extensive strategic sorting is taking place, an
RDD approach based on population thresholds may still be an attractive approach to studying
policy effects; after all, the alternative is to study settings where all political units (not just
those that are able to manipulate their population figures) choose which policies to adopt.
In settings with strategic sorting, it is important to identify and measure confounding factors
that help explain the sorting and may be related to potential outcomes. We carry out such
analysis in Italy, highlighting some of the challenges of pooling data from several censuses and
thresholds in a setting with strategic sorting.
Although our focus is on recent decades in France, Germany, and Italy, we emphasize
that the use of population thresholds to assign local policies extends much more broadly in
both space and time. The existing literature has evaluated population-based policies in twelve
countries on four continents, including the U.S., Spain, Brazil, Morocco, India and Japan;
our own casual search quickly yielded examples of similar policies in several other countries
where to our knowledge no such study has been carried out (e.g., the U.K., Belgium, Austria,
Norway, Poland, Slovakia and Mongolia). While existing work overwhelmingly focuses on
recent decades, the use of population thresholds in fact has a long history. The first law
on municipal government in revolutionary France (passed in December of 1789) includes six
provisions dictating features of municipal government as a function of population, including a
rule specifying six population thresholds determining the council size.1 A law that reformed
the constitutional rights of municipalities in Prussia as early as 1808 created categories of
municipalities based on size (with cutoffs of 3,500 and 10,000 inhabitants) which were used to
1 “Loi relative à l’organisation des communes du royaume de France” of 14 December 1789. Other provisions
specify population thresholds that determine the number of constituent assemblies, the number of citizens
necessary to call a new general assembly, whether there would be a substitute “procureur”, the number of
municipal districts, and transparency rules for municipal budgets.
assign different rules on council size, voting rights, and budget transparency (among others).2
In Italy, statutory rules with population cutoffs applying to council size, executive committees
and voting rights can be dated to at least the time of the Italian unification in 1865.3 As we
document in the next section, many studies have already appeared that make use of population
thresholds, but given how many such policies exist (and for how long they have existed), we
expect many more such studies to appear in coming years. We hope this paper helps future
researchers to make use of this opportunity.
II. Literature review
The use of regression discontinuity designs based on population thresholds was first suggested
and used by Pettersson-Lidbom (2006, 2012). He evaluated the effect of an increase in council
size, changing deterministically at certain population thresholds, on overall expenditures in
Finnish and Swedish municipalities. Following his methodological innovation, researchers have
used these types of regression discontinuity designs to study a multitude of issues; we survey
twenty-eight papers in Table 4 in the appendix. On the whole, this literature can be divided
into two categories, based on the type of policy whose effects are being studied: political
institutions (within which council size, mayor salary, and electoral institutions have received
the most attention) and fiscal resources.
Within the category of political institutions, several papers have followed Pettersson-Lidbom (2006)’s lead by studying the effects of changing the size of the municipal council
(motivated ultimately by earlier work such as Weingast, Shepsle and Johnsen (1981)): Egger
and Koethenbuerger (2010) and Koethenbuerger (2012) study the effects of council size in Germany; focusing on urban land development as an outcome as well as local spending, Hirota
and Yunoue (2012) and Hirota and Yunoue (2013) also evaluate changes in council size for the
case of Japan. Other papers have used the same design to study the effects of changing the
salary of the mayor or other elected officials. Gagliarducci and Nannicini (2013); Ferraz and
Finan (2009); van der Linde et al. (2014) study how wage increases affect the performance
2 The definition of city types appears in Article 10 of the “Preußische Städteordnung von 1808”; policies pertaining
to the types appear in Articles 70, 74, 142, 152, 153 and 183.
3 A law named Legge Lanza from 1865 (which was drafted after the Royal Decree n.3702 from 1859) regulated
the organization of local administrations for the former Kingdoms of Piedmont and Sardinia.
of politicians; De Benedetto and De Paola (2014) examine the effect of wages on candidate
quality. Bagues and Campa (2012) and Casas-Arce and Saiz (2015) study the electoral and fiscal
effects of a gender quota on municipal councils.
Several papers have examined the effects of elements of the electoral system that vary
with population thresholds. Fujiwara (2011) shows the validity of Duverger’s law in a setting
where the electoral rule (single round versus runoff system) depends directly on a population
threshold, while Barone and De Blasio (2013) study the relationship of such electoral rules with
voter turnout. Bordignon, Nannicini and Tabellini (2013) highlight that the choice between a single-ballot and a
dual-ballot system significantly influences the presence of extreme parties. Hopkins (2011)
studies the effect of providing ballots in Spanish on voting outcomes. Gulino (2014) studies
the effects of changes in electoral rules on the re-election probabilities of incumbent mayors.
The effects of majoritarian versus proportional systems have been evaluated by Eggers (2015)
for turnout and by Pellicer and Wegner (2013) for the success of clientelistic parties.
Several recent papers also examine the effect of direct democracy using population thresholds
that affect the nature of representation or the difficulty of initiating referendums. Hinnerich
and Pettersson-Lidbom (2014) study the impact of representative democracy versus direct
democracy on government spending. Arnold and Freier (2015) substantiate the relationship
between signature requirements and the number of popular initiatives. Asatryan et al. (2013);
Asatryan, Baskaran and Heinemann (2014) study the effects of direct democracy on spending
and taxes.
Within the category of fiscal resources, researchers have studied the effect of changes in fiscal
transfers on local elections (Litschig and Morrison (2010)), education and poverty (Litschig
and Morrison (2013)), corruption and the quality of politicians (Brollo and Nannicini (2012)),
and school enrollment (Mukherjee (2011)). Baskaran (2012) studies the flypaper effect (i.e.
the tendency of transfers and tax revenues to have different effects on fiscal policy).
An analysis of particular interest for our paper is the work by Lyytikäinen and Tukiainen
(2013). They study changes in turnout related to the increase in the probability of being
pivotal (due to changes in council size at population thresholds) in Finland. Importantly, the
paper identifies two treatments that are triggered by the same changes in council size.4 The
paper is thus directly confronted with the issue of confounded treatments, an issue that
we also highlight in our analysis. Arnold and Freier (2015) and
Eggers (2015) likewise recognize issues of confounded treatments and provide extended robustness tests
to deal with those problems.
Another related strand of literature discusses sorting around thresholds and its
consequences for RDD applications. The assumption of continuity (see Imbens and Lemieux
(2008); Lee and Lemieux (2009)) has been criticized and discussed by Urquiola and Verhoogen
(2009) for class size, Barreca et al. (2011) for birth weight or Caughey and Sekhon (2011) and
Eggers et al. (2015) for close elections. A formal test for sorting in RDD settings has been
developed by McCrary (2008). The paper that is most related to our work in this strand of
research is the study by Litschig (2012), which looks at the manipulation of population figures
in Brazil.
III. Multiple treatments
A. Documenting the extent of confounded treatment
The appeal of studying a setting in which a policy depends on a population threshold is the
fundamental comparability of political units above and below the threshold: if cities above a
threshold experience policy A while those below the threshold experience policy A′, and the
two groups of cities are comparable in all other respects, then we should be able to use an
RDD to compare the two sets of cities and thus learn about the effect of applying policy A
as opposed to policy A′. This research design is attractive because typically political units
that choose policy A differ in numerous other ways from political units that choose policy A′,
which makes it difficult in the typical observational study to disentangle the effect of applying
policy A (as opposed to A′) from the effect of other differences between cities that choose A
and those that choose A′.
4 Lyytikäinen and Tukiainen (2013) show that changes in council size can affect turnout directly, as more
council members are elected and the seat allocation rule makes a single vote more likely to be pivotal.
At the same time, council size also determines the length of the party lists in the election. The number of
politicians running for office is thus also directly influenced by council size, which can have an independent
effect on turnout. The authors solve this identification problem by using an instrumental variable strategy that
disentangles the effects from multiple council size thresholds.
The first common problem we highlight in this paper is that often the population threshold
that determines whether policy A or A′ is applied will also determine whether other policies (B
or B′, C or C′) are applied; the policy change we hope to study (A vs. A′) is thus confounded
with other policy changes, undermining the appeal of the RDD. Figure 1 summarizes the
problem in France, Italy, and German states. Each dot indicates a population threshold at
which at least one policy changes; solid dots indicate that more than one policy changes at
the same threshold. In every case (i.e. France, Italy and every German state) there are some
thresholds where just one policy changes, but such thresholds are in the minority.
Figure 1: Population thresholds at which municipal policies change: France, Italy, and German
states
[Figure: dot plot with one row per jurisdiction (France, Italy, and the German states SH, NRW, Hes, NiedS, RP, BW, Bay, Saar, Brand, MeckP, Sachsen, Saan, Thuer). Horizontal axis: Population, with ticks at 100, 250, 500, 2500, 10000, 50000 and 250000. Legend: one policy change; 2−4 policy changes; 5 or more policy changes.]
Note: Each dot indicates a population threshold at which a policy changes; solid dots indicate more than one
policy changing at the same threshold.
Table 1 provides detail on the various policies that change at population thresholds up to
50,000 in France; the appendix provides details about population-dependent policies in Italy
and Germany (see tables 5 and 6). As Table 1 indicates, at every threshold at which council size
increases, the maximum number of deputy mayors also increases, which makes it impossible
to disentangle the effect of council size from the effect of additional paid council staff. There
is only one threshold (1,000 inhabitants) at which the salary of mayors and deputy mayors
increases without the council size also increasing. Many of the most interesting policies change
at a single threshold of 3,500 inhabitants at which several other policies (including council size
and mayor’s wage) also change: the electoral rule used to elect the council, the requirement of
gender parity in party electoral lists, and the requirement that the council debate the budget
before adopting it. In total, we identified 19 different policy issues that depend on at least one of
the 14 thresholds in France. Correspondingly, we have 10 policy issues at 12 thresholds in Italy
and an astounding 65 policy issues at 51 thresholds in Germany.5 In all three countries that
we examine, our ability to study the effects of policies that depend on population thresholds
is constrained by the tendency for multiple policies to depend on the same threshold.
B. Practical challenges in detecting confounded treatment
Detecting whether a given treatment is confounded with another treatment can simply be a
matter of scouring the legal code for policies that explicitly depend on population thresholds.
To the extent that researchers study historical time periods, this institutional research is further
complicated by the need to identify all relevant changes over time (especially
policies that were only temporarily enacted).6
Apart from searching the past and current laws, the issue of identifying all relevant policy
issues can be more subtle for a number of reasons. First, it may be the case that further
policies (e.g. the size of committees) legally depend on the size of the council, which means that
5 The German case is particular as the municipal level is under control of state legislation. This implies that
the rules are state-specific. No state uses fewer than 14 different policies and, with the exception of North Rhine-Westphalia, every state has policy topics that are used uniquely in that state. We provide detailed
descriptions of the content of the policies and the respective thresholds in the tables ??-?? in the online
appendix.
6 For Germany, e.g., we identified a number of policies related to population thresholds that were used
in one-shot boundary reforms that determined the size of merged municipalities.
Table 1: Population thresholds in French municipalities
[Table: rows are policies; columns are population thresholds in thousands of inhabitants (0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 5, 9, 10, 20, 30, 50); an x marks each threshold at which the policy changes.]
Policies listed: Council size; Salary of mayor and deputy mayors; Max. number of non-resident councilors; Max. number of deputy mayors; Must have a cemetery; Prohibition on commercial water supply; Campaign leaflets subsidized; Council must approve property sales; Electoral system – PR or plurality; Gender parity; Outsourcing scrutiny; Council must debate budget prior to vote; Committees follow PR principle; Amount of paid leave for council work; Commission on accessibility; Max. electoral expenditure; Outsourcing commission; Max. municipal tax on salaries; Debt limit.
Note: The table identifies population thresholds (in thousands) at which given policies change. This is a
partial list of policies, chosen to highlight the variety of policies that depend on population thresholds and the
extent to which the same threshold often determines multiple policies. Source: French legal code.
searching for population cutoffs would not give a complete picture of policies that change at the
same population thresholds as council size. An example of such second-order policies,
which can be equally important, is given in Lyytikäinen and Tukiainen (2013). Second, when
there are multiple tiers of government, it may not suffice to understand the regulation of the
local municipal code. Legislation from other tiers and other governmental organizations
may also contribute to the effects at certain thresholds.7 Third, even policies that do not depend on
7 In Germany, e.g., multiple policies at the state and federal level implement discontinuous changes of policies
at population thresholds (see election laws for state and federal elections). Also, other federal organizations
may use population thresholds to distinguish between political units. Again in Germany, the federal statistical office
implemented the last census in 2011 using two systems of population accounting for towns above and below
10,000 inhabitants respectively, which resulted in very different population adjustments above and below the
threshold.
population thresholds directly may have an effect through interactions with population-based
rules. An example is provided by Baskaran and Lopes da Fonseca (2014), who study vote
share hurdles (which do not depend on population thresholds) in German municipalities and
utilize the fact that those rules are ineffective when the council size (which depends on such
thresholds) is too small.8 To summarize, a researcher should know a setting intimately before
concluding that a given policy (and not other policies) changes at a given population threshold.
8 A small example highlights the problem. A 5% hurdle for small parties has no bite when there are only
10 seats in a council (and a party effectively needs around 10% to win a seat). When council size increases
from 16 to 20 seats at 5,000 inhabitants, the regime change compares a system with an ineffective vote share
hurdle to one in which the hurdle poses an actual constraint.
C. Addressing confounded treatments
What can a researcher do when a policy of interest changes at a threshold where other policies
change? We highlight a few options.
One viable option is to broaden the focus to include other policies that change in tandem.
If two policies move in perfect lockstep, then what initially may seem like an opportunity to
learn about the effect of policy A vs. A′ is at best an opportunity to learn about the effect of
policy combination AB vs. A′B′. In some cases it is worth studying the effect of this bundle of
policies, however. In France, for example, changes in council size always coincide with changes
in the number of deputy mayors. The perfect confounding of these two policies means that it
is impossible in this setting to separate the effects of the two treatments, but as long as this pair
of policies does not perfectly coincide with other policy changes then one could use RDD to
learn about the effect of that pair. The downside, of course, is that as one combines additional
policy changes it becomes more difficult to motivate and interpret the analysis beyond the
immediate setting being considered.
Grembi, Nannicini and Troiano (2012) combine the advantages of regression discontinuity
designs and difference-in-differences in a new methodology they call “difference-in-discontinuity”.
Suppose that at a given point in time there is a population threshold at which policy B changes
to policy B′; after a reform, the same threshold produces a change from policy AB to policy
A′B′. The basic idea of the difference-in-discontinuity approach is to apply RDD both before and after the reform; under the assumption that the effects of A vs. A′ and B vs. B′
are additively separable, the difference between these estimates yields a consistent estimate of
the effect of A vs. A′. The Italian example in the paper by Grembi, Nannicini and Troiano
(2012) is a new policy that relaxed existing fiscal constraints for towns above 5,000 inhabitants
in 2001. As the threshold is used for multiple other treatments throughout the entire time
period, a standard RD approach is not suitable, but Grembi, Nannicini and Troiano (2012)
use the difference-in-discontinuity approach (comparing the period before and after the policy
change) to estimate the effect of relaxing the fiscal constraints.
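The logic of the difference-in-discontinuity approach can be sketched on simulated data. The following is a minimal illustration under stated assumptions, not the estimator used by Grembi, Nannicini and Troiano (2012): the data, the effect sizes (`effect_a`, `effect_b`), the bandwidth of 500, and the helper `rdd_jump` are all hypothetical, and the RDD step is a plain local linear regression with a uniform kernel.

```python
import numpy as np

def rdd_jump(x, y, cutoff, bandwidth):
    """Local linear RDD estimate of the jump in E[y | x] at the cutoff."""
    keep = np.abs(x - cutoff) <= bandwidth
    xc = x[keep] - cutoff
    d = (xc >= 0).astype(float)
    # Separate intercepts and slopes on each side of the cutoff.
    X = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return beta[1]  # coefficient on the above-cutoff indicator

rng = np.random.default_rng(0)
n = 20000
cutoff = 5000
effect_b = 2.0  # hypothetical effect of B' vs. B, present before and after the reform
effect_a = 1.0  # hypothetical effect of A' vs. A, present only after the reform

pop = rng.uniform(4000, 6000, size=n)
above = (pop >= cutoff).astype(float)
post = rng.integers(0, 2, size=n).astype(bool)
# Additive separability is built into the simulated outcome.
y = 0.001 * pop + effect_b * above + effect_a * above * post + rng.normal(0, 1, n)

jump_pre = rdd_jump(pop[~post], y[~post], cutoff, 500)   # picks up B' vs. B only
jump_post = rdd_jump(pop[post], y[post], cutoff, 500)    # picks up A'B' vs. AB
diff_in_disc = jump_post - jump_pre                      # isolates A' vs. A
```

In this simulation the pre-reform jump is close to `effect_b` and the difference between the two jumps is close to `effect_a`; if the two effects interacted, the difference would no longer isolate the effect of A vs. A′.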
A similar approach can be used in a single cross-section in cases where a treatment of
interest coincides with other “nuisance” treatments, but the nuisance treatments also change
at other thresholds. Suppose there is a threshold ab where policy A changes to A′ and policy
B changes to B′; at another threshold b, policy B′ changes to B″. Under the assumptions that
(i) the effect of the B′ → B″ change on a given outcome is the same as that of the B → B′ change,
and (ii) the effect of the A → A′ change does not depend on policy B vs. B′, the effect of A vs.
A′ can be estimated by subtracting the RDD estimate at threshold b from the RDD estimate
at threshold ab. Arnold and Freier (2015) and Eggers (2015) provide evidence in this spirit.
Eggers (2015) evaluates the effect of proportionality on turnout in French municipalities,
using an electoral system change at the 3,500-inhabitant threshold where the council size and
the salary of mayors also change. To make the case that the effects observed at this threshold
are due to the electoral system and not the other treatments, Eggers (2015) shows that the
outcome does not respond in the same way at other thresholds where the other treatments
change. Arnold and Freier (2015) use a similar test to compare small and big thresholds where
the policy of interest (the number of signatures necessary for local initiatives) changes only at
big thresholds while other confounding policies also change at small cutoffs. Extending this
approach, researchers could make use of a setting in which it is not at different thresholds but
in different regions (such as German states) that the policy space differs; that is, if a given
policy coincides with a nuisance policy in one region but the nuisance policy changes on its own
in another region, then (under the assumption that the nuisance policy’s effects are additively
separable and comparable across regions) one could obtain consistent estimates by taking the
difference in the estimates across regions.
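The subtraction across thresholds can be written out explicitly. The notation below is introduced here for illustration only: \tau_{ab} and \tau_{b} denote the RDD estimands at thresholds ab and b, and \tau_{A}, \tau_{B \to B'}, \tau_{B' \to B''} denote the effects of the individual policy changes.

```latex
\begin{align*}
\tau_{ab} &= \tau_{A} + \tau_{B \to B'}
  && \text{(assumption (ii): additive separability)} \\
\tau_{b}  &= \tau_{B' \to B''} = \tau_{B \to B'}
  && \text{(assumption (i): equal nuisance effects)} \\
\tau_{A}  &= \tau_{ab} - \tau_{b}
\end{align*}
```

The cross-region variant has the same form, with \tau_{b} taken from the region in which the nuisance policy changes on its own.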
Extending the same logic even further, one can also consider disentangling treatment effects
through the careful use of alternative outcomes. It is perhaps easiest to explain this approach
using the example of Arnold and Freier (2015), who study how the rate of voter-initiated
referendums responds to changes in the number of signatures required to initiate a referendum.
Having found an effect at thresholds where signature requirements change, they argue that this
effect is not due to other policies that change at the same thresholds by showing that there
is no change at these same thresholds in the rate of council-initiated referendums, which do
not require any signatures to be collected. The argument, therefore, is that because the
bundle of policies does not seem to affect an outcome that we would expect to be affected
by the nuisance treatments, those nuisance treatments likely do not affect the main outcome
of interest either.9 More generally, one can imagine carrying out a difference-in-discontinuity
design at a single threshold using multiple outcomes, where the differences are taken across the
different outcomes: the key “parallel trends” assumption is that the effect of the full bundle of
policies (A′B′ vs. AB) on outcome z provides a valid estimate of the effect of a reduced bundle
of policies (B′ vs. B) on outcome y; thus under the additional assumption that the effects are
additively separable, the difference between the two effects provides a valid estimate of the
effect of A′ vs. A on outcome y. These are strong assumptions but they may be appropriate
in some settings.
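In symbols, the multiple-outcome argument runs as follows. The notation is introduced here for illustration only: \tau^{y}_{AB} and \tau^{z}_{AB} are the RDD estimands for outcomes y and z at the threshold where the full bundle changes, and \tau^{y}_{A} and \tau^{y}_{B} are the effects of the individual policy changes on y.

```latex
\begin{align*}
\tau^{z}_{AB} &= \tau^{y}_{B}
  && \text{(``parallel trends'' across outcomes)} \\
\tau^{y}_{AB} &= \tau^{y}_{A} + \tau^{y}_{B}
  && \text{(additive separability)} \\
\tau^{y}_{A}  &= \tau^{y}_{AB} - \tau^{z}_{AB}
\end{align*}
```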
Finally, when the treatment effects relate to several thresholds, researchers may also consider identifying the effects using variation in treatment intensity along with functional form
assumptions. Consider the example of council size, which in many systems jumps by different
amounts at different thresholds, often along with changes in mayor salary or other policies.
In this situation it may be possible to use variation in treatment intensity (i.e. the number of
seats added at each threshold) to identify the effect of a marginal council member. The crucial
assumption is that the effect of additional seats is linear in (or otherwise predictably related
to) the size of the jump, and (again) that this effect is additively separable from other effects
9 One could argue, of course, that in equilibrium a change in the cost of voter-initiated referendums would
change the rate of council-initiated referendums.
that occur at the same thresholds. Part of the identification in Lyytikäinen and Tukiainen
(2013) is based on utilizing the different steps in the council size changes to separate the pure
council size policy from increased party lists in the turnout equation.
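The treatment-intensity idea can be illustrated with a small numerical sketch. Everything in it is hypothetical: the numbers of seats added, the per-seat effect, and the nuisance effect are made up, and the sketch additionally assumes that the nuisance policies have the same effect at every threshold so that a single intercept can absorb them. It is not the instrumental variable strategy of Lyytikäinen and Tukiainen (2013).

```python
import numpy as np

# Hypothetical inputs, not estimates from any study: seats added at four
# thresholds and the RDD jump in the outcome at each of those thresholds.
seats_added = np.array([2.0, 4.0, 4.0, 6.0])
per_seat_effect = 0.5  # assumed linear effect of one additional council seat
nuisance_effect = 1.0  # assumed common effect of the other policies
rdd_jumps = nuisance_effect + per_seat_effect * seats_added

# Regress threshold-level jumps on seats added: the slope recovers the
# per-seat effect and the intercept absorbs the additively separable
# nuisance effect.
slope, intercept = np.polyfit(seats_added, rdd_jumps, deg=1)
```

With real data the threshold-level jumps would be noisy RDD estimates, so the slope would be estimated with error and would depend on the linearity assumption holding.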
IV. Sorting
As mentioned above, the appeal of a population threshold-based RDD is partly that the
political unit does not choose the policy, which suggests that (controlling for population) units
above and below the threshold should be comparable in all respects other than the policy of
interest. As is well known, such an RDD (like any RDD) is less appealing when the units can
influence the variable that determines treatment assignment (i.e. population). At an extreme,
one could imagine that cities near a population threshold could perfectly control whether they
end up above or below the threshold, and thus cities that have policy A differ from cities that
have policy A′ not just in that policy but also in a whole host of background characteristics
that affected their decision to get policy A or policy A′.10 In such a situation, carrying out an
RDD may be little better than a typical observational study in which political units choose
their policies. (Below, we compare the attractiveness of an RDD marred by sorting with that
of a typical observational study.)
The problems of strategic sorting in RDD applications are well known (see Imbens and
Lemieux (2008); Lee and Lemieux (2009); Urquiola and Verhoogen (2009); Barreca et al.
(2011)). Strategic sorting in population figures has been documented in the work by Litschig
(2012) for Brazil and it has been briefly mentioned in the paper by Gagliarducci and Nannicini
(2013). One of our contributions here is to provide evidence that sorting in RDD studies based
on population thresholds is an issue in all three countries that we study. We also demonstrate
techniques for diagnosing and explaining manipulation in the following two sections as well as
in a technical appendix.
10 Alternatively, it may be that only certain cities are able to control whether they end up above or below
the threshold, in which case cities that have policy A may differ from cities that have policy A′ not only in the
factors that affect their policy preferences but also in the factors that affect their ability to manipulate their
population figures.
Figure 2: Sorting in municipal population in France, 1962-2011 pooled
[Figure: two panels. Left panel, “Histograms (bin width = 1)”: count against distance from threshold in population (−200 to 200), with separate histograms for thresholds of 100; 500 and 1,000; and 1,500+. Right panel, “Estimated density, all thresholds pooled”: estimated density against distance from threshold in population (−200 to 200).]
Note: The left plot depicts three histograms, one for each group of thresholds (100; 500 & 1,000; 1,500 and
larger). In each case the bin width is 1, meaning that the top of the line indicates the number of data points
(municipality-years) with population that is exactly a given amount (e.g. 50 inhabitants) from the threshold.
The right plot depicts the McCrary analysis for all cases pooled.
A. Aggregate graphical evidence
The basic pattern of sorting is documented in Figures 2 (France), 3 (Italy), and 4 (Germany).
Because the figures use the same format and reflect the same analysis, we explain the French
case in detail and subsequently note only the relevant differences between the French case and
the others.
In France, we have population data for eight censuses between 1962 and 2011.11 For each
census, we calculate the difference in population between each city and each major population
threshold (i.e. one affecting a policy listed in Table 1) that was in force at the time of the
census; we store all municipality-years in which a city’s population was within 250 inhabitants
11
The census years are 1962, 1968, 1975, 1982, 1990, 1999, 2006, and 2011. After 1999, France introduced a
new census system that produces annual population estimates for all municipalities; the 2006 census was the
first such census.
of a threshold. In the left panel of Figure 2 we plot three histograms of these population
differences, one for each group of relevant population thresholds (100; 500 or 1,000; and 1,500
and larger). Because there are so many municipality-years, we plot histograms with bin widths
of 1, meaning that Figure 2 indicates the number of cases in which a municipality was exactly
e.g. 25 inhabitants below a threshold. The key evidence of sorting is given by the jumps in
each histogram at 0. For example, based on the histogram for the 100-inhabitant threshold,
we can see that there were just under 500 cases in which a city was two or three inhabitants
short of the 100-inhabitant threshold at which the council size increases, but there were just
under 600 cases in which a city narrowly cleared that hurdle. The jump is if anything more
striking for the 500- and 1,000-inhabitant thresholds (where the mayor’s salary increases), with
about 300 cases of a municipality recording exactly 500 people and only about 150 cases of a
municipality recording exactly 499.
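The bookkeeping behind these histograms can be sketched as follows. This is a minimal illustration, not the code used for Figure 2: the function names, the example populations and thresholds, and the choice to treat "within 250 inhabitants" as inclusive are all assumptions made here.

```python
import numpy as np

def distances_to_thresholds(populations, thresholds, window=250):
    """Signed distance (population minus threshold) for every
    municipality-threshold pair within `window` inhabitants (inclusive, assumed)."""
    pops = np.asarray(populations, dtype=int)
    cuts = np.asarray(thresholds, dtype=int)
    diffs = pops[:, None] - cuts[None, :]  # rows: municipalities, columns: thresholds
    return diffs[np.abs(diffs) <= window]

def counts_by_distance(distances, window=250):
    """Bin-width-1 histogram: number of cases at each exact distance."""
    counts = np.bincount(np.asarray(distances) + window, minlength=2 * window + 1)
    return {d: int(c) for d, c in zip(range(-window, window + 1), counts)}

# Hypothetical populations and thresholds, for illustration only.
example_pops = [98, 99, 100, 100, 101, 499, 500, 500, 1200]
example_cuts = [100, 500, 1000]
example_dist = distances_to_thresholds(example_pops, example_cuts)
example_counts = counts_by_distance(example_dist)
# Cases one inhabitant short of a threshold vs. exactly at a threshold.
print(example_counts[-1], example_counts[0])
```

Comparing the counts at distances -1 and 0 is the discrete analogue of reading the jump at 0 in the left panel of Figure 2.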
In the right panel of Figure 2 we depict the McCrary test for all thresholds pooled. This
procedure estimates the density of the running variable (i.e. absolute distance in inhabitants
to a population threshold) separately on the left and right of the threshold and tests for a
jump or drop in the density at the threshold. Not surprisingly (given the histograms in the
left panel), the McCrary test indicates a large jump in the estimated density at the threshold.
Note that the shape of the estimated density reflects the aggregation of data from various
thresholds, including the 100-inhabitant threshold around which the density of municipalities
is the largest. In the technical appendix, we explain in detail the issues researchers can
face when pooling different thresholds, why we use absolute (and not relative) deviation as the
score variable, and how to implement the McCrary test correctly when the forcing
variable is discrete.
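To make the logic of the density test concrete, the following is a simplified sketch in the spirit of McCrary (2008): it bins the running variable, smooths the bin heights with a triangular-kernel local linear regression on each side of the threshold, and reports the log difference of the two boundary estimates. It omits the standard error and the bandwidth selection of the full procedure, and the function names, default bin width, and default bandwidth are assumptions made here.

```python
import numpy as np

def local_linear_at_zero(x, y, h):
    """Weighted least squares of y on (1, x) with triangular kernel weights;
    the fitted intercept is the estimate at x = 0."""
    w = np.clip(1.0 - np.abs(x) / h, 0.0, None)
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), x[keep]])
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return beta[0]

def log_density_jump(distances, bin_width=1, h=50):
    """Log difference in the estimated density just above vs. just below 0.
    Bins are [k, k + bin_width), so 0 is always a bin edge."""
    d = np.asarray(distances, dtype=float)
    lo = np.floor(d.min() / bin_width) * bin_width
    hi = np.floor(d.max() / bin_width) * bin_width + bin_width
    edges = np.arange(lo, hi + bin_width / 2, bin_width)
    counts, _ = np.histogram(d, bins=edges)
    mids = edges[:-1] + bin_width / 2
    dens = counts / (len(d) * bin_width)  # normalized bin heights
    right = mids > 0
    f_plus = local_linear_at_zero(mids[right], dens[right], h)
    f_minus = local_linear_at_zero(mids[~right], dens[~right], h)
    # Returns nan if a boundary estimate is non-positive.
    return np.log(f_plus) - np.log(f_minus)
```

With simulated data in which twice as many cases lie just above the threshold as just below it, the function should return a value close to log(2) ≈ 0.69.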
Figure 3 indicates an even more striking pattern for Italy. Based on the five decennial
censuses from 1961 to 2001, we find about 90 cases in which a city cleared the 1,000 or 3,000
population threshold (at which the mayor’s wage increases, among other changes) by fewer
than 10 inhabitants, but we find only about 20 cases in which a city fell short by fewer than 10
inhabitants; in over 300 cases a city cleared one of the thresholds by fewer than 30 inhabitants,
but in fewer than 100 cases did a city fall short by fewer than 30. The pattern of sorting is
Figure 3: Sorting in municipal population in Italy, 1961-2001 pooled
[Left panel: Histograms (bin width = 10), one each for Small thresholds (1000, 2000, 3000); Medium thresholds (4000, 5000); and Large thresholds (> 5000). x-axis: Distance from threshold in population; y-axis: Count. Right panel: Estimated density, all thresholds pooled. x-axis: Distance from threshold in population; y-axis: Estimated density.]
Note: In the left plot the bin width is 10, meaning that the top of the line indicates the number of data points
(municipality-years) with population that is in a given interval (e.g. 40-49 inhabitants) from the threshold.
Otherwise see notes to Figure 2.
just as clear (if not as dramatic) at larger thresholds. Again, the McCrary test aggregating all
thresholds (right panel) indicates a very large jump in the estimated density at the threshold.
In the aggregate, there is not much graphical evidence of sorting in German municipalities
(Figure 4). Here we have annual administrative data from 1998-2007 for municipalities from
all German states, and our analysis is based on a comparison of each municipality’s population
to all thresholds in force in that municipality’s state. Neither the histogram nor the McCrary
plot shows much evidence of a jump in the density above the threshold when we aggregate all
thresholds and all years. As we will see in the next section, there is nonetheless significant
evidence of sorting when we focus on thresholds where more substantial policies are determined.
Figure 4: Sorting in municipal population in German states, 1998-2007 pooled
[Left panel: Histograms (bin width = 10), one each for Small thresholds (< 1000); Medium thresholds (1000−4000); and Large thresholds (5000+). x-axis: Distance from threshold in population; y-axis: Count. Right panel: Estimated density, all thresholds pooled. x-axis: Distance from threshold in population; y-axis: Estimated density.]
Note: In the left plot the bin width is 10, meaning that the top of the line indicates the number of data points
(municipality-years) with population that is in a given interval (e.g. 40-49 inhabitants) from the threshold.
Otherwise see notes to Figure 2.
B. Formal tests at different types of thresholds
We now carry out the McCrary (2008) test for different types of thresholds within countries,
still pooling population figures from the various censuses we have collected. Table 2 reports
the results. In the top row we assess evidence of sorting in all thresholds, reporting the
point estimate and standard error (along with the number of thresholds and observations,12
as explained in the table note) for each test. This test corresponds to the hypothesis that the
discontinuity in the right panel of Figures 2, 3, and 4 is zero. As seen in the previous figures, we
see very clear evidence of substantial sorting in France and Italy (with the latter being quite a
bit larger); the test also indicates small, yet significant evidence of sorting in Germany. In the
other rows we assess sorting at particular types of thresholds, e.g. thresholds where the salary
12
We count only thresholds for which we observe cities within 250 inhabitants of the threshold, which explains
why some of the counts differ from the analysis above.
of the mayor increases, or thresholds where the council size increases, or thresholds where both
increase. In France, we find significant sorting at all types of thresholds, with the smallest
effect (and weakest evidence against the null) at thresholds where council size increases (but
not mayor’s wage) and thresholds at 3,500 inhabitants or higher. In Italy the estimated effects
are much larger, with (as in France) smaller effects at larger population thresholds. To give a
sense of magnitude, a McCrary effect size of 1.3 implies that the density on the right of the
average threshold is over 3.5 times larger than on the left.13 In Germany the effects are much
smaller in magnitude and only statistically significant at some thresholds. Notably, the largest
effects are generally found when both council size and salary of the political personnel are
changing at the same time. Even for Germany, where the overall effects are small, the sorting
effect increases to 0.15, which is no longer negligible.
Comparing the effects by threshold size shows larger effects for smaller thresholds in Italy
and France. This is consistent with the idea that population size is more easily manipulated in
smaller towns. Intriguingly, in Germany the pattern is reversed, with somewhat larger effects
at larger thresholds, which may be partly explained by the fact that the salary of mayors in
Germany often only increases at larger population thresholds.
At the bottom of Table 2 we conduct the McCrary tests at thresholds at which no policy
changes as far as we are aware. We generated placebo thresholds by taking the midpoint
between each actual threshold in each setting (e.g. in France the smallest placebo threshold is
300, which is halfway between 100 and 500) and adding an arbitrary number (117 was picked).
In none of the countries do we find discontinuities in the density at these placebo thresholds.
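The construction of placebo thresholds can be sketched as below. The function name and the `offset` parameter are assumptions made here: with the default offset of 0 the function reproduces the midpoint example given above (300 for the French thresholds 100 and 500), and the arbitrary shift mentioned in the text can be passed explicitly.

```python
def placebo_thresholds(thresholds, offset=0):
    """Midpoints between adjacent actual thresholds, optionally shifted
    by an arbitrary offset (hypothetical parameter)."""
    cuts = sorted(set(thresholds))
    return [(a + b) / 2 + offset for a, b in zip(cuts, cuts[1:])]

# Midpoints of the three smallest French thresholds.
print(placebo_thresholds([100, 500, 1000]))
```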
13
$\log(d_1) - \log(d_0) = 1.3$ implies $d_1/d_0 = \exp(1.3) \approx 3.7$.
Table 2: Summary of McCrary sorting tests

                                 France                           Italy                            Germany
Sample                           # thresholds   McCrary           # thresholds   McCrary           # thresholds   McCrary
                                 (# close obs)  test statistic    (# close obs)  test statistic    (# close obs)  test statistic

Total
  All years, all thresholds      21 (311,392)   0.238*** (0.014)  7 (4,756)      1.328*** (0.136)  195 (101,520)  0.068*** (0.025)

Specific thresholds
  Salary increase                14 (140,421)   0.497*** (0.026)  6 (4,730)      1.331*** (0.134)  78 (11,579)    0.135*** (0.061)
  Salary increase (no council)   7 (35,329)     0.533*** (0.049)  3 (2,125)      0.840*** (0.211)  21 (447)       0.001 (0.321)
  Council increase               15 (267,558)   0.215*** (0.015)  3 (2,605)      1.909*** (0.197)  120 (81,669)   0.071*** (0.026)
  Council increase (no salary)   12 (162,466)   0.139*** (0.018)  0 (0)          n.a.              63 (70,537)    0.063*** (0.029)
  Council and/or salary increase 21 (302,887)   0.240*** (0.014)  6 (4,730)      1.331*** (0.134)  141 (82,116)   0.072*** (0.027)
  Council and salary increase    7 (105,092)    0.475*** (0.029)  3 (2,605)      1.909*** (0.197)  57 (11,132)    0.149*** (0.063)

Threshold size
  Small thresholds (<3500)       7 (306,520)    0.237*** (0.014)  2 (3,295)      1.644*** (0.178)  61 (93,873)    0.054*** (0.026)
  Big thresholds (>=3500)        14 (4,872)     0.239* (0.122)    5 (1,461)      0.700*** (0.247)  134 (7,647)    0.216*** (0.077)

Placebo thresholds               20 (215,986)   0.008 (0.018)     9 (2,800)      -0.044 (0.133)    186 (85,326)   -0.009 (0.025)
Notes: For each test, we report four numbers: the number of unique population thresholds (e.g. 14 in the first test for France) at which we observe
municipalities with populations within 250 inhabitants of the threshold; the number of observations within 250 inhabitants of these thresholds (e.g.
273,274); the estimated difference in log frequency above vs. below the threshold (e.g. 0.256); and the standard error of that estimate (0.015).
V. Sorting and covariate imbalance: the case of Italy
As McCrary (2008) pointed out, a discontinuity in the density of the running variable at the
relevant threshold is neither a necessary condition nor a sufficient condition to create bias in
RDD analysis. The key assumption of RDD, after all, is the continuity of potential outcomes
at the threshold. This assumption may hold even in the presence of substantial sorting (if e.g.
units’ ability to manipulate their population is random, and the resulting measurement error
in the running variable does not bias the results), and it may not hold even in the absence
of sorting (if e.g. there is manipulation in both directions, such that the density remains
continuous). Thus while discontinuities of the kind we documented in the previous section
should certainly raise red flags about the validity of the RDD, the most direct test of the
key RDD assumption is to check for discontinuities in pre-treatment covariates. Locating
imbalanced covariates is also important because these covariates can be included in the RDD
to guard against bias caused by strategic sorting, as we document below. These covariates
may also help us to explain how manipulation is taking place, which could be viewed as an
interesting research question in its own right. We focus on Italy mainly because it showed by
far the largest extent of aggregate sorting.
Figure 5 shows RDD analysis in which the dependent variable is the lagged treatment,
meaning an indicator for whether a municipality was above a given threshold in the previous
census, given that it was close to that threshold in the current census. The top two plots show
this analysis for thresholds at which the salary of the mayor increases. The top left panel
shows that the probability of lagged treatment increases with the current running variable
(as one might expect) but jumps at the threshold, indicating that cities that are narrowly
above the threshold now are more than 10 percentage points more likely to have been above the
threshold in the previous census. The top right panel shows how the estimated jump varies
with the (triangular) bandwidth employed for the local linear regression; the black dot shows
the bandwidth suggested by the Imbens-Kalyanaraman algorithm (and employed in the figure
at left). The analysis thus indicates that, for this covariate at least, differences between
treated and untreated cities do not disappear when we focus on the threshold. Although
random variation in the population from one census to the next should produce continuity in
this covariate across the threshold, we do not find this in the Italian case.
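A covariate check of this kind can be sketched as a local linear regression with a triangular kernel, fitted separately on each side of the threshold. The sketch below is illustrative only: the function names are hypothetical, the bandwidth is supplied by the user (the Imbens-Kalyanaraman selector is not implemented), and no standard errors are computed.

```python
import numpy as np

def _intercept_at_cutoff(x, y, h):
    """Triangular-kernel weighted least squares of y on (1, x); returns the fit at x = 0."""
    w = np.clip(1.0 - np.abs(x) / h, 0.0, None)
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), x[keep]])
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return beta[0]

def rd_jump(distance, outcome, h):
    """Estimated jump in the outcome at distance 0; units with distance >= 0 count as above."""
    x = np.asarray(distance, dtype=float)
    y = np.asarray(outcome, dtype=float)
    above = x >= 0
    return (_intercept_at_cutoff(x[above], y[above], h)
            - _intercept_at_cutoff(x[~above], y[~above], h))

def bandwidth_sensitivity(distance, outcome, bandwidths):
    """Estimated jump for each bandwidth, as in a bandwidth-sensitivity plot."""
    return {h: rd_jump(distance, outcome, h) for h in bandwidths}
```

Passing an indicator for lagged treatment as `outcome` and the current distance to the threshold as `distance` gives the kind of estimate plotted in Figure 5; calling `bandwidth_sensitivity` over a grid of bandwidths gives the corresponding sensitivity curve.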
Note that this persistence in treatment status could simply be a result of population figures
being very sticky from one census to the next: we would expect to see differences in whether cities
were above or below any arbitrary population number if a substantial proportion of cities
simply did not change in population from one census to the next. The bottom two plots of
Figure 5 indicate that the imbalance in lagged treatment status we see at thresholds where
salary changes is not simply due to stickiness. In those plots we carry out the same analysis
where the thresholds we consider are “placebos” at which neither salary nor any other policy
changes, and we find no evidence of similar persistence around these thresholds.
Table 3 takes this analysis further. Each of the point estimates in this table is an RDD-based
estimate of the effect of crossing population thresholds on the covariate given at left; for
example, the estimated effect of crossing thresholds where salary changes on lagged treatment
(examined graphically in Figure 5) is reported as 0.138 with a standard error of 0.037. Columns
2 to 5 carry out the same analysis but include covariates in the RDD analysis: in column
2 we include dummies for each year of the census; in column 3 we include dummies for each
threshold; in column 4 we include both; and in column 5 we include a set of covariates describing
Italian municipalities around the year 2002.14 Columns 6 to 8 show the models from columns
1, 3, and 5 but focusing on “placebo” thresholds where no policy changes. It should be clear
that there is no reason why crossing a placebo threshold should affect the probability that a
city is in the South or any of the other outcomes we consider in Table 3; to the extent that
these variables describe background features of the municipalities rather than outcomes that
could be affected by the mayor’s salary or any other policy choices, there is also no reason why
crossing the threshold where salary changes should produce any effects.
We have already seen (in Figure 5) that crossing salary thresholds seems to “affect” the
probability of treatment in the previous census (but that the same is not true for placebo
thresholds). In the top row of Table 3 we see that the imbalance at salary thresholds persists
when we control for the year of the census and the threshold being considered, though the
14
The covariates are the (log) number of nonprofits per person, the proportion of inhabitants who give blood,
the ratio of young to old inhabitants, an indicator for the South, an indicator for whether the municipality is
on the seaside, and the proportion of second homes in the municipality.
imbalance mostly disappears when we include municipal covariates in column 5. (The null
effect at placebo thresholds is robust to the inclusion of covariates.) The second row indicates
that we do not find a similar effect for the lagged running variable.
In the third and fourth rows of Table 3 we see evidence of imbalance in whether the
council size changes at the threshold as well as in the year of the census being considered.
To understand these findings, note that our analysis pools cases across thresholds and years;
thus near the threshold we have cases where one city was just below a threshold where salary
changed in 1971 and another city was just above a threshold where both salary and council size
changed in 2001. In an RDD environment without fine manipulation of the running variable,
the distribution of characteristics just above the threshold and just below the threshold should
be identical. How could the observed imbalance arise? In the case of the threshold type (council
size change or not), we think the reason is that sorting is more pronounced at thresholds
where both salary and council size change (as seen above); this means that when we pool cases
that were near different thresholds, we find a greater concentration of those near council size
thresholds above than below. Similarly, the reason why the year of the census is unbalanced
is that sorting was worse in earlier censuses (as we have confirmed in separate analyses).
Figure 5: Imbalance in lagged treatment in Italy
[Four panels. Top row (salary thresholds): RD Plot: Lagged treatment (x-axis: Distance to threshold; y-axis: Lagged treatment) and BW sensitivity: Lagged treatment (x-axis: Bandwidth of local linear regression; y-axis: Estimated effect). Bottom row (placebo thresholds): the same two panels.]
Note: The dependent variable in the RDD analysis above is “lagged treatment” – an indicator for whether a
municipality was above a given threshold in the previous census, given that it was close to that threshold in
the current census. The left panel in each pair shows the dependent variable in binned means of the running
variable (gray dots) and the local linear regression estimate at the Imbens-Kalyanaraman optimal triangular
bandwidth; the right panel shows the sensitivity of the estimated effect to the bandwidth, where the optimal
is shown with a black dot.
Table 3: “Effects” of crossing threshold on covariates (Italy)

                                              Thresholds where salary changes                        “Thresholds” w. no changes
Covariate                            Obs [BW] (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)

Lagged treatment                     2592     .138***    .113**     .128***    .108**     .041       .024       .028       -.055
                                     [132]    (.037)     (.037)     (.037)     (.037)     (.036)     (.041)     (.041)     (.040)
Lagged running variable              2522     80.115     55.194     84.583     63.567     27.862     95.758     124.452    1.985
                                     [128]    (63.630)   (63.642)   (58.183)   (57.660)   (53.600)   (97.363)   (79.532)   (72.847)
Both salary and council size change  1590     .268***    .207***
                                     [81.7]   (.049)     (.046)
Year of census                       3056     -7.091***             -3.172***                        -2.871†    -1.988
                                     [158.8]  (1.045)               (.889)                           (1.620)    (1.386)
Population at threshold (in 1000s)   1678     .147       -.091                                       .346
                                     [86.4]   (.401)     (.396)                                      (.532)
Log nonprofits/person                1647     .001       .011       .026       .023       -.000      .004       .002       -.052
                                     [84.5]   (.057)     (.058)     (.058)     (.058)     (.058)     (.045)     (.045)     (.045)
Proportion donating blood            1570     -.003      -.003      -.003      -.003      -.004*     -.001      -.001      -.001
                                     [80.5]   (.002)     (.002)     (.002)     (.002)     (.002)     (.002)     (.002)     (.002)
Log young/old ratio                  1815     .045       -.089*     -.045      -.089*     -.086*     .027       -.007      -.035
                                     [93.8]   (.056)     (.040)     (.047)     (.040)     (.039)     (.053)     (.045)     (.034)
South                                1777     .037       .037       .036       .033       -.023      -.012      -.010      -.031
                                     [91.3]   (.048)     (.048)     (.048)     (.048)     (.040)     (.050)     (.050)     (.041)
Seaside                              2077     -.026      -.033      -.031      -.032      -.033      -.012      -.017      -.044*
                                     [106.9]  (.023)     (.023)     (.023)     (.023)     (.024)     (.020)     (.020)     (.020)
Proportion vacation homes            1897     .025       .043†      .033       .037†      .037†      .032       .040†      .021
                                     [97.8]   (.022)     (.022)     (.022)     (.022)     (.022)     (.022)     (.021)     (.020)

Year dummies:                                            X                     X          X                                X
Threshold dummies:                                                  X          X          X                     X          X
Other municipal characteristics:                                                          X                                X
Notes: Each point estimate comes from a different RDD analysis in which the row variable is the dependent variable and we pool data
from multiple censuses and population thresholds. Model (1) includes no extra control variables; Model (2) includes a dummy for
threshold type (e.g. mayor’s salary, council size, both); Model (3) includes a dummy for each threshold (e.g. 1000, 2000); Model (4)
includes controls for municipality characteristics (e.g. South, blood donations) other than the dependent variable; Model (5) includes
threshold dummies and municipal characteristics.
To reiterate, these two imbalances – in whether the council size changes and in the year of
the census – arise because we are pooling data from multiple thresholds and multiple censuses.
It should be clear that neither of these particular problems would arise if we focused on a single
threshold in a single year (although imbalance in other covariates might). Pooling across
thresholds and/or time is often an attractive strategy in RDD studies based on population
thresholds, and our analysis shows that when sorting is a possibility, covariate imbalances
can arise in pooled data because of differences in the rate of sorting across thresholds and/or
time. Note also that these imbalances could cause serious biases in analysis if e.g. outcomes
vary across different types of thresholds or across years. Suppose for example that the outcome
one is studying is trending upward over time, and that sorting at a given threshold is also
getting worse over time. Then pooled RDD analysis at that threshold will show an apparent
positive effect even if the true effect is zero, simply because more recent years (and thus higher
outcomes) will be over-represented on the right side of the threshold. The clear implication is
that studies that pool data across thresholds and years should control for the threshold and
year whenever sorting seems like a possibility.
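The mechanism in this example can be illustrated with a small simulation. It is a stylized sketch under assumptions made here, not a reproduction of any analysis in the paper: the outcome does not depend on the running variable (so the RDD reduces to a difference in means within a narrow window), the true effect is zero, the outcome rises by `trend` per census, and the share of below-threshold cities that sort above the threshold grows linearly over time.

```python
import numpy as np

def simulate_pooled_window(n_per_year=20000, n_years=5, m=20, trend=1.0, seed=0):
    """Pooled data on cities within m inhabitants of a threshold, with zero true effect."""
    rng = np.random.default_rng(seed)
    year, above, y = [], [], []
    for t in range(n_years):
        true_dist = rng.integers(-m, m, size=n_per_year)   # true distance to the threshold
        share = 0.15 * t                                   # sorting worsens over time (assumed)
        sorts = (true_dist < 0) & (rng.random(n_per_year) < share)
        above.append((true_dist >= 0) | sorts)             # sorters report a population above the cutoff
        year.append(np.full(n_per_year, t))
        y.append(trend * t + rng.normal(size=n_per_year))  # outcome trends upward, no treatment effect
    return np.concatenate(year), np.concatenate(above), np.concatenate(y)

def naive_difference(above, y):
    """Pooled difference in means above vs. below the threshold."""
    return y[above].mean() - y[~above].mean()

def within_year_difference(year, above, y):
    """Coefficient on `above` after partialling out year dummies."""
    d = above.astype(float)
    y_res, d_res = y.copy(), d.copy()
    for t in np.unique(year):
        sel = year == t
        y_res[sel] -= y[sel].mean()
        d_res[sel] -= d[sel].mean()
    return (d_res @ y_res) / (d_res @ d_res)

yr, ab, yy = simulate_pooled_window()
print(naive_difference(ab, yy), within_year_difference(yr, ab, yy))
```

Under these assumptions the pooled difference is clearly positive even though the true effect is zero, while the estimate that controls for year is close to zero, because sorting in this sketch is unrelated to outcomes within a given year.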
The rest of Table 3 reports similar analysis for a set of covariates we selected because
we thought they might explain the aggregate sorting in Italy. For example, we speculated
that measures of social capital might be imbalanced at thresholds because places with lower
social capital manage to manipulate census numbers (perhaps through active housing policy)
more effectively; we therefore tested for imbalance in the number of nonprofit organizations
per person and the rate of blood donations. We find no imbalance in these covariates in
the raw RDD; when we control for other municipal characteristics we do (perhaps oddly)
find some imbalance. We do find some imbalance in the proportion of young to old citizens
when we control for the year of the census: the analysis indicates that cities just to the
right of a threshold currently have a somewhat older population than cities just to the left of
the threshold; this difference is robust to the additional inclusion of threshold dummies and
municipal covariates. We find no imbalance in an indicator for whether the city is in the south
of Italy. Finally, we checked for imbalances in an indicator for whether the city is located by
the sea and in the proportion of second homes (partly because we thought that one way to
increase the official number of residents would be to encourage some part-time residents to
be registered in the municipality); we find no imbalances in the seaside proportion, but once
we control for the year we find borderline significant (p < .1) imbalance in the proportion of
second homes, consistent with our speculation.15
What can we conclude from this evidence on covariate imbalance at salary thresholds in
Italy? The optimistic conclusion is that researchers can productively conduct RDD in Italy
using multiple thresholds and/or multiple censuses as long as they include appropriate controls,
which based on this analysis would include an indicator for lagged treatment, indicators for the
year and the threshold, and controls for the age structure of the population and the proportion
of second homes. The pessimistic conclusion is that there are many covariates we have not
tested (and many that are unobservable and thus untestable), and thus RDD analysis is likely to
be biased even after controlling for the set of covariates we have tested here. Clearly, strategic
sorting undermines the appeal of the RDD, and one would prefer RDD without sorting to
RDD with sorting (no matter how many covariates one can include). The practical research
design question, in our view, is whether to prefer an RDD to a pure observational study in
which the units freely choose their treatments. We consider this question in the next section.
VI. Discussion: when is an RDD with sorting still better than the
alternative?
Suppose one has the opportunity to use RDD to study the effect of a municipal policy that
depends on a population threshold, but there is evidence of strategic sorting; alternatively, one
can study the same policy in another setting where there is policy variation but cities choose
their policies freely. Which should one choose?
We offer the following thought experiment as a way to clarify some of the subtle tradeoffs
involved in the comparison between a flawed RDD and a pure observational study. Suppose
we wish to study the effect of policy A vs. A′; we have the choice of examining outcomes in one
15
An important caveat is that these covariates are measured at one point in time: the analysis tells us that
cities that have more nonprofits per capita now are not over-represented on the right side of salary thresholds
on average over the censuses in our dataset. If we had measures of this and other covariates across time,
we would expect to find imbalance for the same reason we find imbalance in the year of the census, which is
that the sorting also varied over time.
setting where the policy is assigned to cities based on a population threshold but there is some
strategic sorting or in another setting where the policy is freely chosen by the cities themselves.
To make the comparison as simple as possible, we assume that we would be studying cities
around the same size in either setting. In the setting where the policy depends on a threshold
(the “RDD setting”), cities get policy A unless their reported population is c or higher (in
which case they get policy A′), but a proportion p of the cities are able to add or subtract
up to m inhabitants at a small cost; thus for p > 0 we expect to see strategic sorting in the
(reported) population interval [c − m, c + m), as some cities with true populations above the
threshold will have reported a population below the threshold and vice versa. In the other
setting (the “pure observational setting”), cities simply choose their own policy, and choices
are diverse such that in the population interval [c − m, c + m) we see both A and A′. The
question is, under what conditions do we prefer the estimate from the RDD setting (where for
p > 0 we have strategic sorting) to the estimate from the pure observational setting?
We begin by making the most general argument for RDD that we can. Suppose that we
prefer the RDD estimate when p = 0 to the pure observational estimate, perhaps because there
are unobserved variables in the assignment of treatment in the pure observational case. If we
assume in addition that the potential outcomes are unrelated to population in both the RDD
and the pure observational case, then we can state that the RDD is always preferred to the
pure observational study, regardless of the amount of strategic sorting in the RDD case. To
see this, note that (given the independence of potential outcomes and population) the optimal
estimator in the RDD case when p = 0 (i.e. there is no sorting) is simply the difference in
means between units above and below the threshold; the optimal estimator in the RDD case
when p = 1 and no city is indifferent between policies (i.e. sorting is complete) is the optimal
estimator in the pure observational study. If the optimal estimator in the pure observational
study is simply the difference in means between the two treatment groups, then in the RDD
case the difference in means will yield a weighted average of the RDD estimate and the pure
observational estimate, with weights 1−p and p respectively. Because we have assumed that the
RDD estimate is preferred to the pure observational estimate, we can conclude that the RDD
setting is strictly preferred to the pure observational setting for all p < 1. Put simply, under
the assumptions we have made above, the worst case for RDD analysis with strategic sorting
is the estimate we would get in the corresponding pure observational study. This suggests
that RDD analysis may be preferred to a pure observational study even in the presence of
substantial sorting – assuming that the researcher accounts for sorting as well as she would
account for treatment assignment in the pure observational study.
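The weighted-average step in this argument can be written out as follows. The symbols are introduced here for illustration and are not notation from the analysis above: $\tau$ is the true effect, $\hat{\tau}(p)$ is the difference in means in the RDD setting when a proportion $p$ of cities can sort, and $b_{\mathrm{RDD}}$ and $b_{\mathrm{obs}}$ are the biases of the difference in means in the clean RDD and in the pure observational study. Reading "preferred" as "smaller absolute bias" is an assumption of this sketch.

```latex
\begin{align*}
\mathbb{E}\big[\hat{\tau}(p)\big]
  &= (1-p)\,(\tau + b_{\mathrm{RDD}}) + p\,(\tau + b_{\mathrm{obs}})
   = \tau + (1-p)\,b_{\mathrm{RDD}} + p\,b_{\mathrm{obs}}, \\
\big|\mathbb{E}[\hat{\tau}(p)] - \tau\big|
  &\le (1-p)\,|b_{\mathrm{RDD}}| + p\,|b_{\mathrm{obs}}|
   < |b_{\mathrm{obs}}|
   \quad \text{whenever } |b_{\mathrm{RDD}}| < |b_{\mathrm{obs}}| \text{ and } p < 1 .
\end{align*}
```

At $p = 1$ the bound is attained with equality, which corresponds to the statement that the worst case for the RDD with sorting is the pure observational estimate.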
When we deviate from the assumptions employed in the previous paragraph, however, the
case for RDD in the presence of sorting becomes less clear. The crucial assumption is the
independence of potential outcomes from population. Even if we maintain the assumption
that the clean RDD estimate is preferred to the pure observational estimate, the dependence
of potential outcomes on population could introduce problems such that there is a level of p
above which the pure observational setting would yield better answers. To see this, note that
the pure observational setting has one clear advantage over the RDD setting with substantial
sorting: in the pure observational setting we observe the true population. The problem with
the RDD setting when p is high is that, while we can control for other covariates that affect
treatment assignment in the pure observational study, we cannot control for true population;
as a result, RDD estimates with high p may be inferior to the pure observational estimates due
to what we may think of as either omitted variable bias or non-random measurement error.
What does this analysis suggest about how researchers might choose between a problematic
RDD analysis and a pure observational study? We note three considerations:
• Clearly, if the pure observational study would be preferred to the RDD analysis even in
the absence of sorting (e.g. because it is less local or has lower variance) then there is
even more reason to prefer the observational study when there is sorting.
• If the RDD would be preferred in the absence of sorting and the researcher is equally able
to observe covariates that determine treatment in the two settings, then the RDD setting
may be preferable to the pure observational setting even in the presence of substantial
sorting.
• The argument of the previous point weakens as the dependence of potential outcomes
on the (unmanipulated) running variable gets stronger and the bias from measurement
error in the problematic RDD becomes more severe.
VII. Conclusion
We have documented two major challenges that require careful attention when using population
thresholds in regression discontinuity designs: the same population cutoff is often used to
determine several policies (making identification and interpretation challenging), and in some
settings it appears that municipalities with populations near thresholds manage to manipulate
their populations to achieve desirable policies. We find evidence of both problems in France,
Italy, and Germany, three countries that have used such thresholds since the 19th century
or before. We expect these problems to appear in the many other countries in which similar
population-based policies are found.
Although both problems diminish the appeal of the RDD as a means of learning about the
effect of policies, we emphasize that the RDD may still be an attractive approach even when
both problems are present. We highlight a number of approaches for disentangling the effects
of policies that change at some of the same thresholds. We examined covariate imbalance in
Italy and discussed the importance of controlling for factors that are associated with sorting,
which may include the type of threshold or the vintage of the census in analysis that pools
data from several thresholds. Finally, we presented assumptions under which an RDD with
sorting is always preferable to a pure observational study, and we discussed deviations from
those assumptions that could lead to the opposite conclusion.
Our work leaves open important questions for future study. One such question is how
the strategic sorting that we document could be accomplished. In the French case, where
the municipality has always played a role in the collection of census data (e.g. recruiting and
training the staff that carries out the census), one can easily imagine how local officials could
encourage more effort to identify local inhabitants when it is known that the municipality
would benefit from doing so. (Consistent with this, sorting has declined since 1999 as the
central government has taken a more active supervisory and training role at the local level.)
In Italy, where the census is carried out in a much more centralized way, the mechanism is less clear.
Our analysis suggests a kind of “ratchet effect” at salary thresholds (producing persistence in
treatment over time); it also indicates that sorting was more severe in the past and (controlling
for the vintage of the census) municipalities with older populations and more second homes
are more effective at strategic sorting. It remains to explore the mechanisms by which this
was achieved. While our focus has been on how researchers can use population thresholds to
learn about the effects of political institutions and municipal policies, it may be enlightening
to study further the incentives these thresholds induce and what determines the local response
to these incentives.
References
Arnold, Felix and Ronny Freier. 2015. “Signature requirements and citizen initiatives: Quasi-experimental evidence from Germany.” Public Choice 162(1):43–56.
Asatryan, Zareh, Thushyanthan Baskaran and Friedrich Heinemann. 2014. The effect of direct
democracy on the level and structure of local taxes. Technical report ZEW Discussion Papers.
Asatryan, Zareh, Thushyanthan Baskaran, Theocharis Grigoriadis and Friedrich Heinemann.
2013. “Direct Democracy and Local Public Finances Under Cooperative Federalism.” ZEW
Working Paper 13-038.
Baques, Manuel and Pamela Campa. 2012. “Gender Quotas, Female Politicians, and Public
Expenditures: Quasi Experimental Evidence.” Working Paper .
Barone, Guglielmo and Guido De Blasio. 2013. “Electoral rules and voter turnout.” International Review of Law and Economics 36:25–35.
Barreca, Alan I, Melanie Guldi, Jason M Lindo and Glen R Waddell. 2011. “Saving Babies? Revisiting the effect of very low birth weight classification.” The Quarterly Journal of
Economics 126(4):2117–2123.
Baskaran, Thushyanthan. 2012. “The flypaper effect: evidence from a natural experiment in
Hesse.”.
Baskaran, Thushyanthan and Mariana Lopes da Fonseca. 2014. “Electoral competition and
endogenous political institutions: quasi-experimental evidence from Germany.” Working
Paper Econstor .
Bordignon, Massimo, Tommaso Nannicini and Guido Tabellini. 2013. “Moderating Political
Extremism: Single Round vs. Runoff Elections under Plurality Rule.” WP .
Brollo, Fernanda and Tommaso Nannicini. 2012. “Tying Your Enemy’s Hands in Close Races:
The Politics of Federal Transfers in Brazil.” American Political Science Review 106:742–761.
Casas-Arce, Pablo and Albert Saiz. 2015. “Women and power: unpopular, unwilling, or held
back?” Journal of Political Economy, forthcoming .
Caughey, Devin and Jasjeet S Sekhon. 2011. “Elections and the regression discontinuity design:
Lessons from close U.S. House races, 1942–2008.” Political Analysis 19(4):385–408.
De Benedetto, Marco Alberto and Maria De Paola. 2014. Candidates’ Quality and Electoral
Participation: Evidence from Italian Municipal Elections. Technical report IZA Discussion
Paper.
Egger, Peter and Marko Koethenbuerger. 2010. “Government spending and legislative organization: Quasi-experimental evidence from Germany.” American Economic Journal: Applied
Economics pp. 200–212.
Eggers, Andrew. 2015. “Proportionality and Turnout: Evidence from French Municipalities.”
Comparative Political Studies, forthcoming 48(2):135–167.
Eggers, Andrew C, Anthony Fowler, Jens Hainmueller, Andrew B Hall and James M Snyder.
2015. “On the validity of the regression discontinuity design for estimating electoral effects:
New evidence from over 40,000 close races.” American Journal of Political Science 59(1):259–
274.
Ferraz, Claudio and Frederico Finan. 2009. Motivating politicians: The impacts of monetary incentives on quality and performance. Technical report National Bureau of Economic
Research.
Fujiwara, Thomas. 2011. “A regression Discontinuity Test of Strategic Voting and Duverger’s
Law.” Quarterly Journal of Political Science 6(3-4):197–233.
Gagliarducci, Stefano and Tommaso Nannicini. 2013. “Do better paid politicians perform better? Disentangling incentives from selection.” Journal of the European Economic Association
11(2):369–398.
Grembi, Veronica, Tommaso Nannicini and Ugo Troiano. 2012. “Policy Responses to Fiscal
Restraints: A Difference-in-Discontinuities Design.” IZA Working Paper n.6952 .
Gulino, Giorgio. 2014. “Do Electoral Systems Affect the Incumbent Probability of Re-election?
Evidence from Italian Municipalities.” Working Paper presented at the EEA/ESEM .
Hinnerich, Björn Tyrefors and Per Pettersson-Lidbom. 2014. “Democracy, Redistribution, and
Political Participation: Evidence From Sweden 1919–1938.” Econometrica 82(3):961–993.
Hirota, Haruaki and Hideo Yunoue. 2012. “Local Government Expenditures and Council Size:
Quasi Experimental Evidence from Japan.”.
Hirota, Haruaki and Hideo Yunoue. 2013. “Does Local Council Size Affect Land Development
Expenditures? Quasi Experimental Evidence from Japanese Municipal Data.”.
Hopkins, Daniel J. 2011. “Translating into Votes: The Electoral Impact of Spanish-Language
Ballots.” American Journal of Political Science 55(4):814–830.
Imbens, Guido W and Thomas Lemieux. 2008. “Regression discontinuity designs: A guide to
practice.” Journal of Econometrics 142(2):615–635.
Koethenbuerger, Marko. 2012. “Do Political Parties Curb Pork-Barrel Spending?
Municipality-Level Evidence from Germany.” CESifo Discussion Paper.
Lee, David S. and Thomas Lemieux. 2009. Regression Discontinuity Designs in Economics.
Working Paper 14723 National Bureau of Economic Research.
Litschig, Stephan. 2012. “Are Rules-based Government Programs Shielded from Special-Interest Politics? Evidence from Revenue-Sharing Transfers in Brazil.” Journal of Public
Economics 96:1047–1060.
Litschig, Stephan and Kevin M. Morrison. 2013. “The Impact of Intergovernmental Transfers
on Education Outcomes and Poverty Reduction.” American Economic Journal: Applied
Economics 5(4):206–240.
Litschig, Stephan and Kevin Morrison. 2010. “Government spending and re-election: Quasi-experimental evidence from Brazilian municipalities.” UPF Discussion Paper.
Lyytikäinen, Teemu and Janne Tukiainen. 2013. “Voters are Rational.” Government Institute
for Economic Research Working Papers (50).
McCrary, Justin. 2008. “Manipulation of the running variable in the regression discontinuity
design: A density test.” Journal of Econometrics 142(2):698–714.
Mukherjee, Mukta. 2011. “Do Better Roads Increase School Enrollment? Evidence from a
Unique Road Policy in India.” Syracuse University .
Pellicer, Miquel and Eva Wegner. 2013. “Electoral Rules and Clientelistic Parties: A Regression
Discontinuity Approach.” Quarterly Journal of Political Science 8(4):339–371.
Pettersson-Lidbom, Per. 2006. “Does the Size of the Legislature Affect the Size of Government?
Evidence from Two Natural Experiments.” Stockholm University Working Paper .
Pettersson-Lidbom, Per. 2012. “Does the Size of the Legislature Affect the Size of Government:
Evidence from Two Natural Experiments.” Journal of Public Economics 98(3–4):269–278.
Urquiola, Miguel and Eric Verhoogen. 2009. “Class-size caps, sorting, and the regression-discontinuity design.” The American Economic Review 99(1):179–215.
van der Linde, Daan, Swantje Falcke, Ian Koetsier and Brigitte Unger. 2014. “Do Wages Affect
Politicians’ Performance? A regression discontinuity approach for Dutch municipalities.”
Utrecht University School of Economics Discussion Paper 14-15.
Weingast, Barry R, Kenneth A Shepsle and Christopher Johnsen. 1981. “The political economy
of benefits and costs: A neoclassical approach to distributive politics.” The Journal of
Political Economy pp. 642–664.
VIII. Technical appendix
A. Two issues with discrete variables in the McCrary sorting test
The McCrary test based on the work by McCrary (2008) has become a standard method to test for sorting
in RDD settings. The test checks for manipulation of the RDD running variable by closely examining the
distribution of this variable around the threshold. Importantly, the test has been designed for continuous
variables. In this appendix, we illustrate two difficulties that researchers need to consider when applying this
test to a setting where the underlying RDD running variable is discrete.
Issue 1: Bin size selection with discrete variables
The first issue relates to the selection of one of the key parameters in the test, the bin size. The McCrary
test proceeds in two steps. In the first step, the bin size is used to produce an undersmoothed histogram of
the data. In the second step, this histogram is smoothed using a local linear regression (which involves
selecting a bandwidth h).
McCrary (2008) suggests the following bin size selection procedure:
$$\hat{b} = 2\,\hat{\sigma}\, n^{-1/2} \tag{A.1}$$
where $\hat{\sigma}$ is the sample standard deviation of the forcing variable. Naturally, $\hat{b}$ is not confined to the values of
the discrete variable distribution.
Let us first consider how the bin size works in discrete distributions. Assume that we have a discrete distribution
which takes the values -10 to 9 in increments of 1. The threshold is at 0 (which is counted as treated).
Further assume that we observe 100 observations for each discrete value. Note that, for a bin size x, the bins
in the McCrary test are defined as {..., [−2x, −x), [−x, 0), [0, x), [x, 2x), ...}.
Now suppose that we use a cut-off point exactly at 0 and choose a bin size of 1. That gives us 20 equally
sized bins with 100 observations each. If we now change the bin size to 0.5, we observe a crucial imbalance. To
the right of the threshold (the treatment side) we first have a bin [0,0.5) with 100 observations, followed by an
empty bin [0.5,1). On the left of the threshold, we observe the reverse. Here, the first bin [-0.5,0) is empty, and
the second bin [-1,-0.5) holds 100 observations. A bin size smaller than the discrete steps in the distribution
thus creates empty bins which are distributed asymmetrically to the right and left of the threshold.
Even for non-integer bin sizes larger than one, there will tend to be more possible population values per bin
to the right than to the left. Consider a bin size of 1.5 in the above example. The first bin to the right, [0,1.5),
now includes 200 observations, while the second bin, [1.5,3), holds only 100 observations. Again, the reverse is
true for the left side. Here, the first bin, [-1.5,0), includes only 100 observations, and the second bin, [-3,-1.5),
includes 200 observations. When applying the McCrary test to a discrete running variable with a non-integer
bin size, there will be a bias toward finding a jump in population. That is because for any bin size x and any
k > 0 the number of integer values in [0, kx) is weakly larger than the number of integer values in [−kx, 0).
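The bin counts in this example can be reproduced with a short script. The following is a minimal sketch, not part of the McCrary routine: the function name `bin_counts` is hypothetical, and exact fractions are used only to avoid floating-point issues at bin edges.

```python
from collections import Counter
from fractions import Fraction
from math import floor

def bin_counts(values, bin_size, obs_per_value=100):
    """Count observations per bin, with bins [j*x, (j+1)*x) anchored at 0."""
    x = Fraction(bin_size)
    counts = Counter()
    for v in values:
        j = floor(Fraction(v) / x)  # index of the bin that contains v
        counts[j] += obs_per_value
    return counts

values = range(-10, 10)  # discrete support -10, ..., 9 from the example

for size in ("1", "1/2", "3/2"):
    c = bin_counts(values, Fraction(size))
    # bins -2 and -1 lie just left of the threshold, bins 0 and 1 just right
    print(size, [c.get(j, 0) for j in (-2, -1, 0, 1)])
```

With a bin size of 1 the four bins around the threshold hold 100 observations each; with 0.5 the pattern is 100, 0, 100, 0; and with 1.5 it is 200, 100, 200, 100, matching the asymmetry described above.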
To illustrate the problems in the estimation of the McCrary statistic, we simulated a data set consisting of
200,000 observations (perfectly) uniformly distributed over 500 discrete values, the integers from -250 to 249.
We set the threshold to 0 (zero is included in the treatment). Given this setup, the data are constructed such
that no sorting of any magnitude should be found. Implementing the McCrary Stata routine (using the cut-off
point at 0 and leaving the bin size and bandwidth to be calculated by the algorithm), we obtain an estimate
of 0.031 (0.011), a bin size of 0.65, and a bandwidth of 191.3, which falsely indicates positive sorting.
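The reported bin size can be checked against equation (A.1). The sketch below rebuilds the simulated support and applies the formula directly; the function name `mccrary_bin_size` is hypothetical, and we assume the routine applies (A.1) to the raw running variable without further adjustment.

```python
import math
import statistics

def mccrary_bin_size(values):
    """Default bin size from (A.1): 2 * sample std. dev. * n^(-1/2)."""
    n = len(values)
    return 2 * statistics.stdev(values) / math.sqrt(n)

# 200,000 observations spread evenly over the integers -250, ..., 249
values = [v for v in range(-250, 250) for _ in range(400)]

print(round(mccrary_bin_size(values), 2))  # 0.65
```

The result is below the increment of 1 in the running variable, which is exactly the situation in which empty bins arise asymmetrically around the threshold.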
We then varied the bin size. In the left panel of figure 6, we vary the bin size between 0.5 and 3.
The figure shows that, depending on the exact bin size, the estimated McCrary statistic varies significantly
and often signals a false positive sorting result.
In the right panel of figure 6, we illustrate a similar problem of discrete bins when the cut-off value in the
McCrary test is varied. Note that, due to the discrete values, there is no true cut-off point. Any value in
(-1,0] could be said to be the cut-off point. In the graph we highlight that the estimated McCrary statistic can
vary from significantly negative to significantly positive depending on the exact position of the cut-off value.
In figure 7, we again highlight the problem that certain bin sizes create artificial jumps in the density of the
running variable (by construction). In the upper left panel, we use a bin size of 0.5, which creates empty bins.
With a bin size of 1 (upper right panel), all bins hold the same number of observations. Increasing the bin size
to 1.5 (lower left panel) again shows that the choice of bin size creates distinct sets of bins with more or fewer
observations in them.
It is important to understand how the bin size interacts with the second important parameter in the test
statistic, the bandwidth h. For the case of continuous variables, McCrary (2008) finds that the choice of bin
size is unimportant as long as h is large compared to b (he suggests h/b > 10). Even for discrete variables,
any asymmetric grouping in the bins will become less important with a larger bandwidth. However, in our
simulations we found that while an increased bandwidth had a mitigating effect on the bias from choosing a
particular bin size, a false positive test result could be obtained even with h/b > 100 or more.
We see two solutions to this problem. First, one can restrict the bin size in the McCrary test to the set of
integers. Guidance as to which multiple of the increment to choose can be obtained from the original McCrary
test. If the bin size chosen by the original McCrary procedure is below 1, the minimum bin size of 1 should be
used. For larger bin sizes, the closest multiple of the increment of the discrete running variable should be used.
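The first solution amounts to a simple rounding rule. The following is a minimal sketch under stated assumptions: the function name `snap_bin_size` and the `increment` parameter are hypothetical, and ties (a bin size exactly halfway between two multiples) are resolved by Python's built-in `round`, which rounds halves to the nearest even multiple; the text does not specify a tie rule.

```python
def snap_bin_size(b, increment=1):
    """Round a bin size to the nearest positive multiple of `increment`.

    Bin sizes below one increment are raised to exactly one increment.
    """
    multiples = max(1, round(b / increment))
    return multiples * increment

print(snap_bin_size(0.65))  # 1
print(snap_bin_size(2.6))   # 3
```

Here 0.65 would be the bin size proposed by the original procedure for the simulated data, and 1 the bin size actually passed to the test.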
A second solution works as follows. Redefine the breakpoint, c, in the McCrary test away from zero to the
midpoint between 0 and the adjacent value below it in the discrete distribution (in our example above, c = -0.5).
Now define the bins to the right and left of this new breakpoint as
{..., [c − 2x, c − x), [c − x, c), (c, c + x], (c + x, c + 2x], ...}.
This solves the asymmetry problem, does not throw out any data, and allows the McCrary
procedure to pick the bin size and bandwidth with the original algorithm.
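The symmetry produced by the shifted breakpoint can be verified on the small example with values -10 to 9. This is a minimal sketch: the function name `shifted_bin_counts` is hypothetical, it counts distinct support points rather than observations, and it assumes that no support point coincides with the breakpoint c.

```python
from fractions import Fraction
from math import ceil

def shifted_bin_counts(values, bin_size, c=Fraction(-1, 2)):
    """Count support points per bin on each side of breakpoint c.

    Bin k on the right is (c + k*x, c + (k+1)*x]; bin k on the left is
    [c - (k+1)*x, c - k*x), for k = 0, 1, 2, ...
    """
    x = Fraction(bin_size)
    left, right = {}, {}
    for v in values:
        d = Fraction(v) - c              # signed distance to the breakpoint
        side = right if d > 0 else left
        k = ceil(abs(d) / x) - 1         # bin index, counted outward from c
        side[k] = side.get(k, 0) + 1
    return left, right

for size in ("1/2", "1", "3/2"):
    left, right = shifted_bin_counts(range(-10, 10), Fraction(size))
    print(size, left == right)  # True for each bin size
```

Because the support is mirrored around c = -0.5, the k-th bin on the left always holds as many support points as the k-th bin on the right, whatever the bin size.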
Figure 6: Simulating different bin sizes and cut-off values
[Left panel: estimated McCrary statistic (vertical axis) against the selected bin size, 0.5 to 3 (horizontal axis). Right panel: estimated McCrary statistic (vertical axis) against the position of the threshold in the McCrary code, −1 to 0 (horizontal axis).]
Note: In this figure, we manipulate the bin size (left panel) or the exact position of the cutoff (right panel)
in the McCrary routine. The patterns signify that the choice of those parameters can be crucial in a discrete
setting. Source: Own calculations.
Figure 7: Issues with bin size and discrete values in McCrary tests
[Panels: Bin size = 0.5 (upper left), Bin size = 1 (upper right), Bin size = 1.5 (lower left).]
Notes: The figure shows the graphical output of the McCrary routine. The focus is on the distinct sets of
observations when there is a bin size of 0.5 (upper left panel) or a bin size of 1.5 (lower left panel). Only
when there is a bin size of 1 do we see that all the simulated observations are correctly binned. Source: Own
calculations.
A third, but inferior, solution is to drop the observations at 0 from the analysis. This, however, comes at the
cost of losing (potentially crucial) information just around the threshold.
Issue 2: Pooling different thresholds and using a log transformation
A more specific problem arises when data are pooled over different thresholds and distance to the threshold is
measured on a relative scale.
Consider the case of population thresholds. Say a policy changes at 10 different thresholds: 1,000, 2,000, ...,
10,000. Also assume that we have a universe of towns ranging in size from 500 to 10,500, with exactly one town
at each particular value (1 town of size 500, 1 of size 501, ..., 1 of size 10,499, 1 of size 10,500).
Now, consider a pooling strategy with absolute deviations. For each town we measure the shortest absolute
distance to the next threshold and stack the samples on top of each other. We end up with a distribution
(similar to the case above) in which we have values from -500 to 500 and exactly 10 observations for each
integer (including the zero). We are back to the problem described above, but no additional issue arises.
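The pooled distribution under absolute deviations can be constructed directly. This is a minimal sketch; the helper `nearest` is hypothetical, and assigning ties (for example a town of size 1,500) to the lower threshold is an arbitrary assumption that only affects the two endpoint values -500 and 500.

```python
from collections import Counter

thresholds = range(1000, 10001, 1000)   # 1,000, 2,000, ..., 10,000
towns = range(500, 10501)               # one town at each size

def nearest(z):
    # Ties go to the lower threshold: an arbitrary choice for illustration.
    return min(thresholds, key=lambda c: abs(z - c))

abs_counts = Counter(z - nearest(z) for z in towns)

# Every integer distance strictly between -500 and 500 occurs 10 times.
print(all(abs_counts[d] == 10 for d in range(-499, 500)))  # True
```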
Let us now turn to the case of relative scaling. Say we measure the distance to the next threshold in the form
(z − c)/c, where z is the size of the town and c is the closest threshold (a log transformation will give a similar
representation). This type of relative transformation might be preferable to the absolute distance, as larger
towns might indeed be comparable within a larger absolute margin.
Figure 8: Pooling different thresholds
[Left panel: frequency (vertical axis) against the distribution of the pooled data, −0.5 to 0.5 (horizontal axis). Right panel: estimated McCrary statistic (vertical axis) against different bin sizes, 0 to 0.2 (horizontal axis).]
Note: This graph highlights the uneven distribution of an initially uniformly distributed universe of towns
when pooling is done on a relative scale (left panel). We also show how the McCrary routine performs in this
flawed pooling exercise: it gives false positive test results when the bin size is very small (right
panel). Source: Own calculations.
Using this relative transformation results in a particular type of discrete distribution. Each town whose
size is exactly a threshold value will get the relative distance of 0. There are 10 such towns. Now, the town
of size 1001 will be at 1/1000, and so will the towns at 2002, 3003, 4004, ..., 10010. Thus, there are also 10
observations at this discrete value in the distribution. However, the town at 2001 will be at the 1/2000 point
in the distribution. The only other towns at this value are those at 4002, 6003, 8004 and 10005. Hence, there
are only 5 towns in this discrete bin. In the extreme, consider the town at 10001, which stands alone at the
1/10000 bin. Similarly, the towns at 9001, 8001, 7001, and 6001 stand alone at their particular distribution
values. The same pattern occurs on the other side of the thresholds (here, the observations at 9999, 8999, 7999,
6999, 5999 stand by themselves).
Estimating the sorting in this case is problematic when the zeros are counted on either side. Assume
treatment starts at zero (to the right side). We now have a bin at zero which has (by construction) observations
from all thresholds. To the left of this bin (on the non-treatment side) there are 5 bins which each relate to only
one threshold and in our example hold only one observation. For small bin sizes, the relative pooling
creates an asymmetry exactly at the threshold.
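The counts quoted in this example can be reproduced by applying the relative transformation to the universe of towns. This is a minimal sketch: the function name `relative_distance` is hypothetical, exact fractions are used so that equal relative distances compare as equal, and ties between two thresholds are assigned to the lower one (an arbitrary assumption that does not affect the values shown).

```python
from collections import Counter
from fractions import Fraction

thresholds = range(1000, 10001, 1000)   # 1,000, 2,000, ..., 10,000
towns = range(500, 10501)               # one town at each size

def relative_distance(z):
    """Relative distance (z - c) / c to the closest threshold c."""
    c = min(thresholds, key=lambda t: abs(z - t))
    return Fraction(z - c, c)

rel_counts = Counter(relative_distance(z) for z in towns)

for r in (Fraction(0), Fraction(1, 1000), Fraction(1, 2000),
          Fraction(1, 10000), Fraction(-1, 10000)):
    print(r, rel_counts[r])
```

The script reports 10 towns at 0 and at 1/1000, 5 towns at 1/2000, and a single town at each of 1/10000 and -1/10000, so the mass point at zero is surrounded by nearly empty neighbouring values.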
We illustrate the issue by simulating the above example. The left panel of figure 8 shows the resulting
histogram. Notably, the histogram is symmetric around zero. Depending on whether the
zero is counted to the right or to the left, we can see that there exists a sorting issue right at the threshold.
In the next step, we estimated the McCrary test statistic on a sample in which we duplicated every observation
from the example above 20 times (this brings us to a sample size comparable to that of the earlier simulation).
To estimate the McCrary statistic and avoid the first issue (see above), we set the breakpoint to -0.005. In the
right panel of figure 8, we vary the bin sizes between 0.01 and 0.2 (see footnote 16). We find that for small bin
sizes the McCrary test signals a false positive and significant sorting result which is entirely driven by the
particular issue related to the zeros.
While the first issue (see above) can be remedied without loss of information, the second problem, concerning
the zeros in pooled data when the distance is measured in relative terms, cannot. The only technical
solution to obtain correct inference on the scope of sorting in this case is to drop the observations at zero
altogether. However, in doing this, we lose (potentially critical) information.
16. Here, we also set the bandwidth to be exactly 10 times the bin size. If we let the algorithm choose
a bandwidth, we do find positive point estimates; however, they are not statistically different from zero. The
reason is that the algorithm chooses a bandwidth which is more than 200 times larger than the bin size and
thus smooths away any difference at the threshold.
Appendix: Additional tables and Graphs
Table 4: RDD studies using population thresholds
Authors | Publication status | Research focus | Country | Time | Thresholds
Panel 1: Studies using a simple RDD
Egger & Koethenbuerger | 2010, AEJ:App | Council size | Germany | 1984-2004 | 1’000, 2’000, 3’000, 5’000, 10’000, 20’000, 30’000, 50’000, 100’000, 200’000
Pettersson-Lidbom | 2011, JPubE | Council size | Finland | 1977-2002 | 2’000, 4’000, 8’000, 15’000, 30’000, 60’000, 120’000, 250’000, 400’000
Pettersson-Lidbom | 2011, JPubE | Council size | Sweden | 1977-2002 | 12’000, 24’000, 36’000
Fujiwara | 2011, QJPS | Single round vs. runoff | Brazil | 1996-2008 | 200’000
Hopkins | 2011, AJPS | Ballots | US | 2005-2006 | 5% of citizens, 10’000
Litschig | 2012, JPubE | Sorting | Brazil | 1991 | 17 brackets
Barone & de Blasio | 2013, IRLE | Single round vs. runoff | Italy | 1993-2000 | 15’000
Brollo, Nannicini, Perotti & Tabellini | 2013, AER | Transfers | Brazil | 2001-2008 | 10’189, 13’585, 16’981, 23’773, 30’564, 37’356, 44’148
Gagliarducci & Nannicini | 2013, JEEA | Wage of mayors | Italy | 1993-2001 | 5’000
Litschig & Morrison | 2013, AEJ:App | Transfers | Brazil | 1982-1988 | 10’189, 13’585, 16’981
Pellicer & Wegner | 2013, QJPS | Election rules | Morocco | 2003, 2009 | 25’000
Hinnerich-Tyrefors & Pettersson-Lidbom | 2014, Econometrica | Direct democracy | Sweden | 1918-1938 | 1’500
Arnold & Freier | 2015, PuCh | Signature requirements | Germany | 1995-2009 | 10’000, 20’000, 30’000, 50’000, 100’000
Eggers | 2015, CPS | Electoral rule | France | 2001-2008 | 3’500
Bordignon, Nannicini & Tabellini | R&R, AER | Single round vs. runoff | Italy | 1993-2007 | 15’000
Ferraz & Finan | 2012, WP | Wage of mayor | Brazil | 2004-2008 | 10’000, 50’000, 100’000, 300’000, 500’000
Mukherjee | 2011, WP | Infrastructure | India | 2001-2009 | 500
Baskaran | 2012, WP | Transfers | Germany | 2001-2010 | 5’000, 7’500, 10’000, 15’000, 20’000, 30’000, 50’000
Hirota & Yunoue | 2012, WP | Council size | Japan | | 2’000, 5’000, 10’000, 20’000, 50’000, 100’000, 200’000, 300’000, 500’000, 900’000
Egger & Koethenbuerger | 2013, WP | Council size | Germany | 1984-2004 | 1’000, 2’000, 3’000, 5’000, 10’000, 20’000, 30’000, 50’000, 100’000, 200’000
Lyytikäinen & Tukiainen | 2013, WP | Council size | Finland | 1996-2008 | 2’000, 4’000, 8’000, 15’000, 30’000
De Benedetto & De Paola | 2014, WP | Wage of mayors | Italy | 1991-2001 | 1’000, 5’000, 50’000
van der Linde et al. | 2014, WP | Wage of politicians | Netherlands | 2005-2012 | 8’000, 14’000, 24’000, 40’000, 60’000, 100’000, 150’000, 375’000
Panel 2: Studies using a Difference-in-Discontinuity
Casas-Arce, Saiz | 2015, JPE | Gender Quota | Spain | 2003-2007 | 5’000
Baques & Campa | 2011, WP | Gender Quota | Spain | 2003-2007 | 5’000
Grembi, Nannicini & Troiano | 2012, WP | Fiscal rules | Italy | 1999-2004 | 5’000
Asatryan, Baskaran, Grigoriadis, Heinemann | 2013, WP | Citizen referenda and spending | Germany | 1980-2011 | 10’000, 20’000, 30’000, 50’000, 100’000
Asatryan, Baskaran, Heinemann | 2013, WP | Citizen referenda and taxes | Germany | 1980-2011 | 10’000, 20’000, 30’000, 50’000, 100’000
Gulino | 2014, WP | Electoral rules | Italy | 1985-2000 | 5’000
Notes: All papers are also cited in our references.
Table 5: Population thresholds in Italian municipalities
[Table: columns are the population thresholds k (in thousands) at which policies change: 1, 3, 5, 10, 15, 20, 30, 50, 60, 100, 250, 500. Rows are the policies: Size of the city council; Wage of the mayor; Wage of the executive officers; Attendance fee for city councilors; Maximum number of executive officers; Electoral rule (plurality/runoff); Neighborhood councils; Hospitals; Health district; Balanced-budget rule. An x marks each threshold at which a policy changes; the positions of the individual marks are not recoverable here.]
Note: The table identifies population thresholds (in thousands) at which given policies change. This is a
partial list of policies, chosen to highlight the variety of policies that depend on population thresholds and the
extent to which the same threshold often determines multiple policies. Source: French legal code.
Table 6: Population Thresholds in Germany (by rule and state)
[Table: population thresholds in the different German states. Columns are the states SH, NiedS, NRW, Hes, RP, BW, Bay, Saar, BB, MVP, Saan, Sax, Th, and a total (Σ). Rows are the institutions/rules listed below; the individual cell entries are not recoverable here.
Councils: Council size; Full-time council members / Deputies; City districts; City district council; Administrative units; Council of the administration units.
Wages: Wage of mayors; Additional compensation of the mayor; Wage of head of admin. units; Wage of deputies; Wage of mayors in recreational cities.
Fiscal rules: Status of a larger city; Status of a county free city; Fiscal equalization; Special transfers for mergers.
Citizen involvement: Citizen request; Petition for referenda; Quota requirements for referenda.
Mayor: Mayor status; Mayor title; Deselection of mayors; Qualification of mayor.
Committees/commissioners: Equal opportunity commissioner; Integration council; Accounting agency; Oversight regulation; Council for top-secret issues; Open council.
Elections: Signatures for party lists; Signatures for mayoral candidacy; Election districts; Ballot districts; Reevaluation of an election.
Unique rules: Direct democracy; Consolidated accounts; Youth welfare office; Construction oversight; Mayor deputy recall; County financing; Additional transfers; Key transfers; Deputies in administrative units; Pension funds; Municipal treasurer; Office hours on election day; Election statistics; Economic status; Accounting committee; Water management; Vehicle Tax; Deputy mayor status; Naming of city districts; Cooperation councils; Council after mergers; Outside administrative units; Forced administrative reconsideration; Administrator qualification requirement; Format of the ballots; Add compensation of head of admin; City district elections; Infrastructure subsidies; Country-side municipalities; General committee; Annual audits; Election proposals.
Summary rows: # of different policy issues; # of different population thresholds; # of thresholds with unique change.]
Notes: Source: Own research.