Regression Discontinuity Designs Based on Population Thresholds: Pitfalls and Solutions

Andrew C. Eggers – University of Oxford
Ronny Freier – DIW Berlin and FU Berlin
Veronica Grembi – Copenhagen Business School
Tommaso Nannicini – Bocconi University

First Version: October 2013
This version: April 2015

Abstract

In many countries, council size, mayoral salaries, and various other aspects of municipal policy depend discontinuously on a city’s population. Several papers have used a regression discontinuity design (RDD) to measure the effects of these threshold-based policies on political and economic outcomes. Using evidence from France, Germany, and Italy, we highlight two common pitfalls that arise in exploiting population-based policy changes (confounded treatment and sorting) and provide guidance for detecting and addressing these pitfalls. Even when these problems are severe, policies based on population thresholds offer an appealing opportunity to study the effects of policies and political institutions.

Thanks to Janne Tukiainen for providing data.

I. Introduction

Researchers attempting to estimate the effects of political institutions (or more generally, policies) face serious endogeneity problems: it is usually impossible to run an experiment in which consequential policies are randomized, and in most observational data it is difficult to locate or construct valid counterfactuals given the various strategic and contextual factors that affect policy choices. In recent years, many researchers have attempted to address these problems by exploiting cases in which policies at the subnational (usually municipal) level depend discontinuously on population thresholds. By studying systems where important policies (e.g.
the salary of mayors, the electoral system used in local governments, the extent of fiscal transfers) are imposed on municipalities according to arbitrary population cutoffs rather than chosen by the municipalities themselves, researchers can sidestep the central challenge of measuring the effects of policies, which is that political units that adopt different policies typically differ not just in the policy of interest but also in other ways that may be difficult to control for in a regression. In applying the regression discontinuity design to policies determined by a population threshold, researchers in both economics and political science attempt to isolate a comparison where cities that are identical in expectation nevertheless adopt a different policy (examples include Pettersson-Lidbom 2006, 2012; Fujiwara 2011; Hopkins 2011; Gagliarducci and Nannicini 2013). We survey existing studies in this vein in Table 4 in the appendix and provide a detailed discussion of the literature in Section II.

Although population-based policies offer an unusual chance to study the effects of policies and political institutions, the opportunity comes with two pitfalls that (based on our study of France, Germany, and Italy) are likely to plague many such studies. The first pitfall is that the same population threshold is often used to determine multiple policies, which makes it difficult to interpret the results of RDD estimation as measuring the effect of any one particular policy. We show the extent of confounded treatment in the three countries we study, emphasizing that extensive institutional background research is necessary before one can interpret the results of a population threshold-based RDD as evidence of the effect of a particular policy. We also highlight several approaches to estimating the effects of individual policies in settings with various kinds of confounded treatment, all of which rely on assumptions of additive separability.
The second pitfall is sorting, i.e. the tendency of municipalities to strategically manipulate their official population (whether through recruiting/repelling residents or overcounting/undercounting residents) in order to fall on the desired side of a consequential population threshold. Again focusing on France, Germany, and Italy, we show conclusive evidence of sorting and highlight the ways in which this sorting could invalidate the identifying assumptions of the RDD. We emphasize that even when extensive strategic sorting is taking place, an RDD approach based on population thresholds may still be an attractive approach to studying policy effects; after all, the alternative is to study settings where all political units (not just those that are able to manipulate their population figures) choose which policies to adopt. In settings with strategic sorting, it is important to identify and measure confounding factors that help explain the sorting and may be related to potential outcomes. We carry out such analysis in Italy, highlighting some of the challenges of pooling data from several censuses and thresholds in a setting with strategic sorting. Although our focus is on recent decades in France, Germany, and Italy, we emphasize that the use of population thresholds to assign local policies extends much more broadly in both space and time. The existing literature has evaluated population-based policies in twelve countries on four continents, including the U.S., Spain, Brazil, Morocco, India and Japan; our own casual search quickly yielded examples of similar policies in several other countries where to our knowledge no such study has been carried out (e.g., the U.K., Belgium, Austria, Norway, Poland, Slovakia and Mongolia). While existing work overwhelmingly focuses on recent decades, the use of population thresholds in fact has a long history. 
The first law on municipal government in revolutionary France (passed in December of 1789) includes six provisions dictating features of municipal government as a function of population, including a rule specifying six population thresholds determining the council size.1 A law that reformed the constitutional rights of municipalities in Prussia as early as 1808 created categories of municipalities based on size (with cutoffs of 3,500 and 10,000 inhabitants) which were used to assign different rules on council size, voting rights, and budget transparency (among others).2 In Italy, statutory rules with population cutoffs applying to council size, executive committees and voting rights can be dated to at least the time of the Italian unification in 1865.3 As we document in the next section, many studies have already appeared that make use of population thresholds, but given how many such policies exist (and for how long they have existed), we expect many more such studies to appear in coming years. We hope this paper helps future researchers to make use of this opportunity.

1 “Loi relative à l’organisation des communes du royaume de France” of 14 December 1789. Other provisions specify population thresholds that determine the number of constituent assemblies, the number of citizens necessary to call a new general assembly, whether there would be a substitute “procureur”, the number of municipal districts, and transparency rules for municipal budgets.

II. Literature review

The use of regression discontinuity designs based on population thresholds was first suggested and used by Pettersson-Lidbom (2006, 2012). He evaluated the effect of an increase in council size, changing deterministically at certain population thresholds, on overall expenditures in Finnish and Swedish municipalities.
Following his methodological innovation, researchers have used these types of regression discontinuity designs to study a multitude of issues; we survey twenty-eight papers in Table 4 in the appendix. On the whole, this literature can be divided into two categories, based on the type of policy whose effects are being studied: political institutions (within which council size, mayor salary, and electoral institutions have received the most attention) and fiscal resources.

Within the category of political institutions, several papers have followed Pettersson-Lidbom (2006)’s lead by studying the effects of changing the size of the municipal council (motivated ultimately by earlier work such as Weingast, Shepsle and Johnsen (1981)): Egger and Koethenbuerger (2010) and Koethenbuerger (2012) study the effects of council size in Germany; focusing on urban land development as an outcome as well as local spending, Hirota and Yunoue (2012) and Hirota and Yunoue (2013) also evaluate changes in council size for the case of Japan. Other papers have used the same design to study the effects of changing the salary of the mayor or other elected officials. Gagliarducci and Nannicini (2013), Ferraz and Finan (2009), and van der Linde et al. (2014) study how wage increases affect the performance of politicians; De Benedetto and De Paola (2014) examine the effect of wages on candidate quality. Bagues and Campa (2012) and Casas-Arce and Saiz (2015) study the electoral and fiscal effects of a gender quota on municipal councils.

2 The definition of city types appears in Article 10 of “Preußische Städteordnung von 1808”; policies pertaining to the types appear in Articles 70, 74, 142, 152, 153 and 183.

3 A law named Legge Lanza from 1865 (which was drafted after the Royal Decree n.3702 from 1859) regulated the organization of local administrations for the former Kingdoms of Piedmont and Sardinia.
Several papers have examined the effects of elements of the electoral system that vary with population thresholds. Fujiwara (2011) shows the validity of Duverger’s law in a setting where the electoral rule (single round versus runoff system) depends directly on a population threshold, while Barone and De Blasio (2013) study the relationship of such electoral rules with voter turnout. Bordignon, Nannicini and Tabellini (2013) highlight that the use of a single-ballot versus a dual-ballot system significantly influences the presence of extreme parties. Hopkins (2011) studies the effect of providing ballot sheets in Spanish on voting outcomes. Gulino (2014) studies the effects of changes in electoral rules on the re-election probabilities of incumbent mayors. The effect of majoritarian versus proportional systems has been evaluated by Eggers (2015) for turnout and by Pellicer and Wegner (2013) for the success of clientelistic parties.

Several recent papers also examine the effect of direct democracy using population thresholds that affect the nature of representation or the difficulty of initiating referendums. Hinnerich and Pettersson-Lidbom (2014) study the impact of representative democracy versus direct democracy on government spending. Arnold and Freier (2015) substantiate the relationship between signature requirements and the number of popular initiatives. Asatryan et al. (2013) and Asatryan, Baskaran and Heinemann (2014) study the effects of direct democracy on spending and taxes.

Within the category of fiscal resources, researchers have studied the effect of changes in fiscal transfers on local elections (Litschig and Morrison (2010)), education and poverty (Litschig and Morrison (2013)), corruption and the quality of politicians (Brollo and Nannicini (2012)), and school enrollment (Mukherjee (2011)). Baskaran (2012) studies the flypaper effect (i.e. the tendency of transfers and tax revenues to have different effects on fiscal policy).
An analysis of particular interest for our paper is the work by Lyytikäinen and Tukiainen (2013). They study changes in turnout related to the increase in the probability of being pivotal (due to changes in council size at population thresholds) in Finland. Importantly, the paper identifies two treatments that are triggered by the same changes in council size.4 The paper is thus directly confronted with the issue of confounded treatments, an issue which we also highlight in our analysis. Note that the papers by Arnold and Freier (2015) and Eggers (2015) also recognize issues of confounded treatments and provide extended robustness tests to deal with those problems.

Another related strand of literature discusses sorting around thresholds and its particular consequences for RDD applications. The assumption of continuity (see Imbens and Lemieux (2008); Lee and Lemieux (2009)) has been criticized and discussed by Urquiola and Verhoogen (2009) for class size, Barreca et al. (2011) for birth weight, and Caughey and Sekhon (2011) and Eggers et al. (2015) for close elections. A formal test for sorting in RDD settings has been developed by McCrary (2008). The paper that is most related to our work in this strand of research is the study by Litschig (2012), which looks at the manipulation of population figures in Brazil.

III. Multiple treatments

A. Documenting the extent of confounded treatment

The appeal of studying a setting in which a policy depends on a population threshold is the fundamental comparability of political units above and below the threshold: if cities above a threshold experience policy A while those below the threshold experience policy A′, and the two groups of cities are comparable in all other respects, then we should be able to use an RDD to compare the two sets of cities and thus learn about the effect of applying policy A as opposed to policy A′.
This research design is attractive because typically political units that choose policy A differ in numerous other ways from political units that choose policy A′, which makes it difficult in the typical observational study to disentangle the effect of applying policy A (as opposed to A′) from the effect of other differences between cities that choose A and those that choose A′.

The first common problem we highlight in this paper is that often the population threshold that determines whether policy A or A′ is applied will also determine whether other policies (B or B′, C or C′) are applied; the policy change we hope to study (A vs A′) is thus confounded with other policy changes, undermining the appeal of the RDD. Figure 1 summarizes the problem in France, Italy, and German states. Each dot indicates a population threshold at which at least one policy changes; solid dots indicate that more than one policy changes at the same threshold. In every case (i.e. France, Italy and every German state) there are some thresholds where just one policy changes, but such thresholds are in the minority.

4 Lyytikäinen and Tukiainen (2013) show that changes in council size can affect turnout directly, as more council members are elected and the seat allocation rule makes a single vote more likely to be pivotal. At the same time, council size also determines the length of the party lists in the election. The number of politicians running for office is thus also directly influenced by council size, which can have an independent effect on turnout. The authors solve this identification problem by using an instrumental variable strategy that disentangles the effects using multiple council size thresholds.
Figure 1: Population thresholds at which municipal policies change: France, Italy, and German states

[Figure: one row of dots each for France, Italy, and the German states SH, NRW, Hes, NiedS, RP, BW, Bay, Saar, Brand, MeckP, Sachsen, Saan, and Thuer. Horizontal axis: Population, with ticks at 100, 250, 500, 2500, 10000, 50000, and 250000. Legend: One policy change; 2−4 policy changes; 5 or more policy changes.]

Note: Each dot indicates a population threshold at which a policy changes; solid dots indicate more than one policy changing at the same threshold.

Table 1 provides detail on the various policies that change at population thresholds up to 50,000 in France; the appendix provides details about population-dependent policies in Italy and Germany (see Tables 5 and 6). As Table 1 indicates, at every threshold at which council size increases, the maximum number of deputy mayors also increases, which makes it impossible to disentangle the effect of council size from the effect of additional paid council staff. There is only one threshold (1,000 inhabitants) at which the salary of mayors and deputy mayors increases without the council size also increasing. Many of the most interesting policies change at a single threshold of 3,500 inhabitants at which several other policies (including council size and mayor’s wage) also change: the electoral rule used to elect the council, the requirement of gender parity in party electoral lists, and the requirement that the council debate the budget before adopting it.
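The bookkeeping behind this kind of audit can be sketched in a few lines: invert a policy-to-thresholds listing into a threshold-to-policies map and flag every threshold at which more than one policy changes. The excerpt below is restricted to the facts about France stated in the text (it is not the full Table 1), and the names `policy_thresholds`, `policies_by_threshold`, and `confounded` are illustrative assumptions. Because the excerpt is deliberately incomplete, a threshold that appears with a single policy here may still be confounded once all policies are listed.

```python
from collections import defaultdict

# Partial excerpt of population-dependent policies in France, limited to
# thresholds mentioned in the text (illustrative, NOT the full Table 1).
policy_thresholds = {
    "council size": [100, 500, 3500],
    # The text states that deputy mayors change wherever council size changes.
    "max. number of deputy mayors": [100, 500, 3500],
    "salary of mayor and deputy mayors": [500, 1000, 3500],
    "electoral system (PR or plurality)": [3500],
    "gender parity": [3500],
    "council must debate budget prior to vote": [3500],
}


def policies_by_threshold(policy_thresholds):
    """Invert {policy: [thresholds]} into {threshold: sorted list of policies}."""
    by_threshold = defaultdict(list)
    for policy, thresholds in policy_thresholds.items():
        for t in thresholds:
            by_threshold[t].append(policy)
    return {t: sorted(p) for t, p in sorted(by_threshold.items())}


def confounded(by_threshold):
    """Thresholds at which more than one listed policy changes."""
    return [t for t, policies in by_threshold.items() if len(policies) > 1]


by_threshold = policies_by_threshold(policy_thresholds)
for t, policies in by_threshold.items():
    print(t, len(policies), policies)
print("confounded thresholds:", confounded(by_threshold))
```

In practice the input listing would be assembled from the legal code, and the same inversion could be repeated for each census year so that temporarily enacted policies are not missed.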
In total, we identified 19 different policy issues that depend on at least one of the 14 thresholds in France. Correspondingly, we have 10 policy issues at 12 thresholds in Italy and an astounding 65 policy issues at 51 thresholds in Germany.5 In all three countries that we examine, our ability to study the effects of policies that depend on population thresholds is constrained by the tendency for multiple policies to depend on the same threshold.

5 The German case is particular as the municipal level is under control of state legislation. This implies that the rules are state-specific. No state uses fewer than 14 different policies and, with the exception of North Rhine-Westphalia, every state has policy topics that are used uniquely in that state. We provide detailed descriptions of the content of the policies and the respective thresholds in the tables ??-?? in the online appendix.

Table 1: Population thresholds in French municipalities

[Table: the columns are population thresholds k (in thousands of inhabitants): 0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 5, 9, 10, 20, 30, 50. The rows are policies, with an x marking each threshold at which the policy changes: Council size; Salary of mayor and deputy mayors; Max. number of non-resident councilors; Max. number of deputy mayors; Must have a cemetery; Prohibition on commercial water supply; Campaign leaflets subsidized; Council must approve property sales; Electoral system – PR or plurality; Gender parity; Outsourcing scrutiny; Council must debate budget prior to vote; Committees follow PR principle; Amount of paid leave for council work; Commission on accessibility; Max. electoral expenditure; Outsourcing commission; Max. municipal tax on salaries; Debt limit. The individual x entries are not reproduced here.]

Note: The table identifies population thresholds (in thousands) at which given policies change. This is a partial list of policies, chosen to highlight the variety of policies that depend on population thresholds and the extent to which the same threshold often determines multiple policies. Source: French legal code.

B. Practical challenges in detecting confounded treatment

Detecting whether a given treatment is confounded with another treatment can simply be a matter of scouring the legal code for policies that explicitly depend on population thresholds. To the extent that researchers study historical time periods, this institutional research is further complicated by the need to identify all relevant changes over time (especially policies that were only temporarily enacted6). Apart from searching past and current laws, the issue of identifying all relevant policies can be more subtle for a number of reasons. First, further policies (e.g. the size of committees) may legally depend on the size of the council, which means that searching for population cutoffs would not give a complete picture of policies that change at the same population thresholds as council size. An example of this type of second-order policy, which can be equally important, is given in Lyytikäinen and Tukiainen (2013). Second, when there are multiple tiers of government, it may not suffice to understand the regulation of the local municipal code. Legislation from other tiers and from other governmental organizations may also contribute to the effects at certain thresholds.7 Third, even policies that do not depend on population thresholds directly may have an effect through interactions with population-based rules. An example of this is used by Baskaran and Lopes da Fonseca (2014), who study vote share hurdles (which do not depend on population thresholds) in German municipalities and utilize the fact that those rules are ineffective when the council size (which depends on such thresholds) is too small.8 To summarize, a researcher should know a setting intimately before concluding that a given policy (and not other policies) changes at a given population threshold.

6 For Germany, e.g., we identified a number of policies related to population thresholds that were used in one-shot boundary reforms that determined the size of merged municipalities.

7 In Germany, e.g., multiple policies at the state and federal level implement discontinuous changes of policies at population thresholds (see election laws for state and federal elections). Also, other federal organizations may use population thresholds to discriminate between political units. Again in Germany, the federal statistical office implemented the last census in 2011 using two systems of population accounting for towns above and below 10,000 inhabitants respectively, which resulted in very different population adjustments above and below the threshold.

8 A small example highlights the problem. A 5% hurdle for small parties has no bite when there are only 10 seats in a council (and a party effectively needs around 10% to win a seat). When council size increases from 16 to 20 seats at 5,000 inhabitants, the regime change compares a system with an ineffective vote share hurdle to one in which the hurdle poses an actual constraint.

C. Addressing confounded treatments

What can a researcher do when a policy of interest changes at a threshold where other policies change? We highlight a few options.

One viable option is to broaden the focus to include other policies that change in tandem. If two policies move in perfect lockstep, then what initially may seem like an opportunity to learn about the effect of policy A vs A′ is at best an opportunity to learn about the effect of policy combination AB vs A′B′. In some cases it is worth studying the effect of this bundle of policies, however. In France, for example, changes in council size always coincide with changes in the number of deputy mayors. The perfect confounding of these two policies means that it is impossible in this setting to separate the effect of the two treatments, but as long as this pair of policies does not perfectly coincide with other policy changes then one could use RDD to learn about the effect of that pair. The downside, of course, is that as one combines additional policy changes it becomes more difficult to motivate and interpret the analysis beyond the immediate setting being considered.

Grembi, Nannicini and Troiano (2012) combine the advantages of regression discontinuity design and difference-in-differences in a new methodology they call “difference-in-discontinuity”. Suppose that at a given point in time there is a population threshold at which policy B changes to policy B′; after a reform, the same threshold produces a change from policy AB to policy A′B′. The basic idea of the difference-in-discontinuity approach is to apply RDD both before and after the reform; under the assumption that the effects of A vs. A′ and B vs. B′ are additively separable, the difference between these estimates yields a consistent estimate of the effect of A vs. A′. The Italian example in the paper by Grembi, Nannicini and Troiano (2012) is a new policy that relaxed existing fiscal constraints for towns above 5,000 inhabitants in 2001. As the threshold is used for multiple other treatments throughout the entire time period, a standard RD approach is not suitable, but Grembi, Nannicini and Troiano (2012) use the difference-in-discontinuity approach (comparing the period before and after the policy change) to estimate the effect of relaxing the fiscal constraints.

A similar approach can be used in a single cross-section in cases where a treatment of interest coincides with other “nuisance” treatments, but the nuisance treatments also change at other thresholds. Suppose there is a threshold ab where policy A changes to A′ and policy B changes to B′; at another threshold b, policy B′ changes to B″.
Under the assumptions that (i) the effect of the B′ → B″ change on a given outcome is the same as the B → B′ change, and (ii) the effect of the A → A′ change does not depend on policy B vs. B′, the effect of A vs. A′ can be estimated by subtracting the RDD estimate at threshold b from the RDD estimate at threshold ab. Arnold and Freier (2015) and Eggers (2015) provide evidence in this spirit. Eggers (2015) evaluates the effect of proportionality on turnout in French municipalities, using an electoral system change at the 3,500-inhabitant threshold where the council size and the salary of mayors also change. To make the case that the effects observed at this threshold are due to the electoral system and not the other treatments, Eggers (2015) shows that the outcome does not respond in the same way at other thresholds where the other treatments change. Arnold and Freier (2015) use a similar test to compare small and big thresholds where the policy of interest (the number of signatures necessary for local initiatives) changes only at big thresholds while other confounding policies also change at small cutoffs. Extending this approach, researchers could make use of a setting in which it is not at different thresholds but in different regions (such as German states) that the policy space differs; that is, if a given policy coincides with a nuisance policy in one region but the nuisance policy changes on its own in another region, then (under the assumption that the nuisance policy’s effects are additively separable and comparable across regions) one could obtain consistent estimates by taking the difference in the estimates across regions. Extending the same logic even further, one can also consider disentangling treatment effects through the careful use of alternative outcomes.
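Before turning to alternative outcomes, the cross-threshold subtraction described above can be illustrated with a minimal simulation. The sketch below is not the procedure of any cited paper: the data are simulated, the effect sizes `EFFECT_A` and `EFFECT_B` are assumed values, assumptions (i) and (ii) hold by construction, and `rd_jump` is a hypothetical helper that fits a separate straight line on each side of the threshold (a local linear fit with a rectangular kernel) and takes the difference of the two intercepts at zero.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_jump(dist, y, bandwidth):
    """RD estimate: difference of the two fitted intercepts at distance 0."""
    left = [(d, v) for d, v in zip(dist, y) if -bandwidth <= d < 0]
    right = [(d, v) for d, v in zip(dist, y) if 0 <= d <= bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left


def simulate(jump, n, rng):
    """Hypothetical data: outcome linear in distance, plus a jump at 0."""
    dist = [rng.uniform(-200, 200) for _ in range(n)]
    y = [1.0 + 0.002 * d + (jump if d >= 0 else 0.0) + rng.gauss(0, 0.5)
         for d in dist]
    return dist, y


rng = random.Random(0)
EFFECT_A, EFFECT_B = 0.8, 0.3  # assumed true effects (simulation only)

# Threshold ab: A and B change together. Threshold b: only B changes.
dist_ab, y_ab = simulate(EFFECT_A + EFFECT_B, 4000, rng)
dist_b, y_b = simulate(EFFECT_B, 4000, rng)

jump_ab = rd_jump(dist_ab, y_ab, bandwidth=200)
jump_b = rd_jump(dist_b, y_b, bandwidth=200)
effect_a_hat = jump_ab - jump_b  # estimate of the effect of A vs. A'
print(round(jump_ab, 2), round(jump_b, 2), round(effect_a_hat, 2))
```

The subtraction recovers the effect of A only because the simulated nuisance effect is identical at both thresholds and does not interact with A; if either assumption fails, the difference mixes the two effects. The same arithmetic applies to the difference-in-discontinuity case, with the pre-reform and post-reform periods taking the place of thresholds b and ab.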
It is perhaps easiest to explain this approach using the example of Arnold and Freier (2015), who study how the rate of voter-initiated referendums responds to changes in the number of signatures required to initiate a referendum. Having found an effect at thresholds where signature requirements change, they argue that this effect is not due to other policies that change at the same thresholds by showing that there is no change at these same thresholds in the rate of council-initiated referendums, which do not require any signatures to be collected. The argument, therefore, is that because the bundle of policies does not seem to affect an outcome that we would expect to be affected by the nuisance treatments, those nuisance treatments likely do not affect the main outcome of interest either.9 More generally, one can imagine carrying out a difference-in-discontinuity design at a single threshold using multiple outcomes, where the differences are taken across the different outcomes: the key “parallel trends” assumption is that the effect of the full bundle of policies (A′B′ vs. AB) on outcome z provides a valid estimate of the effect of a reduced bundle of policies (B′ vs B) on outcome y; thus under the additional assumption that the effects are additively separable, the difference between the two effects provides a valid estimate of the effect of A′ vs. A on outcome y. These are strong assumptions but they may be appropriate in some settings.

Finally, when the treatment effects relate to several thresholds, researchers may also consider identifying the effects using variation in treatment intensity along with functional form assumptions. Consider the example of council size, which in many systems jumps by different amounts at different thresholds, often along with changes in mayor salary or other policies. In this situation it may be possible to use variation in treatment intensity (i.e.
the number of seats added at each threshold) to identify the effect of a marginal council member. The crucial assumption is that the effect of additional seats is linear in (or otherwise predictably related to) the size of the jump, and (again) that this effect is additively separable from other effects that occur at the same thresholds. Part of the identification in Lyytikäinen and Tukiainen (2013) is based on utilizing the different steps in the council size changes to separate the pure council size policy from increased party lists in the turnout equation.

9 One could argue, of course, that in equilibrium a change in the cost of voter-initiated referendums would change the rate of council-initiated referendums.

IV. Sorting

As mentioned above, the appeal of a population threshold-based RDD is partly that the political unit does not choose the policy, which suggests that (controlling for population) units above and below the threshold should be comparable in all respects other than the policy of interest. As is well known, such an RDD (like any RDD) is less appealing when the units can influence the variable that determines treatment assignment (i.e. population). At an extreme, one could imagine that cities near a population threshold could perfectly control whether they end up above or below the threshold, and thus cities that have policy A differ from cities that have policy A′ not just in that policy but also in a whole host of background characteristics that affected their decision to get policy A or policy A′.10 In such a situation, carrying out an RDD may be little better than a typical observational study in which political units choose their policies. (Below, we compare the attractiveness of an RDD marred by sorting with that of a typical observational study.) The problems of strategic sorting in RDD applications are well known (see Imbens and Lemieux (2008); Lee and Lemieux (2009); Urquiola and Verhoogen (2009); Barreca et al. (2011)).
Strategic sorting in population figures has been documented in the work by Litschig (2012) for Brazil and it has been briefly mentioned in the paper by Gagliarducci and Nannicini (2013). One of our contributions here is to provide evidence that sorting in RDD studies based on population thresholds is an issue in all three countries that we study. We also demonstrate techniques for diagnosing and explaining manipulation in the following two sections as well as in a technical appendix.

10 Alternatively, it may be that only certain cities are able to control whether they end up above or below the threshold, in which case cities that have policy A may differ from cities that have policy A′ not only in the factors that affect their policy preferences but also in the factors that affect their ability to manipulate their population figures.

Figure 2: Sorting in municipal population in France, 1962-2011 pooled

[Figure: the left panel, “Histograms (bin width = 1)”, plots Count against Distance from threshold in population (−200 to 200) for three groups of thresholds: Threshold = 100; Threshold = 500, 1,000; Threshold 1,500+. The right panel, “Estimated density, all thresholds pooled”, plots Estimated density against Distance from threshold in population (−200 to 200).]

Note: The left plot depicts three histograms, one for each group of thresholds (100; 500 & 1,000; 1,500 and larger). In each case the bin width is 1, meaning that the top of the line indicates the number of data points (municipality-years) with population that is exactly a given amount (e.g. 50 inhabitants) from the threshold. The right plot depicts the McCrary analysis for all cases pooled.

A. Aggregate graphical evidence

The basic pattern of sorting is documented in Figures 2 (France), 3 (Italy), and 4 (Germany). Because the figures use the same format and reflect the same analysis, we explain the French case in detail and subsequently note only the relevant differences between the French case and the others.
In France, we have population data for eight censuses between 1962 and 2011.[11] For each census, we calculate the difference in population between each city and each major population threshold (i.e. one affecting a policy listed in Table 1) that was in force at the time of the census; we store all municipality-years in which a city’s population was within 250 inhabitants of a threshold. In the left panel of Figure 2 we plot three histograms of these population differences, one for each group of relevant population thresholds (100; 500 or 1,000; and 1,500 and larger). Because there are so many municipality-years, we plot histograms with bin widths of 1, meaning that Figure 2 indicates the number of cases in which a municipality was exactly e.g. 25 inhabitants below a threshold. The key evidence of sorting is given by the jumps in each histogram at 0. For example, based on the histogram for the 100-inhabitant threshold, we can see that there were just under 500 cases in which a city was two or three inhabitants short of the 100-inhabitant threshold at which the council size increases, but there were just under 600 cases in which a city narrowly cleared that hurdle. The jump is if anything more striking for the 500- and 1,000-inhabitant thresholds (where the mayor’s salary increases), with about 300 cases of a municipality recording exactly 500 people and only about 150 cases of a municipality recording exactly 499.

[11] The census years are 1962, 1968, 1975, 1982, 1990, 1999, 2006, and 2011. After 1999, France introduced a new census system that produces annual population estimates for all municipalities; the 2006 census was the first such census.

In the right panel of Figure 2 we depict the McCrary test for all thresholds pooled. This procedure estimates the density of the running variable (i.e. absolute distance in inhabitants to a population threshold) separately on the left and right of the threshold and tests for a jump or drop in the density at the threshold.
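The construction of the running variable and the histogram jump described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the function names `distances_to_thresholds` and `crude_log_jump` and the toy populations are hypothetical, and the jump statistic is a simple log ratio of counts in narrow windows on either side of the threshold rather than McCrary's local linear density estimator.

```python
import math

def distances_to_thresholds(populations, thresholds, window=250):
    """Signed distance (population minus threshold) for every
    municipality-threshold pair within `window` inhabitants."""
    distances = []
    for pop in populations:
        for cutoff in thresholds:
            d = pop - cutoff
            if abs(d) <= window:
                distances.append(d)
    return distances

def crude_log_jump(distances, width=10):
    """Log ratio of counts just above vs. just below the threshold.

    Compares the window [0, width) with [-width, 0). This is NOT the
    McCrary local linear density estimator, only a rough first look.
    The standard error treats the two counts as independent Poisson draws.
    """
    above = sum(1 for d in distances if 0 <= d < width)
    below = sum(1 for d in distances if -width <= d < 0)
    if above == 0 or below == 0:
        raise ValueError("need observations on both sides of the threshold")
    return math.log(above / below), math.sqrt(1 / above + 1 / below)

# Toy data: hypothetical populations, with three of the French thresholds.
populations = [95, 100, 102, 499, 500, 500, 501, 998, 1000, 1003]
thresholds = [100, 500, 1000]
dist = distances_to_thresholds(populations, thresholds)
estimate, std_error = crude_log_jump(dist)
```

A positive `estimate` indicates more mass just above the threshold than just below it. Because the statistic ignores the slope of the density on each side, it should be read only as a descriptive check alongside the histograms.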
Not surprisingly (given the histograms in the left panel), the McCrary test indicates a large jump in the estimated density at the threshold. Note that the shape of the estimated density reflects the aggregation of data from various thresholds, including the 100-inhabitant threshold around which the density of municipalities is the largest. In the technical appendix, we explain in detail what issues researchers can face when pooling different thresholds, why we use absolute (and not relative) deviation as a score variable, and further issues in correctly implementing the McCrary test when the forcing variable is discrete.

[Figure 3: Sorting in municipal population in Italy, 1961-2001 pooled. Left panel: histograms (bin width = 10) of counts by distance from threshold in population, drawn separately for small thresholds (1000, 2000, 3000), medium thresholds (4000, 5000), and large thresholds (> 5000). Right panel: estimated density by distance from threshold in population, all thresholds pooled.]

Note: In the left plot the bin width is 10, meaning that the top of the line indicates the number of data points (municipality-years) with population that is in a given interval (e.g. 40-49 inhabitants) from the threshold. Otherwise see notes to Figure 2.

Figure 3 indicates an even more striking pattern for Italy. Based on the five decennial censuses from 1961 to 2001, we find about 90 cases in which a city cleared the 1,000 or 3,000 population threshold (at which the mayor’s wage increases, among other changes) by fewer than 10 inhabitants, but we find only about 20 cases in which a city fell short by fewer than 10 inhabitants; in over 300 cases a city cleared one of the thresholds by fewer than 30 inhabitants, but in fewer than 100 cases did a city fall short by fewer than 30. The pattern of sorting is just as clear (if not as dramatic) at larger thresholds.
Again, the McCrary test aggregating all thresholds (right panel) indicates a very large jump in the estimated density at the threshold.

In the aggregate, there is not much graphical evidence of sorting in German municipalities (Figure 4). Here we have annual administrative data from 1998-2007 for municipalities from all German states, and our analysis is based on a comparison of each municipality’s population to all thresholds in force in that municipality’s state. Neither the histogram nor the McCrary plot shows much evidence of a jump in the density above the threshold when we aggregate all thresholds and all years. As we will see in the next section, there is nonetheless significant evidence of sorting when we focus on thresholds where more substantial policies are determined.

[Figure 4: Sorting in municipal population in German states, 1998-2007 pooled. Left panel: histograms (bin width = 10) of counts by distance from threshold in population, drawn separately for small thresholds (< 1000), medium thresholds (1000-4000), and large thresholds (5000+). Right panel: estimated density by distance from threshold in population, all thresholds pooled.]

Note: In the left plot the bin width is 10, meaning that the top of the line indicates the number of data points (municipality-years) with population that is in a given interval (e.g. 40-49 inhabitants) from the threshold. Otherwise see notes to Figure 2.

B. Formal tests at different types of thresholds

We now carry out the McCrary (2008) test for different types of thresholds within countries, still pooling population figures from the various censuses we have collected. Table 2 reports the results. In the top row we assess evidence of sorting in all thresholds, reporting the point estimate and standard error (along with the number of thresholds and observations,[12] as explained in the table note) for each test.
This test corresponds to the hypothesis that the discontinuity in the right panel of Figures 2, 3, and 4 is zero. As in the previous figures, we see very clear evidence of substantial sorting in France and Italy (with the latter being quite a bit larger); the test also indicates small yet significant evidence of sorting in Germany. In the other rows we assess sorting at particular types of thresholds, e.g. thresholds where the salary of the mayor increases, or thresholds where the council size increases, or thresholds where both increase.

[12] We count only thresholds for which we observe cities within 250 inhabitants of the threshold, which explains why some of the counts differ from the analysis above.

In France, we find significant sorting at all types of thresholds, with the smallest effect (and weakest evidence against the null) at thresholds where council size increases (but not mayor’s wage) and thresholds at 3,500 inhabitants or higher. In Italy the estimated effects are much larger, with (as in France) smaller effects at larger population thresholds. To give a sense of magnitude, a McCrary effect size of 1.3 implies that the density on the right of the average threshold is over 3.5 times larger than on the left.[13] In Germany the effects are much smaller in magnitude and only statistically significant at some thresholds. Notably, the largest effects are generally found when both council size and salary of the political personnel are changing at the same time. Even for Germany, where the overall effects are small, the sorting effect increases to 0.15, which is no longer negligible. Comparing the effects by threshold size shows larger effects for smaller thresholds in Italy and France. This is consistent with the idea that population size is more easily manipulated in smaller towns.
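To make the magnitude conversion quoted above concrete, the following sketch turns a log difference in density into a density ratio. The function name `density_ratio` is hypothetical, and the optional interval is an assumption of ours: it simply exponentiates a normal-approximation interval for the log difference, which the paper does not report.

```python
import math

def density_ratio(log_diff, se=None, z=1.96):
    """Convert a log density difference, log(d1) - log(d0), into d1/d0.

    If a standard error is supplied, also return an approximate interval
    obtained by exponentiating log_diff +/- z * se (an illustrative
    assumption, not a quantity reported in the paper).
    """
    ratio = math.exp(log_diff)
    if se is None:
        return ratio, None
    return ratio, (math.exp(log_diff - z * se), math.exp(log_diff + z * se))

# An effect size of 1.3, as in the text, gives a ratio of about 3.7.
ratio, _ = density_ratio(1.3)

# The same conversion with a hypothetical estimate and standard error.
ratio_with_se, interval = density_ratio(0.25, se=0.05)
```

The conversion is nonlinear, so equal differences in the test statistic correspond to larger differences in the density ratio at higher values of the statistic.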
Intriguingly, in Germany the pattern is reversed, with somewhat larger effects at larger thresholds, which may be partly explained by the fact that the salary of mayors in Germany often only increases at larger population thresholds.

At the bottom of Table 2 we conduct the McCrary tests at thresholds at which no policy changes as far as we are aware. We generated placebo thresholds by taking the midpoint between each pair of adjacent actual thresholds in each setting (e.g. in France the smallest placebo threshold is 300, which is halfway between 100 and 500) and adding an arbitrary number (117 was picked). In none of the countries do we find discontinuities in the density at these placebo thresholds.

[13] log(d1) − log(d0) = 1.3 implies d1/d0 = exp(1.3) ≈ 3.7.

Table 2: Summary of McCrary sorting tests

For each country, the first column gives the number of thresholds (number of close observations) and the second gives the McCrary test statistic (standard error).

Sample | France: # of thresholds (# of close obs) | France: McCrary test statistic | Italy: # of thresholds (# of close obs) | Italy: McCrary test statistic | Germany: # of thresholds (# of close obs) | Germany: McCrary test statistic
Total
All years, all thresholds | 21 (311,392) | 0.238*** (0.014) | 7 (4,756) | 1.328*** (0.136) | 195 (101,520) | 0.068*** (0.025)
Specific thresholds
Salary increase | 14 (140,421) | 0.497*** (0.026) | 6 (4,730) | 1.331*** (0.134) | 78 (11,579) | 0.135*** (0.061)
Salary increase (no council) | 7 (35,329) | 0.533*** (0.049) | 3 (2,125) | 0.840*** (0.211) | 21 (447) | 0.001 (0.321)
Council increase | 15 (267,558) | 0.215*** (0.015) | 3 (2,605) | 1.909*** (0.197) | 120 (81,669) | 0.071*** (0.026)
Council increase (no salary) | 12 (162,466) | 0.139*** (0.018) | 0 (0) | n.a. (n.a.) | 63 (70,537) | 0.063*** (0.029)
Council and/or salary increase | 21 (302,887) | 0.240*** (0.014) | 6 (4,730) | 1.331*** (0.134) | 141 (82,116) | 0.072*** (0.027)
Council and salary increase | 7 (105,092) | 0.475*** (0.029) | 3 (2,605) | 1.909*** (0.197) | 57 (11,132) | 0.149*** (0.063)
Threshold size
Small thresholds (<3500) | 7 (306,520) | 0.237*** (0.014) | 2 (3,295) | 1.644*** (0.178) | 61 (93,873) | 0.054*** (0.026)
Big thresholds (>=3500) | 14 (4,872) | 0.239* (0.122) | 5 (1,461) | 0.700*** (0.247) | 134 (7,647) | 0.216*** (0.077)
Placebo thresholds | 20 (215,986) | 0.008 (0.018) | 9 (2,800) | -0.044 (0.133) | 186 (85,326) | -0.009 (0.025)

Notes: For each test, we report four numbers: the number of unique population thresholds (e.g. 14 in the first test for France) at which we observe municipalities with populations within 250 inhabitants of the threshold; the number of observations within 250 inhabitants of these thresholds (e.g. 273,274); the estimated difference in log frequency above vs. below the threshold (e.g. 0.256) and the standard error of that estimate (0.015).

V. Sorting and covariate imbalance: the case of Italy

As McCrary (2008) pointed out, a discontinuity in the density of the running variable at the relevant threshold is neither necessary nor sufficient for bias in RDD analysis. The key assumption of RDD, after all, is the continuity of potential outcomes at the threshold. This assumption may hold even in the presence of substantial sorting (if e.g. units’ ability to manipulate their population is random, and the resulting measurement error in the running variable does not bias the results), and it may not hold even in the absence of sorting (if e.g. there is manipulation in both directions, such that the density remains continuous).
Thus while discontinuities of the kind we documented in the previous section should certainly raise red flags about the validity of the RDD, the most direct test of the key RDD assumption is to check for discontinuities in pre-treatment covariates. Locating imbalanced covariates is also important because these covariates can be included in the RDD to guard against bias caused by strategic sorting, as we document below. These covariates may also help us to explain how manipulation is taking place, which could be viewed as an interesting research question in its own right.

We focus on Italy mainly because it showed by far the largest extent of aggregate sorting. Figure 5 shows RDD analysis in which the dependent variable is the lagged treatment, meaning an indicator for whether a municipality was above a given threshold in the previous census, given that it was close to that threshold in the current census. The top two plots show this analysis for thresholds at which the salary of the mayor increases. The top left panel shows that the probability of lagged treatment increases with the current running variable (as one might expect) but jumps at the threshold, indicating that cities that are narrowly above the threshold now are more than 10 percentage points more likely to have been above the threshold in the previous census. The top right panel shows how the estimated jump varies with the (triangular) bandwidth employed for the local linear regression; the black dot shows the bandwidth suggested by the Imbens-Kalyanaraman algorithm (and employed in the figure at left). The analysis thus indicates that, for this covariate at least, differences between treated and untreated cities do not disappear when we focus on the threshold. Although random variation in the population from one census to the next should produce continuity in this covariate across the threshold, we do not find this in the Italian case.
Note that this persistence in treatment status could simply be a result of population figures being very sticky from one census to the next: we would expect to see differences in whether cities were above or below any arbitrary population number if a substantial proportion of cities simply did not change in population from one census to the next. The bottom two plots of Figure 5 indicate that the imbalance in lagged treatment status we see at thresholds where salary changes is not simply due to stickiness. In those plots we carry out the same analysis where the thresholds we consider are “placebos” at which neither salary nor any other policy changes, and we find no evidence of similar persistence around these thresholds.

Table 3 takes this analysis further. Each of the point estimates in this table is an RDD-based estimate of the effect of crossing population thresholds on the covariate given at left; for example, the estimated effect of crossing thresholds where salary changes on lagged treatment (examined graphically in Figure 5) is reported as 0.138 with a standard error of 0.037. Columns 2 to 5 carry out the same analysis but include covariates in the RDD analysis: in column 2 we include dummies for each year of the census; in column 3 we include dummies for each threshold; in column 4 we include both; and in column 5 we include a set of covariates describing Italian municipalities around the year 2002.[14] Columns 6 to 8 show the models from columns 1, 3, and 5 but focusing on “placebo” thresholds where no policy changes.
It should be clear that there is no reason why crossing a placebo threshold should affect the probability that a city is in the South or any of the other outcomes we consider in Table 3; to the extent that these variables describe background features of the municipalities rather than outcomes that could be affected by the mayor’s salary or any other policy choices, there is also no reason why crossing the threshold where salary changes should produce any effects.

[14] The covariates are the (log) number of nonprofits per person, the proportion of inhabitants who give blood, the ratio of young to old inhabitants, an indicator for the South, an indicator for whether the municipality is on the seaside, and the proportion of second homes in the municipality.

We have already seen (in Figure 5) that crossing salary thresholds seems to “affect” the probability of treatment in the previous census (but that the same is not true for placebo thresholds). In the top row of Table 3 we see that the imbalance at salary thresholds persists when we control for the year of the census and the threshold being considered, though the imbalance mostly disappears when we include municipal covariates in column 5. (The null effect at placebo thresholds is robust to the inclusion of covariates.) The second row indicates that we do not find a similar effect for the lagged running variable.

In the third and fourth rows of Table 3 we see evidence of imbalance in whether the council size changes at the threshold as well as in the year of the census being considered. To understand these findings, note that our analysis pools cases across thresholds and years; thus near the threshold we have cases where one city was just below a threshold where salary changed in 1971 and another city was just above a threshold where both salary and council size changed in 2001.
In an RDD environment without fine manipulation of the running variable, the distribution of characteristics just above the threshold and just below the threshold should be identical. How could the observed imbalance arise? In the case of the threshold type (council size change or not), we think the reason is that sorting is more pronounced at thresholds where both salary and council size change (as seen above); this means that when we pool cases that were near different thresholds, we find a greater concentration of those near council size thresholds above than below. Similarly, the reason why the year of the census is unbalanced is that sorting was worse in earlier censuses (as we have confirmed in separate analyses).

[Figure 5: Imbalance in lagged treatment in Italy. Top row: RD plot of lagged treatment against distance to threshold (salary thresholds), and bandwidth sensitivity of the estimated effect against the bandwidth of the local linear regression (salary thresholds). Bottom row: the same two panels for placebo thresholds.]

Note: The dependent variable in the RDD analysis above is “lagged treatment”, an indicator for whether a municipality was above a given threshold in the previous census, given that it was close to that threshold in the current census. The left panel in each pair shows the dependent variable in binned means of the running variable (gray dots) and the local linear regression estimate at the Imbens-Kalyanaraman optimal triangular bandwidth; the right panel shows the sensitivity of the estimated effect to the bandwidth, where the optimal is shown with a black dot.
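The covariate balance check shown in Figure 5 can be sketched as a local linear regression with a triangular kernel, estimated at several bandwidths. This is a minimal illustration under stated assumptions: the function name `local_linear_rd` and the simulated data are hypothetical, the bandwidth is supplied by the user (the Imbens-Kalyanaraman selection rule is not implemented here), and no standard errors are computed.

```python
import numpy as np

def local_linear_rd(x, y, bandwidth):
    """Estimate the jump in E[y | x] at x = 0.

    Fits y on a constant, a treatment indicator (x >= 0), x, and their
    interaction by weighted least squares with triangular kernel weights
    max(0, 1 - |x| / bandwidth). Returns the coefficient on the indicator.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.clip(1.0 - np.abs(x) / bandwidth, 0.0, None)
    keep = weights > 0
    x, y, weights = x[keep], y[keep], weights[keep]
    treated = (x >= 0).astype(float)
    design = np.column_stack([np.ones_like(x), treated, x, treated * x])
    root_w = np.sqrt(weights)  # WLS via rescaled ordinary least squares
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return float(beta[1])

# Simulated example: a covariate with a true jump of 0.2 at the threshold.
rng = np.random.default_rng(1)
distance = rng.uniform(-200, 200, 5000)
covariate = 0.3 + 0.001 * distance + 0.2 * (distance >= 0) + rng.normal(0, 0.05, 5000)

# Bandwidth sensitivity, in the spirit of the right-hand panels of Figure 5.
sensitivity = {h: local_linear_rd(distance, covariate, h) for h in (50, 100, 150)}
```

Applied to a pre-treatment covariate such as lagged treatment, an estimate that stays away from zero across bandwidths is the kind of imbalance the text describes; running the same function at placebo thresholds gives the corresponding comparison.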
22 Table 3: “Effects” of crossing threshold on covariates (Italy) Obs [BW] 2592 [132] (1) .138∗∗∗ (.037) Lagged running variable 2522 [128] 80.115 (63.630) 55.194 (63.642) Both salary and council size change 1590 [81.7] .268∗∗∗ (.049) .207∗∗∗ (.046) Year of census 3056 [158.8] -7.091∗∗∗ (1.045) Population at threshold (in 1000s) 1678 [86.4] .147 (.401) -.091 (.396) Log nonprofits/person 1647 [84.5] .001 (.057) .011 (.058) .026 (.058) .023 (.058) -.000 (.058) .004 (.045) .002 (.045) -.052 (.045) Proportion donating blood 1570 [80.5] -.003 (.002) -.003 (.002) -.003 (.002) -.003 (.002) -.004∗ (.002) -.001 (.002) -.001 (.002) -.001 (.002) Log young/old ratio 1815 [93.8] .045 (.056) -.089∗ (.040) -.045 (.047) -.089∗ (.040) -.086∗ (.039) .027 (.053) -.007 (.045) -.035 (.034) South 1777 [91.3] .037 (.048) .037 (.048) .036 (.048) .033 (.048) -.023 (.040) -.012 (.050) -.010 (.050) -.031 (.041) Seaside 2077 [106.9] -.026 (.023) -.033 (.023) -.031 (.023) -.032 (.023) -.033 (.024) -.012 (.020) -.017 (.020) -.044∗ (.020) Proportion vacation homes 1897 [97.8] .025 (.022) .043† (.022) .033 (.022) .037† (.022) .037† (.022) .032 (.022) .040† (.021) .021 (.020) X X X X X X X X X X Lagged treatment 23 Year dummies: Threshold dummies: Other municipal characteristics: Thresholds where salary changes (2) (3) (4) ∗∗ ∗∗∗ .113 .128 .108∗∗ (.037) (.037) (.037) 84.583 (58.183) 63.567 (57.660) (5) .041 (.036) 27.862 (53.600) -3.172∗∗∗ (.889) “Thresholds” w. no changes (6) (7) (8) .024 .028 -.055 (.041) (.041) (.040) 95.758 (97.363) 124.452 (79.532) -2.871† (1.620) -1.988 (1.386) 1.985 (72.847) .346 (.532) X Notes: Each point estimate comes from a different RDD analysis in which the row variable is the dependent variable and we pool data from multiple censuses and population thresholds. Model (1) includes no extra control variables; Model (2) includes a dummy for threshold type (e.g. mayor’s salary, council size, both); Model (3) includes a dummy for each threshold (e.g. 
1000, 2000); Model (4) includes controls for municipality characteristics (e.g. South, blood donations) other than the dependent variable; Model (5) includes threshold dummies and municipal characteristics.

To reiterate, these two imbalances (in whether the council size changes and in the year of the census) arise because we are pooling data from multiple thresholds and multiple censuses. It should be clear that neither of these particular problems would arise if we focused on a single threshold in a single year (although imbalance in other covariates might). Pooling across thresholds and/or time is often an attractive strategy in RDD studies based on population thresholds, and our analysis shows that when sorting is a possibility, covariate imbalances can arise in pooled data because of differences in the rate of sorting across thresholds and/or time. Note also that these imbalances could cause serious biases in analysis if e.g. outcomes vary across different types of thresholds or across years. Suppose for example that the outcome one is studying is trending upward over time, and that sorting at a given threshold is also getting worse over time. Then pooled RDD analysis at that threshold will show an apparent positive effect even if the true effect is zero, simply because more recent years (and thus higher outcomes) will be over-represented on the right side of the threshold. The clear implication is that studies that pool data across thresholds and years should control for the threshold and year whenever sorting seems like a possibility.

The rest of Table 3 reports similar analysis for a set of covariates we selected because we thought they might explain the aggregate sorting in Italy.
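Returning briefly to the pooling problem described above, the following simulation sketches how a trending outcome combined with sorting that worsens over time can produce a spurious pooled estimate, and how accounting for the year removes it. Everything here is a hypothetical illustration: the function name `simulate_pooled_rdd`, the sorting shares, and the sample sizes are assumptions, the true treatment effect is set to zero, and the estimator is a simple difference in means within a narrow window rather than a local linear regression.

```python
import numpy as np

def simulate_pooled_rdd(n_per_year=20000, n_years=5, window=20.0, seed=0):
    """Simulate pooled threshold data with zero true effect.

    Returns (naive, year_adjusted): the pooled difference in mean outcomes
    above vs. below the threshold, and the average of within-year differences.
    """
    rng = np.random.default_rng(seed)
    year_list, above_list, y_list = [], [], []
    for t in range(n_years):
        x = rng.uniform(-100.0, 100.0, size=n_per_year)  # distance to threshold
        sort_share = 0.1 * t  # hypothetical: sorting worsens over time
        just_below = (x >= -window) & (x < 0)
        sorts = just_below & (rng.random(n_per_year) < sort_share)
        x = np.where(sorts, x + window, x)  # sorters report a population above the threshold
        y = t + rng.normal(size=n_per_year)  # outcome trends upward; no treatment effect
        keep = np.abs(x) < window
        year_list.append(np.full(keep.sum(), t))
        above_list.append(x[keep] >= 0)
        y_list.append(y[keep])
    year = np.concatenate(year_list)
    above = np.concatenate(above_list)
    y = np.concatenate(y_list)

    naive = y[above].mean() - y[~above].mean()
    within = [
        y[above & (year == t)].mean() - y[~above & (year == t)].mean()
        for t in range(n_years)
    ]
    return float(naive), float(np.mean(within))

naive_estimate, year_adjusted_estimate = simulate_pooled_rdd()
```

In this setup the naive pooled estimate is positive because later years, which have higher outcomes, are over-represented above the threshold, while the year-adjusted estimate is close to zero. Averaging within-year differences plays the role of the year controls recommended in the text; it does not address any imbalance in unobserved characteristics of the cities that sort.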
For example, we speculated that measures of social capital might be imbalanced at thresholds because places with lower social capital manage to manipulate census numbers (perhaps through active housing policy) more effectively; we therefore tested for imbalance in the number of nonprofit organizations per person and the rate of blood donations. We find no imbalance in these covariates in the raw RDD; when we control for other municipal characteristics we do (perhaps oddly) find some imbalance. We do find some imbalance in the proportion of young to old citizens when we control for the year of the census: the analysis indicates that cities just to the right of a threshold currently have a somewhat older population than cities just to the left of the threshold; this difference is robust to the additional inclusion of threshold dummies and municipal covariates. We find no imbalance in an indicator for whether the city is in the south of Italy. Finally, we checked for imbalances in an indicator for whether the city is located by the sea and in the proportion of second homes (partly because we thought that one way to increase the official number of residents would be to encourage some part-time residents to be registered in the municipality); we find no imbalances in the seaside indicator, but once we control for the year we find borderline significant (p < .1) imbalance in the proportion of second homes, consistent with our speculation.[15]

What can we conclude from this evidence on covariate imbalance at salary thresholds in Italy? The optimistic conclusion is that researchers can productively conduct RDD in Italy using multiple thresholds and/or multiple censuses as long as they include appropriate controls, which based on this analysis would include an indicator for lagged treatment, indicators for the year and the threshold, and controls for the age structure of the population and the proportion of second homes.
The pessimistic conclusion is that there are many covariates we have not tested (and many that are unobservable and thus untestable), and thus RDD analysis is likely to be biased even after controlling for the set of covariates we have tested here. Clearly, strategic sorting undermines the appeal of the RDD, and one would prefer RDD without sorting to RDD with sorting (no matter how many covariates one can include). The practical research design question, in our view, is whether to prefer an RDD to a pure observational study in which the units freely choose their treatments. We consider this question in the next section.

[15] An important caveat is that these covariates are measured at one point in time: the analysis tells us that cities that have more nonprofits per capita now are not over-represented on the right side of salary thresholds on average over the censuses in our dataset. If we had measures of this and other covariates across time, we would expect to find imbalance for the same reason we find imbalance in the year of the census, which is that the sorting also varied over time.

VI. Discussion: when is an RDD with sorting still better than the alternative?

Suppose one has the opportunity to use RDD to study the effect of a municipal policy that depends on a population threshold, but there is evidence of strategic sorting; alternatively, one can study the same policy in another setting where there is policy variation but cities choose their policies freely. Which should one choose?

We offer the following thought experiment as a way to clarify some of the subtle tradeoffs involved in the comparison between a flawed RDD and a pure observational study. Suppose we wish to study the effect of policy A vs. A′; we have the choice of examining outcomes in one setting where the policy is assigned to cities based on a population threshold but there is some strategic sorting or in another setting where the policy is freely chosen by the cities themselves.
To make the comparison as simple as possible, we assume that we would be studying cities around the same size in either setting. In the setting where the policy depends on a threshold (the “RDD setting”), cities get policy A unless their reported population is c or higher (in which case they get policy A′), but a proportion p of the cities are able to add or subtract up to m inhabitants at a small cost; thus for p > 0 we expect to see strategic sorting in the (reported) population interval [c − m, c + m), as some cities with true populations above the threshold will have reported a population below the threshold and vice versa. In the other setting (the “pure observational setting”), cities simply choose their own policy, and choices are diverse such that in the population interval [c − m, c + m) we see both A and A′. The question is, under what conditions do we prefer the estimate from the RDD setting (where for p > 0 we have strategic sorting) to the estimate from the pure observational setting?

We begin by making the most general argument for RDD that we can. Suppose that we prefer the RDD estimate when p = 0 to the pure observational estimate, perhaps because there are unobserved variables in the assignment of treatment in the pure observational case. If we assume in addition that the potential outcomes are unrelated to population in both the RDD and the pure observational case, then we can state that the RDD is always preferred to the pure observational study, regardless of the amount of strategic sorting in the RDD case. To see this, note that (given the independence of potential outcomes and population) the optimal estimator in the RDD case when p = 0 (i.e. there is no sorting) is simply the difference in means between units above and below the threshold; the optimal estimator in the RDD case when p = 1 and no city is indifferent between policies (i.e. sorting is complete) is the optimal estimator in the pure observational study.
If the optimal estimator in the pure observational study is simply the difference in means between the two treatment groups, then in the RDD case the difference in means will yield a weighted average of the RDD estimate and the pure observational estimate, with weights 1 − p and p respectively. Because we have assumed that the RDD estimate is preferred to the pure observational estimate, we can conclude that the RDD setting is strictly preferred to the pure observational setting for all p < 1. Put simply, under the assumptions we have made above, the worst case for RDD analysis with strategic sorting is the estimate we would get in the corresponding pure observational study. This suggests that RDD analysis may be preferred to a pure observational study even in the presence of substantial sorting, assuming that the researcher accounts for sorting as well as she would account for treatment assignment in the pure observational study.

When we deviate from the assumptions employed in the previous paragraph, however, the case for RDD in the presence of sorting becomes less clear. The crucial assumption is the independence of potential outcomes from population. Even if we maintain the assumption that the clean RDD estimate is preferred to the pure observational estimate, the dependence of potential outcomes on population could introduce problems such that there is a level of p above which the pure observational setting would yield better answers. To see this, note that the pure observational setting has one clear advantage over the RDD setting with substantial sorting: in the pure observational setting we observe the true population.
The problem with the RDD setting when p is high is that, while we can control for other covariates that affect treatment assignment in the pure observational study, we cannot control for true population; as a result, RDD estimates with high p may be inferior to the pure observational estimates due to what we may think of as either omitted variable bias or non-random measurement error.

What does this analysis suggest about how researchers might choose between a problematic RDD analysis and a pure observational study? We note three considerations:

• Clearly, if the pure observational study would be preferred to the RDD analysis even in the absence of sorting (e.g. because it is less local or has lower variance) then there is even more reason to prefer the observational study when there is sorting.

• If the RDD would be preferred in the absence of sorting and the researcher is equally able to observe covariates that determine treatment in the two settings, then the RDD setting may be preferable to the pure observational setting even in the presence of substantial sorting.

• The argument of the previous point weakens as the dependence of potential outcomes on the (unmanipulated) running variable gets stronger and the bias from measurement error in the problematic RDD becomes more severe.

VII. Conclusion

We have documented two major challenges that require careful attention when using population thresholds in regression discontinuity designs: the same population cutoff is often used to determine several policies (making identification and interpretation challenging), and in some settings it appears that municipalities with populations near thresholds manage to manipulate their populations to achieve desirable policies. We find evidence of both problems in France, Italy, and Germany, three countries that have used such thresholds since the 19th century or before.
We expect these problems to appear in the many other countries in which similar population-based policies are found.

Although both problems diminish the appeal of the RDD as a means of learning about the effect of policies, we emphasize that the RDD may still be an attractive approach even when both problems are present. We highlight a number of approaches for disentangling the effects of policies that change at some of the same thresholds. We examined covariate imbalance in Italy and discussed the importance of controlling for factors that are associated with sorting, which may include the type of threshold or the vintage of the census in analysis that pools data from several thresholds. Finally, we presented assumptions under which an RDD with sorting is always preferable to a pure observational study, and we discussed deviations from those assumptions that could lead to the opposite conclusion.

Our work leaves open important questions for future study. One such question is how the strategic sorting that we document could be accomplished. In the French case, where the municipality has always played a role in the collection of census data (e.g. recruiting and training the staff that carries out the census), one can easily imagine how local officials could encourage more effort to identify local inhabitants when it is known that the municipality would benefit from doing so. (Consistent with this, sorting has declined since 1999 as the central government has taken a more active supervisory and training role at the local level.) In Italy, where the census is carried out in a much more centralized way, the mechanism is less clear. Our analysis suggests a kind of “ratchet effect” at salary thresholds (producing persistence in treatment over time); it also indicates that sorting was more severe in the past and (controlling for the vintage of the census) municipalities with older populations and more second homes are more effective at strategic sorting.
It remains to explore the mechanisms by which this was achieved. While our focus has been on how researchers can use population thresholds to learn about the effects of political institutions and municipal policies, it may be enlightening to study further the incentives these thresholds induce and what determines the local response to these incentives.

References

Arnold, Felix and Ronny Freier. 2015. “Signature requirements and citizen initiatives: Quasi-experimental evidence from Germany.” Public Choice 162(1):43–56.
Asatryan, Zareh, Thushyanthan Baskaran and Friedrich Heinemann. 2014. The effect of direct democracy on the level and structure of local taxes. Technical report ZEW Discussion Papers.
Asatryan, Zareh, Thushyanthan Baskaran, Theocharis Grigoriadis and Friedrich Heinemann. 2013. “Direct Democracy and Local Public Finances Under Cooperative Federalism.” ZEW Working Paper 13-038.
Baques, Manuel and Pamela Campa. 2012. “Gender Quotas, Female Politicians, and Public Expenditures: Quasi Experimental Evidence.” Working Paper.
Barone, Guglielmo and Guido De Blasio. 2013. “Electoral rules and voter turnout.” International Review of Law and Economics 36:25–35.
Barreca, Alan I, Melanie Guldi, Jason M Lindo and Glen R Waddell. 2011. “Saving Babies? Revisiting the effect of very low birth weight classification.” The Quarterly Journal of Economics 126(4):2117–2123.
Baskaran, Thushyanthan. 2012. “The flypaper effect: evidence from a natural experiment in Hesse.”
Baskaran, Thushyanthan and Mariana Lopes da Fonseca. 2014. “Electoral competition and endogenous political institutions: quasi-experimental evidence from Germany.” Working Paper Econstor.
Bordignon, Massimo, Tommaso Nannicini and Guido Tabellini. 2013. “Moderating Political Extremism: Single Round vs. Runoff Elections under Plurality Rule.” WP.
Brollo, Fernanda and Tommaso Nannicini. 2012. “Tying Your Enemy’s Hands in Close Races: The Politics of Federal Transfers in Brazil.” American Political Science Review 106:742–761.
Casas-Arce, Pablo and Albert Saiz. 2015. “Women and power: unpopular, unwilling, or held back?” Journal of Political Economy, forthcoming.
Caughey, Devin and Jasjeet S Sekhon. 2011. “Elections and the regression discontinuity design: Lessons from close US House races, 1942–2008.” Political Analysis 19(4):385–408.
De Benedetto, Marco Alberto and Maria De Paola. 2014. Candidates’ Quality and Electoral Participation: Evidence from Italian Municipal Elections. Technical report IZA Discussion Paper.
Egger, Peter and Marko Koethenbuerger. 2010. “Government spending and legislative organization: Quasi-experimental evidence from Germany.” American Economic Journal: Applied Economics pp. 200–212.
Eggers, Andrew. 2015. “Proportionality and Turnout: Evidence from French Municipalities.” Comparative Political Studies, forthcoming 48(2):135–167.
Eggers, Andrew C, Anthony Fowler, Jens Hainmueller, Andrew B Hall and James M Snyder. 2015. “On the validity of the regression discontinuity design for estimating electoral effects: New evidence from over 40,000 close races.” American Journal of Political Science 59(1):259–274.
Ferraz, Claudio and Frederico Finan. 2009. Motivating politicians: The impacts of monetary incentives on quality and performance. Technical report National Bureau of Economic Research.
Fujiwara, Thomas. 2011. “A Regression Discontinuity Test of Strategic Voting and Duverger’s Law.” Quarterly Journal of Political Science 6(3-4):197–233.
Gagliarducci, Stefano and Tommaso Nannicini. 2013. “Do better paid politicians perform better? Disentangling incentives from selection.” Journal of the European Economic Association 11(2):369–398.
Grembi, Veronica, Tommaso Nannicini and Ugo Troiano. 2012. “Policy Responses to Fiscal Restraints: A Difference-in-Discontinuities Design.” IZA Working Paper n.6952.
Gulino, Giorgio. 2014. “Do Electoral Systems Affect the Incumbent Probability of Re-election? Evidence from Italian Municipalities.” Working Paper presented at the EEA/ESEM.
Hinnerich, Björn Tyrefors and Per Pettersson-Lidbom. 2014. “Democracy, Redistribution, and Political Participation: Evidence From Sweden 1919–1938.” Econometrica 82(3):961–993.
Hirota, Haruaki and Hideo Yunoue. 2012. “Local Government Expenditures and Council Size: Quasi Experimental Evidence from Japan.”
Hirota, Haruaki and Hideo Yunoue. 2013. “Does Local Council Size Affect Land Development Expenditures? Quasi Experimental Evidence from Japanese Municipal Data.”
Hopkins, Daniel J. 2011. “Translating into Votes: The Electoral Impact of Spanish-Language Ballots.” American Journal of Political Science 55(4):814–830.
Imbens, Guido W and Thomas Lemieux. 2008. “Regression discontinuity designs: A guide to practice.” Journal of Econometrics 142(2):615–635.
Koethenbuerger, Marko. 2012. “Do Political Parties Curb Pork-Barrel Spending? Municipality-Level Evidence from Germany.” CESifo Discussion Paper.
Lee, David S. and Thomas Lemieux. 2009. Regression Discontinuity Designs in Economics. Working Paper 14723 National Bureau of Economic Research.
Litschig, Stephan. 2012. “Are Rules-based Government Programs Shielded from Special-Interest Politics? Evidence from Revenue-Sharing Transfers in Brazil.” Journal of Public Economics 96:1047–1060.
Litschig, Stephan and Kevin M. Morrison. 2013. “The Impact of Intergovernmental Transfers on Education Outcomes and Poverty Reduction.” American Economic Journal: Applied Economics 5(4):206–240.
Litschig, Stephan and Kevin Morrison. 2010. “Government spending and re-election: Quasi-experimental evidence from Brazilian municipalities.” UPF Discussion Paper.
Lyytikäinen, Teemu and Janne Tukiainen. 2013. “Voters are Rational.” Government Institute for Economic Research Working Papers (50).
McCrary, Justin. 2008. “Manipulation of the running variable in the regression discontinuity design: A density test.” Journal of Econometrics 142(2):698–714.
Mukherjee, Mukta. 2011. “Do Better Roads Increase School Enrollment? Evidence from a Unique Road Policy in India.” Syracuse University.
Pellicer, Miquel and Eva Wegner. 2013. “Electoral Rules and Clientelistic Parties: A Regression Discontinuity Approach.” Quarterly Journal of Political Science 8(4):339–371.
Pettersson-Lidbom, Per. 2006. “Does the Size of the Legislature Affect the Size of Government? Evidence from Two Natural Experiments.” Stockholm University Working Paper.
Pettersson-Lidbom, Per. 2012. “Does the Size of the Legislature Affect the Size of Government: Evidence from Two Natural Experiments.” Journal of Public Economics 98(3–4):269–278.
Urquiola, Miguel and Eric Verhoogen. 2009. “Class-size caps, sorting, and the regression-discontinuity design.” The American Economic Review 99(1):179–215.
van der Linde, Daan, Swantje Falcke, Ian Koetsier and Brigitte Unger. 2014. “Do Wages Affect Politicians’ Performance? A regression discontinuity approach for Dutch municipalities.” Utrecht University School of Economics Discussion Paper 14-15.
Weingast, Barry R, Kenneth A Shepsle and Christopher Johnsen. 1981. “The political economy of benefits and costs: A neoclassical approach to distributive politics.” The Journal of Political Economy pp. 642–664.

VIII. Technical appendix

A. Two issues with discrete variables in the McCrary sorting test

The McCrary test, based on the work of McCrary (2008), has become a standard method to test for sorting in RDD settings. The test checks for manipulation of the RDD running variable by closely examining the distribution of this variable around the threshold. Importantly, the test has been designed for continuous variables.
In this appendix, we illustrate two difficulties that researchers need to consider when applying this test to a setting where the underlying RDD running variable is discrete.

Issue 1: Bin size selection with discrete variables

The first issue relates to the selection of one of the key parameters in the test, namely the bin size. The McCrary test proceeds in two steps. The bin size matters in the first step, which produces an undersmoothed histogram from the data. In the second step, given this bin size, the histogram is smoothed using a local linear regression (which involves selecting the bandwidth h). McCrary (2008) suggests the following bin size selection procedure:

\hat{b} = 2 \hat{\sigma} n^{-1/2}    (A.1)

where \hat{\sigma} is the sample standard deviation of the forcing variable and n is the number of observations. Naturally, \hat{b} is not confined to the values of the discrete variable distribution.

Let us first consider how the bin size works in discrete distributions. Assume that we have a discrete distribution which takes the values -10 to 9 in increments of 1. The threshold is at 0 (which is counted as treated). Further assume that we observe 100 observations for each discrete value. Note that, for a bin size x, the bins in the McCrary test are defined in the following way: {..., [−2x, −x), [−x, 0), [0, x), [x, 2x), ...}. Now consider that we use a cut-off point exactly at 0 and choose a bin size of 1. That would give us 20 equal-sized bins with 100 observations each. If we now change the bin size to 0.5, we observe a crucial imbalance. To the right of the threshold (the treatment side) we first have a bin [0, 0.5) with 100 observations, followed by an empty bin [0.5, 1). On the left of the threshold, we observe the reverse. Here, the first bin [-0.5, 0) is empty, and the second bin [-1, -0.5) holds 100 observations. A bin size smaller than the discrete steps in the distribution thus creates empty bins which are distributed asymmetrically to the right and left of the threshold.
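The bin counts in this example can be verified with a short sketch. The Python snippet below is only an illustration and is not the Stata routine used for the results in this appendix; the function name bin_counts and the variable names are hypothetical. It assigns each observation to the bin [kx, (k+1)x) that contains it and reports the two bins on either side of the cut-off.

```python
import math
from collections import Counter


def bin_counts(values, bin_size, cutoff=0.0):
    """Count observations per bin, where bin k covers
    [cutoff + k * bin_size, cutoff + (k + 1) * bin_size).

    Hypothetical helper for illustration; it mimics the bin definition
    described in the text, not the McCrary Stata code itself.
    """
    counts = Counter()
    for v in values:
        k = math.floor((v - cutoff) / bin_size)
        counts[k] += 1
    return counts


# Discrete running variable: values -10, ..., 9 with 100 observations each.
values = [v for v in range(-10, 10) for _ in range(100)]

for x in (1.0, 0.5, 1.5):
    c = bin_counts(values, x)
    # k = -2, -1 are the two bins left of the cut-off; k = 0, 1 the two to the right.
    print(f"bin size {x}: [-2x,-x)={c[-2]}  [-x,0)={c[-1]}  [0,x)={c[0]}  [x,2x)={c[1]}")
```

With a bin size of 1 all four bins hold 100 observations. With a bin size of 0.5 the counts are 100, 0, 100, 0, and with a bin size of 1.5 they are 200, 100, 200, 100, so in both non-integer cases the first bin to the right of the cut-off holds more observations than the first bin to the left.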
Even for non-integer bin sizes larger than one, there will tend to be more possible population values per bin to the right than to the left. Consider a bin size of 1.5 in the above example. The first bin to the right, [0, 1.5), now includes 200 observations while the second bin, [1.5, 3), holds only 100 observations. Again, the reverse is true for the left side. Here, the first bin [-1.5, 0) includes only 100 observations, and the second bin [-3, -1.5) includes 200 observations. When running a McCrary test on a discrete running variable with a non-integer bin size, there will be a bias toward finding a jump in population. That is because for any bin size x and any k > 0 the number of integer values in [0, kx) is weakly larger than the number of integer values in [−kx, 0).

To illustrate the problems in the estimation of the McCrary statistic, we simulated a data set consisting of 200,000 observations (perfectly) uniformly distributed across the 500 discrete values in [-250, 249], in increments of one. We set the threshold to 0 (zero is included in the treatment). Given this setup, the data are constructed such that no sorting of any magnitude should be found. Implementing the McCrary Stata routine (using the cut-off point at 0 and leaving the bin size and bandwidth to be calculated by the algorithm), we obtain an estimate of 0.031 (0.011), a bin size of 0.65, and a bandwidth of 191.3, indicating a positive sorting result.

We then manipulated the bin size. In the left panel of figure 6, we vary the bin size between 0.5 and 3. The figure shows that, depending on the exact bin size, the estimated McCrary statistic varies significantly and often signals a false positive sorting result. In panel 2 of figure 6, we illustrate a similar problem of discrete bins when the cut-off value in the McCrary test is manipulated. Note that, due to the discrete values, there is no true cut-off point. Any value in (-1, 0] could be said to be the cut-off point.
The graph shows that the estimated McCrary statistic can vary from significantly negative to significantly positive depending on the exact positioning of the cut-off value.

In figure 7, we again highlight the problem that certain bin sizes create artificial jumps in the density of the running variable (by construction). In the upper left panel, we use a bin size of 0.5, which creates empty bins. With a bin size of 1 (upper right panel) all bins are of similar size. Increasing the bin size to 1.5 (lower left panel) again shows that the choice of bin size creates distinct sets of bins with more or fewer observations in them.

It is important to understand how the bin size interacts with the second important parameter in the test statistic, the bandwidth h. For the case of continuous variables, McCrary (2008) finds that the choice of bin size is unimportant as long as h is large compared to b (he suggests h/b > 10). Even for discrete variables, any asymmetric grouping in the bins will become less important with a larger bandwidth. However, in our simulations we found that while an increased bandwidth had a mitigating effect on the bias from choosing a particular bin size, a false positive test result could be obtained even with h/b of 100 or more.

We see two solutions to this problem. First, one can restrict the bin size in the McCrary test to the set of integers. Guidance as to which multiple of the increment to choose can be obtained from the original McCrary test. If the original McCrary procedure selects a bin size below 1, the minimum bin size of 1 should be set. For larger bin sizes, the closest multiple of the increment of the discrete running variable distribution should be used. A second solution works as follows. Redefine the breakpoint, c, in the McCrary test away from zero to half the distance between 0 and the next lower value in the discrete distribution (in our above example this would be -0.5).
Now, define the bins to the right and left of this new breakpoint to be {..., [c − 2x, c − x), [c − x, c), (c, c + x], (c + x, c + 2x], ...}. This solves the asymmetry problem, does not throw out any data, and allows the McCrary process to pick the bin size and bandwidth with the original algorithm.

A third, but inferior, solution is to drop the observations at 0 from the analysis. This, however, comes at the cost of losing (potentially crucial) information just around the threshold.

Figure 6: Simulating different bin sizes and cut-off values
[Left panel: estimated McCrary statistic plotted against the selected bin size (0.5 to 3). Right panel: estimated McCrary statistic plotted against the position of the threshold in the McCrary code (-1 to 0).]
Note: In this figure, we manipulate the bin size (left panel) or the exact position of the cutoff (right panel) in the McCrary routine. The patterns signify that the choice of those parameters can be crucial in a discrete setting. Source: Own calculations.

Figure 7: Issues with bin size and discrete values in McCrary tests
[Panels: Bin size = 0.5 (upper left), Bin size = 1 (upper right), Bin size = 1.5 (lower left).]
Notes: The figure shows the graphical output of the McCrary routine. The focus is on the distinct sets of observations when there is a bin size of 0.5 (upper left panel) or a bin size of 1.5 (lower left panel). Only when there is a bin size of 1 do we see that all the simulated observations are correctly binned. Source: Own calculations.

Issue 2: Pooling different thresholds and using a log transformation

A more specific problem arises when data are pooled over different thresholds and distance to the threshold is measured on a relative scale. Consider the case of population thresholds. Say, a policy changes at 10 different thresholds: 1,000, 2,000, ..., 10,000.
Also assume that we have a universe of towns with populations from 500 to 10,500 with exactly one town at each particular value (1 town of size 500, 1 of size 501, ..., 1 of size 10,499, 1 of size 10,500). Now, consider a pooling strategy with absolute deviations. For each town we measure the shortest absolute distance to the nearest threshold and stack the samples on top of each other. We end up with a distribution (similar to the case above) in which we have values between -500 and 500 and exactly 10 observations for each integer (including the zero). We are back to the problem described above, but no additional issue arises.

Let us now turn to the case of relative scaling. Say, we measure the distance to the nearest threshold in the form (z − c)/c, where z is the size of the town and c is the closest threshold (a log transformation will give a similar representation). This type of relative transformation might be preferable to the absolute distance, as larger towns might indeed be comparable within a larger absolute margin.

Figure 8: Pooling different thresholds
[Left panel: frequency distribution of the pooled data (relative distance from -0.5 to 0.5). Right panel: estimated McCrary statistic plotted against the bin size (0 to 0.2).]
Note: This graph highlights the uneven distribution of an initially uniformly distributed universe of towns when pooling is done on a relative scale (left panel). We also show the McCrary estimates from this flawed pooling exercise, which turn out to give false positive test results when the bin size is very small (right panel). Source: Own calculations.

Using this relative transformation results in a particular type of discrete distribution. Each town whose size is exactly a threshold value will get the relative distance of 0. There are 10 such towns. Now, the town of size 1001 will be at 1/1000, and so will the towns at 2002, 3003, 4004, ..., 10010. Thus, there are also 10 observations at this discrete value in the distribution.
However, the town at 2001 will be at the 1/2000 point in the distribution. The only other towns at this value are those at 4002, 6003, 8004, and 10005. Hence, there are only 5 towns in this discrete bin. In the extreme, consider the town at 10001, which stands alone at the 1/10000 bin. Similarly, the towns at 9001, 8001, 7001, and 6001 stand alone at their particular distribution values. The same pattern occurs on the other side of the thresholds (here, the observations at 9999, 8999, 7999, 6999, and 5999 stand by themselves).

Testing for sorting in this case is problematic when the zeros are counted on either side. Assume treatment starts at zero (to the right side). We now have a bin at zero which has (by construction) observations from all thresholds. To the left of this bin (on the non-treatment side) there are 5 bins that each relate to only one threshold and, in our example, hold only one observation each. For small bin sizes, the relative pooling creates an asymmetry exactly at the threshold.

We illustrate the issue by simulating the above example. The left panel of figure 8 illustrates the resulting histogram. Notably, the histogram is mirrored to the right and left of the zero. Depending on whether the zero is counted to the right or to the left, we can see that there exists a sorting issue right at the threshold. In the next step, we estimated the McCrary test statistic on a sample in which we duplicated each of the above observations 20 times (this brings us to a sample size comparable to the one above). To estimate the McCrary statistic and avoid the first issue (see above), we set the breakpoint to -0.005. In panel 2 of figure 8, we vary the bin size between 0.01 and 0.2 (see footnote 16). We find that for small bin sizes the McCrary test signals a false positive and significant sorting result which is entirely driven by the particular issue related to the zeros.
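The counts at individual values of the relative distance can be reproduced with a short sketch. The Python snippet below is only an illustration of the pooling example and is not the code behind figure 8; the names nearest_threshold and pooled_relative_distances are hypothetical, and the rule that ties between two thresholds go to the lower one is an assumption made here for concreteness. Exact fractions are used so that, for example, 2/2000 and 1/1000 are recognized as the same value.

```python
from collections import Counter
from fractions import Fraction

# Ten thresholds: 1,000, 2,000, ..., 10,000.
THRESHOLDS = range(1000, 10001, 1000)


def nearest_threshold(z):
    """Return the threshold closest to population z.

    Assumption for illustration: exact ties (e.g. a town of 1,500)
    are assigned to the lower threshold.
    """
    return min(THRESHOLDS, key=lambda c: (abs(z - c), c))


def pooled_relative_distances(towns):
    """Count towns at each exact value of (z - c) / c."""
    counts = Counter()
    for z in towns:
        c = nearest_threshold(z)
        counts[Fraction(z - c, c)] += 1
    return counts


# One town at every population value from 500 to 10,500.
towns = range(500, 10501)
dist = pooled_relative_distances(towns)

for value in (Fraction(0), Fraction(1, 1000), Fraction(1, 2000),
              Fraction(1, 10000), Fraction(-1, 10000)):
    print(f"relative distance {value}: {dist[value]} town(s)")
```

Under these assumptions the sketch reports 10 towns at 0, 10 at 1/1000, 5 at 1/2000, and a single town at each of 1/10000 and -1/10000, which is the uneven mass around zero described above.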
While the first issue (see above) can be remedied without loss of information, the second problem concerning the zeros in pooled data, when the distance is measured in relative terms, cannot be solved. The only technical solution to obtain correct inference on the scope of sorting in this case is to drop the observations at zero altogether. However, in doing this, we lose (potentially critical) information.

16 Here, we also manipulated the bandwidth to be exactly 10 times the bin size. If we let the algorithm choose a bandwidth, we do find positive point estimates; however, they are not statistically different from zero. The reason is that the algorithm chooses a bandwidth which is more than 200 times larger than the bin size and thus smooths away any difference at the threshold.

Appendix: Additional tables and Graphs

Table 4: RDD studies using population thresholds

Authors | Publication status | Research focus | Country | Time | Thresholds

Panel 1: Studies using a simple RDD
Egger & Koethenbuerger | 2010, AEJ:App | Council size | Germany | 1984-2004 | 1’000, 2’000, 3’000, 5’000, 10’000, 20’000, 30’000, 50’000, 100’000, 200’000
Pettersson-Lidbom | 2011, JPubE | Council size | Finland | 1977-2002 | 2’000, 4’000, 8’000, 15’000, 30’000, 60’000, 120’000, 250’000, 400’000
Pettersson-Lidbom | 2011, JPubE | Council size | Sweden | 1977-2002 | 12’000, 24’000, 36’000
Fujiwara | 2011, QJPS | Single round vs. runoff | Brazil | 1996-2008 | 200’000
Hopkins | 2011, AJPS | Ballots | US | 2005-2006 | 5% of citizens, 10’000
Litschig | 2012, JPubE | Sorting | Brazil | 1991 | 17 brackets
Barone & de Blasio | 2013, IRLE | Single round vs. runoff | Italy | 1993-2000 | 15’000
Brollo, Nannicini, Perotti & Tabellini | 2013, AER | Transfers | Brazil | 2001-2008 | 10’189, 13’585, 16’981, 23’773, 30’564, 37’356, 44’148
Gagliarducci & Nannicini | 2013, JEEA | Wage of mayors | Italy | 1993-2001 | 5’000
Litschig & Morrison | 2013, AEJ:App | Transfers | Brazil | 1982-1988 | 10’189, 13’585, 16’981
Pellicer & Wegner | 2013, QJPS | Election rules | Morocco | 2003, 2009 | 25’000
Hinnerich-Tyrefors & Pettersson-Lidbom | 2014, Econometrica | Direct democracy | Sweden | 1918-1938 | 1’500
Arnold & Freier | 2015, PuCh | Signature requirements | Germany | 1995-2009 | 10’000, 20’000, 30’000, 50’000, 100’000
Eggers | 2015, CPS | Electoral rule | France | 2001-2008 | 3’500
Bordignon, Nannicini & Tabellini | R&R, AER | Single round vs. runoff | Italy | 1993-2007 | 15’000
Ferraz & Finan | 2012, WP | Wage of mayor | Brazil | 2004-2008 | 10’000, 50’000, 100’000, 300’000, 500’000
Mukherjee | 2011, WP | Infrastructure | India | 2001-2009 | 500
Baskaran | 2012, WP | Transfers | Germany | 2001-2010 | 5’000, 7’500, 10’000, 15’000, 20’000, 30’000, 50’000
Hirota & Yunoue | 2012, WP | Council size | Japan | (not reported) | 2’000, 5’000, 10’000, 20’000, 50’000, 100’000, 200’000, 300’000, 500’000, 900’000
Egger & Koethenbuerger | 2013, WP | Council size | Germany | 1984-2004 | 1’000, 2’000, 3’000, 5’000, 10’000, 20’000, 30’000, 50’000, 100’000, 200’000
Lyytikäinen & Tukiainen | 2013, WP | Council size | Finland | 1996-2008 | 2’000, 4’000, 8’000, 15’000, 30’000
De Benedetto & De Paola | 2014, WP | Wage of mayors | Italy | 1991-2001 | 1’000, 5’000, 50’000
van der Linde et al. | 2014, WP | Wage of politicians | Netherlands | 2005-2012 | 8’000, 14’000, 24’000, 40’000, 60’000, 100’000, 150’000, 375’000

Panel 2: Studies using a Difference-in-Discontinuity
Casas-Arce, Saiz | 2015, JPE | Gender Quota | Spain | 2003-2007 | 5’000
Baques & Campa | 2011, WP | Gender Quota | Spain | 2003-2007 | 5’000
Grembi, Nannicini & Troiano | 2012, WP | Fiscal rules | Italy | 1999-2004 | 5’000
Asatryan, Baskaran, Grigoriadis, Heinemann | 2013, WP | Citizen referenda and spending | Germany | 1980-2011 | 10’000, 20’000, 30’000, 50’000, 100’000
Asatryan, Baskaran, Heinemann | 2013, WP | Citizen referenda and taxes | Germany | 1980-2011 | 10’000, 20’000, 30’000, 50’000, 100’000
Gulino | 2014, WP | Electoral rules | Italy | 1985-2000 | 5’000

Notes: All papers are also cited in our references.

Table 5: Population thresholds in Italian municipalities

[Flattened table; the position of the individual "x" marks could not be recovered from the extracted text. Columns: policy changes at k inhabitants (in thousands), with k = 1, 3, 5, 10, 15, 20, 30, 50, 60, 100, 250, 500. Rows: Size of the city council; Wage of the mayor; Wage of the executive officers; Attendance fee for city councilors; Maximum number of executive officers; Electoral Rule (plurality/runoff); Neighborhood councils; Hospitals; Health district; Balanced-budget rule.]

Note: The table identifies population thresholds (in thousands) at which given policies change. This is a partial list of policies, chosen to highlight the variety of policies that depend on population thresholds and the extent to which the same threshold often determines multiple policies. Source: French legal code.
Table 6: Population Thresholds in Germany (by rule and state)

[Flattened table; the cell entries could not be reliably recovered from the extracted text. Columns: the German states SH, NRW, NiedS, Hes, RP, BW, Bay, Saar, BB, MVP, Saan, Sax, Th, and a row total (Σ). Rows: the institutions and rules listed below. The final rows report, for each state, the total (Σ), the number of different policy issues, the number of different population thresholds, and the number of thresholds with a unique change.]

Councils
1 Council size
2 Full-time council members / Deputies
3 City districts
4 City district council
5 Administrative units
6 Council of the administration units

Wages
7 Wage of mayors
8 Additional compensation of the mayor
9 Wage of head of admin. units
10 Wage of deputies
11 Wage of mayors in recreational cities

Fiscal rules
12 Status of a larger city
13 Status of a county free city
14 Fiscal equalization
15 Special transfers for mergers

Citizen involvement
16 Citizen request
17 Petition for referenda
18 Quota requirements for referenda

Elections
19 Signatures for party lists
20 Signatures for mayoral candidacy
21 Election districts
22 Ballot districts
23 Reevaluation of an election

Committees/commissioners
24 Equal opportunity commissioner
25 Integration council
26 Accounting agency
27 Oversight regulation
28 Council for top-secret issues
29 Open council

Mayor
30 Mayor status
31 Mayor title
32 Deselection of mayors
33 Qualification of mayor

Unique rules
34 Direct democracy
35 Consolidated accounts
36 Youth welfare office
37 Construction oversight
38 Mayor deputy recall
39 County financing
40 Additional transfers
41 Key transfers
42 Deputies in administrative units
43 Pension funds
44 Municipal treasurer
45 Office hours on election day
46 Election statistics
47 Economic status
48 Accounting committee
49 Water management
50 Vehicle Tax
51 Deputy mayor status
52 Naming of city districts
53 Cooperation councils
54 Council after mergers
55 Outside administrative units
56 Forced administrative reconsideration
57 Administrator qualification requirement
58 Format of the ballots
59 Add compensation of head of admin
60 City district elections
61 Infrastructure subsidies
62 Country-side municipalities
63 General committee
64 Annual audits
65 Election proposals

Source: Own research.