The Role of Commodity Taxation in Pareto Efficient Tax Structures for Redistribution

John Burbidge*

25 May 2015

Abstract

Ramsey (1927) studied the problem of how to structure commodity taxes efficiently when some commodities cannot be taxed. Mirrlees (1971) studied efficient nonlinear income taxes for redistribution when the government can observe only earnings. Deaton (1979) argued persuasively that using distance functions and having the government choose commodities and leisure directly, rather than tax rates, offer the easiest route to understanding the Ramsey problem. This paper combines these papers in the simplest setting possible: two types, two goods and leisure, and a government that wants to redistribute from the high-wage type to the low-wage type but observes only earnings and consumption of each good. Piketty and Saez correctly observe that the literature that flowed from Mirrlees (1971) shifted emphasis away from using commodity taxes to redistribute income to using nonlinear earnings taxation (2013, p. 402). I argue in the present paper that the distance-function method of framing and solving optimal tax problems highlights an alternative path the post-Mirrlees literature might have followed and, along this path, the role of commodity taxation in redistribution is much larger.

Keywords: Optimal taxation, distance function, separability

JEL classification: H21

* Department of Economics, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1, [email protected]. I thank Lutz-Alexander Busch, Christoph Lülfesmann, John Revesz, César Sosa-Padilla, Michael Veall, and seminar participants at McMaster University, the University of Waterloo and the 2014 Canadian Economics Association Meetings in Vancouver, for helpful conversations.

1 Introduction

Ramsey (1927) studied the problem of how to structure commodity taxes efficiently when some commodities cannot be taxed.
Mirrlees (1971) studied efficient nonlinear income taxes for redistribution when the government can observe only earnings. Deaton (1979) argued persuasively that using distance functions and having the government choose commodities and leisure directly, rather than tax rates, offer the easiest route to understanding the Ramsey problem. This paper combines these papers in the simplest setting possible: two types, two goods and leisure, and a government that wants to redistribute from the high-wage type to the low-wage type but observes only earnings and consumption of each good. Piketty and Saez correctly observe that the literature that flowed from Mirrlees (1971) shifted emphasis away from using commodity taxes to redistribute income to using nonlinear earnings taxation (2013, p. 402). I argue in the present paper that the distance-function method of framing and solving optimal tax problems highlights an alternative path the post-Mirrlees literature might have followed and, along this path, the role of commodity taxation in redistribution is much larger.

The government's ability to redistribute depends, of course, on the constraints on its behaviour. The use of distance functions leads to a natural sequence of constraints. The most restrictive set of constraints, labelled case (a), captures a setting with commodity tax rates alone. Relaxing one constraint depicts a more powerful government that can be interpreted as levying a proportional earnings tax on the high earner as well as commodity taxes; label this case (b). In case (c), relaxing one more constraint captures the nonlinear earnings tax structure, together with commodity taxes, studied by Sadka (1976), Seade (1977) and Atkinson and Stiglitz (1976). I prove that if preferences are homothetic and leisure is weakly separable from goods then in case (a) the government can redistribute only until the mimicking constraint binds.
Relaxing either homotheticity or weak separability of leisure may permit redistribution beyond this point.

To derive the Ramsey tax results using distance functions the modeler must impose a constraint that rules out lump-sum taxes. The constraint that must be relaxed to move from case (b) to case (c) is precisely this constraint, as it applies to the high earner. So, for example, the famous zero-marginal-tax-rate result for the high earner follows from not imposing the constraint that leads to the Ramsey tax results. Re-imposing the constraint that induces the Ramsey tax results, that is, moving from (c) back to (b), opens a scenario in which both nonlinear earnings taxes and commodity taxes are important for redistribution, even if leisure and goods are weakly separable. I show that, starting at the private equilibrium, and before the mimicking constraint binds, the earnings tax on the high earner is the primary instrument for redistribution. Here the Corlett and Hague (1953-54) intuition applies: the commodity tax rate on the good most complementary with leisure is positive; the rate on the other good is negative. Once the mimicking constraint binds, reducing the earnings tax rate becomes the primary instrument to prevent mimicking, and commodity tax rates rise to generate the revenue required for further redistribution.

Section 2 restates Deaton's distance function method for solving the Ramsey problem. Section 3 develops the full set of constraints on the government's behaviour and states the government's optimization problem. Section 4 presents the results for each of the three cases listed above. Section 5 summarizes and concludes.
2 Ramsey with two goods and leisure

With two goods and leisure, each person's budget constraint can be written as $p_1 x_1 + p_2 x_2 + w l = wL$, where $L$ is the time endowment and $(p_1, p_2, w)$ are the prices of $(x_1, x_2, l)$.[1] Assume that the government wishes to raise a given revenue with the smallest decrease in utility and that lump-sum taxes are ruled out. If the government were able to tax leisure directly, a proportional tax on goods and leisure at rate $t$ would function as a lump-sum tax on the time endowment, because the budget constraints

$$(1 + t)(p_1 x_1 + p_2 x_2 + w l) = wL$$

$$p_1 x_1 + p_2 x_2 + w l = w(1 - t^*)L$$

are the same if

$$\frac{1}{1+t} = 1 - t^*.$$

[1] Much of this section is drawn from Burbidge, CJE, 2015.

Assuming that leisure cannot be taxed directly and that the government uses proportional tax rates, the options in the present setting are commodity taxes, $t_1$ and $t_2$, and an earnings tax, $t_e$. Since the equations

$$(1 + t_{1a}) p_1 x_1 + (1 + t_{2a}) p_2 x_2 = (1 - t_e) w (L - l)$$

$$(1 + t_1) p_1 x_1 + (1 + t_2) p_2 x_2 = w (L - l)$$

are the same if the $t_j$ are defined by

$$1 + t_j \equiv \frac{1 + t_{ja}}{1 - t_e},$$

assuming the government uses only commodity tax rates does not diminish the government's effectiveness.

Second-best optimal commodity tax rates must address the only issue standing in the way of a first-best outcome: leisure cannot be taxed directly. Thus, as Corlett and Hague (1953-54) suggested, the answer to the question of whether good 1 or good 2 should be taxed at a higher rate must be that whichever of goods 1 or 2 is more complementary with leisure should be taxed at a higher rate. Since the deadweight loss of any tax works solely off substitution effects, and these are defined with utility held constant, the required construct must isolate the relationship between good 1 and leisure and good 2 and leisure, with utility held constant. This construct is the distance function.

If $u(x_1, x_2, l)$ is the ordinary utility function, $d(x_1, x_2, l, u_0)$ is defined by

$$u\left( \frac{x_1}{d(x_1, x_2, l, u_0)}, \frac{x_2}{d(x_1, x_2, l, u_0)}, \frac{l}{d(x_1, x_2, l, u_0)} \right) = u_0, \tag{1}$$

that is, $d(x_1, x_2, l, u_0)$ is the number by which an arbitrary consumption vector must be scaled to deliver utility level $u_0$, and

$$u(x_1, x_2, l) \ge u_0 \ \text{ if and only if } \ d(x_1, x_2, l, u_0) \ge 1. \tag{2}$$

For the moment, set all tax rates to zero to reduce clutter. The next few paragraphs follow Deaton (1979) in showing the connections between the distance function and the expenditure function, $e(p_1, p_2, w, u_0)$. The scaled vector

$$\left( \frac{x_1}{d(x_1, x_2, l, u_0)}, \frac{x_2}{d(x_1, x_2, l, u_0)}, \frac{l}{d(x_1, x_2, l, u_0)} \right)$$

will deliver a utility level of $u_0$ and so the expenditure function must satisfy

$$e(p_1, p_2, w, u_0) \le \frac{p_1 x_1 + p_2 x_2 + w l}{d(x_1, x_2, l, u_0)}. \tag{3}$$

But if $(x_1^*, x_2^*, l^*)$ are chosen to be the Hicksian demand levels for these prices and utility level $u_0$ then $d(x_1^*, x_2^*, l^*, u_0) = 1$ and

$$e(p_1, p_2, w, u_0) = \frac{p_1 x_1^* + p_2 x_2^* + w l^*}{d(x_1^*, x_2^*, l^*, u_0)}.$$

Thus

$$e(p_1, p_2, w, u_0) = \min_{x_1, x_2, l} \frac{p_1 x_1 + p_2 x_2 + w l}{d(x_1, x_2, l, u_0)}. \tag{4}$$

From (3),

$$d(x_1, x_2, l, u_0) \le \frac{p_1 x_1 + p_2 x_2 + w l}{e(p_1, p_2, w, u_0)}.$$

Now let prices be $(p_1^*, p_2^*, w^*)$ which (not uniquely) generate a budget plane tangent to the $u_0$ indifference surface at the point where the ray from the origin to $(x_1, x_2, l)$ cuts this indifference surface.[2] Then

$$d(x_1, x_2, l, u_0) = \frac{p_1^* x_1 + p_2^* x_2 + w^* l}{e(p_1^*, p_2^*, w^*, u_0)}$$

so

$$d(x_1, x_2, l, u_0) = \min_{p_1, p_2, w} \frac{p_1 x_1 + p_2 x_2 + w l}{e(p_1, p_2, w, u_0)}. \tag{5}$$

Equations (4) and (5) clearly show the duality between the distance function and the expenditure function. Just as the Hessian of the expenditure function must be symmetric and negative semi-definite, so must the Hessian of the distance function (the Antonelli matrix) be symmetric and negative semi-definite (again, see Deaton (1979)).

Reintroduce commodity tax rates.
Since earnings are the only source of income for the household, earnings are not taxed and initial prices $(p_1, p_2, w)$ are assumed to be constant, $e((1 + t_1)p_1, (1 + t_2)p_2, w, u_0) = wL$. Applying the envelope theorem to (5) with commodity tax rates in place, obtain

$$a_1(x_1, x_2, l, u_0) \equiv \frac{\partial d(x_1, x_2, l, u_0)}{\partial x_1} = \frac{(1 + t_1) p_1}{wL} \tag{6}$$

$$a_2(x_1, x_2, l, u_0) \equiv \frac{\partial d(x_1, x_2, l, u_0)}{\partial x_2} = \frac{(1 + t_2) p_2}{wL} \tag{7}$$

$$a_3(x_1, x_2, l, u_0) \equiv \frac{\partial d(x_1, x_2, l, u_0)}{\partial l} = \frac{w}{wL} = \frac{1}{L} \tag{8}$$

[2] See Figures 1 and 2 in Deaton (1979).

Not only must the Antonelli matrix be symmetric and negative semi-definite, but pre-multiplying it by $[x_1 \ x_2 \ l]$ must yield a row vector of zeros, for any admissible $[x_1 \ x_2 \ l]$. Most of the results in this paper depend on the signs of $a^*_{ij} \equiv a_{ij}/a_i$. I will assume that elements on the main diagonal of the $a^*_{ij}$ matrix are negative, the off-diagonal elements are positive, and in any row the magnitude of the diagonal element is larger than any of the off-diagonal elements.

The Ramsey problem can be solved by assuming the government maximizes utility, or equivalently $d(x_1, x_2, l, u_0)$, subject to a minimum revenue requirement and a constraint that rules out lump-sum taxes. In this model the government obtains whatever the person does not consume, that is, $wL - p_1 x_1 - p_2 x_2 - w l$. Denoting the revenue requirement by $R$, the revenue constraint can be written as

$$wL - p_1 x_1 - p_2 x_2 - w l - R \ge 0,$$

and the Lagrange multiplier on this constraint must be nonnegative. A first-best solution to the problem would be to employ a lump-sum tax, $T$. From (8), a lump-sum tax would imply

$$a_3(x_1, x_2, l, u_0) = \frac{w}{wL - T} \ge \frac{w}{wL} = \frac{1}{L}.$$

Thus the absence of lump-sum taxes implies the government must deal with the constraint that

$$\frac{1}{L} - a_3(x_1, x_2, l, u_0) \ge 0, \tag{9}$$

and the Lagrange multiplier on this constraint must be nonnegative.
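As a numerical aside, definition (1) and property (2) can be checked directly. The following is a minimal sketch and not part of the original derivation. It assumes the additively separable, non-homothetic utility function of footnote 3, $u = x_1^{1/2} x_2^{1/2} + x_1^{1/2} + l^{1/2}$; the function names `utility` and `distance`, the bracketing interval and the iteration count are illustrative choices. Because $u(x/d)$ is strictly decreasing in $d$, the distance function can be found by bisection.

```python
import math

def utility(x1, x2, l):
    # Assumed example: additively separable, non-homothetic utility (footnote 3).
    return math.sqrt(x1 * x2) + math.sqrt(x1) + math.sqrt(l)

def distance(x1, x2, l, u0, lo=1e-9, hi=1e9, iters=200):
    """Solve u(x1/d, x2/d, l/d) = u0 for d by bisection on a log scale.

    The bracket [lo, hi] and the iteration count are illustrative choices.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)              # geometric midpoint of the bracket
        if utility(x1 / mid, x2 / mid, l / mid) > u0:
            lo = mid                          # scaled bundle still above u0: d is larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

bundle = (4.0, 1.0, 1.0)                      # utility(4, 1, 1) = 2 + 2 + 1 = 5
d_at_5 = distance(*bundle, u0=5.0)            # bundle lies on the u0 = 5 surface
d_at_3 = distance(*bundle, u0=3.0)            # bundle lies above the u0 = 3 surface
d_at_6 = distance(*bundle, u0=6.0)            # bundle lies below the u0 = 6 surface
print(d_at_5, d_at_3, d_at_6)
```

In line with (2), the computed distance exceeds one exactly when the bundle delivers more than $u_0$. Doubling the bundle doubles $d$, which is the degree-one homogeneity behind the zero row-vector property of the Antonelli matrix.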
Combining the objective with the revenue constraint and constraint (9), the Lagrangian is

$$\mathcal{L} = d(x_1, x_2, l, u_0) + \lambda^R \left[ wL - p_1 x_1 - p_2 x_2 - w l - R \right] + \lambda^L \left[ \frac{1}{L} - a_3(x_1, x_2, l, u_0) \right].$$

The first-order conditions are

$$0 = \frac{\partial \mathcal{L}}{\partial x_1} = a_1 - \lambda^R p_1 - \lambda^L a_{31}$$

$$0 = \frac{\partial \mathcal{L}}{\partial x_2} = a_2 - \lambda^R p_2 - \lambda^L a_{32}$$

$$0 = \frac{\partial \mathcal{L}}{\partial l} = a_3 - \lambda^R w - \lambda^L a_{33}.$$

Using the expressions (6)-(8) and the symmetry of the Antonelli matrix,

$$1 = \frac{\lambda^R wL}{1 + t_1} + \lambda^L a^*_{13} = \frac{\lambda^R wL}{1 + t_2} + \lambda^L a^*_{23} = \lambda^R wL + \lambda^L a^*_{33}.$$

So then

$$\frac{t_1 - t_2}{(1 + t_1)(1 + t_2)} = \frac{\lambda^L}{\lambda^R wL} \left( a^*_{13} - a^*_{23} \right)$$

$$1 - \frac{1}{1 + t_2} = \frac{\lambda^L}{\lambda^R wL} \left( a^*_{23} - a^*_{33} \right)$$

or

$$\frac{t_1 - t_2}{t_2 (1 + t_1)} = \frac{a^*_{13} - a^*_{23}}{a^*_{23} - a^*_{33}}.$$

On my assumptions, $a^*_{23} - a^*_{33} > 0$. The term $a^*_{j3}$ measures the degree of complementarity between good $j$ and leisure holding utility constant. If good 1 is more complementary with leisure than is good 2 then it is efficient to tax good 1 at a higher rate than good 2 because doing so helps compensate for the inability to tax leisure directly. Since

$$\frac{\partial \ln (a_1/a_2)}{\partial l} = a^*_{13} - a^*_{23}$$

and $a_1$ and $a_2$ are positive, I can write

$$t_1 \gtreqless t_2 \ \text{ if and only if } \ \frac{\partial (a_1/a_2)}{\partial l} \gtreqless 0. \tag{10}$$

In particular, a necessary and sufficient condition for the efficiency of equal proportional taxation of commodities 1 and 2 is that $a_1/a_2$ be independent of leisure.

The result in (10) is the same as equation (52) in Deaton (1979) and, given the duality between the expenditure function and the distance function described by Deaton (1979), (5.1) in Besley and Jewitt (1995):

$$t_1 \gtreqless t_2 \ \text{ if and only if } \ \frac{\partial (h_1/h_2)}{\partial w} \gtreqless 0, \tag{11}$$

where $h_j = \partial e(p_1, p_2, w, u_0)/\partial p_j$ is the Hicksian demand for commodity $j$. The slip in Deaton (1979), which Besley and Jewitt correct, is in jumping from (10) to saying that (10) holds if and only if leisure and goods are implicitly separable, that is, $d(x_1, x_2, l, u_0)$ can be written as $d^*(f(x_1, x_2, u_0), l, u_0)$.[3]

3 Taxation for redistribution

Consider an economy with two types of price-taking agents like the agent discussed above. A and B differ only in their wage rates, $w^A > w^B$.
Earnings have to be spent on goods 1 and 2, which are taxed at proportional rates $t_1$, $t_2$. Assume the government wishes to redistribute money from the As to the Bs but it can observe only earnings and consumption levels; this is the Mirrlees (1971) problem. I am going to approach the problem by building on the previous section. Clearly the government's ability to redistribute efficiently depends on the instruments at its disposal. The equivalent of equations (6)-(8) in the present context are

$$a_1^A = \frac{(1 + t_1) p_1}{E^A}; \quad a_2^A = \frac{(1 + t_2) p_2}{E^A}; \quad a_3^A = \frac{(1 - t^H) w^A}{E^A} \tag{12}$$

$$a_1^B = \frac{(1 + t_1) p_1}{E^B}; \quad a_2^B = \frac{(1 + t_2) p_2}{E^B}; \quad a_3^B = \frac{(1 - t^L) w^B}{E^B}, \tag{13}$$

where $E^j$ is the total expenditure of agent $j = A, B$ and $t^j$, $j = H, L$, are marginal earnings tax rates on high and low earners. I need to employ some normalization of tax rates and, for the moment, will set $t^L = 0$. These equalities build in the assumption that A and B pay the same prices for goods.

[3] In a note on the literature that followed Atkinson and Stiglitz (1976), Auerbach (1979) showed that with the additively separable utility function $u(x_1, x_2, l) = x_1^{1/2} x_2^{1/2} + x_1^{1/2} + l^{1/2}$, uniform commodity taxation is never efficient in the Ramsey setting; the optimal level of $t_1$ will always exceed the optimal level of $t_2$. See footnote 4.

I assume the government acts to maximize B's utility, $d(x_1^B, x_2^B, l^B, u_0^B)$, given various constraints. One of these is a minimum level for A's utility,

$$d(x_1^A, x_2^A, l^A, u_0^A) - 1 \ge 0;$$

the Lagrange multiplier on this constraint, $\lambda^A$, must be nonnegative. I assume the only purpose of taxation is redistribution and thus another constraint is that total revenue be nonnegative:

$$n^A \left( w^A L - p_1 x_1^A - p_2 x_2^A - w^A l^A \right) + n^B \left( w^B L - p_1 x_1^B - p_2 x_2^B - w^B l^B \right) \ge 0.$$

Here $n^j$ is the number of agents of each type, and the Lagrange multiplier associated with this constraint, $\lambda^R$, must be nonnegative. Each B will be given a cash transfer equal to total revenue divided by $n^B$.
$T^L \le 0$ denotes the lump-sum "tax" for each low earner, which is a B, so $E^B = w^B L - T^L \ge w^B L$. The most efficient way to redistribute from the As to the Bs would be with a lump-sum tax on the high earner, $T^H > 0$. I show below that when $T^H$ is available as an instrument, $t^H$ is unnecessary and would be set equal to zero. Thus, if $T^H > 0$, $E^A = w^A L - T^H < w^A L$ and then from (12) $a_3^A > 1/L$. Therefore, the counterpart of the no-lump-sum-tax constraint that is inequality (9) in the Ramsey model can be written in the Mirrlees model as

$$\frac{1}{L} - a_3^A(x_1^A, x_2^A, l^A, u_0^A) \ge 0,$$

and the associated Lagrange multiplier, $\lambda^L$, must be nonnegative.

With a lump-sum tax on the higher earners ruled out, a second-best instrument would be a marginal-equals-average earnings tax rate, $t^H$, on the high earner. Using (12) and (13) for leisure and good 1 one could write

$$(1 + t_1) p_1 \frac{a_3^A}{w^A a_1^A} = 1 - t^H$$

$$(1 + t_1) p_1 \frac{a_3^B}{w^B a_1^B} = 1$$

or, subtracting the second equation from the first,

$$(1 + t_1) p_1 \left( \frac{a_3^A}{w^A a_1^A} - \frac{a_3^B}{w^B a_1^B} \right) = -t^H.$$

With $t^H > 0$, and with no lump-sum tax on the high earners ($T^H = 0$),

$$\frac{a_3^A}{w^A a_1^A} - \frac{a_3^B}{w^B a_1^B} < 0.$$

Without $t^H$ as an instrument the government has to live with the constraint that

$$\frac{a_3^A}{w^A a_1^A} - \frac{a_3^B}{w^B a_1^B} \ge 0.$$

This can be rewritten as

$$\frac{a_3^A}{w^A} - \frac{a_1^A}{a_1^B} \frac{a_3^B}{w^B} \ge 0. \tag{14}$$

If the relative price of leisure and good 2 had been used instead of the relative price of leisure and good 1, the corresponding inequality would have been

$$\frac{a_3^A}{w^A} - \frac{a_2^A}{a_2^B} \frac{a_3^B}{w^B} \ge 0. \tag{15}$$

Clearly, whether one employs the relative price of leisure and good 1 or the relative price of leisure and good 2 may affect the way constraints are written and the signs of the associated Lagrange multipliers, but this choice cannot affect the implications of the model for tax policy. Observe that inequality (14) implies (15) when

$$\frac{a_1^A}{a_1^B} \ge \frac{a_2^A}{a_2^B} \tag{16}$$

and (15) implies (14) when

$$\frac{a_2^A}{a_2^B} \ge \frac{a_1^A}{a_1^B}.$$

This paper uses inequality constraints that are the equivalent of (14) and (16):

$$w^A a_1^A(x_1^A, x_2^A, l^A, u_0^A) \, a_3^B(x_1^B, x_2^B, l^B, u_0^B) - w^B a_1^B(x_1^B, x_2^B, l^B, u_0^B) \, a_3^A(x_1^A, x_2^A, l^A, u_0^A) \le 0,$$

with the Lagrange multiplier denoted $\lambda^w \le 0$, and

$$a_1^A(x_1^A, x_2^A, l^A, u_0^A) \, a_2^B(x_1^B, x_2^B, l^B, u_0^B) - a_2^A(x_1^A, x_2^A, l^A, u_0^A) \, a_1^B(x_1^B, x_2^B, l^B, u_0^B) \ge 0,$$

with the Lagrange multiplier denoted $\lambda^p \ge 0$. Simply put, these two constraints are one way of ruling out a proportional earnings tax rate on the high earner and imposing the constraint that A and B pay the same prices for goods.

Finally, the government might be prevented from setting different commodity tax rates so that $t_1 = t_2$. If this were true it would mean that $p_1 a_2^A = p_2 a_1^A$. The equivalent condition for person B would then follow from the constraint that A and B face the same prices of goods; this is the constraint whose Lagrange multiplier is $\lambda^p$. Denote the Lagrange multiplier for the $p_1 a_2^A = p_2 a_1^A$ constraint by $\lambda^t$. At this point the government faces six constraints, with Lagrange multipliers $\lambda^j$, $j = A, R, L, p, w, t$, and six choice variables, which are goods consumption and leisure for A and B.

Starting at the private equilibrium where the commodity tax rate is $t_1 = t_2 = t = 0$, raising $t$ and giving the revenue to the Bs will move us along the utility possibility frontier in the direction of lower A utility. At some point each A will realize that her utility would be higher if she pretended to be a low earner (a B) and was eligible for the cash transfer. When an A mimics a B, the mimicking A chooses leisure, $l_m^A$, to make her earnings $w^A (L - l_m^A)$ equal the earnings of a B, $w^B (L - l^B)$. If she does this she receives the cash transfer, $-T^L$, and will face a budget constraint for goods 1 and 2 that is identical to that faced by each B. If leisure and goods are not weakly separable, typically she will choose to consume more of the good that is complementary with leisure because $l_m^A > l^B$. If leisure and goods are weakly separable then a mimicking A will consume the same bundle of goods as each B.

In the present setting with only commodity tax rates and $t_1$ forced to equal $t_2$, the mimicking constraint is a seventh constraint that prevents further redistribution. With only six instruments, once the mimicking constraint binds, further redistribution is impossible. Can the problem be solved by allowing $t_1$ and $t_2$ to differ? Before mimicking binds, and dropping the constraint that $t_1 = t_2$, we have six choice variables and five constraints. With mimicking we have two extra constraints and two new choice variables, $x_{1m}^A$ and $x_{2m}^A$. One of the extra constraints is that A's utility acting as an A be at least as high as the utility of an A mimicking a B. Given that we already have a constraint that sets a lower bound on A's utility, $d(x_1^A, x_2^A, l^A, u_0^A) - 1 \ge 0$, the no-mimicking constraint can be written as

$$1 - d(x_{1m}^A, x_{2m}^A, l_m^A, u_0^A) \ge 0;$$

the Lagrange multiplier on this constraint, $\lambda^m$, must be nonnegative. The other side of the observation that to prevent mimicking the government must make the utility of an A at least as large as the utility of an A mimicking a B is that the government would like to have an instrument that discourages mimicking by pushing the goods budget of a mimicking A below the goods budget for a B:

$$a_1^B(x_1^B, x_2^B, l^B, u_0^B) \, x_1^B + a_2^B(x_1^B, x_2^B, l^B, u_0^B) \, x_2^B \ge a_1^B(x_1^B, x_2^B, l^B, u_0^B) \, x_{1m}^A + a_2^B(x_1^B, x_2^B, l^B, u_0^B) \, x_{2m}^A.$$

The absence of such an instrument means that

$$a_1^B(x_1^B, x_2^B, l^B, u_0^B) \, x_1^B + a_2^B(x_1^B, x_2^B, l^B, u_0^B) \, x_2^B \le a_1^B(x_1^B, x_2^B, l^B, u_0^B) \, x_{1m}^A + a_2^B(x_1^B, x_2^B, l^B, u_0^B) \, x_{2m}^A$$

or

$$a_1^B(x_1^B, x_2^B, l^B, u_0^B) \left( x_1^B - x_{1m}^A \right) + a_2^B(x_1^B, x_2^B, l^B, u_0^B) \left( x_2^B - x_{2m}^A \right) \le 0.$$

Label the Lagrange multiplier on this constraint $\lambda^c \le 0$.
The optimization problem for the government can now be written as

$$\underset{\substack{x_1^A, \, x_2^A, \, l^A, \; x_1^B, \, x_2^B, \, l^B, \; x_{1m}^A, \, x_{2m}^A \\ \lambda^j, \; j = A, R, L, w, p, m, c}}{\mathrm{Opt}} \quad \begin{aligned}[t]
& d(x_1^B, x_2^B, l^B, u_0^B) + \lambda^A \left[ d(x_1^A, x_2^A, l^A, u_0^A) - 1 \right] \\
& + \lambda^R \left[ n^A \left( w^A L - p_1 x_1^A - p_2 x_2^A - w^A l^A \right) + n^B \left( w^B L - p_1 x_1^B - p_2 x_2^B - w^B l^B \right) \right] \\
& + \lambda^L \left[ \frac{1}{L} - a_3^A \right] + \lambda^w \left[ w^A a_1^A a_3^B - w^B a_1^B a_3^A \right] + \lambda^p \left[ a_1^A a_2^B - a_2^A a_1^B \right] \\
& + \lambda^m \left[ 1 - d(x_{1m}^A, x_{2m}^A, l_m^A, u_0^A) \right] + \lambda^c \left[ a_1^B \left( x_1^B - x_{1m}^A \right) + a_2^B \left( x_2^B - x_{2m}^A \right) \right]
\end{aligned}$$

where each $a_i^q$ is evaluated at $(x_1^q, x_2^q, l^q, u_0^q)$, $q = A, B$, and

$$w^A \left( L - l_m^A \right) = w^B \left( L - l^B \right). \tag{17}$$

4 Results

I discuss results in three settings. In the first, the government has only commodity taxes, $t_1$, $t_2$. In the second, I go to the other extreme and endow the government with a lump-sum tax on the high earners, $T^H$, together with marginal earnings tax rates on the high and low earners, $t^H$, $t^L$, as well as commodity taxes. In the third, I examine an intermediate setting where the government has access to a proportional earnings tax rate on the high earners, $t^H$, as well as commodity tax rates.

In the appendix I prove that, with any of these tax systems, and with or without the mimicking constraint binding, it is efficient to set $t_1 = t_2$ if preferences are homothetic and leisure is weakly separable from goods. Later in the paper I provide an example where equal commodity tax rates are efficient but leisure is additively separable from goods, so homotheticity and the separability of leisure from goods are sufficient but not necessary conditions for the efficiency of uniform commodity taxation. Inspection of the proof reveals that homotheticity and separability imply goods 1 and 2 are equally complementary with leisure, holding utility constant, for both types: $a_{13}^{*j} = a_{23}^{*j}$, $j = A, B$.
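The equal-complementarity condition can be checked numerically in simple cases. The following is a minimal sketch under stated assumptions: it uses the fact, derived in the appendix, that $d = u/u_0$ when $u$ is homogeneous of degree one, and it estimates $a^*_{13}$ and $a^*_{23}$ by central finite differences. The helper name `cross_a_star`, the step size, the evaluation point and the Cobb-Douglas exponents are illustrative choices, and the second utility function is an assumed example of homothetic preferences in which leisure is not weakly separable.

```python
import math

def cross_a_star(u, x, i, j, u0=1.0, h=1e-4):
    """Finite-difference estimate of a*_ij = a_ij / a_i for i != j, where
    d(x, u0) = u(x) / u0. This form of d holds only when u is homogeneous
    of degree one (an assumption of this sketch)."""
    def d(v):
        return u(*v) / u0
    def bump(si, sj):
        v = list(x)
        v[i] += si * h
        v[j] += sj * h
        return v
    a_i = (d(bump(1, 0)) - d(bump(-1, 0))) / (2 * h)
    a_ij = (d(bump(1, 1)) - d(bump(1, -1)) - d(bump(-1, 1)) + d(bump(-1, -1))) / (4 * h * h)
    return a_ij / a_i

def cobb_douglas(x1, x2, l, alpha=0.3, beta=0.4):
    # Homothetic with leisure weakly separable from goods.
    return x1**alpha * x2**beta * l**(1 - alpha - beta)

def nonseparable(x1, x2, l):
    # Homothetic, but leisure is not weakly separable from goods.
    return math.sqrt(x1 * x2) + math.sqrt(x1 * l)

x = (2.0, 3.0, 0.5)   # illustrative bundle (x1, x2, l); indices 0, 1, 2
gap_cd = cross_a_star(cobb_douglas, x, 0, 2) - cross_a_star(cobb_douglas, x, 1, 2)
gap_ns = cross_a_star(nonseparable, x, 0, 2) - cross_a_star(nonseparable, x, 1, 2)
print(gap_cd, gap_ns)
```

For the Cobb-Douglas case the gap $a^*_{13} - a^*_{23}$ is zero up to numerical error, consistent with uniform commodity taxation. For the non-separable case the gap is positive, so by (10) good 1 would carry the higher rate in the Ramsey setting.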
A central theme of the results in this paper is that there is a very tight relationship between the Ramsey problem and the Mirrlees problem.

4.1 Only commodity tax rates

In this setting, with homotheticity and separability of leisure, there is, in effect, one instrument, a uniform commodity tax rate, $t_1 = t_2 \equiv t$, available for redistribution. Starting at the private equilibrium and raising the commodity tax rate with the revenue given to the low earners will move us along the utility possibility frontier (upf) in the direction of higher utility for Bs. Once the mimicking constraint binds, there is no extra instrument to prevent As from mimicking Bs, and further redistribution is impossible. This is the case, for example, with Cobb-Douglas preferences:

$$u(x_1, x_2, l) = x_1^{\alpha} x_2^{\beta} l^{1 - \alpha - \beta}, \quad \alpha > 0, \ \beta > 0, \ \alpha + \beta < 1.$$

If one steps away from homotheticity or separability between leisure and goods then typically $a_{13}^{*j} \ne a_{23}^{*j}$, $j = A, B$. Given the results in section 2 one would expect that, along the upf before mimicking starts, the sign of $t_1 - t_2$ matches the sign of $a_{13}^{*j} - a_{23}^{*j}$, $j = A, B$: if good 1 is more complementary with leisure than is good 2, it is efficient to tax good 1 at a higher rate. Recalling that $\lambda^w < 0$, inspection of equation (25) in the appendix shows this to be true.

What happens when mimicking begins? Now there are two instruments, $t_1$ and $t_2$, to cope with the targets of increasing the utility of a B while keeping the utility of an A mimicking a B as high as the utility of an A. Coping with mimicking is possible, but barely so. Consider a step up the upf; $u^B$ must increase and $u^A$ and $u_m^A$ must fall by the same amount. If $dl^B$ were negative or zero, $dl_m^A = (w^B/w^A) \, dl^B$ would have to fall by less or the same amount. For $u^B$ to rise, the goods budget for B would have to increase but this goods budget is the same for a mimicking A and a B. Therefore, if $u^B$ went up so would $u_m^A$, which is impossible. Thus, $dl_m^A = (w^B/w^A) \, dl^B > 0$ and the goods budget for a B and a mimicking A must fall. This could not occur if both tax rates increased; one must fall and the other must increase. Changing commodity taxes to increase the consumption of the good most complementary with leisure encourages the Bs to work less, which discourages the As from mimicking them. In other words, the mimicking constraint switches the government's problem away from trying to tax leisure indirectly to trying to tax work; the Corlett-Hague intuition is reversed. For example, if good 1 is more complementary with leisure than is good 2, $a_{13}^{*j} > a_{23}^{*j}$, $j = A, B$, then $t_1 > t_2$ on the upf when mimicking starts, and, as we move further up the upf, $t_1$ falls and $t_2$ rises. At some point $t_1$ may equal $t_2$, which proves that homotheticity and weak separability are not necessary for the efficiency of uniform commodity taxation in this model (see the examples described in footnote 4).

4.2 Lump-sum taxes on the high earner

As one might expect, until mimicking starts, a lump-sum tax on the high earner is a first-best instrument; no other tax instrument is needed. When mimicking starts the results in the appendix confirm the results in the literature. It is efficient to continue to have a zero marginal earnings tax rate on the high earners, $t^H = 0$, but to have a positive marginal earnings tax rate on the low earners, $t^L > 0$. This discourages the Bs from working, which discourages the As from mimicking the Bs. I prove that if leisure is not weakly separable from goods, and $a_{13}^{*j} > a_{23}^{*j}$, $j = A, B$, then $t_1 > 0$ and $t_2 < 0$, and vice versa.

4.3 A proportional earnings tax on the high earner and commodity taxes

Here the no-lump-sum tax constraint is still binding but now $\lambda^w = 0$. In the appendix I prove the following results, which are very intuitive. Until mimicking begins it is efficient to use the proportional earnings tax on the high earner as the primary instrument for redistribution.
This is supplemented by commodity taxation in the following way: if good 1 is more complementary with leisure than is good 2 then $t_1 > 0$ and $t_2 < 0$, and vice versa. Once mimicking begins the earnings tax on the high earner becomes the primary instrument to prevent the As from mimicking the Bs; it peaks at the point where mimicking begins and then falls. The role vacated by the earnings tax on the high earner is picked up by increases in commodity tax rates, and, again, the rate structure follows the Ramsey rule; $t_1$ is higher than $t_2$ if good 1 is more complementary with leisure than is good 2.[4]

5 Summary and conclusions

The utility possibility frontiers attainable through taxation and redistribution depend on the quality of the instruments available to the government. The literature that followed Mirrlees (1971) chose to drop the constraint that was the basis for the optimal commodity tax literature that followed Ramsey (1927). This was one path and it led economists to argue that commodity taxation should play only a minor role in redistribution. Another path, the one highlighted in this paper, is to study nonlinear earnings taxes, maintaining the constraint on government behaviour that is the basis of the Ramsey model. In this setting both earnings and commodity taxation have important roles to play in redistribution. It is a great distance between the real world and the models in this paper but it is difficult not to notice that for many countries differential commodity taxes raise a substantial share of the money used for redistribution and very few countries have a zero marginal earnings tax rate for their highest earners.

[4] The algebra and sample tables of upfs for the following utility functions

$$u(x_1, x_2, l) = x_1^{1/2} x_2^{1/2} + x_1^{1/2} + l^{1/2}$$

$$u(x_1, x_2, l) = x_1^{1/2} x_2^{1/2} + x_1^{1/2} l^{1/2}$$

$$u(x_1, x_2, l) = x_1^{1/2} x_2^{1/2} + x_2^{1/2} l^{1/2}$$

$$u(x_1, x_2, l) = x_1^{\alpha} x_2^{\beta} l^{1 - \alpha - \beta}$$

are at: https://artsonline.uwaterloo.ca/jburbidg/node/4.
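In the first utility function leisure is additively separable and preferences are not homothetic. In the second and third utility functions leisure is not weakly separable and preferences are homothetic. The fourth utility function is Cobb-Douglas; leisure is weakly separable and preferences are homothetic. The R code for the simulation programs and further details are available from the author.

Appendix

Implications of the first-order conditions

The first-order conditions for the eight goods and leisure variables are

$$\begin{aligned}
0 &= -\lambda^R n^A p_1 + \lambda^A a_1^A + \lambda^p \left( a_2^B a_{11}^A - a_1^B a_{21}^A \right) - \lambda^L a_{31}^A + \lambda^w \left( w^A a_3^B a_{11}^A - w^B a_1^B a_{31}^A \right) \\
0 &= -\lambda^R n^A p_2 + \lambda^A a_2^A + \lambda^p \left( a_2^B a_{12}^A - a_1^B a_{22}^A \right) - \lambda^L a_{32}^A + \lambda^w \left( w^A a_3^B a_{12}^A - w^B a_1^B a_{32}^A \right) \\
0 &= -\lambda^R n^A w^A + \lambda^A a_3^A + \lambda^p \left( a_2^B a_{13}^A - a_1^B a_{23}^A \right) - \lambda^L a_{33}^A + \lambda^w \left( w^A a_3^B a_{13}^A - w^B a_1^B a_{33}^A \right) \\
0 &= a_1^B - \lambda^R n^B p_1 + \lambda^p \left( a_1^A a_{21}^B - a_2^A a_{11}^B \right) + \lambda^w \left( w^A a_1^A a_{31}^B - w^B a_3^A a_{11}^B \right) \\
&\quad + \lambda^c \left[ a_{11}^B \left( x_1^B - x_{1m}^A \right) + a_{21}^B \left( x_2^B - x_{2m}^A \right) + a_1^B \right] \\
0 &= a_2^B - \lambda^R n^B p_2 + \lambda^p \left( a_1^A a_{22}^B - a_2^A a_{12}^B \right) + \lambda^w \left( w^A a_1^A a_{32}^B - w^B a_3^A a_{12}^B \right) \\
&\quad + \lambda^c \left[ a_{12}^B \left( x_1^B - x_{1m}^A \right) + a_{22}^B \left( x_2^B - x_{2m}^A \right) + a_2^B \right] \\
0 &= a_3^B - \lambda^R n^B w^B + \lambda^p \left( a_1^A a_{23}^B - a_2^A a_{13}^B \right) + \lambda^w \left( w^A a_1^A a_{33}^B - w^B a_3^A a_{13}^B \right) \\
&\quad + \lambda^c \left[ a_{13}^B \left( x_1^B - x_{1m}^A \right) + a_{23}^B \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m a_{3m}^A \frac{w^B}{w^A} \\
0 &= -\lambda^m a_{1m}^A - \lambda^c a_1^B \\
0 &= -\lambda^m a_{2m}^A - \lambda^c a_2^B
\end{aligned}$$

Note the last two equations imply that

$$\lambda^c a_1^B = -\lambda^m a_{1m}^A, \qquad \lambda^c a_2^B = -\lambda^m a_{2m}^A, \qquad \lambda^c = -\lambda^m \frac{E^B}{E_m^A}.$$

Using these equations we have

$$\begin{aligned}
0 &= -\lambda^R n^A p_1 + \lambda^A a_1^A + \lambda^p \left( a_2^B a_{11}^A - a_1^B a_{21}^A \right) - \lambda^L a_{31}^A + \lambda^w \left( w^A a_3^B a_{11}^A - w^B a_1^B a_{31}^A \right) \\
0 &= -\lambda^R n^A p_2 + \lambda^A a_2^A + \lambda^p \left( a_2^B a_{12}^A - a_1^B a_{22}^A \right) - \lambda^L a_{32}^A + \lambda^w \left( w^A a_3^B a_{12}^A - w^B a_1^B a_{32}^A \right) \\
0 &= -\lambda^R n^A w^A + \lambda^A a_3^A + \lambda^p \left( a_2^B a_{13}^A - a_1^B a_{23}^A \right) - \lambda^L a_{33}^A + \lambda^w \left( w^A a_3^B a_{13}^A - w^B a_1^B a_{33}^A \right) \\
0 &= a_1^B - \lambda^R n^B p_1 + \lambda^p \left( a_1^A a_{21}^B - a_2^A a_{11}^B \right) + \lambda^w \left( w^A a_1^A a_{31}^B - w^B a_3^A a_{11}^B \right) \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{11}^B \left( x_1^B - x_{1m}^A \right) + a_{21}^B \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m a_{1m}^A \\
0 &= a_2^B - \lambda^R n^B p_2 + \lambda^p \left( a_1^A a_{22}^B - a_2^A a_{12}^B \right) + \lambda^w \left( w^A a_1^A a_{32}^B - w^B a_3^A a_{12}^B \right) \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{12}^B \left( x_1^B - x_{1m}^A \right) + a_{22}^B \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m a_{2m}^A \\
0 &= a_3^B - \lambda^R n^B w^B + \lambda^p \left( a_1^A a_{23}^B - a_2^A a_{13}^B \right) + \lambda^w \left( w^A a_1^A a_{33}^B - w^B a_3^A a_{13}^B \right) \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{13}^B \left( x_1^B - x_{1m}^A \right) + a_{23}^B \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m a_{3m}^A \frac{w^B}{w^A}
\end{aligned}$$

Using (12) and (13) we have

$$\lambda^R n^A E^A \frac{1}{1 + t_1} = \lambda^A + \lambda^p \left( a_2^B a_{11}^{*A} - a_1^B a_{12}^{*A} \right) - \lambda^L a_{13}^{*A} + \lambda^w \left( w^A a_3^B a_{11}^{*A} - w^B a_1^B a_{13}^{*A} \right) \tag{18}$$

$$\lambda^R n^A E^A \frac{1}{1 + t_2} = \lambda^A + \lambda^p \left( a_2^B a_{21}^{*A} - a_1^B a_{22}^{*A} \right) - \lambda^L a_{23}^{*A} + \lambda^w \left( w^A a_3^B a_{21}^{*A} - w^B a_1^B a_{23}^{*A} \right) \tag{19}$$

$$\lambda^R n^A \frac{w^A}{a_3^A} = \lambda^A + \lambda^p \left( a_2^B a_{31}^{*A} - a_1^B a_{32}^{*A} \right) - \lambda^L a_{33}^{*A} + \lambda^w \left( w^A a_3^B a_{31}^{*A} - w^B a_1^B a_{33}^{*A} \right) \tag{20}$$

$$\begin{aligned}
\lambda^R n^B E^B \frac{1}{1 + t_1} &= 1 + \lambda^p \left( a_1^A a_{12}^{*B} - a_2^A a_{11}^{*B} \right) + \lambda^w \left( w^A a_1^A a_{13}^{*B} - w^B a_3^A a_{11}^{*B} \right) \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{11}^{*B} \left( x_1^B - x_{1m}^A \right) + a_{12}^{*B} \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m \frac{E^B}{E_m^A}
\end{aligned} \tag{21}$$

$$\begin{aligned}
\lambda^R n^B E^B \frac{1}{1 + t_2} &= 1 + \lambda^p \left( a_1^A a_{22}^{*B} - a_2^A a_{21}^{*B} \right) + \lambda^w \left( w^A a_1^A a_{23}^{*B} - w^B a_3^A a_{21}^{*B} \right) \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{21}^{*B} \left( x_1^B - x_{1m}^A \right) + a_{22}^{*B} \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m \frac{E^B}{E_m^A}
\end{aligned} \tag{22}$$

$$\begin{aligned}
\lambda^R n^B \frac{w^B}{a_3^B} &= 1 + \lambda^p \left( a_1^A a_{32}^{*B} - a_2^A a_{31}^{*B} \right) + \lambda^w \left( w^A a_1^A a_{33}^{*B} - w^B a_3^A a_{31}^{*B} \right) \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{31}^{*B} \left( x_1^B - x_{1m}^A \right) + a_{32}^{*B} \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m \frac{a_{3m}^A}{a_3^B} \frac{w^B}{w^A}
\end{aligned} \tag{23}$$

Then (19) minus (18), and (22) minus (21) yield

$$\begin{aligned}
\lambda^R n^A E^A \frac{t_1 - t_2}{(1 + t_1)(1 + t_2)} &= \lambda^p \left[ a_2^B \left( a_{21}^{*A} - a_{11}^{*A} \right) + a_1^B \left( a_{12}^{*A} - a_{22}^{*A} \right) \right] + \lambda^L \left( a_{13}^{*A} - a_{23}^{*A} \right) \\
&\quad + \lambda^w \left[ w^A a_3^B \left( a_{21}^{*A} - a_{11}^{*A} \right) + w^B a_1^B \left( a_{13}^{*A} - a_{23}^{*A} \right) \right]
\end{aligned} \tag{24}$$

$$\begin{aligned}
\lambda^R n^B E^B \frac{t_1 - t_2}{(1 + t_1)(1 + t_2)} &= \lambda^p \left[ a_1^A \left( a_{22}^{*B} - a_{12}^{*B} \right) + a_2^A \left( a_{11}^{*B} - a_{21}^{*B} \right) \right] \\
&\quad + \lambda^w \left[ w^A a_1^A \left( a_{23}^{*B} - a_{13}^{*B} \right) + w^B a_3^A \left( a_{11}^{*B} - a_{21}^{*B} \right) \right] \\
&\quad + \lambda^m \frac{E^B}{E_m^A} \left[ \left( a_{11}^{*B} - a_{21}^{*B} \right) \left( x_1^B - x_{1m}^A \right) + \left( a_{12}^{*B} - a_{22}^{*B} \right) \left( x_2^B - x_{2m}^A \right) \right]
\end{aligned} \tag{25}$$

Then $E^B$ times (24) minus $E^A$ times (25) yields

$$\begin{aligned}
0 &= E^B \lambda^L \left( a_{13}^{*A} - a_{23}^{*A} \right) \\
&\quad + \lambda^p \left[ (1 + t_1) p_1 \left( a_{12}^{*A} - a_{22}^{*A} + a_{12}^{*B} - a_{22}^{*B} \right) + (1 + t_2) p_2 \left( a_{21}^{*A} - a_{11}^{*A} + a_{21}^{*B} - a_{11}^{*B} \right) \right] \\
&\quad + \lambda^w (1 + t_1) p_1 \left[ w^B \left( a_{13}^{*A} - a_{23}^{*A} \right) + w^A \left( a_{13}^{*B} - a_{23}^{*B} \right) \right] \\
&\quad + \lambda^w \left[ w^A a_3^B E^B \left( a_{21}^{*A} - a_{11}^{*A} \right) + w^B a_3^A E^A \left( a_{21}^{*B} - a_{11}^{*B} \right) \right] \\
&\quad + \lambda^m \frac{E^A E^B}{E_m^A} \left[ \left( a_{21}^{*B} - a_{11}^{*B} \right) \left( x_1^B - x_{1m}^A \right) + \left( a_{22}^{*B} - a_{12}^{*B} \right) \left( x_2^B - x_{2m}^A \right) \right]
\end{aligned} \tag{26}$$

With just commodity tax rates this is

$$\begin{aligned}
0 &= E^B \lambda^L \left( a_{13}^{*A} - a_{23}^{*A} \right) \\
&\quad + \lambda^w (1 + t_1) p_1 \left[ w^B \left( a_{13}^{*A} - a_{23}^{*A} \right) + w^A \left( a_{13}^{*B} - a_{23}^{*B} \right) \right] \\
&\quad + \lambda^p (1 + t_1) p_1 \left( a_{12}^{*A} - a_{22}^{*A} + a_{12}^{*B} - a_{22}^{*B} \right) \\
&\quad + \left[ \lambda^p (1 + t_2) p_2 + \lambda^w w^A w^B \right] \left( a_{21}^{*A} - a_{11}^{*A} + a_{21}^{*B} - a_{11}^{*B} \right) \\
&\quad + \lambda^m \frac{E^A E^B}{E_m^A} \left[ \left( a_{21}^{*B} - a_{11}^{*B} \right) \left( x_1^B - x_{1m}^A \right) + \left( a_{22}^{*B} - a_{12}^{*B} \right) \left( x_2^B - x_{2m}^A \right) \right]
\end{aligned} \tag{27}$$

Proof that homotheticity and weak separability of leisure and goods imply $t_1 = t_2$

If the utility function is homothetic there is a strictly increasing transformation of it that is homogeneous of degree 1. Thus, with weak separability of leisure, $u(x_1, x_2, l) = f(x_1, x_2) \, g(l)$ and for all admissible values of goods and leisure, for all $\alpha > 0$,

$$f(\alpha x_1, \alpha x_2) \, g(\alpha l) = \alpha f(x_1, x_2) \, g(l) = \alpha^{\gamma} f(x_1, x_2) \, \alpha^{1 - \gamma} g(l).$$

Thus $f$ is homogeneous of degree $\gamma$ and, using Euler's theorem, its first derivatives are homogeneous of degree $\gamma - 1$ and its second derivatives are homogeneous of degree $\gamma - 2$. Then for $i, j, k = 1, 2$

$$\frac{f_{ij}(\alpha x_1, \alpha x_2)}{f_k(\alpha x_1, \alpha x_2)} = \frac{\alpha^{\gamma - 2} f_{ij}(x_1, x_2)}{\alpha^{\gamma - 1} f_k(x_1, x_2)} = \frac{1}{\alpha} \frac{f_{ij}(x_1, x_2)}{f_k(x_1, x_2)}.$$

Letting $\alpha = 1/x_2$,

$$\frac{f_{ij}(x_1, x_2)}{f_k(x_1, x_2)} = \frac{1}{x_2} \frac{f_{ij}(x_1/x_2, 1)}{f_k(x_1/x_2, 1)},$$

and since preferences are homothetic and A and B pay the same prices for goods 1 and 2, $x_1^A/x_2^A = x_1^B/x_2^B$. Thus for $q = A, B$

$$\frac{f_{ij}(x_1^q, x_2^q)}{f_k(x_1^q, x_2^q)} = \frac{1}{x_2^q} \frac{f_{ij}(x_1/x_2, 1)}{f_k(x_1/x_2, 1)},$$

and the second term on the RHS is independent of type.

Turning to the distance function, $u(x_1, x_2, l)$ homogeneous of degree 1 implies that from

$$u\left( \frac{x_1}{d}, \frac{x_2}{d}, \frac{l}{d} \right) = u_0$$

we have

$$d(x_1, x_2, l, u_0) = u_0^{-1} u(x_1, x_2, l) = u_0^{-1} f(x_1, x_2) \, g(l).$$

Remembering that $a^*_{ij} \equiv a_{ij}/a_i$, for $i, j = 1, 2$ and $q = A, B$

$$a_{ij}^{*q} = \frac{f_{ij}(x_1^q, x_2^q)}{f_i(x_1^q, x_2^q)} = \frac{1}{x_2^q} \frac{f_{ij}(x_1/x_2, 1)}{f_i(x_1/x_2, 1)} \equiv \frac{1}{x_2^q} f_{ij}^*,$$

and

$$a_{13}^{*q} = \frac{g'}{g} = a_{23}^{*q}. \tag{28}$$

Thus, inspection of (26) shows that if $\lambda^w = 0$, as it is with either a proportional earnings tax on the high earner or a lump-sum tax on the high earner, homogeneity and weak separability imply $\lambda^p = 0$ and therefore $t_1 = t_2$ from (24).

To prove $t_1 = t_2$ in a pure commodity tax regime is a little more work. Using (28) and (13), $E^B$ times (24) yields

$$\lambda^R n^A E^A E^B \frac{t_1 - t_2}{(1 + t_1)(1 + t_2)} = \lambda^p \left[ (1 + t_1) p_1 \left( a_{12}^{*A} - a_{22}^{*A} \right) + (1 + t_2) p_2 \left( a_{21}^{*A} - a_{11}^{*A} \right) \right] + \lambda^w w^A w^B \left( a_{21}^{*A} - a_{11}^{*A} \right).$$

Then using the expressions for $a_{ij}^{*q}$ it follows that $x_2^A E^B$ times (24) produces

$$\lambda^R E^A E^B n^A x_2^A \frac{t_1 - t_2}{(1 + t_1)(1 + t_2)} = \lambda^p \left( (1 + t_1) p_1 \left( f_{12}^* - f_{22}^* \right) + (1 + t_2) p_2 \left( f_{21}^* - f_{11}^* \right) \right) + \lambda^w w^A w^B \left( f_{21}^* - f_{11}^* \right).$$

Now note in (25) that if leisure is weakly separable from goods a mimicking A has the same consumption plan as a B and therefore the coefficient of $\lambda^m$ is zero. Then, following the pattern above, $x_2^B E^A$ times (25) produces

$$\lambda^R E^A E^B n^B x_2^B \frac{t_1 - t_2}{(1 + t_1)(1 + t_2)} = \lambda^p \left( (1 + t_1) p_1 \left( f_{22}^* - f_{12}^* \right) + (1 + t_2) p_2 \left( f_{11}^* - f_{21}^* \right) \right) + \lambda^w w^A w^B \left( f_{11}^* - f_{21}^* \right).$$

Adding the last two equations,

$$\lambda^R E^A E^B \left( n^A x_2^A + n^B x_2^B \right) \frac{t_1 - t_2}{(1 + t_1)(1 + t_2)} = 0 \quad \text{or} \quad t_1 = t_2.$$

Lump-sum taxes on the high earner

Before mimicking begins, $\lambda^L = \lambda^w = \lambda^m = 0$. Then from (26) we know $\lambda^p = 0$. Suppose we normalize tax rates by setting $t_1 = 0$. Then from (18), (19), (20) and (12) we know that $t_1 = t_2 = t^H = 0$; $t^L = 0$ follows from (22), (23) and (13). With mimicking, equation (26) shows that $\lambda^p$ is still zero if leisure is weakly separable from goods, because in this case $x_{jm}^A = x_j^B$, $j = 1, 2$; mimicking As spend their money the same way Bs do. In this case, the argument above shows that $t^H = t_1 = t_2 = 0$. What about $t^L$? From (21) or (22)

$$\lambda^R n^B E^B = 1 - \lambda^m \frac{E^B}{E_m^A},$$

and from (23)

$$\lambda^R n^B \frac{w^B}{a_3^B} = 1 - \lambda^m \frac{a_{3m}^A}{a_3^B} \frac{w^B}{w^A}.$$

Using (13), the first equation divided by the second is A 1 − λm E B /Em 1−t = .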
B A B 1 − λm (aA 3m w ) /(a3 w ) L 21 When mimicking starts λm rises from zero and therefore tL will increase from zero if B aA EB aA 1m 3m w = A > B A or Em aB a3 w 1 B A aA 1m a3 w >1 B B aA 3m a1 w Since a1 /a3 is the MRS between good 1 and leisure, and mimicking As enjoy more leisure and the same utility as As, B A B A aA aA 1m a3 w 1 a3 w > . B B B B aA aA 3m a1 w 3 a1 w (29) and the right side equals unity when mimicking starts. Thus tL rises from zero when mimicking starts. What happens when mimicking starts if leisure is not weakly separable from goods? Suppose good 1 is more complementary with leisure than is good 2, a∗13 > a∗23 . Since mimicking As have the same goods budget as Bs but more leisure then in (26), m B A B is negative and therefore λp rises from xA 1m > x1 , x2m < x2 , the coefficient of λ zero as mimicking begins. Then from (18) we can see that t1 > 0 and from (19), t2 < 0. The argument above for tL rising from zero as mimicking begins still holds because (29) is a strict inequality when mimicking begins, while λp is zero. A proportional earnings tax on the high earner This section derives results for the case where lump-sum taxes are inadmissible, λ > 0, but the tools of commodity tax rates are supplemented by a proportional earnings tax on the high earners, tH . The model is like that for a pure commodity tax regime except that λw = 0. Inspection of (26) reveals that starting at the private ∗A equilibrium the sign of λp must be the opposite of the sign of a∗A 13 −a23 . Suppose good ∗A 1 is more complementary with leisure than is good 2, a∗A 13 − a23 > 0. Look at (21) and (22). As we move away from the private equilibrium where λp = t1 = t2 = 0, λp becoming negative tells us that t1 becomes positive and t2 negative. The same equations show that when mimicking starts and λm rises above zero both tax rates A will rise from the E B /Em term. 
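As a brief numerical aside on the lump-sum case, the sign condition for $t^L$ can be checked with a few lines of Python. The sketch below simply evaluates $1 - t^L = (1 - \lambda^m r_E)/(1 - \lambda^m r_3)$, where $r_E = E^B/E_m^A$ and $r_3 = (a_{3m}^A w^B)/(a_3^B w^A)$. The function name `low_earner_tax_rate`, the shorthand $r_E$ and $r_3$, and every numerical value are hypothetical and chosen only for illustration; none of them comes from the paper.

```python
def low_earner_tax_rate(lam_m, ratio_E, ratio_a3):
    """Return t^L implied by 1 - t^L = (1 - lam_m*ratio_E) / (1 - lam_m*ratio_a3).

    lam_m    : multiplier on the mimicking constraint (lambda^m >= 0)
    ratio_E  : E^B / E^A_m
    ratio_a3 : (a^A_3m * w^B) / (a^B_3 * w^A)
    All numerical inputs used below are illustrative assumptions.
    """
    denominator = 1.0 - lam_m * ratio_a3
    if denominator <= 0.0:
        # The sketch only covers the region where the ratio is well defined.
        raise ValueError("sketch assumes 1 - lam_m * ratio_a3 > 0")
    return 1.0 - (1.0 - lam_m * ratio_E) / denominator


# Hypothetical values with ratio_E > ratio_a3, the case described in the text.
for lam_m in (0.0, 0.05, 0.10):
    print(lam_m, low_earner_tax_rate(lam_m, ratio_E=1.2, ratio_a3=0.9))
```

Under these assumed values the implied $t^L$ is zero at $\lambda^m = 0$ and positive once $\lambda^m$ rises, which is the pattern derived in the text. Rearranging the same expression gives $t^L = \lambda^m (r_E - r_3)/(1 - \lambda^m r_3)$, so the sign of $t^L$ is the sign of $r_E - r_3$ whenever the denominator is positive.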
In either case good 1 is taxed at a higher rate than good 2 if good 1 is more complementary with leisure. Using (12), the ratio of (20) to (18) is

\[
\frac{1+t_1}{1-t^H} = \frac{\lambda^A + \lambda^p\left(a_2^B a_{31}^{*A} - a_1^B a_{32}^{*A}\right) - \lambda^L a_{33}^{*A}}{\lambda^A + \lambda^p\left(a_2^B a_{11}^{*A} - a_1^B a_{12}^{*A}\right) - \lambda^L a_{13}^{*A}}.
\]

If the goods are equally complementary with leisure then $\lambda^p = 0$ and moving away from the private equilibrium the numerator rises, the denominator falls, the ratio on the right-hand side rises and thus $t^H$ rises above zero (remember that in this case commodity tax rates are zero). When mimicking begins the upf must become flatter which means that $\lambda^L$ falls, and $t^H$ falls. If the two goods are not equally complementary with leisure the $\lambda^p$ terms moderate the movements in the numerator and denominator. In either case, however, the pattern for $t^H$ is the same. Before mimicking starts $t^H$ is the primary instrument for redistribution; after mimicking kicks in reductions in $t^H$ are used to prevent the As from mimicking the Bs and commodity taxes are used to raise revenue for redistribution. Throughout, the good that is more complementary with leisure is taxed at the higher rate.

References

[1] Atkinson, A. and J. Stiglitz, 1976, The design of tax structure: direct versus indirect taxation, Journal of Public Economics 6, 55–75.
[2] Auerbach, A.J., 1979, A brief note on a non-existent theorem about the optimality of uniform taxation, Economics Letters 3, 49–52.
[3] Besley, T. and I. Jewitt, 1995, Uniform taxation and consumer preferences, Journal of Public Economics 58, 73–84.
[4] Burbidge, J.B., 2015, Using distance functions to understand interest taxation, Canadian Journal of Economics, forthcoming: see https://artsonline.uwaterloo.ca/jburbidg/
[5] Corlett, W.J. and D.C. Hague, 1953–54, Complementarity and the excess burden of taxation, The Review of Economic Studies 21, 21–30.
[6] Deaton, A.S., 1979, The distance function and consumer behaviour with applications to index numbers and optimal taxation, Review of Economic Studies 46, 391–405.
[7] Diamond, P.A. and J.A. Mirrlees, 1971, Optimal taxation and public production, American Economic Review 61, 8–27 and 261–278.
[8] Mirrlees, J.A., 1971, An exploration in the theory of optimum income taxation, The Review of Economic Studies 38, 175–208.
[9] Piketty, T. and E. Saez, 2013, Optimal labor income taxation, in A. Auerbach, R. Chetty, M. Feldstein and E. Saez, editors, Handbook of Public Economics, vol. 5 (Amsterdam: Elsevier-North Holland), 391–474.
[10] Ramsey, F.P., 1927, A contribution to the theory of taxation, Economic Journal 37, 47–61.
[11] Sadka, E., 1976, On income distribution, incentive effects and optimal income taxation, Review of Economic Studies 43, 261–267.
[12] Seade, J.K., 1977, On the shape of optimal tax schedules, Journal of Public Economics 7, 203–236.