The Role of Commodity Taxation in Pareto Efficient Tax Structures for Redistribution
John Burbidge∗
25 May 2015
Abstract
Ramsey (1927) studied the problem of how to structure commodity taxes
efficiently when some commodities cannot be taxed. Mirrlees (1971) studied
efficient nonlinear income taxes for redistribution when the government can
observe only earnings. Deaton (1979) argued persuasively that using distance
functions and having the government choose commodities and leisure directly,
rather than tax rates, offer the easiest route to understanding the Ramsey
problem. This paper combines these papers in the simplest setting possible:
two types, two goods and leisure, and a government that wants to redistribute
from the high-wage type to the low-wage type but the government observes
only earnings and consumption of each good. Piketty and Saez correctly observe that the literature that flowed from Mirrlees (1971) shifted emphasis away
from using commodity taxes to redistribute income to using nonlinear earnings
taxation (2013, p. 402). I argue in the present paper that the distance-function
method of framing and solving optimal tax problems highlights an alternative
path the post-Mirrlees literature might have followed and, along this path, the
role of commodity taxation in redistribution is much larger.
Keywords: Optimal taxation, distance function, separability
JEL classification: H21
∗ Department of Economics, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1, [email protected]. I thank Lutz-Alexander Busch, Christoph Lülfesmann, John Revesz, César Sosa-Padilla, Michael Veall, and seminar participants at McMaster University, the University of Waterloo and the 2014 Canadian Economics Association Meetings in Vancouver, for helpful conversations.
1 Introduction
Ramsey (1927) studied the problem of how to structure commodity taxes efficiently
when some commodities cannot be taxed. Mirrlees (1971) studied efficient nonlinear
income taxes for redistribution when the government can observe only earnings.
Deaton (1979) argued persuasively that using distance functions and having the
government choose commodities and leisure directly, rather than tax rates, offer the
easiest route to understanding the Ramsey problem. This paper combines these
papers in the simplest setting possible: two types, two goods and leisure, and a
government that wants to redistribute from the high-wage type to the low-wage
type but the government observes only earnings and consumption of each good.
Piketty and Saez correctly observe that the literature that flowed from Mirrlees
(1971) shifted emphasis away from using commodity taxes to redistribute income to
using nonlinear earnings taxation (2013, p. 402). I argue in the present paper that
the distance-function method of framing and solving optimal tax problems highlights
an alternative path the post-Mirrlees literature might have followed and, along this
path, the role of commodity taxation in redistribution is much larger.
The government’s ability to redistribute depends, of course, on the constraints
on its behaviour. The use of distance functions leads to a natural sequence of constraints. The most restrictive set of constraints, label this case (a), captures a setting
with commodity tax rates alone. Relaxing one constraint depicts a more powerful
government that can be interpreted as levying a proportional earnings tax on the
high earner as well as commodity taxes; label this case (b). In case (c), relaxing
one more constraint captures the nonlinear earnings tax structure, together with
commodity taxes, studied by Sadka (1976), Seade (1977) and Atkinson and Stiglitz
(1976). I prove that if preferences are homothetic and leisure is weakly separable
from goods then in case (a) the government can redistribute only until the mimicking
constraint binds. Relaxing either homotheticity or weak separability of leisure may
permit redistribution beyond this point. To derive the Ramsey tax results using
distance functions the modeler must impose a constraint that rules out lump-sum
taxes. The constraint that must be relaxed to move from case (b) to case (c) is
precisely this constraint, as it applies to the high earner. So, for example, the famous zero-marginal tax rate result on the high earner follows from not imposing the
constraint that leads to the Ramsey tax results. Re-imposing the constraint that
induces the Ramsey tax results, that is, moving from (c) back to (b), opens a scenario in which both nonlinear earnings taxes and commodity taxes are important for
redistribution, even if leisure and goods are weakly separable. I show that, starting
at the private equilibrium, and before the mimicking constraint binds, the earnings
tax on the high earner is the primary instrument for redistribution. Here the Corlett and Hague (1953-54) intuition applies — the commodity tax rate on the good
most complementary with leisure is positive; the rate on the other good is negative.
Once the mimicking constraint binds, reducing the earnings tax rate becomes the
primary instrument to prevent mimicking and commodity tax rates rise to generate
the revenue required for further redistribution.
Section 2 restates Deaton’s distance function method for solving the Ramsey
problem. Section 3 develops the full set of constraints on the government’s behaviour
and states the government’s optimization problem. Section 4 presents the results for
each of the three cases listed above. Section 5 summarizes and concludes.
2 Ramsey with two goods and leisure
With two goods and leisure, each person’s budget constraint can be written as
p1 x1 + p2 x2 + wl = wL,
where L is the time endowment and (p1 , p2 , w) are the prices of (x1 , x2 , l).[1]
Assume that the government wishes to raise given revenue with the smallest
decrease in utility and lump-sum taxes are ruled out. If the government were able
to tax leisure directly a proportional tax rate on goods and leisure at rate t would
function as a lump-sum tax on the time endowment because the following budget
constraints
$$(1 + t)(p_1 x_1 + p_2 x_2 + wl) = wL$$
$$p_1 x_1 + p_2 x_2 + wl = w(1 - t^*)L$$
are the same if
$$\frac{1}{1+t} = 1 - t^*.$$
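The equivalence can be checked numerically. The sketch below is only an illustration: the function names, prices and bundle are hypothetical values chosen so that the bundle exhausts the first budget, and the check is that it then also exhausts the second.

```python
def endowment_tax_rate(t):
    """Rate t* on the time endowment implied by 1 / (1 + t) = 1 - t*."""
    if t <= -1.0:
        raise ValueError("t must exceed -1")
    return t / (1.0 + t)


def budget_gap_uniform(x1, x2, l, p1, p2, w, L, t):
    """Left side minus right side of (1 + t)(p1 x1 + p2 x2 + w l) = w L."""
    return (1.0 + t) * (p1 * x1 + p2 * x2 + w * l) - w * L


def budget_gap_endowment(x1, x2, l, p1, p2, w, L, t_star):
    """Left side minus right side of p1 x1 + p2 x2 + w l = w (1 - t*) L."""
    return p1 * x1 + p2 * x2 + w * l - w * (1.0 - t_star) * L


# Hypothetical numbers: a 25% uniform rate on goods and leisure.
t = 0.25
t_star = endowment_tax_rate(t)  # 0.25 / 1.25

# Illustrative prices, endowment and a bundle costing w L / (1 + t) = 8.
p1, p2, w, L = 1.0, 2.0, 1.0, 10.0
x1, x2, l = 2.0, 1.0, 4.0

print(t_star)
print(budget_gap_uniform(x1, x2, l, p1, p2, w, L, t))
print(budget_gap_endowment(x1, x2, l, p1, p2, w, L, t_star))
```

Both gaps are zero for the same bundle, which is the sense in which the uniform rate acts as a lump-sum tax on the time endowment.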
Assuming that leisure cannot be taxed directly and that the government uses
proportional tax rates, the options in the present setting are commodity taxes, t1
and t2 and an earnings tax, te . Since the following equations
$$(1 + t_1^a)\,p_1 x_1 + (1 + t_2^a)\,p_2 x_2 = (1 - t_e)\,w(L - l)$$
$$(1 + t_1)\,p_1 x_1 + (1 + t_2)\,p_2 x_2 = w(L - l)$$
are the same if $t_j$ are defined by
$$1 + t_j \equiv \frac{1 + t_j^a}{1 - t_e},$$
assuming the government uses only commodity tax rates does not diminish the government’s effectiveness.

[1] Much of this section is drawn from Burbidge, CJE, 2015.

Second-best optimal commodity tax rates must address the only issue standing in the way of a first-best outcome: leisure cannot be taxed directly. Thus, as Corlett and Hague (1953-54) suggested, the answer to the question of whether good 1 or good 2 should be taxed at a higher rate must be that whichever of goods 1 or 2 is more complementary with leisure should be taxed at a higher rate. Since the deadweight loss of any tax works solely off substitution effects and these are defined with utility held constant the required construct must isolate the relationship between good 1 and leisure and good 2 and leisure, with utility held constant. This construct is the distance function. If $u(x_1, x_2, l)$ is the ordinary utility function, $d(x_1, x_2, l, u_0)$ is defined by
$$u\left(\frac{x_1}{d(x_1, x_2, l, u_0)}, \frac{x_2}{d(x_1, x_2, l, u_0)}, \frac{l}{d(x_1, x_2, l, u_0)}\right) = u_0, \tag{1}$$
that is, $d(x_1, x_2, l, u_0)$ is the number by which an arbitrary consumption vector must be scaled to deliver utility level $u_0$, and
$$u(x_1, x_2, l) \ge u_0 \ \text{ if and only if } \ d(x_1, x_2, l, u_0) \ge 1. \tag{2}$$
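A small numerical sketch may help fix the definition. It solves (1) for d by bisection, using a Cobb-Douglas utility function with hypothetical exponents; the function names, the bracketing interval and the sample bundle are assumptions made for illustration only.

```python
import math


def cobb_douglas(x1, x2, l, alpha=0.3, beta=0.3):
    """Illustrative utility function; the exponents are hypothetical."""
    return x1 ** alpha * x2 ** beta * l ** (1.0 - alpha - beta)


def distance(u, x1, x2, l, u0, lo=1e-9, hi=1e9, iters=200):
    """Solve u(x1/d, x2/d, l/d) = u0 for d, assuming u is increasing.

    Bisection on a log scale: utility of the scaled bundle falls as d rises.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if u(x1 / mid, x2 / mid, l / mid) > u0:
            lo = mid  # scaled bundle still beats u0, so d must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)


# The bundle (4, 4, 4) yields utility 4 under these exponents.
d_above = distance(cobb_douglas, 4.0, 4.0, 4.0, u0=2.0)  # bundle beats u0
d_below = distance(cobb_douglas, 4.0, 4.0, 4.0, u0=8.0)  # bundle falls short
print(d_above, d_below)
```

In line with (2), the computed distance exceeds one exactly when the bundle delivers more than the reference utility level.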
For the moment, set all tax rates to zero to reduce clutter. The next few paragraphs follow Deaton (1979) in showing the connections between the distance function and the expenditure function, e (p1 , p2 , w, u0 ).
The scaled vector
$$\left(\frac{x_1}{d(x_1, x_2, l, u_0)}, \frac{x_2}{d(x_1, x_2, l, u_0)}, \frac{l}{d(x_1, x_2, l, u_0)}\right)$$
will deliver a utility level of $u_0$ and so the expenditure function must satisfy
$$e(p_1, p_2, w, u_0) \le \frac{p_1 x_1 + p_2 x_2 + wl}{d(x_1, x_2, l, u_0)}. \tag{3}$$
But if $(x_1^*, x_2^*, l^*)$ are chosen to be the Hicksian demand levels for these prices and utility level $u_0$ then $d(x_1^*, x_2^*, l^*, u_0) = 1$ and
$$e(p_1, p_2, w, u_0) = \frac{p_1 x_1^* + p_2 x_2^* + w l^*}{d(x_1^*, x_2^*, l^*, u_0)}.$$
Thus
$$e(p_1, p_2, w, u_0) = \min_{x_1, x_2, l} \frac{p_1 x_1 + p_2 x_2 + wl}{d(x_1, x_2, l, u_0)}. \tag{4}$$
From (3)
$$d(x_1, x_2, l, u_0) \le \frac{p_1 x_1 + p_2 x_2 + wl}{e(p_1, p_2, w, u_0)}.$$
Now let prices be $(p_1^*, p_2^*, w^*)$ which (not uniquely) generate a budget plane tangent to the $u_0$ indifference surface at the point where the ray from the origin to $(x_1, x_2, l)$ cuts this indifference surface.[2] Then
$$d(x_1, x_2, l, u_0) = \frac{p_1^* x_1 + p_2^* x_2 + w^* l}{e(p_1^*, p_2^*, w^*, u_0)}$$
so
$$d(x_1, x_2, l, u_0) = \min_{p_1, p_2, w} \frac{p_1 x_1 + p_2 x_2 + wl}{e(p_1, p_2, w, u_0)}. \tag{5}$$
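The minimization in (5) can be illustrated numerically. The sketch below assumes Cobb-Douglas preferences with hypothetical exponents summing to one, for which the expenditure function has a standard closed form and the distance function is u(x)/u0; it compares the ratio in (5) at tangent prices with the ratio at randomly drawn prices. All names and numbers are illustrative.

```python
import random

ALPHAS = (0.3, 0.3, 0.4)  # hypothetical exponents for (x1, x2, l); sum to 1


def utility(x):
    """Cobb-Douglas utility, homogeneous of degree one."""
    u = 1.0
    for xi, ai in zip(x, ALPHAS):
        u *= xi ** ai
    return u


def expenditure(p, u0):
    """Cobb-Douglas expenditure function: u0 * prod (p_i / alpha_i)^alpha_i."""
    e = u0
    for pi, ai in zip(p, ALPHAS):
        e *= (pi / ai) ** ai
    return e


def ratio(p, x, u0):
    """The object minimized over prices in (5)."""
    return sum(pi * xi for pi, xi in zip(p, x)) / expenditure(p, u0)


x, u0 = (4.0, 2.0, 5.0), 1.5
d = utility(x) / u0  # distance function under degree-one homogeneity

# Prices proportional to marginal utilities support the ray through x.
tangent_prices = tuple(ai / xi for ai, xi in zip(ALPHAS, x))

rng = random.Random(0)  # fixed seed so the draw is reproducible
random_ratios = [
    ratio(tuple(rng.uniform(0.1, 5.0) for _ in range(3)), x, u0)
    for _ in range(1000)
]
print(d, ratio(tangent_prices, x, u0), min(random_ratios))
```

The ratio equals d at the tangent prices and is no smaller at any other price vector, which is the content of (5).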
Equations (4) and (5) clearly show the duality between the distance function and
the expenditure function. Just as the Hessian of the expenditure function must be
symmetric and negative semi-definite, so must the Hessian of the distance function
(the Antonelli matrix) be symmetric and negative semi-definite (again, see Deaton
(1979)).
Reintroduce commodity tax rates. Since earnings are the only source of income
for the household, earnings are not taxed and initial prices (p1 , p2 , w) are assumed
to be constant,
e ((1 + t1 )p1 , (1 + t2 )p2 , w, u0 ) = wL.
Applying the envelope theorem to (5) with commodity tax rates in place, obtain
$$a_1(x_1, x_2, l, u_0) \equiv \frac{\partial d(x_1, x_2, l, u_0)}{\partial x_1} = \frac{(1 + t_1)p_1}{wL} \tag{6}$$
$$a_2(x_1, x_2, l, u_0) \equiv \frac{\partial d(x_1, x_2, l, u_0)}{\partial x_2} = \frac{(1 + t_2)p_2}{wL} \tag{7}$$
$$a_3(x_1, x_2, l, u_0) \equiv \frac{\partial d(x_1, x_2, l, u_0)}{\partial l} = \frac{w}{wL} = \frac{1}{L} \tag{8}$$

[2] See Figures 1 and 2 in Deaton (1979).

Not only must the Antonelli matrix be symmetric and negative semi-definite, but
pre-multiplying it by [x1 x2 l] must yield a row vector of zeros, for any admissible
[x1 x2 l]. Most of the results in this paper depend on the signs of a∗ij ≡ aij /ai . I
will assume that elements on the main diagonal of the a∗ij matrix are negative, the
off-diagonal elements are positive, and in any row the magnitude of the diagonal
element is larger than any of the off-diagonal elements.
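These properties can be verified on a concrete case. The sketch below builds the analytic Hessian of the distance function for Cobb-Douglas preferences with hypothetical exponents, under the assumption that d = u(x)/u0 (valid because the utility function is homogeneous of degree one), and checks symmetry, the zero product with [x1 x2 l], and the sign pattern. Names and numbers are illustrative.

```python
ALPHAS = (0.3, 0.3, 0.4)  # hypothetical exponents for (x1, x2, l); sum to 1


def antonelli(x, u0):
    """Hessian of d(x, u0) = prod(x_i ** alpha_i) / u0 with respect to x.

    Entry (i, j) is alpha_i * (alpha_j - [i == j]) * d / (x_i * x_j).
    """
    d = 1.0 / u0
    for xi, ai in zip(x, ALPHAS):
        d *= xi ** ai
    n = len(x)
    return [
        [
            ALPHAS[i] * (ALPHAS[j] - (1.0 if i == j else 0.0)) * d / (x[i] * x[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


x = (4.0, 2.0, 5.0)
A = antonelli(x, u0=1.5)

# Pre-multiplying by [x1 x2 l] should give a row vector of zeros.
row_product = [sum(x[i] * A[i][j] for i in range(3)) for j in range(3)]
asymmetry = max(abs(A[i][j] - A[j][i]) for i in range(3) for j in range(3))
print(row_product, asymmetry)
```

In this example the diagonal entries are negative and the off-diagonal entries positive, matching the sign assumptions on the starred matrix.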
The Ramsey problem can be solved by assuming the government maximizes utility, or equivalently d (x1 , x2 , l, u0 ), subject to a minimum revenue requirement and
a constraint that rules out lump-sum taxes. In this model the government obtains
whatever the person does not consume, that is, wL − p1 x1 − p2 x2 − wl. Denoting
the revenue requirement by R the revenue constraint can be written as
wL − p1 x1 − p2 x2 − wl − R ≥ 0,
and the Lagrange multiplier on this constraint must be nonnegative. A first-best solution to the problem would be to employ a lump-sum tax, $T$. From (8), a lump-sum tax would imply
$$a_3(x_1, x_2, l, u_0) = \frac{w}{wL - T} \ge \frac{w}{wL} = \frac{1}{L}.$$
Thus the absence of lump-sum taxes implies the government must deal with the constraint that
$$\frac{1}{L} - a_3(x_1, x_2, l, u_0) \ge 0, \tag{9}$$
and the Lagrange multiplier on this constraint must be nonnegative. From this discussion the Lagrangian is
$$\mathcal{L} = d(x_1, x_2, l, u_0) + \lambda^R \left[ wL - p_1 x_1 - p_2 x_2 - wl - R \right] + \lambda^L \left[ \frac{1}{L} - a_3(x_1, x_2, l, u_0) \right]$$
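The first-order conditions are
$$0 = \frac{\partial \mathcal{L}}{\partial x_1} = a_1 - \lambda^R p_1 - \lambda^L a_{31}$$
$$0 = \frac{\partial \mathcal{L}}{\partial x_2} = a_2 - \lambda^R p_2 - \lambda^L a_{32}$$
$$0 = \frac{\partial \mathcal{L}}{\partial l} = a_3 - \lambda^R w - \lambda^L a_{33}$$
Using the expressions (6)-(8) and the symmetry of the Antonelli matrix
$$1 = \lambda^R wL \frac{1}{1 + t_1} + \lambda^L a^*_{13} = \lambda^R wL \frac{1}{1 + t_2} + \lambda^L a^*_{23} = \lambda^R wL + \lambda^L a^*_{33}.$$
So then
$$\frac{t_1 - t_2}{(1 + t_1)(1 + t_2)} = \frac{\lambda^L}{\lambda^R wL} \left( a^*_{13} - a^*_{23} \right)$$
$$\lambda^L \left( a^*_{23} - a^*_{33} \right) = \lambda^R wL \left[ 1 - \frac{1}{1 + t_2} \right]$$
or
$$t_1 - t_2 = t_2 (1 + t_1) \frac{a^*_{13} - a^*_{23}}{a^*_{23} - a^*_{33}}$$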
On my assumptions, a∗23 − a∗33 > 0. a∗j3 measures the degree of complementarity
between good j and leisure holding utility constant. If good 1 is more complementary
with leisure than is good 2 then it is efficient to tax good 1 at a higher rate than
good 2 because doing so helps compensate for the inability to tax leisure directly.
Since
$$\frac{\partial \ln (a_1 / a_2)}{\partial l} = a^*_{13} - a^*_{23}$$
and $a_1$ and $a_2$ are positive I can write
$$t_1 \gtreqless t_2 \ \text{ if and only if } \ \frac{\partial (a_1 / a_2)}{\partial l} \gtreqless 0. \tag{10}$$
In particular, a necessary and sufficient condition for the efficiency of equal proportional taxation of commodities 1 and 2 is that $a_1 / a_2$ be independent of leisure.
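The Ramsey rule derived above, t1 − t2 = t2 (1 + t1)(a∗13 − a∗23)/(a∗23 − a∗33), can be solved for t1 given t2. The sketch below does this for hypothetical values of the starred Antonelli terms; the function name and the numbers are illustrative assumptions, chosen to respect the sign pattern assumed in the text (off-diagonal terms positive, a∗33 negative).

```python
def ramsey_t1(t2, a13, a23, a33):
    """Solve t1 - t2 = t2 * (1 + t1) * K for t1, with K = (a13 - a23) / (a23 - a33).

    a13, a23, a33 stand for the starred Antonelli terms of the text.
    """
    k = (a13 - a23) / (a23 - a33)
    denom = 1.0 - t2 * k
    if denom <= 0.0:
        raise ValueError("no finite positive solution for these inputs")
    return t2 * (1.0 + k) / denom


t2 = 0.10
equal = ramsey_t1(t2, a13=0.2, a23=0.2, a33=-0.8)   # equally complementary
higher = ramsey_t1(t2, a13=0.3, a23=0.2, a33=-0.8)  # good 1 more complementary
lower = ramsey_t1(t2, a13=0.1, a23=0.2, a33=-0.8)   # good 2 more complementary
print(equal, higher, lower)
```

With a positive t2, the rate on good 1 is above, equal to, or below t2 according to whether good 1 is more, equally, or less complementary with leisure than good 2.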
The result in (10) is the same as equation (52) in Deaton (1979) and, given the duality between the expenditure function and the distance function described by Deaton (1979), (5.1) in Besley and Jewitt (1995):
$$t_1 \gtreqless t_2 \ \text{ if and only if } \ \frac{\partial (h_1 / h_2)}{\partial w} \gtreqless 0, \tag{11}$$
where $h_j = \partial e(p_1, p_2, w, u_0) / \partial p_j$ is the Hicksian demand for commodity $j$. The slip in Deaton (1979), which Besley and Jewitt correct, is in jumping from (10) to saying that (10) holds if and only if leisure and goods are implicitly separable, that is, $d(x_1, x_2, l, u_0)$ can be written as $d^*(f(x_1, x_2, u_0), l, u_0)$.[3]
3 Taxation for redistribution
Consider an economy with two types of price-taking agents like the agent discussed
above. A and B differ only in their wage rates, wA > wB . Earnings have to be
spent on goods 1 and 2, which are taxed at proportional rates t1 , t2 . Assume the
government wishes to redistribute money from the As to the Bs but it can observe
only earnings and consumption levels; this is the Mirrlees (1971) problem. I am
going to approach the problem by building on the previous section.
Clearly the government’s ability to redistribute efficiently depends on the instruments at its disposal. The equivalent of equations (6)-(8) in the present context are
$$a_1^A = \frac{(1 + t_1)p_1}{E^A}; \quad a_2^A = \frac{(1 + t_2)p_2}{E^A}; \quad a_3^A = \frac{(1 - t^H) w^A}{E^A} \tag{12}$$
$$a_1^B = \frac{(1 + t_1)p_1}{E^B}; \quad a_2^B = \frac{(1 + t_2)p_2}{E^B}; \quad a_3^B = \frac{(1 - t^L) w^B}{E^B}, \tag{13}$$
where E j is the total expenditure of agent j = A, B and tj , j = H, L are marginal
earnings tax rates on high and low earners. I need to employ some normalization
of tax rates and, for the moment, will set tL = 0. These equalities build in the
assumption that A and B pay the same prices for goods.
[3] In a note on the literature that followed Atkinson and Stiglitz (1976), Auerbach (1979) showed that with the additively separable utility function
$$u(x_1, x_2, l) = x_1^{1/2} x_2^{1/2} + x_1^{1/2} + l^{1/2},$$
uniform commodity taxation is never efficient in the Ramsey setting: the optimal level of t1 will always exceed the optimal level of t2. See footnote 4.

I assume the government acts to maximize B’s utility, $d(x_1^B, x_2^B, l^B, u_0^B)$, given various constraints. One of these is a minimum level for A’s utility
$$d(x_1^A, x_2^A, l^A, u_0^A) - 1 \ge 0;$$
the Lagrange multiplier on this constraint, $\lambda^A$, must be nonnegative.
I assume the only purpose of taxation is for redistribution and thus another constraint is that total revenue be nonnegative
$$n^A \left( w^A L - p_1 x_1^A - p_2 x_2^A - w^A l^A \right) + n^B \left( w^B L - p_1 x_1^B - p_2 x_2^B - w^B l^B \right) \ge 0.$$
$n^j$ is the number of each type and the Lagrange multiplier associated with this constraint, $\lambda^R$, must be nonnegative. Each B will be given a cash transfer equal to total revenue divided by $n^B$. $T^L \le 0$ denotes the lump-sum “tax” for each low earner, which is a B, so $E^B = w^B L - T^L \ge w^B L$.
The most efficient way to redistribute from the As to the Bs would be with a lump-sum tax on the high earner, $T^H > 0$. I show below that when $T^H$ is available as an instrument, $t^H$ is unnecessary and would be set equal to zero. Thus, if $T^H > 0$, $E^A = w^A L - T^H < w^A L$ and then from (12) $a_3^A > 1/L$. Therefore, the counterpart of the no-lump-sum-tax constraint that is inequality (9) in the Ramsey model, can be written in the Mirrlees model as
$$\frac{1}{L} - a_3^A(x_1^A, x_2^A, l^A, u_0^A) \ge 0,$$
and the associated Lagrange multiplier, $\lambda^L$, must be nonnegative.
With a lump-sum tax on the higher earners ruled out a second-best instrument
would be a marginal-equals-average earnings tax rate, tH , on the high earner. Using
(12) and (13) for leisure and good 1 one could write
$$\frac{(1 + t_1) p_1 a_3^A}{w^A a_1^A} = 1 - t^H$$
$$\frac{(1 + t_1) p_1 a_3^B}{w^B a_1^B} = 1$$
or, subtracting the second equation from the first,
$$(1 + t_1) p_1 \left[ \frac{a_3^A}{w^A a_1^A} - \frac{a_3^B}{w^B a_1^B} \right] = -t^H.$$
With $t^H > 0$, and with no lump-sum tax on the high earners ($T^H = 0$),
$$\frac{a_3^A}{w^A a_1^A} - \frac{a_3^B}{w^B a_1^B} < 0.$$
Without $t^H$ as an instrument the government has to live with the constraint that
$$\frac{a_3^A}{w^A a_1^A} - \frac{a_3^B}{w^B a_1^B} \ge 0.$$
This can be rewritten as
$$\frac{a_3^A}{w^A} - \frac{a_1^A a_3^B}{a_1^B w^B} \ge 0. \tag{14}$$
If the relative price of leisure and good 2 had been used instead of the relative price of leisure and good 1 the corresponding inequality would have been
$$\frac{a_3^A}{w^A} - \frac{a_2^A a_3^B}{a_2^B w^B} \ge 0. \tag{15}$$
Clearly, whether one employs the relative price of leisure and good 1 or the relative
price of leisure and good 2 may affect the way constraints are written and the signs
of the associated Lagrange multipliers, but this choice cannot affect the implications
of the model for tax policy. Observe that inequality (14) implies (15) when
$$\frac{a_1^A}{a_1^B} \ge \frac{a_2^A}{a_2^B} \tag{16}$$
and (15) implies (14) when
$$\frac{a_2^A}{a_2^B} \ge \frac{a_1^A}{a_1^B}.$$
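The first implication can be spelled out in one line. The chain below uses only (14) and (16), together with the assumption that $a_3^B / w^B > 0$, which holds under (13) whenever $t^L < 1$.

```latex
\frac{a_3^A}{w^A}
  \;\ge\; \frac{a_1^A}{a_1^B} \cdot \frac{a_3^B}{w^B}
  \;\ge\; \frac{a_2^A}{a_2^B} \cdot \frac{a_3^B}{w^B}
```

The first inequality is (14), the second multiplies (16) by $a_3^B / w^B$, and comparing the two ends gives (15). Exchanging the roles of goods 1 and 2 gives the second implication.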
This paper uses inequality constraints that are the equivalent of (14) and (16).
$$w^A a_1^A(x_1^A, x_2^A, l^A, u_0^A)\, a_3^B(x_1^B, x_2^B, l^B, u_0^B) - w^B a_1^B(x_1^B, x_2^B, l^B, u_0^B)\, a_3^A(x_1^A, x_2^A, l^A, u_0^A) \le 0,$$
with the Lagrange multiplier denoted $\lambda^w \le 0$, and
$$a_1^A(x_1^A, x_2^A, l^A, u_0^A)\, a_2^B(x_1^B, x_2^B, l^B, u_0^B) - a_2^A(x_1^A, x_2^A, l^A, u_0^A)\, a_1^B(x_1^B, x_2^B, l^B, u_0^B) \ge 0,$$
with the Lagrange multiplier denoted λp ≥ 0. Simply put, these two constraints
are one way of ruling out a proportional earnings tax rate on the high earner and
imposing the constraint that A and B pay the same prices for goods.
Finally, the government might be prevented from setting different commodity tax rates so that $t_1 = t_2$. If this were true it would mean that $p_1 a_2^A = p_2 a_1^A$. And then the equivalent condition for person B would follow from the constraint that A and B face the same prices of goods; this is the constraint whose Lagrange multiplier is $\lambda^p$. Denote the Lagrange multiplier for the $p_1 a_2^A = p_2 a_1^A$ constraint by $\lambda^t$. At this point the government faces six constraints, with Lagrange multipliers $\lambda^j$, $j = A, R, L, p, w, t$ and six choice variables which are goods consumption and leisure for A and B. Starting at the private equilibrium where the commodity tax rate, $t_1 = t_2 = t = 0$, raising $t$ and giving the revenue to the Bs, will move us along the utility possibility frontier in the direction of lower A utility. At some point each A will realize that her utility would be higher if she pretended to be a low earner (a B) and was eligible for the cash transfer. When an A mimics a B, the mimicking A chooses leisure, $l_m^A$, to make her earnings $w^A (L - l_m^A)$ equal the earnings of a B, $w^B (L - l^B)$. If she does this she receives the cash transfer, $-T^L$, and will face a budget constraint for goods 1 and 2 that is identical to that faced by each B. If leisure and goods are not weakly separable typically she will choose to consume more of the good that is complementary with leisure because $l_m^A > l^B$. If leisure and goods are weakly separable then a mimicking A will consume the same bundle of goods as each B. In the present setting with only commodity tax rates and $t_1$ forced to equal $t_2$ the mimicking constraint is a seventh constraint that prevents further redistribution. With only six instruments, once the mimicking constraint binds, further redistribution is impossible.
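The leisure choice of a mimicker follows directly from equating earnings. The sketch below solves wA(L − lmA) = wB(L − lB) for the mimicker's leisure; the function name, wages and hours are hypothetical values used only to illustrate that a mimicking A takes more leisure than a B whenever the B works a positive number of hours.

```python
def mimicking_leisure(w_a, w_b, L, l_b):
    """Leisure at which a type A earns exactly what a type B earns.

    Solves w_a * (L - l_m) = w_b * (L - l_b) for l_m.
    """
    if not 0.0 < w_b < w_a:
        raise ValueError("expected 0 < w_b < w_a")
    if not 0.0 <= l_b <= L:
        raise ValueError("expected 0 <= l_b <= L")
    return L - (w_b / w_a) * (L - l_b)


# Hypothetical numbers: A earns 20 per hour, B earns 10, endowment 16 hours.
l_m = mimicking_leisure(w_a=20.0, w_b=10.0, L=16.0, l_b=8.0)
print(l_m)                                     # leisure of the mimicking A
print(20.0 * (16.0 - l_m), 10.0 * (16.0 - 8.0))  # earnings of mimicker and of B
```

Because the wage ratio is below one, the mimicker needs fewer hours of work than a B to reach the same earnings.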
Can the problem be solved by allowing t1 and t2 to differ?
Before mimicking binds and dropping the constraint that $t_1 = t_2$ we have six choice variables and five constraints. With mimicking we have two extra constraints and two new choice variables, $x_{1m}^A$ and $x_{2m}^A$. One of the extra constraints is that A’s utility acting as an A be at least as high as the utility of an A mimicking a B. Given that we already have a constraint that sets a lower bound on A’s utility, $d(x_1^A, x_2^A, l^A, u_0^A) - 1 \ge 0$, the no-mimicking constraint can be written as
$$1 - d(x_{1m}^A, x_{2m}^A, l_m^A, u_0^A) \ge 0;$$
the Lagrange multiplier on this constraint, $\lambda^m$, must be nonnegative. The other side of the observation that to prevent mimicking the government must make the utility of an A at least as large as the utility of an A mimicking a B is that the government would like to have an instrument that discourages mimicking by pushing the goods budget of a mimicking A below the goods budget for a B:
$$a_1^B(x_1^B, x_2^B, l^B, u_0^B)\, x_1^B + a_2^B(x_1^B, x_2^B, l^B, u_0^B)\, x_2^B \ge a_1^B(x_1^B, x_2^B, l^B, u_0^B)\, x_{1m}^A + a_2^B(x_1^B, x_2^B, l^B, u_0^B)\, x_{2m}^A.$$
The absence of such an instrument means that
$$a_1^B(x_1^B, x_2^B, l^B, u_0^B)\, x_1^B + a_2^B(x_1^B, x_2^B, l^B, u_0^B)\, x_2^B \le a_1^B(x_1^B, x_2^B, l^B, u_0^B)\, x_{1m}^A + a_2^B(x_1^B, x_2^B, l^B, u_0^B)\, x_{2m}^A$$
or
$$a_1^B(x_1^B, x_2^B, l^B, u_0^B) \left( x_1^B - x_{1m}^A \right) + a_2^B(x_1^B, x_2^B, l^B, u_0^B) \left( x_2^B - x_{2m}^A \right) \le 0.$$
Label the Lagrange multiplier on this constraint $\lambda^c \le 0$.
The optimization problem for the government can now be written as
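$$
\underset{\substack{x_1^A,\, x_2^A,\, l^A \\ x_1^B,\, x_2^B,\, l^B \\ x_{1m}^A,\, x_{2m}^A \\ \lambda^j,\ j = A, R, L, w, p, m, c}}{\operatorname{Opt}}
\quad
\begin{aligned}[t]
& d(x_1^B, x_2^B, l^B, u_0^B) \\
& + \lambda^A \left[ d(x_1^A, x_2^A, l^A, u_0^A) - 1 \right] \\
& + \lambda^R \left[ n^A \left( w^A L - p_1 x_1^A - p_2 x_2^A - w^A l^A \right) + n^B \left( w^B L - p_1 x_1^B - p_2 x_2^B - w^B l^B \right) \right] \\
& + \lambda^L \left[ \frac{1}{L} - a_3^A(x_1^A, x_2^A, l^A, u_0^A) \right] \\
& + \lambda^w \left[ w^A a_1^A(x_1^A, x_2^A, l^A, u_0^A)\, a_3^B(x_1^B, x_2^B, l^B, u_0^B) - w^B a_1^B(x_1^B, x_2^B, l^B, u_0^B)\, a_3^A(x_1^A, x_2^A, l^A, u_0^A) \right] \\
& + \lambda^p \left[ a_1^A(x_1^A, x_2^A, l^A, u_0^A)\, a_2^B(x_1^B, x_2^B, l^B, u_0^B) - a_2^A(x_1^A, x_2^A, l^A, u_0^A)\, a_1^B(x_1^B, x_2^B, l^B, u_0^B) \right] \\
& + \lambda^m \left[ 1 - d(x_{1m}^A, x_{2m}^A, l_m^A, u_0^A) \right] \\
& + \lambda^c \left[ a_1^B(x_1^B, x_2^B, l^B, u_0^B) \left( x_1^B - x_{1m}^A \right) + a_2^B(x_1^B, x_2^B, l^B, u_0^B) \left( x_2^B - x_{2m}^A \right) \right]
\end{aligned}
\tag{17}
$$
where
$$w^A \left( L - l_m^A \right) = w^B \left( L - l^B \right).$$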
4 Results
I discuss results in three settings. In the first, the government has only commodity
taxes, t1 , t2 . In the second, I go to the other extreme and endow the government
with a lump-sum tax on the high earners, T H , together with marginal earnings tax
rates on the high and low earners, tH , tL , as well as commodity taxes. In the third, I
examine an intermediate setting where the government has access to a proportional
earnings tax rate on the high earners, tH , as well as commodity tax rates.
In the appendix I prove that, with any of these tax systems, and with or without
the mimicking constraint binding, it is efficient to set t1 = t2 if preferences are
homothetic and leisure is weakly separable from goods. Later in the paper I provide
an example where equal commodity tax rates are efficient but leisure is additively
separable from goods so homotheticity and the separability of leisure from goods
are sufficient but not necessary conditions for the efficiency of uniform commodity
taxation. Inspection of the proof reveals that homotheticity and separability imply
goods 1 and 2 are equally complementary with leisure, holding utility constant, for both types: $a^{*j}_{13} = a^{*j}_{23}$, $j = A, B$. A central theme of the results in this paper is
that there is a very tight relationship between the Ramsey problem and the Mirrlees
problem.
4.1 Only commodity tax rates
In this setting, with homotheticity and separability of leisure, there is, in effect, one
instrument, a uniform commodity tax rate, t1 = t2 ≡ t, available for redistribution.
Starting at the private equilibrium and raising the commodity tax rate with the
revenue given to the low earners will move us along the utility possibility frontier (upf) in the direction of higher
utility for Bs. Once the mimicking constraint binds, there is no extra instrument to
prevent As from mimicking Bs, and further redistribution is impossible. This is the
case, for example, with Cobb-Douglas preferences:
$$u(x_1, x_2, l) = x_1^{\alpha} x_2^{\beta} l^{1 - \alpha - \beta}, \qquad \alpha > 0,\ \beta > 0,\ \alpha + \beta < 1.$$
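For this Cobb-Douglas case the equal-complementarity condition can be checked by hand. The sketch below assumes the distance function takes the form $d = u(x_1, x_2, l)/u_0$, which follows from (1) because this utility function is homogeneous of degree one, and uses the section 2 shorthand $a_{i3} = \partial a_i / \partial l$ and $a^*_{i3} = a_{i3}/a_i$.

```latex
d = \frac{x_1^{\alpha} x_2^{\beta} l^{1-\alpha-\beta}}{u_0}, \qquad
a_1 = \frac{\alpha\, d}{x_1}, \qquad
a_2 = \frac{\beta\, d}{x_2}, \\[4pt]
a_{13} = \frac{\alpha (1-\alpha-\beta)\, d}{x_1 l}, \qquad
a_{23} = \frac{\beta (1-\alpha-\beta)\, d}{x_2 l}, \\[4pt]
a^*_{13} = \frac{a_{13}}{a_1} = \frac{1-\alpha-\beta}{l}
         = \frac{a_{23}}{a_2} = a^*_{23}.
```

Both goods are therefore equally complementary with leisure at every bundle, which is why a uniform rate is the only commodity tax instrument that matters here.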
If one steps away from homotheticity or separability between leisure and goods then typically $a^{*j}_{13} \ne a^{*j}_{23}$, $j = A, B$. Given the results in section 2 one would expect that, along the upf before mimicking starts, the sign of $t_1 - t_2$ matches the sign of $a^{*j}_{13} - a^{*j}_{23}$, $j = A, B$: if good 1 is more complementary with leisure than is good 2, it is efficient to tax good 1 at a higher rate. Recalling that $\lambda^w < 0$, inspection of equation (25) in the appendix shows this to be true.
What happens when mimicking begins? Now there are two instruments, t1 and
t2 , to cope with the targets of increasing the utility of a B while keeping the utility of
an A mimicking a B as high as the utility of an A. Coping with mimicking is possible,
but barely so. Consider a step up the upf; $u^B$ must increase and $u^A$ and $u_m^A$ must fall by the same amount. If $dl^B$ were negative or zero, $dl_m^A = (w^B / w^A)\, dl^B$ would have to fall by less or the same amount. For $u^B$ to rise, the goods budget for B would have to increase but this goods budget is the same for a mimicking A and a B. Therefore, if $u^B$ went up so would $u_m^A$, which is impossible. Thus, $dl_m^A = (w^B / w^A)\, dl^B > 0$ and the goods budget for a B and a mimicking A must fall. This could not occur if both tax rates increased; one must fall and the other must increase. Changing commodity taxes to increase the consumption of the good most complementary with leisure encourages the Bs to work less which discourages the As from mimicking them. In other words, the mimicking constraint switches the government’s problem away from trying to tax leisure indirectly to trying to tax work; the Corlett-Hague intuition is reversed. For example, if good 1 is more complementary with leisure than is good 2, $a^{*j}_{13} > a^{*j}_{23}$, $j = A, B$, then $t_1 > t_2$ on the upf when mimicking starts,
and, as we move further up the upf, t1 falls and t2 rises. At some point t1 may equal
t2 which proves that homotheticity and weak separability are not necessary for the
efficiency of uniform commodity taxation in this model (see the examples described
in footnote 4).
4.2 Lump-sum taxes on the high earner
As one might expect, until mimicking starts, a lump-sum tax on the high earner is a first-best instrument; no other tax instrument is needed. When mimicking starts the results in the appendix confirm the results in the literature. It is efficient to continue to have a zero marginal earnings tax rate on the high earners, $t^H = 0$, but to have a marginal earnings tax rate on the low earners, $t^L > 0$. This discourages the Bs from working which discourages the As from mimicking the Bs. I prove that if leisure is not weakly separable from goods, and $a^{*j}_{13} > a^{*j}_{23}$, $j = A, B$, then $t_1 > 0$ and $t_2 < 0$, and vice versa.
4.3 A proportional earnings tax on the high earner and commodity taxes
Here the no-lump-sum tax constraint is still binding but now λw = 0. In the appendix
I prove the following results which are very intuitive. Until mimicking begins it is
efficient to use the proportional earnings tax on the high earner as the primary
instrument for redistribution. This is supplemented by commodity taxation in the
following way: if good 1 is more complementary with leisure than is good 2 then
t1 > 0 and t2 < 0, and vice versa. Once mimicking begins the earnings tax on the
high earner becomes the primary instrument to prevent the As from mimicking the
Bs; it peaks at the point where mimicking begins and then falls. The role vacated
by the earnings tax on the high earner is picked up by increases in commodity tax
rates, and, again, the rate structure follows the Ramsey rule; t1 is higher than t2 if
good 1 is more complementary with leisure than is good 2.[4]
5 Summary and conclusions
The utility possibility frontiers attainable through taxation and redistribution depend on the quality of the instruments available to the government. The literature
that followed Mirrlees (1971) chose to drop the constraint that was the basis for the
optimal commodity tax literature that followed Ramsey (1927). This was one path
and it led economists to argue that commodity taxation should play only a minor
role in redistribution. Another path, the one highlighted in this paper, is to study
nonlinear earnings taxes, maintaining the constraint on government behaviour that
is the basis of the Ramsey model. In this setting both earnings and commodity taxation have important roles to play in redistribution. It is a great distance between
the real world and the models in this paper but it is difficult not to notice that for
many countries differential commodity taxes raise a substantial share of the money
used for redistribution and very few countries have a zero marginal earnings tax rate
for their highest earners.
[4] The algebra and sample tables of upfs for the following utility functions
$$u(x_1, x_2, l) = x_1^{1/2} x_2^{1/2} + x_1^{1/2} + l^{1/2}$$
$$u(x_1, x_2, l) = x_1^{1/2} x_2^{1/2} + x_1^{1/2} l^{1/2}$$
$$u(x_1, x_2, l) = x_1^{1/2} x_2^{1/2} + x_2^{1/2} l^{1/2}$$
$$u(x_1, x_2, l) = x_1^{\alpha} x_2^{\beta} l^{1 - \alpha - \beta}$$
are at: https://artsonline.uwaterloo.ca/jburbidg/node/4. In the first utility function leisure is additively separable and preferences are not homothetic. In the second and third utility functions leisure is not weakly separable and preferences are homothetic. The fourth utility function is Cobb-Douglas; leisure is weakly separable and preferences are homothetic. The R code for the simulation programs and further details are available from the author.

Appendix
Implications of the first-order conditions
The first-order conditions for the eight goods and leisure variables are
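$$
\begin{aligned}
0 &= -\lambda^R n^A p_1 + \lambda^A a_1^A + \lambda^p \left[ a_2^B a_{11}^A - a_1^B a_{21}^A \right] - \lambda^L a_{31}^A + \lambda^w \left[ w^A a_3^B a_{11}^A - w^B a_1^B a_{31}^A \right] \\
0 &= -\lambda^R n^A p_2 + \lambda^A a_2^A + \lambda^p \left[ a_2^B a_{12}^A - a_1^B a_{22}^A \right] - \lambda^L a_{32}^A + \lambda^w \left[ w^A a_3^B a_{12}^A - w^B a_1^B a_{32}^A \right] \\
0 &= -\lambda^R n^A w^A + \lambda^A a_3^A + \lambda^p \left[ a_2^B a_{13}^A - a_1^B a_{23}^A \right] - \lambda^L a_{33}^A + \lambda^w \left[ w^A a_3^B a_{13}^A - w^B a_1^B a_{33}^A \right] \\
0 &= a_1^B - \lambda^R n^B p_1 + \lambda^p \left[ a_1^A a_{21}^B - a_2^A a_{11}^B \right] + \lambda^w \left[ w^A a_1^A a_{31}^B - w^B a_3^A a_{11}^B \right] \\
  &\quad + \lambda^c \left[ a_{11}^B \left( x_1^B - x_{1m}^A \right) + a_{21}^B \left( x_2^B - x_{2m}^A \right) + a_1^B \right] \\
0 &= a_2^B - \lambda^R n^B p_2 + \lambda^p \left[ a_1^A a_{22}^B - a_2^A a_{12}^B \right] + \lambda^w \left[ w^A a_1^A a_{32}^B - w^B a_3^A a_{12}^B \right] \\
  &\quad + \lambda^c \left[ a_{12}^B \left( x_1^B - x_{1m}^A \right) + a_{22}^B \left( x_2^B - x_{2m}^A \right) + a_2^B \right] \\
0 &= a_3^B - \lambda^R n^B w^B + \lambda^p \left[ a_1^A a_{23}^B - a_2^A a_{13}^B \right] + \lambda^w \left[ w^A a_1^A a_{33}^B - w^B a_3^A a_{13}^B \right] \\
  &\quad + \lambda^c \left[ a_{13}^B \left( x_1^B - x_{1m}^A \right) + a_{23}^B \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m a_{3m}^A \frac{w^B}{w^A} \\
0 &= -\lambda^m a_{1m}^A - \lambda^c a_1^B \\
0 &= -\lambda^m a_{2m}^A - \lambda^c a_2^B
\end{aligned}
$$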
Note the last two equations imply that
$$\lambda^c a_1^B = -\lambda^m a_{1m}^A$$
$$\lambda^c a_2^B = -\lambda^m a_{2m}^A$$
$$\lambda^c = -\lambda^m \frac{E^B}{E_m^A}$$
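Using these equations we have
$$
\begin{aligned}
0 &= -\lambda^R n^A p_1 + \lambda^A a_1^A + \lambda^p \left[ a_2^B a_{11}^A - a_1^B a_{21}^A \right] - \lambda^L a_{31}^A + \lambda^w \left[ w^A a_3^B a_{11}^A - w^B a_1^B a_{31}^A \right] \\
0 &= -\lambda^R n^A p_2 + \lambda^A a_2^A + \lambda^p \left[ a_2^B a_{12}^A - a_1^B a_{22}^A \right] - \lambda^L a_{32}^A + \lambda^w \left[ w^A a_3^B a_{12}^A - w^B a_1^B a_{32}^A \right] \\
0 &= -\lambda^R n^A w^A + \lambda^A a_3^A + \lambda^p \left[ a_2^B a_{13}^A - a_1^B a_{23}^A \right] - \lambda^L a_{33}^A + \lambda^w \left[ w^A a_3^B a_{13}^A - w^B a_1^B a_{33}^A \right] \\
0 &= a_1^B - \lambda^R n^B p_1 + \lambda^p \left[ a_1^A a_{21}^B - a_2^A a_{11}^B \right] + \lambda^w \left[ w^A a_1^A a_{31}^B - w^B a_3^A a_{11}^B \right] \\
  &\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{11}^B \left( x_1^B - x_{1m}^A \right) + a_{21}^B \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m a_{1m}^A \\
0 &= a_2^B - \lambda^R n^B p_2 + \lambda^p \left[ a_1^A a_{22}^B - a_2^A a_{12}^B \right] + \lambda^w \left[ w^A a_1^A a_{32}^B - w^B a_3^A a_{12}^B \right] \\
  &\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{12}^B \left( x_1^B - x_{1m}^A \right) + a_{22}^B \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m a_{2m}^A \\
0 &= a_3^B - \lambda^R n^B w^B + \lambda^p \left[ a_1^A a_{23}^B - a_2^A a_{13}^B \right] + \lambda^w \left[ w^A a_1^A a_{33}^B - w^B a_3^A a_{13}^B \right] \\
  &\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a_{13}^B \left( x_1^B - x_{1m}^A \right) + a_{23}^B \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m a_{3m}^A \frac{w^B}{w^A}
\end{aligned}
$$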
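Using (12) and (13) we have
$$\lambda^R n^A E^A \frac{1}{1 + t_1} = \lambda^A + \lambda^p \left[ a_2^B a^{*A}_{11} - a_1^B a^{*A}_{12} \right] - \lambda^L a^{*A}_{13} + \lambda^w \left[ w^A a_3^B a^{*A}_{11} - w^B a_1^B a^{*A}_{13} \right] \tag{18}$$
$$\lambda^R n^A E^A \frac{1}{1 + t_2} = \lambda^A + \lambda^p \left[ a_2^B a^{*A}_{21} - a_1^B a^{*A}_{22} \right] - \lambda^L a^{*A}_{23} + \lambda^w \left[ w^A a_3^B a^{*A}_{21} - w^B a_1^B a^{*A}_{23} \right] \tag{19}$$
$$\lambda^R n^A \frac{w^A}{a_3^A} = \lambda^A + \lambda^p \left[ a_2^B a^{*A}_{31} - a_1^B a^{*A}_{32} \right] - \lambda^L a^{*A}_{33} + \lambda^w \left[ w^A a_3^B a^{*A}_{31} - w^B a_1^B a^{*A}_{33} \right] \tag{20}$$
$$
\begin{aligned}
\lambda^R n^B E^B \frac{1}{1 + t_1} &= 1 + \lambda^p \left[ a_1^A a^{*B}_{12} - a_2^A a^{*B}_{11} \right] + \lambda^w \left[ w^A a_1^A a^{*B}_{13} - w^B a_3^A a^{*B}_{11} \right] \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a^{*B}_{11} \left( x_1^B - x_{1m}^A \right) + a^{*B}_{12} \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m \frac{E^B}{E_m^A}
\end{aligned}
\tag{21}
$$
$$
\begin{aligned}
\lambda^R n^B E^B \frac{1}{1 + t_2} &= 1 + \lambda^p \left[ a_1^A a^{*B}_{22} - a_2^A a^{*B}_{21} \right] + \lambda^w \left[ w^A a_1^A a^{*B}_{23} - w^B a_3^A a^{*B}_{21} \right] \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a^{*B}_{21} \left( x_1^B - x_{1m}^A \right) + a^{*B}_{22} \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m \frac{E^B}{E_m^A}
\end{aligned}
\tag{22}
$$
$$
\begin{aligned}
\lambda^R n^B \frac{w^B}{a_3^B} &= 1 + \lambda^p \left[ a_1^A a^{*B}_{32} - a_2^A a^{*B}_{31} \right] + \lambda^w \left[ w^A a_1^A a^{*B}_{33} - w^B a_3^A a^{*B}_{31} \right] \\
&\quad - \lambda^m \frac{E^B}{E_m^A} \left[ a^{*B}_{31} \left( x_1^B - x_{1m}^A \right) + a^{*B}_{32} \left( x_2^B - x_{2m}^A \right) \right] - \lambda^m \frac{a_{3m}^A w^B}{a_3^B w^A}
\end{aligned}
\tag{23}
$$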
Then (19) minus (18), and (22) minus (21) yield
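$$\begin{aligned}
\lambda^R n^A E^A\frac{t_1 - t_2}{(1+t_1)(1+t_2)} ={}& \lambda^p\left[a^B_2\left(a^{*A}_{21} - a^{*A}_{11}\right) + a^B_1\left(a^{*A}_{12} - a^{*A}_{22}\right)\right] + \lambda^L\left(a^{*A}_{13} - a^{*A}_{23}\right) \\
&+ \lambda^w\left[w^A a^B_3\left(a^{*A}_{21} - a^{*A}_{11}\right) + w^B a^B_1\left(a^{*A}_{13} - a^{*A}_{23}\right)\right]
\end{aligned} \tag{24}$$
$$\begin{aligned}
\lambda^R n^B E^B\frac{t_1 - t_2}{(1+t_1)(1+t_2)} ={}& \lambda^p\left[a^A_1\left(a^{*B}_{22} - a^{*B}_{12}\right) + a^A_2\left(a^{*B}_{11} - a^{*B}_{21}\right)\right] \\
&+ \lambda^w\left[w^A a^A_1\left(a^{*B}_{23} - a^{*B}_{13}\right) + w^B a^A_3\left(a^{*B}_{11} - a^{*B}_{21}\right)\right] \\
&+ \lambda^m\frac{E^B}{E^A_m}\left[\left(a^{*B}_{11} - a^{*B}_{21}\right)\left(x^B_1 - x^A_{1m}\right) + \left(a^{*B}_{12} - a^{*B}_{22}\right)\left(x^B_2 - x^A_{2m}\right)\right]
\end{aligned} \tag{25}$$
Then $E^B$ times (24) minus $E^A$ times (25) yields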
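$$\begin{aligned}
0 ={}& E^B\lambda^L\left(a^{*A}_{13} - a^{*A}_{23}\right) \\
&+ \lambda^p\left[(1+t_1)p_1\left(a^{*A}_{12} - a^{*A}_{22} + a^{*B}_{12} - a^{*B}_{22}\right) + (1+t_2)p_2\left(a^{*A}_{21} - a^{*A}_{11} + a^{*B}_{21} - a^{*B}_{11}\right)\right] \\
&+ \lambda^w(1+t_1)p_1\left[w^B\left(a^{*A}_{13} - a^{*A}_{23}\right) + w^A\left(a^{*B}_{13} - a^{*B}_{23}\right)\right] \\
&+ \lambda^w\left[w^A a^B_3 E^B\left(a^{*A}_{21} - a^{*A}_{11}\right) + w^B a^A_3 E^A\left(a^{*B}_{21} - a^{*B}_{11}\right)\right] \\
&+ \lambda^m\frac{E^A E^B}{E^A_m}\left[\left(a^{*B}_{21} - a^{*B}_{11}\right)\left(x^B_1 - x^A_{1m}\right) + \left(a^{*B}_{22} - a^{*B}_{12}\right)\left(x^B_2 - x^A_{2m}\right)\right]
\end{aligned} \tag{26}$$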
With just commodity tax rates this is
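$$\begin{aligned}
0 ={}& E^B\lambda^L\left(a^{*A}_{13} - a^{*A}_{23}\right) \\
&+ \lambda^w(1+t_1)p_1\left[w^B\left(a^{*A}_{13} - a^{*A}_{23}\right) + w^A\left(a^{*B}_{13} - a^{*B}_{23}\right)\right] \\
&+ \lambda^p(1+t_1)p_1\left(a^{*A}_{12} - a^{*A}_{22} + a^{*B}_{12} - a^{*B}_{22}\right) \\
&+ \left[\lambda^p(1+t_2)p_2 + \lambda^w w^A w^B\right]\left(a^{*A}_{21} - a^{*A}_{11} + a^{*B}_{21} - a^{*B}_{11}\right) \\
&+ \lambda^m\frac{E^A E^B}{E^A_m}\left[\left(a^{*B}_{21} - a^{*B}_{11}\right)\left(x^B_1 - x^A_{1m}\right) + \left(a^{*B}_{22} - a^{*B}_{12}\right)\left(x^B_2 - x^A_{2m}\right)\right]
\end{aligned} \tag{27}$$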
Proof that homotheticity and weak separability of leisure and goods
imply t1 = t2
If the utility function is homothetic there is a strictly increasing transformation
of it that is homogeneous of degree 1. Thus, with weak separability of leisure,
$$u(x_1, x_2, l) = f(x_1, x_2)\,g(l)$$
and for all admissible values of goods and leisure, for all $\alpha > 0$
$$f(\alpha x_1, \alpha x_2)\,g(\alpha l) = \alpha f(x_1, x_2)\,g(l) = \alpha^{\gamma} f(x_1, x_2)\,\alpha^{1-\gamma} g(l).$$
Thus f is homogeneous of degree γ and using Euler’s theorem its first derivatives are
homogeneous of degree γ − 1 and its second derivatives are homogeneous of degree
γ − 2. Then for i, j, k = 1, 2
$$\frac{f_{ij}(\alpha x_1, \alpha x_2)}{f_k(\alpha x_1, \alpha x_2)} = \frac{\alpha^{\gamma-2} f_{ij}(x_1, x_2)}{\alpha^{\gamma-1} f_k(x_1, x_2)} = \frac{1}{\alpha}\frac{f_{ij}(x_1, x_2)}{f_k(x_1, x_2)}.$$
Letting α = 1/x2
$$\frac{f_{ij}(x_1, x_2)}{f_k(x_1, x_2)} = \frac{1}{x_2}\frac{f_{ij}(x_1/x_2, 1)}{f_k(x_1/x_2, 1)},$$
and since preferences are homothetic and A and B pay the same prices for goods 1 and 2, $x^A_1/x^A_2 = x^B_1/x^B_2$. Thus for $q = A, B$
$$\frac{f_{ij}(x^q_1, x^q_2)}{f_k(x^q_1, x^q_2)} = \frac{1}{x^q_2}\frac{f_{ij}(x_1/x_2, 1)}{f_k(x_1/x_2, 1)},$$
and the second term on the RHS is independent of type.
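The scaling argument can be checked numerically. The sketch below is an illustration only: it assumes a hypothetical Cobb-Douglas form $f(x_1, x_2) = x_1^{e_1} x_2^{e_2}$ (homogeneous of degree $\gamma = e_1 + e_2$), which the paper does not impose, and uses hand-coded derivatives. All names and values are assumptions.

```python
# ASSUMED functional form (illustration only): f(x1, x2) = x1**E1 * x2**E2,
# homogeneous of degree gamma = E1 + E2.
E1, E2 = 0.3, 0.5

def f1(x1, x2):
    return E1 * x1 ** (E1 - 1) * x2 ** E2

def f11(x1, x2):
    return E1 * (E1 - 1) * x1 ** (E1 - 2) * x2 ** E2

def f12(x1, x2):
    return E1 * E2 * x1 ** (E1 - 1) * x2 ** (E2 - 1)

def ratio(fij, fk, x1, x2):
    return fij(x1, x2) / fk(x1, x2)

x1, x2, alpha = 2.0, 3.0, 1.7

# Scaling both goods by alpha divides f_ij / f_k by alpha.
scaling_gap = max(
    abs(ratio(fij, f1, alpha * x1, alpha * x2) - ratio(fij, f1, x1, x2) / alpha)
    for fij in (f11, f12)
)

# Two types on the same ray x1/x2: x2 * f_ij / f_k is identical across types.
type_A = (2.0, 3.0)   # hypothetical bundle for A
type_B = (4.0, 6.0)   # hypothetical bundle for B, same ratio x1/x2
normalised_A = type_A[1] * ratio(f12, f1, *type_A)
normalised_B = type_B[1] * ratio(f12, f1, *type_B)
print(scaling_gap, normalised_A, normalised_B)
```

Multiplying the ratio by each type's own $x_2$ removes the only type-specific factor, which is the sense in which the remaining factor is independent of type.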
Turning to the distance function, u (x1 , x2 , l) homogeneous of degree 1 implies
that from
$$u\left(\frac{x_1}{d}, \frac{x_2}{d}, \frac{l}{d}\right) = u_0$$
we have
$$d(x_1, x_2, l, u_0) = u_0^{-1} u(x_1, x_2, l) = u_0^{-1} f(x_1, x_2)\,g(l).$$
Remembering that $a^*_{ij} \equiv a_{ij}/a_i$, for $i, j = 1, 2$ and $q = A, B$
$$a^{*q}_{ij} = \frac{f_{ij}(x^q_1, x^q_2)}{f_i(x^q_1, x^q_2)} = \frac{1}{x^q_2}\frac{f_{ij}(x_1/x_2, 1)}{f_i(x_1/x_2, 1)} \equiv \frac{1}{x^q_2} f^*_{ij},$$
and
$$a^{*q}_{13} = \frac{g'}{g} = a^{*q}_{23}. \tag{28}$$
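As a rough numerical illustration of (28), the sketch below builds the distance function for an assumed utility $u = x_1^{e_1} x_2^{e_2} l^{1-e_1-e_2}$ (homogeneous of degree 1, a hypothetical choice not made in the paper) and approximates $a^*_{13}$ and $a^*_{23}$ by central finite differences. Function names, the evaluation point and the step size are all assumptions.

```python
# ASSUMED utility (illustration only): u = x1**E1 * x2**E2 * l**E3 with
# E1 + E2 + E3 = 1, so u is homogeneous of degree 1 and g(l) = l**E3.
E1, E2 = 0.3, 0.5
E3 = 1.0 - E1 - E2

def u(x1, x2, l):
    return x1 ** E1 * x2 ** E2 * l ** E3

def d(x1, x2, l, u0):
    # For a degree-1 homogeneous u, the distance function is u / u0.
    return u(x1, x2, l) / u0

def first(i, pt, u0, h=1e-3):
    """Central-difference approximation to a_i."""
    up, dn = list(pt), list(pt)
    up[i] += h
    dn[i] -= h
    return (d(*up, u0) - d(*dn, u0)) / (2 * h)

def cross(i, j, pt, u0, h=1e-3):
    """Central-difference approximation to a_ij for i != j."""
    total = 0.0
    for si in (1, -1):
        for sj in (1, -1):
            q = list(pt)
            q[i] += si * h
            q[j] += sj * h
            total += si * sj * d(*q, u0)
    return total / (4 * h * h)

point, u0 = (2.0, 3.0, 0.6), 1.5
scale = d(*point, u0)
rescaled_utility = u(*(v / scale for v in point))   # should equal u0

a13_star = cross(0, 2, point, u0) / first(0, point, u0)
a23_star = cross(1, 2, point, u0) / first(1, point, u0)
g_ratio = E3 / point[2]                              # g'(l) / g(l)
print(rescaled_utility, a13_star, a23_star, g_ratio)
```

Under weak separability both cross terms collapse to the same $g'/g$, so the two goods are equally complementary with leisure regardless of the bundle chosen.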
Thus, inspection of (26) shows that if λw = 0, as it is with either a proportional
earnings tax on the high earner or a lump-sum tax on the high earner, homogeneity
and weak separability imply λp = 0 and therefore t1 = t2 from (24). To prove t1 = t2
in a pure commodity tax regime is a little more work.
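Using (28) and (13), $E^B$ times (24) yields
$$\lambda^R n^A E^A E^B\frac{t_1 - t_2}{(1+t_1)(1+t_2)} = \lambda^p\left[(1+t_1)p_1\left(a^{*A}_{12} - a^{*A}_{22}\right) + (1+t_2)p_2\left(a^{*A}_{21} - a^{*A}_{11}\right)\right] + \lambda^w w^A w^B\left(a^{*A}_{21} - a^{*A}_{11}\right).$$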
Then using the expressions for $a^{*q}_{ij}$ it follows that $x^A_2 E^B$ times (24) produces
$$\lambda^R E^A E^B n^A x^A_2\frac{t_1 - t_2}{(1+t_1)(1+t_2)} = \lambda^p\left((1+t_1)p_1\left(f^*_{12} - f^*_{22}\right) + (1+t_2)p_2\left(f^*_{21} - f^*_{11}\right)\right) + \lambda^w w^A w^B\left(f^*_{21} - f^*_{11}\right).$$
Now note in (25) that if leisure is weakly separable from goods a mimicking A has the same consumption plan as a B and therefore the coefficient of $\lambda^m$ is zero. Then, following the pattern above, $x^B_2 E^A$ times (25) produces
$$\lambda^R E^A E^B n^B x^B_2\frac{t_1 - t_2}{(1+t_1)(1+t_2)} = \lambda^p\left((1+t_1)p_1\left(f^*_{22} - f^*_{12}\right) + (1+t_2)p_2\left(f^*_{11} - f^*_{21}\right)\right) + \lambda^w w^A w^B\left(f^*_{11} - f^*_{21}\right).$$
Adding the last two equations
$$\lambda^R E^A E^B\left(n^A x^A_2 + n^B x^B_2\right)\frac{t_1 - t_2}{(1+t_1)(1+t_2)} = 0 \quad\text{or}\quad t_1 = t_2.$$
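The cancellation in the last step can be seen with arbitrary numbers. The sketch below codes the right-hand sides of the two scaled equations and confirms they sum to zero; the function names and every numerical value are hypothetical, and the dictionary `fs` simply holds illustrative values of $f^*_{ij}$.

```python
# Illustrative check: the right-hand sides of the scaled (24) and (25)
# are exact negatives of each other, so they cancel when added.
# All names and numbers are hypothetical.

def rhs_scaled_24(lam_p, lam_w, t1, t2, p1, p2, wA, wB, fs):
    return (lam_p * ((1 + t1) * p1 * (fs["12"] - fs["22"])
                     + (1 + t2) * p2 * (fs["21"] - fs["11"]))
            + lam_w * wA * wB * (fs["21"] - fs["11"]))

def rhs_scaled_25(lam_p, lam_w, t1, t2, p1, p2, wA, wB, fs):
    return (lam_p * ((1 + t1) * p1 * (fs["22"] - fs["12"])
                     + (1 + t2) * p2 * (fs["11"] - fs["21"]))
            + lam_w * wA * wB * (fs["11"] - fs["21"]))

args = dict(
    lam_p=0.7, lam_w=-0.2, t1=0.15, t2=0.05, p1=1.0, p2=2.0,
    wA=3.0, wB=1.5,
    fs={"11": -0.8, "12": 0.4, "21": 0.3, "22": -0.6},
)
total = rhs_scaled_24(**args) + rhs_scaled_25(**args)
print(total)  # numerically zero for any inputs
```

Since the summed left-hand side multiplies $(t_1 - t_2)/((1+t_1)(1+t_2))$ by a strictly positive coefficient, a zero right-hand side forces $t_1 = t_2$.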
Lump-sum taxes on the high earner
Before mimicking begins, $\lambda^L = \lambda^w = \lambda^m = 0$. Then from (26) we know $\lambda^p = 0$. Suppose we normalize tax rates by setting $t_1 = 0$. Then from (18), (19), (20) and (12) we know that $t_1 = t_2 = t^H = 0$. $t^L = 0$ follows from (22), (23) and (13). With mimicking, equation (26) shows that $\lambda^p$ is still zero if leisure is weakly separable from goods, because in this case $x^A_{jm} = x^B_j$, $j = 1, 2$; mimicking As spend their money the same way Bs do. In this case, the argument above shows that $t^H = t_1 = t_2 = 0$.
What about $t^L$? From (21) or (22)
$$\lambda^R n^B E^B = 1 - \lambda^m\frac{E^B}{E^A_m},$$
and from (23)
$$\lambda^R n^B\frac{w^B}{a^B_3} = 1 - \lambda^m\frac{a^A_{3m} w^B}{a^B_3 w^A}.$$
Using (13), the first equation divided by the second is
$$1 - t^L = \frac{1 - \lambda^m E^B/E^A_m}{1 - \lambda^m\left(a^A_{3m} w^B\right)/\left(a^B_3 w^A\right)}.$$
When mimicking starts $\lambda^m$ rises from zero and therefore $t^L$ will increase from zero if
$$\frac{a^A_{1m}}{a^B_1} = \frac{E^B}{E^A_m} > \frac{a^A_{3m} w^B}{a^B_3 w^A} \quad\text{or}\quad \frac{a^A_{1m}\,a^B_3\,w^A}{a^A_{3m}\,a^B_1\,w^B} > 1.$$
Since $a_1/a_3$ is the MRS between good 1 and leisure, and mimicking As enjoy more leisure and the same utility as As,
$$\frac{a^A_{1m}\,a^B_3\,w^A}{a^A_{3m}\,a^B_1\,w^B} > \frac{a^A_1\,a^B_3\,w^A}{a^A_3\,a^B_1\,w^B}, \tag{29}$$
and the right side equals unity when mimicking starts. Thus $t^L$ rises from zero when mimicking starts.
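A small sketch of the comparative static just described, treating the two ratios in the expression for $1 - t^L$ as given numbers. The function name `low_earner_tax`, its parameter names and all values are hypothetical; `goods_ratio` stands for $E^B/E^A_m$ and `leisure_ratio` for $(a^A_{3m} w^B)/(a^B_3 w^A)$.

```python
# Illustrative evaluation of
#   1 - t^L = (1 - lam_m * goods_ratio) / (1 - lam_m * leisure_ratio).
# All inputs are hypothetical numbers, not values from the paper.

def low_earner_tax(lam_m, goods_ratio, leisure_ratio):
    """Return t^L implied by the ratio of the two first-order conditions."""
    return 1.0 - (1.0 - lam_m * goods_ratio) / (1.0 - lam_m * leisure_ratio)

goods_ratio = 1.0      # assumed E^B / E^A_m
leisure_ratio = 0.8    # assumed (a^A_3m w^B) / (a^B_3 w^A), below goods_ratio

t_before = low_earner_tax(0.0, goods_ratio, leisure_ratio)   # no mimicking
t_after = low_earner_tax(0.1, goods_ratio, leisure_ratio)    # lam_m > 0
t_reversed = low_earner_tax(0.1, leisure_ratio, goods_ratio) # inequality flipped
print(t_before, t_after, t_reversed)
```

With $\lambda^m = 0$ the implied rate is zero; once $\lambda^m$ is positive the sign of $t^L$ is governed entirely by which ratio is larger, which is the role played by inequality (29).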
What happens when mimicking starts if leisure is not weakly separable from goods? Suppose good 1 is more complementary with leisure than is good 2, $a^*_{13} > a^*_{23}$. Since mimicking As have the same goods budget as Bs but more leisure, $x^A_{1m} > x^B_1$ and $x^A_{2m} < x^B_2$, so in (26) the coefficient of $\lambda^m$ is negative and therefore $\lambda^p$ rises from zero as mimicking begins. Then from (18) we can see that $t_1 > 0$ and from (19), $t_2 < 0$. The argument above for $t^L$ rising from zero as mimicking begins still holds because (29) is a strict inequality when mimicking begins, while $\lambda^p$ is zero.
A proportional earnings tax on the high earner
This section derives results for the case where lump-sum taxes are inadmissible, $\lambda^L > 0$, but the tools of commodity tax rates are supplemented by a proportional earnings tax on the high earners, $t^H$. The model is like that for a pure commodity tax regime except that $\lambda^w = 0$. Inspection of (26) reveals that starting at the private equilibrium the sign of $\lambda^p$ must be the opposite of the sign of $a^{*A}_{13} - a^{*A}_{23}$. Suppose good 1 is more complementary with leisure than is good 2, $a^{*A}_{13} - a^{*A}_{23} > 0$. Look at (21) and (22). As we move away from the private equilibrium where $\lambda^p = t_1 = t_2 = 0$, $\lambda^p$ becoming negative tells us that $t_1$ becomes positive and $t_2$ negative. The same equations show that when mimicking starts and $\lambda^m$ rises above zero both tax rates will rise from the $E^B/E^A_m$ term. In either case good 1 is taxed at a higher rate than good 2 if good 1 is more complementary with leisure. Using (12), the ratio of (20) to (18) is
$$\frac{1 + t_1}{1 - t^H} = \frac{\lambda^A + \lambda^p\left(a^B_2 a^{*A}_{31} - a^B_1 a^{*A}_{32}\right) - \lambda^L a^{*A}_{33}}{\lambda^A + \lambda^p\left(a^B_2 a^{*A}_{11} - a^B_1 a^{*A}_{12}\right) - \lambda^L a^{*A}_{13}}.$$
If the goods are equally complementary with leisure then λp = 0 and moving away
from the private equilibrium the numerator rises, the denominator falls, the ratio
on the right-hand side rises and thus tH rises above zero (remember that in this
case commodity tax rates are zero). When mimicking begins the upf must become
flatter which means that λL falls, and tH falls. If the two goods are not equally
complementary with leisure the λp terms moderate the movements in the numerator
and denominator. In either case, however, the pattern for tH is the same. Before
mimicking starts tH is the primary instrument for redistribution; after mimicking
kicks in reductions in tH are used to prevent the As from mimicking the Bs and
commodity taxes are used to raise revenue for redistribution. Throughout, the good
that is more complementary with leisure is taxed at the higher rate.
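To make the movement of $t^H$ concrete, the sketch below evaluates the ratio of (20) to (18) in the equally complementary case $\lambda^p = 0$ with commodity tax rates at zero. The function name and all numbers are hypothetical; in particular the signs $a^{*A}_{33} < 0$ and $a^{*A}_{13} > 0$ are assumptions chosen so that the numerator rises and the denominator falls as $\lambda^L$ increases, matching the description in the text.

```python
# Illustrative evaluation of
#   (1 + t1) / (1 - t^H) = (lam_A - lam_L * a33) / (lam_A - lam_L * a13)
# with lam_p = 0. All inputs are hypothetical; a33 < 0 < a13 is ASSUMED.

def earnings_tax_rate(lam_A, lam_L, a33_star, a13_star, t1=0.0):
    """Return t^H implied by the ratio of (20) to (18) when lam_p = 0."""
    ratio = (lam_A - lam_L * a33_star) / (lam_A - lam_L * a13_star)
    return 1.0 - (1.0 + t1) / ratio

lam_A, a33_star, a13_star = 1.0, -0.5, 0.2

tH_private = earnings_tax_rate(lam_A, 0.0, a33_star, a13_star)  # lam_L = 0
tH_low = earnings_tax_rate(lam_A, 0.1, a33_star, a13_star)
tH_high = earnings_tax_rate(lam_A, 0.2, a33_star, a13_star)
print(tH_private, tH_low, tH_high)
```

Under these assumed signs $t^H$ moves in the same direction as $\lambda^L$: it rises as $\lambda^L$ increases away from the private equilibrium and falls again if $\lambda^L$ falls once mimicking begins.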
References
[1] Atkinson, A. and J. Stiglitz, 1976, The design of tax structure: direct versus
indirect taxation, Journal of Public Economics 6, 55–75.
[2] Auerbach, A.J., 1979, A brief note on a non-existent theorem about the optimality of uniform taxation, Economics Letters 3, 49–52.
[3] Besley, T. and I. Jewitt, 1995, Uniform taxation and consumer preferences,
Journal of Public Economics 58, 73–84.
[4] Burbidge, J.B., 2015, Using distance functions to understand interest taxation, Canadian Journal of Economics, forthcoming: see https://artsonline.uwaterloo.ca/jburbidg/
[5] Corlett, W.J. and D.C. Hague, 1953–54, Complementarity and the excess burden
of taxation, The Review of Economic Studies 21, 21–30.
[6] Deaton, A.S., 1979, The distance function and consumer behaviour with applications to index numbers and optimal taxation, Review of Economic Studies 46,
391–405.
[7] Diamond, P.A. and J.A. Mirrlees, 1971, Optimal taxation and public production,
American Economic Review 61, 8–27 and 261–278.
[8] Mirrlees, J.A., 1971, An exploration in the theory of optimum income taxation,
The Review of Economic Studies 38, 175–208.
[9] Piketty, T. and E. Saez, 2013, Optimal labor income taxation, in A. Auerbach,
R. Chetty, M. Feldstein and E. Saez, editors, Handbook of Public Economics,
vol. 5 (Amsterdam: Elsevier-North Holland), 391–474.
[10] Ramsey, F.P., 1927, A contribution to the theory of taxation, Economic Journal
37, 47–61.
[11] Sadka, E., 1976, On income distribution, incentive effects and optimal income
taxation, Review of Economic Studies 43, 261–267.
[12] Seade, J.K., 1977, On the shape of optimal tax schedules, Journal of Public
Economics 7, 203–236.