Performance Pay and Workers’ Non-Monetary
Motivations: Evidence from a Natural Field Experiment
David Huffman1
Michael Bognanno2
April 2, 2015
Abstract
A literature in psychology and behavioral economics cautions that paying workers for good performance may undermine non-monetary motivations to do a good job.
This paper provides the first implementation of the standard experimental design from
psychology in a real work setting with paid workers. The findings are consistent with
the view that performance pay may have negative psychological effects, but there is
also evidence of heterogeneity. A sub-group of workers report positive psychological effects, and there is suggestive evidence that different responses to incentives are
related to worker personalities and preferences.
Keywords: Incentives, non-cognitive skills, experiment, intrinsic motivation
JEL codes: D03, J22, J33
(1) University of Oxford and IZA; e-mail: [email protected]
(2) Temple University and IZA; e-mail: [email protected]
1 Introduction
A literature in psychology and behavioral economics, as well as prominent management textbooks,
cautions that paying workers for good performance may undermine non-monetary motivations for doing a good job (Deci, 1971; Lepper et al., 1973; Kreps, 1997; Baron and Kreps,
1999; Gneezy et al., 2011). The psychology literature has mainly used student subjects
in a lab setting performing tasks such as solving puzzles. The typical experimental design involves treatment subjects going through three stages, ABA, where A involves no
monetary incentives and B involves incentives for performance. Control subjects never get
incentives, going through AAA. The key stylized fact is that treatment subjects have lower
output than control in the third stage, consistent with performance pay having reduced
non-monetary motivations (for a meta-analysis, see Deci and Ryan, 1999). Some studies
even find lower output for treatment in the second stage, when incentives are active, consistent with the monetary incentives being too weak to offset a reduction in non-monetary
motivations (e.g., Deci, 1971).
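A minimal sketch of how the ABA/AAA design maps onto a comparison of group means. It is purely illustrative and not drawn from any of the cited studies: the output figures are hypothetical placeholders, and only the stage structure comes from the text.

```python
from statistics import mean

# Incentive schedule per stage: "A" = no performance pay, "B" = performance pay.
DESIGN = {"treatment": ["A", "B", "A"], "control": ["A", "A", "A"]}


def stage_gap(outputs, stage):
    """Mean treatment output minus mean control output in one stage (0-indexed).

    `outputs` maps each group to a list of subjects; each subject is a list of
    outputs, one per stage.
    """
    treat = mean(subject[stage] for subject in outputs["treatment"])
    control = mean(subject[stage] for subject in outputs["control"])
    return treat - control


# Hypothetical outputs (not data): two subjects per group, three stages each.
outputs = {
    "treatment": [[10, 14, 7], [12, 16, 9]],
    "control": [[11, 11, 10], [11, 12, 10]],
}

# Both groups face the same condition ("A") in stage 3, so a negative stage-3
# gap is the pattern interpreted as reduced non-monetary motivation.
assert DESIGN["treatment"][2] == DESIGN["control"][2] == "A"
for stage in range(3):
    print(f"stage {stage + 1}: gap = {stage_gap(outputs, stage)}")
```

With these placeholder numbers the gap is zero in stage 1, positive in stage 2 (incentives active), and negative in stage 3, which is the stylized crowding-out pattern.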
There are various proposed mechanisms for such effects, many of which have in
common that the introduction of performance pay “signals” something to the worker,
about the enjoyability of the task, or the relevance of social norms calling for hard work,
or the beliefs of the employer about the worker’s trustworthiness (Gneezy and Rustichini,
2000a and 2000b; Benabou and Tirole, 2003 and 2006; Heyman and Ariely, 2004; Gneezy
et al., 2011; Carpenter and Dolifka, 2014). We refer to these as changes in non-monetary
motivations, rather than using the term “intrinsic motivation” from psychology, which is
sometimes taken to mean task enjoyment alone (Fehr and Falk, 2002).
If performance pay does affect workers’ non-monetary motivations, this has important implications for economics and for managers. It means that even though performance
pay tends to increase output in the workplace (e.g., Lazear, 2000), the size of the impact
depends partly on “psychological variables” that are left out of economic models, and the
impact might be even greater in the absence of negative psychological effects (Benabou and
Tirole, 2003). This raises new issues for thinking about the optimal design of incentives,
in terms of whether different ways of delivering incentives might have better psychological
properties.
This paper makes two main contributions: It provides the first implementation
of the “classic” design from psychology in the context of a real work setting with paid
workers, and it explores potential heterogeneity in the psychological effects of performance
pay.1 One reason why it is particularly important to have evidence from a workplace
is that most previous studies have focused on traditionally unpaid activities, such as
solving puzzles, or volunteer activities like collecting donations or contributing to a student
newspaper.2 Understanding the impact of incentives on pro-social behaviors is clearly
important, but the extent and nature of non-monetary motivations in the domain of paid
work may be quite different (Titmuss, 1970; Staw and Calder, 1980; Kreps, 1997; Fehr
and Falk, 2002; Gneezy and Rustichini, 2000b; Stutzer et al., 2011). Evidence from a real
work context, where payment is the norm, is also crucial because the psychological effects
of performance pay are thought to work through signaling, and context can affect how
individuals interpret the meaning of a given action (Ross and Nisbett, 1991). Investigating
individual heterogeneity in psychological effects is important, because heterogeneity would
imply that the impact of performance pay depends on the mix of worker types in the job.
Our subjects were 39 workers, hired to mingle with the crowd at a street festival
for 5 hours and convince attendees to register in a company’s database. We were able to
measure the “sign-ups” generated by each worker on a minute-by-minute basis, for 195
worker-hour observations. Workers were randomly assigned to a control group receiving a
fixed wage of $18 per hour, or a treatment group who received the same base wage but also
an additional $5 per sign-up during the second hour. Treatment workers learned about
performance pay only at the beginning of the second hour, and learned that it would be
temporary, lasting exactly one hour.
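A minimal sketch of the compensation schedule just described, assuming hours are indexed 1 to 5 and that every worker-hour pays the $18 base wage; the function name and the hour indexing are illustrative choices, not taken from the study's materials.

```python
HOURLY_WAGE = 18    # fixed wage for all workers, dollars per hour
PIECE_RATE = 5      # extra dollars per sign-up, treatment group only
INCENTIVE_HOUR = 2  # performance pay applies only in the second hour

N_WORKERS = 39
N_HOURS = 5


def hourly_pay(group: str, hour: int, signups: int) -> int:
    """Dollar pay for one worker in one hour (hours numbered 1..N_HOURS)."""
    pay = HOURLY_WAGE
    if group == "treatment" and hour == INCENTIVE_HOUR:
        pay += PIECE_RATE * signups
    return pay


# The panel has one observation per worker per hour.
assert N_WORKERS * N_HOURS == 195

print(hourly_pay("treatment", 2, 4))  # 18 + 5 * 4 = 38
print(hourly_pay("control", 2, 4))    # 18
print(hourly_pay("treatment", 3, 4))  # 18: incentive has expired
```

Outside hour 2 the marginal return to a sign-up is zero for both groups. A treatment-control gap in later hours therefore cannot come from current monetary incentives.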
The design involves several methodological innovations relative to the standard design from psychology. First, we deliberately chose a “high-powered” incentive, $5 per
sign-up, to see whether there are negative psychological effects even when incentives are
strong; some previous studies have found the strongest negative psychological effects of
incentives when incentives were quite weak (Gneezy and Rustichini, 2000b). Second, we
added a substantial rest break right after the incentive hour for both treatment and control, to help rule out differential levels of fatigue later on, and a follow-up questionnaire
asked questions about fatigue; fatigue effects would be an alternative explanation for why
treatment group workers might exert less effort than control later in the workday, having worked
harder earlier on.3 Third, when performance pay was introduced, we informed treatment workers that incentives would later be removed, to minimize surprise
or “disappointment” effects that could otherwise play a role. Fourth, our questionnaire
measured worker traits such as personality and social preferences, as well as self-reported
changes in non-monetary motivations, facilitating the study of individual heterogeneity;
research on intrinsic motivation in psychology has typically not considered the role of
personality type (Watanabe and Kanazawa, 2009). Fifth, our design allows us to observe the
time profile of worker output after the removal of incentives; this can shed light on the
timing and duration of any treatment effects on non-monetary motivations. Sixth, unlike
many previous studies, our subjects were unaware of being in a study;4 this is important
for ruling out experimenter demand as a source of non-monetary motivations to work hard
(Levitt and List, 2009).
1
Jordan (1989) studies intrinsic motivation in the workplace, but measures only self-reported motivation
and job satisfaction, not actual work performance, and also does not have random assignment of workers
to treatment and control.
2
This includes one of the initial crowding-out experiments, where subjects were students writing headlines
for a student newspaper (Deci, 1971); while this comes closer to a work context, student newspapers
are traditionally done on a volunteer basis and for the purposes of education and skill development.
This study also involved only 8 subjects, and the measure of output was speed in completing headlines,
without any assessment of quality.
To preserve unawareness of being in an experiment and avoid treatment contamination, treatment and control workers were randomly assigned to different areas of the festival
and kept separate throughout the work day. This raises a potential concern in terms of
randomization failure, namely if different areas of the festival turned out by chance to have
different levels of customer availability. We discuss why differences in customer availability
are unlikely to be a binding constraint, because there were tens of thousands of potential
customers, but we also check robustness of the results to a difference-in-difference analysis
that corrects for any time-invariant differences in productivity for treatment and control.
Our main finding is that treatment group workers had substantially higher output
than control workers while performance pay was active, but lower output than control
when considering the hours after incentives were removed. The lower output during the
post-incentive hours did not manifest immediately, however, but rather grew over time.
The pattern is robust to using a difference-in-difference analysis. The difference in output
profiles for treatment and control after removal of performance pay is not consistent
with a simple, canonical economic model, because both groups of workers faced the same
monetary incentives in those hours.5
3
One of the early psychology experiments did have a substantial break of some weeks between phases of
the study, but in that case treatment subjects did less of the task already during the incentive phase
and thus fatigue effects were not an issue for interpreting the results (Deci, 1971).
4
We obtained a waiver of consent from the IRB, under the relevant section of the federal guidelines.
Turning to potential explanations, we consider a modified version of the canonical
economic model that includes fatigue effects, and a model that incorporates a negative
impact of performance pay on non-monetary motivations. We find little support for explanations based on fatigue spillovers; all indications suggest that treatment workers were
no more fatigued than control workers in the post-incentive hours. On the other hand, we
do find a negative effect of performance pay on non-monetary motivations (self-reported),
with the majority of treatment group workers indicating that the experience of performance pay made work “less fun” in subsequent hours. This is consistent with a role for
changed non-monetary motivations in explaining the observed behavior. Interestingly,
however, there was individual heterogeneity, with a substantial minority reporting that
the experience of performance pay made work seem “more fun” even after incentives were
removed. Thus, the findings are not consistent with a simple model in which the psychological impact of incentives is uniformly negative. The shape of the aggregate output
profile, with its delayed drop in output, is also not fully consistent with the crowding out
hypothesis, which predicts an immediate negative psychological impact of performance
pay.
Investigating the role of non-monetary motivations and individual heterogeneity
in more depth, we find that several types of non-cognitive skills, selected ex ante as
potential carriers of non-monetary motivation (personality traits of conscientiousness and
extraversion, and social preferences in the form of positive reciprocity), are important for
explaining how workers responded to the experience of incentives. Conscientious workers
exhibit lower output immediately in the post-incentive hours, and the drop in output is
particularly pronounced. Extraverted and reciprocal workers, by contrast, actually exhibit
positive effects of the treatment on output in the post-incentive hours. The results on noncognitive skills need to be treated with caution as they are based on a modest number
of individuals, but they suggest that heterogeneity in psychological effects of incentives
may be systematically related to worker traits. We discuss how the shape of the aggregate
output profile, in particular the delayed drop, could possibly reflect composition effects
stemming from individual heterogeneity.
5
Notably, standard inter-temporal substitution motives from life-cycle models are not relevant, under
the plausible assumption that one hour of piece rate earnings leaves the marginal utility of income
unchanged.
In summary, we took the crowding out hypothesis from psychology, and the standard
approach to testing this hypothesis, to a real work setting. Several aspects of our findings
are consistent with the view that psychological effects of performance pay are a relevant
consideration for real work settings: we observe treatment differences in behavior that are
hard to explain with a canonical model of labor supply, or a model with fatigue effects;
workers report changed non-monetary motivations; worker traits related to non-monetary
motivations are important for explaining the treatment effect. At the same time, our
results show that the issue is complex, and challenge the view that performance pay
necessarily has negative psychological effects. A substantial minority of workers reports
increased non-monetary motivations, and some worker traits are associated with crowding
in of non-monetary motivations. This heterogeneity implies that the overall psychological
effects of performance pay could be negative, positive, or zero depending on the mix of
worker types in the job.
Thus, despite a large literature on crowding out in the lab, our findings point to
the need for further research in real work settings. In the conclusion we discuss several
directions for future research: uncovering the specific mechanisms that underlie the psychological impact of incentives in the workplace; better understanding of the timing and
duration of psychological effects; research on the determinants of individual heterogeneity, and how the impact of worker traits may vary with the nature of the job; research
on how the ability of workers to self-select into jobs affects the psychological impact of
performance pay.
2 Related Literature
Our findings complement a previous literature in behavioral economics on incentives and
non-monetary motivations. A seminal paper by Gneezy and Rustichini (2000b) showed
that a group of subjects paid a low piece rate collected fewer contributions for charity than
a group of volunteers, while a group paid a higher piece rate collected about as much as
the volunteers. This evidence is valuable as it shows how low-powered incentives can have
no impact or even reduce performance on an important type of activity. Our study is
different because we focus on a work setting, where payment is the norm, and we investigate
heterogeneity. Also, we take our investigation in a different direction, towards understanding whether high-powered incentives have psychological effects; this seems useful because
we know from previous evidence that employers may use performance incentives that are
strong enough to increase output substantially (Lazear, 2000 and others). When incentives are low, one explanation for their negative psychological effects is that they signal
low “social value” of the task (Gneezy and Rustichini, 2000b). This raises the question
whether high-powered incentives might avoid negative psychological effects by signaling
high task value. Our findings are consistent with even high-powered incentives reducing
non-monetary motivations, for at least some workers. This suggests that signals of task
value may not be the only mechanism underlying psychological effects of incentives. It also
underlines the importance of research on psychological effects, as they are not eliminated
simply by using strong incentives.6
Various studies in behavioral economics have explored whether compensation affects
non-monetary motivations in the workplace, but mainly focusing on changes in the level of
fixed wages (e.g., Gneezy and List, 2006; Kube et al., 2008; Cohn et al., 2014; Cohn et al.,
2014; Gneezy and Rey-Biel, 2014). Our paper differs because we explore how changing the
payment mode – introducing performance pay – might change worker behavior by either
crowding out or crowding in non-monetary motivations.7
The results on worker personality also contribute to the growing literature on “noncognitive skills” in economics. Economists are increasingly interested in studying how
such traits can be important determinants of labor market success and life outcomes.8
Non-cognitive skills have also been shown to affect task motivation in the absence of
financial incentives, in the context of tests of cognitive ability (Segal, 2012). Our findings
complement this literature, providing some of the first evidence about how these traits
relate to high-frequency data on performance in the workplace, and also to the way that
individuals respond to the introduction, and removal, of incentives.
6
Another important paper by Gneezy and Rustichini (2000a) uses a within-subject design, but focuses
on the introduction of a temporary fine. They find that imposing a small fine for picking children up late
from daycare actually worsens behavior, and that there is no improvement after the fine is removed. Our
paper is different because of the focus on workers. Also, we have an incentive that is powerful enough to
elicit better performance while it is in place. See also Fehr and Gaechter (2002) for laboratory evidence
that fines can be worse than no incentives, and Carpenter and Dolifka (2014) for lab evidence on how
the impact of piece rates varies with perceived incentives of the employer. Charness and Gneezy (2011)
also test the psychological effect of incentives in the health domain.
7
Babcock et al. (2011) investigate the impact of team compensation in the domain of exercise.
8
See Heckman (2000); Bowles et al. (2001a); Heckman and Rubinstein (2001); Carneiro and Heckman
(2003); Persico et al. (2004); Kuhn and Weinberger (2005); Segal (2008); Lindqvist and Westman
(2011); Borghans et al. (2011); Becker et al. (2012).
There is a literature using field experiments or natural experiments to study the impact of incentives on effective labor supply or effort (Lazear, 2000; Paarsch and Shearer,
1999; Nagin et al., 2002; Fehr and Goette, 2007; Goldberg, 2013; Shi, 2010; Al-Ubaydli
et al., 2014 and many others). Our study adds evidence about the impact of temporary
incentives. Fehr and Goette (2007) study the impact of a temporary change in the performance pay rate for bicycle messengers, but do not observe behavior in the absence
of performance pay. Temporary incentives are useful for shedding light on how performance pay interacts with non-monetary motivations, but they are also important to study
in their own right. They potentially allow employers to add performance incentives at
selected times, in response to shocks to the marginal value of worker effort. They also
occur quite often in practice, e.g., in the form of temporary employee sales contests; total
expenditures on such contests are estimated to have been more than 26 billion dollars in
2000 in the U.S. (Lim et al., 2009). While there is some work in the business literature
on the impact of temporary incentives on worker behavior, to date such analyses have not
typically involved a control group, making interpretation difficult (see Lim et al., 2009).
3 Design of the Experiment
3.1 Work setting and nature of the task
The study took place during an afternoon street festival in a major U.S. city. The festival
extended for six blocks of a major shopping street, which was closed to auto traffic.
Businesses along the street, which included retailers as well as restaurants, operated booths
during the festival that featured their products. The business mix was similar, and there
were no differences in the types of activities available, for different areas of the festival.
The festival lasted five hours. An estimated 60,000 people visited the festival, so that
there were large crowds present at all times.
The workers used in our study were hired to assist a start-up company, but were
directly employed by a marketing agency; the start-up contracted with the agency, which
provides workers for promotional events. The agency advertised the job using an online
job site, offering a fixed wage of $18 per hour, and ultimately provided 39 workers.9 Any
reputation concerns workers might have had would have been focused on the marketing
agency and not the start-up; the interaction of the agency with the start-up was essentially
one-shot, because the promotion was expected to be a one-time event.
The workers were assigned the task of mingling with the crowd, and trying to convince attendees to sign up for the company’s database by sending a text. A text received
by the company’s server automatically established a database entry linked to the individual’s cell phone number. Registration in the database would allow the company to
send marketing materials about the service in the future, and also provided a measure of
latent demand that was potentially relevant for appealing to venture capitalists. While
there was a clear quantity dimension for output, there was less scope for quality variation.
Workers were given only basic information about the product. Essentially their job was to
get the customer to sign up in order to receive more detailed information. Thus, getting
a customer to sign up reflected the key achievement for the worker.
Our measure of individual worker output comes from the fact that the text sent by
the customer included the unique ID number of the worker with whom they spoke. Each
worker was given a laminated card containing their ID number and the content of the text
that the customer needed to send, including the ID. We observe the precise time each text
arrived on the start-up company’s server over the course of the festival.
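A minimal sketch of turning such a server log into per-worker output counts, assuming each record is a hypothetical (worker_id, arrival_time) pair. The period cutoffs follow the timeline in Section 3.3 (performance pay ran from 13:00 until 14:00); counting a text that arrives at exactly 14:00 as post-incentive is an assumption. Finer bins, such as clock hours or minutes, would only change the cutoff list.

```python
from bisect import bisect_right
from collections import Counter
from datetime import time

PERIODS = ["baseline", "incentive", "post-incentive"]
# Boundaries between periods: the incentive hour starts at 13:00 and ends at 14:00.
CUTOFFS = [time(13, 0), time(14, 0)]


def period_of(arrival: time) -> str:
    """Map a text's arrival time to the period it falls in."""
    return PERIODS[bisect_right(CUTOFFS, arrival)]


def count_signups(texts):
    """Count sign-ups per (worker_id, period) from (worker_id, arrival_time) pairs."""
    return Counter((worker_id, period_of(arrival)) for worker_id, arrival in texts)


# Hypothetical log entries (not study data).
log = [
    (7, time(12, 5)),
    (7, time(13, 30)),
    (7, time(13, 59)),
    (12, time(14, 10)),
]
counts = count_signups(log)
print(counts[(7, "incentive")])  # 2
print(counts[(12, "baseline")])  # 0: Counter returns 0 for missing keys
```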
The work setting is attractive for studying non-monetary motivations because it
has two features: (1) weak monitoring of worker output by the employer; (2) accurate
monitoring of output by researchers. Clearly, in most work settings employers do not have
perfect monitoring, which means that shirking is possible and non-monetary motivations
of workers are important for output. In such settings, however, it may be even more
difficult for researchers to have data on worker performance. In our study there is a
“wedge” between monitoring by employer and by researchers, which comes from the fact
that the start-up provided the data on sign-ups to the researchers, but not to the marketing
agency that was the direct employer of the workers. The fact that sign-up data was not
shared with the agency was made clear to workers at the beginning of the festival, and
the rationale given for the use of ID numbers was to allow the start-up “to track sales in
different areas of the festival”. When performance pay was introduced, it was made clear
that the calculation of payments, and payments themselves, would come directly from
the start-up. Thus, there was no contradiction between workers receiving performance
pay and the agency not receiving the data on sign-ups. There were three managers for
the marketing agency present during the festival, but workers could easily avoid visual
monitoring by losing themselves in the huge crowd. They were also free to go up side
streets to talk to customers, making it even easier to avoid observation and to explain
ex post to management why they were not observed. In summary, while output was
observable for the purposes of the study, workers had substantial latitude for shirking.
9
This was typical of the wages offered by the agency.
3.2 Treatment assignment
We randomly assigned workers to treatment and control the day before the festival, stratifying on information about age, gender, and previous experience provided by the marketing
agency. Thus, these observables were balanced across treatment and control at the outset.
On the day of the festival, workers arrived early for a training session. At that point
they received laminated cards with their ID numbers. The laminated card also told the
worker where in the festival they would be working, and gave them a schedule of water
breaks and a longer rest break. Importantly, the workers were used to being assigned
to work groups. Indeed, the marketing agency routinely randomly assigned workers to
work groups, stratified based on characteristics, to maximize “marketing effectiveness” at
promotion events. Therefore, the fact that workers were assigned to groups should not
have caused any suspicions about there being an experiment in progress (there were also
no suspicions voiced by workers during the experiment).
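A minimal sketch of stratified random assignment, assuming hypothetical worker records with binary gender, age-group, and experience fields (the study's exact strata definitions are not reported here). Workers are shuffled within each stratum and then alternated between arms, so the arms differ by at most one worker in any stratum.

```python
import random
from collections import defaultdict


def stratified_assign(workers, stratum_of, seed=0):
    """Return {worker_id: arm}, randomizing separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for worker in workers:
        strata[stratum_of(worker)].append(worker)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        arms = ["treatment", "control"]
        rng.shuffle(arms)  # randomize which arm gets the extra worker in odd strata
        for i, worker in enumerate(members):
            assignment[worker["id"]] = arms[i % 2]
    return assignment


def arm_counts_by_stratum(workers, stratum_of, assignment):
    """Return {stratum: (n_treatment, n_control)} for a balance check."""
    counts = defaultdict(lambda: [0, 0])
    for worker in workers:
        arm = assignment[worker["id"]]
        counts[stratum_of(worker)][0 if arm == "treatment" else 1] += 1
    return {key: tuple(c) for key, c in counts.items()}


# Hypothetical roster of 39 workers (attribute values are made up).
workers = [
    {"id": i, "female": i % 2 == 0, "under_25": i % 3 == 0, "experienced": i % 4 < 2}
    for i in range(39)
]


def stratum(w):
    return (w["female"], w["under_25"], w["experienced"])


assignment = stratified_assign(workers, stratum, seed=1)
for key, (n_treat, n_control) in sorted(arm_counts_by_stratum(workers, stratum, assignment).items()):
    print(key, n_treat, n_control)
```

Balancing within strata guarantees that the stratifying characteristics are balanced by construction, which is why the text can state that these observables were balanced at the outset.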
To avoid awareness that an experiment was taking place, and also to comply with
the wishes of the start-up, we kept the treatment and control groups separate from each
other throughout the whole festival. We randomly determined beforehand that treatment
workers would be assigned to stay between 20th and 17th street while control workers
would be told to stay between 17th street and 14th street. The rule about not crossing
17th street was explained as being important in order to have “equal coverage” of the
festival, and was enforced by having a manager of the marketing agency posted at 17th
street. During the festival, each group took water breaks, and a longer rest break, at tents
at their respective ends of the festival. We describe the procedures regarding rest breaks in
more detail below. The treatment assignment is summarized in Figure 1, using a map of
the festival. The triangles marked “West” and “East” indicate the tents where treatment
and control workers came for water breaks, respectively. The “Middle” tent was another
tent for the start-up but was not used by our workers.
Figure 1: Treatment and control locations
[Map of the festival along the principal street, from 20th Street to 14th Street, marking the treatment workers' area and West tent, the control workers' area and East tent, and the Mid tent.]
Given the geographic separation of treatment and control workers during the festival,
it is important to consider whether there could have been a failure of randomization, in the
sense that the portion of the festival randomly assigned to treatment workers turned out
to have different “characteristics” that mattered for the productivity of workers. There
are several reasons why such differences are unlikely. First, the festival took place in a
quite narrowly-defined geographic area of the city, and the types of businesses and booths
were similar along the six blocks of the festival. Second, crowds flowed into and out
of the festival from all directions throughout the festival, so there was not a directional
flow of potential customers that might cause one end of the festival to have more “fresh”
customers. While there was one fewer side street in the treatment group area of the
festival (19th Street ends when it meets the principal street), there was instead a pedestrian
path, so there was no important difference in accessibility to customers. Third, sustained
subjective observations by the experimenters indicated no discernible differences in the size
or composition of the crowds at either end of the festival; the experimenters were at each
end of the festival throughout the event, photographs were taken of the crowd on an hourly
basis, and experimenters also switched ends of the festival periodically. Fourth, we check
robustness of the results to normalizing the estimates of the treatment effect by the initial
baseline outputs of treatment and control workers. Thus, even if there were a difference
in the level of customers available to workers at one end of the festival, compared to the
other, this is taken care of by the difference-in-difference. Finally, although normalizing by
baseline outputs would not solve differential time trends in availability of customers across
the two halves of the festival, there were so many potential customers relative to workers
(60,000 vs. 39) that even if such time trends were present, availability of customers was
unlikely to have been a binding constraint.
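A minimal sketch of the difference-in-difference logic, assuming hypothetical mean sign-ups per worker-hour for each group and period (the numbers are placeholders, not the paper's estimates). Subtracting each group's own baseline removes any constant productivity gap between the two ends of the festival. As noted above, it would not remove a differential time trend.

```python
def diff_in_diff(means):
    """(treatment change from baseline) minus (control change from baseline).

    `means` maps (group, period) to mean sign-ups per worker-hour.
    """
    treat_change = means["treatment", "post"] - means["treatment", "baseline"]
    control_change = means["control", "post"] - means["control", "baseline"]
    return treat_change - control_change


def shift_group(means, group, delta):
    """Add a constant `delta` to every period for one group (a pure level difference)."""
    return {
        (g, p): value + (delta if g == group else 0.0)
        for (g, p), value in means.items()
    }


# Hypothetical group means (illustrative only).
means = {
    ("treatment", "baseline"): 4.0,
    ("treatment", "post"): 3.0,
    ("control", "baseline"): 3.5,
    ("control", "post"): 3.5,
}

print(diff_in_diff(means))                                 # -1.0
print(diff_in_diff(shift_group(means, "treatment", 2.0)))  # still -1.0
```

The second call illustrates the robustness argument: if the treatment end of the festival simply had more customers in every period, the estimate would not change.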
3.3 Timeline for the experiment
Figure 2 summarizes the timeline of the experiment. The first time interval, denoted
baseline, started at 11:45. By that time workers had finished the training, and positioned
themselves in their respective assigned portions of the festival. There was a substantial
crowd already gathered even though the festival had not officially started, and workers
began mingling with the crowd and getting sign-ups; the first sign-ups for treatment and
control occurred about one minute apart. At that point the treatment group was unaware
that performance pay would be introduced.
Figure 2: Timeline for the experiment
At 1:00 pm both groups checked in at their respective tents for a water break. Once
everyone was present, a brief announcement was made to the treatment group. They
were told: “For the next hour only there will be a special promotion from [start-up name
here], where you get an extra $5 for every text that comes in with your ID number. This
is on top of the $18. Note that this only lasts for the next hour, so any text that comes
in after 14:00 will not count for the $5. There won’t be any promotions later in the day.”
After the hour with performance pay, half of the treatment group and half of the
control group went on break for 30 minutes. They rested in the shade during this time
at their respective ends of the festival, eating sandwiches that were provided and being
watched (unobtrusively) by one of the research team. Subsequently these workers went
back out, and the other half of treatment and control came in for a 30 minute break. The
staggered rest break was requested by the start-up, to have continuous brand presence on
the street at all times. After the second group was done with their break, all workers were
back on the street for the remaining two hours of the festival.
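A minimal sketch of the staggered break, assuming the first half of each group rests from 14:00 to 14:30 and the second half from 14:30 to 15:00 with no changeover time. The worker IDs and group sizes are hypothetical (the exact split is not reported here). The check confirms that each group keeps workers on the street throughout, which is what continuous brand presence requires.

```python
INCENTIVE_END = 14 * 60  # 14:00, in minutes after midnight
BREAK_MINUTES = 30


def break_schedule(groups):
    """Map worker_id -> (start, end) of a 30-minute break, staggered within each group."""
    schedule = {}
    for worker_ids in groups.values():
        half = len(worker_ids) // 2
        for i, worker_id in enumerate(worker_ids):
            start = INCENTIVE_END if i < half else INCENTIVE_END + BREAK_MINUTES
            schedule[worker_id] = (start, start + BREAK_MINUTES)
    return schedule


def on_street(schedule, worker_ids, minute):
    """Workers in `worker_ids` who are not on break at `minute`."""
    return [w for w in worker_ids if not schedule[w][0] <= minute < schedule[w][1]]


def clock(minute):
    return f"{minute // 60:02d}:{minute % 60:02d}"


# Hypothetical IDs; group sizes are illustrative only.
groups = {"treatment": [1, 2, 3, 4, 5], "control": [6, 7, 8, 9]}
schedule = break_schedule(groups)
for w in (1, 5):
    print(w, clock(schedule[w][0]), "-", clock(schedule[w][1]))
```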
3.4 Questionnaire
After the experiment was completed, workers were contacted by the agency and asked
to fill out an online questionnaire being conducted by researchers who were studying the
sales data. Responses were entered anonymously, but included the worker’s ID number, to
allow matching to the productivity data. It was explained in an informed consent screen
that the agency would never learn the survey responses of any individual. Furthermore,
the agency did not even know the content of the survey. The researchers would simply
inform the agency which workers had completed the survey, and authorize the agency to
pay the workers $15 for participation plus any additional earnings from the trust game,
described below. Out of 39 workers, 34 completed the survey.
The first part of the questionnaire asked about a well-known measure of personality from psychology, the “Big Five,” which consists of five traits: conscientiousness,
extraversion, agreeableness, intellect, and emotional stability. We used a standard battery
of questions for the elicitation.10 The questionnaire also asked a series of other questions,
about demographics and about the experience of working at the festival. Particularly
important were a series of questions about fatigue, and a question about non-monetary
motivations. We provide the exact wordings of all questions used in the analysis as we
discuss the results (wording is sometimes given in a footnote).
10
The questions for the Big Five are available at http://ipip.ori.org/New_IPIP-50-item-scale.htm.
Respondents also participated in a modified version of the “trust game” developed
by Berg et al. (1995), with other survey respondents. In this game a player can exhibit
reciprocity, by choosing to “return a favor” in a one-shot anonymous interaction even
though doing so entails a financial cost.11
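The payoff rules of the binary trust game in footnote 11 can be written as a small function. This is a minimal sketch with illustrative names. It encodes only the rules stated in that footnote ($10 endowments, tripling of a pass, a $20 return) and shows why returning money is costly for the second mover.

```python
ENDOWMENT = 10      # each player's starting dollars
MULTIPLIER = 3      # a passed endowment is tripled by the experimenter
RETURN_AMOUNT = 20  # the only amount the second mover can send back


def payoffs(first_passes: bool, second_returns: bool) -> tuple:
    """Return (first mover, second mover) earnings in dollars."""
    if not first_passes:
        # Nothing is passed, so the second mover's choice is irrelevant.
        return ENDOWMENT, ENDOWMENT
    first = 0
    second = ENDOWMENT + MULTIPLIER * ENDOWMENT  # $10 own + $30 received
    if second_returns:
        first += RETURN_AMOUNT
        second -= RETURN_AMOUNT
    return first, second


for passes in (False, True):
    for returns in (False, True):
        print(passes, returns, payoffs(passes, returns))
```

Returning cuts the second mover's earnings from $40 to $20. A "return" choice therefore cannot be explained by financial self-interest, which is why the second-mover decision serves as a binary reciprocity measure.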
3.5 Sample characteristics and randomization check
Table 1 provides descriptive statistics on demographics for workers in our sample. The table shows that 59 percent of workers were female, and the average age was 25 (the youngest worker was 20 and the oldest was 40). About half of the workers were “veterans” according to the manager, and, based on questionnaire responses, a similar fraction were “experienced” in the sense of having participated in at least three previous promotion events. For 10 percent of workers, the first language was not English. About 85 percent of workers had completed at least some college education.
Table 1: Sample characteristics

                                  Sample statistic   Std. deviation
Mean age                          25.23              (4.56)
Fraction female                   0.59               (0.50)
Fraction veteran                  0.54               (0.51)
Fraction experienced              0.51               (0.51)
Fraction English second language  0.10               (0.31)
Fraction some college             0.85               (0.37)

Notes: Based on 39 worker observations for Age, Female, and Veteran. Other statistics are based on the 34 survey respondents.
As discussed above, age, gender, and the veteran indicator were balanced across treatment and control ex ante. Furthermore, a randomization check cannot reject the null hypothesis that all of the worker characteristics in Table 1, as well as reciprocity and the Big Five personality traits, are balanced across the treatment and control groups. In a Probit regression of the treatment dummy on all observables, none of the variables is individually significant, and the variables are also not jointly significant (Chi-square test; p < 0.59). We also check the robustness of the treatment estimates to including worker fixed effects, which control for all time-invariant worker traits.
11 The game involved two players, each with an endowment of $10. The first mover could choose to keep the endowment or pass all of it to the second mover. If the money was passed, it was tripled by the experimenter, so that the second mover received $30 in addition to his or her initial endowment. In the event that money was passed, the second mover had a binary choice: keep all of the money, or send $20 back to the first mover. Importantly, there was no financial motive for the second mover to send anything back. Respondents knew that it would be randomly determined with whom, among the other respondents, they would be matched, and which role they would play. They were asked to make a choice for both roles. We used role reversal because we use the second-mover decision as a binary measure of reciprocity. This is an incentive-compatible method to elicit the choices, as either one could end up being relevant for the respondent’s payoff. After survey responses were collected, the random matching of players was done. As explained to the subjects, we randomly selected five pairs to actually be paid, based on the combination of their choices. Payments were distributed along with the payments for participating in the survey. Subjects were reminded in the instructions that the agency did not know the content of the survey.
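The payoff arithmetic of the modified trust game in footnote 11 can be sketched as follows. The function and constant names are hypothetical; only the dollar amounts ($10 endowments, tripling of a passed endowment, a $20 return) come from the footnote, and in the study payoffs were realized only for the five randomly selected pairs.

```python
# Payoff rules of the modified trust game (amounts from footnote 11).
# Names are hypothetical; only the dollar amounts come from the study.
ENDOWMENT = 10      # each player's initial endowment ($)
MULTIPLIER = 3      # a passed endowment is tripled by the experimenter
RETURN_AMOUNT = 20  # what a reciprocating second mover sends back ($)


def trust_game_payoffs(first_mover_passes: bool, second_mover_returns: bool) -> tuple[int, int]:
    """Return (first-mover payoff, second-mover payoff) in dollars."""
    if not first_mover_passes:
        # The second mover's choice only matters if money was passed.
        return ENDOWMENT, ENDOWMENT
    first = 0
    second = ENDOWMENT + MULTIPLIER * ENDOWMENT  # own $10 plus the tripled $30
    if second_mover_returns:
        # Reciprocity costs the second mover $20 with no financial benefit.
        first += RETURN_AMOUNT
        second -= RETURN_AMOUNT
    return first, second


for passes in (False, True):
    for returns in (False, True):
        print(passes, returns, trust_game_payoffs(passes, returns))
```

Keeping $40 rather than returning $20 always maximizes the second mover's own payoff, which is why a decision to return money can be read as a measure of reciprocity.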
4 Behavioral Predictions
In this section we briefly discuss the predictions of three types of models: a simple canonical model, a modified canonical model that includes fatigue effects, and a model that includes crowding out of non-monetary motivations. Predictions are standard comparative statics exercises, so we relegate proofs to the online appendix.
For simplicity all models have just two periods, and workers maximize utility by choosing effort levels for each period.12 In period 1, control and treatment workers get a base wage $w$, but treatment workers also have a performance pay rate of $z$. In period 2, both groups of workers just get $w$. In all models workers have convex costs of effort in each period, $c_t(e_t)$, but we introduce a more complex effort cost function with fatigue spillovers in Model (2). Although reputation concerns were likely rather weak in our work setting, we allow for reputation concerns in a reduced-form way, to give canonical models a chance to predict non-zero effort in periods without performance pay. We assume that in each period there is a constant probability that managers observe a worker, and that higher effort makes the manager more impressed, and thus more likely to re-hire the worker in the future, conditional on the worker being observed. The product of the probability of observation and the benefit of impressing the manager is denoted by $p(e_t)$, with $p(\cdot)$ increasing and concave. In all models we assume that the marginal utility of income is constant and unaffected by earnings, because the magnitude of earnings accumulated during a few hours of work is trivial relative to lifetime income. We normalize the marginal utility of income to 1.
12 There is no need for a baseline period, as any model predicts the same behavior for treatment and control in such a period. The representation of the worker’s problem as involving one choice variable, effort (leisure), in a time-separable utility function that is linear in income and convex in effort, is equivalent, along the optimal path, to a standard model of inter-temporal labor supply in which a worker has two choice variables, consumption and effort. Intuitively, the maximization problem in two variables can be reduced to a single-variable problem by substitution of the first-order condition for consumption; the convexity of effort costs in the resulting condition follows from concavity of utility in consumption (see, e.g., Browning et al., 1985; Fehr and Goette, 2007).
Model (1): Canonical model:
In the simplest canonical model, treatment group and control group workers maximize utility functions $V_T$ and $V_C$:
$$V_T = z e_1 + w + p(e_1) - c(e_1) + w + p(e_2) - c(e_2) \tag{1}$$
$$V_C = w + p(e_1) - c(e_1) + w + p(e_2) - c(e_2). \tag{2}$$
Proposition 1: In a canonical model treatment group workers work harder than control
when performance pay is present, but choose the same effort level once performance pay
is removed.
It is straightforward to see that treatment workers have a higher optimal effort in
period 1 than control workers, because performance pay increases the marginal benefit of
effort. In period 2, however, the maximization problem for treatment and control workers,
and thus optimal effort, is identical. Note that because the marginal utility of income is
assumed to be unaffected by piece rate earnings, treatment workers have no greater taste
for leisure in period 2 than control workers (no “standard inter-temporal substitution”
effect).
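To make Proposition 1 concrete, here is a minimal numerical sketch. The functional forms $p(e) = r\ln(1+e)$ and $c(e) = e^2/2$ and the values $z = 0.5$, $r = 1$ are assumptions chosen for illustration, not taken from the paper; they satisfy the model's requirements ($p$ increasing and concave, $c$ convex). Each period's effort solves its own first-order condition, $z + p'(e) = c'(e)$, found here by bisection.

```python
# Canonical two-period model (Model 1) under ASSUMED functional forms:
#   reputation benefit p(e) = r * ln(1 + e)   (increasing, concave)
#   effort cost        c(e) = e**2 / 2        (convex)
# Parameter values are hypothetical, chosen only for illustration.


def optimal_effort(z: float, r: float) -> float:
    """Solve z + p'(e) = c'(e), i.e. z + r / (1 + e) - e = 0, by bisection.

    The left-hand side is strictly decreasing in e, so the root is unique.
    """
    def marginal_net_benefit(e: float) -> float:
        return z + r / (1.0 + e) - e

    lo, hi = 0.0, z + r  # marginal_net_benefit(lo) > 0 >= marginal_net_benefit(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marginal_net_benefit(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def canonical_model(z: float, r: float = 1.0) -> dict:
    """Effort by (group, period); only treatment has piece rate z, and only in period 1."""
    return {
        ("T", 1): optimal_effort(z, r),
        ("C", 1): optimal_effort(0.0, r),
        ("T", 2): optimal_effort(0.0, r),  # performance pay removed
        ("C", 2): optimal_effort(0.0, r),
    }


efforts = canonical_model(z=0.5)
print(efforts)
```

Under these assumptions treatment effort in period 1 is 1.0 versus about 0.62 for control, while period-2 effort is identical for the two groups, as Proposition 1 states.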
Model (2): Model with fatigue spillovers:
Next, we modify the canonical model to allow for fatigue. In quite general terms, fatigue can be thought of as a stock that increases the marginal cost of effort. The stock should be higher if the worker chose high effort in the previous period, and lower or zero if the worker rested instead. Specifically, we assume the same convex cost function as in the canonical model, except that the cost of effort in period 2 also depends on $k$, a fatigue stock: $c = c(e_2, k)$. We assume $\frac{\partial^2 c(e_2, k)}{\partial e_2\,\partial k} > 0$, so that the marginal cost of effort in period 2 is increasing in the fatigue stock.
We first consider a case with no rest break between period 1 and period 2, so that higher period 1 effort increases the fatigue stock for period 2: $\frac{\partial k(e_1)}{\partial e_1} > 0$. Treatment and control workers maximize the following:
$$V_T = z e_1 + w + p(e_1) - c(e_1) + w + p(e_2) - c(e_2, k(e_1)) \tag{3}$$
$$V_C = w + p(e_1) - c(e_1) + w + p(e_2) - c(e_2, k(e_1)). \tag{4}$$
Proposition 2: In the case of fatigue spillovers and no rest break, if treatment group workers
exert more effort than control while performance pay is present, they reduce effort relative
to control once performance pay is removed.
Workers are assumed to be forward looking and take into account that period 1
effort makes it harder to exert effort in period 2. If financial incentives are strong enough,
however, it will make sense for treatment workers to increase period 1 effort relative to
control workers, even though this makes it harder to exert effort later on.13 In this case
it is clear that treatment workers have lower optimal effort than control workers in period
2, because their marginal cost of effort is higher in period 2.
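The trade-off described in footnote 13 can be written out as first-order conditions. A sketch, assuming differentiability and interior optima: let $e_2^*$ denote the optimal period-2 effort given the fatigue stock $k(e_1)$. Because $e_2^*$ is chosen optimally, the envelope theorem lets us ignore its response to $e_1$, and differentiating equation (3) gives

$$
\begin{aligned}
\text{Period 2:}\quad & p'(e_2^*) = \frac{\partial c(e_2^*, k)}{\partial e_2}, \\
\text{Period 1 (treatment):}\quad & z + p'(e_1) = c'(e_1) + \frac{\partial c(e_2^*, k)}{\partial k}\,\frac{\partial k(e_1)}{\partial e_1}.
\end{aligned}
$$

The control condition from equation (4) is the same with $z = 0$. The last term is the extra period-2 cost created by period-1 effort; because $\partial^2 c/\partial e_2\,\partial k > 0$, a larger fatigue stock raises the marginal cost in the period-2 condition, which lowers $e_2^*$ for workers who worked harder in period 1.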
If there is a sufficient rest break between period 1 and period 2, however, higher
effort in period 1 for treatment workers need not imply lower effort than control in period
2. Sufficient means a rest break long enough to reduce the fatigue stock to zero by the
beginning of period 2. In this case $k(e_1) = 0$ and $\frac{\partial k}{\partial e_1} = 0$, and we have the following proposition.
Proposition 3: With fatigue spillovers, but also a sufficient rest break following the performance pay episode, treatment group workers may exert more effort than control when
performance pay is present, but the same as control after performance pay is removed.
With a sufficient rest break the cost of effort in period 2 is just c(e2 ), and the
optimization problem facing the worker is the same as in the canonical model. Thus, the
model predicts equal effort for treatment and control group workers in period 2. It is an
empirical question whether the work task had significant fatigue spillovers and whether
the rest break was long enough to eliminate any fatigue stock built up due to extra effort
under performance pay. We investigate this question in various ways in the analysis.14
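A numerical sketch of Propositions 2 and 3 follows. The functional forms ($p(e) = re$, $c(e_1) = e_1^2/2$, $c(e_2, k) = (1 + k)e_2^2/2$, and $k(e_1) = \kappa e_1$ without a rest break, $k = 0$ with a sufficient one) and the parameter values are assumptions for illustration only. The linear $p$ is weakly concave and keeps the period-2 solution in closed form, $e_2 = r/(1 + k)$; period-1 effort solves the first-order condition including the fatigue spillover term, found by bisection.

```python
# Two-period fatigue model (Model 2) under ASSUMED functional forms:
#   reputation benefit p(e) = r * e
#   period-1 cost      c(e1) = e1**2 / 2
#   period-2 cost      c(e2, k) = (1 + k) * e2**2 / 2   (cross-partial e2 > 0)
#   fatigue stock      k(e1) = kappa * e1 without a rest break, k = 0 with one
# Parameter values are hypothetical.


def solve_fatigue_model(z: float, r: float = 1.0, kappa: float = 0.5,
                        rest_break: bool = False) -> tuple[float, float]:
    """Return optimal (e1, e2) for a worker with period-1 piece rate z."""
    slope = 0.0 if rest_break else kappa  # dk/de1

    def period2_effort(e1: float) -> float:
        # Period-2 FOC: r = (1 + k) * e2  =>  e2 = r / (1 + k)
        return r / (1.0 + slope * e1)

    def dV_de1(e1: float) -> float:
        # Marginal benefit minus marginal cost in period 1, minus the fatigue
        # spillover dc/dk * dk/de1 = (e2**2 / 2) * slope (envelope theorem).
        e2 = period2_effort(e1)
        return z + r - e1 - slope * e2 ** 2 / 2.0

    lo, hi = 0.0, z + r  # dV_de1(hi) <= 0; dV_de1(lo) > 0 for the default parameters
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dV_de1(mid) > 0:
            lo = mid
        else:
            hi = mid
    e1 = 0.5 * (lo + hi)
    return e1, period2_effort(e1)


for rest in (False, True):
    e1_T, e2_T = solve_fatigue_model(z=0.5, rest_break=rest)
    e1_C, e2_C = solve_fatigue_model(z=0.0, rest_break=rest)
    print(rest, e1_T > e1_C, e2_T < e2_C)  # → False True True, then True True False
```

Without a rest break, the extra period-1 effort under performance pay pushes treatment workers' period-2 effort below control (Proposition 2); with a sufficient rest break, period-2 effort is equal for the two groups (Proposition 3).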
13 The marginal benefit of higher effort in period 1 needs to offset the marginal cost in period 1 as well as the increased marginal cost in period 2.
14 The model also predicts a downward sloping effort profile for both control and treatment workers. As shown in Goette and Huffman (2006), however, fatigue effects do become more complex in a model with more than two periods. For example, forward looking workers might exhibit a u-shaped effort profile if there are three or more periods: at the beginning of the day there is no fatigue stock, so marginal cost is low, and the worker puts in some extra effort; during the middle of the work day the worker paces him or herself; before a rest break, or before the end of the workday, the worker might increase effort again, knowing that the future rest period wipes out the consequences in terms of accumulated fatigue stock. We abstract away from such effects in the analysis, focusing on more basic predictions of the fatigue model, e.g., that if treatment workers have higher effort in a given period, they should exert less effort in the next period than control workers, all else equal.
Model (3): Model with negative impact of performance pay on non-monetary motivation:
In this model we introduce non-monetary motivations to work hard in a reduced-form way, including an additional term in the utility function, $\theta$, which increases the marginal utility of effort. This could arise for various reasons, for example because the task is enjoyable or because working hard satisfies a social norm. To capture the crowding out hypothesis, we assume that non-monetary motivation is lower if the worker is experiencing, or has recently experienced, performance pay: $\frac{\partial \theta_t}{\partial z} < 0$. Workers maximize the following:
$$V_T = \theta(z>0)\,e_1 + z e_1 + w + p(e_1) - c(e_1) + \theta(z>0)\,e_2 + w + p(e_2) - c(e_2) \tag{5}$$
$$V_C = \theta(0)\,e_1 + w + p(e_1) - c(e_1) + \theta(0)\,e_2 + w + p(e_2) - c(e_2) \tag{6}$$
Proposition 4: Given $-\frac{\partial \theta_t}{\partial z} < 1$, treatment group workers exert more effort than control while performance pay is present. After performance pay is removed, effort of treatment workers is unambiguously lower than for control.
Intuitively, non-monetary motivations are lower in period 1 if there is performance pay. This works against a positive effect of financial incentives, but if incentives are strong enough, output will still increase in period 1. A necessary condition for an increase in period 1 effort is that the reduction in period 1 non-monetary motivation, $-\frac{\partial \theta_t}{\partial z}$, is less than the marginal utility of income, which in this case is set equal to 1. In period 2, the reduction in non-monetary motivation caused by the previous experience of performance pay unambiguously lowers effort of treatment group workers relative to control group workers, because unlike in period 1 there are no offsetting financial incentives. There are alternative ways to model the interaction of performance pay with worker psychology, which would lead to similar predictions to Model (3). Our focus is not on disentangling subtly different psychological mechanisms that generate such predictions, but rather on checking whether the basic prediction of the crowding out hypothesis from psychology, lower output for treatment workers in post-incentive periods, is borne out in the data.15
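The logic of Proposition 4 can be read off the first-order conditions implied by equations (5) and (6). This is a sketch, assuming differentiability, $p$ concave, and $c$ strictly convex; write $\theta(z)$ for motivation under a positive piece rate $z$ (written $\theta(z>0)$ above):

$$
\begin{aligned}
\text{Period 1, treatment:}\quad & \theta(z) + z + p'(e_1) = c'(e_1) \\
\text{Period 1, control:}\quad & \theta(0) + p'(e_1) = c'(e_1) \\
\text{Period 2, treatment:}\quad & \theta(z) + p'(e_2) = c'(e_2) \\
\text{Period 2, control:}\quad & \theta(0) + p'(e_2) = c'(e_2)
\end{aligned}
$$

Since $c'(e) - p'(e)$ is strictly increasing, a larger constant on the left-hand side implies higher optimal effort. In period 1, treatment effort exceeds control if and only if $\theta(z) + z > \theta(0)$; to first order, $\theta(z) \approx \theta(0) + \frac{\partial \theta}{\partial z} z$, so the condition becomes $\left(1 + \frac{\partial \theta}{\partial z}\right) z > 0$, i.e. $-\frac{\partial \theta}{\partial z} < 1$. In period 2 there is no $z$ term, so $\theta(z) < \theta(0)$ lowers treatment effort regardless of the size of the crowding-out effect.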
5 Results
5.1 Treatment comparisons with raw data on total sign-ups
We begin the analysis with some simple calculations. The raw data on sign-ups are shown
in Panel (a) of Table 2, and Panel (b) calculates treatment differences in percentage and
absolute terms. We see in Panel (b) that output was somewhat lower for treatment than control in the baseline period, by 18 percent. In the hour when incentives were introduced, however, output was 94 percent higher for treatment than control. In the hour immediately after the removal of incentives, output in treatment dropped to roughly the same level as control (7 percent higher). For the final two hours, output was substantially lower in treatment than
control, by about 74 percent and 65 percent, respectively. The same qualitative features
emerge looking at absolute differences in hourly output: total output was substantially
higher in treatment than control while incentives were active, but substantially lower in
the post-incentive hours, particularly for the final two hours.
The calculations so far treat the modest baseline difference as just an idiosyncratic shock, but a more conservative approach is to treat the whole difference as a time-invariant difference in productivity, and to calculate the treatment comparison as a difference-in-difference. In Panel (c) of Table 2 we first do the difference-in-difference in percentage terms, subtracting the percent difference in baseline from the percent difference in later hours. The basic results are unchanged with this normalization. The treatment effect while incentives were active grows to 112 percent. The difference for the first post-incentive hour becomes modestly positive, 24 percent. In the final two hours there are still substantial negative differences, of 57 and 48 percent, respectively. On average over the three post-incentive hours, output was lower by 27 percent for the treatment group.
15 To pick one example, one could assume that workers have an “output target,” because they feel that they “owe” the employer a certain amount of output due to social norms to do a decent job. If performance pay causes treatment workers to reach the target earlier in the festival (requiring that output targets are somewhat “sticky” and do not adjust to the circumstance of receiving performance pay), then they might have reduced psychological motivations to keep working hard later in the festival, relative to control. This would lead to similar predictions as Model (3), but through a subtly different mechanism: rather than reducing non-monetary motivations directly, performance pay causes workers to reach psychologically motivating goals more quickly. A related type of model, discussed in some previous research, involves workers in a piece rate setting having a psychological motivation (loss aversion) not to fall short of a daily “income target” (e.g., Camerer et al., 1997; Koszegi and Rabin, 2006). Our setting is different because workers have a fixed wage at all times except during the incentive hour. Thus, in post-incentive hours effort did not translate into progress towards an income target, so income-target motives could not have affected the marginal incentives to exert effort. Income targeting therefore cannot explain a difference in output in the post-incentive hours between treatment and control, just as the canonical model cannot.
An alternative way to do the difference-in-difference is to subtract the absolute
difference in baseline from the absolute differences in later hours. One issue with this
approach is a floor effect: because treatment starts from a lower level in baseline, there is
simply “less room” for an absolute fall in output. Thus, any absolute reduction in output
for treatment versus control after the normalization is a lower bound, compared to what
would be expected had treatment and control had more similar outputs in baseline. The
normalization in percentage terms avoids such floor effects.
As shown in Panel (c) of Table 2, doing the difference-in-difference in absolute terms leads to a difference of 77 sign-ups during the incentive hour. In the first post-incentive hour, treatment has 8 more sign-ups in total than control. Note that for this latter calculation it is important to subtract only half of the baseline difference in sign-ups, to account for the fact that only half as many workers were on the job as in the baseline hour. In the final two hours there are negative differences, of 17 and 3 sign-ups, respectively, between treatment and control. On average over the three post-incentive hours, output was lower by 4 sign-ups per hour for treatment than control. To put this amount in perspective, the drop implies a widening of the baseline gap in output by 25 percent.
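The calculations in Panels (b) and (c) of Table 2 can be reproduced from the Panel (a) totals. The sketch below uses hypothetical variable names; it follows the definitions given in the text and the table notes, including subtracting only half of the baseline gap for the 14:00-15:00 break hour.

```python
# Panel (a) of Table 2: total sign-ups per hour for treatment (T) and control (C).
HOURS = ["Baseline", "Incentive hour", "14:00-15:00", "15:00-16:00", "16:00-17:00"]
T = [56, 134, 32, 10, 8]
C = [68, 69, 30, 39, 23]
BREAK_HOUR = 2    # index of 14:00-15:00, when only half of the workers were on the job
POST = [2, 3, 4]  # indices of the post-incentive hours


def treatment_differences(t_out, c_out):
    """Panels (b) and (c): raw and baseline-normalized T-vs-C differences."""
    pct = [(t - c) / c for t, c in zip(t_out, c_out)]  # (T_t - C_t) / C_t
    absolute = [t - c for t, c in zip(t_out, c_out)]
    did_pct = {i: pct[i] - pct[0] for i in range(1, len(pct))}
    # Subtract only half of the baseline gap for the break hour (Table 2, note †).
    did_abs = {i: absolute[i] - (0.5 if i == BREAK_HOUR else 1.0) * absolute[0]
               for i in range(1, len(absolute))}
    return pct, absolute, did_pct, did_abs


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


pct, absolute, did_pct, did_abs = treatment_differences(T, C)
for i, hour in enumerate(HOURS):
    line = f"{hour:>15}: pct {pct[i]:+.2f}, abs {absolute[i]:+d}"
    if i > 0:
        line += f", DiD pct {did_pct[i]:+.2f}, DiD abs {did_abs[i]:+.0f}"
    print(line)
print("post-incentive averages:",
      round(mean(pct[i] for i in POST), 2), round(mean(absolute[i] for i in POST)),
      round(mean(did_pct[i] for i in POST), 2), round(mean(did_abs[i] for i in POST)))
```

Rounded to two decimals, the percent and difference-in-difference figures match those reported in Table 2.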
The treatment differences in output for post-incentive hours are not consistent with
the canonical model considered in the predictions section, Model (1), since treatment and
control workers faced identical monetary incentives during those times. We turn next to
assessing the statistical significance of the treatment differences.
5.2 Econometric models
In this section we use our data on the hourly output of individual workers to estimate
econometric models, and assess the statistical significance of treatment differences.
Table 2: Raw data on total sign-ups by hour and calculations of treatment differences

                                                 Baseline   Incentive hour   14:00-15:00   15:00-16:00   16:00-17:00
(a) Total output by hour
  T                                              56         134              32            10            8
  C                                              68         69               30            39            23
(b) T vs. C
  Percent difference in outputs*                 -0.18      0.94             0.07          -0.74         -0.65
    Ave. for post-incentive hours                                                          -0.44
  Absolute difference in outputs                 -12        65               2             -29           -15
    Ave. for post-incentive hours                                                          -14
(c) T vs. C normalizing by baseline
  % difference in t - % difference in baseline              1.12             0.24          -0.57         -0.48
    Ave. for post-incentive hours                                                          -0.27
  Abs. difference in t - difference in baseline             77               8†            -17           -3
    Ave. for post-incentive hours                                                          -4

Notes: * Percent difference is (T_t - C_t)/C_t. † Half of the workers were on break at any given time during 14:00-15:00, so we subtract only half of the absolute difference in baseline sign-ups when normalizing the 14:00-15:00 output difference.
5.2.1 Estimating the treatment effect
For the estimations we aggregate worker sign-ups to the hour. Aggregating reduces noise
in the effort measure and an hourly basis is natural given the structure of the experiment.
One minor complication is that the baseline period was slightly longer than one hour; for the estimation, we simply attribute the sign-ups generated in the brief period before the festival started to the baseline hour, applying the same procedure to treatment and control.16
Our preferred estimation method is negative binomial regression, because it deals econometrically with two features of the data: (1) sign-ups are count data, i.e., data that can only take on non-negative integer values; (2) the distribution of sign-ups is skewed, with many observations of zero hourly sign-ups. Negative binomial regression models a Poisson-like process in which a “success” is the occurrence of a sign-up, but relaxes some restrictive assumptions of a standard Poisson regression.17 The negative binomial distribution fits the empirical distribution well, as shown in Figure 3. Coefficients from this estimation approach are interpreted in percentage terms. The estimates thus correspond to comparing treatment and control in percentage terms, similar to Table 2, except that the regression analysis is at the worker level rather than in terms of total output. We also check that the estimation results are robust to using alternative count-data methods: we show results for zero-inflated Poisson, but results are similar with standard Poisson or zero-inflated negative binomial.18
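Figure 3 compares the empirical distribution of hourly sign-ups with a fitted negative binomial distribution. A minimal sketch of such a comparison follows. It fits the distribution by the method of moments, chosen for brevity; the paper's fit may have been obtained differently (for example, by maximum likelihood), and the counts in the usage lines are hypothetical, not the study's data. As footnote 17 notes, the negative binomial accommodates a variance above the mean, which is exactly what the moment fit requires.

```python
from collections import Counter
from math import exp, lgamma, log


def fit_negbin_moments(counts):
    """Method-of-moments fit of NB(r, p), with mean r(1-p)/p and variance r(1-p)/p**2."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / (n - 1)
    if var <= mean:
        raise ValueError("no overdispersion: variance must exceed the mean")
    r = mean ** 2 / (var - mean)
    p = r / (r + mean)
    return r, p


def negbin_pmf(k, r, p):
    """P(K = k) = Gamma(k + r) / (k! Gamma(r)) * p**r * (1 - p)**k, computed in logs."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r) + r * log(p) + k * log(1 - p))


def compare(counts, max_k=10):
    """(k, empirical fraction, fitted NB probability) for k = 0..max_k."""
    r, p = fit_negbin_moments(counts)
    freq = Counter(counts)
    n = len(counts)
    return [(k, freq[k] / n, negbin_pmf(k, r, p)) for k in range(max_k + 1)]


# Hypothetical hourly sign-up counts (NOT the study's data).
sample = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8]
for k, empirical, fitted in compare(sample, max_k=8):
    print(f"{k:2d}  empirical {empirical:.2f}  fitted {fitted:.2f}")
```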
We also estimate the treatment effect in terms of absolute levels rather than percentages, mirroring the approach in Table 2. We run Tobit regressions, where the coefficients
give the absolute change in sign-ups for a change in an independent variable, but account
econometrically for the mass of the distribution at zero.
16 There were only 13 additional sign-ups generated in this earlier time interval. The procedure tends to “inflate” performance in the baseline slightly, but it is applied to treatment and control in the same way and thus does not affect the treatment comparison. We find similar results if we instead include an additional time category for each worker, capturing the sign-ups before the festival start.
17 While standard Poisson regression imposes the assumption that the mean of the dependent variable equals its variance, negative binomial allows for a variance that is greater than the mean, matching our skewed distribution with a mass at zero and a long right tail.
18 Zero-inflated Poisson is a Poisson regression with added flexibility to account for the frequency of zeros in the data. In the absence of variables that plausibly affect whether a worker has positive sign-ups but not the number of sign-ups, the equation for zero sign-ups is based on a constant term. Zero-inflated negative binomial is similar, modeling zeros separately from non-zero observations.
Figure 3: Empirical distribution of total worker sign-ups vs. fitted negative binomial distribution
[Two panels, “Empirical distribution” and “Fitted negative binomial distribution”; x-axis: count of hourly sign-ups (0 to 22); y-axis: fraction (0 to 0.5).]
Our main treatment estimation is based on the following regression model:
$$s_{it} = \gamma_1 h_1 + \cdots + \gamma_5 h_5 + \phi_1 h_1 \cdot T + \cdots + \phi_5 h_5 \cdot T + \epsilon_{it} \tag{7}$$
The dependent variable $s_{it}$ is the number (count) of sign-ups for worker $i$ in hour $t$. There is no constant term. The variables $h_1, \dots, h_5$ are dummy variables for work hours 1 through 5, and the variables $h_1 \cdot T, \dots, h_5 \cdot T$ are interaction terms between the hour dummies and a treatment dummy $T$. The $\phi$ coefficients on the interaction terms show the effect of interest: the percent (absolute, in the case of Tobit) difference in sign-ups per worker for treatment versus control in each hour of the festival. Standard errors in this and all other regressions are robust and clustered by worker, allowing for arbitrary correlation of the error term within worker.
To estimate the difference-in-difference version of the treatment effect, we run the following regression:
$$s_{it} = \beta + \alpha C + \gamma_2 h_2 + \cdots + \gamma_5 h_5 + \phi_2 h_2 \cdot T + \cdots + \phi_5 h_5 \cdot T + \epsilon_{it} \tag{8}$$
With negative binomial and zero-inflated Poisson this corresponds to normalizing by the percent difference in baseline outputs, and with Tobit it involves normalizing by the absolute difference in baseline outputs. The dependent variable is again the count of worker sign-ups in an hour. The regression now includes a constant term, $\beta$, and omits the dummy variable for the baseline period. The coefficient $\alpha$ on the control dummy $C$ captures the difference in baseline sign-ups for control versus treatment. The $\phi$ coefficients on the interaction terms are again the coefficients of interest, but with baseline as the omitted category they now show the difference in sign-ups between treatment and control workers in hours 2 through 5, normalized by the baseline difference. We also check robustness to including worker fixed effects that control for all time-invariant worker traits.
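To see why the $\phi$ coefficients in equation (8) are net of the baseline gap, note that the linear index for a treatment worker at baseline is $\beta$ and for a control worker $\beta + \alpha$, while in hour $t \geq 2$ it is $\beta + \gamma_t + \phi_t$ for treatment and $\beta + \alpha + \gamma_t$ for control. The treatment-control gap in hour $t$ is therefore $\phi_t - \alpha$, and subtracting the baseline gap, $-\alpha$, leaves $\phi_t$. The hypothetical sketch below builds one regressor row per worker-hour in that layout (constant, $C$, $h_2$ to $h_5$, $h_2 \cdot T$ to $h_5 \cdot T$); the function names and the treatment assignment in the usage lines are illustrative only.

```python
# Hypothetical sketch: regressor rows for equation (8). Hours are numbered
# 1 (baseline, 12:00-13:00) to 5 (16:00-17:00); hour 1 is the omitted category.


def eq8_row(treated: bool, hour: int) -> list[int]:
    """Return [constant, C, h2..h5, h2*T..h5*T] for one worker-hour."""
    if not 1 <= hour <= 5:
        raise ValueError("hour must be between 1 and 5")
    t = int(treated)
    control = 1 - t                                        # control dummy C
    hour_dummies = [int(hour == h) for h in range(2, 6)]   # h2..h5
    interactions = [d * t for d in hour_dummies]           # h2*T..h5*T
    return [1, control] + hour_dummies + interactions


def build_panel(treatment_status: list[bool]) -> list[list[int]]:
    """One row per worker and hour."""
    return [eq8_row(t, h) for t in treatment_status for h in range(1, 6)]


# 39 workers x 5 hours gives the 195 observations reported in Table 3.
panel = build_panel([i % 2 == 0 for i in range(39)])  # hypothetical assignment
print(len(panel), panel[0], panel[1])
```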
5.2.2 Treatment estimates
Column (1) of Table 3 shows the results of our negative binomial regression. The coefficients on the interaction terms show that the average worker has lower output in treatment
than control during the baseline period, but the difference is not statistically significant
(p < 0.48). Output is significantly higher for treatment than control, however, in the hour
when incentives were active (p < 0.06). Output is not significantly different in the first
post-incentive hour, but for each of the final two hours output in treatment is significantly
lower than in control (p < 0.01; p < 0.09). In Column (2) we show that results are similar
using an alternative count-data estimation method, the zero-inflated Poisson regression.
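The count-model coefficients are read as percent changes. Because these models use a log link (the notes to Table 3 state that, without a constant, the hour dummies give the log of sign-ups), a coefficient $b$ corresponds exactly to a proportional change of $e^{b} - 1$, which is close to $b$ only when $b$ is small. The sketch below applies this standard conversion to the Column (1) interaction coefficients; it is an illustration, not a re-estimation, and the names are hypothetical.

```python
from math import exp

# Column (1) of Table 3: interaction coefficients (log points).
column1 = {
    "12:00-13:00 (baseline)": -0.25,
    "13:00-14:00 (incentive)": 0.61,
    "14:00-15:00": 0.01,
    "15:00-16:00": -1.41,
    "16:00-17:00": -1.11,
}


def exact_percent_change(coef: float) -> float:
    """Convert a log-link coefficient b into the exact proportional change exp(b) - 1."""
    return exp(coef) - 1.0


for hour, b in column1.items():
    print(f"{hour:>24}: coefficient {b:+.2f} -> exact change {exact_percent_change(b):+.0%}")
```

For the large post-incentive coefficients the difference matters: a coefficient of -1.41 corresponds to roughly 76 percent lower output, not 141 percent.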
Column (3) of Table 3 reports results of a Tobit regression, which compares treatment and control in terms of absolute levels. To reflect the fact that only half the workers were working at any given time during the first post-incentive hour, we double each
worker’s output for the break hour before running the regression. This has no impact
on the qualitative results or statistical significance, but means that the coefficient for the
break hour more accurately reflects the absolute magnitude of output per worker in that
hour.19 The coefficients on the interaction terms show that the positive difference in the
incentive hour is not quite statistically significant (p < 0.11). The drop in output for
the fourth hour is statistically significant (p < 0.04), and the negative coefficients for the
fourth and fifth hour are jointly significant (Chi-square test, p < 0.09).
19
We do not actually observe the counterfactual output with double the number of workers, but doubling
observed output seems like a reasonable assumption.
Table 3: Econometric models for the treatment effect

                            (1)         (2)         (3)         (4)         (5)         (6)         (7)
Baseline period
  12:00-13:00 * Treatment   -0.25       0.20        -2.20
                            (0.35)      (0.30)      (1.48)
Incentive period
  13:00-14:00 * Treatment   0.61*       0.60**      3.25        0.86**      1.24***     0.76**      5.93***
                            (0.33)      (0.29)      (2.03)      (0.36)      (0.39)      (0.39)      (2.01)
Post-incentive hours
  14:00-15:00 * Treatment   0.01        0.50        -1.05       0.26        0.33        0.27        1.04
                            (0.45)      (0.39)      (2.33)      (0.39)      (0.42)      (0.38)      (2.09)
  15:00-16:00 * Treatment   -1.41***    -1.52***    -3.72**     -1.17***    -0.97**     -1.32***    -3.39*
                            (0.52)      (0.50)      (1.81)      (0.43)      (0.48)      (0.41)      (2.04)
  16:00-17:00 * Treatment   -1.11*      -1.40*      -2.94       -0.86       -0.73       -1.30*      -2.62
                            (0.64)      (0.72)      (2.10)      (0.57)      (0.61)      (0.71)      (2.36)
Baseline period
  12:00-13:00               1.28***     1.33***     3.40***
                            (0.15)      (0.14)      (0.64)
Incentive period
  13:00-14:00               1.29***     1.52***     2.89**      0.01        -0.13       0.04        -0.52
                            (0.24)      (0.22)      (1.15)      (0.25)      (0.29)      (0.27)      (1.11)
Post-incentive hours
  14:00-15:00               0.46*       0.83***     1.75        -0.82***    -0.91***    -0.66**     -1.48
                            (0.25)      (0.22)      (1.31)      (0.27)      (0.30)      (0.27)      (1.27)
  15:00-16:00               0.72**      1.33***     -0.04       -0.56       -0.73**     -0.40       -3.22**
                            (0.33)      (0.29)      (1.34)      (0.34)      (0.36)      (0.34)      (1.29)
  16:00-17:00               0.19        1.04**      -2.12       -1.08**     -1.19***    -0.50       -4.86***
                            (0.42)      (0.47)      (1.58)      (0.44)      (0.45)      (0.65)      (1.45)
Control                                                         0.25        -1.93***    -1.56***    -9.27***
                                                                (0.35)      (0.20)      (0.24)      (1.12)
Constant                                                        1.03***     2.42***     2.11***     10.42***
                                                                (0.14)      (0.16)      (0.92)
Observations                195         195         195         195         195         195         195
Worker fixed effects        No          No          No          No          Yes         Yes         Yes
Estimation method           N. Bin.     Z. In. Poi. Tobit       N. Bin.     N. Bin.     Z. In. Poi. Tobit

Notes: Coefficients for Negative Binomial and Zero-Inflated Poisson give the percent change in the frequency of sign-ups for a one-unit change in the independent variable. With the constant omitted from the negative binomial and zero-inflated Poisson models (Columns 1 and 2), the coefficients for hour dummies give the log of sign-ups. Coefficients for Tobit are in terms of the absolute level of sign-ups. Robust standard errors in parentheses, adjusted for clustering on worker. ***, **, * indicate significance at the 1-, 5-, and 10-percent level, respectively.
In Columns (4) to (7) we investigate the difference-in-difference estimates. Column
(4) shows the normalized treatment effect without worker fixed effects. The treatment
difference is statistically significant for the incentive hour (p < 0.02). The positive output
difference in the first post-incentive hour is not statistically significant (p < 0.51). The
drop for treatment in the fourth hour is significant (p < 0.01), and the coefficients for the
fourth and fifth hour are jointly significant (Chi-square test, p < 0.02). In Columns (5) and (6) we see that results are similar when adding worker fixed effects, with negative binomial (Column 5) or zero-inflated Poisson (Column 6).
In Column (7) we use Tobit to estimate the treatment difference subtracting the
absolute difference in baseline. By adjusting for censoring at zero, we are addressing the
floor effect issue econometrically. The regression includes worker fixed effects. We see
from the regression estimates that the positive normalized difference in the incentive hour
is statistically significant (p < 0.01), whereas the modest positive difference in the first
post-incentive hour is not significant. There is again a large negative difference for the
fourth hour (p < 0.09).
In summary, we find a statistically significant positive effect of incentives while they
are active, and significant negative effects for the last two post-incentive hours. We turn
below to explanations based on fatigue effects or changes in non-monetary motivations.
5.3 Investigating the role of fatigue
As discussed in the behavioral predictions, fatigue spillovers might be able to explain a reduction in post-incentive output for treatment workers relative to control, assuming the rest break was not sufficient to eliminate any differences in fatigue.
The finding that treatment had the same or greater output than control in the first
post-incentive hour is seemingly inconsistent with the fatigue model. Treatment workers
had higher output and effort during the incentive hour than control workers. In the
context of a model with fatigue effects, the treatment workers who kept working in the
first half-hour after the incentive period should have had a higher marginal cost of effort
than their counterparts in the control group, and thus lower effort. The observed behavior
is contrary to this prediction.
We investigated two possible ways that the fatigue model in the predictions section
might be augmented to explain this anomaly. One is that workers might have made a
mistake in keeping track of time, and thought that they still had the bonus incentive for
the first part of the half hour. Another is that workers could have been in conversations
with customers in the last few minutes of the incentive hour, and the resulting output was
not realized until the early part of the next half hour. Both of these mechanisms would
imply a particularly large “spike” in output right after the 14:00 cutoff. We find little
support for these explanations, based on looking at worker output in five minute intervals
during the break hour. There is a spike in output in the first five minutes for treatment
workers, but spikes of exactly the same size for control workers after 15 minutes and 30
minutes. Also, there are spikes for treatment workers of similar size at 35 and 40 minutes.
Thus, the disaggregated output data do not support an explanation in which there is
something special about the first five to ten minutes of the hour for treatment workers. In
the follow-up questionnaire we also find that 95 percent of treatment workers knew exactly
when it was 14:00 and the incentive hour ended. Thus, it seems that treatment workers
exerted similar or perhaps even more effort than control workers in the hour following
incentives, which seems inconsistent with treatment workers being more fatigued.
If we observed that workers who achieved an especially large number of sign-ups during
the incentive hour had an especially large drop in output later in the festival, this would
be consistent with the fatigue explanation.20 We estimated a regression explaining hourly
sign-ups in the last three hours of the festival relative to baseline, for treatment group
workers. We included a dummy variable for greater than median sign-ups during the
incentive hour, dummy variables for hour 3, 4, and 5, and interactions between the hour
dummies and the indicator for greater than median sign-ups. The interaction terms are all
far from significant, indicating that workers with a particularly large number of sign-ups
during the incentive hour did not have an especially large drop later. Thus, we do not
find evidence that a higher number of sign-ups generated strong fatigue spillovers.
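The variables for the regression just described can be constructed as follows. This is a sketch under stated assumptions. The worker identifiers and counts are hypothetical. "Relative to baseline" is taken to mean a simple difference from the worker's baseline sign-ups. The helper names are illustrative only.

Including a constant, all three hour dummies, the above-median indicator, and all three interactions would make the regressors collinear. The hour dummies sum to one, and the interactions sum to the indicator. Some terms, for example one hour dummy and its interaction, must therefore be dropped before estimation. The text does not say which, so the sketch leaves that choice open.

```python
from statistics import median

def above_median_flags(incentive_signups):
    """Map each worker to 1 if incentive-hour sign-ups exceed the group median, else 0."""
    cutoff = median(incentive_signups.values())
    return {worker: int(n > cutoff) for worker, n in incentive_signups.items()}

def worker_hour_row(signups, baseline, hour, high):
    """Outcome and regressors for one treatment worker in festival hour 3, 4, or 5.

    The outcome is taken to be sign-ups minus the worker's baseline sign-ups
    (an assumption; the text says only "relative to baseline").
    """
    if hour not in (3, 4, 5):
        raise ValueError("only the last three hours of the festival are used")
    row = {"y": signups - baseline, "high": high}
    for h in (3, 4, 5):
        dummy = int(hour == h)
        row[f"hour{h}"] = dummy
        row[f"hour{h}_x_high"] = dummy * high
    return row

flags = above_median_flags({"w1": 4, "w2": 9, "w3": 6})  # hypothetical workers
print(flags)  # {'w1': 0, 'w2': 1, 'w3': 0}
print(worker_hour_row(signups=5, baseline=3, hour=4, high=flags["w2"]))
```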
Typically effort costs are unobserved and thus are a degree of freedom in economic
models, but our study is unusual in that we have some direct measures that speak to
the question of whether fatigue spillovers were an important factor. Specifically, in the
questionnaire we asked workers to rate their level of fatigue on a five-point scale, for 1:00,
3:00, and 5:00 during the festival. There was no significant difference in fatigue between
the two groups at any of the three times (Mann-Whitney; p < 0.19, p < 0.71, p < 0.36). If
anything, treatment group workers start out more tired than control, before the incentive
hour begins, and then have slightly lower levels of fatigue than the control group at later
points. Notably, both treatment and control do report increasing fatigue levels over the
course of the festival. These findings are consistent with workers becoming tired mainly
due to passing hours of the day rather than the number of sign-ups achieved.
20
A particularly high number of sign-ups could also partially reflect luck rather than effort.
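The group comparison of fatigue ratings above relies on the Mann-Whitney (Wilcoxon rank-sum) test. The sketch below implements the common large-sample version of the test. It uses mid-ranks for tied ratings, a tie-corrected variance, and a two-sided normal approximation without a continuity correction. The paper does not state which variant or software was used, so p-values from this sketch need not match those reported. The ratings in the example are hypothetical.

```python
import math

def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected variance).

    Returns (U statistic for sample x, two-sided p-value). No continuity correction.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(list(x) + list(y))

    # Assign each distinct value its mid-rank and accumulate the tie term sum(t^3 - t).
    mid_rank, tie_term = {}, 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        mid_rank[values[i]] = (i + 1 + j) / 2  # average of ranks i+1, ..., j
        t = j - i
        tie_term += t**3 - t
        i = j

    u1 = sum(mid_rank[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every observation identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Hypothetical five-point fatigue ratings at one point in time.
treatment = [3, 4, 2, 5, 3]
control = [2, 2, 3, 1, 4]
u, p = mann_whitney(treatment, control)
print(u)  # 18.5
```

On a five-point scale, ties are pervasive, so the tie correction in the variance is not a detail that can safely be dropped.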
The questionnaire also asked treatment group workers a question about fatigue and
performance pay: “After having rested during the lunch break, did you still feel tired
(mentally or physically) from your work during the time of the $5 bonus?” Almost all
workers, 85 percent, answered either: “N.A. because I never got tired from the work I did
between 1 pm and 2 pm.” or else “The lunch break was sufficient for me to feel refreshed.”
Only 15 percent chose “I was still tired from the work I did during the time of the $5 bonus
(between 1 pm and 2 pm), even after the lunch break.” These survey responses provide
evidence that almost all workers were either not fatigued at all from the incentive hour,
or else found the rest break sufficient to feel refreshed, and no longer tired, from effort
during the incentive hour.
As another check, we simply exclude all workers who mentioned still being tired
after the rest break, and re-estimate our difference-in-difference econometric model of
the treatment effect using the remaining non-fatigued workers. The results are essentially
unchanged relative to using the whole sample. The increase in output during the incentive
hour, and the subsequent large drop in the fifth hour, are both still large and statistically
significant (negative binomial regression; p < 0.06; p < 0.02). The drop for the final two
hours is also still jointly significant (Chi-square test; p < 0.06).21
5.4 Investigating changes in non-monetary motivations
The treatment differences for post-incentive hours are not consistent with the canonical
model, and fatigue effects do not appear to be a leading explanation, so we turn to the
hypothesis that performance pay affects non-monetary motivations of workers.
We used the questionnaire to elicit direct (albeit self-reported) evidence on whether
performance pay affected non-monetary motivations. Specifically, the questionnaire asked
treatment workers about a possible link between performance pay and task enjoyment
in the post-incentive period: “How did the experience of getting a $5 bonus per text
(between 1 and 2 pm) affect your enjoyment of the work later in the day?” There were
four possible response categories, relating to fatigue, reduced non-monetary motivations,
no impact, and increased non-monetary motivations.22 We included fatigue as one of the
response categories to distinguish this form of discomfort from other sources of reduced
task enjoyment.23
21
The coefficients and standard errors for the interaction terms 13:00-14:00*Treatment . . . 16:00-17:00*Treatment are as follows: 0.79* (0.42); 0.10 (0.38); -1.16** (0.49); -0.79 (0.63).
We find that 53 percent of the workers found working less fun in the post-incentive
hours due to the experience of the bonus. Interestingly, however, about 21 percent said
that the experience of performance pay actually increased enjoyment of work in the post-incentive period. Only 16 percent of workers said that they experienced fatigue, and
10 percent said the bonus had no impact on subsequent task enjoyment. These survey
responses are consistent with the view that performance pay affects the non-monetary
motivations of workers. The response of the majority is in line with the hypothesis that
performance pay can have negative psychological effects, and is consistent with the overall
lower output for treatment than control during the post-incentive hours. On the other hand,
the survey responses also indicate that heterogeneity may be important: for
a substantial minority, a more appropriate model might be one in which non-monetary
motivations are increased by performance pay, i.e., $\partial \theta / \partial z > 0$.
The time profile for aggregate output is also not fully consistent with the crowding out hypothesis. Specifically, if treatment workers have a reduction in non-monetary
motivations as soon as performance pay is introduced, they should exhibit lower output
than control group workers already in the first hour after performance pay is removed.
Instead, we observe similar or even greater output for treatment relative to control in the
first post-incentive hour, with the large drop in output for treatment workers occurring in
the final two hours. Thus, while performance pay may affect non-monetary motivations,
the effect is more complex than the standard crowding out hypothesis would suggest. One
explanation could be that the psychological effects of experiencing performance pay take
some time to emerge. The potential presence of individual heterogeneity, however, makes
it more complex to interpret the shape of the aggregate output profile. We discuss possible
composition effects below.
22
The exact wording for the response categories is as follows:
1. “Because I worked a lot harder during the hour with the bonus, I was very tired and found working later on painful due to fatigue.”
2. “Getting the bonus made the work seem less fun later on, when there was no bonus.”
3. “Getting the bonus had no impact on my enjoyment of the work later on.”
4. “Getting the bonus made the work seem more fun later on even though there was no bonus.”
23
The question focused on motivations in the post-incentive period because it is difficult to ask workers
to disentangle monetary and non-monetary motivations while incentives are active. We did not ask how
motivation in post-incentive hours changed relative to motivation in baseline, in light of the intervening
performance pay, to avoid the question becoming too complex.
To shed further light on the role of non-monetary motivations, and potential heterogeneity, we took three traits selected ex ante as possible carriers of non-monetary
motivations and interacted these with the treatment effect. We selected the personality
trait conscientiousness due to evidence that it explains working hard on laboratory tasks
even in the absence of incentives, and because it tends to be correlated with positive
labor market outcomes (Judge et al., 1999; Segal, 2006). We took the personality trait
of extraversion as a potential measure of intrinsic task enjoyment, as extraverts report
liking to do things like “talk to strangers at parties,” which is analogous to approaching
strangers at the festival.24 We selected positive reciprocity due to laboratory evidence that
reciprocal subjects tend to reward generous payments even if rewarding is costly
for them (see Fehr and Falk, 2002). In the incentivized trust game conducted with our
questionnaire, respondents had a binary choice to return a favor or not; we use this as a
binary indicator of positively reciprocal tendencies.
We estimated an econometric model where we interacted the difference-in-difference
with worker traits.25 Figure 4 provides an easy way to see the key results. Each panel plots
coefficients showing how moving along the dimension of a given trait changes the treatment
difference (we report the regression results underlying the figure in the Appendix). To
facilitate comparison of coefficients across traits we use standardized measures of the
traits; coefficients give the impact of a one standard deviation change in the trait on the
normalized treatment effect, except reciprocity which is the dichotomous change.
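The following sketch shows how a trait could be standardized and how the regressors whose coefficients are plotted in Figure 4 could be formed. It rests on several assumptions:

- Trait scores are standardized with the sample standard deviation.
- The hour index runs from 1 (baseline hour) to 5, with h_2 to h_5 corresponding to 13:00-14:00 through 16:00-17:00, following the notation in footnote 25.
- Reciprocity, being binary, is passed in unstandardized.

The function names and example scores are hypothetical.

```python
from statistics import mean, stdev

def standardize(scores):
    """Z-scores: a one-unit change in the result is a one standard deviation change."""
    m, s = mean(scores), stdev(scores)  # sample standard deviation (an assumption)
    return [(v - m) / s for v in scores]

def triple_interactions(hour_index, treated, trait_value, hours=(2, 3, 4, 5)):
    """Regressors h_t * T * Trait_k for one worker-hour observation.

    hour_index: assumed 1 = baseline hour, 2 = 13:00-14:00, ..., 5 = 16:00-17:00.
    treated: True for the treatment group (T = 1).
    trait_value: standardized score, or 0/1 for reciprocity.
    """
    return {f"h{t}_T_trait": int(hour_index == t) * int(treated) * trait_value
            for t in hours}

z = standardize([3.0, 3.5, 4.0, 4.5])  # hypothetical conscientiousness scores
print([round(v, 3) for v in z])  # [-1.162, -0.387, 0.387, 1.162]
print(triple_interactions(4, True, z[3]))
```

Each coefficient on one of these regressors then reads as the change in the hour-t treatment difference per standard deviation of the trait, or per unit change for reciprocity.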
The top panel of Figure 4 shows no change in the treatment effect for the incentive
hour for conscientious workers, but a statistically significant change in the treatment difference for post-incentive hours in the negative direction, i.e., an especially strong drop
in output (p < 0.001; p < 0.001; p < 0.002). For extraverted and reciprocal types the
treatment effect also changes significantly, but in the positive direction. Among reciprocal workers there was a significantly stronger positive response during the incentive period (p < 0.001). There was also
a significant change in the treatment difference in the positive direction for the fourth hour
(where the aggregate profile shows the strongest output drop; p < 0.001). Among more
extraverted workers there was a consistent, statistically significant change of the treatment effect in the positive direction for all post-incentive hours (p < 0.001; p < 0.002;
p < 0.001). Thus, we see that the treatment difference changes in different directions for
extraverted and reciprocal workers compared to conscientious workers.
24
The standard personality inventory we used involved ten items for each of the five personality traits.
Respondents indicated how well the item described them as a person using a five-point scale. The exact
wording of all items is available at http://ipip.ori.org/New_IPIP-50-item-scale.htm.
25
Specifically, we estimate the following regression equation: $s_{it} = \beta + \alpha C + \gamma_2 h_2 + \dots + \gamma_5 h_5 + \phi_2 h_2 \cdot T + \dots + \phi_5 h_5 \cdot T + \theta_{2k} h_2 \cdot \mathit{Trait}_k + \dots + \theta_{5k} h_5 \cdot \mathit{Trait}_k + \alpha_{2k} h_2 \cdot T \cdot \mathit{Trait}_k + \dots + \alpha_{5k} h_5 \cdot T \cdot \mathit{Trait}_k + \mathit{Trait}_k + \varepsilon_{it}$.
The $\alpha_{tk}$'s are the coefficients of interest, showing whether there is a statistically significant change in
the difference-in-difference at a given time t as trait k changes. Each panel in the figure shows results
from a regression including the three traits simultaneously, and plots the α coefficients.
The other panels of Figure 4 show that various other traits had little bearing on the
response to the treatment. We find little relationship of the treatment effect to other personality traits in the Big Five. The treatment also did not interact with non-psychological
factors corresponding roughly to motives in the canonical or fatigue model, namely self-reported reputation concerns or self-reported fatigue.26 We also asked workers if they
experienced a process of learning by doing, to check whether positive effects of the treatment in post-incentive hours might reflect a greater accumulation of skills for treatment
workers, but we see little evidence of this in Figure 4.27 Although the prediction that conscientiousness, extraversion, and reciprocity would be particularly important was formed
ex ante, one might still be concerned that with nine traits and four coefficients for each
trait some coefficients could be statistically significant just due to chance. We performed
a conservative Bonferroni correction for multiple hypothesis testing and find that almost
all coefficients for conscientiousness, reciprocity, and extraversion remain statistically significant (none of the coefficients for the other traits are statistically significant).28
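The Bonferroni correction divides the significance level by the number of tests. The sketch below uses 36 comparisons, which matches the count given in footnote 28. The p-values in the example are hypothetical. The paper reports p-values as bounds (for example, p < 0.002), and a bound alone cannot always be compared with the adjusted threshold.

```python
def bonferroni_threshold(alpha, m):
    """Per-comparison significance level that bounds the family-wise error rate by alpha."""
    if m < 1:
        raise ValueError("need at least one comparison")
    return alpha / m

def bonferroni_flags(p_values, alpha=0.05):
    """True for each p-value that stays significant after the correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

print(round(bonferroni_threshold(0.05, 36), 5))  # 0.00139

# Hypothetical: 36 p-values, two of them small.
p_values = [0.0005, 0.0019] + [0.5] * 34
print(bonferroni_flags(p_values)[:2])  # [True, False]
```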
26
We measured reputation concerns by asking: “Did you try to get a lot of people to send texts to your
number, because you thought this would give you a good reputation with [the agency name here], and
make it likely that you would be hired again in the future?” Answers were on a six-point scale. We
elicited self-reported fatigue at 15:00 as an indicator for the fatigue level of the worker (this question is
described above).
27
To capture learning by doing we asked how well the following statement captured the worker’s experience
at the festival: “There was a learning process for me: Over time, I learned how to be more successful
in convincing people to send texts.” Answers were on a six-point scale.
28
Using the 36 coefficients in Figure 4 as the number of comparisons, the adjusted threshold for statistical
significance is 0.05/36 ≈ 0.0014. Two coefficients, one for conscientiousness and one for extraversion, are
no longer significant but are “close” to the threshold (p < 0.002; p < 0.002).
Figure 4: Change in the normalized treatment effect by worker traits
[Nine panels: Conscientious, Extraverted, Reciprocal, Intellect, Agreeableness, Emotional stability, Reputation concerns, Learning by doing, Fatigue. Horizontal axis: time of day (13:00-14:00, 14:00-15:00, 15:00-16:00, 16:00-17:00); vertical axis: percent change.]
Negative binomial estimates. Coefficients show the impact on the normalized
treatment effect of a one s.d. increase in the trait, or dichotomous change in
the case of reciprocity. The normalized treatment effect is the effect of the
treatment after differencing out the baseline difference in output. The questionnaire
measured the Big Five personality traits, as well as self-reported reputation
concerns, learning by doing, and fatigue as of 15:00 (footnotes in text give
exact wordings). Error bars show +/- one standard error. Standard errors
adjusted for clustering on worker.
Figure 5: Normalized treatment effect by source of non-monetary motivation
[Three panels: Extraverted, Conscientious, Reciprocal. Horizontal axis: time of day (13:00-14:00, 14:00-15:00, 15:00-16:00, 16:00-17:00); vertical axis: percent change, from -2 to 2.]
Negative binomial estimates. Coefficients show the normalized treatment effect associated with a one standard deviation increase in the trait, or the binary change in
the case of reciprocity. Error bars show +/- one standard error. S.e. adjusted for
clustering on worker.
Having seen that the treatment effect changes for conscientious, extraverted, and reciprocal workers, we investigate whether these changes imply qualitatively different treatment effects. Figure 5 shows the difference-in-difference profile taking into account the
effect of a high score on a given trait. In this case we focus on one trait at a time and plot the
sums of the relevant coefficients (the basic difference-in-difference profile plus the change
caused by having a high score on the trait) and appropriately calculated standard errors
(we provide the regression results underlying the figure in the Appendix).29
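Plotting φ_t + α_t requires the standard error of a sum of two estimated coefficients, which depends on their covariance:

Var(φ + α) = Var(φ) + Var(α) + 2 Cov(φ, α)

The sketch below assumes that the variances and the covariance are read from the worker-clustered variance-covariance matrix of the regression. All numbers in the example are hypothetical.

```python
import math

def combined_effect(phi, alpha, var_phi, var_alpha, cov_phi_alpha):
    """Point estimate and standard error of phi + alpha for one hour.

    Var(phi + alpha) = Var(phi) + Var(alpha) + 2 Cov(phi, alpha).
    """
    variance = var_phi + var_alpha + 2 * cov_phi_alpha
    if variance < 0:
        raise ValueError("inconsistent (co)variance inputs")
    return phi + alpha, math.sqrt(variance)

# Hypothetical coefficients and (co)variances for a single hour.
est, se = combined_effect(phi=-0.8, alpha=1.5, var_phi=0.16, var_alpha=0.25, cov_phi_alpha=-0.05)
print(f"{est:.2f} (s.e. {se:.2f})")  # 0.70 (s.e. 0.56)
```

Dropping the covariance term, that is, treating the two estimates as independent, would misstate the error bars whenever the coefficients are correlated. They generally are correlated when estimated in the same regression.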
29
We estimate the following regression for each trait: $s_{it} = \beta + \alpha C + \gamma_2 h_2 + \dots + \gamma_5 h_5 + \phi_2 h_2 \cdot T + \dots + \phi_5 h_5 \cdot T + \theta_2 h_2 \cdot \mathit{Trait}_k + \dots + \theta_5 h_5 \cdot \mathit{Trait}_k + \alpha_2 h_2 \cdot T \cdot \mathit{Trait}_k + \dots + \alpha_5 h_5 \cdot T \cdot \mathit{Trait}_k + \mathit{Trait}_k + \varepsilon_{it}$.
The figure plots $\phi_t + \alpha_t$ for each hour, and the standard error for the sum of the random variables. These show the difference-in-difference profile for treatment vs. control for workers who score high on the trait.
We see in Figure 5 that treatment workers had a statistically significant positive
response to the treatment when incentives were active, regardless of whether they were
conscientious, extraverted, or reciprocal, although the magnitude was especially large for
reciprocal types. For subsequent hours, the stronger negative treatment effect for conscientious types translated into an immediate drop in output, albeit a smaller drop than in
later hours. Thus, the model of crowding out considered in the predictions section, with
an immediate and negative psychological impact of performance pay, provides a relatively
better description of behavior for conscientious workers.
By contrast, among reciprocal and extraverted workers we see that the difference-in-difference profile is qualitatively different from the aggregate profile. The different
treatment effects for these types translated into positive output differences for the hour or
two after incentives, and no relative drop during later hours. For these workers, behavior
is more consistent with performance pay crowding in non-monetary motivations, although
the effect seems to dissipate over time once incentives are removed. If the positive treatment differences in post-incentive hours reflect increased non-monetary motivations, we
might expect to find that such workers reported stronger motivations. Consistent with this
hypothesis, we do find that extraverted workers in the treatment group were significantly (at the 10 percent level)
more likely to report that the experience of the performance pay made work “more fun”
(correlation of 0.40; p < 0.09). There is not a significant relationship between reciprocity
and reporting that work became more fun, suggesting that the crowding in associated with
reciprocity reflected a channel other than enhanced task enjoyment. One implication
of individual heterogeneity is that the shape of the aggregate output profile might reflect
a composition effect. For example, suppose that some workers have a positive psychological response that is relatively short-lived, while others have a negative response that
persists. This could lead to an aggregate profile with a delayed drop in output.
In summary, the findings are broadly consistent with the view that performance pay
may have an impact on workers’ non-monetary motivations: we observe post-incentive
treatment differences that are hard to explain with a canonical model or fatigue effects;
workers report changes in non-monetary motivations as a result of performance pay; and traits
selected as carriers of non-monetary motivation are important for explaining responses
to the treatment. The findings are clearly not consistent, however, with a simple model
based on the crowding out hypothesis in which the psychological impact of performance
pay is uniformly negative.
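The composition effect discussed above can be illustrated with a toy calculation. The sketch uses made-up numbers rather than estimates from the study. Half the treatment workers are assumed to have a short-lived positive response, and half an immediate, persistent negative response. The aggregate treatment difference is then the share-weighted average of the two group profiles.

```python
def aggregate_profile(groups):
    """Share-weighted average of group-level treatment differences, hour by hour.

    groups: list of (share, [difference in post-incentive hour 1, hour 2, ...]);
    the shares are assumed to sum to one.
    """
    n_hours = len(groups[0][1])
    return [sum(share * diffs[h] for share, diffs in groups) for h in range(n_hours)]

# Made-up profiles, not estimates from the study.
short_lived_positive = (0.5, [1.0, 0.0, 0.0])
persistent_negative = (0.5, [-0.5, -0.5, -0.5])
print(aggregate_profile([short_lived_positive, persistent_negative]))  # [0.25, -0.25, -0.25]
```

Neither group has a delayed drop on its own. Even so, the aggregate profile is positive in the first post-incentive hour and negative afterwards.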
6 Conclusion
The psychology literature and some management textbooks caution that performance-contingent pay may reduce workers’ other, non-monetary motivations to do a good job,
but there is little evidence from a real work setting (Kreps, 1997). In a real work setting
we implemented the classic design from psychology, which involves the treatment group
temporarily receiving performance-based payments. Treatment group workers produced
more during incentives, but less after incentives were removed, relative to control. Investigating explanations for the post-incentive treatment differences, we find little support for
explanations based on fatigue spillovers. On the other hand, the majority of treatment
workers report reduced non-monetary motivation.
While our findings are broadly supportive of the hypothesis that performance pay
can affect non-monetary motivations in the workplace, our results are not consistent with a
simple model where performance pay has uniformly negative effects. Indeed, a substantial
minority of treatment workers report enhanced non-monetary motivations as the result
of performance pay. Also, different sources of non-monetary motivations, captured by
personality traits and social preferences, are associated with qualitatively different treatment effects. These findings add to our understanding of the psychological impact of
performance pay, but they also raise a series of questions for future research.
The results of this study highlight the importance of future research on the precise
mechanisms that underlie the psychological effects of performance pay in the workplace.
Various theories differ in their assumptions about what performance pay signals to workers
– low task enjoyability (Benabou and Tirole, 2003), market rather than social transaction
(Heyman and Ariely, 2004), appreciation of hard work (Fehr and Gaechter, 2002) – but little is known empirically about which of these mechanisms is most important. Our findings
on heterogeneity suggest that multiple mechanisms may be at work simultaneously, with
different types of workers viewing the same incentives in different ways. This complicates
the process of designing optimal incentive systems.
The interaction of performance pay with worker personality and preferences highlights the value of further research on the role of non-cognitive skills in the workplace.
Studying larger workforces would be useful, as would considering a range of different
types of work tasks, since the ways that particular personality traits interact with performance pay may depend on the task. For example, while extraversion could lead to positive
psychological effects for jobs involving personal interactions, the effect might be different
for jobs involving data entry.
Our findings also point to the importance of more research on the timing and duration of psychological effects of performance pay. Varying the duration of performance
pay, and the length of the post-incentive period, could shed more light on the time
profile of psychological effects. For example, it might turn out that positive psychological
effects are more or less long lasting than negative effects, with implications for the long-run
impacts of performance pay. More work is also needed on understanding the nature of
psychological effects at the time when performance pay is active. The approach in psychology has been to have temporary performance pay, and use behavior after the removal
of incentives to shed light on the state of non-monetary motivations. A potential concern
with this approach is that the observed behavior could partly reflect a psychological response to removal of incentives (Fehr and Falk, 2002). Our version of the experimental
design minimizes such effects by warning workers ahead of time about the removal of performance pay; this should minimize mechanisms such as disappointment. Nevertheless,
it is important to think about new types of designs and measures that might shed more
light on this question.
Evidence of heterogeneity also calls for future work on how self-selection into jobs
affects the psychological impact of performance pay. This is important for better understanding the practical implications of the psychological effects of incentives for workplace
behavior. For example, if the types of workers who respond positively
to performance pay also self-select into jobs that advertise this type of compensation, this
would have important implications for the desirability of performance pay from both an employer
and an employee perspective. Our study randomly assigned workers to performance pay, and
recruiting did not emphasize this aspect of compensation. This was necessary to establish
the causal impact of performance pay on workplace behavior and to test different models
of labor supply. It would be possible, however, to conduct an experiment where workers
can self-select into jobs that are identical except that one is known to offer fixed wages
and another to offer temporary performance pay. Comparing across jobs would show how
behavior in the classic experimental design changes when we add the realism of workers
being able to self-select based on the nature of the incentive scheme. Self-selection is likely
to be multi-dimensional (Dohmen and Falk, 2011), involving bundles of traits ranging from
ability to sources of non-monetary motivations, with potentially fundamental implications
for the psychological impact and overall effectiveness of performance pay.
References
Al-Ubaydli, O., S. Andersen, U. Gneezy, and J. A. List (2014): “Carrots that
look like sticks: Toward an understanding of multitasking incentive schemes,” Southern
Economic Journal.
Babcock, P., K. Bedard, G. Charness, J. Hartman, and H. Royer (2011): “Letting down the team? Evidence of social effects of team incentives,” Discussion paper,
National Bureau of Economic Research.
Baron, J. N., and D. M. Kreps (1999): Strategic human resources: Frameworks for
general managers. Wiley New York.
Becker, A., T. Deckers, T. Dohmen, A. Falk, and F. Kosse (2012): “The Relationship Between Economic Preferences and Psychological Personality Measures,” Annual
Review of Economics, 4, 453–78.
Benabou, R., and J. Tirole (2003): “Intrinsic and extrinsic motivation,” The Review
of Economic Studies, 70(3), 489–520.
(2006): “Incentives and Prosocial Behavior,” The American Economic Review,
pp. 1652–1678.
Berg, J., J. Dickhaut, and K. McCabe (1995): “Trust, reciprocity, and social history,” Games and economic behavior, 10(1), 122–142.
Borghans, L., B. H. Golsteyn, J. Heckman, and J. E. Humphries (2011): “Identification problems in personality psychology,” Personality and Individual Differences,
51(3), 315–320.
Bowles, S., H. Gintis, and M. Osborne (2001): “Incentive-enhancing preferences:
Personality, behavior, and earnings,” The American Economic Review, 91(2), 155–158.
Browning, M., A. Deaton, and M. Irish (1985): “A profitable approach to labor
supply and commodity demands over the life-cycle,” Econometrica: Journal of the
Econometric Society, pp. 503–543.
Camerer, C., L. Babcock, G. Loewenstein, and R. Thaler (1997): “Labor supply
of New York City cabdrivers: One day at a time,” The Quarterly Journal of Economics,
112(2), 407–441.
Carneiro, P., and J. Heckman (2003): “Human Capital Policy,” in Inequality in
America: What Role for Human Capital Policy?, ed. by A. Krueger, and J. Heckman,
pp. 77–240. Massachusetts: MIT Press.
Carpenter, J. P., and D. Dolifka (2013): “Exploitation aversion: When financial
incentives fail to motivate agents,” Discussion paper, IZA Discussion Paper.
Charness, G., and U. Gneezy (2009): “Incentives to exercise,” Econometrica, 77(3),
909–931.
Cohn, A., E. Fehr, and L. Goette (2014): “Fair wages and effort: Evidence from a
field experiment,” forthcoming in Management Science.
Cohn, A., E. Fehr, B. Herrmann, and F. Schneider (2014): “Social comparison
and effort provision: evidence from a field experiment,” forthcoming in Journal of the
European Economic Association.
Deci, E. (1971): “Effects of externally mediated rewards on intrinsic motivation,” Journal
of personality and Social Psychology, 18(1), 105–115.
Deci, E., R. Koestner, and R. Ryan (1999): “A meta-analytic review of experiments
examining the effects of extrinsic rewards on intrinsic motivation,” Psychological Bulletin, 125(6), 627.
Dohmen, T., and A. Falk (2011): “Performance pay and multidimensional sorting:
Productivity, preferences, and gender,” The American Economic Review, pp. 556–590.
Fehr, E., and A. Falk (2002): “Psychological foundations of incentives,” European
Economic Review, 46(4), 687–724.
Fehr, E., and L. Goette (2007): “Do Workers Work More if Wages Are High? Evidence
from a Randomized Field Experiment,” The American Economic Review, 97(1), 298–
317.
Gneezy, U., and J. List (2006): “Putting behavioral economics to work: Testing for gift
exchange in labor markets using field experiments,” Econometrica, 74(5), 1365–1384.
Gneezy, U., S. Meier, and P. Rey-Biel (2011): “When and why incentives (don’t)
work to modify behavior,” The Journal of Economic Perspectives, pp. 191–209.
Gneezy, U., and P. Rey-Biel (2014): “On the Relative Efficiency of Performance Pay
and Noncontingent Incentives,” Journal of the European Economic Association, 12(1),
62–72.
Gneezy, U., and A. Rustichini (2000a): “A Fine Is a Price,” J. Legal Stud., 29, 1.
(2000b): “Pay enough or don’t pay at all,” The Quarterly Journal of Economics,
115(3), 791–810.
Goette, L., and D. Huffman (2006): “Incentives and the Allocation of Effort Over
Time: The Joint Role of Affective and Cognitive Decision Making,” IZA Discussion
Paper, No. 2400.
Goldberg, J. (2013): “Kwacha Gonna Do? Experimental Evidence about Labor Supply
in Rural Malawi,” University of Maryland Working Paper.
Heckman, J. (2000): “Policies to foster human capital,” Research in Economics, 54,
3–56.
Heckman, J., and Y. Rubinstein (2001): “The importance of noncognitive skills:
Lessons from the GED testing program,” The American Economic Review, 91(2), 145–
149.
Heyman, J., and D. Ariely (2004): “Effort for payment: A tale of two markets,” Psychological Science, 15(11), 787–793.
Jordan, P. (1986): “Effects of an extrinsic reward on intrinsic motivation: A field experiment,” The Academy of Management Journal, 29(2), 405–412.
Judge, T., C. Higgins, C. Thoresen, and M. Barrick (1999): “The big five personality traits, general mental ability, and career success across the life span,” Personnel
psychology, 52(3), 621–652.
Kőszegi, B., and M. Rabin (2006): “A model of reference-dependent preferences,” The
Quarterly Journal of Economics, 121(4), 1133–1165.
Kreps, D. (1997): “Intrinsic motivation and extrinsic incentives,” The American Economic Review, 87(2), 359–364.
Kube, S., M. Maréchal, and C. Puppe (2008): “The currency of reciprocity: Gift
exchange in the workplace,” forthcoming in American Economic Review.
Kuhn, P., and C. Weinberger (2005): “Leadership skills and wages,” Journal of Labor
Economics, 23(3), 395–436.
Lazear, E. (2000): “Performance Pay and Productivity,” The American Economic Review, 90(5), 1346–1361.
Lepper, M., D. Greene, and R. Nisbett (1973): “Undermining children’s intrinsic
interest with extrinsic reward: A test of the ‘overjustification’ hypothesis,” Journal of
Personality and Social Psychology, 28(1), 129.
Levitt, S. D., and J. A. List (2009): “Field experiments in economics: the past, the
present, and the future,” European Economic Review, 53(1), 1–18.
Lim, N., M. Ahearne, and S. Ham (2009): “Designing sales contests: Does the prize
structure matter?,” Journal of Marketing Research, 46(3), 356–371.
Lindqvist, E., and R. Vestman (2011): “The labor market returns to cognitive and
noncognitive ability: Evidence from the Swedish enlistment,” American Economic Journal: Applied Economics, 3(1), 101–128.
Nagin, D., J. Rebitzer, S. Sanders, and L. Taylor (2002): “Monitoring, Motivation,
and Management: The Determinants of Opportunistic Behavior in a Field Experiment,”
The American Economic Review, 92(4), 850–873.
Paarsch, H., and B. Shearer (1999): “The response of worker effort to piece rates: Evidence from the british columbia tree-planting industry,” Journal of Human Resources,
pp. 643–667.
Persico, N., A. Postlewaite, and D. Silverman (2004): “The Effect of Adolescent
Experience on Labor Market Outcomes: The Case of Height,” Journal of Political
Economy, 112(5).
Ross, L., and R. E. Nisbett (1991): The person and the situation: Perspectives of
social psychology. Mcgraw-Hill Book Company.
Segal, C. (2006): “Motivation, Test Scores, and Economic Success,” Job Market Paper,
Harvard Business School.
Segal, C. (2008): “Classroom behavior,” Journal of Human Resources, 43(4), 783–814.
Segal, C. (2012): “Misbehavior, Education, and Labor Market Outcomes,” forthcoming
in Journal of the European Economic Association.
Shi, L. (2010): “Incentive Effect of Piece-Rate Contracts: Evidence from Two Small Field
Experiments,” The BE Journal of Economic Analysis & Policy, 10(1).
Staw, B. M., B. J. Calder, R. K. Hess, and L. E. Sandelands (1980): “Intrinsic
Motivation and norms about payment,” Journal of Personality, 48(1), 1–14.
Stutzer, A., L. Goette, and M. Zehnder (2011): “Active Decisions and Prosocial
Behaviour: a Field Experiment on Blood Donation,” The Economic Journal, 121(556),
F476–F493.
Titmuss, R. (1970): The Gift Relationship: From Human Blood to Social Policy. New
Press.
Watanabe, S., and Y. Kanazawa (2009): “A test of a personality-based view of intrinsic
motivation,” Japanese Journal of Administrative Science, 22, 117–130.
A Appendix: For online publication only
A.1 Additional tables
Table A1: Regression estimates underlying Figure 4
                                               (1)              (2)              (3)
13:00-14:00 * Treatment * Conscientiousness    0.29 (0.34)      0.12 (0.44)      0.43 (0.37)
14:00-15:00 * Treatment * Conscientiousness   -2.33*** (0.55)   0.92* (0.53)     0.33 (0.56)
15:00-16:00 * Treatment * Conscientiousness   -2.02*** (0.61)   0.49 (0.45)      0.09 (0.55)
16:00-17:00 * Treatment * Conscientiousness   -2.18*** (0.72)  -0.35 (0.71)     -0.42 (0.51)
13:00-14:00 * Treatment * Extraversion        -0.05 (0.30)     -0.44 (0.40)     -0.15 (0.33)
14:00-15:00 * Treatment * Extraversion         2.06*** (0.44)  -0.77 (0.60)      0.34 (0.51)
15:00-16:00 * Treatment * Extraversion         2.14*** (0.70)  -1.48** (0.66)   -0.70 (0.61)
16:00-17:00 * Treatment * Extraversion         2.36*** (0.71)   0.83 (0.80)      0.45 (0.55)
13:00-14:00 * Treatment * Reciprocal           2.32*** (0.62)   0.45 (0.39)     -0.30 (0.37)
14:00-15:00 * Treatment * Reciprocal          -0.48 (0.77)      0.01 (0.45)      0.54 (0.50)
15:00-16:00 * Treatment * Reciprocal           3.75*** (0.95)   1.08 (0.85)      0.05 (0.49)
16:00-17:00 * Treatment * Reciprocal          -0.08 (1.22)     -0.31 (0.78)      0.38 (0.77)
Other rhs. variables, coefficients suppressed:
  13:00-14:00 ... 16:00-17:00
  13:00-14:00 * Treatment ... 16:00-17:00 * Treatment
  13:00-14:00 * Conscientiousness ... 16:00-17:00 * Conscientiousness
  13:00-14:00 * Extraversion ... 16:00-17:00 * Extraversion
  13:00-14:00 * Reciprocal ... 16:00-17:00 * Reciprocal
  Conscientiousness, Extraversion, Reciprocal, Constant
Estimation method                              N. Bin.          N. Bin.          N. Bin.
Observations                                   170              170              170
Notes: Negative binomial estimates. All traits are standardized so that a
coefficient gives the impact of a one standard deviation change, with the
exception of reciprocity, a binary indicator, whose coefficient gives the impact
of a change from 0 to 1. The sample is all workers who responded to the
questionnaire. Robust standard errors in parentheses, adjusted for clustering on
worker. ***, **, * indicate significance at the 1-, 5-, and 10-percent level, respectively.
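The significance markers in Table A1 can be cross-checked from the reported coefficients and robust standard errors. The sketch below is an illustration rather than the authors' procedure: it assumes a two-sided test of a zero coefficient with z = coefficient / standard error and a standard normal reference distribution, and it works from the two-decimal values printed in the table, so a case very close to a cutoff could in principle flip. The names `TABLE_A1`, `two_sided_p`, and `stars` are hypothetical.

```python
import math

# (coefficient, robust standard error, reported stars) transcribed from Table A1,
# listed down each column in the row order of the table.
TABLE_A1 = {
    "(1)": [(0.29, 0.34, ""), (-2.33, 0.55, "***"), (-2.02, 0.61, "***"), (-2.18, 0.72, "***"),
            (-0.05, 0.30, ""), (2.06, 0.44, "***"), (2.14, 0.70, "***"), (2.36, 0.71, "***"),
            (2.32, 0.62, "***"), (-0.48, 0.77, ""), (3.75, 0.95, "***"), (-0.08, 1.22, "")],
    "(2)": [(0.12, 0.44, ""), (0.92, 0.53, "*"), (0.49, 0.45, ""), (-0.35, 0.71, ""),
            (-0.44, 0.40, ""), (-0.77, 0.60, ""), (-1.48, 0.66, "**"), (0.83, 0.80, ""),
            (0.45, 0.39, ""), (0.01, 0.45, ""), (1.08, 0.85, ""), (-0.31, 0.78, "")],
    "(3)": [(0.43, 0.37, ""), (0.33, 0.56, ""), (0.09, 0.55, ""), (-0.42, 0.51, ""),
            (-0.15, 0.33, ""), (0.34, 0.51, ""), (-0.70, 0.61, ""), (0.45, 0.55, ""),
            (-0.30, 0.37, ""), (0.54, 0.50, ""), (0.05, 0.49, ""), (0.38, 0.77, "")],
}


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for H0: coef = 0 under a standard normal reference."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


def stars(p: float) -> str:
    """Map a p-value to the 1-, 5-, and 10-percent markers used in the table notes."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Collect every cell whose recomputed marker differs from the printed one.
mismatches = [
    (col, row, coef, se, reported)
    for col, cells in TABLE_A1.items()
    for row, (coef, se, reported) in enumerate(cells, start=1)
    if stars(two_sided_p(coef, se)) != reported
]
print("cells whose recomputed stars differ from the table:", mismatches)
```

With the rounded table values, every recomputed marker agrees with the printed one, so `mismatches` is empty.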
Table A2: Regression estimates underlying Figure 5
                                          (1)                (2)              (3)
                                          Conscientiousness  Extraversion     Reciprocal
α2   13:00-14:00 * Treatment * Trait       0.36 (0.30)        0.24 (0.30)      2.32*** (0.55)
α3   14:00-15:00 * Treatment * Trait      -0.83* (0.49)       0.80** (0.39)    0.85 (0.92)
α4   15:00-16:00 * Treatment * Trait      -0.61 (0.48)        0.63 (0.49)      3.69*** (1.26)
α5   16:00-17:00 * Treatment * Trait      -0.73 (0.58)        0.70 (0.59)      0.61 (1.27)
φ2   13:00-14:00 * Treatment               0.86** (0.37)      0.87** (0.37)   -0.26 (0.46)
φ3   14:00-15:00 * Treatment               0.24 (0.41)        0.33 (0.45)     -0.03 (0.59)
φ4   15:00-16:00 * Treatment              -1.08** (0.48)     -0.91* (0.53)    -2.39*** (0.87)
φ5   16:00-17:00 * Treatment              -0.57 (0.53)       -0.50 (0.55)     -0.72 (0.80)
Other rhs. variables, coefficients suppressed:
  13:00-14:00 ... 16:00-17:00
  13:00-14:00 * Trait ... 16:00-17:00 * Trait
  Trait, Constant
Estimation method                          N. Bin.            N. Bin.          N. Bin.
Observations                               170                170              170
Normalized treatment effects plotted in Figure 5:
α2 + φ2                                    1.22*** (0.47)     1.12** (0.51)    2.06*** (0.49)
α3 + φ3                                   -0.59 (0.65)        1.14* (0.63)     0.82 (0.70)
α4 + φ4                                   -1.68** (0.77)     -0.28 (0.59)      1.30 (0.81)
α5 + φ5                                   -1.30 (0.82)        0.19 (0.79)     -0.11 (0.88)
Notes: Negative binomial estimates. Traits for Columns (1) to (3) are conscientiousness, extraversion, and reciprocity, respectively. αt + φt gives the normalized treatment effect for workers who
score high on a given trait. All traits are standardized so that a coefficient gives the impact of
a one standard deviation change, with the exception of reciprocity, a binary indicator, whose
coefficient gives the impact of a change from 0 to 1. Standard errors for αt + φt are calculated
using the formula for the variance of a sum of random variables. Statistical significance is based
on a chi-squared test. The sample is all workers who responded to the questionnaire.
Robust standard errors in parentheses, adjusted for clustering on worker. ***, **, * indicate
significance at the 1-, 5-, and 10-percent level, respectively.
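The normalized treatment effects in Table A2 are sums of two coefficients, so their point estimates can be recovered from the table, while their standard errors also require the covariance Cov(αt, φt), which the table does not report. The sketch below, with hypothetical names, (i) checks that αt + φt matches the reported sums up to rounding, (ii) writes out the variance-of-a-sum formula the notes refer to, and (iii) recomputes the significance markers from the reported sums and standard errors, assuming a one-degree-of-freedom Wald chi-squared test. It is a consistency check, not the authors' code.

```python
import math

# (alpha_t, phi_t, reported alpha_t + phi_t, reported SE of the sum, reported stars)
# for t = 2..5, transcribed from Table A2.
TABLE_A2 = {
    "Conscientiousness": [(0.36, 0.86, 1.22, 0.47, "***"), (-0.83, 0.24, -0.59, 0.65, ""),
                          (-0.61, -1.08, -1.68, 0.77, "**"), (-0.73, -0.57, -1.30, 0.82, "")],
    "Extraversion": [(0.24, 0.87, 1.12, 0.51, "**"), (0.80, 0.33, 1.14, 0.63, "*"),
                     (0.63, -0.91, -0.28, 0.59, ""), (0.70, -0.50, 0.19, 0.79, "")],
    "Reciprocal": [(2.32, -0.26, 2.06, 0.49, "***"), (0.85, -0.03, 0.82, 0.70, ""),
                   (3.69, -2.39, 1.30, 0.81, ""), (0.61, -0.72, -0.11, 0.88, "")],
}


def se_of_sum(var_a: float, var_b: float, cov_ab: float) -> float:
    """SE of a + b from Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b)."""
    return math.sqrt(var_a + var_b + 2.0 * cov_ab)


def chi2_1_pvalue(estimate: float, se: float) -> float:
    """p-value of the Wald statistic (estimate / se)**2 against chi-squared(1)."""
    stat = (estimate / se) ** 2
    return math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-squared(1)


def stars(p: float) -> str:
    return "***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.10 else ""


# Each input is rounded to two decimals, so a recomputed sum can differ from the
# reported sum by up to about 0.015.
for trait, rows in TABLE_A2.items():
    for t, (alpha, phi, total, se, reported) in enumerate(rows, start=2):
        gap = abs(alpha + phi - total)
        recomputed = stars(chi2_1_pvalue(total, se))
        print(f"{trait:17s} t={t}  sum gap={gap:.3f}  stars {recomputed!r} vs {reported!r}")
```

All twelve sums agree with the reported values to within rounding, and the recomputed markers match the printed ones.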
A.2 Proofs for propositions 1, 2, 3, and 4
This portion of the appendix provides proofs of the propositions in the behavioral predictions section.
Proof of Proposition 1: First order conditions for a treatment group worker are given by:
$$\frac{\partial v}{\partial e_1} = z + p'(e_1) - c'(e_1) = 0$$
$$\frac{\partial v}{\partial e_2} = p'(e_2) - c'(e_2) = 0$$
By inspection, optimal effort for treatment workers in period 1, $e_1^{*T}$, is increasing in the
piece rate, given concavity of $p(\cdot)$ and convexity of $c(\cdot)$. Optimal effort in period 2, $e_2^{*T}$,
however, is independent of the piece rate. First order conditions for a control group worker
are given by:
$$\frac{\partial v}{\partial e_1} = p'(e_1) - c'(e_1) = 0$$
$$\frac{\partial v}{\partial e_2} = p'(e_2) - c'(e_2) = 0$$
Optimal effort in period 1 for control workers, $e_1^{*C}$, is less than optimal effort for
treatment workers whenever $z > 0$: control workers have $z = 0$ in the first order condition
for period 1, and $p'(e_1) - c'(e_1)$ is decreasing in $e_1$ by concavity of $p(\cdot)$ and convexity
of $c(\cdot)$, so the control condition is satisfied at a lower effort level. The first order condition
for period 2 effort is identical for treatment and control, so $e_2^{*C} = e_2^{*T}$ regardless of the
level of $z$. Thus, $z > 0$ causes treatment group workers to work harder in period 1 than
control group workers, but effort is the same for treatment and control in period 2.
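The "by inspection" step can be made explicit by implicitly differentiating the period 1 first order condition of a treatment worker with respect to $z$. The derivation uses only the functions defined above and assumes $c''(e_1) - p''(e_1) > 0$, which holds when at least one of the curvature assumptions is strict:

```latex
z + p'(e_1^{*T}) - c'(e_1^{*T}) = 0
\;\Longrightarrow\;
1 + \bigl[p''(e_1^{*T}) - c''(e_1^{*T})\bigr]\,\frac{\partial e_1^{*T}}{\partial z} = 0
\;\Longrightarrow\;
\frac{\partial e_1^{*T}}{\partial z} = \frac{1}{c''(e_1^{*T}) - p''(e_1^{*T})} > 0,
\qquad
\frac{\partial e_2^{*T}}{\partial z} = 0 .
```

The second result follows because $z$ does not enter the period 2 first order condition at all.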
Proof of Proposition 2:
First order conditions for a treatment group worker are given by:
$$\frac{\partial v}{\partial e_1} = z + p'(e_1) - c'(e_1) - c'(e_2, k(e_1))\,k'(e_1) = 0$$
$$\frac{\partial v}{\partial e_2} = p'(e_2) - c'(e_2, k(e_1)) = 0$$
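Totally differentiating with respect to $z$ yields:
$$\begin{pmatrix} c''(e_1) + c''(e_2, k(e_1))\,k'(e_1) - p''(e_1) & c''(e_2, k(e_1))\,k'(e_1) \\ c''(e_2, k(e_1))\,k'(e_1) & c''(e_2, k(e_1)) - p''(e_2) \end{pmatrix} \begin{pmatrix} \partial e_1/\partial z \\ \partial e_2/\partial z \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$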
Denote the first matrix, consisting of second derivatives, by H. Second order conditions
for an (interior) maximum imply that the determinant of H must be positive. Applying
Cramer’s Rule, the derivatives of the first and second period effort levels with respect to
z are given by
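$$\frac{\partial e_1}{\partial z} = \frac{\det\begin{pmatrix} 1 & c''(e_2, k(e_1))\,k'(e_1) \\ 0 & c''(e_2, k(e_1)) - p''(e_2) \end{pmatrix}}{|H|} \quad\text{and}\quad \frac{\partial e_2}{\partial z} = \frac{\det\begin{pmatrix} c''(e_1) + c''(e_2, k(e_1))\,k'(e_1) - p''(e_1) & 1 \\ c''(e_2, k(e_1))\,k'(e_1) & 0 \end{pmatrix}}{|H|}$$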
Because $|H| > 0$, having $\partial e_1/\partial z > 0$ and $\partial e_2/\partial z < 0$ requires the determinants of the two
numerator matrices to be positive and negative, respectively. Writing out the determinants, these
conditions can be stated
$$c''(e_2, k(e_1)) - p''(e_2) > 0 \quad\text{and}\quad -\,c''(e_2, k(e_1))\,k'(e_1) < 0.$$
Both of these hold unambiguously given assumptions about convexity of $c(\cdot)$, concavity of
$p(\cdot)$, and $k'(e_1) > 0$. This proves the claim in Proposition 2.
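The sign predictions of Proposition 2 can be illustrated numerically. The sketch below is not the paper's specification: it assumes the hypothetical functional forms p(e) = A ln(1 + e), c(e1) = e1^2/2, c(e2, k) = (1 + k) e2^2/2, and k(e1) = gamma * e1, and it solves the model's first order conditions directly (period 2 in closed form, period 1 by bisection) rather than evaluating the matrix expressions above. The constant `A` and the helper names are illustrative assumptions.

```python
import math

# Hypothetical functional forms chosen only for illustration (not from the paper):
#   p(e)      = A * ln(1 + e)          concave non-monetary benefit of effort
#   c(e1)     = e1**2 / 2              convex period-1 effort cost
#   c(e2, k)  = (1 + k) * e2**2 / 2    period-2 cost, rising in the fatigue stock k
#   k(e1)     = gamma * e1             fatigue carried over from period 1
A = 1.0


def period2_effort(e1: float, gamma: float) -> float:
    """Solve p'(e2) = dc(e2, k)/de2, i.e. A / (1 + e2) = (1 + gamma * e1) * e2."""
    b = 1.0 + gamma * e1
    # Positive root of b * e2**2 + b * e2 - A = 0.
    return (-b + math.sqrt(b * b + 4.0 * A * b)) / (2.0 * b)


def foc_e1(e1: float, z: float, gamma: float) -> float:
    """Period-1 first order condition with e2 at its optimum (envelope theorem):
    z + p'(e1) - c'(e1) - [dc(e2, k)/dk] * k'(e1)."""
    e2 = period2_effort(e1, gamma)
    return z + A / (1.0 + e1) - e1 - (e2 ** 2 / 2.0) * gamma


def optimal_efforts(z: float, gamma: float, tol: float = 1e-12) -> tuple[float, float]:
    """Find e1 by bisection on the period-1 FOC, then the matching e2."""
    lo, hi = 0.0, z + A  # the FOC is positive at e1 = 0 and negative at e1 = z + A
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc_e1(mid, z, gamma) > 0.0:
            lo = mid
        else:
            hi = mid
    e1 = 0.5 * (lo + hi)
    return e1, period2_effort(e1, gamma)


for gamma in (0.5, 0.0):  # gamma = 0 mimics a full rest break, i.e. k'(e1) = 0
    e1_c, e2_c = optimal_efforts(z=0.0, gamma=gamma)  # control: no piece rate
    e1_t, e2_t = optimal_efforts(z=0.5, gamma=gamma)  # treatment: piece rate in period 1
    print(f"gamma={gamma}: e1 control={e1_c:.4f} treatment={e1_t:.4f}; "
          f"e2 control={e2_c:.4f} treatment={e2_t:.4f}")
```

With gamma = 0.5, the piece rate raises period 1 effort and lowers period 2 effort, in line with Proposition 2. Setting gamma = 0 removes the fatigue link, which corresponds to the rest-break case, and period 2 effort is then identical for treatment and control.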
Proof of Proposition 3:
With a rest break between period 1 and period 2 sufficient to have a fatigue stock of zero
regardless of period 1 effort, we have $k'(e_1) = 0$. As shown in the proof for Proposition
2, this implies $\partial e_2/\partial z = 0$, so period 2 effort of treatment workers is unaffected by having
performance pay in period 1. Given $c(e_2, 0) = c(e_2)$, treatment and control workers have
the same first order conditions and optimal effort levels for period 2 effort. This proves
the claim in Proposition 3.
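Written out, setting $k'(e_1) = 0$ in the Cramer's rule expression for $\partial e_2/\partial z$ from the proof of Proposition 2 gives (this simply expands the determinant already used there):

```latex
\frac{\partial e_2}{\partial z}
= \frac{1}{|H|}\det\begin{pmatrix}
c''(e_1) + c''(e_2, k(e_1))\,k'(e_1) - p''(e_1) & 1 \\
c''(e_2, k(e_1))\,k'(e_1) & 0
\end{pmatrix}
= \frac{-\,c''(e_2, k(e_1))\,k'(e_1)}{|H|}
= 0 \quad \text{when } k'(e_1) = 0 .
```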
Proof of Proposition 4:
First order conditions for a treatment group worker are given by:
$$\frac{\partial v}{\partial e_1} = z + \theta_1 + p_1' - c_1' = 0$$
$$\frac{\partial v}{\partial e_2} = \theta_2 + p_2' - c_2' = 0$$
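Totally differentiating with respect to $z$, where $\theta_t' \equiv \partial\theta_t/\partial z$, yields
$$\begin{pmatrix} c_1'' - p_1'' & 0 \\ 0 & c_2'' - p_2'' \end{pmatrix} \begin{pmatrix} \partial e_1/\partial z \\ \partial e_2/\partial z \end{pmatrix} = \begin{pmatrix} 1 + \theta_1' \\ \theta_2' \end{pmatrix}$$
Denote the first matrix by $H$. Note that, due to convexity of $c(\cdot)$ and concavity of $p(\cdot)$,
$$|H| = (c_1'' - p_1'')(c_2'' - p_2'') > 0.$$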
Applying Cramer’s rule, the derivatives of the first and second period effort levels
with respect to the piece rate are given by
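$$\frac{\partial e_1}{\partial z} = \frac{\det\begin{pmatrix} 1 + \theta_1' & 0 \\ \theta_2' & c_2'' - p_2'' \end{pmatrix}}{|H|} \quad\text{and}\quad \frac{\partial e_2}{\partial z} = \frac{\det\begin{pmatrix} c_1'' - p_1'' & 1 + \theta_1' \\ 0 & \theta_2' \end{pmatrix}}{|H|}$$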
Because $|H| > 0$, having $\partial e_1/\partial z > 0$ and $\partial e_2/\partial z < 0$ requires the determinants of the two
numerator matrices to be positive and negative, respectively. Writing out the determinants, these
conditions can be stated
$$(1 + \theta_1')(c_2'' - p_2'') > 0 \quad\text{and}\quad (c_1'' - p_1'')\,\theta_2' < 0.$$
The first condition holds as long as $1 > -\theta_1'$. Recall that the marginal utility of income
was normalized to 1. Thus, the first condition states that period 1 effort is increasing
in the piece rate as long as the marginal utility of an additional dollar is greater than the
marginal reduction in intrinsic motivation from introducing $z > 0$. The second condition
always holds, given our assumptions: introducing performance pay (for period 1) must
reduce period 2 effort because it decreases intrinsic motivation in that period ($\partial\theta_2/\partial z < 0$),
without generating any offsetting financial incentives in period 2. This proves the claim
in Proposition 4.
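The two conditions of Proposition 4 can be illustrated with a small numerical example. The sketch below assumes hypothetical functional forms that are not taken from the paper: p(e) = A ln(1 + e) and c(e) = e^2/2 in both periods, and intrinsic motivation that falls linearly with the piece rate, theta_t(z) = THETA0 - DELTA_t * z, so that theta_t' = -DELTA_t. All constants and helper names are illustrative.

```python
import math

# Hypothetical functional forms chosen only for illustration (not from the paper):
#   p(e) = A * ln(1 + e) and c(e) = e**2 / 2 in both periods,
#   theta_t(z) = THETA0 - DELTA_t * z, so theta_t' = -DELTA_t.
A = 1.0
THETA0 = 0.5
DELTA1 = 0.3  # 1 + theta_1' = 0.7 > 0: the first condition holds
DELTA2 = 0.4  # theta_2' < 0: intrinsic motivation falls in period 2


def optimal_effort(m: float) -> float:
    """Solve m + p'(e) - c'(e) = 0, i.e. m + A / (1 + e) = e, where m is the
    per-unit return to effort (piece rate plus intrinsic motivation)."""
    # Rearranged: e**2 + (1 - m) * e - (m + A) = 0; take the positive root.
    return (m - 1.0 + math.sqrt((1.0 + m) ** 2 + 4.0 * A)) / 2.0


def efforts(z: float, delta1: float = DELTA1, delta2: float = DELTA2) -> tuple[float, float]:
    theta1 = THETA0 - delta1 * z
    theta2 = THETA0 - delta2 * z
    e1 = optimal_effort(z + theta1)  # period 1: piece rate plus intrinsic motivation
    e2 = optimal_effort(theta2)      # period 2: intrinsic motivation only
    return e1, e2


for z in (0.0, 0.5):
    e1, e2 = efforts(z)
    print(f"z={z}: e1={e1:.4f}, e2={e2:.4f}")

# If crowding-out in period 1 outweighs the piece rate (DELTA1 > 1), e1 falls too.
print("DELTA1 = 1.5:", efforts(0.0, delta1=1.5), efforts(0.5, delta1=1.5))
```

Here z = 0.5 raises period 1 effort and lowers period 2 effort. With DELTA1 = 1.5, so that 1 + theta_1' < 0, period 1 effort falls as well, which is exactly the case the first condition excludes.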