Performance Pay and Workers’ Non-Monetary Motivations: Evidence from a Natural Field Experiment

David Huffman (University of Oxford and IZA)
Michael Bognanno (Temple University and IZA)

April 2, 2015

Abstract

A literature in psychology and behavioral economics cautions that paying workers for good performance may undermine non-monetary motivations to do a good job. This paper provides the first implementation of the standard experimental design from psychology in a real work setting with paid workers. The findings are consistent with the view that performance pay may have negative psychological effects, but there is also evidence of heterogeneity. A sub-group of workers report positive psychological effects, and there is suggestive evidence that different responses to incentives are related to worker personalities and preferences.

Keywords: Incentives, non-cognitive skills, experiment, intrinsic motivation
JEL codes: D03, J22, J33

1 Introduction

The literature in psychology and behavioral economics, as well as prominent management textbooks, cautions that paying workers for good performance may undermine non-monetary motivations for doing a good job (Deci, 1971; Lepper et al., 1973; Kreps, 1997; Baron and Kreps, 1999; Gneezy et al., 2011). The psychology literature has mainly used student subjects in a lab setting performing tasks such as solving puzzles. The typical experimental design involves treatment subjects going through three stages, ABA, where A involves no monetary incentives and B involves incentives for performance. Control subjects never get incentives, going through AAA. The key stylized fact is that treatment subjects have lower output than control in the third stage, consistent with performance pay having reduced non-monetary motivations (for a meta-analysis, see Deci and Ryan, 1999).
Some studies even find lower output for treatment in the second stage, when incentives are active, consistent with the monetary incentives being too weak to offset a reduction in non-monetary motivations (e.g., Deci, 1971). Various mechanisms have been proposed for such effects, many of which have in common that the introduction of performance pay “signals” something to the worker: about the enjoyability of the task, the relevance of social norms calling for hard work, or the employer’s beliefs about the worker’s trustworthiness (Gneezy and Rustichini, 2000a and 2000b; Benabou and Tirole, 2003 and 2006; Heyman and Ariely, 2004; Gneezy et al., 2011; Carpenter and Dolifka, 2014). We refer to these as changes in non-monetary motivations, rather than using the term “intrinsic motivation” from psychology, which is sometimes taken to mean task enjoyment alone (Fehr and Falk, 2002). If performance pay does affect workers’ non-monetary motivations, this has important implications for economics and for managers. It means that even though performance pay tends to increase output in the workplace (e.g., Lazear, 2000), the size of the impact depends partly on “psychological variables” that are left out of economic models, and the impact might be even greater in the absence of negative psychological effects (Benabou and Tirole, 2003). This raises new questions for the optimal design of incentives, in particular whether some ways of delivering incentives have better psychological properties than others.
This paper makes two main contributions: it provides the first implementation of the “classic” design from psychology in a real work setting with paid workers, and it explores potential heterogeneity in the psychological effects of performance pay.1 One reason why evidence from a workplace is particularly important is that most previous studies have focused on traditionally unpaid activities, such as solving puzzles, or volunteer activities such as collecting donations or contributing to a student newspaper.2 Understanding the impact of incentives on pro-social behaviors is clearly important, but the extent and nature of non-monetary motivations in the domain of paid work may be quite different (Titmuss, 1970; Staw and Calder, 1980; Kreps, 1997; Fehr and Falk, 2002; Gneezy and Rustichini, 2000b; Stutzer et al., 2011). Evidence from a real work context, where payment is the norm, is also crucial because the psychological effects of performance pay are thought to work through signaling, and context can affect how individuals interpret the meaning of a given action (Ross and Nisbett, 1991). Investigating individual heterogeneity in psychological effects is important because heterogeneity would imply that the impact of performance pay depends on the mix of worker types in the job. Our subjects were 39 workers, hired to mingle with the crowd at a street festival for 5 hours and convince attendees to register in a company’s database. We were able to measure the “sign-ups” generated by each worker on a minute-by-minute basis, yielding 195 worker-hour observations (39 workers × 5 hours). Workers were randomly assigned to a control group receiving a fixed wage of $18 per hour, or to a treatment group that received the same base wage plus an additional $5 per sign-up during the second hour. Treatment workers learned about performance pay only at the beginning of the second hour, and learned that it would be temporary, lasting exactly one hour.
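The pay rule and the hourly output measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's processing code. The function names are hypothetical, and the sketch makes three assumptions. Sign-up times are recorded in minutes since the start of the shift. Hours are counted from that start, which simplifies the actual clock times (the baseline began at 11:45 and the incentive hour began at 1:00 pm). All five hours are paid at the fixed wage.

```python
from collections import Counter

# Parameters from the text; the hour bucketing is a simplifying assumption.
BASE_WAGE = 18.0      # fixed wage in dollars per hour, both groups
PIECE_RATE = 5.0      # extra dollars per sign-up, treatment group only
SHIFT_HOURS = 5       # length of the work day in hours
INCENTIVE_HOUR = 2    # performance pay applies only in the second hour


def signups_by_hour(signup_minutes, shift_hours=SHIFT_HOURS):
    """Count one worker's sign-ups in each hour of the shift.

    signup_minutes: arrival times of the worker's sign-up texts, in minutes
    since the start of the shift. Times outside the shift are ignored.
    Returns a list of length shift_hours (index 0 = hour 1).
    """
    counts = Counter(
        int(m // 60) + 1 for m in signup_minutes if 0 <= m < 60 * shift_hours
    )
    return [counts.get(hour, 0) for hour in range(1, shift_hours + 1)]


def total_pay(signup_minutes, treated):
    """Fixed wage for every hour, plus the piece rate on incentive-hour sign-ups if treated."""
    pay = BASE_WAGE * SHIFT_HOURS
    if treated:
        pay += PIECE_RATE * signups_by_hour(signup_minutes)[INCENTIVE_HOUR - 1]
    return pay


if __name__ == "__main__":
    # Hypothetical sign-up times for one worker (not data from the study).
    times = [5, 70, 75, 130]
    print(signups_by_hour(times))           # [1, 2, 1, 0, 0]
    print(total_pay(times, treated=True))   # 100.0
    print(total_pay(times, treated=False))  # 90.0
```

Only the incentive-hour count enters treatment pay. This mirrors the announcement that texts arriving after the incentive hour would not earn the $5.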
1 Jordan (1989) studies intrinsic motivation in the workplace, but measures only self-reported motivation and job satisfaction, not actual work performance, and also does not have random assignment of workers to treatment and control.
2 This includes one of the initial crowding out experiments, where subjects were students writing headlines for a student newspaper (Deci, 1971); while this comes closer to a work context, student newspapers are traditionally done on a volunteer basis and for the purposes of education and skill development. This study also involved only 8 subjects, and the measure of output was speed in completing headlines, without any assessment of quality.

The design involves several methodological innovations relative to the standard design from psychology. First, we deliberately chose a “high-powered” incentive, $5 per sign-up, to see whether there are negative psychological effects even when incentives are strong; some previous studies have found the strongest negative psychological effects of incentives when incentives were quite weak (Gneezy and Rustichini, 2000b). Second, we added a substantial rest break right after the incentive hour for both treatment and control, to help rule out differential levels of fatigue later on, and a follow-up questionnaire asked questions about fatigue; fatigue effects would be an alternative explanation for why treatment group workers exert less effort than control later in the workday, having worked harder earlier on.3 Third, we informed treatment workers ahead of time, at the time performance pay was introduced, about the future removal of incentives, to minimize surprise or “disappointment” effects that could otherwise play a role.
Fourth, our questionnaire measured worker traits such as personality and social preferences, as well as self-reported changes in non-monetary motivations, facilitating the study of individual heterogeneity; research on intrinsic motivation in psychology has typically not considered the role of personality type (Watanabe and Kanazawa, 2009). Fifth, our design allows us to observe the time profile of worker output after the removal of incentives; this can shed light on the timing and duration of any treatment effects on non-monetary motivations. Sixth, unlike many previous studies, our subjects were unaware of being in a study;4 this is important for ruling out experimenter demand as a source of non-monetary motivations to work hard (Levitt and List, 2009). To preserve unawareness of being in an experiment and avoid treatment contamination, treatment and control workers were randomly assigned to different areas of the festival and kept separate throughout the work day. This raises a potential concern about randomization failure, namely that different areas of the festival might by chance have had different levels of customer availability. We discuss why customer availability is unlikely to have been a binding constraint, given that there were tens of thousands of potential customers, but we also check the robustness of the results using a difference-in-difference analysis that corrects for any time-invariant differences in productivity between treatment and control.

Our main finding is that treatment group workers had substantially higher output than control workers while performance pay was active, but lower output than control in the hours after incentives were removed. The lower output during the post-incentive hours did not manifest immediately, however, but rather grew over time. The pattern is robust to using a difference-in-difference analysis.
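The logic of the difference-in-difference check can be illustrated with group means. The sketch below is a simplification and not the authors' estimation code. It subtracts each group's baseline-period mean before comparing the groups, so a constant productivity gap between the two halves of the festival drops out. The function name and the numbers are hypothetical.

```python
def did_effects(treat_means, control_means, baseline=0):
    """Difference-in-differences treatment effect for each non-baseline period.

    treat_means, control_means: mean output (e.g. sign-ups per worker-hour)
    for each period, in chronological order.
    Returns {period_index: effect}, where
        effect = (treat[t] - treat[baseline]) - (control[t] - control[baseline]).
    Any constant gap between the groups cancels out of every effect.
    """
    if len(treat_means) != len(control_means):
        raise ValueError("treatment and control series must have equal length")
    t0, c0 = treat_means[baseline], control_means[baseline]
    return {
        t: (treat_means[t] - t0) - (control_means[t] - c0)
        for t in range(len(treat_means))
        if t != baseline
    }


if __name__ == "__main__":
    # Hypothetical group means for three periods: baseline, incentive, post-incentive.
    treat = [4.0, 9.0, 3.0]
    control = [5.0, 5.0, 5.0]
    print(did_effects(treat, control))
    # Shifting every treatment mean by a constant (e.g. a busier street segment)
    # leaves the estimates unchanged.
    print(did_effects([x + 2.0 for x in treat], control))
```

This version cannot correct for group-specific time trends, only for time-invariant level differences.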
3 One of the early psychology experiments did have a substantial break of some weeks between phases of the study, but in that case treatment subjects did less of the task already during the incentive phase, and thus fatigue effects were not an issue for interpreting the results (Deci, 1971).
4 We obtained a waiver of consent from the IRB, under the relevant section of the federal guidelines.

The difference in output profiles for treatment and control after removal of performance pay is not consistent with a simple, canonical economic model, because both groups of workers faced the same monetary incentives in those hours.5 Turning to potential explanations, we consider a modified version of the canonical economic model that includes fatigue effects, and a model that incorporates a negative impact of performance pay on non-monetary motivations. We find little support for explanations based on fatigue spillovers; all indications suggest that treatment workers were no more fatigued than control workers in the post-incentive hours. On the other hand, we do find a negative effect of performance pay on (self-reported) non-monetary motivations, with the majority of treatment group workers indicating that the experience of performance pay made work “less fun” in subsequent hours. This is consistent with a role for changed non-monetary motivations in explaining the observed behavior. Interestingly, however, there was individual heterogeneity, with a substantial minority reporting that the experience of performance pay made work seem “more fun” even after incentives were removed. Thus, the findings are not consistent with a simple model in which the psychological impact of incentives is uniformly negative. The shape of the aggregate output profile, with its delayed drop in output, is also not fully consistent with the crowding out hypothesis, which predicts an immediate negative psychological impact of performance pay.
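The contrast between the three candidate explanations can be made concrete with a small numerical example. Every functional form and parameter value here is a hypothetical choice made for illustration, not the specification used in the paper:

* quadratic effort costs;
* a linear reputation term p(e) = a·e;
* a fatigue spillover that scales period-2 costs by (1 + k·e1);
* crowding out modeled as a reduction delta in treated workers' period-2 non-monetary value of effort.

In the canonical case (k = 0, delta = 0), period-2 effort is identical for both groups. Fatigue or crowding out lowers treated workers' period-2 effort.

```python
"""Two-period effort choice under three hypothetical model variants.

Illustrative assumptions (not the paper's specification):
  effort cost        c(e) = e**2 / 2
  reputation value   p(e) = a * e
  fatigue spillover  period-2 cost = (1 + k * e1) * e2**2 / 2
  crowding out       period-2 non-monetary value per unit of effort falls by delta
Period 1 pays w + z * e1 (z = 0 for control); period 2 pays w for everyone.
The fixed wage w does not affect the effort choice, so it is omitted.
"""


def optimal_efforts(z, a=1.0, k=0.0, m=0.0, delta=0.0, iters=200):
    """Return utility-maximizing efforts (e1, e2) for one worker.

    z: piece rate in period 1; a: marginal reputation value of effort;
    k: fatigue spillover; m: non-monetary value per unit of effort;
    delta: reduction of that value in period 2 (crowding out).
    Assumes k * (a + m - delta) < 1, which keeps the problem concave.
    """
    b1 = z + a + m                  # marginal benefit of effort, period 1
    b2 = max(a + m - delta, 0.0)    # marginal benefit of effort, period 2

    def e2_given(e1):
        # Period-2 first-order condition: b2 = (1 + k * e1) * e2.
        return b2 / (1.0 + k * e1)

    def dV(e1):
        # Derivative of V(e1) = b1*e1 - e1**2/2 + b2**2 / (2*(1 + k*e1)),
        # i.e. total utility with e2 already chosen optimally.
        return b1 - e1 - k * b2 ** 2 / (2.0 * (1.0 + k * e1) ** 2)

    lo, hi = 0.0, max(b1, 0.0)      # dV(hi) <= 0, so the optimum lies in [lo, hi]
    for _ in range(iters):          # bisection on the decreasing function dV
        mid = 0.5 * (lo + hi)
        if dV(mid) > 0:
            lo = mid
        else:
            hi = mid
    e1 = 0.5 * (lo + hi)
    return e1, e2_given(e1)


if __name__ == "__main__":
    cases = {
        "canonical": ({}, {}),
        "fatigue": ({"k": 0.2}, {"k": 0.2}),
        "crowding out": ({"m": 0.5}, {"m": 0.5, "delta": 0.3}),
    }
    for name, (ctrl_kw, treat_kw) in cases.items():
        c1, c2 = optimal_efforts(z=0.0, **ctrl_kw)
        t1, t2 = optimal_efforts(z=1.0, **treat_kw)
        print(f"{name:12s} control: e1={c1:.3f} e2={c2:.3f}   "
              f"treatment: e1={t1:.3f} e2={t2:.3f}")
```

In this parameterization, the crowding-out variant lowers period-2 effort immediately. Neither stylized variant produces the delayed drop seen in the data, which is why the text turns to heterogeneity and composition effects.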
5 Notably, standard inter-temporal substitution motives from life-cycle models are not relevant, under the plausible assumption that one hour of piece rate earnings leaves the marginal utility of income unchanged.

Investigating the role of non-monetary motivations and individual heterogeneity in more depth, we find that several types of non-cognitive skills, selected ex ante as potential carriers of non-monetary motivation (the personality traits of conscientiousness and extraversion, and social preferences in the form of positive reciprocity), are important for explaining how workers responded to the experience of incentives. Conscientious workers exhibit lower output immediately in the post-incentive hours, and the drop in output is particularly pronounced. Extraverted and reciprocal workers, by contrast, actually exhibit positive effects of the treatment on output in the post-incentive hours. The results on non-cognitive skills need to be treated with caution, as they are based on a modest number of individuals, but they suggest that heterogeneity in the psychological effects of incentives may be systematically related to worker traits. We discuss how the shape of the aggregate output profile, in particular the delayed drop, could possibly reflect composition effects stemming from individual heterogeneity.

In summary, we took the crowding out hypothesis from psychology, and the standard approach to testing it, to a real work setting. Several aspects of our findings are consistent with the view that psychological effects of performance pay are a relevant consideration in real work settings: we observe treatment differences in behavior that are hard to explain with a canonical model of labor supply or a model with fatigue effects; workers report changed non-monetary motivations; and worker traits related to non-monetary motivations are important for explaining the treatment effect.
At the same time, our results show that the issue is complex, and they challenge the view that performance pay necessarily has negative psychological effects. A substantial minority of workers report increased non-monetary motivations, and some worker traits are associated with crowding in of non-monetary motivations. This heterogeneity implies that the overall psychological effects of performance pay could be negative, positive, or zero depending on the mix of worker types in the job. Thus, despite a large literature on crowding out in the lab, our findings point to the need for further research in real work settings. In the conclusion we discuss several directions for future research: uncovering the specific mechanisms that underlie the psychological impact of incentives in the workplace; better understanding the timing and duration of psychological effects; identifying the determinants of individual heterogeneity, and how the impact of worker traits may vary with the nature of the job; and studying how the ability of workers to self-select into jobs affects the psychological impact of performance pay.

2 Related Literature

Our findings complement a previous literature in behavioral economics on incentives and non-monetary motivations. A seminal paper by Gneezy and Rustichini (2000b) showed that a group of subjects paid a low piece rate collected fewer contributions for charity than a group of volunteers, while a group paid a higher piece rate collected about as much as the volunteers. This evidence is valuable, as it shows how low-powered incentives can have no impact or even reduce performance on an important type of activity. Our study is different because we focus on a work setting, where payment is the norm, and we investigate heterogeneity.
Also, we take our investigation in a different direction, towards understanding whether high-powered incentives have psychological effects; this seems useful because we know from previous evidence that employers may use performance incentives that are strong enough to increase output substantially (Lazear, 2000, and others). One explanation for negative psychological effects of low incentives is that they signal a low “social value” of the task (Gneezy and Rustichini, 2000b). This raises the question of whether high-powered incentives might avoid negative psychological effects by signaling high task value. Our findings are consistent with even high-powered incentives reducing non-monetary motivations, for at least some workers. This suggests that signals of task value may not be the only mechanism underlying psychological effects of incentives. It also underlines the importance of research on psychological effects, as they are not eliminated simply by using strong incentives.6 Various studies in behavioral economics have explored whether compensation affects non-monetary motivations in the workplace, but they have mainly focused on changes in the level of fixed wages (e.g., Gneezy and List, 2006; Kube et al., 2008; Cohn et al., 2014; Cohn et al., 2014; Gneezy and Rey-Biel, 2014). Our paper differs because we explore how changing the payment mode (introducing performance pay) might change worker behavior by either crowding out or crowding in non-monetary motivations.7 The results on worker personality also contribute to the growing literature on “non-cognitive skills” in economics. Economists are increasingly interested in how such traits can be important determinants of labor market success and life outcomes.8 Non-cognitive skills have also been shown to affect task motivation in the absence of financial incentives, in the context of tests of cognitive ability (Segal, 2012).
Our evidence complements this literature, providing some of the first evidence about how these traits relate to high-frequency data on performance in the workplace, and also to the way that individuals respond to the introduction, and removal, of incentives.

6 Another important paper by Gneezy and Rustichini (2000a) uses a within-subject design, but focuses on the introduction of a temporary fine. They find that imposing a small fine for picking children up late from daycare actually worsens behavior, and that there is no improvement after the fine is removed. Our paper is different because of the focus on workers. Also, we have an incentive that is powerful enough to elicit better performance while it is in place. See also Fehr and Gaechter (2002) for laboratory evidence that fines can be worse than no incentives, and Carpenter and Dolifka (2014) for lab evidence on how the impact of piece rates varies with perceived incentives of the employer. Charness and Gneezy (2011) also test the psychological effect of incentives in the health domain.
7 Babcock et al. (2011) investigate the impact of team compensation in the domain of exercise.
8 See Heckman (2000); Bowles et al. (2001a); Heckman and Rubinstein (2001); Carneiro and Heckman (2003); Persico et al. (2004); Kuhn and Weinberger (2005); Segal (2008); Lindqvist and Westman (2011); Borghans et al. (2011); Becker et al. (2012).

There is a literature using field experiments or natural experiments to study the impact of incentives on effective labor supply or effort (Lazear, 2000; Paarsch and Shearer, 1999; Nagin et al., 2002; Fehr and Goette, 2007; Goldberg, 2013; Shi, 2010; Al-Ubaydli et al., 2014; and many others). Our study adds evidence about the impact of temporary incentives. Fehr and Goette (2007) study the impact of a temporary change in the performance pay rate for bicycle messengers, but do not observe behavior in the absence of performance pay.
Temporary incentives are useful for shedding light on how performance pay interacts with non-monetary motivations, but they are also important to study in their own right. They potentially allow employers to add performance incentives at selected times, in response to shocks to the marginal value of worker effort. They also occur quite often in practice, e.g., in the form of temporary employee sales contests; total expenditures on such contests in the U.S. are estimated to have exceeded 26 billion dollars in 2000 (Lim et al., 2009). While there is some work in the business literature on the impact of temporary incentives on worker behavior, to date such analyses have typically not involved a control group, making interpretation difficult (see Lim et al., 2009).

3 Design of the Experiment

3.1 Work setting and nature of the task

The study took place during an afternoon street festival in a major U.S. city. The festival extended for six blocks of a major shopping street, which was closed to auto traffic. Businesses along the street, which included retailers as well as restaurants, operated booths during the festival that featured their products. The business mix was similar, and there were no differences in the types of activities available, across different areas of the festival. The festival lasted five hours. An estimated 60,000 people visited the festival, so there were large crowds present at all times. The workers used in our study were hired to assist a start-up company, but were directly employed by a marketing agency; the start-up contracted with the agency, which provides workers for promotional events.
The agency advertised the job using an online job site, offering a fixed wage of $18 per hour, and ultimately provided 39 workers.9 Any reputation concerns workers might have had would have been focused on the marketing agency and not the start-up; the interaction of the agency with the start-up was essentially one-shot, because the promotion was expected to be a one-time event. The workers were assigned the task of mingling with the crowd, and trying to convince attendees to sign up for the company’s database by sending a text. A text received by the company’s server automatically established a database entry linked to the individual’s cell phone number. Registration in the database would allow the company to send marketing materials about the service in the future, and also provided a measure of latent demand that was potentially relevant for appealing to venture capitalists. While there was a clear quantity dimension for output, there was less scope for quality variation. Workers were given only basic information about the product. Essentially their job was to get the customer to sign up in order to receive more detailed information. Thus, getting a customer to sign up reflected the key achievement for the worker. Our measure of individual worker output comes from the fact that the text sent by the customer included the unique ID number of the worker with whom they spoke. Each worker was given a laminated card containing their ID number and the content of the text that the customer needed to send, including the ID. We observe the precise time each text arrived on the start-up company’s server over the course of the festival. The work setting is attractive for studying non-monetary motivations because it has two features: (1) weak monitoring of worker output by the employer; (2) accurate monitoring of output by researchers. 
Clearly, in most work settings employers do not have perfect monitoring, which means that shirking is possible and non-monetary motivations of workers are important for output. In such settings, however, it may be even more difficult for researchers to obtain data on worker performance. In our study there is a “wedge” between monitoring by the employer and by researchers, which comes from the fact that the start-up provided the data on sign-ups to the researchers, but not to the marketing agency that was the direct employer of the workers. The fact that sign-up data was not shared with the agency was made clear to workers at the beginning of the festival, and the rationale given for the use of ID numbers was to allow the start-up “to track sales in different areas of the festival”. When performance pay was introduced, it was made clear that the calculation of payments, and the payments themselves, would come directly from the start-up. Thus, there was no contradiction between workers receiving performance pay and the agency not receiving the data on sign-ups. There were three managers from the marketing agency present during the festival, but workers could easily avoid visual monitoring by losing themselves in the huge crowd. They were also free to go up side streets to talk to customers, making it even easier to avoid observation and to explain ex post to management why they were not observed. In summary, while output was observable for the purposes of the study, workers had substantial latitude for shirking.

9 This was typical of the wages offered by the agency.

3.2 Treatment assignment

We randomly assigned workers to treatment and control the day before the festival, stratifying on information about age, gender, and previous experience provided by the marketing agency. Thus, these observables were balanced across treatment and control at the outset. On the day of the festival, workers arrived early for a training session.
At that point they received laminated cards with their ID numbers. The laminated card also told the worker where in the festival they would be working, and gave them a schedule of water breaks and a longer rest break. Importantly, the workers were used to being assigned to work groups. Indeed, the marketing agency routinely assigned workers randomly to work groups, stratified on characteristics, to maximize “marketing effectiveness” at promotion events. Therefore, the fact that workers were assigned to groups should not have raised any suspicion that an experiment was in progress (and no workers voiced such suspicions during the experiment). To avoid awareness that an experiment was taking place, and also to comply with the wishes of the start-up, we kept the treatment and control groups separate from each other throughout the whole festival. We randomly determined beforehand that treatment workers would be assigned to stay between 20th and 17th Street, while control workers would be told to stay between 17th and 14th Street. The rule about not crossing 17th Street was explained as being important in order to have “equal coverage” of the festival, and was enforced by having a manager of the marketing agency posted at 17th Street. During the festival, each group took water breaks, and a longer rest break, at tents at their respective ends of the festival. We describe the procedures regarding rest breaks in more detail below. The treatment assignment is summarized in Figure 1, using a map of the festival. The triangles marked “West” and “East” indicate the tents where treatment and control workers, respectively, came for water breaks. The “Middle” tent was another tent for the start-up but was not used by our workers.

Figure 1: Treatment and control locations [Map of the festival along the principal street, with cross streets from 20th Street to 14th Street. Treatment workers: between 20th and 17th Streets, West tent. Control workers: between 17th and 14th Streets, East tent. Mid tent: not used by study workers.]

Given the geographic separation of treatment and control workers during the festival, it is important to consider whether there could have been a failure of randomization, in the sense that the portion of the festival randomly assigned to treatment workers turned out to have different “characteristics” that mattered for the productivity of workers. There are several reasons why such differences are unlikely. First, the festival took place in a quite narrowly defined geographic area of the city, and the types of businesses and booths were similar along the six blocks of the festival. Second, crowds flowed into and out of the festival from all directions throughout the event, so there was no directional flow of potential customers that might give one end of the festival more “fresh” customers. While there was one fewer side street in the treatment group's area of the festival (19th Street ends where it meets the principal street), there was a pedestrian path instead, so there was no important difference in accessibility to customers. Third, sustained subjective observations by the experimenters indicated no discernible differences in the size or composition of the crowds at either end of the festival; the experimenters were at each end of the festival throughout the event, photographs of the crowd were taken hourly, and experimenters also switched ends of the festival periodically. Fourth, we check the robustness of the results to normalizing the estimates of the treatment effect by the initial baseline outputs of treatment and control workers. Thus, even if there were a difference in the number of customers available to workers at one end of the festival compared to the other, this is taken care of by the difference-in-difference.
Finally, although normalizing by baseline outputs would not correct for differential time trends in the availability of customers across the two halves of the festival, there were so many potential customers relative to workers (60,000 vs. 39) that even if such time trends were present, customer availability was unlikely to have been a binding constraint.

3.3 Timeline for the experiment

Figure 2 summarizes the timeline of the experiment. The first time interval, denoted baseline, started at 11:45. By that time workers had finished the training and positioned themselves in their respective assigned portions of the festival. A substantial crowd had already gathered even though the festival had not officially started, and workers began mingling with the crowd and getting sign-ups; the first sign-ups for treatment and control occurred about one minute apart. At that point the treatment group was unaware that performance pay would be introduced.

Figure 2: Timeline for the experiment

At 1:00 pm both groups checked in at their respective tents for a water break. Once everyone was present, a brief announcement was made to the treatment group. They were told: “For the next hour only there will be a special promotion from [start-up name here], where you get an extra $5 for every text that comes in with your ID number. This is on top of the $18. Note that this only lasts for the next hour, so any text that comes in after 14:00 will not count for the $5. There won’t be any promotions later in the day.” After the hour with performance pay, half of the treatment group and half of the control group went on break for 30 minutes. They rested in the shade during this time at their respective ends of the festival, eating sandwiches that were provided and being watched (unobtrusively) by a member of the research team. Subsequently these workers went back out, and the other half of treatment and control came in for a 30-minute break.
The staggered rest break was requested by the start-up, to maintain continuous brand presence on the street at all times. After the second group was done with their break, all workers were back on the street for the remaining two hours of the festival.

3.4 Questionnaire

After the experiment was completed, workers were contacted by the agency and asked to fill out an online questionnaire being conducted by researchers who were studying the sales data. Responses were entered anonymously, but included the worker’s ID number, to allow matching to the productivity data. It was explained in an informed consent screen that the agency would never learn the survey responses of any individual. Furthermore, the agency did not even know the content of the survey. The researchers would simply inform the agency which workers had completed the survey, and authorize the agency to pay the workers $15 for participation plus any additional earnings from the trust game, described below. Of the 39 workers, 34 completed the survey. The first part of the questionnaire measured a well-known personality framework from psychology, the “Big Five,” which consists of five traits: conscientiousness, extraversion, agreeableness, intellect, and emotional stability. We used a standard battery of questions for the elicitation.10 The questionnaire also asked a series of other questions, about demographics and about the experience of working at the festival. Particularly important were a series of questions about fatigue, and a question about non-monetary motivations. We provide the exact wording of all questions used in the analysis as we discuss the results (wording is sometimes given in a footnote). Respondents also participated in a modified version of the “trust game” developed by Berg et al. (1995), with other survey respondents. In this game a player can exhibit reciprocity by choosing to “return a favor” in a one-shot anonymous interaction, even though doing so entails a financial cost.11

10 The questions for the Big Five are available at http://ipip.ori.org/New_IPIP-50-item-scale.htm.

3.5 Sample characteristics and randomization check

Table 1 provides descriptive statistics on demographics for workers in our sample. The table shows that 58 percent of workers were female, and the average age was 25 (the youngest worker was 20 and the oldest was 40). About half of the workers were “veterans” according to the manager, and, based on questionnaire responses, a similar fraction were “experienced” in the sense of having participated in at least three previous promotion events. For 10 percent of workers, the first language was not English. About 80 percent of workers had completed at least some college education.

Table 1: Sample characteristics

                                    Sample statistic   Std. deviation
Mean age                                 25.23            (4.56)
Fraction female                           0.59            (0.50)
Fraction veteran                          0.54            (0.51)
Fraction experienced                      0.51            (0.51)
Fraction English second language          0.10            (0.31)
Fraction some college                     0.85            (0.37)

Notes: Based on 39 worker observations for Age, Female, and Veteran. Other statistics are based on the 34 survey respondents.

As discussed above, age, gender, and the veteran indicator were balanced across treatment and control ex ante. Furthermore, a test for randomization cannot reject that all of the worker characteristics in Table 1, as well as reciprocity and the Big Five personality traits, are balanced across the treatment and control groups. None of the variables is individually significant in a Probit regression of the treatment dummy on all observables, and the variables are also not jointly significant (Chi-square; p < 0.59). We also check the robustness of the treatment estimates to including worker fixed effects, which control for all time-invariant worker traits.

11 The game involved two players, each with an endowment of $10. The first mover could choose to keep the endowment, or pass all of it to the second mover. If the money was passed, it was tripled by the experimenter, so that the second mover received $30 in addition to his or her initial endowment. In the event that money was passed, the second mover had a binary choice: keep all of the money, or send back $20 to the first mover. Importantly, there was no financial motive for the second mover to send anything back. Respondents knew that it would be randomly determined who they would be matched with, among the other respondents, and which role they would play. They were asked to make a choice for both roles. We used role reversal because we use the second-mover decision as a binary measure of reciprocity. This is an incentive-compatible method to elicit the choices, as either one could end up being relevant for the respondent’s payoff. After survey responses were collected, the random matching of players was done. As explained to the subjects, we randomly selected five pairs to actually be paid, based on the combination of their choices. Payments were distributed along with the payments for participating in the survey. Subjects were reminded in the instructions that the agency did not know the content of the survey.

4 Behavioral Predictions

In this section we briefly discuss the predictions of three types of models: a simple canonical model, a modified canonical model that includes fatigue effects, and a model that includes crowding out of non-monetary motivations. The predictions follow from standard comparative statics exercises, so we relegate proofs to the online appendix. For simplicity, all models have just two periods, and workers maximize utility by choosing effort levels for each period.12 In period 1, control and treatment workers get a base wage w, but treatment workers also have a performance pay rate of z. In period 2, both groups of workers just get w.
In all models workers have convex costs of effort in each period, $c_t(e_t)$. In Model (2) we introduce a more complex cost function with fatigue spillovers. Although reputation concerns were likely rather weak in our work setting, we allow for them in a reduced-form way, to give canonical models a chance to predict non-zero effort in periods without performance pay. We assume that in each period there is a constant probability that a manager observes a worker. Conditional on being observed, a worker who chooses higher effort impresses the manager more and is more likely to be re-hired in the future. The product of the probability of observation and the benefit of impressing the manager is denoted $p(e_t)$, with $p(\cdot)$ increasing and concave. In all models we assume that the marginal utility of income is constant and unaffected by earnings, because the earnings accumulated during a few hours of work are trivial relative to lifetime income. We normalize the marginal utility of income to 1.

[12] There is no need for a baseline period, as any model predicts the same behavior for treatment and control in such a period. The worker's problem can be represented with a single choice variable, effort (leisure), in a time-separable utility function that is linear in income and convex in effort. Along the optimal path, this is equivalent to a standard model of intertemporal labor supply in which a worker has two choice variables, consumption and effort. Intuitively, the two-variable maximization problem can be reduced to a single-variable problem by substituting in the first-order condition for consumption. The convexity of effort costs in the resulting condition follows from the concavity of utility in consumption (see, e.g., Browning et al., 1985; Fehr and Goette, 2007).
Model (1): Canonical model. In the simplest canonical model, treatment and control workers maximize the utility functions $V_T$ and $V_C$:

$$V_T = z e_1 + w + p(e_1) - c(e_1) + w + p(e_2) - c(e_2) \tag{1}$$
$$V_C = w + p(e_1) - c(e_1) + w + p(e_2) - c(e_2) \tag{2}$$

Proposition 1: In a canonical model, treatment group workers work harder than control workers while performance pay is present, but choose the same effort level once performance pay is removed.

Treatment workers have a higher optimal effort in period 1 than control workers, because performance pay increases the marginal benefit of effort. The first-order conditions are $z + p'(e_1) = c'(e_1)$ for treatment and $p'(e_1) = c'(e_1)$ for control. In period 2, however, the maximization problem, and thus optimal effort, is identical for treatment and control workers. Because the marginal utility of income is assumed to be unaffected by piece-rate earnings, treatment workers have no greater taste for leisure in period 2 than control workers (there is no "standard intertemporal substitution" effect).

Model (2): Model with fatigue spillovers. Next, we modify the canonical model to allow for fatigue. In quite general terms, fatigue can be thought of as a stock that increases the marginal cost of effort. The stock should be higher if the worker chose high effort in the previous period, and lower or zero if the worker rested instead. Specifically, we assume the same convex cost function as in the canonical model, except that the cost of effort in period 2 also depends on a fatigue stock $k$: $c = c(e_2, k)$. We assume $\partial^2 c(e_2, k) / \partial e_2 \partial k > 0$, so that the marginal cost of effort in period 2 is increasing in the fatigue stock. We first consider the case with no rest break between periods 1 and 2, so that higher period 1 effort increases the fatigue stock in period 2: $\partial k(e_1) / \partial e_1 > 0$. Treatment and control workers maximize:

$$V_T = z e_1 + w + p(e_1) - c(e_1) + w + p(e_2) - c(e_2, k(e_1)) \tag{3}$$
$$V_C = w + p(e_1) - c(e_1) + w + p(e_2) - c(e_2, k(e_1)) \tag{4}$$

Proposition 2: With fatigue spillovers and no rest break, if treatment group workers exert more effort than control workers while performance pay is present, they reduce effort relative to control once performance pay is removed.

Workers are assumed to be forward looking and to take into account that period 1 effort makes it harder to exert effort in period 2. If financial incentives are strong enough, however, it makes sense for treatment workers to increase period 1 effort relative to control workers, even though this makes it harder to exert effort later on.[13] In this case treatment workers clearly have lower optimal effort than control workers in period 2, because their marginal cost of effort in period 2 is higher.

If there is a sufficient rest break between periods 1 and 2, however, higher effort in period 1 for treatment workers need not imply lower effort than control in period 2. "Sufficient" means a rest break long enough to reduce the fatigue stock to zero by the beginning of period 2. In this case $k(e_1) = 0$ and $\partial k / \partial e_1 = 0$, and we have the following proposition.

Proposition 3: With fatigue spillovers but a sufficient rest break following the performance pay episode, treatment group workers may exert more effort than control workers while performance pay is present, but the same effort as control workers after performance pay is removed.

With a sufficient rest break, the cost of effort in period 2 is just $c(e_2)$, and the worker's optimization problem is the same as in the canonical model. Thus, the model predicts equal effort for treatment and control group workers in period 2. It is an empirical question whether the work task had significant fatigue spillovers and whether the rest break was long enough to eliminate any fatigue stock built up through extra effort under performance pay.
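To make Propositions 1 to 3 concrete, the sketch below solves Models (1) and (2) numerically. It relies on hypothetical functional forms that are not part of the paper: $p(e) = \rho \ln(1+e)$, $c(e_1) = e_1^2/2$, $c(e_2, k) = (1+k) e_2^2/2$, and a linear fatigue stock $k(e_1) = \kappa e_1$. These satisfy the assumptions above: $p$ is increasing and concave, costs are convex, and $\partial^2 c / \partial e_2 \partial k = e_2 > 0$. Setting $\kappa = 0$ gives the canonical model, which is also the case of a sufficient rest break. The base wage $w$ is omitted because it does not affect effort choices. Period 2 effort has a closed form, and period 1 effort is found by a simple grid search over the forward-looking objective.

```python
import math


def period2_effort(k: float, rho: float) -> float:
    """Optimal period-2 effort given fatigue stock k.

    Solves the first-order condition rho / (1 + e) = (1 + k) * e, i.e. the
    maximizer of rho * ln(1 + e) - (1 + k) * e**2 / 2 (hypothetical forms).
    """
    return (-1.0 + math.sqrt(1.0 + 4.0 * rho / (1.0 + k))) / 2.0


def period2_value(k: float, rho: float) -> float:
    """Maximized period-2 payoff: reputation benefit minus effort cost."""
    e2 = period2_effort(k, rho)
    return rho * math.log1p(e2) - (1.0 + k) * e2 ** 2 / 2.0


def solve(z: float, kappa: float, rho: float = 1.0,
          e_max: float = 5.0, steps: int = 50_000) -> tuple[float, float]:
    """Return (e1, e2) for a forward-looking worker.

    z     -- period-1 piece rate (0 for control workers)
    kappa -- fatigue carried into period 2 per unit of period-1 effort,
             k(e1) = kappa * e1; kappa = 0 is the canonical model, or
             equivalently a sufficient rest break
    """
    best_e1, best_u = 0.0, -math.inf
    for i in range(steps + 1):  # grid search over period-1 effort
        e1 = e_max * i / steps
        u = (z * e1 + rho * math.log1p(e1) - e1 ** 2 / 2.0
             + period2_value(kappa * e1, rho))
        if u > best_u:
            best_e1, best_u = e1, u
    return best_e1, period2_effort(kappa * best_e1, rho)


if __name__ == "__main__":
    for kappa in (0.0, 0.5):
        (e1_c, e2_c), (e1_t, e2_t) = solve(0.0, kappa), solve(1.0, kappa)
        print(f"kappa={kappa}: control e1={e1_c:.3f} e2={e2_c:.3f}; "
              f"treatment e1={e1_t:.3f} e2={e2_t:.3f}")
```

With these illustrative parameters ($\rho = 1$, $z = 1$), setting $\kappa = 0$ yields identical period 2 effort for both groups, as in Propositions 1 and 3. Setting $\kappa = 0.5$ yields higher period 1 effort and lower period 2 effort for treatment, as in Proposition 2.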
We investigate this question in various ways in the analysis.[14]

[13] The marginal benefit of higher effort in period 1 needs to offset the marginal cost in period 1 as well as the increased marginal cost in period 2.

[14] The model also predicts a downward-sloping effort profile for both control and treatment workers. As shown in Goette and Huffman (2006), however, fatigue effects become more complex in a model with more than two periods. For example, forward-looking workers might exhibit a U-shaped effort profile if there are three or more periods. At the beginning of the day there is no fatigue stock, so marginal cost is low and the worker puts in some extra effort. During the middle of the work day the worker paces him or herself. Before a rest break, or before the end of the workday, the worker might increase effort again, knowing that the coming rest period wipes out the consequences of the accumulated fatigue stock. We abstract from such effects in the analysis and focus on more basic predictions of the fatigue model. For example, if treatment workers have higher effort in a given period, they should exert less effort than control workers in the next period, all else equal.

Model (3): Model with a negative impact of performance pay on non-monetary motivation. In this model we introduce non-monetary motivations to work hard in a reduced-form way, through an additional term in the utility function, $\theta$, which increases the marginal utility of effort. This could arise for various reasons, for example because the task is enjoyable or because working hard satisfies a social norm. To capture the crowding-out hypothesis, we assume that non-monetary motivation is lower if the worker is experiencing, or has recently experienced, performance pay: $\partial \theta_t / \partial z < 0$. Workers maximize:

$$V_T = \theta(z>0)\, e_1 + z e_1 + w + p(e_1) - c(e_1) + \theta(z>0)\, e_2 + w + p(e_2) - c(e_2) \tag{5}$$
$$V_C = \theta(0)\, e_1 + w + p(e_1) - c(e_1) + \theta(0)\, e_2 + w + p(e_2) - c(e_2) \tag{6}$$

Proposition 4: Given $-\partial \theta_t / \partial z < 1$, treatment group workers exert more effort than control workers while performance pay is present. After performance pay is removed, the effort of treatment workers is unambiguously lower than that of control workers.

Intuitively, non-monetary motivations are lower in period 1 if there is performance pay. This works against the positive effect of financial incentives, but if incentives are strong enough, output still increases in period 1. A necessary condition for an increase in period 1 effort is that the reduction in period 1 non-monetary motivation, $-\partial \theta_t / \partial z$, is less than the marginal utility of income, which here is normalized to 1. In period 2, the reduction in non-monetary motivation caused by the previous experience of performance pay unambiguously lowers the effort of treatment group workers relative to control group workers. Unlike in period 1, there are no offsetting financial incentives.

There are alternative ways to model the interaction of performance pay with worker psychology that would lead to predictions similar to those of Model (3). Our focus is not on disentangling the subtly different psychological mechanisms that generate such predictions. Rather, we check whether the basic prediction of the crowding-out hypothesis from psychology, lower output for treatment workers in post-incentive periods, is borne out in the data.[15]

5 Results

5.1 Treatment comparisons with raw data on total sign-ups

We begin the analysis with some simple calculations. Panel (a) of Table 2 shows the raw data on sign-ups, and Panel (b) calculates treatment differences in percentage and absolute terms. Panel (b) shows that output was somewhat lower for treatment than control in the baseline period, by 18 percent.
In the hour when incentives were introduced, however, output was 94 percent higher for treatment than control. In the hour immediately after incentives were removed, treatment output fell to roughly the level of control. In the final two hours, output was substantially lower in treatment than in control, by about 74 percent and 65 percent, respectively. Absolute differences in hourly output show the same qualitative features. Total output was substantially higher in treatment than control while incentives were active, but substantially lower in the post-incentive hours, particularly in the final two hours.

[15] To pick one example, one could assume that workers have an "output target," because they feel that social norms to do a decent job mean they "owe" the employer a certain amount of output. Suppose performance pay causes treatment workers to reach the target earlier in the festival. This requires output targets to be somewhat "sticky," so that they do not adjust to the circumstance of receiving performance pay. Treatment workers might then have weaker psychological motivation to keep working hard later in the festival, relative to control. This would lead to predictions similar to those of Model (3), but through a subtly different mechanism. Rather than reducing non-monetary motivations directly, performance pay causes workers to reach psychologically motivating goals more quickly. A related type of model, discussed in some previous research, involves workers in a piece-rate setting having a psychological motivation (loss aversion) not to fall short of a daily "income target" (e.g., Camerer et al., 1997; Koszegi and Rabin, 2006). Our setting is different because workers have a fixed wage at all times except during the incentive hour. In post-incentive hours, effort therefore did not translate into progress towards an income target, so motives related to income targets could not have affected the marginal incentive to exert effort. As in the canonical model, income targeting thus cannot explain a difference in post-incentive output between treatment and control.

The calculations so far treat the modest baseline difference as just an idiosyncratic shock. A more conservative approach is to treat the whole difference as a time-invariant difference in productivity and to calculate the treatment comparison as a difference-in-difference. In Panel (c) of Table 2 we first compute the difference-in-difference in percentage terms, subtracting the percent difference in baseline from the percent difference in later hours. The basic results are unchanged by this normalization. The treatment effect while incentives were active grows to 112 percent. The difference for the first post-incentive hour becomes modestly positive, 24 percent. In the final two hours there are still substantial negative differences, of 57 and 48 percent, respectively. On average over the three post-incentive hours, output was 27 percent lower for the treatment group.

An alternative way to compute the difference-in-difference is to subtract the absolute difference in baseline from the absolute differences in later hours. One issue with this approach is a floor effect. Because treatment starts from a lower level in baseline, there is simply "less room" for an absolute fall in output. Any absolute reduction in output for treatment versus control after the normalization is therefore a lower bound on what would be expected had treatment and control had more similar outputs in baseline. The normalization in percentage terms avoids such floor effects. As shown in Panel (c) of Table 2, computing the difference-in-difference in absolute terms yields a difference of 77 sign-ups during the incentive hour. In the first post-incentive hour, treatment has 8 more sign-ups than control.
For this latter calculation, it is important to subtract only half of the baseline difference in sign-ups, because only half as many workers were on the job as in the baseline hour. In the final two hours, treatment has 17 and 3 fewer sign-ups than control, respectively. On average over the three post-incentive hours, treatment output was 4 sign-ups per hour lower than control output. To put this amount in perspective, the drop implies a widening of the baseline gap in output by 25 percent.

The treatment differences in output for post-incentive hours are not consistent with the canonical model considered in the predictions section, Model (1), since treatment and control workers faced identical monetary incentives during those times. We turn next to assessing the statistical significance of the treatment differences.

5.2 Econometric models

In this section we use our data on the hourly output of individual workers to estimate econometric models and assess the statistical significance of treatment differences.

Table 2: Raw data on total sign-ups by hour and calculations of treatment differences

                                                  Baseline  Incentive hour  14:00-15:00  15:00-16:00  16:00-17:00  Ave. for post-incentive hours
(a) Total output by hour
    T                                                56          134            32           10            8
    C                                                68           69            30           39           23
(b) T vs. C
    Percent difference in outputs*                -0.18         0.94          0.07        -0.74        -0.65           -0.44
    Absolute difference in outputs                  -12           65             2          -29          -15             -14
(c) T vs. C normalizing by baseline
    % difference in t - % difference in baseline                1.12          0.24        -0.57        -0.48           -0.27
    Abs. difference in t - difference in baseline                 77            8†          -17           -3              -4

Notes: * Percent difference is (T_t - C_t)/C_t. † Half of the workers were on break at any given time during 14:00-15:00, so we subtract only half of the absolute difference in baseline sign-ups when normalizing the 14:00-15:00 output difference.

5.2.1 Estimating the treatment effect

For the estimations we aggregate worker sign-ups to the hour. Aggregating reduces noise in the effort measure, and an hourly basis is natural given the structure of the experiment. One minor complication is that the baseline period was slightly longer than one hour. For the estimation, we simply attribute the sign-ups generated in the brief period before the festival started to the baseline hour, applying the same procedure to treatment and control.[16]

Our preferred estimation method is negative binomial regression, because it deals econometrically with two features of the data. First, sign-ups are count data, which can take only non-negative integer values. Second, the distribution of sign-ups is skewed, with many observations of zero hourly sign-ups. Negative binomial regression models a Poisson-like process in which "success" is the occurrence of a sign-up, but it relaxes some restrictive assumptions of a standard Poisson regression.[17] As shown in Figure 3, the negative binomial distribution fits the empirical distribution well. Coefficients from this estimation approach are interpreted in percentage terms. The estimates thus compare treatment and control in percentage terms, as in Table 2, except that the regression analysis is at the worker level rather than in terms of total output. We also check that the estimation results are robust to alternative count-data methods. We show results for zero-inflated Poisson, but results are similar with standard Poisson or zero-inflated negative binomial.[18] We also estimate the treatment effect in terms of absolute levels rather than percentages, mirroring the approach in Table 2.
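The percentage and absolute comparisons in Panels (b) and (c) of Table 2 are simple arithmetic on Panel (a). The short sketch below reproduces them. The on-duty weights encode the half-worker adjustment for 14:00-15:00 described in the text, and the function and variable names are illustrative, not taken from the paper.

```python
# Panel (a) of Table 2: total sign-ups by hour
# (baseline, incentive hour, 14:00-15:00, 15:00-16:00, 16:00-17:00)
T = [56, 134, 32, 10, 8]
C = [68, 69, 30, 39, 23]
# Workers on the job relative to baseline: half were on break 14:00-15:00
ON_DUTY = [1.0, 1.0, 0.5, 1.0, 1.0]


def table2_panels(t, c, on_duty):
    """Return the Panel (b) and Panel (c) rows of Table 2."""
    pct = [(ti - ci) / ci for ti, ci in zip(t, c)]   # (T_t - C_t) / C_t
    absd = [ti - ci for ti, ci in zip(t, c)]        # T_t - C_t
    # Difference-in-difference for the hours after baseline
    pct_did = [p - pct[0] for p in pct[1:]]
    abs_did = [a - w * absd[0] for a, w in zip(absd[1:], on_duty[1:])]
    return pct, absd, pct_did, abs_did


def post_incentive_average(row):
    """Average over the last three (post-incentive) hours."""
    return sum(row[-3:]) / 3


pct, absd, pct_did, abs_did = table2_panels(T, C, ON_DUTY)
print([round(x, 2) for x in pct_did])   # -> [1.12, 0.24, -0.57, -0.48]
print(abs_did)                          # -> [77.0, 8.0, -17.0, -3.0]
print(post_incentive_average(abs_did))  # -> -4.0
```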
We run Tobit regressions, whose coefficients give the absolute change in sign-ups for a change in an independent variable while accounting econometrically for the mass of the distribution at zero.

[16] Only 13 additional sign-ups were generated in this earlier time interval. The procedure tends to inflate performance in the baseline slightly, but it is applied to treatment and control in the same way and thus does not affect the treatment comparison. We find similar results if we instead include an additional time category for each worker, capturing the sign-ups before the festival started.

[17] Standard Poisson regression imposes the assumption that the mean of the dependent variable equals its variance. Negative binomial regression allows for a variance greater than the mean, which matches our skewed distribution with a mass at zero and a long right tail.

[18] Zero-inflated Poisson is a Poisson regression with added flexibility to account for the frequency of zeros in the data. We have no variables that plausibly affect whether sign-ups are positive but not the number of sign-ups, so the equation for zero sign-ups contains only a constant term. Zero-inflated negative binomial is similar, modeling zeros separately from non-zero observations.

[Figure 3: Empirical distribution of total worker sign-ups vs. fitted negative binomial distribution. Two panels, "Empirical distribution" and "Fitted negative binomial distribution"; x-axis: count of hourly sign-ups (0 to 22); y-axis: fraction (0 to .5).]

Our main treatment estimation is based on the following regression model:

$$s_{it} = \gamma_1 h_1 + \dots + \gamma_5 h_5 + \phi_1 h_1 \cdot T + \dots + \phi_5 h_5 \cdot T + \varepsilon_{it} \tag{7}$$

The dependent variable $s_{it}$ is the number (count) of sign-ups for worker $i$ in hour $t$. There is no constant term.
The variables $h_1, \dots, h_5$ are dummy variables for work hours 1 through 5, and $h_1 \cdot T, \dots, h_5 \cdot T$ are interaction terms between the hour dummies and a treatment dummy $T$. The $\phi$ coefficients on the interaction terms show the effect of interest: the percent (absolute, in the case of Tobit) difference in sign-ups per worker between treatment and control in each hour of the festival. Standard errors in this and all other regressions are robust and corrected to allow for arbitrary correlation of the error term within worker. To estimate the difference-in-difference version of the treatment effect, we run the following regression:

$$s_{it} = \beta + \alpha C + \gamma_2 h_2 + \dots + \gamma_5 h_5 + \phi_2 h_2 \cdot T + \dots + \phi_5 h_5 \cdot T + \varepsilon_{it} \tag{8}$$

With negative binomial and zero-inflated Poisson, this corresponds to normalizing by the percent difference in baseline outputs. With Tobit, it involves normalizing by the absolute difference in baseline outputs. The dependent variable is again the count of worker sign-ups in an hour. The regression now includes a constant term, $\beta$, and omits the dummy variable for the baseline period. The coefficient $\alpha$ on the control-group dummy $C$ captures the difference in baseline sign-ups for control versus treatment. The $\phi$ coefficients on the interaction terms are again the coefficients of interest. With baseline as the omitted category, they now show the difference in sign-ups between treatment and control workers in hours 2 through 5, normalized by the baseline difference. We also check robustness to including worker fixed effects that control for all time-invariant worker traits.

5.2.2 Treatment estimates

Column (1) of Table 3 shows the results of our negative binomial regression. The coefficients on the interaction terms show that the average worker has lower output in treatment than in control during the baseline period, but the difference is not statistically significant (p < 0.48).
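Equation (7) contains one parameter per group-by-hour cell, and equation (8) reparameterizes the same ten cells. As a result, Poisson and negative binomial (NB2) maximum likelihood simply set each fitted mean equal to the cell's sample mean. The count-model point estimates are then differences of log means, and clustered standard errors leave them unchanged. The sketch below exploits this to recover the Column (1) and Column (4) point estimates from the Table 2 totals. It assumes 20 treatment and 19 control workers. That split is our assumption, since it is not reported in this section, although it is consistent with the 39 workers and reproduces the two-decimal estimates. The shortcut does not carry over to the zero-inflated Poisson, Tobit, or fixed-effects columns.

```python
import math

# Panel (a) of Table 2 (total sign-ups by hour)
TOTAL_T = [56, 134, 32, 10, 8]
TOTAL_C = [68, 69, 30, 39, 23]
# ASSUMPTION: 20 treatment and 19 control workers (not stated in this section)
N_T, N_C = 20, 19


def saturated_fit(total_t, total_c, n_t, n_c):
    """Point estimates of eqs. (7) and (8) for a saturated count model.

    Each fitted cell mean equals the cell's sample mean, so every
    coefficient is a difference of log per-worker means.
    """
    log_t = [math.log(x / n_t) for x in total_t]
    log_c = [math.log(x / n_c) for x in total_c]
    phi7 = [lt - lc for lt, lc in zip(log_t, log_c)]
    eq7 = {"gamma": log_c, "phi": phi7}           # hour dummies, T x hour
    eq8 = {
        "beta": log_t[0],                           # treatment baseline
        "alpha": log_c[0] - log_t[0],               # control dummy C
        "gamma": [lc - log_c[0] for lc in log_c[1:]],
        "phi": [p - phi7[0] for p in phi7[1:]],     # normalized by baseline
    }
    return eq7, eq8


eq7, eq8 = saturated_fit(TOTAL_T, TOTAL_C, N_T, N_C)
print([round(p, 2) for p in eq7["phi"]])  # -> [-0.25, 0.61, 0.01, -1.41, -1.11]
print([round(p, 2) for p in eq8["phi"]])  # -> [0.86, 0.26, -1.17, -0.86]
```

The second line also shows that, in this saturated setting, each difference-in-difference coefficient is the corresponding equation (7) coefficient minus the baseline one.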
Output is significantly higher for treatment than control, however, in the hour when incentives were active (p < 0.06). Output is not significantly different in the first post-incentive hour, but in each of the final two hours output in treatment is significantly lower than in control (p < 0.01; p < 0.09). Column (2) shows that results are similar using an alternative count-data estimation method, zero-inflated Poisson regression.

Column (3) of Table 3 reports results of a Tobit regression, which compares treatment and control in terms of absolute levels. Only half the workers were working at any given time during the first post-incentive hour. To reflect this, we double each worker's output for the break hour before running the regression. This has no impact on the qualitative results or statistical significance, but it means that the coefficient for the break hour more accurately reflects the absolute magnitude of output per worker in that hour.[19] The coefficients on the interaction terms show that the positive difference in the incentive hour is not quite statistically significant (p < 0.11). The drop in output for the fourth hour is statistically significant (p < 0.04), and the negative coefficients for the fourth and fifth hours are jointly significant (chi-square test, p < 0.09).

[19] We do not actually observe the counterfactual output with double the number of workers, but doubling observed output seems a reasonable assumption.

Table 3: Econometric models for the treatment effect

                               (1)        (2)         (3)        (4)        (5)        (6)         (7)
Baseline period
  12:00-13:00 * Treatment     -0.25       0.20       -2.20
                              (0.35)     (0.30)      (1.48)
Incentive period
  13:00-14:00 * Treatment      0.61*      0.60**      3.25       0.86**     1.24***    0.76**      5.93***
                              (0.33)     (0.29)      (2.03)     (0.36)     (0.39)     (0.39)      (2.01)
Post-incentive hours
  14:00-15:00 * Treatment      0.01       0.50       -1.05       0.26       0.33       0.27        1.04
                              (0.45)     (0.39)      (2.33)     (0.39)     (0.42)     (0.38)      (2.09)
  15:00-16:00 * Treatment     -1.41***   -1.52***    -3.72**    -1.17***   -0.97**    -1.32***    -3.39*
                              (0.52)     (0.50)      (1.81)     (0.43)     (0.48)     (0.41)      (2.04)
  16:00-17:00 * Treatment     -1.11*     -1.40*      -2.94      -0.86      -0.73      -1.30*      -2.62
                              (0.64)     (0.72)      (2.10)     (0.57)     (0.61)     (0.71)      (2.36)
Baseline period
  12:00-13:00                  1.28***    1.33***     3.40***
                              (0.15)     (0.14)      (0.64)
Incentive period
  13:00-14:00                  1.29***    1.52***     2.89**     0.01      -0.13       0.04       -0.52
                              (0.24)     (0.22)      (1.15)     (0.25)     (0.29)     (0.27)      (1.11)
Post-incentive hours
  14:00-15:00                  0.46*      0.83***     1.75      -0.82***   -0.91***   -0.66**     -1.48
                              (0.25)     (0.22)      (1.31)     (0.27)     (0.30)     (0.27)      (1.27)
  15:00-16:00                  0.72**     1.33***    -0.04      -0.56      -0.73**    -0.40       -3.22**
                              (0.33)     (0.29)      (1.34)     (0.34)     (0.36)     (0.34)      (1.29)
  16:00-17:00                  0.19       1.04**     -2.12      -1.08**    -1.19***   -0.50       -4.86***
                              (0.42)     (0.47)      (1.58)     (0.44)     (0.45)     (0.65)      (1.45)
Control                                                          0.25      -1.93***   -1.56***    -9.27***
                                                                (0.35)     (0.20)     (0.24)      (1.12)
Constant                                                         1.03***    2.42***    2.11***    10.42***
                                                                (0.14)     (0.16)     (0.92)
Observations                   195        195         195        195        195        195         195
Worker fixed effects           No         No          No         No         Yes        Yes         Yes
Estimation method              N. Bin.    Z. In. Poi. Tobit      N. Bin.    N. Bin.    Z. In. Poi. Tobit

Notes: Coefficients for negative binomial and zero-inflated Poisson give the percent change in the frequency of sign-ups for a one-unit change in the independent variable. With the constant omitted from the negative binomial and zero-inflated Poisson models (Columns 1 and 2), the coefficients for hour dummies give the log of sign-ups. Coefficients for Tobit are in terms of the absolute level of sign-ups. Robust standard errors in parentheses, adjusted for clustering on worker. ***, **, * indicate significance at the 1-, 5-, and 10-percent level, respectively.

In Columns (4) to (7) we investigate the difference-in-difference estimates. Column (4) shows the normalized treatment effect without worker fixed effects.
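As a rough check on the individual p-values quoted for Table 3, the following sketch computes two-sided p-values from a reported coefficient and its standard error, using a normal approximation. Because the inputs are the rounded two-decimal values from the table, the results are only approximate and may differ slightly from the authors' reported bounds. Joint chi-square tests would also need the covariance between estimates, which is not reported, so they are not covered here. The dictionary of Column (4) interaction terms is transcribed from Table 3.

```python
import math


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, normal approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


# Column (4) of Table 3: interaction terms as (coefficient, standard error)
COLUMN_4 = {
    "13:00-14:00 * Treatment": (0.86, 0.36),
    "14:00-15:00 * Treatment": (0.26, 0.39),
    "15:00-16:00 * Treatment": (-1.17, 0.43),
    "16:00-17:00 * Treatment": (-0.86, 0.57),
}

for term, (b, se) in COLUMN_4.items():
    print(f"{term}: z = {b / se:+.2f}, p = {two_sided_p(b, se):.3f}")
```

For the first three terms these approximations fall within the bounds stated in the text (p < 0.02, p < 0.51, and p < 0.01).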
The treatment difference is statistically significant for the incentive hour (p < 0.02). The positive output difference in the first post-incentive hour is not statistically significant (p < 0.51). The drop for treatment in the fourth hour is significant (p < 0.01), and the coefficients for the fourth and fifth hours are jointly significant (chi-square test, p < 0.02). Columns (5) and (6) show that results are similar with worker fixed effects, using either negative binomial or zero-inflated Poisson regression. In Column (7) we use Tobit to estimate the treatment difference after subtracting the absolute difference in baseline. By adjusting for censoring at zero, we address the floor effect issue econometrically. This regression includes worker fixed effects. The positive normalized difference in the incentive hour is statistically significant (p < 0.01), whereas the modest positive difference in the first post-incentive hour is not. There is again a large negative difference for the fourth hour (p < 0.09).

In summary, we find a statistically significant positive effect of incentives while they are active, and significant negative effects for the last two post-incentive hours. We turn below to explanations based on fatigue effects or on changes in non-monetary motivations.

5.3 Investigating the role of fatigue

As discussed in the behavioral predictions, fatigue spillovers might explain a reduction in post-incentive output for treatment workers relative to control, provided the rest break was not sufficient to eliminate differences in fatigue. The finding that treatment had the same or greater output than control in the first post-incentive hour, however, seems inconsistent with the fatigue model. Treatment workers had higher output and effort during the incentive hour than control workers.
In a model with fatigue effects, the treatment workers who kept working in the first half-hour after the incentive period should have had a higher marginal cost of effort than their counterparts in the control group, and thus lower effort. The observed behavior is contrary to this prediction. We investigated two ways in which the fatigue model in the predictions section might be augmented to explain this anomaly. One is that workers might have made a mistake in keeping track of time and thought that they still had the bonus incentive in the first part of the half-hour. Another is that workers could have been in conversations with customers in the last few minutes of the incentive hour, with the resulting output not realized until the early part of the next half-hour. Both mechanisms would imply a particularly large "spike" in output right after the 14:00 cutoff. Worker output in five-minute intervals during the break hour provides little support for these explanations. There is a spike in output in the first five minutes for treatment workers, but control workers show spikes of exactly the same size after 15 and 30 minutes. Treatment workers also show spikes of similar size at 35 and 40 minutes. Thus, the disaggregated output data do not support an explanation in which the first five to ten minutes of the hour are special for treatment workers. The follow-up questionnaire also shows that 95 percent of treatment workers knew exactly when it was 14:00 and the incentive hour ended. Thus, treatment workers appear to have exerted similar or perhaps even more effort than control workers in the hour following incentives, which seems inconsistent with treatment workers being more fatigued.
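The five-minute comparison described above amounts to binning each group's sign-ups by time elapsed since 14:00 and comparing the resulting profiles. Here is a minimal sketch, assuming (hypothetically) that each sign-up is recorded with a time stamp. The date and times below are made up for illustration and are not data from the experiment.

```python
from collections import Counter
from datetime import datetime, timedelta


def five_minute_profile(signup_times, start, minutes=60, width=5):
    """Count sign-ups in consecutive `width`-minute bins after `start`.

    Sign-ups outside [start, start + minutes) are ignored.
    """
    counts = Counter()
    for ts in signup_times:
        offset = (ts - start) / timedelta(minutes=1)  # minutes since start
        if 0 <= offset < minutes:
            counts[int(offset // width)] += 1
    return [counts[b] for b in range(minutes // width)]


# Hypothetical time stamps for one group during the 14:00-15:00 break hour
start = datetime(2000, 1, 1, 14, 0)
stamps = [start + timedelta(minutes=m) for m in (1, 3, 4, 17, 31, 59)]
stamps.append(start - timedelta(minutes=2))  # 13:58, still in the incentive hour
print(five_minute_profile(stamps, start))
# -> [3, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
```

Running the function separately on treatment and control sign-ups gives the two profiles whose spikes are compared in the text.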
If workers who achieved an especially large number of sign-ups during the incentive hour had an especially large drop in output later in the festival, this would be consistent with the fatigue explanation.[20] For treatment group workers, we estimated a regression explaining hourly sign-ups in the last three hours of the festival relative to baseline. We included a dummy variable for above-median sign-ups during the incentive hour, dummy variables for hours 3, 4, and 5, and interactions between the hour dummies and the above-median indicator. The interaction terms are all far from significant. Workers with a particularly large number of sign-ups during the incentive hour therefore did not have an especially large drop later, and we find no evidence that a higher number of sign-ups generated strong fatigue spillovers.

Effort costs are typically unobserved and thus a degree of freedom in economic models. Our study is unusual in that we have some direct measures bearing on whether fatigue spillovers were an important factor. Specifically, the questionnaire asked workers to rate their level of fatigue on a five-point scale for 1:00, 3:00, and 5:00 during the festival. There was no significant difference in fatigue between the two groups at any of the three times (Mann-Whitney; p < 0.19, p < 0.71, p < 0.36). If anything, treatment group workers start out more tired than control workers before the incentive hour begins, and then report slightly lower levels of fatigue than the control group at later points. Notably, both treatment and control workers report increasing fatigue over the course of the festival. These findings are consistent with workers becoming tired mainly because of the passing hours of the day rather than the number of sign-ups achieved.

[20] A particularly high number of sign-ups could also partially reflect luck rather than effort.
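The group comparisons of fatigue ratings above rely on the Mann-Whitney test. For readers unfamiliar with it, here is a minimal sketch of the two-sided version with the usual normal approximation and tie correction. Tie correction matters for five-point ratings, where ties are common. This is a generic textbook implementation, not the authors' code, and the ratings in the example are hypothetical. Statistical packages may instead use exact distributions or continuity corrections, so small-sample p-values can differ somewhat.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie correction.

    Returns (U for sample x, p-value). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:  # every observation identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical five-point fatigue ratings (1 = not tired, 5 = exhausted)
treatment = [2, 3, 3, 4, 2, 3, 4, 5]
control = [3, 3, 4, 4, 2, 5, 4, 3]
print(mann_whitney(treatment, control))
```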
The questionnaire also asked treatment group workers a question about fatigue and performance pay: "After having rested during the lunch break, did you still feel tired (mentally or physically) from your work during the time of the $5 bonus?" Almost all workers, 85 percent, answered either "N.A. because I never got tired from the work I did between 1 pm and 2 pm." or "The lunch break was sufficient for me to feel refreshed." Only 15 percent chose "I was still tired from the work I did during the time of the $5 bonus (between 1 pm and 2 pm), even after the lunch break." These responses indicate that almost all workers either were not fatigued at all by the incentive hour or found the rest break sufficient to feel refreshed. As another check, we exclude all workers who reported still being tired after the rest break and re-estimate our difference-in-difference model of the treatment effect using the remaining, non-fatigued workers. The results are essentially unchanged relative to the whole sample. The increase in output during the incentive hour and the subsequent large drop in the fourth hour are both still large and statistically significant (negative binomial regression; p < 0.06; p < 0.02). The drop for the final two hours is also still jointly significant (chi-square test; p < 0.06).[21]

5.4 Investigating changes in non-monetary motivations

The treatment differences for post-incentive hours are not consistent with the canonical model, and fatigue effects do not appear to be a leading explanation. We therefore turn to the hypothesis that performance pay affects workers' non-monetary motivations. We used the questionnaire to elicit direct (albeit self-reported) evidence on whether performance pay affected non-monetary motivations.
Specifically, the questionnaire asked treatment workers about a possible link between performance pay and task enjoyment 21 The coefficients and standard errors for the interaction terms 13:00-14:00*Treatment . . . 16:0017:00*Treatment are as follows: 0.79* (0.42); 0.10 (0.38); -1.16** (0.49); -0.79 (0.63). 27 in the post-incentive period: “How did the experience of getting a $5 bonus per text (between 1 and 2 pm) affect your enjoyment of the work later in the day?” There were four possible response categories, relating to fatigue, reduced non-monetary motivations, no impact, and increased non-monetary motivations.22 We included fatigue as one of the response categories to distinguish this form of discomfort from other sources of reduced task enjoyment.23 We find that 53 percent of the workers found working less fun in the post-incentive hours due to the experience of the bonus. Interestingly, however, about 21 percent said that the experience of performance pay actually increased enjoyment of work in the postincentive period. Only 16 percent of workers said that they experienced fatigue, and 10 percent said the bonus had no impact on subsequent task enjoyment. These survey responses are consistent with the view that performance pay affects the non-monetary motivations of workers. The response of the majority is in line with the hypothesis that performance pay can have negative psychological effects, and is consistent with the overall lower output for treatment than control during the post-incentive ours. On the other hand, the survey responses also provide an indication that heterogeneity may be important: For a substantial minority, a more appropriate model might be one in which non-monetary motivations are increased by performance pay, i.e., ∂θ ∂z > 0. The time profile for aggregate output is also not fully consistent with the crowding out hypothesis. 
Specifically, if treatment workers have a reduction in non-monetary motivations as soon as performance pay is introduced, they should exhibit lower output than control group workers already in the first hour after performance pay is removed. Instead, we observe similar or even greater output for treatment relative to control in the first post-incentive hour, with the large drop in output for treatment workers occurring in 22 The exact wording for the response categories is as follows: 1. “Because I worked a lot harder during the hour with the bonus, I was very tired and found working later on painful due to fatigue.” 2. “Getting the bonus made the work seem less fun later on, when there was no bonus.” 3. “Getting the bonus had no impact on my enjoyment of the work later on.” 4. “Getting the bonus made the work seem more fun later on even though there was no bonus.” 23 The question focused on motivations in the post-incentive period because it is difficult to ask workers to disentangle monetary and non-monetary motivations while incentives are active. We did not ask how motivation in post-incentive hours changed relative to motivation in baseline, in light of the intervening performance pay, to avoid the question becoming too complex. 28 the final two hours. Thus, while performance pay may affect non-monetary motivations, the effect is more complex than the standard crowding out hypothesis would suggest. One explanation could be that the psychological effects of experiencing performance pay take some time to emerge. The potential presence of individual heterogeneity, however, makes it more complex to interpret the shape of the aggregate output profile. We discuss possible composition effects below. To shed further light on the role of non-monetary motivations, and potential heterogeneity, we took three traits selected ex ante as possible carriers of non-monetary motivations and interacted these with the treatment effect. 
We selected the personality trait conscientiousness because of evidence that it explains working hard on laboratory tasks even in the absence of incentives, and because it tends to be correlated with positive labor market outcomes (Judge et al., 1999; Segal, 2006). We took the personality trait extraversion as a potential measure of intrinsic task enjoyment, as extraverts report liking to do things like “talk to strangers at parties,” which is analogous to approaching strangers at the festival.[24] We selected positive reciprocity because of laboratory evidence that reciprocal subjects tend to reward generous payments even if rewarding is costly for them (see Fehr and Falk, 2002). In the incentivized trust game conducted with our questionnaire, respondents had a binary choice to return a favor or not; we use this choice as a binary indicator of positively reciprocal tendencies. We estimated an econometric model in which we interacted the difference-in-difference with worker traits.[25] Figure 4 summarizes the key results. Each panel plots coefficients showing how moving along the dimension of a given trait changes the treatment difference (we report the regression results underlying the figure in the Appendix). To facilitate comparison of coefficients across traits we use standardized measures of the traits; coefficients give the impact of a one standard deviation change in the trait on the normalized treatment effect, except for reciprocity, where they give the impact of the dichotomous change. The top panel of Figure 4 shows no change in the treatment effect for the incentive hour for conscientious workers, but a statistically significant change in the treatment difference in the negative direction for post-incentive hours, i.e., an especially strong drop in output (p < 0.001; p < 0.001; p < 0.002). For extraverted and reciprocal types, by contrast, the treatment effect changes significantly in the positive direction. Among reciprocal workers there was a significantly stronger positive response during the incentive period (p < 0.001). There was also a significant change in the treatment difference in the positive direction for the fourth hour (where the aggregate profile shows the strongest output drop; p < 0.001). Among more extraverted workers there was a consistent, statistically significant change of the treatment effect in the positive direction for all post-incentive hours (p < 0.001; p < 0.002; p < 0.001). Thus, the treatment difference changes in different directions for extraverted and reciprocal workers than for conscientious workers. The other panels of Figure 4 show that various other traits had little bearing on the response to the treatment. We find little relationship between the treatment effect and the other personality traits in the Big Five.

[24] The standard personality inventory we used involved ten items for each of the five personality traits. Respondents indicated how well each item described them as a person using a five-point scale. The exact wording of all items is available at http://ipip.ori.org/New_IPIP-50-item-scale.htm.

[25] Specifically, we estimate the following regression equation:

$$s_{it} = \beta + \alpha C + \gamma_2 h_2 + \dots + \gamma_5 h_5 + \phi_2 h_2 \cdot T + \dots + \phi_5 h_5 \cdot T + \theta_{2k} h_2 \cdot \text{Trait}_k + \dots + \theta_{5k} h_5 \cdot \text{Trait}_k + \alpha_{2k} h_2 \cdot T \cdot \text{Trait}_k + \dots + \alpha_{5k} h_5 \cdot T \cdot \text{Trait}_k + \text{Trait}_k + \varepsilon_{it}$$

The $\alpha_{tk}$ are the coefficients of interest, showing whether there is a statistically significant change in the difference-in-difference at a given time $t$ as trait $k$ changes. Each panel in the figure shows results from a regression including the three traits simultaneously, and plots the $\alpha$ coefficients.
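To make the structure of the regression in footnote 25 concrete, the sketch below builds the right-hand-side variables for a single worker-hour. It is a hypothetical illustration: the names `standardize` and `design_row` are ours, the $\alpha C$ term is left out because $C$ is not defined in this section, and the choice of the population standard deviation for standardizing is an assumption. The paper's estimates come from a negative binomial model, which this sketch does not fit.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Z-scores using the population standard deviation (an assumption; the
    text does not say which s.d. is used). A binary trait such as the
    reciprocity indicator would be left as 0/1 instead."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def design_row(hour, treated, traits):
    """Regressors for one worker-hour, mirroring the structure of footnote 25.

    hour:    1..5, where hour 1 is the omitted baseline hour
    treated: 1 for the treatment group, 0 for control
    traits:  dict mapping a trait name to the worker's (standardized) score
    """
    row = {"const": 1.0}
    for t in range(2, 6):
        h = 1.0 if hour == t else 0.0
        row[f"h{t}"] = h                      # gamma_t
        row[f"h{t}*T"] = h * treated          # phi_t
        for name, x in traits.items():
            row[f"h{t}*{name}"] = h * x               # theta_tk
            row[f"h{t}*T*{name}"] = h * treated * x   # alpha_tk
    for name, x in traits.items():
        row[name] = x                         # trait main effect
    return row


# Hypothetical worker: treatment group, conscientiousness 1 s.d. above the mean.
row = design_row(hour=3, treated=1, traits={"conscientiousness": 1.0})
print(row["h3*T*conscientiousness"], row["h2*T*conscientiousness"])  # prints 1.0 0.0
```

Only the triple interaction for the current hour is non-zero, which is why each $\alpha_{tk}$ picks up how the hour-$t$ treatment difference shifts with trait $k$.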
The treatment also did not interact with non-psychological factors corresponding roughly to motives in the canonical or fatigue models, namely self-reported reputation concerns or self-reported fatigue.[26] We also asked workers whether they experienced a process of learning by doing, to check whether positive effects of the treatment in post-incentive hours might reflect a greater accumulation of skills for treatment workers, but we see little evidence of this in Figure 4.[27] Although the prediction that conscientiousness, extraversion, and reciprocity would be particularly important was formed ex ante, one might still be concerned that, with nine traits and four coefficients for each trait, some coefficients could be statistically significant just by chance. We performed a conservative Bonferroni correction for multiple hypothesis testing and find that almost all coefficients for conscientiousness, reciprocity, and extraversion remain statistically significant (none of the coefficients for the other traits are statistically significant).[28]

[26] We measured reputation concerns by asking: “Did you try to get a lot of people to send texts to your number, because you thought this would give you a good reputation with [the agency name here], and make it likely that you would be hired again in the future?” Answers were on a six-point scale. We elicited self-reported fatigue at 15:00 as an indicator for the fatigue level of the worker (this question is described above).

[27] To capture learning by doing we asked how well the following statement captured the worker's experience at the festival: “There was a learning process for me: Over time, I learned how to be more successful in convincing people to send texts.” Answers were on a six-point scale.

[28] Using the 36 coefficients in Figure 4 as the number of comparisons, the adjusted threshold for statistical significance is 0.05/36 ≈ 0.0014. Two coefficients, one for conscientiousness and one for extraversion, are no longer significant but are close to the threshold (p < 0.002; p < 0.002).

Figure 4: Change in the normalized treatment effect by worker traits
[Figure: three panels, each plotting three traits (top: Conscientious, Extraverted, Reciprocal; middle: Intellect, Agreeableness, Emotional stability; bottom: Reputation concerns, Learning by doing, Fatigue). Horizontal axis: Time of day (13:00-14:00, 14:00-15:00, 15:00-16:00, 16:00-17:00). Vertical axis: Percent change.]
Negative binomial estimates. Coefficients show the impact on the normalized treatment effect of a one s.d. increase in the trait, or the dichotomous change in the case of reciprocity. The normalized treatment effect is the treatment difference net of the baseline difference in output. The questionnaire measured the Big Five personality traits, as well as self-reported reputation concerns, learning by doing, and fatigue as of 15:00 (footnotes in text give exact wordings). Error bars show +/- one standard error. Standard errors adjusted for clustering on worker.

Figure 5: Normalized treatment effect by source of non-monetary motivation
[Figure: three panels (Conscientious, Extraverted, Reciprocal). Horizontal axis: Time of day (13:00-14:00 to 16:00-17:00). Vertical axis: Percent change.]
Negative binomial estimates. Coefficients show the normalized treatment effect associated with a one standard deviation increase in the trait, or the binary change in the case of reciprocity. Error bars show +/- one standard error. Standard errors adjusted for clustering on worker.
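The Bonferroni adjustment in footnote 28 amounts to dividing the 5 percent level by the number of comparisons. A minimal sketch follows; the constants restate the counts given in the text, while the p-values in the test are hypothetical.

```python
N_TRAITS = 9  # traits plotted in Figure 4
N_COEFS = 4   # interaction coefficients per trait (13:00-14:00 ... 16:00-17:00)


def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_comparisons


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values survive a Bonferroni correction over the whole family."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


m = N_TRAITS * N_COEFS
print(m, round(bonferroni_threshold(0.05, m), 5))  # prints: 36 0.00139
```

The correction makes no assumption about dependence among the 36 tests, which is what makes it conservative: it controls the family-wise error rate at the cost of power.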
Having seen that the treatment effect changes for conscientious, extraverted, and reciprocal workers, we investigate whether these changes imply qualitatively different treatment effects. Figure 5 shows the difference-in-difference profile taking into account the effect of a high score on a given trait. Here we focus on one trait at a time and plot the sums of the relevant coefficients (the basic difference-in-difference profile plus the change caused by having a high score on the trait), with appropriately calculated standard errors (we provide the regression results underlying the figure in the Appendix).[29]

[29] We estimate the following regression for each trait:

$$s_{it} = \beta + \alpha C + \gamma_2 h_2 + \dots + \gamma_5 h_5 + \phi_2 h_2 \cdot T + \dots + \phi_5 h_5 \cdot T + \theta_2 h_2 \cdot \text{Trait}_k + \dots + \theta_5 h_5 \cdot \text{Trait}_k + \alpha_2 h_2 \cdot T \cdot \text{Trait}_k + \dots + \alpha_5 h_5 \cdot T \cdot \text{Trait}_k + \text{Trait}_k + \varepsilon_{it}$$

The figure plots $\phi_t + \alpha_t$ for each hour, together with the standard error of this sum of random variables.

We see in Figure 5 that treatment workers had a statistically significant positive response to the treatment when incentives were active, regardless of whether they were conscientious, extraverted, or reciprocal, although the magnitude was especially large for reciprocal types. For subsequent hours, the stronger negative treatment effect for conscientious types translated into an immediate drop in output, albeit not as strong a drop as in later hours. Thus, the model of crowding out considered in the predictions section, with an immediate and negative psychological impact of performance pay, provides a relatively better description of behavior for conscientious workers. By contrast, among reciprocal and extraverted workers the difference-in-difference profile is qualitatively different from the aggregate profile. The different treatment effects for these types translated into positive output differences for the hour or two after incentives, and no relative drop during later hours.
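Footnote 29 states that Figure 5 plots $\phi_t + \alpha_t$ with the standard error of a sum of random variables, and the notes to Table A2 state that significance is based on a Chi-squared test. A minimal sketch of both steps follows, applied to column (1) of Table A2 at 13:00-14:00. Table A2 does not report the covariance between the two estimates, so the sketch backs out the covariance implied by the rounded standard errors; that value is an illustrative assumption, not the authors' estimate.

```python
import math


def se_of_sum(se_a, se_b, cov_ab):
    """Standard error of a + b: sqrt(Var(a) + Var(b) + 2 Cov(a, b))."""
    return math.sqrt(se_a ** 2 + se_b ** 2 + 2 * cov_ab)


def implied_cov(se_a, se_b, se_sum):
    """Covariance consistent with reported SEs of a, b, and a + b (rounding-sensitive)."""
    return (se_sum ** 2 - se_a ** 2 - se_b ** 2) / 2


def wald_test_sum(a, b, se_a, se_b, cov_ab):
    """Wald test of H0: a + b = 0. Returns (estimate, SE, chi-square(1), p-value)."""
    estimate = a + b
    se = se_of_sum(se_a, se_b, cov_ab)
    chi2 = (estimate / se) ** 2
    # Upper tail of chi-square(1) at chi2 equals P(|Z| > sqrt(chi2)) for standard normal Z.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return estimate, se, chi2, p_value


# Table A2, column (1), 13:00-14:00: alpha_2 = 0.36 (0.30), phi_2 = 0.86 (0.37),
# reported sum 1.22 (0.47). The covariance is not reported, so it is backed out.
cov = implied_cov(0.30, 0.37, 0.47)
est, se, chi2, p = wald_test_sum(0.36, 0.86, 0.30, 0.37, cov)
print(f"sum = {est:.2f}, se = {se:.2f}, chi2 = {chi2:.2f}, p = {p:.4f}")
```

The resulting p-value falls just below 0.01, in line with the three stars reported for this sum in Table A2. In practice the covariance would be read from the estimated (clustered) covariance matrix rather than inferred from rounded output.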
For reciprocal and extraverted workers, behavior is more consistent with performance pay crowding in non-monetary motivations, although the effect seems to dissipate over time once incentives are removed. If the positive treatment differences in post-incentive hours reflect increased non-monetary motivations, we might expect such workers to report stronger motivations. Consistent with this hypothesis, extraverted workers in the treatment group were significantly more likely (at the 10 percent level) to report that the experience of performance pay made work “more fun” (correlation of 0.40; p < 0.09). There is no significant relationship between reciprocity and reporting that work became more fun, suggesting that the crowding in associated with reciprocity operated through a channel other than enhanced task enjoyment.

One implication of individual heterogeneity is that the shape of the aggregate output profile might reflect a composition effect. For example, suppose that some workers have a positive psychological response that is relatively short-lived, while others have a negative response that persists: this could produce an aggregate profile with a delayed drop in output.

In summary, the findings are broadly consistent with the view that performance pay may affect workers' non-monetary motivations: we observe post-incentive treatment differences that are hard to explain with a canonical model or fatigue effects; workers report changes in non-monetary motivations as a result of performance pay; and traits selected as carriers of non-monetary motivation are important for explaining responses to the treatment. The findings are clearly not consistent, however, with a simple model based on the crowding-out hypothesis in which the psychological impact of performance pay is uniformly negative.

Note: the Figure 5 panels show the difference-in-difference profile for treatment vs. control for workers who score high on the trait.
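The composition-effect argument can be illustrated with a weighted average of two worker types. The share and the hour-by-hour effects below are hypothetical numbers chosen only to show the mechanism: a short-lived positive response among some workers can offset a persistent negative response among others in the first post-incentive hour, so that the aggregate drop appears only later.

```python
def aggregate_profile(share_a, effect_a, effect_b):
    """Hour-by-hour average treatment effect in a workforce mixing two worker types."""
    return [share_a * a + (1 - share_a) * b for a, b in zip(effect_a, effect_b)]


# Hypothetical effects for the three post-incentive hours (14:00-17:00).
short_lived_positive = [2.0, 0.5, 0.0]    # crowding in that fades
persistent_negative = [-1.0, -1.0, -1.0]  # crowding out that persists
profile = aggregate_profile(0.4, short_lived_positive, persistent_negative)
print([round(x, 2) for x in profile])  # prints [0.2, -0.4, -0.6]
```

Neither type on its own has a delayed drop, yet the aggregate profile is positive in the first post-incentive hour and negative afterwards.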
6 Conclusion

The psychology literature and some management textbooks caution that performance-contingent pay may reduce workers' other, non-monetary motivations to do a good job, but there is little evidence from a real work setting (Kreps, 1997). In a real work setting we implemented the classic design from psychology, in which the treatment group temporarily receives performance-based pay. Treatment group workers produced more during incentives, but less after incentives were removed, relative to control. Investigating explanations for the post-incentive treatment differences, we find little support for explanations based on fatigue spillovers. On the other hand, the majority of treatment workers report reduced non-monetary motivation. While our findings are broadly supportive of the hypothesis that performance pay can affect non-monetary motivations in the workplace, our results are not consistent with a simple model in which performance pay has uniformly negative effects. Indeed, a substantial minority of treatment workers report enhanced non-monetary motivations as a result of performance pay. Also, different sources of non-monetary motivations, captured by personality traits and social preferences, are associated with qualitatively different treatment effects.

These findings add to our understanding of the psychological impact of performance pay, but they also raise a series of questions for future research. The results of this study highlight the importance of future research on the precise mechanisms that underlie the psychological effects of performance pay in the workplace. Various theories differ in their assumptions about what performance pay signals to workers: low task enjoyability (Benabou and Tirole, 2003), a market rather than a social transaction (Heyman and Ariely, 2004), or appreciation of hard work (Fehr and Gaechter, 2002). Little is known empirically, however, about which of these mechanisms is most important.
Our findings on heterogeneity suggest that multiple mechanisms may be at work simultaneously, with different types of workers viewing the same incentives in different ways. This complicates the design of optimal incentive systems.

The interaction of performance pay with worker personality and preferences highlights the value of further research on the role of non-cognitive skills in the workplace. Studying larger workforces would be useful, as would considering a range of work tasks, since the ways that particular personality traits interact with performance pay may depend on the task. For example, while extraversion could lead to positive psychological effects for jobs involving personal interactions, the effect might be different for jobs involving data entry.

Our findings also point to the importance of more research on the timing and duration of the psychological effects of performance pay. Varying the duration of performance pay, and the length of the post-incentive period, could shed more light on the time profile of psychological effects. For example, it might turn out that positive psychological effects are more or less long-lasting than negative effects, with implications for the long-run impacts of performance pay.

More work is also needed on understanding the nature of psychological effects at the time when performance pay is active. The approach in psychology has been to introduce temporary performance pay and use behavior after the removal of incentives to shed light on the state of non-monetary motivations. A potential concern with this approach is that the observed behavior could partly reflect a psychological response to the removal of incentives (Fehr and Falk, 2002). Our version of the experimental design limits such effects by warning workers ahead of time about the removal of performance pay, which should minimize mechanisms such as disappointment.
Nevertheless, it is important to think about new types of designs and measures that might shed more light on this question.

Evidence of heterogeneity also calls for future work on how self-selection into jobs affects the psychological impact of performance pay. This matters for understanding the practical implications of the psychological effects of incentives for workplace behavior. For example, if the types of workers who respond positively to performance pay also self-select into jobs that advertise this type of compensation, this has important implications for the desirability of performance pay from both an employer and an employee perspective. Our study randomly assigned workers to performance pay, and recruiting did not emphasize this aspect of compensation. This was necessary to establish the causal impact of performance pay on workplace behavior and to test different models of labor supply. It would be possible, however, to conduct an experiment where workers can self-select into jobs that are identical except that one is known to offer fixed wages and another to offer temporary performance pay. Comparing across jobs would show how behavior in the classic experimental design changes when we add the realism of workers being able to self-select based on the nature of the incentive scheme. Self-selection is likely to be multi-dimensional (Dohmen and Falk, 2011), involving bundles of traits ranging from ability to sources of non-monetary motivations, with potentially fundamental implications for the psychological impact and overall effectiveness of performance pay.

References

Al-Ubaydli, O., S. Andersen, U. Gneezy, and J. A. List (2014): “Carrots that look like sticks: Toward an understanding of multitasking incentive schemes,” Southern Economic Journal.
Babcock, P., K. Bedard, G. Charness, J. Hartman, and H. Royer (2011): “Letting down the team? Evidence of social effects of team incentives,” Discussion paper, National Bureau of Economic Research.
Baron, J. N., and D. M. Kreps (1999): Strategic human resources: Frameworks for general managers. Wiley, New York.
Becker, A., T. Deckers, T. Dohmen, A. Falk, and F. Kosse (2012): “The Relationship Between Economic Preferences and Psychological Personality Measures,” Annual Review of Economics, 4, 453–478.
Benabou, R., and J. Tirole (2003): “Intrinsic and extrinsic motivation,” The Review of Economic Studies, 70(3), 489–520.
Benabou, R., and J. Tirole (2006): “Incentives and Prosocial Behavior,” The American Economic Review, pp. 1652–1678.
Berg, J., J. Dickhaut, and K. McCabe (1995): “Trust, reciprocity, and social history,” Games and Economic Behavior, 10(1), 122–142.
Borghans, L., B. H. Golsteyn, J. Heckman, and J. E. Humphries (2011): “Identification problems in personality psychology,” Personality and Individual Differences, 51(3), 315–320.
Bowles, S., H. Gintis, and M. Osborne (2001): “Incentive-enhancing preferences: Personality, behavior, and earnings,” The American Economic Review, 91(2), 155–158.
Browning, M., A. Deaton, and M. Irish (1985): “A profitable approach to labor supply and commodity demands over the life-cycle,” Econometrica: Journal of the Econometric Society, pp. 503–543.
Camerer, C., L. Babcock, G. Loewenstein, and R. Thaler (1997): “Labor supply of New York City cabdrivers: One day at a time,” The Quarterly Journal of Economics, 112(2), 407–441.
Carneiro, P., and J. Heckman (2003): “Human Capital Policy,” in Inequality in America: What Role for Human Capital Policy?, ed. by A. Krueger and J. Heckman, pp. 77–240. MIT Press, Massachusetts.
Carpenter, J. P., and D. Dolifka (2013): “Exploitation aversion: When financial incentives fail to motivate agents,” IZA Discussion Paper.
Charness, G., and U. Gneezy (2009): “Incentives to exercise,” Econometrica, 77(3), 909–931.
Cohn, A., E. Fehr, and L. Goette (2014): “Fair wages and effort: Evidence from a field experiment,” forthcoming in Management Science.
Cohn, A., E. Fehr, B. Herrmann, and F. Schneider (2014): “Social comparison and effort provision: Evidence from a field experiment,” forthcoming in Journal of the European Economic Association.
Deci, E. (1971): “Effects of externally mediated rewards on intrinsic motivation,” Journal of Personality and Social Psychology, 18(1), 105–115.
Deci, E., R. Koestner, and R. Ryan (1999): “A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation,” Psychological Bulletin, 125(6), 627.
Dohmen, T., and A. Falk (2011): “Performance pay and multidimensional sorting: Productivity, preferences, and gender,” The American Economic Review, pp. 556–590.
Fehr, E., and A. Falk (2002): “Psychological foundations of incentives,” European Economic Review, 46(4), 687–724.
Fehr, E., and L. Goette (2007): “Do Workers Work More if Wages Are High? Evidence from a Randomized Field Experiment,” The American Economic Review, 97(1), 298–317.
Gneezy, U., and J. List (2006): “Putting behavioral economics to work: Testing for gift exchange in labor markets using field experiments,” Econometrica, 74(5), 1365–1384.
Gneezy, U., S. Meier, and P. Rey-Biel (2011): “When and why incentives (don't) work to modify behavior,” The Journal of Economic Perspectives, pp. 191–209.
Gneezy, U., and P. Rey-Biel (2014): “On the Relative Efficiency of Performance Pay and Noncontingent Incentives,” Journal of the European Economic Association, 12(1), 62–72.
Gneezy, U., and A. Rustichini (2000a): “A Fine Is a Price,” Journal of Legal Studies, 29, 1.
Gneezy, U., and A. Rustichini (2000b): “Pay enough or don't pay at all,” The Quarterly Journal of Economics, 115(3), 791–810.
Goette, L., and D. Huffman (2006): “Incentives and the Allocation of Effort Over Time: The Joint Role of Affective and Cognitive Decision Making,” IZA Discussion Paper, No. 2400.
Goldberg, J. (2013): “Kwacha Gonna Do? Experimental Evidence about Labor Supply in Rural Malawi,” University of Maryland Working Paper.
Heckman, J. (2000): “Policies to foster human capital,” Research in Economics, 54, 3–56.
Heckman, J., and Y. Rubinstein (2001): “The importance of noncognitive skills: Lessons from the GED testing program,” The American Economic Review, 91(2), 145–149.
Heyman, J., and D. Ariely (2004): “Effort for payment: A tale of two markets,” Psychological Science, 15(11), 787–793.
Jordan, P. (1986): “Effects of an extrinsic reward on intrinsic motivation: A field experiment,” The Academy of Management Journal, 29(2), 405–412.
Judge, T., C. Higgins, C. Thoresen, and M. Barrick (1999): “The big five personality traits, general mental ability, and career success across the life span,” Personnel Psychology, 52(3), 621–652.
Kőszegi, B., and M. Rabin (2006): “A model of reference-dependent preferences,” The Quarterly Journal of Economics, 121(4), 1133–1165.
Kreps, D. (1997): “Intrinsic motivation and extrinsic incentives,” The American Economic Review, 87(2), 359–364.
Kube, S., M. Maréchal, and C. Puppe (2008): “The currency of reciprocity: Gift-exchange in the workplace,” forthcoming in American Economic Review.
Kuhn, P., and C. Weinberger (2005): “Leadership skills and wages,” Journal of Labor Economics, 23(3), 395–436.
Lazear, E. (2000): “Performance Pay and Productivity,” The American Economic Review, 90(5), 1346–1361.
Lepper, M., D. Greene, and R. Nisbett (1973): “Undermining children's intrinsic interest with extrinsic reward: A test of the ‘overjustification’ hypothesis,” Journal of Personality and Social Psychology, 28(1), 129.
Levitt, S. D., and J. A. List (2009): “Field experiments in economics: The past, the present, and the future,” European Economic Review, 53(1), 1–18.
Lim, N., M. Ahearne, and S. Ham (2009): “Designing sales contests: Does the prize structure matter?,” Journal of Marketing Research, 46(3), 356–371.
Lindqvist, E., and R. Vestman (2011): “The labor market returns to cognitive and noncognitive ability: Evidence from the Swedish enlistment,” American Economic Journal: Applied Economics, 3(1), 101–128.
Nagin, D., J. Rebitzer, S. Sanders, and L. Taylor (2002): “Monitoring, Motivation, and Management: The Determinants of Opportunistic Behavior in a Field Experiment,” The American Economic Review, 92(4), 850–873.
Paarsch, H., and B. Shearer (1999): “The response of worker effort to piece rates: Evidence from the British Columbia tree-planting industry,” Journal of Human Resources, pp. 643–667.
Persico, N., A. Postlewaite, and D. Silverman (2004): “The Effect of Adolescent Experience on Labor Market Outcomes: The Case of Height,” Journal of Political Economy, 112(5).
Ross, L., and R. E. Nisbett (1991): The person and the situation: Perspectives of social psychology. McGraw-Hill Book Company.
Segal, C. (2006): “Motivation, Test Scores, and Economic Success,” Job Market Paper, Harvard Business School.
Segal, C. (2008): “Classroom behavior,” Journal of Human Resources, 43(4), 783–814.
Segal, C. (2012): “Misbehavior, Education, and Labor Market Outcomes,” forthcoming in Journal of the European Economic Association.
Shi, L. (2010): “Incentive Effect of Piece-Rate Contracts: Evidence from Two Small Field Experiments,” The B.E. Journal of Economic Analysis & Policy, 10(1).
Staw, B. M., B. J. Calder, R. K. Hess, and L. E. Sandelands (1980): “Intrinsic Motivation and norms about payment,” Journal of Personality, 48(1), 1–14.
Stutzer, A., L. Goette, and M. Zehnder (2011): “Active Decisions and Prosocial Behaviour: A Field Experiment on Blood Donation,” The Economic Journal, 121(556), F476–F493.
Titmuss, R. (1970): The Gift Relationship: From Human Blood to Social Policy. New Press.
Watanabe, S., and Y. Kanazawa (2009): “A test of a personality-based view of intrinsic motivation,” Japanese Journal of Administrative Science, 22, 117–130.
A Appendix: For online publication only

A.1 Additional tables

Table A1: Regression estimates underlying Figure 4

                                              (1)               (2)               (3)
13:00-14:00 * Treatment * Conscientiousness   0.29 (0.34)       0.12 (0.44)       0.43 (0.37)
14:00-15:00 * Treatment * Conscientiousness  -2.33*** (0.55)    0.92* (0.53)      0.33 (0.56)
15:00-16:00 * Treatment * Conscientiousness  -2.02*** (0.61)    0.49 (0.45)       0.09 (0.55)
16:00-17:00 * Treatment * Conscientiousness  -2.18*** (0.72)   -0.35 (0.71)      -0.42 (0.51)
13:00-14:00 * Treatment * Extraversion       -0.05 (0.30)      -0.44 (0.40)      -0.15 (0.33)
14:00-15:00 * Treatment * Extraversion        2.06*** (0.44)   -0.77 (0.60)       0.34 (0.51)
15:00-16:00 * Treatment * Extraversion        2.14*** (0.70)   -1.48** (0.66)    -0.70 (0.61)
16:00-17:00 * Treatment * Extraversion        2.36*** (0.71)    0.83 (0.80)       0.45 (0.55)
13:00-14:00 * Treatment * Reciprocal          2.32*** (0.62)    0.45 (0.39)      -0.30 (0.37)
14:00-15:00 * Treatment * Reciprocal         -0.48 (0.77)       0.01 (0.45)       0.54 (0.50)
15:00-16:00 * Treatment * Reciprocal          3.75*** (0.95)    1.08 (0.85)       0.05 (0.49)
16:00-17:00 * Treatment * Reciprocal         -0.08 (1.22)      -0.31 (0.78)       0.38 (0.77)
Estimation method                             N. Bin.           N. Bin.           N. Bin.
Observations                                  170               170               170

Other rhs. variables, coefficients suppressed: 13:00-14:00 . . . 16:00-17:00; 13:00-14:00 * Treatment . . . 16:00-17:00 * Treatment; 13:00-14:00 * Conscientiousness . . . 16:00-17:00 * Conscientiousness; 13:00-14:00 * Extraversion . . . 16:00-17:00 * Extraversion; 13:00-14:00 * Reciprocal . . . 16:00-17:00 * Reciprocal; Conscientiousness; Extraversion; Reciprocal; Constant.

Notes: Negative binomial estimates. All traits are standardized so that a coefficient gives the impact of a one standard deviation change, with the exception of reciprocity, for which it gives the impact of the dichotomous change. The sample is all workers who responded to the questionnaire. Robust standard errors in parentheses, adjusted for clustering on worker. ***, **, * indicate significance at the 1-, 5-, and 10-percent level, respectively.
Table A2: Regression estimates underlying Figure 5

                                       Conscientiousness   Extraversion        Reciprocal
                                       (1)                 (2)                 (3)
α2  13:00-14:00 * Treatment * Trait     0.36 (0.30)         0.24 (0.30)         2.32*** (0.55)
α3  14:00-15:00 * Treatment * Trait    -0.83* (0.49)        0.80** (0.39)       0.85 (0.92)
α4  15:00-16:00 * Treatment * Trait    -0.61 (0.48)         0.63 (0.49)         3.69*** (1.26)
α5  16:00-17:00 * Treatment * Trait    -0.73 (0.58)         0.70 (0.59)         0.61 (1.27)
φ2  13:00-14:00 * Treatment             0.86** (0.37)       0.87** (0.37)      -0.26 (0.46)
φ3  14:00-15:00 * Treatment             0.24 (0.41)         0.33 (0.45)        -0.03 (0.59)
φ4  15:00-16:00 * Treatment            -1.08** (0.48)      -0.91* (0.53)       -2.39*** (0.87)
φ5  16:00-17:00 * Treatment            -0.57 (0.53)        -0.50 (0.55)        -0.72 (0.80)
Estimation method                       N. Bin.             N. Bin.             N. Bin.
Observations                            170                 170                 170

Normalized treatment effects plotted in Figure 5
α2 + φ2                                 1.22*** (0.47)      1.12** (0.51)       2.06*** (0.49)
α3 + φ3                                -0.59 (0.65)         1.14* (0.63)        0.82 (0.70)
α4 + φ4                                -1.68** (0.77)      -0.28 (0.59)         1.30 (0.81)
α5 + φ5                                -1.30 (0.82)         0.19 (0.79)        -0.11 (0.88)

Other rhs. variables, coefficients suppressed: 13:00-14:00 . . . 16:00-17:00; 13:00-14:00 * Trait . . . 16:00-17:00 * Trait; Trait; Constant.

Notes: Negative binomial estimates. Traits for columns (1) to (3) are conscientiousness, extraversion, and reciprocity, respectively. αt + φt gives the normalized treatment effect for workers who score high on a given trait. All traits are standardized so that a coefficient gives the impact of a one standard deviation change, with the exception of reciprocity, for which it gives the impact of the dichotomous change. Standard errors for αt + φt are calculated for a sum of random variables, and their statistical significance is based on a Chi-squared test. The sample is all workers who responded to the questionnaire. Robust standard errors in parentheses, adjusted for clustering on worker. ***, **, * indicate significance at the 1-, 5-, and 10-percent level, respectively.

A.2 Proofs for propositions 1, 2, 3, and 4

This portion of the appendix provides proofs of the propositions in the behavioral predictions section.
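The proofs below are analytic. As a numerical companion to Propositions 2 and 3, the sketch solves a treatment worker's first order conditions under hypothetical functional forms that are our assumption, not the paper's: $p(e) = Ae - Be^2/2$, $c(e_1) = e_1^2/2$, $c(e_2, k) = (1 + k)e_2^2/2$, and $k(e_1) = \gamma e_1$. For the parameter values used, the period-1 first order condition is strictly decreasing in $e_1$, so bisection finds the unique interior solution, and the second order conditions hold.

```python
A, B = 1.0, 0.5  # hypothetical output function p(e) = A*e - B*e**2/2 (concave)


def e2_best(e1, gamma):
    """Period-2 FOC A - B*e2 - (1 + gamma*e1)*e2 = 0, solved for e2.
    Hypothetical forms: c(e2, k) = (1 + k)*e2**2/2 and k(e1) = gamma*e1."""
    return A / (1 + B + gamma * e1)


def foc1(e1, z, gamma):
    """Period-1 FOC z + p'(e1) - c'(e1) - (dc/dk) * k'(e1), with c(e1) = e1**2/2."""
    e2 = e2_best(e1, gamma)
    return z + (A - B * e1) - e1 - gamma * e2 ** 2 / 2


def optimal_efforts(z, gamma, lo=0.0, hi=10.0, tol=1e-12):
    """Bisection on foc1, which is decreasing in e1 for these parameter values."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if foc1(mid, z, gamma) > 0:
            lo = mid
        else:
            hi = mid
    e1 = (lo + hi) / 2
    return e1, e2_best(e1, gamma)


for gamma in (0.5, 0.0):  # fatigue spillover vs. a rest break that restores the worker
    e1_lo, e2_lo = optimal_efforts(0.0, gamma)
    e1_hi, e2_hi = optimal_efforts(0.2, gamma)
    print(f"gamma={gamma}: de1 = {e1_hi - e1_lo:+.4f}, de2 = {e2_hi - e2_lo:+.4f}")
```

With the spillover ($\gamma = 0.5$), raising the piece rate from 0 to 0.2 increases period-1 effort and lowers period-2 effort, as in Proposition 2. With $\gamma = 0$, mimicking $k'(e_1) = 0$, period-2 effort does not move, as in Proposition 3.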
Proof of Proposition 1: First order conditions for a treatment group worker are given by:

$$\frac{\partial v}{\partial e_1} = z + p'(e_1) - c'(e_1) = 0, \qquad \frac{\partial v}{\partial e_2} = p'(e_2) - c'(e_2) = 0.$$

By inspection, optimal effort for treatment workers in period 1, $e_1^{*T}$, is increasing in the piece rate $z$, given concavity of $p(\cdot)$ and convexity of $c(\cdot)$: because $p'(\cdot) - c'(\cdot)$ is decreasing in effort, a higher $z$ requires a higher $e_1$ to satisfy the period 1 condition. Optimal effort in period 2, $e_2^{*T}$, however, is independent of the piece rate. First order conditions for a control group worker are given by:

$$\frac{\partial v}{\partial e_1} = p'(e_1) - c'(e_1) = 0, \qquad \frac{\partial v}{\partial e_2} = p'(e_2) - c'(e_2) = 0.$$

Optimal effort in period 1 for control workers, $e_1^{*C}$, is therefore lower than optimal effort for treatment workers, because control workers have $z = 0$ in the first order condition for period 1. The first order condition for period 2 effort is identical for treatment and control, so $e_2^{*C} = e_2^{*T}$ regardless of the level of $z$. Thus, $z > 0$ causes treatment group workers to work harder in period 1 than control group workers, but effort is the same for treatment and control in period 2.

Proof of Proposition 2: First order conditions for a treatment group worker are given by:

$$\frac{\partial v}{\partial e_1} = z + p'(e_1) - c'(e_1) - c'(e_2, k(e_1))\,k'(e_1) = 0, \qquad \frac{\partial v}{\partial e_2} = p'(e_2) - c'(e_2, k(e_1)) = 0.$$

Totally differentiating with respect to $z$ yields:

$$\begin{pmatrix} c''(e_1) + c''(e_2, k(e_1))\,k'(e_1) - p''(e_1) & c''(e_2, k(e_1))\,k'(e_1) \\ c''(e_2, k(e_1))\,k'(e_1) & c''(e_2, k(e_1)) - p''(e_2) \end{pmatrix} \begin{pmatrix} \partial e_1/\partial z \\ \partial e_2/\partial z \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

Denote the first matrix, consisting of second derivatives, by $H$. Second order conditions for an (interior) maximum imply that the determinant of $H$ must be positive.
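To make the comparative statics of Propositions 1 and 2 concrete, the sketch below assumes a hypothetical linear-quadratic specification that is not taken from the paper: $p(e) = a e - (b/2) e^2$ (concave), $c(e) = (g/2) e^2$ (convex), and, for the fatigue case, a period 2 cost of $(g/2) e_2^2 + d\,k\,e_2$ with $k(e_1) = e_1$. The names `a`, `b`, `g`, `d`, `Params`, and `optimal_effort` are illustrative. Under these assumptions both first order conditions are linear, the matrix of second derivatives is $\begin{pmatrix} b+g & d \\ d & b+g \end{pmatrix}$, and Cramer's rule gives closed-form efforts. Setting $d = 0$ removes fatigue, which is the setting of Proposition 1 (and of Proposition 3, where a rest break sets $k'(e_1) = 0$).

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical linear-quadratic primitives (illustrative, not from the paper)."""
    a: float = 10.0  # p(e) = a*e - (b/2)*e**2, concave benefit of effort
    b: float = 1.0
    g: float = 1.0   # c(e) = (g/2)*e**2, convex effort cost
    d: float = 0.0   # fatigue: period-2 cost adds d*k*e2, with k(e1) = e1


def optimal_effort(z: float, prm: Params) -> tuple[float, float]:
    """Solve the two first order conditions by Cramer's rule.

    FOC for e1:  z + a - (b + g)*e1 - d*e2 = 0
    FOC for e2:      a - (b + g)*e2 - d*e1 = 0
    i.e. H @ (e1, e2) = (z + a, a) with H = [[b+g, d], [d, b+g]].
    """
    h = prm.b + prm.g
    det_h = h * h - prm.d * prm.d
    if det_h <= 0:
        raise ValueError("second order condition fails: need d < b + g")
    r1, r2 = z + prm.a, prm.a
    e1 = (r1 * h - prm.d * r2) / det_h  # column 1 of H replaced by (r1, r2)
    e2 = (h * r2 - prm.d * r1) / det_h  # column 2 of H replaced by (r1, r2)
    return e1, e2


for label, prm in [("no fatigue, d = 0", Params()), ("fatigue, d = 0.5", Params(d=0.5))]:
    control = optimal_effort(0.0, prm)  # control group: no piece rate
    treated = optimal_effort(2.0, prm)  # treatment group: piece rate z = 2 in period 1
    print(f"{label}: control e = {control}, treatment e = {treated}")
```

With the default values, raising $z$ from 0 to 2 moves period 1 effort from 5 to 6 and leaves period 2 effort at 5, as in Proposition 1. With $d = 0.5$, period 1 effort rises from 4 to about 5.07 while period 2 effort falls from 4 to about 3.73, matching the signs $\partial e_1/\partial z > 0$ and $\partial e_2/\partial z < 0$ in Proposition 2.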
Applying Cramer's rule, the derivatives of the first and second period effort levels with respect to $z$ are given by

$$\frac{\partial e_1}{\partial z} = \frac{1}{|H|}\det\begin{pmatrix} 1 & c''(e_2, k(e_1))\,k'(e_1) \\ 0 & c''(e_2, k(e_1)) - p''(e_2) \end{pmatrix}, \qquad \frac{\partial e_2}{\partial z} = \frac{1}{|H|}\det\begin{pmatrix} c''(e_1) + c''(e_2, k(e_1))\,k'(e_1) - p''(e_1) & 1 \\ c''(e_2, k(e_1))\,k'(e_1) & 0 \end{pmatrix}.$$

In order to have $\partial e_1/\partial z > 0$ and $\partial e_2/\partial z < 0$, the determinants of the two numerator matrices must be positive and negative, respectively. Writing out the determinants, these conditions can be stated as

$$c''(e_2, k(e_1)) - p''(e_2) > 0 \quad \text{and} \quad -c''(e_2, k(e_1))\,k'(e_1) < 0.$$

Both of these hold unambiguously given the assumptions of convexity of $c(\cdot)$, concavity of $p(\cdot)$, and $k'(e_1) > 0$. This proves the claim in Proposition 2.

Proof of Proposition 3: With a rest break between period 1 and period 2 long enough that the fatigue stock is zero regardless of period 1 effort, we have $k'(e_1) = 0$. As shown in the proof of Proposition 2, this implies $\partial e_2/\partial z = 0$, so period 2 effort of treatment workers is unaffected by having performance pay in period 1. Given $c(e_2, 0) = c(e_2)$, treatment and control workers have the same first order conditions and optimal effort levels for period 2. This proves the claim in Proposition 3.

Proof of Proposition 4: First order conditions for a treatment group worker are given by:

$$\frac{\partial v}{\partial e_1} = z + \theta_1 + p_1' - c_1' = 0, \qquad \frac{\partial v}{\partial e_2} = \theta_2 + p_2' - c_2' = 0,$$

where $p_t' = p'(e_t)$, $c_t' = c'(e_t)$, and $\theta_t$ depends on $z$. Totally differentiating with respect to $z$ yields

$$\begin{pmatrix} c_1'' - p_1'' & 0 \\ 0 & c_2'' - p_2'' \end{pmatrix} \begin{pmatrix} \partial e_1/\partial z \\ \partial e_2/\partial z \end{pmatrix} = \begin{pmatrix} 1 + \theta_1' \\ \theta_2' \end{pmatrix}$$

Denote the first matrix by $H$. Note that, due to convexity of $c(\cdot)$ and concavity of $p(\cdot)$, $|H| = (c_1'' - p_1'')(c_2'' - p_2'') > 0$.
Applying Cramer's rule, the derivatives of the first and second period effort levels with respect to the piece rate are given by

$$\frac{\partial e_1}{\partial z} = \frac{1}{|H|}\det\begin{pmatrix} 1 + \theta_1' & 0 \\ \theta_2' & c_2'' - p_2'' \end{pmatrix}, \qquad \frac{\partial e_2}{\partial z} = \frac{1}{|H|}\det\begin{pmatrix} c_1'' - p_1'' & 1 + \theta_1' \\ 0 & \theta_2' \end{pmatrix}.$$

In order to have $\partial e_1/\partial z > 0$ and $\partial e_2/\partial z < 0$, the determinants of the two numerator matrices must be positive and negative, respectively. Writing out the determinants, these conditions can be stated as

$$(1 + \theta_1')(c_2'' - p_2'') > 0 \quad \text{and} \quad (c_1'' - p_1'')\,\theta_2' < 0.$$

Since $c_2'' - p_2'' > 0$, the first condition holds as long as $1 > -\theta_1'$. Recall that the marginal utility of income was normalized to 1. Thus, the first condition states that period 1 effort is increasing in the piece rate as long as the marginal utility of an additional dollar is greater than the marginal reduction in intrinsic motivation from introducing $z > 0$. The second condition always holds, given our assumptions: introducing performance pay for period 1 must reduce period 2 effort because it decreases intrinsic motivation in that period ($\partial \theta_2/\partial z < 0$), without generating any offsetting financial incentives in period 2. This proves the claim in Proposition 4.
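The two sign conditions can be checked numerically in a hypothetical specification of our own choosing (not the paper's): $p(e) = a e - (b/2) e^2$, $c(e) = (g/2) e^2$, and a linear crowding-out rule $\theta_t(z) = \theta_0 - \lambda_t z$ with $\lambda_t > 0$, so that $\theta_t' = -\lambda_t$. Then $H = \mathrm{diag}(b+g,\, b+g)$, and the sketch compares the Cramer's-rule slopes with the change in optimal effort between two piece rates. The function names and parameter values are illustrative.

```python
def effort_with_crowding_out(z, a=10.0, b=1.0, g=1.0, theta0=2.0, lam1=0.4, lam2=0.3):
    """Optimal (e1, e2) when motivation is theta_t(z) = theta0 - lam_t * z (hypothetical)."""
    h = b + g                     # c_t'' - p_t'' under the quadratic forms, same in both periods
    theta1 = theta0 - lam1 * z    # theta_1'(z) = -lam1
    theta2 = theta0 - lam2 * z    # theta_2'(z) = -lam2
    e1 = (z + theta1 + a) / h     # solves z + theta_1 + p_1' - c_1' = 0
    e2 = (theta2 + a) / h         # solves theta_2 + p_2' - c_2' = 0
    return e1, e2


def cramer_slopes(lam1, lam2, h):
    """(de1/dz, de2/dz) from Cramer's rule with H = diag(h, h)."""
    det_h = h * h                 # |H| = (c_1'' - p_1'')(c_2'' - p_2'')
    de1 = (1 - lam1) * h / det_h  # (1 + theta_1')(c_2'' - p_2'') / |H|
    de2 = h * (-lam2) / det_h     # (c_1'' - p_1'') theta_2' / |H|
    return de1, de2


base, paid = effort_with_crowding_out(0.0), effort_with_crowding_out(2.0)
print("efforts at z = 0:", base, "at z = 2:", paid)
print("Cramer's rule slopes:", cramer_slopes(0.4, 0.3, 2.0))
print("strong crowding out, lam1 = 1.5:", cramer_slopes(1.5, 0.3, 2.0))
```

With $\lambda_1 = 0.4$ and $\lambda_2 = 0.3$, the slopes are 0.3 and -0.15, equal to the change in each period's effort between $z = 0$ and $z = 2$ divided by 2 (the model is linear, so the difference quotient is exact). With $\lambda_1 = 1.5$ the condition $1 > -\theta_1'$ fails and $\partial e_1/\partial z = -0.25$, so in this sketch performance pay lowers period 1 effort as well.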