An Experimental Investigation of Stochastic Stability∗

Wooyoung Lim† Philip R. Neary‡

March 30, 2015

Abstract

This paper discusses an experiment designed to test which, if any, model of error-prone best-responses most accurately predicts long run behaviour in large populations. In other words, what is the correct noisy population dynamic to invoke when using the equilibrium selection technique of stochastic stability? The game the subjects play is the simplest possible setting in which different deterministic dynamics coupled with different noise components can select different long run outcomes. We find that the best-reply dynamic with uniform errors, where all players myopically best-respond each and every period with probability close to 1 (the deterministic component) and make mistakes independently of the payoff penalty (the noise component), yields the most accurate prediction. We also find a time trend to mistakes, with the magnitude tapering off as time progresses. This is in contrast to much of the literature, which assumes a variety of other specifications of revision opportunities, and time-independent, payoff-dependent, “logit”, mistakes.

Keywords: Stochastic Stability; Equilibrium Selection; Experiment; Evolutionary Game Theory.
JEL Classification: C72, C73, C92.

∗ Special thanks to Vince Crawford, Xun Lu, and Jonathan Newton for detailed comments, and to Yong Kil Ahn for his excellent research assistance. We would also like to thank Jesper Bagger, Chris Gee, Sotiris Georganas, Jacob Goeree, Sung-Ha Hwang, Heinrich Nax, Peter Neary, Santiago Oliveros, Juan Pablo Rud, Hamid Sabourian, Ija Trapeznikova, and seminar participants at the North-American ESA Conference, NUI Maynooth, Royal Holloway, and Universität St. Gallen.
† Email: [email protected]; Web: http://ihome.ust.hk/∼wooyoung/index.html/. Address: Department of Economics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
‡ Email: [email protected]; Web: https://sites.google.com/site/prneary/. Address: Department of Economics, Royal Holloway, University of London, Egham, Surrey, TW20 0EX.

1 Introduction

The issue of equilibrium selection has long provided headaches for economic theorists. In economies with multiple equilibria, modelled as games with multiple equilibria, exactly what should we expect to observe?1 The original, so-called “deductive”, selection techniques, i.e., that some equilibria are more focal than others (Schelling, 1960), or that some equilibria are safer than others (Harsanyi and Selten, 1988), proved unsatisfying as they were silent on how equilibrium beliefs would come into being. The most commonly applied “inductive” selection technique is stochastic stability (Foster and Young, 1990), which makes a unique prediction in many simple coordination problems.2 Most famously, it leads to uniform adoption of the risk dominant action when a large population is repeatedly matched to play a symmetric 2×2 game of pure coordination (Foster and Young, 1990; Kandori, Mailath, and Rob, 1993; Young, 1993).

There are three fundamental components to applying stochastic stability. First, there is some procedure, possibly deterministic, possibly random, that specifies the players to be afforded a revision opportunity in a given period. Second, it is assumed that individual players follow a simple updating rule, a heuristic, whenever they are afforded a revision opportunity.
Finally, it is assumed that players occasionally deviate from the heuristic by choosing an action that is not prescribed by it (this is interpreted as mutation in theoretical biology, and as mistakes/experimentation in economics).3 The first two components coupled together define a population dynamic (an “adjustment process”); appending the third yields a noisy population dynamic (a “stochastic adjustment process”).4

1 To be more precise, really we are considering games with multiple strict equilibria. Strict equilibria are those in which any unilateral deviation incurs a strict loss in utility. Strict equilibria are always strategically stable in the sense of Kohlberg and Mertens (1986) and Mertens (1989, 1991), and hence “un-refineable”.
2 In recent years the global games approach of Carlsson and van Damme (1993) has gained huge popularity, in part due to its predictive power in many standard macro models where the alignment of expectations plays a prominent role. Morris and Shin (2003) is a detailed survey.
3 There is a fourth assumption: that time is infinite. While this is clearly not implementable in a laboratory setting, the issue of how much time is needed to simulate an infinite horizon is important. This is addressed in Section 3.
4 For a given noisy population dynamic, we will occasionally abuse terminology and refer to the first two components jointly as “deterministic”, and the third as the “noise” component.

While the above is very general, in practice only a few such noisy dynamics are considered, with choices for each component made from relatively short menus. Regarding the procedure, typically one assumes either that each player independently revises his action with equal probability, or that one randomly drawn player does so with certainty. Upon being afforded a revision opportunity, the most common updating rule is to take a myopic best-response to the previous period's population profile.5 Vis-à-vis the introduction of noise, almost all attention has focused on one of two mistake models: the uniform mistakes of Kandori, Mailath, and Rob (1993) and Young (1993), and the payoff-dependent, “logit”, mistakes introduced in Blume (1993). But if stochastic stability is to be taken seriously as a selection criterion, it is important to know what combination from the above menus yields the best proxy for aggregating individual behaviour. Unfortunately, for many simple games, noisy dynamics based on choices from each of these menus yield the same prediction, while for more complex games, computing stochastically stable equilibria is a non-trivial task, and so authors typically just assume one noisy dynamic and work only with it.

In this paper, we describe a laboratory experiment designed to test which, if any, noisy best-response based dynamic is the most accurate predictor of long run behaviour. The game that the subjects play is the Language Game of Neary (2012), which was introduced in the hope of discerning how different group properties might affect long run outcomes, termed “conventions” (Lewis, 1969; Young, 1993, 1996), in large population settings with differing preferences. The Language Game describes an environment where actions are strategic complements, but preferences are not the conventional homogeneous ones. It thereby provides a framework for studying the emergence of coordinated outcomes with network effects (e.g. technologies, standards, languages, etc.), when individuals differ in what they view as most desirable.6

5 There are many others.
Fudenberg and Levine (1998) and Young (2005) are textbook treatments.
6 Most of the literature has focused on the (homogeneous) case where everybody wants the same coordinated outcome - the issue of interest then being whether or not the population will ultimately be successful in coordinating on this outcome. There do exist models with heterogeneous preferences, the simplest being a large population “Battle of the Sexes”. (See for example the asymmetric contests of Samuelson and Zhang (1992).) The limitation of this framework is that players only interact with those from the other group - women with men, and men with women - so the only bolstering from those with similar preferences comes via how they affect those in the other group. Furthermore, this framework does not seem particularly realistic for examples like technologies, standards, and languages, as one only interacts with those who have different preferences.

There are always at least two Pareto efficient equilibria in the Language Game, so the tradeoff is not the standard one of risk dominance versus efficiency.7 Regarding equilibrium selection with uniform mistakes, it is not the case that one equilibrium (the “safer”, risk dominant one in homogeneous agent models) is selected due to its larger basin of attraction. In fact, the size of each equilibrium's basin of attraction often has little to do with selection in the Language Game.8 Regarding equilibrium selection with logit mistakes, a group's very strong preferences may tilt things in favour of its preferred outcome, even if its numbers are far fewer.

The Language Game seems ideal for our experiment: it is simple enough that stochastically stable outcomes can be computed for each of the standard noisy dynamics, yet also sufficiently involved that different noisy dynamics can make different predictions for a wide range of parameters. Furthermore, unlike many experimental tests of theoretical models, the Language Game is both extremely accessible for subjects and easily codeable, and as such, we are not compelled to use a “watered down” version of it.

We allow twelve populations of twenty subjects to play 200 rounds of the Language Game, each with one of three sets of parameters (chosen carefully so that the dynamics can be separated). Our interest is in (i) long run outcomes, and (ii) behaviour at the individual level. Comparing long run outcomes to those predicted by the different noisy dynamics is straightforward - simply record the time average of the various equilibria, and contrast this with the theoretical prediction of each.9 We find that observed play matches the prediction of the best-reply dynamic with uniform mistakes in the majority of sessions.

7 For a wide range of parameters, there is also a third pure strategy equilibrium. This equilibrium is somewhat unusual in that it involves a failure to coordinate across groups. If, for example, actions are interpreted as operating systems, then such an outcome allows for the coexistence of more than one operating system in equilibrium - a seemingly natural feature missing from existing models.
8 Related to this is that the sufficient conditions of the various “radius-coradius” results in Ellison (2000) are frequently not satisfied. See Section 5.2 of Neary (2012).
9 Technically, stochastic stability can only be invoked once the population dynamics are, in a sense, “constant”. Given that our subjects were more mistake-prone in earlier rounds, there is an argument that we should limit the analysis to behaviour after some ‘cutoff’ period. More on this in Section 4.2.
The other benchmark dynamics - best-response based dynamics with logit mistakes, and best-response based dynamics with uniform mistakes but different revision protocols - are outperformed in all treatments where the forecasts disagree.10

The fact that a noisy dynamic based on agents making payoff-dependent mistakes does not make the best prediction is definitely a surprise. There is by now a wealth of evidence detailing that individual mistakes are highly systematic, and in particular that the payoff penalty associated with a mistaken action choice is a strong determinant of its likelihood. In experimental economics especially, parameterising payoff-dependent mistakes has gained a strong foothold due to the popularity of the concept of Quantal Response Equilibria introduced in McKelvey and Palfrey (1995).11

10 Each of these ‘different’ revision protocols is homogeneous, in the sense that in any given period, each agent is equally likely to be afforded a revision opportunity. However, heterogeneity in learning is another interesting avenue of research. Neary (2012) shows that group dynamism, interpreted as a fixed number of discontent agents from each group best-responding each period, has a strong bearing on selected outcomes in the Language Game. While this is interesting, it is arguably somewhat artificial, and seems difficult to implement in the laboratory. However, if group-dynamism is interpreted as “speed of learning”, then related to this is the well-known paper of Cheung and Friedman (1997). They compared the performance of two deterministic dynamics - myopic best-reply to the current population profile and ‘fictitious play’ (myopic best-reply to a longer memory) - and found that the longer memory process performed better.
11 Quantal response functions were first introduced in a model of individual choice by Luce (1959), but modelling mistakes as appearing in a priori ‘reasonable’ ways has other solid theoretical foundations. Myerson (1978) introduced the proper equilibrium, a refinement where, within the set of non best-responses, better performing strategies receive a probability of higher order than worse ones. Maruta (2002), Yi (2009) and Yi (2011) are evolutionary models with payoff-dependent mistakes.

Ascertaining precisely how players err is important. It has been well known since Bergin and Lipman (1996) that if the procedure and learning rule of a dynamic are fixed (components 1 and 2 of stochastic stability from the second paragraph), then any equilibrium can be selected for an appropriately defined model of mistakes. As such, they argue that determining the “nature of the mistake process must be analysed more carefully to derive some economically justifiable restrictions ... It is an open question whether and what kinds of interesting restrictions will emerge.” While our analysis shows that players err in a payoff-independent manner, we do find a time component in how the mistakes occur. Stochastic stability with time-dependent mistakes is considered in both Pak (2008) and Robles (1998), and could also be related to a theory like that of Van Damme and Weibull (2002), where players control the probability of implementing the intended strategy by expending effort, but doing so becomes easier with time.
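To fix ideas, the three mistake models just discussed - uniform, logit, and time-dependent - can be written as simple choice rules. The following is a minimal sketch in Python; the parameter values (eps, beta, and the decay rate) are illustrative assumptions, not estimates from our data.

```python
import math

def uniform_mistake_prob(payoff_gap, eps=0.05):
    """Uniform mistakes (Kandori, Mailath, and Rob, 1993; Young, 1993):
    the non-best-response is chosen with a fixed probability eps,
    regardless of the payoff penalty (payoff_gap) it incurs."""
    return eps

def logit_mistake_prob(payoff_gap, beta=5.0):
    """Logit mistakes (Blume, 1993): in a binary-action game the inferior
    action is chosen with probability 1/(1 + exp(beta * payoff_gap)), so
    costlier mistakes are exponentially less likely. As beta grows large,
    exact best-response is recovered."""
    return 1.0 / (1.0 + math.exp(beta * payoff_gap))

def time_decaying_mistake_prob(t, eps0=0.10, decay=0.01):
    """A payoff-independent but time-dependent rate, in the spirit of
    Pak (2008) and Robles (1998): the mistake frequency decays as play
    progresses (t is the period number)."""
    return eps0 * math.exp(-decay * t)
```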
That the noisy dynamic fitting our data best has a revision protocol where all players react each and every period may not seem surprising, since, in laboratory experiments with discrete time periods, it is standard for the experimental designer to allow this.12 However, theoretically at least, when coupled with uniform mistakes this deterministic dynamic can make very different predictions to ones extremely close to it. For example, consider a noisy dynamic where all players react every period with a probability that is arbitrarily close to, but strictly less than, 1 (say ∼0.99), and mistakes are uniform. Such a dynamic can generate very different predictions and hence would not fit our data. This addresses another issue of evolutionary dynamics that is often overlooked. That is, the Language Game shows that if the learning rule and the model of mistakes are held fixed (components 2 and 3 from the second paragraph), then the revision protocol can also matter hugely for selection.13

12 Some recent experiments on continuous time games (see for example Friedman and Oprea (2012)) are interesting exceptions to this.
13 The revision protocol may not always matter however - for example when the game is a potential game (Monderer and Shapley, 1996) and mistakes are logit. See Alós-Ferrer and Netzer (2010).

While the best-reply dynamic with uniform mistakes is the best predictor of population behaviour, we also inspect the data at the individual level in case the accurate prediction of this noisy dynamic is a pure fluke, with the population outcome in fact driven by some other underlying process whose prediction just happens to coincide. This is particularly true given that although 200 rounds of play is definitely at the longer end of most experimental studies, it is still nowhere near the infinite number required for theoretical predictions to be felt with certainty. With this in mind, with our individual level data analysis, we are effectively trying to answer the following two-part question: “are players in this experiment behaving as imperfect myopic best-responders, and, if yes, when mistakes are made, are these mistakes payoff-dependent?” The answer to the first part is “yes”, but, contrary to what was anticipated, even at the individual level we find that individual mistakes are not best-approximated as logit. We find that mistakes occur as uniform, but with a time component whereby the mistake frequency decays over time. In other words, while we find evidence for social learning, in the sense that subjects form expectations about population behaviour that are themselves arrived at endogenously based on what has happened in the past, we also find that our subjects learn to control their own behaviour better as time progresses. This could be because the subjects attain a better understanding of the game, or perhaps because they are more confident that others will choose “correctly”, or even that they are learning from their past mistakes.

The classic paper on experimental tests of equilibrium selection in coordination games is Van Huyck, Battalio, and Beil (1990). These authors looked at long run outcomes in the “minimum effort game”, and found that the inferior, risk-dominant equilibrium consistently emerged once population size exceeded a threshold.
As pointed out by Crawford (1991), the minimum effort game is analogous to a multi-action stag hunt, and thus their findings complement the theoretical predictions of the papers listed in the opening paragraph.14,15 The problem with these findings, as we see it, is that in a homogeneous environment, supposedly only one property matters for equilibrium selection: risk-dominance. Our research program, of which this paper is one part, is to ascertain what noisy dynamic best-approximates large population behaviour, and ultimately, using this, to understand what group properties matter for long run outcomes in heterogeneous environments.16

Two other papers are important to mention. The first is a recent paper, Maes and Nax (2014), which poses a question very similar to ours. Maes and Nax (2014) analyse individual level data from a large population experiment (in a game that is akin to a networked version of the Language Game), and find that subjects' deviations are sensitive to deviation costs. There are two main differences. First of all, their subjects are not informed of either the structure of the network or the payoffs of the other participants, which means that formally a very complex Bayesian Game is being played.17 Second, they do not consider the relationship between individual-level mistakes and population-level predictions, and their definition of deviation costs is not the same as ours.

14 Each of the noisy dynamics considered in this paper selects the risk dominant equilibrium in the minimum effort game, and as such, it cannot be used to parse them.
15 It is standard when studying stag hunts in large populations to normalise payoffs along the diagonal and set off-diagonal payoffs to zero. Using this and the argument of Crawford (1991), the Language Game is then strategically equivalent to a large population game with two homogeneous groups, where the minimal action for one group is the maximal action for the other, and vice versa.
16 Theoretically at least, other properties also matter for equilibrium selection. While we already mentioned group dynamism, it is not hard to see that network architecture can also matter (Neary, 2013); network architecture has no effect on selection in the canonical homogeneous model - risk dominance always wins - except on a small class of networks (Ellison, 1993; Jackson and Watts, 2002; Peski, 2010).

The second such paper is Crawford (1995). The ‘adaptive dynamics’ studied there, tailored perfectly to understanding behaviour in large population coordination problems, are the first to fuse both rules governing strategy updating and the priors with which players ‘enter’ the game. In other words, the model nests stochastic evolutionary dynamics and beliefs-based adaptive learning. While the dynamics apply only to coordination problems in which players' roles are symmetric and symmetric strategy profiles are the only pure strategy equilibria, neither of which holds for the Language Game, “heterogeneizing” these dynamics seems a potentially very fruitful avenue to pursue.

The balance of the paper is organised as follows. In the next section, we formally define the Language Game and provide insight as to why different noisy dynamics may select different equilibria. Section 3 describes the experimental design, while Section 4 presents and discusses the results. Section 5 concludes.
17 The lack of information afforded to subjects in this experiment has the nice feature that it allows Maes and Nax (2014) to consider convergence to equilibrium under the ‘uncoupled’ and ‘completely uncoupled’ dynamics of Hart and Mas-Colell (2003) and Foster and Young (2006) respectively.

2 The Language Game

2.1 The Game

The Language Game, G, is a simultaneous move game defined as the tuple {N, Π, S, G}, where N := {1, . . . , N} is the population of players; Π := {A, B} is a partition of N into two nonempty homogeneous groups A and B of sizes N^A and N^B respectively (N^A, N^B ≥ 2); S := {a, b} is the binary action set common to all players; G := {G^{AA}, G^{AB}, G^{BB}} is a collection of local interactions, where G^{AA} is the pairwise exchange between a player from Group A and a player from Group A, etc. These local interactions are given in Figure 1 below, where α, β ∈ (1/2, 1). Utilities are the sum of payoffs earned from playing the field, where the same action must be used with one and all.18 We assume players do not randomise.

G^{AA} (row: player A1, column: player A2)
           a             b
  a      α, α          0, 0
  b      0, 0      1−α, 1−α

G^{BB} (row: player B1, column: player B2)
           a             b
  a   1−β, 1−β         0, 0
  b      0, 0          β, β

G^{AB} (row: player from A, column: player from B)
           a             b
  a    α, 1−β          0, 0
  b      0, 0        1−α, β

Figure 1: The three local interactions, G^{AA}, G^{AB}, and G^{BB}

Fixing an order on the players, with each Group A player listed before any from Group B, we define S := ∏_{i=1}^{N^A} S × ∏_{j=1}^{N^B} S, with typical element s. When a player chooses action s, from his perspective, action profile s ∈ S can be viewed as (s; s). It is important to note that despite the heterogeneity and the fact that players can be matched with more than one ‘type’ of player, local interactions are opponent independent, in that one's payoff is determined by his choice of action and the action choice of his opponent. That is, the opponent's identity does not matter, just their behaviour. This feature then ‘scales up’ to the population level since a player cares only about the number of others using each action and not how those others are distributed across the two groups.

18 While this has a different interpretation to random matching, optimising behaviour is the same.

Given this, for any population profile s ∈ S, let n_a(s) denote the number of players choosing action a at s. (Clearly then the number of players choosing action b at s is equal to N − n_a(s).) With this notation, the utility a player in group K ∈ Π receives from taking action s ∈ {a, b} when the population profile is s, written U^K(s; s), is given by

U^A(a; s) := (n_a(s) − 1) α                    (1)
U^A(b; s) := (N − n_a(s) − 1)(1 − α)           (2)
U^B(a; s) := (n_a(s) − 1)(1 − β)               (3)
U^B(b; s) := (N − n_a(s) − 1) β                (4)

By Theorem 1 in Neary (2012), the only pure equilibria are the group-symmetric profiles (a, a), (a, b), and (b, b), where the first boldface symbol refers to the action commonly chosen by those in Group A and the second to that commonly chosen by everyone in Group B. While (a, a) and (b, b) are always equilibria, profile (a, b) is an equilibrium if and only if the smaller group's preferences are sufficiently strong. Profile (b, a) is never an equilibrium.

Two things are worth mentioning.
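Before turning to these, the payoff functions (1)-(4) are simple enough to verify numerically. The following minimal sketch, in Python, enumerates the pure equilibria by checking unilateral deviations at every state; the parameters are those of treatment G1 from Section 3, used purely for illustration.

```python
# Payoffs (1)-(4) and the pure equilibria of the Language Game, using
# the Game 1 parameters from Section 3 for illustration.
N_A, N_B = 11, 9
ALPHA, BETA = 0.57, 0.67
N = N_A + N_B

def utility(group, action, n_a):
    """U^K(s; s): n_a is the total number of a-players in the profile,
    including the player himself when he plays a."""
    if group == "A":
        return (n_a - 1) * ALPHA if action == "a" else (N - n_a - 1) * (1 - ALPHA)
    return (n_a - 1) * (1 - BETA) if action == "a" else (N - n_a - 1) * BETA

def is_equilibrium(nA_a, nB_a):
    """No player may gain from a unilateral switch of action."""
    n_a = nA_a + nB_a
    for group, on_a, size in (("A", nA_a, N_A), ("B", nB_a, N_B)):
        # a-players must not prefer b (after switching, n_a falls by one)...
        if on_a > 0 and utility(group, "a", n_a) < utility(group, "b", n_a - 1):
            return False
        # ...and b-players must not prefer a (after switching, n_a rises by one).
        if on_a < size and utility(group, "b", n_a) < utility(group, "a", n_a + 1):
            return False
    return True

print([(x, y) for x in range(N_A + 1) for y in range(N_B + 1)
       if is_equilibrium(x, y)])
# -> [(0, 0), (11, 0), (11, 9)]: the profiles (b,b), (a,b), and (a,a)
```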
First, it is standard when studying large population games with a coordination aspect to divide them into games of “strategic complements” and those of “strategic substitutes”.19 However, while the Language Game is clearly in the former category, it is not a pure coordination game in the standard sense since it does not possess a unique Pareto efficient equilibrium (Group A players prefer (a, a), while Group B players prefer (b, b)).20 Thus, the standard tradeoff of Pareto efficiency versus risk dominance is not present.21 Furthermore, despite the lack of a unique Pareto efficient equilibrium, it is by no means clear that one should categorise the Language Game as a coordination game with tension à la a Battle of the Sexes (see the “asymmetric contests” of Samuelson and Zhang (1992) for the large population analog), since there are subsets of the population, the groups, wherein all agents agree on what population profile they would most like to see emerge, and yet these agents also interact. This feature means the Language Game provides a more realistic framework for studying the emergence of standards and operating systems (Farrell and Saloner, 1985; Katz and Shapiro, 1985; Arthur, 1989), since it does not insist that preferences are homogeneous but still allows a given player to interact with everyone in the population.

19 Galeotti, Goyal, Jackson, Vega-Redondo, and Yariv (2010) make the distinction very clear, with games of strategic complements tending to receive more attention in the literature. Bramoullé and Kranton (2007) is an example of a game of strategic substitutes; Boncinelli and Pin (2012) study the “best shot” game - a setting where actions are absolute strategic substitutes.

The second point concerns interpretation. There are two different, but straightforward, ways to interpret the Language Game. The first involves taking the most commonly used setting in the literature on large population games and simply “doubling” it. That is, suppose that there are two groups, A and B, each located on a distinct island. For each of these island economies, the local interactions are given by G^{AA} and G^{BB} respectively. The Language Game can then be thought of as an “opening up” of the islands to one another (thereby requiring the addition of the across-group local interaction G^{AB}). The other interpretation supposes that the across-group local interaction, G^{AB}, is the primitive (note that this local interaction is a Battle of the Sexes).

20 However, the Language Game is a ‘pure coordination problem’ according to Young (2001), who defines such a game as one wherein all players have m strategies, and strategy sets can be ordered such that it is a strict equilibrium for each player to play their mth strategy.
21 In fact, it is not clear how one ought to define risk dominance in this setting. If the standard definition of a best-reply to 50% of the population using either action were adopted, then the risk dominant profile is always (a, b). But this could yield the undesirable conclusion that the risk dominant profile need not be an equilibrium. If an alternate definition of a best-reply to the profile (a, b) is used, then the risk dominant profile, while always an equilibrium, could involve players from the smaller group adopting the larger group's preferred action, and this may not sit well either.
Thus, the Language Game adds to a large population Battle of the Sexes the feature that members of each “sex” also interact amongst themselves, a feature absent from existing large population games with asymmetry. To put it another way, it is not only the case that players are playing more than one game simultaneously - this is one interpretation of all large population setups - but rather that they are playing more than one type of game simultaneously.

2.2 Equilibrium Selection

Now suppose that the Language Game is the stage game in an evolutionary setting. Time is discrete and goes on forever. Utilities are received every period, and, when afforded the opportunity to update his action for tomorrow, a player is assumed to take a best-response to today's action profile. The previous paragraph defines a deterministic population dynamic.

There are three standard best-response based dynamics used in the literature, each of which satisfies the Darwinian mantra of “better strategies today are [weakly] better represented tomorrow”. The first of these is the best-reply dynamic. This stipulates that every period, all players update their action, taking a best-response to the current population profile. The second, known as independent inertia (Nöldeke and Samuelson, 1993; Samuelson, 1994; Kandori and Rob, 1995), assumes that every period, each player is “activated” with the same probability, and that those activated players choose a best-response to the current population profile.22 The final dynamic is asynchronous learning (Binmore and Samuelson, 1997; Blume, 2003). This assumes that each period, one player is randomly chosen (typically, each is chosen with equal probability, 1/N), and that the appointed player best-responds to the current population profile.

Deterministic dynamics like those defined above can exhibit what Arthur (1989) termed path dependence. Informally, this just says that the initial strategy profile can have a strong bearing on the terminal outcome. In the case of the best-reply dynamic, initial behaviour always uniquely determines the final rest point; for the other dynamics, the final rest point is uniquely determined from many, but not all, initial configurations.

22 Note that the best-reply dynamic is a special case of independent inertia where the activation probability equals one. As regards equilibrium selection, however, a discontinuity can occur when this activation probability equals one. As such, throughout this paper, the term “independent inertia” refers to a revision protocol where the activation probability is strictly less than one.

Foster and Young (1990) were the first to show that adding noise to a deterministic dynamic can remove this path dependence. By assuming that players will forever occasionally deviate from their behavioural rule, interpreted as mistakes/mutations/shocks at the individual level, “noise” is added to the dynamics, and so the system is always in flux and permanent lock-in never occurs. However, despite the perpetual instability, there is some regularity to the randomness, and the bulk of time is spent localised around a subset of the equilibria - the stochastically stable ones.

Noise is typically assumed to occur in one of two ways, and when interpreted as resulting from individual mistakes each has a nice behavioural interpretation.23 The first is pure randomness: a player with a revision opportunity chooses an inferior action with a fixed likelihood that is independent of all outside factors.
The second is the payoff-dependent “logit” variant introduced by Blume (1993), where the probability of choosing a particular action depends on how that action will affect utility.24

Crudely put, the formula for stochastic stability is “population dynamic” + “noise” ⇒ selection.25 Given that different choices of components on the LHS can affect the outcome on the right, and given how often the concept is invoked in experimental studies, the point of the current paper is to give guidance on which components should be used.

23 Bergin and Lipman (1996) showed that for a given deterministic dynamic, any equilibrium can be rendered stochastically stable for an appropriately defined model of noise. However, while technically correct, it is not hard to see that in some games, for certain equilibria to be selected, the noise must be generated by mistakes that occur in almost pathological ways with no reasonable behavioural interpretation.
24 Recently, a new model of mistakes, the so-called ‘directed’ or ‘intentional’ errors, has been proposed by Naidu, Hwang, and Bowles (2010). In a binary action game, this translates to players being infinitely more likely to make mistakes away from one action over the other. In our framework, as in the Battle of the Sexes as studied in Hwang, Naidu, and Bowles (2013), this would presumably translate to subjects from Group A only ever mistakenly choosing action a while those from Group B accidentally choose action b. As we discuss in Section 4.2 on individual level behaviour, while we do observe a leaning towards such directional-esque mistakes, it is far from absolute. Perhaps more importantly, mistakes of this form do not accord with what we observe at the population level, as they would always predict the equilibrium outcome (a, b), a profile we observe in only two sessions (both in Treatment G2).
25 The word “refinement” has the conventional meaning of throwing out some equilibria, while the word “selection” has come to mean throwing out all but one of the equilibria. While stochastic stability is commonly referred to as a selection criterion, it is more accurate to refer to it as a refinement. While multiple stochastically stable equilibria occur only for non-generic parameters in the standard homogeneous model, with uniform mistakes more than one equilibrium can be stochastically stable for an open set of parameters in the Language Game.

It is useful to break the large array of choices into subsets. Asynchronous learning and independent inertia always make the same selection in the Language Game when coupled with uniform mistakes (Neary, 2013); the best-reply dynamic, however, may give rise to very different predictions with uniform mistakes (Neary, 2012). As regards logit mistakes, Alós-Ferrer and Netzer (2010) have a very useful result. They show that for all “Best-Response Potential Games” (Voorneveld, 2000), logit mistakes coupled with any deterministic dynamic with an arbitrary specification of revision opportunities will select the same equilibrium. Since asynchronous learning, independent inertia, and the best-reply dynamic all fit this description, and the Language Game is a potential game (Monderer and Shapley, 1996) - potential games being a strict subset of the set of best-response potential games - we are left with three different (classes of) noisy dynamics that can make different predictions in the Language Game. Each dynamic is composed of a deterministic component and a noise component.
D1: best-reply + uniform mistakes
D2: asynchronous learning / independent inertia + uniform mistakes
D3: arbitrary specification of revision opportunities + logit mistakes

Using an example, we now sketch how each of these three noisy dynamics can select different equilibria in the Language Game.26 The parameters are those from our first treatment, Game 1: (N^A, N^B) = (11, 9) and (α, β) = (0.57, 0.67). For these parameters, action profile (a, b) is an equilibrium.

A diagram is helpful. While the Language Game has a population of size N (set equal to 20 in each of our treatments), there are in actuality only two types of player.

26 We will be somewhat vague on the mathematical machinery needed to compute stochastically stable equilibria. It involves “tree-surgery” techniques developed by Freidlin and Wentzell (1998), and first introduced to game theory in Foster and Young (1990). Due to the incredible popularity of the papers of Kandori, Mailath, and Rob (1993) and Young (1993), these techniques are now quite standard. Young (1993) is the most complete treatment.

[Figure 2: Condensed State Space. A 12 × 10 lattice of states with n_a^A on the horizontal axis and n_a^B on the vertical axis; equilibrium states are drawn as large circles, basins of attraction are colour-coded, and the green star (★) marks the average period 1 population profile.]

Thus, at any point in time the action profile can be summarised by a 2-dimensional vector (n_a^A, n_a^B), where n_a^A is the number of players in Group A currently using action a, and n_a^B is the corresponding statistic for Group B. Clearly n_a^A + n_a^B = n_a. Figure 2 shows a condensed version of the action space S, what is commonly referred to as the state space, as a 12 × 10 lattice, with n_a^A ∈ {0, . . . , 11} on the horizontal axis, and n_a^B ∈ {0, . . . , 9} on the vertical axis. Each ‘state’ is depicted by a circle. Equilibrium states are depicted by large circles. For the parameters of this problem (and for the parameters of each of our treatments), it can be checked that state (11, 0), uniquely identified with the action profile (a, b), is an equilibrium.

At most action profiles, optimal behaviour is the same for all players in a given group, so further information can be conveyed in a picture of the state space, like that in Figure 2, via colour-coding and shading.27 Under any best-response based dynamic with an arbitrary specification of revision opportunities, all solid blue states will lead eventually, in the case of asynchronous learning or independent inertia, and immediately, in the case of the best-reply dynamic, to state (0, 0). The reason for this is that at each of those states, there is uniform preference for which action is better - in this case action b. At these states, one can think of the dynamics as a current that is pushing down and to the left. A similar but opposite statement holds for all solid red states - they will lead with certainty to the equilibrium state (11, 9).

27 The quantifier “most” in this sentence is important. Technically, for some games, there can be a (small) set of states at which the best-responses for two players in the same group do not accord. (This is the case in the 10-person homogeneous population of the leading example in Section 2 of Kandori, Mailath, and Rob (1993).) However, it is not an issue in this example, nor in any of our treatments.
Both of these sets are separated by correspondingly-coloured lines running at 45 degrees from northwest to southeast.28

28 The fact that the local interactions are opponent-independent is what makes the boundary states of these sets lie at 45 degrees. If players had a stronger preference for coordinating with those from their own group, then these boundaries would be tilted away from 45 degrees.

Now consider the remaining states. At each of these states, group preferences disagree. That is, Group A players prefer action a while Group B players prefer action b. Any deterministic dynamic is pushing down and to the right. All these states transition immediately to (11, 0) under the best-reply dynamic. These states are further colour-coded for the purposes of asynchronous learning and independent inertia. The solid black states will lead with certainty to equilibrium (11, 0). All hollow red states, those contained inside the small red triangle, will lead to either state (11, 0) or state (11, 9). They cannot lead to (0, 0), because the dynamics cannot “move left” and thus “get back to” any blue state, due to the preferences of Group A driving population behaviour further to the right of the state space. A similar statement holds for all hollow blue states, those in the small blue triangle, in that they lead to either (0, 0) or (11, 9). Under asynchronous learning and independent inertia, all hollow black states can lead to any of the three equilibria, depending on what subset of players are randomly selected to revise their actions and in what order they are chosen to do so. For example, from state (6, 5), there is positive probability that three Group B players will be activated (i) in successive periods under asynchronous learning, and (ii) in the same period (with no Group A players activated) under independent inertia, and with each best-responding the resulting state will be (6, 2). From state (6, 2), the dynamics are unambiguous and will lead to equilibrium (0, 0). The green star (★) is the average period 1 population profile for this treatment. More on this when we discuss the results in Section 4.

Now let us look at stochastically stable equilibria. For precise statements of how the tree-surgery techniques alluded to in Footnote 26 are applied to the Language Game, the reader should consult Neary (2012) and Neary (2013). For understanding the intuition, the key feature is computing how easily each equilibrium can be escaped from. Transitions that ‘go against the flow’ of the dynamics have a cost associated with them; transitions that occur naturally ‘with the flow’ are costless.

We begin with the case where mistakes are uniform. Each transition against the flow has equal cost (hence the name “uniform”), and this cost is normalised to 1. Suppose initially that population behaviour is at (b, b), i.e. at state (0, 0). From this state, if any 9 players mistakenly choose the wrong action (at the same time under best-reply, or sequentially under asynchronous learning), then the resulting population behaviour will be at some state where n_a = 9, from where it need not return to (0, 0).29 From there, a series of costless transitions will lead to (11, 0) (this is what will happen immediately and with certainty in the case of the best-reply dynamic, and with positive probability in the case of asynchronous learning or independent inertia). It is thus said that a minimum cost path from (0, 0) to (11, 0) has a cost of 9.

29 Ellison (2000) would say that the radius of (0, 0), the minimum number of mistakes needed to escape from (0, 0), is equal to 9.
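This counting argument is easy to check numerically. The following minimal sketch computes the radius of (0, 0) in the sense of Footnote 29, i.e. the smallest number of simultaneous mistakes after which some player's myopic best-response is no longer b. The Game 1 parameters are assumed purely for illustration, and the sketch is of the counting argument only, not the full tree-surgery computation.

```python
# Radius of the all-b state (0, 0) under uniform mistakes, assuming
# the Game 1 parameters (N, alpha, beta) = (20, 0.57, 0.67).
N, ALPHA, BETA = 20, 0.57, 0.67

def prefers_a(group, others_on_a):
    """Myopic best-response test: given how many of the other N-1 players
    are on action a, does a player of this group weakly prefer a?
    (Payoffs follow equations (1)-(4).)"""
    if group == "A":
        return others_on_a * ALPHA >= (N - 1 - others_on_a) * (1 - ALPHA)
    return others_on_a * (1 - BETA) >= (N - 1 - others_on_a) * BETA

def radius_of_all_b():
    """Smallest k such that, after k mistaken switches to a, some remaining
    b-player's best-response is a, so the process need not flow back
    to (0, 0)."""
    for k in range(1, N + 1):
        if prefers_a("A", k) or prefers_a("B", k):
            return k

print(radius_of_all_b())  # -> 9, matching the minimum cost of 9 above
```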
A similar analysis is done for all pairs of equilibria, and stochastically stable outcomes are then quite easily computed.

Whenever (a, b) is an equilibrium, as it is for all the treatments in this paper, the dynamics D1 and D2 often select the same equilibria and can therefore be difficult to parse. (However, this is not true when profile (a, b) is not an equilibrium.) Despite the fact that (a, b) is an equilibrium for the parameters of Game 1, the intuition for the disparity in selection can still be gleaned from Figure 2. The important states are those where group preferences disagree. Under the best-reply dynamic, each of those states uniquely determines the terminal rest point, but this is not true under the other dynamics. For example, under asynchronous learning, it is possible to transition from (11, 9) to (0, 0) with only 7 mistakes - by first transitioning to state (4, 9), followed by a series of costless transitions into one of the solid blue states. Thus, the best-reply dynamic tilts matters in favour of the larger group, while independent inertia and asynchronous learning lean towards the group with stronger preferences (i.e., α vs β).

The state space depiction above is less helpful for understanding how to compute stochastically stable outcomes under a dynamic with logit mistakes. While the colour coding in Figure 2 comes from stage game parameters and noise-free deterministic dynamics, with logit mistakes some costly transitions may have very different costs, and so finding paths of minimum cost becomes far more tedious than just a simple counting exercise. However, a convenient short cut is available since each of the local interactions, G^{AA}, G^{AB}, and G^{BB}, is a potential game and therefore so is the Language Game as a whole. Using a result from Neary (2013), the action profile that maximises the potential function is stochastically stable under these dynamics.

3 Experimental Design, Hypotheses and Procedure

Now that we have discussed the intuition for how the different noisy dynamics may select different equilibria, we describe our experiment intended to isolate which noisy dynamic makes the best prediction.

3.1 Design and Hypotheses

The population size was fixed at N = 20 throughout. The treatment variables are the group sizes and the strength of payoffs: N^A, N^B, α, and β. A given game is defined by the tuple (N^A, N^B, α, β), so our treatments G1, G2, and G3 are given by (11, 9, 0.57, 0.67), (12, 8, 0.58, 0.71), and (15, 5, 0.58, 0.80) respectively.

The most important facet of the design is that the different dynamics make different predictions for the different parameter specifications. While only two treatments are needed to obtain complete separation of the three dynamics, we used three treatments in order to give each dynamic an opportunity to make the wrong prediction. Table 1 below summarises the stochastically stable outcomes for the three different noisy dynamics of interest.

          (N^A, α, β)          D1       D2       D3
Game 1    (11, 0.57, 0.67)    (b, b)   (b, b)   (b, b)
Game 2    (12, 0.58, 0.71)    (a, a)   (b, b)   (a, a)
Game 3    (15, 0.58, 0.80)    (a, a)   (b, b)   (b, b)

Table 1: Experimental Treatments and Stochastically Stable Equilibria

Our first hypothesis is very straightforward. In actuality, it says little more than that ‘noisy population dynamics’, which are part and parcel of stochastic stability, are worthy of study in a laboratory setting.
That is, in large population coordination problems, there are very ‘secure’ equilibria - in the sense that a deviation strictly, and in some cases considerably, reduces utility - that can be moved away from. Evaluating this hypothesis is simple. All one needs to verify is that, in at least one of our treatments, population behaviour reaches a strict equilibrium and then drifts away from it.

Hypothesis 1. Noisy dynamics have something to contribute. That is, people make suboptimal responses sufficiently regularly that strict equilibria can be escaped from.

Next, note that in treatment G1, all three noisy dynamics make the same prediction. Treatment G1 can thus be thought of as our “control” treatment, allowing us to state our first ‘specific’ hypothesis as follows:

Hypothesis 2. In Game 1, long run population behaviour will conform with that prescribed by each of the three noisy dynamics.

Hypothesis 2 is there to rule out the “pathological” mistakes that Bergin and Lipman (1996) highlight might be an issue when invoking stochastic stability. That is, observations consistent with Hypothesis 2 provide evidence in favour of restricting attention only to those dynamics with “sensible” models of mistakes.

Having confirmed that all three dynamics with sensible models of mistakes make the same prediction when they should, we next move to parsing them. Due to the incredible popularity of the concept of QRE, our prior was that dynamics supported by payoff-dependent mistakes ought to make the “best” prediction. Thus, using treatments G2 and G3, our second hypothesis can be stated as:

Hypothesis 3. In Game 2, population profile (a, a) will appear most frequently, and in Game 3, population profile (b, b) will appear most frequently.

To reiterate, the [predictive] evolutionary theories laid out in Hypotheses 1-3 above are easily checked. As stated before, to reject Hypothesis 1, no strict equilibrium can ever be escaped from. For Hypotheses 2 and 3, effectively, one can just eyeball what equilibrium profile population behaviour is trending towards, or what equilibrium profile population behaviour spends most time localised around, and then compare this outcome to that stipulated by each of the noisy dynamics. Even in the case where there is a lot of bouncing around of population behaviour - something that immediately implies stochastic dynamics have value - summary statistics like the relative frequency of time spent at each equilibrium are easily computed.

However, there is also the issue of what is going on at the level of the individual, as there is the possibility that we may over-infer from the population-level results. It is well within the realm of possibility that in each of our treatments, population behaviour will coincide perfectly with the theoretical prediction of a particular noisy population dynamic, and yet individual behaviour does not conform with the individual learning rule that, when agglomerated, generates this aggregate prediction. Put more simply, since we are considering only three out of a possible infinite number of learning rules, perhaps individuals are behaving in a manner very different to myopic best-response with noise, and yet population behaviour just happens to corroborate the prediction for each of our parameter specifications. Analysis of the data at the individual level will allow us to either refute or validate (or at least not refute) this. As such, our final hypothesis is the ‘individual level’ version of Hypothesis 3.

Hypothesis 4.
The probability of making a mistake is higher when the payoff from the best-response is lower, and higher when the payoff from the non best-response action is higher.

Before addressing Hypothesis 4, there remains a concern that subjects are (i) not perfectly myopic, and (ii) in periods 3 and onwards, conditioning behaviour on information from more than just the immediately preceding period. Issue (i) is easily checked by comparing actions taken to those prescribed by the myopic best-response learning rule.30 Issue (ii) is more subtle. However, in a given round of play, the feedback provided to subjects pertained only to the immediately preceding period.31 While it remains possible that our subjects were able to a) recall perfectly information from all previous periods, and b) use this information for strategic purposes, the most recent period is inherently focal.

Hypothesis 4 states that deviations from the myopic best-response learning rule depend on payoff differentials. Logit mistakes are only one (nicely-parameterised) model of mistakes allowed under Hypothesis 4, but of course there are others. Our analysis of the individual level data will test generally for any kind of payoff-dependent mistakes.

3.2 Experimental Procedure

There were four sessions run per treatment, each sharing the same procedure. All sessions were conducted in English at the Hong Kong University of Science and Technology (HKUST). A total of 240 subjects (= 12 sessions of 20 subjects) were recruited from the undergraduate and graduate populations of the university. None had any prior experience with this game.

Subjects entered the lab and each was assigned a private computer terminal.32 Copies of the experimental instructions were distributed and subjects were given 10 minutes to read them. Communication of any sort between the subjects was forbidden throughout, thereby removing coalitional effects as a confounding factor.33 After reading the instructions, but before commencing the session, the subjects were required to answer a brief questionnaire demonstrating that they understood how payoffs would be assigned each period. No session would have begun until all students had responded to each question correctly, although the game is sufficiently simple that no problems were encountered. Finally, the experimenter read the instructions aloud to ensure that the information included in the instructions, which at this point was verified as understood, was mutual knowledge, and, depending on the levels of reasoning employed by the subjects, approaching common knowledge.

30 Data analysis shows that in all sessions, the percentage of actions that equate to myopic best-responses exceeds 92.3%, so this appears robust. See Subsection 4.2.
31 The next subsection spells out the experimental procedure precisely.
32 The computer program was written using z-Tree (Fischbacher, 2007).

In each treatment play lasted for 200 periods.34 At the beginning of each period except the initial one, each subject was provided with two pieces of information concerning the previous period's play: the total number of players that had chosen each action, and his own payoff.35,36 Each subject was then prompted to select his action for the forthcoming period. The only difference across periods was how much time subjects were given to make a decision.
We allowed subjects 30 seconds to choose an action in the initial period, 15 seconds in periods 2-10, and 10 seconds in all periods thereafter (periods 11-200).37 If a subject did not make a choice within the allowed timeframe, his action from the previous period was carried forward. For more details, see the instructions and the z-Tree screenshots attached in Appendices A and B respectively.

Payoffs were assigned as the average of that earned from playing the field. Payoffs were scaled up so that for a given treatment, maximum Group A (B) payoffs were given by 100α (100β). Minimum payoffs were zero. While the groups were labelled as A and B, the actions a and b were labelled as ‘#’ and ‘&’ respectively, so as to reduce the possibility that group identity might increase anchoring on a particular action due to its label.38 Note that, as mentioned in Footnote 21, there is no clear-cut “safe” (risk-dominant) action. Take-home cash was assigned as the sum of rewards from two randomly chosen rounds plus a HK$40 show-up fee. Average earnings were HK$135.7 (≈ US$17.4), with the range of payoffs given by the interval [HK$60, HK$190] (≈ [US$7.83, US$24.35]).

33 Newton (2012) is a detailed study of how coalitional behaviour can affect stochastically stable outcomes. In summary, it can matter a lot, so removing it as a possibility was very important.
34 The choice of 200 periods was not taken lightly. While we needed the horizon to be large enough to justify making statements about the “long run”, if a session went on for too long, the possibility existed that subjects would lose focus and that their strategic behaviour would differ. It is the simplicity of our game that allows us to implement 200 periods within (an average time of less than) 2 hours.
35 Obviously, from Equations (1)-(4), each one of these pieces of information is sufficient for a subject to compute the other. Both were provided for the sake of clarity.
36 Importantly, subjects were not told how the actions taken were distributed across the two groups. This information was withheld to avoid the possibility that it could be used as an external coordination device. It is also in accordance with a) how the game is defined, and b) how it would be played in a genuinely “large” population.
37 Together with the limited information feedback provided, the restricted time limit for each round placed a practical restriction on subjects' behaviour, prodding it to be more in line with myopic best-response.
38 The theory of social identity was initially developed by Tajfel and Turner (1979) in the field of social psychology and is now gaining popularity in economics (Chen and Li (2009) is a recent experimental study). In our setting, group identity means that an individual finds a particular action more attractive than just its payoff consequence. Despite the fact that we labelled actions a and b as # and & respectively, it is immediate that each group has a particular action that it would like to see coordination on, so it is not possible to get rid of the group identity issue completely. Indeed, in our data, there are a few subjects who tend to choose their ideal action significantly more often than seems sensible. That is, they do so even though it consistently may not yield the highest payoff.

4 Results

4.1 Population Level

We begin this section by drawing the reader's attention to Figures 3, 4, and 5 below. Figure 3 refers to treatment G1, Figure 4 to treatment G2, and Figure 5 to treatment G3. All three figures contain four panels, labelled (a)-(d), with each panel referencing a different session of that treatment. A given panel conveys three pieces of information. On the horizontal axis is the period, ranging from 1 to 200, and on the vertical axis is the number of players in a particular subset using action a in a given period. The three subsets are Group A (represented by the light dashed line); Group B (the dark dashed line); and the total population (the solid line). The solid line is simply the “sum” of the other two.

Our first hypothesis, Hypothesis 1, is trivial to check.
In fact, any of the sessions in any of the treatments confirms that the study of noisy dynamics is of value (at least when considering experiments). Specifically, in each session, more than one strict equilibrium is locked in on and then escaped from. Related to this, refer back to Figure 2. Note that in all sessions of treatment G1, for which Figure 2 depicts the state space, population behaviour began at an action profile very close to (a, b). The precise starting profiles were (10, 0), (11, 1), (9, 1), and (8, 1), which correspond to an average period 1 population profile in G1 of (9.5, 0.75). This is the state depicted by the green star (★) in Figure 2. Note that from this state, any noise-free best-response based dynamic must lock in on population profile (a, b) and stay there forever.

A particularly striking example of how noisy population dynamics are of benefit can be seen from Game 3 Session 4. Here, population behaviour quickly (within 8 periods) locks in on equilibrium (a, a), then transitions to equilibrium (a, b) for a few periods, only to move on to equilibrium (b, b), where it stays for 10 periods before jumping back up to equilibrium (a, a). Our first result can therefore be stated as:

Result 1. The study of noisy population dynamics is undeniably a worthy endeavour.

Our second hypothesis, Hypothesis 2, addressed the issue of whether or not stochastic stability, based on what the literature has deemed “reasonable” models of mistakes, can make accurate predictions in the lab.39 This can be answered using our control treatment, G1, for which all three noisy dynamics predicted (b, b).40 The plot of population behaviour for G1 is depicted in Figure 3.

39 We are ignoring what is by far and away the most popular solution concept used in the analysis of laboratory experiments - that of Quantal Response Equilibrium (QRE) due to McKelvey and Palfrey (1995). While QRE has had great success in explaining much experimental data, it is a fixed-point argument, and our focus in this paper is on the process of convergence to equilibrium. With stochastic stability, it is still assumed that players are utility maximisers, but that the beliefs they form are very simple, and moreover it is not required that beliefs are ultimately correct. However, it should be noted that while QRE shot to fame for its predictive power for initial (period 1) responses, it has also been viewed as a model of learning. See Footnote 14 of Costa-Gomes, Crawford, and Iriberri (2013), and the references therein.
A particularly striking example of how noisy population dynamics are of benefit can be seen from Game 3 Session 4. Here, population behaviour quickly (within 8 periods) locks in on equilibrium (a, a), then transitions to equilibrium (a, b) for a few periods, only to move on to equilibrium (b, b), where it stays for 10 periods before jumping back up to equilibrium (a, a). Our first result can therefore be stated as:

Result 1. The study of noisy population dynamics is undeniably a worthy endeavour.

Our second hypothesis, Hypothesis 2, addressed the issue of whether or not stochastic stability based on what the literature has deemed "reasonable" models of mistakes can make accurate predictions in the lab.39 This can be answered using our control treatment, G1, for which all three noisy dynamics predicted (b, b).40 The plot of population behaviour for G1 is depicted in Figure 3.

39 We are ignoring what is by far and away the most popular solution concept used in the analysis of laboratory experiments - that of Quantal Response Equilibrium (QRE) due to McKelvey and Palfrey (1995). While QRE has had great success in explaining much experimental data, it is a fixed-point argument, and our focus in this paper is on the process of convergence to equilibrium. With stochastic stability, it is still assumed that players are utility maximisers, but the beliefs they form are very simple, and moreover it is not required that beliefs are ultimately correct. However, it should be noted that while QRE shot to fame for its predictive power for initial (period 1) responses, it has also been viewed as a model of learning. See Footnote 14 of Costa-Gomes, Crawford, and Iriberri (2013), and the references therein.
40 In some ways, treatment G1 is analogous to running an experiment with a homogeneous population playing the stag hunt and expecting to observe lock-in on the risk dominant action, since all three dynamics considered in this paper predict that outcome. As we have said before, the advantage of using the heterogeneous framework is that the dynamics can be parsed by choosing different parameters.

[Figure 3: Trends for frequency of action a in Game 1. Panels (a)-(d) correspond to Sessions 1-4; each plots, for periods 1-200, the number of players choosing action a in Group A, in Group B, and in total.]

[Figure 4: Trends for frequency of action a in Game 2. Panels (a)-(d) correspond to Sessions 1-4.]

[Figure 5: Trends for frequency of action a in Game 3. Panels (a)-(d) correspond to Sessions 1-4.]

For the first three sessions - panels (a), (b) and (c) - population behaviour starts in the basin of attraction of equilibrium (a, b), but quite quickly moves to equilibrium (b, b). Even a simple eyeballing of the trends in these panels shows that long run behaviour is described by uniform adoption of action b. However, the convergence in these sessions, in particular that depicted in panel (a), is not perfectly pure, in that there are still occasional deviations back to equilibrium (a, b). In the fourth session of treatment G1, panel (d) in Figure 3, population behaviour does not conform with any of the theoretical predictions. In fact, there is no clear lock-in.
While, as in the other sessions, population behaviour began localised around equilibrium (a, b), it then shifted to the theoretically predicted outcome (b, b), and then proceeded to bounce back and forth between the symmetric profiles (a, a) and (b, b).41 Thus, our first detailed result, in support of commonly-studied noisy dynamics / learning rules, can be stated as:

Result 2. Myopic best-response based deterministic dynamics coupled with sensible models of mistakes have predictive power at the population level.

41 Other treatments in which all dynamics made the same prediction were run for previous versions of the paper (albeit with shorter time horizons), and each corroborated the finding of treatment G1 in this version. The data are available upon request.

In deciding how to judge stochastically stable outcomes, the analysis so far has been little more than a simple "eyeballing" of the trends in behaviour. Stochastically stable outcomes are often referred to as long run outcomes, but this is somewhat misleading, as it creates the false impression that where population behaviour is trending is what matters. Really, due to the noise inserted into the process, population behaviour can never be trending to any equilibrium. Rather, stochastic stability measures the fraction of time spent at each equilibrium, and selects the equilibrium (or equilibria) at which the time average is non-negligible as the likelihood of mistakes becomes vanishingly small.

With the above in mind, Table 2 below displays the empirical frequency with which the population chose a profile in the neighbourhood of (a, a), (a, b), and (b, b).42

42 Recall that profiles (b, b), (a, b), and (a, a) are uniquely identified with states (0, 0), (N^A, 0), and (N^A, N^B) respectively, where N^A and N^B varied across treatments. We define the neighbourhood of profile (b, b) as the set {(0, 0), (1, 0), (0, 1), (1, 1)}, that of (a, b) as {(N^A − 1, 0), (N^A − 1, 1), (N^A, 1), (N^A, 0)}, and that of (a, a) as {(N^A − 1, N^B), (N^A − 1, N^B − 1), (N^A, N^B − 1), (N^A, N^B)}. Intuitively, this is just the equilibrium and those states that immediately surround it - those whose action profile differs from the equilibrium by at most one player in each group. This is the tightest definition of a neighbourhood, and so naturally the result is robust to expanding the definition.

    Neighbourhood of:      (a, a)    (a, b)    (b, b)
    Game 1                   10       17.25    108.25
    Game 2                   71.5     84         0
    Game 3                  106       39.75      2

    Table 2: Empirical (average) frequency for profiles in the neighbourhood of equilibria

The data show very clear results. The frequency of a profile in the neighbourhood of (a, a) being chosen is 71.5 out of 200 in G2 and 106 out of 200 in G3. Mann-Whitney tests with session level data as independent observations reveal that these values are significantly higher (p < 0.01) than the frequency of a profile in the neighbourhood of (b, b). Furthermore, the frequency of a profile in the neighbourhood of (b, b) being chosen in G1 is 108 out of 200, which is significantly higher (p < 0.01, Mann-Whitney test) than that of (a, a). Similarly, the frequencies of a profile in the neighbourhood of (a, a) being chosen in G3 and of (b, b) being chosen in G1 are significantly higher (p < 0.01, Mann-Whitney test) than those of (a, b). However, for G2, we cannot reject the hypothesis that the frequency of a profile in the neighbourhood of (a, a) is the same as that of (a, b).43,44 This reinforces the conclusions drawn from the observable trends.

43 The results from our administered exit-survey suggest that the prevalence of (a, b) in Game 2 comes from the forming of particularly strong group identities, as discussed in Footnote 38, associated with this treatment. A few selected responses from Game 2 subjects to the dual-question: "Did you prefer taking a particular action (either # or &), even though doing so would likely give you a lower monetary payoff? If yes, please briefly explain.", were as follows: "I preferred # and I was from group A." "# for group A, & for group B." "Yes. I prefer option # as I am a group A player." "Yes, I try to press &, my rationale is to hope that all people in Group B can follow, and some people in Group A can follow, so that in the next round, I can get more money than the previous round."
44 The high frequency of the profile (a, b) in Game 2 can be viewed as evidence for the directed errors of Naidu, Hwang, and Bowles (2010). However, our population level data reveal that the same pattern does not exist in the other two treatments. We investigate this issue more carefully in the next subsection on individual level analysis.
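For concreteness, the counting behind Table 2 can be sketched as follows (Python with scipy). The neighbourhood definition follows Footnote 42; the `sessions` object, a list of per-period (x, y) state sequences, is a hypothetical placeholder, and the default group sizes shown are those of Game 1.

    from scipy.stats import mannwhitneyu

    def neighbourhood_counts(states, n_a=11, n_b=9):
        """Count periods spent within one player (per group) of each
        equilibrium; `states` is one session's list of 200 (x, y) profiles."""
        eq = {'(a,a)': (n_a, n_b), '(a,b)': (n_a, 0), '(b,b)': (0, 0)}
        def near(s, e):
            return abs(s[0] - e[0]) <= 1 and abs(s[1] - e[1]) <= 1
        return {k: sum(near(s, e) for s in states) for k, e in eq.items()}

    # With the four sessions of a treatment as independent observations:
    # aa = [neighbourhood_counts(s)['(a,a)'] for s in sessions]
    # bb = [neighbourhood_counts(s)['(b,b)'] for s in sessions]
    # stat, p = mannwhitneyu(aa, bb, alternative='greater')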
We now move to parsing the dynamics. Recall that our conjecture was that a dynamic from the class of those contained in D3 - those based on logit mistakes - would make the best prediction. However, by comparing the analysis above to the theoretical predictions given in Table 1, it is quite clear that a dynamic from this class is not the victor. In fact, it is the best-reply dynamic, D1, that gets it right more often than not. Our third result can then be stated as follows:

Result 3. The best-reply dynamic with uniform mistakes, Dynamic D1, is the best predictor of population behaviour.

That a noisy population dynamic based on uniform, payoff-independent, mistakes makes the best prediction is the main finding of this subsection. It is perhaps surprising that the best-reply dynamic with uniform mistakes should outperform the best-reply dynamic with the oft-invoked logit mistakes, and yet this is precisely what we observe.
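To make the distinction between the two mistake models explicit, the sketch below (Python) gives one period of an individual's noisy choice under each. The noise level eps and the logit precision eta are arbitrary illustrative values, not estimates from our data.

    import math, random

    def noisy_choice(u_a, u_b, model, eps=0.05, eta=0.05):
        """One player's action given payoffs u_a, u_b to actions a and b.
        'uniform': best-respond, but with probability eps play the other action.
        'logit': play a with probability exp(eta*u_a)/(exp(eta*u_a)+exp(eta*u_b)),
        so costlier mistakes are exponentially less likely."""
        if model == 'uniform':
            best = 'a' if u_a > u_b else 'b'
            other = 'b' if best == 'a' else 'a'
            return best if random.random() > eps else other
        p_a = 1.0 / (1.0 + math.exp(eta * (u_b - u_a)))
        return 'a' if random.random() < p_a else 'b'

    # Under 'uniform', the probability of a mistake is eps regardless of the
    # foregone payoff; under 'logit' it shrinks as |u_a - u_b| grows.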
As stated before, it is important to confirm that behaviour at the individual level corroborates the population prediction made by "aggregating" it. Without this, we could not rule out the possibility that some unconsidered behavioural rule just happened to make the same prediction as the best-reply dynamic. This is the purpose of the next subsection.

4.2 Individual Level

This subsection is devoted to analysing the individual level data. It is impossible to parse the dynamics D1 and D2 based on individual data, because the behavioural rule (best-response) and the mistakes (uniform) are the same in each. As a result, our focus here is to test whether individual mistake behaviour can be explained by logit mistakes, as stated in Hypothesis 4.

Before analysing how our subjects err, we need to determine precisely what behavioural rule they are following. Indeed, without determining this it is impossible to define what constitutes a mistake. The size of the population, coupled with the fact that only limited and very recent information was provided in each period, was intended to induce subjects to behave as myopically as possible. However, there is always the possibility that some other behavioural rule was being adopted. As a first pass, consider Figure 6 below. The figure displays, for periods 2 onwards, the percentage of actions taken that were not myopic best-responses to the previous period's population profile. The average mistake probability, aggregated across all rounds, is 7.61%, with 10.76% in Game 1, 5.73% in Game 2, and 6.35% in Game 3. The significantly higher mistake probability in Game 1 clearly comes from the outlier session (Session 4) of that treatment. More precisely, panel (a) plots the percentage of non-optimal actions (that is, action choices that were not myopic best-responses) taken in each round, aggregated over all four sessions of all three games. Panel (b) presents the percentage of mistakes made by individuals, sorted by group and by intervals of rounds 2-50, 51-100, 101-150, and 151-200. The decreasing regularity with which subjects make mistakes as time progresses is clear. This is evidence that subjects are learning to control their behaviour over time.

[Figure 6: Mistakes. Panel (a): time trend of the round-level mistake percentage (Total, Group A, Group B). Panel (b): mistake percentages by group and by blocks of rounds (2-50, 51-100, 101-150, 151-200) for each game.]

A few more precise remarks can be made regarding the information in Figure 6. First, as mentioned above, both panels (a) and (b) suggest that the frequency of mistakes is dependent upon time (period). A Spearman rank-order test was run to determine the relationship between the average mistake percentage in each round and the round number. The result shows a strong negative monotonic relationship between the two variables (Spearman's ρ = −0.71, p < 0.001 for all data; ρ = −0.53, p < 0.001 for Group A; ρ = −0.75, p < 0.001 for Group B). Second, in G1, where (b, b) is the selected long-run outcome, the observed behaviour of Group B subjects is highly consistent with a myopic best-response heuristic, whereas in G2 and G3, where (a, a) is the observed long-run outcome, the observed behaviour of Group A subjects is highly consistent with a myopic best-response heuristic. This implies that the frequency of mistakes is asymmetric across groups, i.e., it is group-dependent.45 A non-parametric Mann-Whitney test with session level aggregate data reveals that the frequencies of mistakes for the two groups are significantly different in G2 and G3 (p = 0.021 for G2, and p = 0.043 for G3) and insignificantly different in G1 (p = 0.149).46

45 The group-dependent mistakes we found are somewhat in line with the directed errors of Naidu, Hwang, and Bowles (2010). However, these are far from perfectly aligned since, in our setting, directed errors demand that players never make mistakes in the wrong direction. That is, Group A players must never choose b when it is suboptimal to do so, and Group B players must never choose a when it is the inferior action. And yet we observe situations in which this is violated. Furthermore, directed errors would always predict equilibrium (a, b) as the long run outcome - something we never observe.
46 The insignificant difference in G1 results from Session 4, in which both groups make mistakes significantly more often than in other sessions. When this 'unusual' session is removed, significance is restored.
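In standard software, the time-trend test just reported is simply the following (Python with scipy and numpy; the mistake-percentage series here is a synthetic placeholder standing in for the experimental data):

    import numpy as np
    from scipy.stats import spearmanr

    rounds = np.arange(2, 201)
    # Placeholder series for the round-level mistake percentage (periods
    # 2-200): a noisy decreasing trend standing in for the actual data.
    mistake_pct = 12.0 * np.exp(-rounds / 80.0) + np.random.rand(rounds.size)
    rho, p = spearmanr(rounds, mistake_pct)
    # On the experimental data the paper reports rho = -0.71, p < 0.001.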
Having concluded that our subjects behaved as myopic best-responders, we next move to classifying how they deviate from this rule. In particular, we want to determine whether or not the likelihood of a mistake, defined as a non myopic best-response, depends on the payoff consequences. Evidence against this would lead us to reject Hypothesis 4. We conduct individual level probit regressions, for periods t = 2, ..., 200, with dependent variable M_it and four regressors U_it^BR, U_it^NBR, G_i, and t, where M_it is an indicator taking the value 1 if the action chosen in period t by individual i was the best-response to period t−1 population behaviour, and 0 otherwise; G_i is a dummy variable that takes the value 0 if individual i is in Group A, and 1 otherwise; U_it^BR gives the payoff earned from choosing the myopic best-response action, and U_it^NBR gives the payoff earned from choosing the non-best-response action. We write ε_it for the idiosyncratic error. The coefficients of interest - those on the four regressors above - are denoted β1, β2, β3, and β4 respectively.

The signs of coefficients β1 and β2 have a straightforward interpretation: if deviations from the myopic best-response heuristic depend on the payoff consequences, then the sign of β1 should be positive whereas the sign of β2 should be negative. Coefficients β3 and β4 are also uncomplicated. If subjects in Group A (Group B) make mistakes more often than subjects in Group B (Group A), then β3 should be positive (negative). Similarly, if subjects make mistakes less often over time, then β4 should be positive.47

47 The magnitudes of the coefficients do not capture marginal effects because the regression is based on a non-linear model.
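Each session-level regression takes roughly the following form in standard software (Python with statsmodels and pandas). The data frame below is filled with synthetic placeholder values purely so that the call is concrete; the real regressors are constructed from the session data as described above.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 20 * 199   # 20 subjects, periods 2..200 (synthetic placeholder data)
    df = pd.DataFrame({
        'U_BR':  rng.uniform(30, 60, n),   # payoff to the best-response action
        'U_NBR': rng.uniform(0, 40, n),    # payoff to the other action
        'G':     rng.integers(0, 2, n),    # 0 = Group A, 1 = Group B
        't':     np.tile(np.arange(2, 201), 20),
    })
    # M = 1 if the subject best-responded; synthetic rule standing in for data.
    df['M'] = (rng.random(n) < 0.93).astype(int)

    X = sm.add_constant(df[['U_BR', 'U_NBR', 'G', 't']])
    print(sm.Probit(df['M'], X).fit(disp=0).summary())   # beta_1,...,beta_4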
                         (1) U(BR-action)   (2) U(Non-BR-action)   (3) Group          (4) Period
                         Coef.    p-value   Coef.     p-value      Coef.     p-value  Coef.    p-value
    Game 1  Session 1    .1852    < .001    .1033     < .001        .5396    < .001   .0063    < .001
            Session 2    .1430     .066     .0436      .427        1.2115    < .001   .0014     .028
            Session 3    .2118    < .001    .0874      .001         .9586    < .001   .0077    < .001
            Session 4    .0693    < .001    −.0067     .194         .3034    < .001   −.0003    .523
    Game 2  Session 1    .0037    < .001    .0010      .002        −.0100     .960    .0070    < .001
            Session 2    .0085    < .001    .0030     < .001      −1.0095    < .001   .0054    < .001
            Session 3    .0060    < .001    .0015     < .001       −.3721     .012    .0080    < .001
            Session 4    .0076     .002     .0029     < .001       −.8955     .050    .0118    < .001
    Game 3  Session 1    .0070    < .001    .0019     < .001      −1.7550    < .001   .0107    < .001
            Session 2    .0036    < .001    .0010      .024        −.9496    < .001   .0023    < .001
            Session 3    .0034    < .001    −.0004     .322        −.7255     .004    .0050    < .001
            Session 4    .0034    < .001    −.0002     .643        −.5287     .006    .0035    < .001

    Table 3: Probit Regression

Table 3 presents the results of our regressions.48 Column (1) shows that the sign of β1 is positive in all sessions of all games, and significant at the 1% level for the majority of sessions. Column (2) shows that the sign of β2 varies across games and sessions, but is more often either significantly positive or insignificantly negative. Overall, there is no clear pattern for these two coefficients in terms of sign and significance. Most importantly regarding parameters β1 and β2, there is no single session of any game in which β1 is significantly positive and β2 significantly negative simultaneously, as is required for mistake behaviour to be consistent with the logit model.

48 The results obtained from running a logit regression are qualitatively the same and thus not reported here.

Compatible with the results from the non-parametric tests, column (3) shows that mistakes are group-dependent in a systematic way: Group B subjects make mistakes more often than Group A subjects in G2 and G3, whereas Group A subjects make mistakes more often than Group B subjects in G1. Column (4) shows that subjects make mistakes less often as time passes, with the exception of Session 4 of G1. Overall, the results from Games 1-3 are not consistent with logit mistakes coupled with myopic best-response. Our final result, addressing the issues posed in Hypothesis 4, can then be summarised as follows:

Result 4. At the individual level, the probability of a mistake is decreasing in the payoff from the best-response action (see column 1 in Table 3), is independent of the payoff from the non-best-response action (see column 2 in Table 3), and has a time component whereby players seem to 'learn' to control making mistakes as time progresses.

5 Conclusion

This paper describes an experiment whose goal is to determine which noisy dynamic best predicts long run behaviour in a large population coordination problem. We use the Language Game of Neary (2012), in which different noisy dynamics can select different equilibria. We have two promising findings.

The first is really more of an observation. We highlight that the prevalence of Bergin and Lipman's result (Bergin and Lipman, 1996), that the model of noise is instrumental in affecting equilibrium selection, has blinded researchers to another important feature of noisy dynamics: the revision protocol, too, can have a large effect on equilibrium selection. In particular, given the mountain of experimental data showing that people make mistakes in systematic, behaviourally "reasonable", and, perhaps most importantly, quantifiable ways, deciding what deterministic component of a dynamic best approximates the frequency with which players respond should probably be afforded more import than it currently is.

Our second finding shows, perhaps surprisingly, that the best-reply dynamic with uniform mistakes is the best predictor of long run population behaviour. That is, a noisy dynamic in which all players best-respond imperfectly each and every period, with the imperfections both state- and time-independent, generates the best prediction. The most startling part of this finding is that uniform mistakes best resemble how our subjects deviate from the conjectured best-response heuristic. Most importantly, this finding is corroborated by regression analysis of individual level behaviour.

Potential extensions abound. One immediate test would be to conduct comparative statics on exactly when long run population behaviour flips, and whether or not this flip in outcome accords with theoretical predictions.49 An interesting paper along these lines for the homogeneous case is Weber (2006), which shows that the population-size cutoff observed in the Van Huyck, Battalio, and Beil (1990) minimum-effort game experiments is avoided when the population size is increased incrementally. Equilibrium selection on networks is another interesting avenue that could be pursued. Charness, Feri, Melendez-Jimenez, and Sutter (2014) is an experimental investigation of the theoretical predictions of Galeotti, Goyal, Jackson, Vega-Redondo, and Yariv (2010).

49 See Section 7 of Neary (2012), which conducts (theoretical) comparative statics for the case of the best-reply dynamic with uniform mistakes.
APPENDIX

A Experimental Instructions (Game 1)

INSTRUCTIONS

Welcome to the study. In the following hour, you will participate in 200 rounds of decision making. Please read these instructions carefully; the cash payment you will receive at the end of the study depends on how well you perform, so it is important that you understand the instructions. If you have a question at any point, please raise your hand and wait for one of us to come over. We ask that you turn off your mobile phone and any other electronic devices. Communication of any kind with other participants is not allowed.

Your Role

There is a total of 20 participants in the study. These 20 individuals are randomly assigned into two different Groups: Group A and Group B, with 11 individuals assigned to Group A and 9 individuals assigned to Group B. These group assignments are fixed throughout the study. In each round, you play a game with the rest of the participants - both those in the same group as you and those in the other group. Each player will be asked to take a decision that will affect the earnings of every other player, including themselves. At the end of the round, a summary of what happened in that round, along with your earnings for that round, will be displayed on the computer monitor.

Your Decision in Each Round

You will play a 2-player game with each of the 19 other participants. You must choose one of two actions, labeled '#' and '&'. This action will be used in every 2-player game that you play. Thus, you are using the same action with each other participant. Your total earnings in a given round will be the average of the earnings you received in each 2-player game. The tables below show how earnings are determined, with each cell corresponding to the choices of actions by you and your opponent in a particular 2-player game. The first number in a given cell represents your earning in a 2-player game, and the second number represents your opponent's earning. Since there are two Groups, there are two cases.

1. When you are in Group A, earnings are as follows:

                opponent in Group A              opponent in Group B
                   #         &                      #         &
    You   #     57, 57     0, 0        You   #   57, 33     0, 0
          &      0, 0     43, 43             &    0, 0     43, 67

    Figure 7: When you are in Group A

In words this says,

(a) If you and your opponent both choose action '#', you get 57. If your opponent is in Group A, he/she gets 57; if your opponent is in Group B, he/she will get 33.
(b) If you and your opponent both choose action '&', you get 43. If your opponent is in Group A, he/she gets 43; if your opponent is in Group B, he/she will get 67.
(c) If you and your opponent choose different actions, you each get 0.

2. When you are in Group B, earnings are as follows:

                opponent in Group B              opponent in Group A
                   #         &                      #         &
    You   #     33, 33     0, 0        You   #   33, 57     0, 0
          &      0, 0     67, 67             &    0, 0     67, 43

    Figure 8: When you are in Group B

In words this says,

(a) If you and your opponent both choose action '#', you get 33. If your opponent is in Group B, he/she gets 33; if your opponent is in Group A, he/she will get 57.
(b) If you and your opponent both choose action '&', you get 67. If your opponent is in Group B, he/she gets 67; if your opponent is in Group A, he/she will get 43.
(c) If you and your opponent choose different actions, you each get 0.

This is a quick reminder for how you read entries in the tables, which you can refer back to throughout the study:

    Your earning, Your opponent's earning

The following shows how to calculate your average earning in each round:

1. When you are in Group A.
(a) If you pick action '#', your payoff is 57 × (x/19), where 'x' is the number of other players who chose action '#'.
(b) If you pick action '&', your payoff is 43 × (y/19), where 'y' is the number of other players who chose action '&'.

2. When you are in Group B.

(a) If you pick action '#', your payoff is 33 × (x/19), where 'x' is the number of other players who chose action '#'.
(b) If you pick action '&', your payoff is 67 × (y/19), where 'y' is the number of other players who chose action '&'.

In both cases, x + y = 19.

Rundown of the Study

1. At the beginning of the first round, you will be assigned to a group, and you will be shown the two tables specifying the earnings that are relevant to your group. Below the tables, you will be prompted to enter your choice of action. You must choose either '#' or '&' within 30 seconds. If you do not choose an action, one will be randomly assigned to you.

2. The first round is over after everybody has chosen an action. The screen will then show you a summary of the first round: (a) how many players chose each action, (b) your choice of action, (c) your (average) earning in the round, and (d) a table displaying your (average) earnings in all previous rounds.

3. Below the information feedback, you will be prompted to enter your choice of action for the second round. The game does not change, so as before you must choose either '#' or '&'. All future rounds are identical except for one important difference, which concerns how much time you have to choose an action. In rounds 2-10, you have 15 seconds to make a decision. If you do not make a decision within the 15 second window, then you will be assigned whatever action you used in the previous round. For rounds 11-200, you have only 10 seconds in which to make a decision. Again, if you fail to choose an action in this timeframe, you will be assigned the same action as in the previous round.

Your Cash Payment

We will randomly select 2 rounds out of the 200 to calculate your cash payment, so it is in your best interest to take each round seriously. Each round has an equal chance of being selected. The sum of the points you earned in the 2 selected rounds will be converted into cash at an exchange rate of HK$1 per point. Your total cash payment at the end of the study will be this cash amount plus a HK$40 show-up fee. Precisely,

    Your total cash payment = HK$ (the sum of the points in the 2 selected rounds) + HK$ 40

Administration

Your decisions as well as your cash payment will be kept completely confidential. Remember that you have to make your decisions entirely on your own; do not discuss your decisions with any other participants. Upon completion of the study, you will receive your cash payment. You will be asked to sign your name to acknowledge your receipt of the payment. You are then free to leave. If you have any questions, please raise your hand now. We will answer questions individually. If there are no questions, we will begin with the study.

B Screen Shots of z-Tree

[Figure 9: Member A's Decision Screen]

[Figure 10: Member B's Decision Screen]

References

Alós-Ferrer, C., and N. Netzer (2010): "The logit-response dynamics," Games and Economic Behavior, 68(2), 413-427.

Arthur, W. B. (1989): "Competing Technologies, Increasing Returns, and Lock-In by Historical Events," Economic Journal, 99(394), 116-131.

Bergin, J., and B. L. Lipman (1996): "Evolution with State-Dependent Mutations," Econometrica, 64(4), 943-956.
Binmore, K., and L. Samuelson (1997): "Muddling Through: Noisy Equilibrium Selection," Journal of Economic Theory, 74(2), 235-265.

Blume, L. E. (1993): "The Statistical Mechanics of Strategic Interaction," Games and Economic Behavior, 5(3), 387-424.

Blume, L. E. (2003): "How noise matters," Games and Economic Behavior, 44(2), 251-271.

Boncinelli, L., and P. Pin (2012): "Stochastic stability in best shot network games," Games and Economic Behavior, 75(2), 538-554.

Bramoullé, Y., and R. Kranton (2007): "Public goods in networks," Journal of Economic Theory, 135(1), 478-494.

Carlsson, H., and E. van Damme (1993): "Global Games and Equilibrium Selection," Econometrica, 61(5), 989-1018.

Charness, G., F. Feri, M. A. Melendez-Jimenez, and M. Sutter (2014): "Experimental Games on Networks: Underpinnings of Behavior and Equilibrium Selection," Econometrica, 82(5), 1615-1670.

Chen, Y., and S. X. Li (2009): "Group Identity and Social Preferences," American Economic Review, 99(1), 431-457.

Cheung, Y.-W., and D. Friedman (1997): "Individual Learning in Normal Form Games: Some Laboratory Results," Games and Economic Behavior, 19(1), 46-76.

Costa-Gomes, M. A., V. P. Crawford, and N. Iriberri (2013): "Structural Models of Nonequilibrium Strategic Thinking: Theory, Evidence, and Applications," Journal of Economic Literature, 51.

Crawford, V. P. (1991): "An "evolutionary" interpretation of Van Huyck, Battalio, and Beil's experimental results on coordination," Games and Economic Behavior, 3(1), 25-59.

Crawford, V. P. (1995): "Adaptive Dynamics in Coordination Games," Econometrica, 63(1), 103-143.

Ellison, G. (1993): "Learning, Local Interaction, and Coordination," Econometrica, 61(5), 1047-1071.

Ellison, G. (2000): "Basins of Attraction, Long-Run Stochastic Stability, and the Speed of Step-by-Step Evolution," Review of Economic Studies, 67(1), 17-45.

Farrell, J., and G. Saloner (1985): "Standardization, Compatibility, and Innovation," RAND Journal of Economics, 16(1), 70-83.

Fischbacher, U. (2007): "z-Tree: Zurich toolbox for ready-made economic experiments," Experimental Economics, 10(2), 171-178.

Foster, D., and P. Young (1990): "Stochastic evolutionary game dynamics," Theoretical Population Biology, 38, 219-232.

Foster, D. P., and H. P. Young (2006): "Regret testing: learning to play Nash equilibrium without knowing you have an opponent," Theoretical Economics, 1(3), 341-367.

Freidlin, M. I., and A. D. Wentzell (1998): Random Perturbations of Dynamical Systems (Grundlehren der mathematischen Wissenschaften). New York: Springer Verlag.

Friedman, D., and R. Oprea (2012): "A Continuous Dilemma," American Economic Review, 102(1), 337-363.

Fudenberg, D., and D. K. Levine (1998): The Theory of Learning in Games (Economic Learning and Social Evolution). Cambridge, MA: The MIT Press.

Galeotti, A., S. Goyal, M. O. Jackson, F. Vega-Redondo, and L. Yariv (2010): "Network Games," The Review of Economic Studies, 77(1), 218-244.

Harsanyi, J. C., and R. Selten (1988): A General Theory of Equilibrium Selection in Games. MIT Press.

Hart, S., and A. Mas-Colell (2003): "Uncoupled Dynamics Do Not Lead to Nash Equilibrium," American Economic Review, 93(5), 1830-1836.

Hwang, S.-H., S. Naidu, and S. Bowles (2013): "Social Conflict and the Evolution of Unequal Conventions," Discussion paper, Columbia University.

Jackson, M. O., and A. Watts (2002): "On the formation of interaction networks in social coordination games," Games and Economic Behavior, 41(2), 265-291.

Kandori, M., G. J. Mailath, and R. Rob (1993): "Learning, Mutation, and Long Run Equilibria in Games," Econometrica, 61(1), 29-56.
Kandori, M., and R. Rob (1995): "Evolution of Equilibria in the Long Run: A General Theory and Applications," Journal of Economic Theory, 65(2), 383-414.

Katz, M. L., and C. Shapiro (1985): "Network Externalities, Competition, and Compatibility," American Economic Review, 75(3), 424-440.

Kohlberg, E., and J.-F. Mertens (1986): "On the Strategic Stability of Equilibria," Econometrica, 54(5), 1003-1037.

Lewis, D. K. (1969): Convention: A Philosophical Study. Cambridge, MA: Harvard University Press.

Luce, R. D. (1959): Individual Choice Behavior. New York: Wiley.

Maes, M., and H. H. Nax (2014): "A Behavioral Study of 'Noise' in Coordination Games," Available at SSRN: http://ssrn.com/abstract=2521119 or http://dx.doi.org/10.2139/ssrn.2521119.

Maruta, T. (2002): "Binary Games with State Dependent Stochastic Choice," Journal of Economic Theory, 103(2), 351-376.

McKelvey, R. D., and T. R. Palfrey (1995): "Quantal Response Equilibria for Normal Form Games," Games and Economic Behavior, 10(1), 6-38.

Mertens, J.-F. (1989): "Stable Equilibria: A Reformulation. Part I. Definition and Basic Properties," Mathematics of Operations Research, 14(4), 575-625.

Mertens, J.-F. (1991): "Stable Equilibria: A Reformulation. Part II. Discussion of the Definition, and Further Results," Mathematics of Operations Research, 16(4), 694-753.

Morris, S., and H. S. Shin (2003): "Global Games: Theory and Applications," in Advances in Economics and Econometrics, the Eighth World Congress, ed. by Dewatripont, Hansen, and Turnovsky.

Myerson, R. B. (1978): "Refinements of the Nash equilibrium concept," International Journal of Game Theory, 7(2), 73-80.

Naidu, S., S.-H. Hwang, and S. Bowles (2010): "Evolutionary bargaining with intentional idiosyncratic play," Economics Letters, 109(1), 31-33.

Neary, P. R. (2012): "Competing conventions," Games and Economic Behavior, 76(1), 301-328.

Neary, P. R. (2013): "Supplementing Stochastic Stability," Discussion paper, Royal Holloway, University of London.

Newton, J. (2012): "Coalitional stochastic stability," Games and Economic Behavior, 75(2), 842-854.

Nöldeke, G., and L. Samuelson (1993): "An Evolutionary Analysis of Backward and Forward Induction," Games and Economic Behavior, 5(3), 425-454.

Pak, M. (2008): "Stochastic stability and time-dependent mutations," Games and Economic Behavior, 64(2), 650-665, Special Issue in Honor of Michael B. Maschler.

Peski, M. (2010): "Generalized risk-dominance and asymmetric dynamics," Journal of Economic Theory, 145(1), 216-248.

Robles, J. (1998): "Evolution with Changing Mutation Rates," Journal of Economic Theory, 79(2), 207-223.

Samuelson, L. (1994): "Stochastic Stability in Games with Alternative Best Replies," Journal of Economic Theory, 64(1), 35-65.

Samuelson, L., and J. Zhang (1992): "Evolutionary stability in asymmetric games," Journal of Economic Theory, 57(2), 363-391.

Schelling, T. C. (1960): The Strategy of Conflict. Harvard University Press.

Shapley, L. S., and D. Monderer (1996): "Potential Games," Games and Economic Behavior, 14, 124-143.

Tajfel, H., and J. Turner (1979): "An integrative theory of intergroup conflict," pp. 33-47. Brooks/Cole.

Van Damme, E., and J. W. Weibull (2002): "Evolution in Games with Endogenous Mistake Probabilities," Journal of Economic Theory, 106(2), 296-315.

Van Huyck, J. B., R. C. Battalio, and R. O. Beil (1990): "Tacit Coordination Games, Strategic Uncertainty, and Coordination Failure," American Economic Review, 80(1), 234-248.
Voorneveld, M. (2000): "Best-response potential games," Economics Letters, 66(3), 289-295.

Weber, R. A. (2006): "Managing Growth to Achieve Efficient Coordination in Large Groups," The American Economic Review, 96(1), 114-126.

Yi, K.-O. (2009): "Payoff-dependent mistakes and q-resistant equilibrium," Economics Letters, 102(2), 99-101.

Yi, K.-O. (2011): "Equilibrium Selection with Payoff-Dependent Mistakes," Discussion Paper 1115, Research Institute for Market Economy, Sogang University.

Young, H. P. (1993): "The Evolution of Conventions," Econometrica, 61(1), 57-84.

Young, H. P. (1996): "The Economics of Convention," The Journal of Economic Perspectives, 10(2), 105-122.

Young, H. P. (2001): Individual Strategy and Social Structure: An Evolutionary Theory of Institutions. Princeton, NJ: Princeton University Press.

Young, H. P. (2005): Strategic Learning and Its Limits (Arne Ryde Memorial Lectures Series). Oxford University Press, USA.