An Experimental Investigation of Stochastic Stability∗

Wooyoung Lim† Philip R. Neary‡

March 30, 2015

Abstract

This paper discusses an experiment designed to test which, if any, model of error-prone best-responses most accurately predicts long run behaviour in large populations. In other words, what is the correct noisy population dynamic to invoke when using the equilibrium selection technique of stochastic stability? The game the subjects play is the simplest possible setting in which different deterministic dynamics coupled with different noise components can select different long run outcomes. We find that the best-reply dynamic with uniform errors, where all players myopically best-respond each and every period with probability close to 1 (the deterministic component) and make mistakes independently of the payoff penalty (the noise component), yields the most accurate prediction. We also find a time trend to mistakes, with the magnitude tapering off as time progresses. This is in contrast to much of the literature, which assumes a variety of other specifications of revision opportunities, and time-independent, payoff-dependent, “logit”, mistakes.

Keywords: Stochastic Stability; Equilibrium Selection; Experiment; Evolutionary Game Theory.
JEL Classification: C72, C73, C92.

∗ Special thanks to Vince Crawford, Xun Lu, and Jonathan Newton for detailed comments, and to Yong Kil Ahn for his excellent research assistance. We would also like to thank Jesper Bagger, Chris Gee, Sotiris Georganas, Jacob Goeree, Sung-Ha Hwang, Heinrich Nax, Peter Neary, Santiago Oliveros, Juan Pablo Rud, Hamid Sabourian, Ija Trapeznikova, and seminar participants at the North-American ESA Conference, NUI Maynooth, Royal Holloway, and Universität St. Gallen.
† Email: [email protected]; Web: http://ihome.ust.hk/∼wooyoung/index.html/. Address: Department of Economics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
‡ Email: [email protected]; Web: https://sites.google.com/site/prneary/. Address: Department of Economics, Royal Holloway, University of London, Egham, Surrey, TW20 0EX.

1 Introduction

The issue of equilibrium selection has long provided headaches for economic theorists. In economies with multiple equilibria, modelled as games with multiple equilibria, exactly what should we expect to observe?1 The original, so-called “deductive”, selection techniques, i.e., that some equilibria are more focal than others (Schelling, 1960), or that some equilibria are safer than others (Harsanyi and Selten, 1988), proved unsatisfying as they were silent on how equilibrium beliefs would come into being. The most commonly applied “inductive” selection technique is stochastic stability (Foster and Young, 1990), which makes a unique prediction in many simple coordination problems.2 Most famously, it leads to uniform adoption of the risk dominant action when a large population is repeatedly matched to play a symmetric 2×2 game of pure coordination (Foster and Young, 1990; Kandori, Mailath, and Rob, 1993; Young, 1993).

There are three fundamental components to applying stochastic stability. First, there is some procedure, possibly deterministic, possibly random, that specifies the players to be afforded a revision opportunity in a given period. Second, it is assumed that individual players follow a simple updating rule, a heuristic, whenever they are afforded a revision opportunity.
Finally, it is assumed that players occasionally deviate from the heuristic by choosing an action that is not prescribed by it (this is interpreted as mutation in theoretical biology, and as mistakes/experimentation in economics).3 The first two components coupled together define a population dynamic (an “adjustment process”); appending the third yields a noisy population dynamic (a “stochastic adjustment process”).4

1 To be more precise, really we are considering games with multiple strict equilibria. Strict equilibria are those in which any unilateral deviation incurs a strict loss in utility. Strict equilibria are always strategically stable in the sense of Kohlberg and Mertens (1986) and Mertens (1989, 1991), and hence “un-refineable”.
2 In recent years the global games approach of Carlsson and van Damme (1993) has gained huge popularity, in part due to its predictive power in many standard macro models where the alignment of expectations plays a prominent role. Morris and Shin (2003) is a detailed survey.
3 There is a fourth assumption: that time is infinite. While this is clearly not implementable in a laboratory setting, the issue of how much time is needed to simulate an infinite horizon is important. This is addressed in Section 3.
4 For a given noisy population dynamic, we will occasionally abuse terminology and refer to the first two components jointly as “deterministic”, and the third as the “noise” component.

While the above is very general, in practice only a few such noisy dynamics are considered, with choices for each component made from relatively short menus. Regarding the procedure, typically one assumes either that each player independently revises his action with equal probability, or that one randomly drawn player does so with certainty. Upon being afforded a revision opportunity, the most common updating rule is to take a myopic best-response to the previous period's population profile.5 Vis-à-vis the introduction of noise, almost all attention has focused on one of two mistake models: the uniform mistakes of Kandori, Mailath, and Rob (1993) and Young (1993), and the payoff-dependent, “logit”, mistakes introduced in Blume (1993). But if stochastic stability is to be taken seriously as a selection criterion, it is important to know what combination from the above menus yields the best proxy for aggregating individual behaviour. Unfortunately, for many simple games, noisy dynamics based on choices from each of these menus yield the same prediction, while for more complex games, computing stochastically stable equilibria is a non-trivial task, and so authors typically just assume one noisy dynamic and work only with it.

In this paper, we describe a laboratory experiment designed to test which, if any, noisy best-response based dynamic is the most accurate predictor of long run behaviour. The game that the subjects play is the Language Game of Neary (2012), which was introduced in the hope of discerning how different group properties might affect long run outcomes, termed “conventions” (Lewis, 1969; Young, 1993, 1996), in large population settings with differing preferences. The Language Game describes an environment where actions are strategic complements, but preferences are not the conventional homogeneous ones. It thereby provides a framework for studying the emergence of coordinated outcomes with network effects (e.g. technologies, standards, languages, etc.), when individuals differ in what they view as most desirable.6

5 There are many others.
Fudenberg and Levine (1998) and Young (2005) are textbook treatments.
6 Most of the literature has focused on the (homogeneous) case where everybody wants the same coordinated outcome - the issue of interest then being whether or not the population will ultimately be successful in coordinating on this outcome. There do exist models with heterogeneous preferences, the simplest being a large population “Battle of the Sexes”. (See for example the asymmetric contests of Samuelson and Zhang (1992).) The limitation of this framework is that players only interact with those from the other group - women with men, and men with women - so the only bolstering from those with similar preferences comes via how they affect those in the other group. Furthermore, this framework does not seem particularly realistic for examples like technologies, standards, and languages, as one only interacts with those who have different preferences.

There are always at least two Pareto efficient equilibria in the Language Game, so the tradeoff is not the standard one of risk dominance versus efficiency.7 Regarding equilibrium selection with uniform mistakes, it is not the case that one equilibrium (the “safer”, risk dominant one in homogeneous agent models) is selected due to its larger basin of attraction. In fact, the size of each equilibrium's basin of attraction often has little to do with selection in the Language Game.8 Regarding equilibrium selection with logit mistakes, a group's very strong preferences may tilt things in favour of its preferred outcome, even if its numbers are far fewer.

The Language Game seems ideal for our experiment: it is simple enough that stochastically stable outcomes can be computed for each of the standard noisy dynamics, yet also sufficiently involved that different noisy dynamics can make different predictions for a wide range of parameters. Furthermore, unlike many experimental tests of theoretical models, the Language Game is both extremely accessible for subjects and easily codeable, and as such, we are not compelled to use a “watered down” version of it.

We allow twelve populations of twenty subjects to play 200 rounds of the Language Game, each with one of three sets of parameters (chosen carefully so that the dynamics can be separated). Our interest is in (i) long run outcomes, and (ii) behaviour at the individual level. Comparing long run outcomes to those predicted by the different noisy dynamics is straightforward - simply record the time average of the various equilibria, and contrast this with the theoretical prediction of each.9 We find that observed play matches the prediction of the best-reply dynamic with uniform mistakes in the majority of sessions.

7 For a wide range of parameters, there is also a third pure strategy equilibrium. This equilibrium is somewhat unusual in that it involves a failure to coordinate across groups. If, for example, actions are interpreted as operating systems, then such an outcome allows for the coexistence of more than one operating system in equilibrium - a seemingly natural feature missing from existing models.
8 Related to this is that the sufficient conditions of the various “radius-coradius” results in Ellison (2000) are frequently not satisfied. See Section 5.2 of Neary (2012).
9 Technically, stochastic stability can only be invoked once the population dynamics are, in a sense, “constant”. Given that our subjects were more mistake-prone in earlier rounds, there is an argument that we should limit the analysis to behaviour after some ‘cutoff’ period. More on this in Section 4.2.
The other benchmark dynamics - best-response based dynamics with logit mistakes, and best-response based dynamics with uniform mistakes but different revision protocols - are outperformed in all treatments where the forecasts disagree.10

The fact that a noisy dynamic based on agents making payoff-dependent mistakes does not make the best prediction is definitely a surprise. There is by now a wealth of evidence detailing that individual mistakes are highly systematic, and in particular that the payoff penalty associated with a mistaken action choice is a strong determinant of its likelihood. In experimental economics especially, parameterising payoff-dependent mistakes has gained a strong foothold due to the popularity of the concept of Quantal Response Equilibria introduced in McKelvey and Palfrey (1995).11

10 Each of these ‘different’ revision protocols is homogeneous, in the sense that in any given period, each agent is equally likely to be afforded a revision opportunity. However, heterogeneity in learning is another interesting avenue of research. Neary (2012) shows that group dynamism, interpreted as a fixed number of discontent agents from each group best-responding each period, has a strong bearing on selected outcomes in the Language Game. While this is interesting, it is arguably somewhat artificial, and seems difficult to implement in the laboratory. However, if group-dynamism is interpreted as “speed of learning”, then related to this is the well-known paper of Cheung and Friedman (1997). They compared the performance of two deterministic dynamics - myopic best-reply to the current population profile and ‘fictitious play’ (myopic best-reply to a longer memory) - and found that the longer memory process performed better.
11 Quantal response functions were first introduced in a model of individual choice by Luce (1959), but modelling mistakes as appearing in a priori ‘reasonable’ ways has other solid theoretical foundations. Myerson (1978) introduced the proper equilibrium, a refinement where, within the set of non best-responses, better performing strategies receive a probability of higher order than worse ones. Maruta (2002), Yi (2009) and Yi (2011) are evolutionary models with payoff-dependent mistakes.

Ascertaining precisely how players err is important. It has been well known since Bergin and Lipman (1996) that if the procedure and learning rule of a dynamic are fixed (components 1 and 2 of stochastic stability from the second paragraph), then any equilibrium can be selected for an appropriately defined model of mistakes. As such, they argue that determining the “nature of the mistake process must be analysed more carefully to derive some economically justifiable restrictions ... It is an open question whether and what kinds of interesting restrictions will emerge.” While our analysis shows that players err in a payoff-independent manner, we do find a time component in how the mistakes occur. Stochastic stability with time-dependent mistakes is considered in both Pak (2008) and Robles (1998), and could also be related to a theory like that of Van Damme and Weibull (2002), where players control the probability of implementing the intended strategy by expending effort, but doing so becomes easier with time.
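To fix ideas, the three mistake models just discussed - uniform, logit, and time-dependent - can be written as simple choice rules. The following is a minimal sketch in Python; the parameter values (eps, beta, and the decay rate) are illustrative assumptions, not estimates from our data.

```python
import math

def uniform_mistake_prob(payoff_gap, eps=0.05):
    """Uniform mistakes (Kandori, Mailath, and Rob, 1993; Young, 1993):
    the non-best-response is chosen with a fixed probability eps,
    regardless of the payoff penalty (payoff_gap) it incurs."""
    return eps

def logit_mistake_prob(payoff_gap, beta=5.0):
    """Logit mistakes (Blume, 1993): in a binary-action game the inferior
    action is chosen with probability 1/(1 + exp(beta * payoff_gap)), so
    costlier mistakes are exponentially less likely. As beta grows large,
    exact best-response is recovered."""
    return 1.0 / (1.0 + math.exp(beta * payoff_gap))

def time_decaying_mistake_prob(t, eps0=0.10, decay=0.01):
    """A payoff-independent but time-dependent rate, in the spirit of
    Pak (2008) and Robles (1998): the mistake frequency decays as play
    progresses (t is the period number)."""
    return eps0 * math.exp(-decay * t)
```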
That the noisy dynamic fitting our data best has a revision protocol where all players react each and every period may not seem surprising, since, in laboratory experiments with discrete time periods, it is standard for the experimental designer to allow this.12 However, theoretically at least, when coupled with uniform mistakes this deterministic dynamic can make very different predictions to ones extremely close to it. For example, consider a noisy dynamic where all players react every period with a probability that is arbitrarily close to, but strictly less than, 1 (say ∼0.99), and mistakes are uniform. Such a dynamic can generate very different predictions and hence would not fit our data. This addresses another issue of evolutionary dynamics that is often overlooked. That is, the Language Game shows that if the learning rule and the model of mistakes are held fixed (components 2 and 3 from the second paragraph), then the revision protocol can also matter hugely for selection.13

12 Some recent experiments on continuous time games (see for example Friedman and Oprea (2012)) are interesting exceptions to this.
13 The revision protocol may not always matter however - for example when the game is a potential game (Monderer and Shapley, 1996) and mistakes are logit. See Alós-Ferrer and Netzer (2010).

While the best-reply dynamic with uniform mistakes is the best predictor of population behaviour, we also inspect the data at the individual level in case the accurate prediction of this noisy dynamic is a pure fluke, with the population outcome in fact driven by some other underlying process whose prediction just happens to coincide. This is particularly true given that although 200 rounds of play is definitely at the longer end of most experimental studies, it is still nowhere near the infinite number required for theoretical predictions to be felt with certainty. With this in mind, with our individual level data analysis, we are effectively trying to answer the following two-part question: “are players in this experiment behaving as imperfect myopic best-responders, and, if yes, when mistakes are made, are these mistakes payoff-dependent?” The answer to the first part is “yes”, but, contrary to what was anticipated, even at the individual level we find that individual mistakes are not best-approximated as logit. We find that mistakes occur as uniform, but with a time component whereby the mistake frequency decays over time. In other words, while we find evidence for social learning, in the sense that subjects form expectations about population behaviour that are themselves arrived at endogenously based on what has happened in the past, we also find that our subjects learn to control their own behaviour better as time progresses. This could be because the subjects attain a better understanding of the game, or perhaps because they are more confident that others will choose “correctly”, or even that they are learning from their past mistakes.

The classic paper on experimental tests of equilibrium selection in coordination games is Van Huyck, Battalio, and Beil (1990). These authors looked at long run outcomes in the “minimum effort game”, and found that the inferior, risk-dominant equilibrium consistently emerged once population size exceeded a threshold.
As pointed out by Crawford (1991), the minimum effort game is analogous to a multi-action stag hunt, and thus their findings complement the theoretical predictions of the papers listed in the opening paragraph.14,15 The problem with these findings, as we see it, is that in a homogeneous environment, supposedly only one property matters for equilibrium selection: risk-dominance. Our research program, of which this paper is one part, is to ascertain what noisy dynamic best-approximates large population behaviour, and ultimately, using this, to understand what group properties matter for long run outcomes in heterogeneous environments.16

Two other papers are important to mention. The first is a recent paper, Maes and Nax (2014), which poses a question very similar to ours. Maes and Nax (2014) analyse individual level data from a large population experiment (in a game that is akin to a networked version of the Language Game), and find that subjects' deviations are sensitive to deviation costs. There are two main differences. First of all, their subjects are not informed of either the structure of the network or the payoffs of the other participants, which means that formally a very complex Bayesian Game is being played.17 Second, they do not consider the relationship between individual-level mistakes and population-level predictions, and their definition of deviation costs is not the same as ours.

14 Each of the noisy dynamics considered in this paper selects the risk dominant equilibrium in the minimum effort game, and as such, it cannot be used to parse them.
15 It is standard when studying stag hunts in large populations to normalise payoffs along the diagonal and set off-diagonal payoffs to zero. Using this and the argument of Crawford (1991), the Language Game is then strategically equivalent to a large population game with two homogeneous groups, where the minimal action for one group is the maximal action for the other, and vice versa.
16 Theoretically at least, other properties also matter for equilibrium selection. While we already mentioned group dynamism, it is not hard to see that network architecture can also matter (Neary, 2013); network architecture has no effect on selection in the canonical homogeneous model - risk dominance always wins - except on a small class of networks (Ellison, 1993; Jackson and Watts, 2002; Peski, 2010).

The second such paper is Crawford (1995). The ‘adaptive dynamics’ studied there, tailored perfectly to understanding behaviour in large population coordination problems, are the first to fuse both rules governing strategy updating and the priors with which players ‘enter’ the game. In other words, the model nests stochastic evolutionary dynamics and beliefs-based adaptive learning. While the dynamics apply only to coordination problems in which players' roles are symmetric and symmetric strategy profiles are the only pure strategy equilibria, neither of which holds for the Language Game, “heterogeneizing” these dynamics seems a potentially very fruitful avenue to pursue.

The balance of the paper is organised as follows. In the next section, we formally define the Language Game and provide insight as to why different noisy dynamics may select different equilibria. Section 3 describes the experimental design, while Section 4 presents and discusses the results. Section 5 concludes.
17 The lack of information afforded to subjects in this experiment has the nice feature that it allows Maes and Nax (2014) to consider convergence to equilibrium under the ‘uncoupled’ and ‘completely uncoupled’ dynamics of Hart and Mas-Colell (2003) and Foster and Young (2006) respectively.

2 The Language Game

2.1 The Game

The Language Game, G, is a simultaneous move game defined as the tuple {N, Π, S, G}, where N := {1, . . . , N} is the population of players; Π := {A, B} is a partition of N into two nonempty homogeneous groups A and B of sizes N^A and N^B respectively (N^A, N^B ≥ 2); S := {a, b} is the binary action set common to all players; G := {G^{AA}, G^{AB}, G^{BB}} is a collection of local interactions, where G^{AA} is the pairwise exchange between a player from Group A and a player from Group A, etc. These local interactions are given in Figure 1 below, where α, β ∈ (1/2, 1). Utilities are the sum of payoffs earned from playing the field, where the same action must be used with one and all.18 We assume players do not randomise.

G^{AA} (row: player A1, column: player A2)
           a             b
  a      α, α          0, 0
  b      0, 0      1−α, 1−α

G^{BB} (row: player B1, column: player B2)
           a             b
  a   1−β, 1−β         0, 0
  b      0, 0          β, β

G^{AB} (row: player from A, column: player from B)
           a             b
  a    α, 1−β          0, 0
  b      0, 0        1−α, β

Figure 1: The three local interactions, G^{AA}, G^{AB}, and G^{BB}

Fixing an order on the players, with each Group A player listed before any from Group B, we define S := ∏_{i=1}^{N^A} S × ∏_{j=1}^{N^B} S, with typical element s. When a player chooses action s, from his perspective, action profile s ∈ S can be viewed as (s; s). It is important to note that despite the heterogeneity and the fact that players can be matched with more than one ‘type’ of player, local interactions are opponent independent, in that one's payoff is determined by his choice of action and the action choice of his opponent. That is, the opponent's identity does not matter, just their behaviour. This feature then ‘scales up’ to the population level since a player cares only about the number of others using each action and not how those others are distributed across the two groups.

18 While this has a different interpretation to random matching, optimising behaviour is the same.

Given this, for any population profile s ∈ S, let n_a(s) denote the number of players choosing action a at s. (Clearly then the number of players choosing action b at s is equal to N − n_a(s).) With this notation, the utility a player in group K ∈ Π receives from taking action s ∈ {a, b} when the population profile is s, written U^K(s; s), is given by

U^A(a; s) := (n_a(s) − 1) α                    (1)
U^A(b; s) := (N − n_a(s) − 1)(1 − α)           (2)
U^B(a; s) := (n_a(s) − 1)(1 − β)               (3)
U^B(b; s) := (N − n_a(s) − 1) β                (4)

By Theorem 1 in Neary (2012), the only pure equilibria are the group-symmetric profiles (a, a), (a, b), and (b, b), where the first boldface symbol refers to the action commonly chosen by those in Group A and the second to that commonly chosen by everyone in Group B. While (a, a) and (b, b) are always equilibria, profile (a, b) is an equilibrium if and only if the smaller group's preferences are sufficiently strong. Profile (b, a) is never an equilibrium.

Two things are worth mentioning.
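Before turning to these, the payoff functions (1)-(4) are simple enough to verify numerically. The following minimal sketch, in Python, enumerates the pure equilibria by checking unilateral deviations at every state; the parameters are those of treatment G1 from Section 3, used purely for illustration.

```python
# Payoffs (1)-(4) and the pure equilibria of the Language Game, using
# the Game 1 parameters from Section 3 for illustration.
N_A, N_B = 11, 9
ALPHA, BETA = 0.57, 0.67
N = N_A + N_B

def utility(group, action, n_a):
    """U^K(s; s): n_a is the total number of a-players in the profile,
    including the player himself when he plays a."""
    if group == "A":
        return (n_a - 1) * ALPHA if action == "a" else (N - n_a - 1) * (1 - ALPHA)
    return (n_a - 1) * (1 - BETA) if action == "a" else (N - n_a - 1) * BETA

def is_equilibrium(nA_a, nB_a):
    """No player may gain from a unilateral switch of action."""
    n_a = nA_a + nB_a
    for group, on_a, size in (("A", nA_a, N_A), ("B", nB_a, N_B)):
        # a-players must not prefer b (after switching, n_a falls by one)...
        if on_a > 0 and utility(group, "a", n_a) < utility(group, "b", n_a - 1):
            return False
        # ...and b-players must not prefer a (after switching, n_a rises by one).
        if on_a < size and utility(group, "b", n_a) < utility(group, "a", n_a + 1):
            return False
    return True

print([(x, y) for x in range(N_A + 1) for y in range(N_B + 1)
       if is_equilibrium(x, y)])
# -> [(0, 0), (11, 0), (11, 9)]: the profiles (b,b), (a,b), and (a,a)
```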
First, it is standard when studying large population games with a coordination aspect to divide them into games of “strategic complements” and those of “strategic substitutes”.19 However, while the Language Game is clearly in the former category, it is not a pure coordination game in the standard sense since it does not possess a unique Pareto efficient equilibrium (Group A players prefer (a, a), while Group B players prefer (b, b)).20 Thus, the standard tradeoff of Pareto efficiency versus risk dominance is not present.21 Furthermore, despite the lack of a unique Pareto efficient equilibrium, it is by no means clear that one should categorise the Language Game as a coordination game with tension à la a Battle of the Sexes (see the “asymmetric contests” of Samuelson and Zhang (1992) for the large population analog), since there are subsets of the population, the groups, wherein all agents agree on what population profile they would most like to see emerge, and yet these agents also interact. This feature means the Language Game provides a more realistic framework for studying the emergence of standards and operating systems (Farrell and Saloner, 1985; Katz and Shapiro, 1985; Arthur, 1989), since it does not insist that preferences are homogeneous but still allows a given player to interact with everyone in the population.

19 Galeotti, Goyal, Jackson, Vega-Redondo, and Yariv (2010) make the distinction very clear, with games of strategic complements tending to receive more attention in the literature. Bramoullé and Kranton (2007) is an example of a game of strategic substitutes; Boncinelli and Pin (2012) study the “best shot” game - a setting where actions are absolute strategic substitutes.

The second point concerns interpretation. There are two different, but straightforward, ways to interpret the Language Game. The first involves taking the most commonly used setting in the literature on large population games and simply “doubling” it. That is, suppose that there are two groups, A and B, each located on a distinct island. For each of these island economies, the local interactions are given by G^{AA} and G^{BB} respectively. The Language Game can then be thought of as an “opening up” of the islands to one another (thereby requiring the addition of the across-group local interaction G^{AB}). The other interpretation supposes that the across-group local interaction, G^{AB}, is the primitive (note that this local interaction is a Battle of the Sexes).

20 However, the Language Game is a ‘pure coordination problem’ according to Young (2001), who defines such a game as one wherein all players have m strategies, and strategy sets can be ordered such that it is a strict equilibrium for each player to play their mth strategy.
21 In fact, it is not clear how one ought to define risk dominance in this setting. If the standard definition of a best-reply to 50% of the population using either action were adopted, then the risk dominant profile is always (a, b). But this could yield the undesirable conclusion that the risk dominant profile need not be an equilibrium. If an alternate definition of a best-reply to the profile (a, b) is used, then the risk dominant profile, while always an equilibrium, could involve players from the smaller group adopting the larger group's preferred action, and this may not sit well either.
Thus, the Language Game adds to a large population Battle of the Sexes the feature that members of each “sex” also interact amongst themselves, a feature absent from existing large population games with asymmetry. To put it another way, it is not only the case that players are playing more than one game simultaneously - this is one interpretation of all large population setups - but rather that they are playing more than one type of game simultaneously.

2.2 Equilibrium Selection

Now suppose that the Language Game is the stage game in an evolutionary setting. Time is discrete and goes on forever. Utilities are received every period, and, when afforded the opportunity to update his action for tomorrow, a player is assumed to take a best-response to today's action profile. The previous paragraph defines a deterministic population dynamic.

There are three standard best-response based dynamics used in the literature, each of which satisfies the Darwinian mantra of “better strategies today are [weakly] better represented tomorrow”. The first of these is the best-reply dynamic. This stipulates that every period, all players update their action, taking a best-response to the current population profile. The second, known as independent inertia (Nöldeke and Samuelson, 1993; Samuelson, 1994; Kandori and Rob, 1995), assumes that every period, each player is “activated” with the same probability, and that those activated players choose a best-response to the current population profile.22 The final dynamic is asynchronous learning (Binmore and Samuelson, 1997; Blume, 2003). This assumes that each period, one player is randomly chosen (typically, each is chosen with equal probability, 1/N), and that the appointed player best-responds to the current population profile.

Deterministic dynamics like those defined above can exhibit what Arthur (1989) termed path dependence. Informally, this just says that the initial strategy profile can have a strong bearing on the terminal outcome. In the case of the best-reply dynamic, initial behaviour always uniquely determines the final rest point; for the other dynamics, the final rest point is uniquely determined from many, but not all, initial configurations.

22 Note that the best-reply dynamic is a special case of independent inertia where the activation probability equals one. As regards equilibrium selection, however, a discontinuity can occur when this activation probability equals one. As such, throughout this paper, the term “independent inertia” refers to a revision protocol where the activation probability is strictly less than one.

Foster and Young (1990) were the first to show that adding noise to a deterministic dynamic can remove this path dependence. By assuming that players will forever occasionally deviate from their behavioural rule, interpreted as mistakes/mutations/shocks at the individual level, “noise” is added to the dynamics, and so the system is always in flux and permanent lock-in never occurs. However, despite the perpetual instability, there is some regularity to the randomness, and the bulk of time is spent localised around a subset of the equilibria - the stochastically stable ones.

Noise is typically assumed to occur in one of two ways, and when interpreted as resulting from individual mistakes each has a nice behavioural interpretation.23 The first is pure randomness: a player with a revision opportunity chooses an inferior action with a fixed likelihood that is independent of all outside factors.
The second is the payoff-dependent “logit” variant introduced by Blume (1993), where the probability of choosing a particular action depends on how that action will affect utility.24

Crudely put, the formula for stochastic stability is “population dynamic” + “noise” ⇒ selection.25 Given that different choices of components on the LHS can affect the outcome on the right, and given how often the concept is invoked in experimental studies, the point of the current paper is to give guidance on which components should be used.

23 Bergin and Lipman (1996) showed that for a given deterministic dynamic, any equilibrium can be rendered stochastically stable for an appropriately defined model of noise. However, while technically correct, it is not hard to see that in some games, for certain equilibria to be selected, the noise must be generated by mistakes that occur in almost pathological ways with no reasonable behavioural interpretation.
24 Recently, a new model of mistakes, the so-called ‘directed’ or ‘intentional’ errors, has been proposed by Naidu, Hwang, and Bowles (2010). In a binary action game, this translates to players being infinitely more likely to make mistakes away from one action over the other. In our framework, as in the Battle of the Sexes as studied in Hwang, Naidu, and Bowles (2013), this would presumably translate to subjects from Group A only ever mistakenly choosing action a while those from Group B accidentally choose action b. As we discuss in Section 4.2 on individual level behaviour, while we do observe a leaning towards such directional-esque mistakes, it is far from absolute. Perhaps more importantly, mistakes of this form do not accord with what we observe at the population level, as they would always predict the equilibrium outcome (a, b), a profile we observe in only two sessions (both in Treatment G2).
25 The word “refinement” has the conventional meaning of throwing out some equilibria, while the word “selection” has come to mean throwing out all but one of the equilibria. While stochastic stability is commonly referred to as a selection criterion, it is more accurate to refer to it as a refinement. While multiple stochastically stable equilibria occur only for non-generic parameters in the standard homogeneous model, with uniform mistakes more than one equilibrium can be stochastically stable for an open set of parameters in the Language Game.

It is useful to break the large array of choices into subsets. Asynchronous learning and independent inertia always make the same selection in the Language Game when coupled with uniform mistakes (Neary, 2013); the best-reply dynamic, however, may give rise to very different predictions with uniform mistakes (Neary, 2012). As regards logit mistakes, Alós-Ferrer and Netzer (2010) have a very useful result. They show that for all “Best-Response Potential Games” (Voorneveld, 2000), logit mistakes coupled with any deterministic dynamic with an arbitrary specification of revision opportunities will select the same equilibrium. Since asynchronous learning, independent inertia, and the best-reply dynamic all fit this description, and the Language Game is a potential game (Monderer and Shapley, 1996) - potential games being a strict subset of the set of best-response potential games - we are left with three different (classes of) noisy dynamics that can make different predictions in the Language Game. Each dynamic is composed of a deterministic component and a noise component.
D1: best-reply + uniform mistakes
D2: asynchronous learning / independent inertia + uniform mistakes
D3: arbitrary specification of revision opportunities + logit mistakes

Using an example, we now sketch how each of these three noisy dynamics can select different equilibria in the Language Game.26 The parameters are those from our first treatment, Game 1: (N^A, N^B) = (11, 9) and (α, β) = (0.57, 0.67). For these parameters, action profile (a, b) is an equilibrium.

A diagram is helpful. While the Language Game has a population of size N (set equal to 20 in each of our treatments), there are in actuality only two types of player.

26 We will be somewhat vague on the mathematical machinery needed to compute stochastically stable equilibria. It involves “tree-surgery” techniques developed by Freidlin and Wentzell (1998), and first introduced to game theory in Foster and Young (1990). Due to the incredible popularity of the papers of Kandori, Mailath, and Rob (1993) and Young (1993), these techniques are now quite standard. Young (1993) is the most complete treatment.

[Figure 2: Condensed State Space. A 12 × 10 lattice of states with n_a^A on the horizontal axis and n_a^B on the vertical axis; equilibrium states are drawn as large circles, basins of attraction are colour-coded, and the green star (★) marks the average period 1 population profile.]

Thus, at any point in time the action profile can be summarised by a 2-dimensional vector (n_a^A, n_a^B), where n_a^A is the number of players in Group A currently using action a, and n_a^B is the corresponding statistic for Group B. Clearly n_a^A + n_a^B = n_a. Figure 2 shows a condensed version of the action space S, what is commonly referred to as the state space, as a 12 × 10 lattice, with n_a^A ∈ {0, . . . , 11} on the horizontal axis, and n_a^B ∈ {0, . . . , 9} on the vertical axis. Each ‘state’ is depicted by a circle. Equilibrium states are depicted by large circles. For the parameters of this problem (and for the parameters of each of our treatments), it can be checked that state (11, 0), uniquely identified with the action profile (a, b), is an equilibrium.

At most action profiles, optimal behaviour is the same for all players in a given group, so further information can be conveyed in a picture of the state space, like that in Figure 2, via colour-coding and shading.27 Under any best-response based dynamic with an arbitrary specification of revision opportunities, all solid blue states will lead eventually, in the case of asynchronous learning or independent inertia, and immediately, in the case of the best-reply dynamic, to state (0, 0). The reason for this is that at each of those states, there is uniform preference for which action is better - in this case action b. At these states, one can think of the dynamics as a current that is pushing down and to the left. A similar but opposite statement holds for all solid red states - they will lead with certainty to the equilibrium state (11, 9).

27 The quantifier “most” in this sentence is important. Technically, for some games, there can be a (small) set of states at which the best-responses for two players in the same group do not accord. (This is the case in the 10-person homogeneous population of the leading example in Section 2 of Kandori, Mailath, and Rob (1993).) However, it is not an issue in this example, nor in any of our treatments.
Both of these sets are separated by correspondingly-coloured lines running at 45 degrees from northwest to southeast.28

28 The fact that the local interactions are opponent-independent is what makes the boundary states of these sets lie at 45 degrees. If players had a stronger preference for coordinating with those from their own group, then these boundaries would be tilted away from 45 degrees.

Now consider the remaining states. At each of these states, group preferences disagree. That is, Group A players prefer action a while Group B players prefer action b. Any deterministic dynamic is pushing down and to the right. All these states transition immediately to (11, 0) under the best-reply dynamic. These states are further colour-coded for the purposes of asynchronous learning and independent inertia. The solid black states will lead with certainty to equilibrium (11, 0). All hollow red states, those contained inside the small red triangle, will lead to either state (11, 0) or state (11, 9). They cannot lead to (0, 0), because the dynamics cannot “move left” and thus “get back to” any blue state, due to the preferences of Group A driving population behaviour further to the right of the state space. A similar statement holds for all hollow blue states, those in the small blue triangle, in that they lead to either (0, 0) or (11, 9). Under asynchronous learning and independent inertia, all hollow black states can lead to any of the three equilibria, depending on what subset of players are randomly selected to revise their actions and in what order they are chosen to do so. For example, from state (6, 5), there is positive probability that three Group B players will be activated (i) in successive periods under asynchronous learning, and (ii) in the same period (with no Group A players activated) under independent inertia, and with each best-responding the resulting state will be (6, 2). From state (6, 2), the dynamics are unambiguous and will lead to equilibrium (0, 0). The green star (★) is the average period 1 population profile for this treatment. More on this when we discuss the results in Section 4.

Now let us look at stochastically stable equilibria. For precise statements of how the tree-surgery techniques alluded to in Footnote 26 are applied to the Language Game, the reader should consult Neary (2012) and Neary (2013). For understanding the intuition, the key feature is computing how easily each equilibrium can be escaped from. Transitions that ‘go against the flow’ of the dynamics have a cost associated with them; transitions that occur naturally ‘with the flow’ are costless.

We begin with the case where mistakes are uniform. Each transition against the flow has equal cost (hence the name “uniform”), and this cost is normalised to 1. Suppose initially that population behaviour is at (b, b), i.e. at state (0, 0). From this state, if any 9 players mistakenly choose the wrong action (at the same time under best-reply, or sequentially under asynchronous learning), then the resulting population behaviour will be at some state where n_a = 9, from where it need not return to (0, 0).29 From there, a series of costless transitions will lead to (11, 0) (this is what will happen immediately and with certainty in the case of the best-reply dynamic, and with positive probability in the case of asynchronous learning or independent inertia). It is thus said that a minimum cost path from (0, 0) to (11, 0) has a cost of 9.

29 Ellison (2000) would say that the radius of (0, 0), the minimum number of mistakes needed to escape from (0, 0), is equal to 9.
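This counting argument is easy to check numerically. The following minimal sketch computes the radius of (0, 0) in the sense of Footnote 29, i.e. the smallest number of simultaneous mistakes after which some player's myopic best-response is no longer b. The Game 1 parameters are assumed purely for illustration, and the sketch is of the counting argument only, not the full tree-surgery computation.

```python
# Radius of the all-b state (0, 0) under uniform mistakes, assuming
# the Game 1 parameters (N, alpha, beta) = (20, 0.57, 0.67).
N, ALPHA, BETA = 20, 0.57, 0.67

def prefers_a(group, others_on_a):
    """Myopic best-response test: given how many of the other N-1 players
    are on action a, does a player of this group weakly prefer a?
    (Payoffs follow equations (1)-(4).)"""
    if group == "A":
        return others_on_a * ALPHA >= (N - 1 - others_on_a) * (1 - ALPHA)
    return others_on_a * (1 - BETA) >= (N - 1 - others_on_a) * BETA

def radius_of_all_b():
    """Smallest k such that, after k mistaken switches to a, some remaining
    b-player's best-response is a, so the process need not flow back
    to (0, 0)."""
    for k in range(1, N + 1):
        if prefers_a("A", k) or prefers_a("B", k):
            return k

print(radius_of_all_b())  # -> 9, matching the minimum cost of 9 above
```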
A similar analysis is done for all pairs of equilibria, and stochastically stable outcomes are then quite easily computed.

Whenever (a, b) is an equilibrium, as it is for all the treatments in this paper, the dynamics D1 and D2 often select the same equilibria and can therefore be difficult to parse. (However, this is not true when profile (a, b) is not an equilibrium.) Despite the fact that (a, b) is an equilibrium for the parameters of Game 1, the intuition for the disparity in selection can still be gleaned from Figure 2. The important states are those where group preferences disagree. Under the best-reply dynamic, each of those states uniquely determines the terminal rest point, but this is not true under the other dynamics. For example, under asynchronous learning, it is possible to transition from (11, 9) to (0, 0) with only 7 mistakes - by first transitioning to state (4, 9), followed by a series of costless transitions into one of the solid blue states. Thus, the best-reply dynamic tilts matters in favour of the larger group, while independent inertia and asynchronous learning lean towards the group with stronger preferences (i.e., α vs β).

The state space depiction above is less helpful for understanding how to compute stochastically stable outcomes under a dynamic with logit mistakes. While the colour coding in Figure 2 comes from stage game parameters and noise-free deterministic dynamics, with logit mistakes some costly transitions may have very different costs, and so finding paths of minimum cost becomes far more tedious than just a simple counting exercise. However, a convenient short cut is available since each of the local interactions, G^{AA}, G^{AB}, and G^{BB}, is a potential game and therefore so is the Language Game as a whole. Using a result from Neary (2013), the action profile that maximises the potential function is stochastically stable under these dynamics.

3 Experimental Design, Hypotheses and Procedure

Now that we have discussed the intuition for how the different noisy dynamics may select different equilibria, we describe our experiment intended to isolate which noisy dynamic makes the best prediction.

3.1 Design and Hypotheses

The population size was fixed at N = 20 throughout. The treatment variables are the group sizes and the strength of payoffs: N^A, N^B, α, and β. A given game is defined by the tuple (N^A, N^B, α, β), so our treatments G1, G2, and G3 are given by (11, 9, 0.57, 0.67), (12, 8, 0.58, 0.71), and (15, 5, 0.58, 0.80) respectively.

The most important facet of the design is that the different dynamics make different predictions for the different parameter specifications. While only two treatments are needed to obtain complete separation of the three dynamics, we used three treatments in order to give each dynamic an opportunity to make the wrong prediction. Table 1 below summarises the stochastically stable outcomes for the three different noisy dynamics of interest.

          (N^A, α, β)          D1       D2       D3
Game 1    (11, 0.57, 0.67)    (b, b)   (b, b)   (b, b)
Game 2    (12, 0.58, 0.71)    (a, a)   (b, b)   (a, a)
Game 3    (15, 0.58, 0.80)    (a, a)   (b, b)   (b, b)

Table 1: Experimental Treatments and Stochastically Stable Equilibria

Our first hypothesis is very straightforward. In actuality, it says little more than that ‘noisy population dynamics’, which are part and parcel of stochastic stability, are worthy of study in a laboratory setting.
That is, in large population coordination problems, there are very ‘secure’ equilibria - in the sense that a deviation strictly, and in some cases considerably, reduces utility - that can be moved away from. Evaluating this hypothesis is simple. All one needs to verify is that, in at least one of our treatments, population behaviour reaches a strict equilibrium and then drifts away from it.

Hypothesis 1. Noisy dynamics have something to contribute. That is, people make suboptimal responses sufficiently regularly that strict equilibria can be escaped from.

Next, note that in treatment G1, all three noisy dynamics make the same prediction. Treatment G1 can thus be thought of as our “control” treatment, allowing us to state our first ‘specific’ hypothesis as follows:

Hypothesis 2. In Game 1, long run population behaviour will conform with that prescribed by each of the three noisy dynamics.

Hypothesis 2 is there to rule out the “pathological” mistakes that Bergin and Lipman (1996) highlight might be an issue when invoking stochastic stability. That is, observations consistent with Hypothesis 2 provide evidence in favour of restricting attention only to those dynamics with “sensible” models of mistakes.

Having confirmed that all three dynamics with sensible models of mistakes make the same prediction when they should, we next move to parsing them. Due to the incredible popularity of the concept of QRE, our prior was that dynamics supported by payoff-dependent mistakes ought to make the “best” prediction. Thus, using treatments G2 and G3, our second hypothesis can be stated as:

Hypothesis 3. In Game 2, population profile (a, a) will appear most frequently, and in Game 3, population profile (b, b) will appear most frequently.

To reiterate, the [predictive] evolutionary theories laid out in Hypotheses 1-3 above are easily checked. As stated before, to reject Hypothesis 1, no strict equilibrium can ever be escaped from. For Hypotheses 2 and 3, effectively, one can just eyeball what equilibrium profile population behaviour is trending towards, or what equilibrium profile population behaviour spends most time localised around, and then compare this outcome to that stipulated by each of the noisy dynamics. Even in the case where there is a lot of bouncing around of population behaviour - something that immediately implies stochastic dynamics have value - summary statistics like the relative frequency of time spent at each equilibrium are easily computed.

However, there is also the issue of what is going on at the level of the individual, as there is the possibility that we may over-infer from the population-level results. It is well within the realm of possibility that in each of our treatments, population behaviour will coincide perfectly with the theoretical prediction of a particular noisy population dynamic, and yet individual behaviour does not conform with the individual learning rule that, when agglomerated, generates this aggregate prediction. Put more simply, since we are considering only three out of a possible infinite number of learning rules, perhaps individuals are behaving in a manner very different to myopic best-response with noise, and yet population behaviour just happens to corroborate the prediction for each of our parameter specifications. Analysis of the data at the individual level will allow us to either refute or validate (or at least not refute) this. As such, our final hypothesis is the ‘individual level’ version of Hypothesis 3.

Hypothesis 4.
The probability of making a mistake is higher when the payoff from the best-response is lower, and higher when the payoff from the non best-response action is higher.

Before addressing Hypothesis 4, there remains a concern that subjects are (i) not perfectly myopic, and (ii) in periods 3 and onwards, conditioning behaviour on information from more than just the immediately preceding period. Issue (i) is easily checked by comparing actions taken to those prescribed by the myopic best-response learning rule.30 Issue (ii) is more subtle. However, in a given round of play, the feedback provided to subjects pertained only to the immediately preceding period.31 While it remains possible that our subjects were able to a) recall perfectly information from all previous periods, and b) use this information for strategic purposes, the most recent period is inherently focal.

Hypothesis 4 states that deviations from the myopic best-response learning rule depend on payoff differentials. Logit mistakes are only one (nicely-parameterised) model of mistakes allowed under Hypothesis 4, but of course there are others. Our analysis of the individual level data will test generally for any kind of payoff-dependent mistakes.

3.2 Experimental Procedure

There were four sessions run per treatment, each sharing the same procedure. All sessions were conducted in English at the Hong Kong University of Science and Technology (HKUST). A total of 240 subjects (= 12 sessions of 20 subjects) were recruited from the undergraduate and graduate populations of the university. None had any prior experience with this game.

Subjects entered the lab and each was assigned a private computer terminal.32 Copies of the experimental instructions were distributed and subjects were given 10 minutes to read them. Communication of any sort between the subjects was forbidden throughout, thereby removing coalitional effects as a confounding factor.33 After reading the instructions, but before commencing the session, the subjects were required to answer a brief questionnaire demonstrating that they understood how payoffs would be assigned each period. No session would have begun until all students had responded to each question correctly, although the game is sufficiently simple that no problems were encountered. Finally, the experimenter read the instructions aloud to ensure that the information included in the instructions, which at this point was verified as understood, was mutual knowledge, and, depending on the levels of reasoning employed by the subjects, approaching common knowledge.

30 Data analysis shows that in all sessions, the percentage of actions that equate to myopic best-responses exceeds 92.3%, so this appears robust. See Subsection 4.2.
31 The next subsection spells out the experimental procedure precisely.
32 The computer program was written using z-Tree (Fischbacher, 2007).

In each treatment play lasted for 200 periods.34 At the beginning of each period except the initial one, each subject was provided with two pieces of information concerning the previous period's play: the total number of players that had chosen each action, and his own payoff.35,36 Each subject was then prompted to select his action for the forthcoming period. The only difference across periods was how much time subjects were given to make a decision.
We allowed subjects 30 seconds to choose an action in the initial period, 15 seconds in periods 2-10, and 10 seconds in all periods thereafter (periods 11-200).37 If a subject did not make a choice within the allowed timeframe, his action from the previous period was carried forward. For more details, see the instructions and the z-Tree screenshots attached in Appendices A and B respectively.

Payoffs were assigned as the average of that earned from playing the field. Payoffs were scaled up so that for a given treatment, maximum Group A (B) payoffs were given by 100α (100β). Minimum payoffs were zero. While the groups were labelled as A and B, the actions a and b were labelled as ‘#’ and ‘&’ respectively, so as to reduce the possibility that group identity might increase anchoring on a particular action due to its label.38 Note that, as mentioned in Footnote 21, there is no clear-cut “safe” (risk-dominant) action. Take-home cash was assigned as the sum of rewards from two randomly chosen rounds plus a HK$40 show-up fee. Average earnings were HK$135.7 (≈ US$17.4), with the range of payoffs given by the interval [HK$60, HK$190] (≈ [US$7.83, US$24.35]).

33 Newton (2012) is a detailed study of how coalitional behaviour can affect stochastically stable outcomes. In summary, it can matter a lot, so removing it as a possibility was very important.
34 The choice of 200 periods was not taken lightly. While we needed the horizon to be large enough to justify making statements about the “long run”, if a session went on for too long, the possibility existed that subjects would lose focus and that their strategic behaviour would differ. It is the simplicity of our game that allows us to implement 200 periods within (an average time of less than) 2 hours.
35 Obviously, from Equations (1)-(4), each one of these pieces of information is sufficient for a subject to compute the other. Both were provided for the sake of clarity.
36 Importantly, subjects were not told how the actions taken were distributed across the two groups. This information was withheld to avoid the possibility that it could be used as an external coordination device. It is also in accordance with a) how the game is defined, and b) how it would be played in a genuinely “large” population.
37 Together with the limited information feedback provided, the restricted time limit for each round placed a practical restriction on subjects' behaviour, prodding it to be more in line with myopic best-response.
38 The theory of social identity was initially developed by Tajfel and Turner (1979) in the field of social psychology and is now gaining popularity in economics (Chen and Li (2009) is a recent experimental study). In our setting, group identity means that an individual finds a particular action more attractive than just its payoff consequence. Despite the fact that we labelled actions a and b as # and & respectively, it is immediate that each group has a particular action that it would like to see coordination on, so it is not possible to get rid of the group identity issue completely. Indeed, in our data, there are a few subjects who tend to choose their ideal action significantly more often than seems sensible. That is, they do so even though it consistently may not yield the highest payoff.

4 Results

4.1 Population Level

We begin this section by drawing the reader's attention to Figures 3, 4, and 5 below. Figure 3 refers to treatment G1, Figure 4 to treatment G2, and Figure 5 to treatment G3. All three figures contain four panels, labelled (a)-(d), with each panel referencing a different session of that treatment. A given panel conveys three pieces of information. On the horizontal axis is the period, ranging from 1 to 200, and on the vertical axis is the number of players in a particular subset using action a in a given period. The three subsets are Group A (represented by the light dashed line); Group B (the dark dashed line); and the total population (the solid line). The solid line is simply the “sum” of the other two.

Our first hypothesis, Hypothesis 1, is trivial to check.
In fact, any of the sessions in any of the treatments confirms that the study of noisy dynamics is of value (at least when considering experiments). Specifically, in each session, more than one strict equilibrium is locked in on and then escaped from. Related to this, refer back to Figure 2. Note that in all sessions of treatment G1, for which Figure 2 depicts the state space, population behaviour began at an action profile very close to (a, b). The precise starting profiles were (10, 0), (11, 1), (9, 1), and (8, 1), which correspond to an average period 1 population profile in G1 of (9.5, 0.75). This is the state depicted by the green star (★) in Figure 2. Note that from this state, any noise-free best-response based dynamic must lock in on population profile (a, b) and stay there forever.

A particularly striking example of how noisy population dynamics are of benefit can be seen from Game 3 Session 4. Here, population behaviour quickly (within 8 periods) locks in on equilibrium (a, a), then transitions to equilibrium (a, b) for a few periods, only to move on to equilibrium (b, b), where it stays for 10 periods before jumping back up to equilibrium (a, a). Our first result can therefore be stated as:

Result 1. The study of noisy population dynamics is undeniably a worthy endeavour.

Our second hypothesis, Hypothesis 2, addressed the issue of whether or not stochastic stability, based on what the literature has deemed “reasonable” models of mistakes, can make accurate predictions in the lab.39 This can be answered using our control treatment, G1, for which all three noisy dynamics predicted (b, b).40 The plot of population behaviour for G1 is depicted in Figure 3.

39 We are ignoring what is by far and away the most popular solution concept used in the analysis of laboratory experiments - that of Quantal Response Equilibrium (QRE) due to McKelvey and Palfrey (1995). While QRE has had great success in explaining much experimental data, it is a fixed-point argument, and our focus in this paper is on the process of convergence to equilibrium. With stochastic stability, it is still assumed that players are utility maximisers, but that the beliefs they form are very simple, and moreover it is not required that beliefs are ultimately correct. However, it should be noted that while QRE shot to fame for its predictive power for initial (period 1) responses, it has also been viewed as a model of learning. See Footnote 14 of Costa-Gomes, Crawford, and Iriberri (2013), and the references therein.
A particularly striking example of how noisy population dynamics are of benefit can be seen from Game 3 Session 4. Here, population behaviour quickly (within 8 periods) locks in on equilibrium (a, a), then transitions to equilibrium (a, b) for a few periods, only to move on to equilibrium (b, b), where it stays for 10 periods before jumping back up to equilibrium (a, a). Our first result can therefore be stated as:

Result 1. The study of noisy population dynamics is undeniably a worthy endeavour.

Our second hypothesis, Hypothesis 2, addressed the issue of whether or not stochastic stability based on what the literature has deemed "reasonable" models of mistakes can make accurate predictions in the lab.39 This can be answered using our control treatment, G1, for which all three noisy dynamics predicted (b, b).40 The plot of population behaviour for G1 is depicted in Figure 3.

39 We are ignoring what is by far and away the most popular solution concept used in the analysis of laboratory experiments - that of Quantal Response Equilibrium (QRE) due to McKelvey and Palfrey (1995). While QRE has had great success in explaining much experimental data, it is a fixed-point argument, and our focus in this paper is on the process of convergence to equilibrium. With stochastic stability, it is still assumed that players are utility maximisers, but the beliefs they form are very simple, and moreover it is not required that beliefs are ultimately correct. However, it should be noted that while QRE shot to fame for its predictive power for initial (period 1) responses, it has also been viewed as a model of learning. See Footnote 14 of Costa-Gomes, Crawford, and Iriberri (2013), and the references therein.
40 In some ways, treatment G1 is analogous to running an experiment with a homogeneous population playing the stag hunt and expecting to observe lock-in on the risk dominant action, since all three dynamics considered in this paper predict that outcome. As we have said before, the advantage of using the heterogeneous framework is that the dynamics can be parsed by choosing different parameters.

[Figure 3: Trends for frequency of action a in Game 1. Panels (a)-(d) correspond to Sessions 1-4; each plots, for periods 1-200, the number of players choosing action a in Group A, in Group B, and in total.]

[Figure 4: Trends for frequency of action a in Game 2. Panels (a)-(d) correspond to Sessions 1-4.]

[Figure 5: Trends for frequency of action a in Game 3. Panels (a)-(d) correspond to Sessions 1-4.]

For the first three sessions - panels (a), (b) and (c) - population behaviour starts in the basin of attraction of equilibrium (a, b), but quite quickly moves to equilibrium (b, b). Even a simple eyeballing of the trends in these panels shows that long run behaviour is described by uniform adoption of action b. However, the convergence in these sessions, in particular that depicted in panel (a), is not perfectly pure, in that there are still occasional deviations back to equilibrium (a, b). In the fourth session of treatment G1, panel (d) in Figure 3, population behaviour does not conform with any of the theoretical predictions. In fact, there is no clear lock-in.
While, as in the other sessions, population behaviour began localised around equilibrium (a, b), it then shifted to the theoretically predicted outcome (b, b), and then proceeded to bounce back and forth between the symmetric profiles (a, a) and (b, b).41 Thus, our first detailed result, in support of commonly-studied noisy dynamics / learning rules, can be stated as:

Result 2. Myopic best-response based deterministic dynamics coupled with sensible models of mistakes have predictive power at the population level.

41 Other treatments in which all dynamics made the same prediction were run for previous versions of the paper (albeit with shorter time horizons), and each corroborated the finding of treatment G1 in this version. The data are available upon request.

In deciding how to judge stochastically stable outcomes, the analysis so far has been little more than a simple "eyeballing" of the trends in behaviour. Stochastically stable outcomes are often referred to as long run outcomes, but this is somewhat misleading, as it creates the false impression that where population behaviour is trending is what matters. Really, due to the noise inserted into the process, population behaviour can never be trending to any equilibrium. Rather, stochastic stability measures the fraction of time spent at each equilibrium, and selects the equilibrium (or equilibria) at which the time average is non-negligible as the likelihood of mistakes becomes vanishingly small.

With the above in mind, Table 2 below displays the empirical frequency with which the population chose a profile in the neighbourhood of (a, a), (a, b), and (b, b).42

42 Recall that profiles (b, b), (a, b), and (a, a) are uniquely identified with states (0, 0), (N^A, 0), and (N^A, N^B) respectively, where N^A and N^B varied across treatments. We define the neighbourhood of profile (b, b) as the set {(0, 0), (1, 0), (0, 1), (1, 1)}, that of (a, b) as {(N^A − 1, 0), (N^A − 1, 1), (N^A, 1), (N^A, 0)}, and that of (a, a) as {(N^A − 1, N^B), (N^A − 1, N^B − 1), (N^A, N^B − 1), (N^A, N^B)}. Intuitively, this is just the equilibrium and those states that immediately surround it - those whose action profile differs from the equilibrium by at most one player in each group. This is the tightest definition of a neighbourhood, and so naturally the result is robust to expanding the definition.

    Neighbourhood of:      (a, a)    (a, b)    (b, b)
    Game 1                   10       17.25    108.25
    Game 2                   71.5     84         0
    Game 3                  106       39.75      2

    Table 2: Empirical (average) frequency for profiles in the neighbourhood of equilibria

The data show very clear results. The frequency of a profile in the neighbourhood of (a, a) being chosen is 71.5 out of 200 in G2 and 106 out of 200 in G3. Mann-Whitney tests with session level data as independent observations reveal that these values are significantly higher (p < 0.01) than the frequency of a profile in the neighbourhood of (b, b). Furthermore, the frequency of a profile in the neighbourhood of (b, b) being chosen in G1 is 108 out of 200, which is significantly higher (p < 0.01, Mann-Whitney test) than that of (a, a). Similarly, the frequencies of a profile in the neighbourhood of (a, a) being chosen in G3 and of (b, b) being chosen in G1 are significantly higher (p < 0.01, Mann-Whitney test) than those of (a, b). However, for G2, we cannot reject the hypothesis that the frequency of a profile in the neighbourhood of (a, a) is the same as that of (a, b).43,44 This reinforces the conclusions drawn from the observable trends.

43 The results from our administered exit-survey suggest that the prevalence of (a, b) in Game 2 comes from the forming of particularly strong group identities, as discussed in Footnote 38, associated with this treatment. A few selected responses from Game 2 subjects to the dual-question: "Did you prefer taking a particular action (either # or &), even though doing so would likely give you a lower monetary payoff? If yes, please briefly explain.", were as follows: "I preferred # and I was from group A." "# for group A, & for group B." "Yes. I prefer option # as I am a group A player." "Yes, I try to press &, my rationale is to hope that all people in Group B can follow, and some people in Group A can follow, so that in the next round, I can get more money than the previous round."
44 The high frequency of the profile (a, b) in Game 2 can be viewed as evidence for the directed errors of Naidu, Hwang, and Bowles (2010). However, our population level data reveal that the same pattern does not exist in the other two treatments. We investigate this issue more carefully in the next subsection on individual level analysis.
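For concreteness, the counting behind Table 2 can be sketched as follows (Python with scipy). The neighbourhood definition follows Footnote 42; the `sessions` object, a list of per-period (x, y) state sequences, is a hypothetical placeholder, and the default group sizes shown are those of Game 1.

    from scipy.stats import mannwhitneyu

    def neighbourhood_counts(states, n_a=11, n_b=9):
        """Count periods spent within one player (per group) of each
        equilibrium; `states` is one session's list of 200 (x, y) profiles."""
        eq = {'(a,a)': (n_a, n_b), '(a,b)': (n_a, 0), '(b,b)': (0, 0)}
        def near(s, e):
            return abs(s[0] - e[0]) <= 1 and abs(s[1] - e[1]) <= 1
        return {k: sum(near(s, e) for s in states) for k, e in eq.items()}

    # With the four sessions of a treatment as independent observations:
    # aa = [neighbourhood_counts(s)['(a,a)'] for s in sessions]
    # bb = [neighbourhood_counts(s)['(b,b)'] for s in sessions]
    # stat, p = mannwhitneyu(aa, bb, alternative='greater')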
We now move to parsing the dynamics. Recall that our conjecture was that a dynamic from the class of those contained in D3 - those based on logit mistakes - would make the best prediction. However, by comparing the analysis above to the theoretical predictions given in Table 1, it is quite clear that a dynamic from this class is not the victor. In fact, it is the best-reply dynamic, D1, that gets it right more often than not. Our third result can then be stated as follows:

Result 3. The best-reply dynamic with uniform mistakes, Dynamic D1, is the best predictor of population behaviour.

That a noisy population dynamic based on uniform, payoff-independent, mistakes makes the best prediction is the main finding of this subsection. It is perhaps surprising that the best-reply dynamic with uniform mistakes should outperform the best-reply dynamic with the oft-invoked logit mistakes, and yet this is precisely what we observe.
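To make the distinction between the two mistake models explicit, the sketch below (Python) gives one period of an individual's noisy choice under each. The noise level eps and the logit precision eta are arbitrary illustrative values, not estimates from our data.

    import math, random

    def noisy_choice(u_a, u_b, model, eps=0.05, eta=0.05):
        """One player's action given payoffs u_a, u_b to actions a and b.
        'uniform': best-respond, but with probability eps play the other action.
        'logit': play a with probability exp(eta*u_a)/(exp(eta*u_a)+exp(eta*u_b)),
        so costlier mistakes are exponentially less likely."""
        if model == 'uniform':
            best = 'a' if u_a > u_b else 'b'
            other = 'b' if best == 'a' else 'a'
            return best if random.random() > eps else other
        p_a = 1.0 / (1.0 + math.exp(eta * (u_b - u_a)))
        return 'a' if random.random() < p_a else 'b'

    # Under 'uniform', the probability of a mistake is eps regardless of the
    # foregone payoff; under 'logit' it shrinks as |u_a - u_b| grows.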
As stated before, it is important to confirm that behaviour at the individual level corroborates the population prediction made by "aggregating" it. Without this, we could not rule out the possibility that some unconsidered behavioural rule just happened to make the same prediction as the best-reply dynamic. This is the purpose of the next subsection.

4.2 Individual Level

This subsection is devoted to analysing the individual level data. It is impossible to parse the dynamics D1 and D2 based on individual data, because the behavioural rule (best-response) and the mistakes (uniform) are the same in each. As a result, our focus here is to test whether individual mistake behaviour can be explained by logit mistakes, as stated in Hypothesis 4.

Before analysing how our subjects err, we need to determine precisely what behavioural rule they are following. Indeed, without determining this it is impossible to define what constitutes a mistake. The size of the population, coupled with the fact that only limited and very recent information was provided in each period, was intended to induce subjects to behave as myopically as possible. However, there is always the possibility that some other behavioural rule was being adopted. As a first pass, consider Figure 6 below. The figure displays, for periods 2 onwards, the percentage of actions taken that were not myopic best-responses to the previous period's population profile. The average mistake probability, aggregated across all rounds, is 7.61%, with 10.76% in Game 1, 5.73% in Game 2, and 6.35% in Game 3. The significantly higher mistake probability in Game 1 clearly comes from the outlier session (Session 4) of that treatment. More precisely, panel (a) plots the percentage of non-optimal actions (that is, action choices that were not myopic best-responses) taken in each round, aggregated over all four sessions of all three games. Panel (b) presents the percentage of mistakes made by individuals, sorted by group and by intervals of rounds 2-50, 51-100, 101-150, and 151-200. The decreasing regularity with which subjects make mistakes as time progresses is clear. This is evidence that subjects are learning to control their behaviour over time.

[Figure 6: Mistakes. Panel (a): time trend of the round-level mistake percentage (Total, Group A, Group B). Panel (b): mistake percentages by group and by blocks of rounds (2-50, 51-100, 101-150, 151-200) for each game.]

A few more precise remarks can be made regarding the information in Figure 6. First, as mentioned above, both panels (a) and (b) suggest that the frequency of mistakes is dependent upon time (period). A Spearman rank-order test was run to determine the relationship between the average mistake percentage in each round and the round number. The result shows a strong negative monotonic relationship between the two variables (Spearman's ρ = −0.71, p < 0.001 for all data; ρ = −0.53, p < 0.001 for Group A; ρ = −0.75, p < 0.001 for Group B). Second, in G1, where (b, b) is the selected long-run outcome, the observed behaviour of Group B subjects is highly consistent with a myopic best-response heuristic, whereas in G2 and G3, where (a, a) is the observed long-run outcome, the observed behaviour of Group A subjects is highly consistent with a myopic best-response heuristic. This implies that the frequency of mistakes is asymmetric across groups, i.e., it is group-dependent.45 A non-parametric Mann-Whitney test with session level aggregate data reveals that the frequencies of mistakes for the two groups are significantly different in G2 and G3 (p = 0.021 for G2, and p = 0.043 for G3) and insignificantly different in G1 (p = 0.149).46

45 The group-dependent mistakes we found are somewhat in line with the directed errors of Naidu, Hwang, and Bowles (2010). However, these are far from perfectly aligned since, in our setting, directed errors demand that players never make mistakes in the wrong direction. That is, Group A players must never choose b when it is suboptimal to do so, and Group B players must never choose a when it is the inferior action. And yet we observe situations in which this is violated. Furthermore, directed errors would always predict equilibrium (a, b) as the long run outcome - something we never observe.
46 The insignificant difference in G1 results from Session 4, in which both groups make mistakes significantly more often than in other sessions. When this 'unusual' session is removed, significance is restored.
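In standard software, the time-trend test just reported is simply the following (Python with scipy and numpy; the mistake-percentage series here is a synthetic placeholder standing in for the experimental data):

    import numpy as np
    from scipy.stats import spearmanr

    rounds = np.arange(2, 201)
    # Placeholder series for the round-level mistake percentage (periods
    # 2-200): a noisy decreasing trend standing in for the actual data.
    mistake_pct = 12.0 * np.exp(-rounds / 80.0) + np.random.rand(rounds.size)
    rho, p = spearmanr(rounds, mistake_pct)
    # On the experimental data the paper reports rho = -0.71, p < 0.001.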
Having concluded that our subjects behaved as myopic best-responders, we next move to classifying how they deviate from this rule. In particular, we want to determine whether or not the likelihood of a mistake, defined as a non myopic best-response, depends on the payoff consequences. Evidence against this would lead us to reject Hypothesis 4. We conduct individual level probit regressions, for periods t = 2, ..., 200, with dependent variable M_it and four regressors U_it^BR, U_it^NBR, G_i, and t, where M_it is an indicator taking the value 1 if the action chosen in period t by individual i was the best-response to period t−1 population behaviour, and 0 otherwise; G_i is a dummy variable that takes the value 0 if individual i is in Group A, and 1 otherwise; U_it^BR gives the payoff earned from choosing the myopic best-response action, and U_it^NBR gives the payoff earned from choosing the non-best-response action. We write ε_it for the idiosyncratic error. The coefficients of interest - those on the four regressors above - are denoted β1, β2, β3, and β4 respectively.

The signs of coefficients β1 and β2 have a straightforward interpretation: if deviations from the myopic best-response heuristic depend on the payoff consequences, then the sign of β1 should be positive whereas the sign of β2 should be negative. Coefficients β3 and β4 are also uncomplicated. If subjects in Group A (Group B) make mistakes more often than subjects in Group B (Group A), then β3 should be positive (negative). Similarly, if subjects make mistakes less often over time, then β4 should be positive.47

47 The magnitudes of the coefficients do not capture marginal effects because the regression is based on a non-linear model.
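Each session-level regression takes roughly the following form in standard software (Python with statsmodels and pandas). The data frame below is filled with synthetic placeholder values purely so that the call is concrete; the real regressors are constructed from the session data as described above.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 20 * 199   # 20 subjects, periods 2..200 (synthetic placeholder data)
    df = pd.DataFrame({
        'U_BR':  rng.uniform(30, 60, n),   # payoff to the best-response action
        'U_NBR': rng.uniform(0, 40, n),    # payoff to the other action
        'G':     rng.integers(0, 2, n),    # 0 = Group A, 1 = Group B
        't':     np.tile(np.arange(2, 201), 20),
    })
    # M = 1 if the subject best-responded; synthetic rule standing in for data.
    df['M'] = (rng.random(n) < 0.93).astype(int)

    X = sm.add_constant(df[['U_BR', 'U_NBR', 'G', 't']])
    print(sm.Probit(df['M'], X).fit(disp=0).summary())   # beta_1,...,beta_4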
                         (1) U(BR-action)   (2) U(Non-BR-action)   (3) Group          (4) Period
                         Coef.    p-value   Coef.     p-value      Coef.     p-value  Coef.    p-value
    Game 1  Session 1    .1852    < .001    .1033     < .001        .5396    < .001   .0063    < .001
            Session 2    .1430     .066     .0436      .427        1.2115    < .001   .0014     .028
            Session 3    .2118    < .001    .0874      .001         .9586    < .001   .0077    < .001
            Session 4    .0693    < .001    −.0067     .194         .3034    < .001   −.0003    .523
    Game 2  Session 1    .0037    < .001    .0010      .002        −.0100     .960    .0070    < .001
            Session 2    .0085    < .001    .0030     < .001      −1.0095    < .001   .0054    < .001
            Session 3    .0060    < .001    .0015     < .001       −.3721     .012    .0080    < .001
            Session 4    .0076     .002     .0029     < .001       −.8955     .050    .0118    < .001
    Game 3  Session 1    .0070    < .001    .0019     < .001      −1.7550    < .001   .0107    < .001
            Session 2    .0036    < .001    .0010      .024        −.9496    < .001   .0023    < .001
            Session 3    .0034    < .001    −.0004     .322        −.7255     .004    .0050    < .001
            Session 4    .0034    < .001    −.0002     .643        −.5287     .006    .0035    < .001

    Table 3: Probit Regression

Table 3 presents the results of our regressions.48 Column (1) shows that the sign of β1 is positive in all sessions of all games, and significant at the 1% level for the majority of sessions. Column (2) shows that the sign of β2 varies across games and sessions, but is more often either significantly positive or insignificantly negative. Overall, there is no clear pattern for these two coefficients in terms of sign and significance. Most importantly regarding parameters β1 and β2, there is no single session of any game in which β1 is significantly positive and β2 significantly negative simultaneously, as is required for mistake behaviour to be consistent with the logit model.

48 The results obtained from running a logit regression are qualitatively the same and thus not reported here.

Compatible with the results from the non-parametric tests, column (3) shows that mistakes are group-dependent in a systematic way: Group B subjects make mistakes more often than Group A subjects in G2 and G3, whereas Group A subjects make mistakes more often than Group B subjects in G1. Column (4) shows that subjects make mistakes less often as time passes, with the exception of Session 4 of G1. Overall, the results from Games 1-3 are not consistent with logit mistakes coupled with myopic best-response. Our final result, addressing the issues posed in Hypothesis 4, can then be summarised as follows:

Result 4. At the individual level, the probability of a mistake is decreasing in the payoff from the best-response action (see column 1 in Table 3), is independent of the payoff from the non-best-response action (see column 2 in Table 3), and has a time component whereby players seem to 'learn' to control making mistakes as time progresses.

5 Conclusion

This paper describes an experiment whose goal is to determine which noisy dynamic best predicts long run behaviour in a large population coordination problem. We use the Language Game of Neary (2012), in which different noisy dynamics can select different equilibria. We have two promising findings.

The first is really more of an observation. We highlight that the prevalence of Bergin and Lipman's result (Bergin and Lipman, 1996), that the model of noise is instrumental in affecting equilibrium selection, has blinded researchers to another important feature of noisy dynamics: the revision protocol, too, can have a large effect on equilibrium selection. In particular, given the mountain of experimental data showing that people make mistakes in systematic, behaviourally "reasonable", and, perhaps most importantly, quantifiable ways, deciding what deterministic component of a dynamic best approximates the frequency with which players respond should probably be afforded more import than it currently is.

Our second finding shows, perhaps surprisingly, that the best-reply dynamic with uniform mistakes is the best predictor of long run population behaviour. That is, a noisy dynamic in which all players best-respond imperfectly each and every period, with the imperfections both state- and time-independent, generates the best prediction. The most startling part of this finding is that uniform mistakes best resemble how our subjects deviate from the conjectured best-response heuristic. Most importantly, this finding is corroborated by regression analysis of individual level behaviour.

Potential extensions abound. One immediate test would be to conduct comparative statics on exactly when long run population behaviour flips, and whether or not this flip in outcome accords with theoretical predictions.49 An interesting paper along these lines for the homogeneous case is Weber (2006), which shows that the population-size cutoff observed in the Van Huyck, Battalio, and Beil (1990) minimum-effort game experiments is avoided when the population size is increased incrementally. Equilibrium selection on networks is another interesting avenue that could be pursued. Charness, Feri, Melendez-Jimenez, and Sutter (2014) is an experimental investigation of the theoretical predictions of Galeotti, Goyal, Jackson, Vega-Redondo, and Yariv (2010).

49 See Section 7 of Neary (2012), which conducts (theoretical) comparative statics for the case of the best-reply dynamic with uniform mistakes.
APPENDIX

A Experimental Instructions (Game 1)

INSTRUCTIONS

Welcome to the study. In the following hour, you will participate in 200 rounds of decision making. Please read these instructions carefully; the cash payment you will receive at the end of the study depends on how well you perform, so it is important that you understand the instructions. If you have a question at any point, please raise your hand and wait for one of us to come over. We ask that you turn off your mobile phone and any other electronic devices. Communication of any kind with other participants is not allowed.

Your Role

There is a total of 20 participants in the study. These 20 individuals are randomly assigned into two different Groups: Group A and Group B, with 11 individuals assigned to Group A and 9 individuals assigned to Group B. These group assignments are fixed throughout the study. In each round, you play a game with the rest of the participants - both those in the same group as you and those in the other group. Each player will be asked to take a decision that will affect the earnings of every other player, including themselves. At the end of the round, a summary of what happened in that round, along with your earnings for that round, will be displayed on the computer monitor.

Your Decision in Each Round

You will play a 2-player game with each of the 19 other participants. You must choose one of two actions, labeled '#' and '&'. This action will be used in every 2-player game that you play. Thus, you are using the same action with each other participant. Your total earnings in a given round will be the average of the earnings you received in each 2-player game. The tables below show how earnings are determined, with each cell corresponding to the choices of actions by you and your opponent in a particular 2-player game. The first number in a given cell represents your earning in a 2-player game, and the second number represents your opponent's earning. Since there are two Groups, there are two cases.

1. When you are in Group A, earnings are as follows:

                opponent in Group A              opponent in Group B
                   #         &                      #         &
    You   #     57, 57     0, 0        You   #   57, 33     0, 0
          &      0, 0     43, 43             &    0, 0     43, 67

    Figure 7: When you are in Group A

In words this says,

(a) If you and your opponent both choose action '#', you get 57. If your opponent is in Group A, he/she gets 57; if your opponent is in Group B, he/she will get 33.
(b) If you and your opponent both choose action '&', you get 43. If your opponent is in Group A, he/she gets 43; if your opponent is in Group B, he/she will get 67.
(c) If you and your opponent choose different actions, you each get 0.

2. When you are in Group B, earnings are as follows:

                opponent in Group B              opponent in Group A
                   #         &                      #         &
    You   #     33, 33     0, 0        You   #   33, 57     0, 0
          &      0, 0     67, 67             &    0, 0     67, 43

    Figure 8: When you are in Group B

In words this says,

(a) If you and your opponent both choose action '#', you get 33. If your opponent is in Group B, he/she gets 33; if your opponent is in Group A, he/she will get 57.
(b) If you and your opponent both choose action '&', you get 67. If your opponent is in Group B, he/she gets 67; if your opponent is in Group A, he/she will get 43.
(c) If you and your opponent choose different actions, you each get 0.

This is a quick reminder for how you read entries in the tables, which you can refer back to throughout the study:

    Your earning, Your opponent's earning

The following shows how to calculate your average earning in each round:

1. When you are in Group A.
(a) If you pick action '#', your payoff is 57 × (x/19), where 'x' is the number of other players who chose action '#'.
(b) If you pick action '&', your payoff is 43 × (y/19), where 'y' is the number of other players who chose action '&'.

2. When you are in Group B.

(a) If you pick action '#', your payoff is 33 × (x/19), where 'x' is the number of other players who chose action '#'.
(b) If you pick action '&', your payoff is 67 × (y/19), where 'y' is the number of other players who chose action '&'.

In both cases, x + y = 19.

Rundown of the Study

1. At the beginning of the first round, you will be assigned to a group, and you will be shown the two tables specifying the earnings that are relevant to your group. Below the tables, you will be prompted to enter your choice of action. You must choose either '#' or '&' within 30 seconds. If you do not choose an action, one will be randomly assigned to you.

2. The first round is over after everybody has chosen an action. The screen will then show you a summary of the first round: (a) how many players chose each action, (b) your choice of action, (c) your (average) earning in the round, and (d) a table displaying your (average) earnings in all previous rounds.

3. Below the information feedback, you will be prompted to enter your choice of action for the second round. The game does not change, so as before you must choose either '#' or '&'. All future rounds are identical except for one important difference, which concerns how much time you have to choose an action. In rounds 2-10, you have 15 seconds to make a decision. If you do not make a decision within the 15 second window, then you will be assigned whatever action you used in the previous round. For rounds 11-200, you have only 10 seconds in which to make a decision. Again, if you fail to choose an action in this timeframe, you will be assigned the same action as in the previous round.

Your Cash Payment

We will randomly select 2 rounds out of the 200 to calculate your cash payment, so it is in your best interest to take each round seriously. Each round has an equal chance of being selected. The sum of the points you earned in the 2 selected rounds will be converted into cash at an exchange rate of HK$1 per point. Your total cash payment at the end of the study will be this cash amount plus a HK$40 show-up fee. Precisely,

    Your total cash payment = HK$ (the sum of the points in the 2 selected rounds) + HK$ 40

Administration

Your decisions as well as your cash payment will be kept completely confidential. Remember that you have to make your decisions entirely on your own; do not discuss your decisions with any other participants. Upon completion of the study, you will receive your cash payment. You will be asked to sign your name to acknowledge your receipt of the payment. You are then free to leave. If you have any questions, please raise your hand now. We will answer questions individually. If there are no questions, we will begin with the study.

B Screen Shots of z-Tree

[Figure 9: Member A's Decision Screen]

[Figure 10: Member B's Decision Screen]

References

Alós-Ferrer, C., and N. Netzer (2010): "The logit-response dynamics," Games and Economic Behavior, 68(2), 413-427.

Arthur, W. B. (1989): "Competing Technologies, Increasing Returns, and Lock-In by Historical Events," Economic Journal, 99(394), 116-131.

Bergin, J., and B. L. Lipman (1996): "Evolution with State-Dependent Mutations," Econometrica, 64(4), 943-956.
Binmore, K., and L. Samuelson (1997): "Muddling Through: Noisy Equilibrium Selection," Journal of Economic Theory, 74(2), 235-265.

Blume, L. E. (1993): "The Statistical Mechanics of Strategic Interaction," Games and Economic Behavior, 5(3), 387-424.

Blume, L. E. (2003): "How noise matters," Games and Economic Behavior, 44(2), 251-271.

Boncinelli, L., and P. Pin (2012): "Stochastic stability in best shot network games," Games and Economic Behavior, 75(2), 538-554.

Bramoullé, Y., and R. Kranton (2007): "Public goods in networks," Journal of Economic Theory, 135(1), 478-494.

Carlsson, H., and E. van Damme (1993): "Global Games and Equilibrium Selection," Econometrica, 61(5), 989-1018.

Charness, G., F. Feri, M. A. Melendez-Jimenez, and M. Sutter (2014): "Experimental Games on Networks: Underpinnings of Behavior and Equilibrium Selection," Econometrica, 82(5), 1615-1670.

Chen, Y., and S. X. Li (2009): "Group Identity and Social Preferences," American Economic Review, 99(1), 431-457.

Cheung, Y.-W., and D. Friedman (1997): "Individual Learning in Normal Form Games: Some Laboratory Results," Games and Economic Behavior, 19(1), 46-76.

Costa-Gomes, M. A., V. P. Crawford, and N. Iriberri (2013): "Structural Models of Nonequilibrium Strategic Thinking: Theory, Evidence, and Applications," Journal of Economic Literature, 51.

Crawford, V. P. (1991): "An "evolutionary" interpretation of Van Huyck, Battalio, and Beil's experimental results on coordination," Games and Economic Behavior, 3(1), 25-59.

Crawford, V. P. (1995): "Adaptive Dynamics in Coordination Games," Econometrica, 63(1), 103-143.

Ellison, G. (1993): "Learning, Local Interaction, and Coordination," Econometrica, 61(5), 1047-1071.

Ellison, G. (2000): "Basins of Attraction, Long-Run Stochastic Stability, and the Speed of Step-by-Step Evolution," Review of Economic Studies, 67(1), 17-45.

Farrell, J., and G. Saloner (1985): "Standardization, Compatibility, and Innovation," RAND Journal of Economics, 16(1), 70-83.

Fischbacher, U. (2007): "z-Tree: Zurich toolbox for ready-made economic experiments," Experimental Economics, 10(2), 171-178.

Foster, D., and P. Young (1990): "Stochastic evolutionary game dynamics," Theoretical Population Biology, 38, 219-232.

Foster, D. P., and H. P. Young (2006): "Regret testing: learning to play Nash equilibrium without knowing you have an opponent," Theoretical Economics, 1(3), 341-367.

Freidlin, M. I., and A. D. Wentzell (1998): Random Perturbations of Dynamical Systems (Grundlehren der mathematischen Wissenschaften). New York: Springer Verlag.

Friedman, D., and R. Oprea (2012): "A Continuous Dilemma," American Economic Review, 102(1), 337-363.

Fudenberg, D., and D. K. Levine (1998): The Theory of Learning in Games (Economic Learning and Social Evolution). Cambridge, MA: The MIT Press.

Galeotti, A., S. Goyal, M. O. Jackson, F. Vega-Redondo, and L. Yariv (2010): "Network Games," The Review of Economic Studies, 77(1), 218-244.

Harsanyi, J. C., and R. Selten (1988): A General Theory of Equilibrium Selection in Games. MIT Press.

Hart, S., and A. Mas-Colell (2003): "Uncoupled Dynamics Do Not Lead to Nash Equilibrium," American Economic Review, 93(5), 1830-1836.

Hwang, S.-H., S. Naidu, and S. Bowles (2013): "Social Conflict and the Evolution of Unequal Conventions," Discussion paper, Columbia University.

Jackson, M. O., and A. Watts (2002): "On the formation of interaction networks in social coordination games," Games and Economic Behavior, 41(2), 265-291.

Kandori, M., G. J. Mailath, and R. Rob (1993): "Learning, Mutation, and Long Run Equilibria in Games," Econometrica, 61(1), 29-56.
Kandori, M., and R. Rob (1995): "Evolution of Equilibria in the Long Run: A General Theory and Applications," Journal of Economic Theory, 65(2), 383-414.

Katz, M. L., and C. Shapiro (1985): "Network Externalities, Competition, and Compatibility," American Economic Review, 75(3), 424-440.

Kohlberg, E., and J.-F. Mertens (1986): "On the Strategic Stability of Equilibria," Econometrica, 54(5), 1003-1037.

Lewis, D. K. (1969): Convention: A Philosophical Study. Cambridge, MA: Harvard University Press.

Luce, R. D. (1959): Individual Choice Behavior. New York: Wiley.

Maes, M., and H. H. Nax (2014): "A Behavioral Study of 'Noise' in Coordination Games," Available at SSRN: http://ssrn.com/abstract=2521119 or http://dx.doi.org/10.2139/ssrn.2521119.

Maruta, T. (2002): "Binary Games with State Dependent Stochastic Choice," Journal of Economic Theory, 103(2), 351-376.

McKelvey, R. D., and T. R. Palfrey (1995): "Quantal Response Equilibria for Normal Form Games," Games and Economic Behavior, 10(1), 6-38.

Mertens, J.-F. (1989): "Stable Equilibria: A Reformulation. Part I. Definition and Basic Properties," Mathematics of Operations Research, 14(4), 575-625.

Mertens, J.-F. (1991): "Stable Equilibria: A Reformulation. Part II. Discussion of the Definition, and Further Results," Mathematics of Operations Research, 16(4), 694-753.

Morris, S., and H. S. Shin (2003): "Global Games: Theory and Applications," in Advances in Economics and Econometrics, the Eighth World Congress, ed. by Dewatripont, Hansen, and Turnovsky.

Myerson, R. B. (1978): "Refinements of the Nash equilibrium concept," International Journal of Game Theory, 7(2), 73-80.

Naidu, S., S.-H. Hwang, and S. Bowles (2010): "Evolutionary bargaining with intentional idiosyncratic play," Economics Letters, 109(1), 31-33.

Neary, P. R. (2012): "Competing conventions," Games and Economic Behavior, 76(1), 301-328.

Neary, P. R. (2013): "Supplementing Stochastic Stability," Discussion paper, Royal Holloway, University of London.

Newton, J. (2012): "Coalitional stochastic stability," Games and Economic Behavior, 75(2), 842-854.

Nöldeke, G., and L. Samuelson (1993): "An Evolutionary Analysis of Backward and Forward Induction," Games and Economic Behavior, 5(3), 425-454.

Pak, M. (2008): "Stochastic stability and time-dependent mutations," Games and Economic Behavior, 64(2), 650-665, Special Issue in Honor of Michael B. Maschler.

Peski, M. (2010): "Generalized risk-dominance and asymmetric dynamics," Journal of Economic Theory, 145(1), 216-248.

Robles, J. (1998): "Evolution with Changing Mutation Rates," Journal of Economic Theory, 79(2), 207-223.

Samuelson, L. (1994): "Stochastic Stability in Games with Alternative Best Replies," Journal of Economic Theory, 64(1), 35-65.

Samuelson, L., and J. Zhang (1992): "Evolutionary stability in asymmetric games," Journal of Economic Theory, 57(2), 363-391.

Schelling, T. C. (1960): The Strategy of Conflict. Harvard University Press.

Shapley, L. S., and D. Monderer (1996): "Potential Games," Games and Economic Behavior, 14, 124-143.

Tajfel, H., and J. Turner (1979): "An integrative theory of intergroup conflict," pp. 33-47. Brooks/Cole.

Van Damme, E., and J. W. Weibull (2002): "Evolution in Games with Endogenous Mistake Probabilities," Journal of Economic Theory, 106(2), 296-315.

Van Huyck, J. B., R. C. Battalio, and R. O. Beil (1990): "Tacit Coordination Games, Strategic Uncertainty, and Coordination Failure," American Economic Review, 80(1), 234-248.
Voorneveld, M. (2000): "Best-response potential games," Economics Letters, 66(3), 289-295.

Weber, R. A. (2006): "Managing Growth to Achieve Efficient Coordination in Large Groups," The American Economic Review, 96(1), 114-126.

Yi, K.-O. (2009): "Payoff-dependent mistakes and q-resistant equilibrium," Economics Letters, 102(2), 99-101.

Yi, K.-O. (2011): "Equilibrium Selection with Payoff-Dependent Mistakes," Discussion Paper 1115, Research Institute for Market Economy, Sogang University.

Young, H. P. (1993): "The Evolution of Conventions," Econometrica, 61(1), 57-84.

Young, H. P. (1996): "The Economics of Convention," The Journal of Economic Perspectives, 10(2), 105-122.

Young, H. P. (2001): Individual Strategy and Social Structure: An Evolutionary Theory of Institutions. Princeton, NJ: Princeton University Press.

Young, H. P. (2005): Strategic Learning and Its Limits (Arne Ryde Memorial Lectures Series). Oxford University Press, USA.