Emergent Conventions and the Structure of Multi-Agent Systems
James E. Kittock
Robotics Laboratory
Stanford University
[email protected]
Abstract
This paper examines the emergence of conventions through "co-learning" in a model multi-agent system. Agents interact through a two-player game, receiving feedback according to the game's payoff matrix. The agent model specifies how agents use this feedback to choose a strategy from the possible strategies for the game. A global structure, represented as a graph, restricts which agents may interact with one another. Results are presented from experiments with two different games and a range of global structures. We find that for a given game, the choice of global structure has a profound effect on the evolution of the system. We give some preliminary analytical results and intuitive arguments to explain why the systems behave as they do and suggest directions for further study. Finally, we briefly discuss the relationship of these systems to work in computer science, economics, and other fields.
To appear in:
Lynn Nadel and Daniel Stein, editors, 1993 Lectures in Complex Systems: the proceedings
of the 1993 Complex Systems Summer School, Santa Fe Institute Studies in the Sciences of
Complexity Lecture Volume VI. Santa Fe Institute, Addison-Wesley Publishing Co., 1994.
Available as:
http://robotics.stanford.edu/people/jek/Papers/sfi93.ps
This research was supported in part by grants from the Advanced Research Projects Agency and the Air Force Office of Scientific Research.
1 Introduction
Conventions are common in human society, including such disparate things as standing in line and trading currency for goods. Driving an automobile is a commonplace task which requires many conventions; one can imagine the chaos that would result if each driver used a completely different set of strategies. This example is easy to extend into an artificial society, as autonomous mobile robots would also need to obey traffic laws. Indeed, it appears that conventions are generally necessary in multi-agent systems: conventions reduce the potential for conflict and help ensure that agents can achieve their goals in an orderly, efficient manner.
In [5], Shoham and Tennenholtz introduced the notion of emergent conventions. In contrast with conventions which might be designed into agents' behavior or legislated by a central authority, emergent conventions are the result of the behavioral decisions of individual agents based on feedback from local interactions. Shoham and Tennenholtz extended this idea into a more general framework, dubbed co-learning [6]. In the co-learning paradigm, agents acquire experience through interactions with the world, and use that experience to guide their future course of action. A distinguishing characteristic of co-learning is that each agent's environment consists (at least in part) of the other agents in the system. Thus, in order for agents to adapt to their environment, they must adapt to one another's behavior. Here, we describe a modification of the co-learning framework as presented in [6] and examine its effects on the emergence of conventions in a model multi-agent system.
Simulation Model
We assume that most tasks that an agent might undertake can only be performed in a limited number of ways; actions are thus chosen from a finite selection of strategies. A convention exists when most or all agents in a system are using one particular strategy for a given task. We will consider a simplified system in which the agents have only one abstract task to perform, and we will examine how a convention for this task can arise spontaneously through co-learning, using a very basic learning rule.
In our model system, each agent's environment consists solely of the other agents. That is, the only feedback that agents receive comes from their interactions with other agents. We model agent interactions using payoff matrices analogous to those used for two-person games in game theory. Each time two agents interact, they receive feedback as specified by the payoff matrix. It is this feedback that the agents will use to select their action the next time they interact. Systems similar to this have been referred to as iterated games [3, 6, 7]. Each task has a corresponding two-person iterated game. In this paper, we will consider two different games, which represent the distinct goals of "coordination" and "cooperation."
Our modification to the co-learning setting is the addition of an interaction graph which limits agent interactions. In the original study of emergent conventions, any pair of agents could interact [5]; we will restrict this by only allowing interactions between agents which are adjacent on the interaction graph. Our primary objective is to explore the effects of this global structure on the behavior of the system. In particular, we examine how the time to reach a convention scales with the number of agents in the system for different types of interaction graph.
The basic structure of our simulations is as follows. We select a game to use and specify an interaction graph. We create a number of agents, and each agent is given an initial strategy. The simulation is run for a finite number of "time-steps," and during each time-step, a pair of agents is chosen to interact. The agents receive feedback based on the strategies they used, and they incorporate this feedback into their memories. Using a learning algorithm we call the strategy update rule, each agent then selects the strategy it will use the next time it is chosen to interact. The system can either be run for a predetermined number of time-steps or be run until a convention has been reached.
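To make this structure concrete, the following Python sketch shows one way the simulation loop just described could be organized. The names (run_simulation, update_rule, and so on) are illustrative only and are not taken from the implementation used for the experiments reported here.

    import random

    def run_simulation(n_agents, edges, payoff, update_rule, init_strategies, n_steps):
        """Skeleton of one run: at each time-step a pair of agents adjacent on the
        interaction graph is chosen, they play the game, and each incorporates its
        feedback via the strategy update rule."""
        strategies = list(init_strategies)        # current strategy of each agent
        memories = [[] for _ in range(n_agents)]  # each memory holds (t, strategy, feedback) events
        for t in range(n_steps):
            i, j = random.choice(edges)           # uniform over pairs allowed by the graph I
            s_i, s_j = strategies[i], strategies[j]
            f_i, f_j = payoff[(s_i, s_j)]         # feedback from the game's payoff matrix
            strategies[i] = update_rule(memories[i], t, s_i, f_i)
            strategies[j] = update_rule(memories[j], t, s_j, f_j)
        return strategies

A complete experiment would also record convergence statistics at each step; we omit that here.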
Overview
In the following section, the structure of the simulation model is explained in more
detail. In Section 3, we describe some results of experiments with these systems.
Section 4 puts forth our preliminary analytic and intuitive understanding of these
systems. In Section 5 we discuss possibilities for further research and the relationship of these experiments to work in a number of other fields.
2 Simulating Agent Societies
In order to conduct experiments on artificial agent societies, we must choose a way to model them. Since the simulations detailed in this paper are intended only to explore some basic issues, the model used is deliberately simple. We envision agents existing in an environment where they choose actions from a finite repertoire of behaviors. When an agent performs an action, it affects the environment, which in turn affects the agent. That is, an agent receives feedback as a result of its behavior. When investigating emergent conventions, we are primarily concerned with how the agents are affected by each other's behavior. Thus, in the present implementation of our system, all feedback that agents receive is due to their mutual interactions: the agents are each other's environment.
2.1 The Agent Model
In each simulation there is a fixed, finite number, $N$, of agents. Each agent has two defining characteristics: its strategy and its memory.
For a given task, $s_k$, the strategy of agent $k$, is chosen from a set $\Sigma = \{\sigma_1, \ldots, \sigma_S\}$ of $S$ distinct abstract strategies. It is important to note that $\Sigma$ does not represent any particular suite of possible actions; rather, it serves to model the general situation where multiple strategies are available to agents. We will only consider the two-strategy case; similar systems with more than two strategy choices are discussed in [5].

The memory of agent $k$, $M_k$, is of maximum size $\mu$, where $\mu$ is the memory size, a parameter of the agent model. An agent's memory is conveniently thought of as a set, each element of which is a feedback event. An event $m \in M_k$ is written as a triple, $m = \langle t(m), s(m), f(m) \rangle$, where $f(m)$ is the feedback the agent received when using strategy $s(m)$ at time $t(m)$. An agent uses the contents of its memory to select the strategy it will use the next time it interacts with another agent.
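As an illustration, a bounded memory of feedback events as just described might be represented as follows; this is a Python sketch, and the names FeedbackEvent and AgentMemory are ours.

    from collections import namedtuple

    # A feedback event <t(m), s(m), f(m)>: the time, the strategy used, and the feedback received.
    FeedbackEvent = namedtuple("FeedbackEvent", ["t", "s", "f"])

    class AgentMemory:
        """Holds at most mu of an agent's most recent feedback events."""

        def __init__(self, mu):
            self.mu = mu
            self.events = []

        def add(self, t, s, f):
            self.events.append(FeedbackEvent(t, s, f))
            if len(self.events) > self.mu:
                # discard the oldest remembered event
                oldest = min(self.events, key=lambda m: m.t)
                self.events.remove(oldest)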
2.2 Modelling Interactions
Once the structure of individual agents is specified, we must decide how the agents interact with their environment. We introduce the concept of an interaction graph to specify which agents can interact with one another, and we use the payoff matrix of a two-player game to determine agents' feedback.
2.2.1 Interaction Graph
In other, similar models, it was possible for any pair of agents to interact [5, 6]. To explore the effects of incomplete mixing of agents, we specify an interaction graph, $I$, which has $N$ vertices, each representing one of the agents in the system. An edge connecting vertices $i$ and $j$ in $I$ indicates that the pair of agents $(i, j)$ may be chosen to interact by playing the specified game. Interacting pairs are chosen randomly and uniformly from all pairs allowed by $I$. Note that when $I$ is $K_N$, the complete $N$-vertex graph, the present model is equivalent to those which allowed complete mixing of the agents.
Figure 1: Relationship between $C_N$, $C_{N,r}$, and $K_N$: from left to right, $C_{6,1} = C_6$, $C_{6,2}$, and $C_{6,3} = K_6$.
To facilitate investigation of the effects of the structure of $I$, we define a class of graphs representing agents arranged on a circular lattice with a fixed interaction radius.

Definition 1 ($C_{N,r}$) $C_{N,r}$ is the graph on $N$ vertices such that vertex $i$ is adjacent to vertices $(i + j) \bmod N$ and $(i - j) \bmod N$ for $1 \le j \le r$. We call $r$ the interaction radius of $C_{N,r}$ (not to be confused with the graph-theoretic radius, which is something else altogether). $C_{N,1} = C_N$ is the cycle on $N$ vertices. Note that for $r \ge \lfloor N/2 \rfloor$, $C_{N,r} = K_N$.
See Figure 1 for an illustration of the definition. We note that while this is a somewhat arbitrary choice of structure (why not a two-dimensional grid or a tree structure?), it does yield interesting and illuminating results, while avoiding the added complexity we might expect from a more elaborate structure.
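The graphs $C_{N,r}$ are easy to generate programmatically; the following Python sketch (with names of our choosing) builds the edge list directly from Definition 1.

    def c_graph_edges(n, r):
        """Edge list of C_{N,r}: vertex i is adjacent to (i + j) mod n and
        (i - j) mod n for 1 <= j <= r."""
        edges = set()
        for i in range(n):
            for j in range(1, r + 1):
                a, b = i, (i + j) % n
                edges.add((min(a, b), max(a, b)))
        return sorted(edges)

    # As in Figure 1: C_{6,1} is the 6-cycle and C_{6,3} is the complete graph K_6.
    assert len(c_graph_edges(6, 1)) == 6
    assert len(c_graph_edges(6, 3)) == 6 * 5 // 2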
The interaction graph is a general way to model restrictions on interactions. Such restrictions may be due to any number of factors, including hierarchies, physical separations, communication links, security barriers, etc. Whatever its origin, the structure of $I$ will be seen to have a substantial effect on the behavior of the systems we examine.
2.2.2 Two-Player Games
Once we have determined which pairs of agents are allowed to interact, we must specify what game the agents will play. We examine two games, the iterated coordination game (ICG) and the iterated prisoner's dilemma (IPD).
              ICG                           IPD
          A         B                    C         D
    A   +1, +1    -1, -1           C   +2, +2    -6, +6
    B   -1, -1    +1, +1           D   +6, -6    -5, -5

Table 1: Payoff matrices for the coordination game and the prisoner's dilemma (row player's payoff listed first).
ICG is a "pure coordination" game [3], with two possible strategies, labelled A and B. When agents with identical strategies meet, they get positive feedback, and when agents with different strategies meet, they get negative feedback. The payoff matrix is specified in Table 1. This game is intended to model situations where: 1) from the point of view of interaction with the world, the two available strategies are equivalent and there is no a priori way for agents to choose between them, and 2) the two strategies are mutually incompatible. A simple example of such a situation is driving on a divided road: one can either drive on the left or on the right, but it is sub-optimal if some people do one and some people do the other. In this case, our goal is for the agents to reach any convention, either with strategy A or with strategy B.
IPD has two available strategies, labelled C and D. The payoff matrix is detailed in Table 1. This game is designed to model situations where two agents benefit from cooperating (strategy C), but there is also the potential to get a large payoff by defecting (strategy D) if the other player cooperates. However, if both agents defect, they are both punished. Our goal here is for the agents to reach a convention with strategy C, which indicates the agents are all cooperating.
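In code, the payoff matrices of Table 1 can be written directly as lookup tables mapping a pair of strategies (row player first) to the corresponding pair of feedback values; a sketch:

    # Iterated coordination game (ICG): matching strategies are rewarded.
    ICG_PAYOFF = {
        ("A", "A"): (+1, +1), ("A", "B"): (-1, -1),
        ("B", "A"): (-1, -1), ("B", "B"): (+1, +1),
    }

    # Iterated prisoner's dilemma (IPD): mutual cooperation pays, but defection tempts.
    IPD_PAYOFF = {
        ("C", "C"): (+2, +2), ("C", "D"): (-6, +6),
        ("D", "C"): (+6, -6), ("D", "D"): (-5, -5),
    }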
The prisoner's dilemma has been examined extensively, in particular by Axelrod in his classic book [1]. The relationship between previous work with the prisoner's dilemma and the present experiments will be briefly discussed in Section 5.
2.2.3 Strategy Selection
Once the agent model and game are specified, our final step in defining the system is to determine how agents choose the strategy that they will use. In these experiments we use a version of the Highest Current Reward (HCR) strategy update rule [6].²

²It should be noted that the present definition of memory is slightly different from that found in [6]. There, memory was assumed to record a fixed amount of "time" during which an agent might interact many, few, or no times. Here, memory refers explicitly to the number of previous interactions in which an agent has participated that it remembers.
The current reward for a strategy is the total remembered feedback for using that strategy, i.e. for strategy $\sigma$ the current reward is the sum of $f(m)$ for all feedback events $m$ in the agent's memory such that $s(m) = \sigma$. We can now define the HCR rule (in the two-strategy case) as: "If the other strategy has a current reward greater than that of the current strategy, change strategies." Note that HCR is performed after the feedback event from the interaction which has just occurred is added to the agent's memory, i.e. HCR is performed on the set $M' = M_k^t \cup \{m_k^t\}$, where $M_k^t$ is the memory of agent $k$ at time $t$ and $m_k^t$ is the feedback event agent $k$ records at time $t$. Once an agent's next strategy has been chosen, the agent's memory is updated by incorporating the event which the agent just received and discarding the oldest event:

$$M_k^{t+1} = M_k^t \cup \{m_k^t\} - \{\arg\min_{m \in M'} t(m)\}.$$
Agents apply the HCR rule immediately after receiving feedback, and their strategies are considered to be updated instantaneously. Agents which were not chosen to interact at time $t$ do nothing, so we have simply $M_k^{t+1} = M_k^t$ for those agents.
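A direct transcription of the HCR rule and the memory update into Python might look like the following sketch. The representation of events as (t, s, f) tuples and the function names are ours; the rule itself is as defined above.

    def current_reward(memory, strategy):
        """Total remembered feedback for the given strategy."""
        return sum(f for (t, s, f) in memory if s == strategy)

    def hcr_update(memory, mu, t, strategy_used, feedback, other_strategy):
        """Apply HCR after an interaction: add the new feedback event to form M',
        switch strategies if the other strategy has a greater current reward,
        then truncate the memory to at most mu events (discarding the oldest)."""
        m_prime = memory + [(t, strategy_used, feedback)]
        if current_reward(m_prime, other_strategy) > current_reward(m_prime, strategy_used):
            next_strategy = other_strategy
        else:
            next_strategy = strategy_used
        new_memory = sorted(m_prime, key=lambda e: e[0])[-mu:] if mu > 0 else []
        return next_strategy, new_memory

With $\mu = 0$ this sketch keeps the memory empty and switches exactly when the latest feedback is negative, consistent with the note in Section 3.2 that with $\mu = 0$ agents change strategies immediately upon receiving negative feedback.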
3 Experimental Results
Before we proceed with a look at some results from our simulations, a word is in
order about how we compare the behavior of the various possible systems.
3.1 Performance Measures
In the present situation, the most obvious performance criterion is "how well" the system reaches a convention. For an ICG system, the goal is to have all of the agents using the same strategy, but we do not care which particular strategy the agents are using. On the other hand, for an IPD system, we want all agents to be cooperating. Thus, we have different notions of convergence for the two systems.
We define $C_t$, the convergence of a system at time $t$, as follows. For ICG, the convergence is the fraction of agents using the majority strategy (either A or B); for IPD, the convergence is the fraction of agents using the cooperate strategy (C). Note that convergence ranges from 0.5 to 1 for ICG and from 0 to 1 for IPD. Given this definition of convergence, we can also define $T_c$, the convergence time for a simulation: the convergence time for a given level of convergence $c$ is the earliest time at which $C_t \ge c$. In this paper, we will use "time" and "number of interactions" interchangeably. Thus, when we speak about "time $t$", we are referring to the point in the evolution of the system when $t$ interactions have occurred.
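Given the list of current strategies, both notions of convergence are simple to compute; a Python sketch (function names ours):

    def convergence_icg(strategies):
        """Fraction of agents using the majority strategy (A or B); ranges from 0.5 to 1."""
        n_a = sum(1 for s in strategies if s == "A")
        return max(n_a, len(strategies) - n_a) / len(strategies)

    def convergence_ipd(strategies):
        """Fraction of agents using the cooperate strategy C; ranges from 0 to 1."""
        return sum(1 for s in strategies if s == "C") / len(strategies)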
We use two different measures of performance which arise from these definitions. The first measure is average time to a fixed convergence. In this case, we run simulations until a fixed convergence level is reached and note how long it took. When our concern is that the system reach a critical degree of convergence as rapidly as possible, this is a useful measure. However, some systems will never converge on practical timescales, and yet may have interesting behavior which will not be evident from timing data. The second measure we use is average convergence after a fixed time. We simply run simulations for a specified amount of time and note the convergence level at the end of the run. We find that this is often one of the most revealing measures. However, for systems where the convergence over time is not generally monotonic, this measure is effectively meaningless.

There are, of course, other possible measures of performance, such as probability of achieving a fixed convergence after a fixed time (used in [5, 6]) and maximum convergence achieved in a fixed amount of time. We have chosen the measures which we found most revealing for the issues at hand.
3.2 Effects of Interaction Graph Topology
Unless otherwise specified, the simulations were run with one hundred agents, each agent was equally likely to start with either of the two possible strategies, and the data were averaged over one thousand trials with different random seeds. We used memory sizes of $\mu = 1$ and $\mu = 0$ for ICG and IPD, respectively (note that with $\mu = 0$, agents change strategies immediately upon receiving negative feedback); experimentation showed that these memory sizes were among the most efficient for the HCR rule.
We will first consider the extreme cases, where all agents are allowed to interact with one another ($I = K_N$) and where agents are only allowed to interact with their nearest neighbors on the one-dimensional lattice ($I = C_N$). One of our most interesting discoveries was the radically differing behavior of HCR with ICG and IPD as a function of the structure of $I$. The experimental data are presented in Figure 2, which shows the time to achieve 90% convergence as a function of the number of agents for both games on $K_N$ and $C_N$.

The performance of the HCR rule with ICG is reasonable for both cases of $I$. The linear form of the data on the log-log plot indicates that ICG systems can be expected to converge in polynomial time on both $K_N$ and $C_N$. For intermediate interaction radii, the performance of the ICG systems is somewhere between that for $C_N$ and $K_N$.
Figure 2: $T_{90\%}$ vs. $N$ (log-log plot) for ICG and IPD with different configurations of $I$ ($I = K_N$ and $I = C_N$ for each game).
For IPD, the story is different. Using the HCR rule and working with a system equivalent to our IPD system on $K_N$, Shoham and Tennenholtz write, "[HCR] is hopeless in the cooperation setting" [6]. They had discovered what we see in Figure 2: convergence time for IPD on $K_N$ appears to be at least exponential in the number of agents. HCR is redeemed somewhat by its performance with IPD on $C_N$, which appears to be polynomial, and is possibly linear. On $C_{N,2}$ (not shown), the IPD system still manages to converge in reasonable time, but for all interaction radii greater than two, it once again becomes "hopeless."

In general, it appears that the particular choice of $I$ has a drastic effect on the way system performance scales with system size.
               $I = K_N$                        $I = C_N$
    ICG    $T_{90\%} \propto N \log N$     $T_{90\%} \propto N^3$
    IPD    $T_{90\%} \propto c^N$          $T_{90\%} \propto N$

Table 2: Possible relationships between $T_{90\%}$ and the number of agents for different interaction graph structures.

Possible functional relations between expected values of $T_{90\%}$ and $N$ are summarized in Table 2; they were derived from fitting curves to the simulation results and are merely descriptive at this stage.
4 Analysis
In this section, we aim to give a flavor for some of the ways we can pursue an understanding of the behavior of both the coordination game and the prisoner's dilemma.
4.1 Iterated Coordination Game
Figure 3: ICG: $C_{3000}$ as a function of $r$, for $I = C_{100,r}$.
To begin our investigation of the relationship between ICG performance and the structure of $I$, we look to Figure 3, which shows how performance (measured as convergence after a fixed time) varies with the interaction radius for agents on $C_{100,r}$. Empirically, we find that performance increases with increasing interaction radius. Thus, we are led to ask: what properties of $I$ vary with the interaction radius? Two important ones are the vertex degree and the graph diameter.

The degree of a vertex is the number of edges containing that vertex. In the present case, all vertices of $I$ have the same degree, so we can speak of the vertex degree, $\delta$, of $I$. For $C_{N,r}$, $\delta = 2r$ (restricted to $2r \le N - 1$). Thus, as $r$ increases, each agent can interact with more agents.

The diameter of a graph is the longest shortest-path distance between any two vertices, and it provides a lower limit on the time for information to propagate throughout the graph. As $r$ increases, the diameter of $C_{N,r}$ decreases, and we expect that the time for information to travel among the agents will decrease as well.
We speculate that either or both of these properties of $I$ affect the observed performance. However, for $C_{N,r}$ the diameter is $\lceil N/2r \rceil = \lceil N/\delta \rceil$, so it is closely related to the vertex degree. To test the relative importance of graph diameter and vertex degree, it would be useful to construct a set of graphs for which one property (diameter or vertex degree) is constant, while the other property varies. Initially, we would also like to keep our graphs symmetric with respect to each vertex, to avoid introducing effects due to inhomogeneity.
Figure 4: ICG: $C_{3000}$ plotted against the diameter of $I$, for $I = D_{100,k}$, $2 \le k \le 49$.
It turns out to be straightforward to construct a symmetric graph of fixed vertex degree. As a test case, we define the class of graphs $D_{N,k}$ such that an edge connects vertex $i$ with vertices $i + 1$, $i - 1$, $i + k$, and $i - k$ (all mod $N$). For $2 \le k < \lfloor N/2 \rfloor$, the vertex degree is fixed at four, and a variety of diameters result (we can measure the diameter of each graph using, e.g., Dijkstra's algorithm [8]). Once we have measured the performance of ICG on each graph $D_{N,k}$, we can plot performance against diameter, as seen in Figure 4.
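These fixed-degree graphs and their diameters are also straightforward to compute. The Python sketch below builds $D_{N,k}$ and measures its diameter by breadth-first search, which is equivalent to Dijkstra's algorithm on an unweighted graph; the symbol $k$ for the second step size is our notation.

    from collections import deque

    def d_graph_neighbors(n, k):
        """Adjacency lists of D_{N,k}: vertex i is joined to i+1, i-1, i+k, i-k (mod n)."""
        return [[(i + 1) % n, (i - 1) % n, (i + k) % n, (i - k) % n] for i in range(n)]

    def diameter(neighbors):
        """Longest shortest-path distance between any pair of vertices (BFS from every vertex)."""
        best = 0
        for source in range(len(neighbors)):
            dist = {source: 0}
            queue = deque([source])
            while queue:
                v = queue.popleft()
                for w in neighbors[v]:
                    if w not in dist:
                        dist[w] = dist[v] + 1
                        queue.append(w)
            best = max(best, max(dist.values()))
        return best

    # The diameters plotted in Figure 4 come from graphs of this family, e.g.:
    print(diameter(d_graph_neighbors(100, 2)), diameter(d_graph_neighbors(100, 49)))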
We see that there is a correlation between the diameter of an interaction graph and the performance of an ICG system on that graph. We have hinted that this may be a function of the speed with which information can flow among the agents. However, more work is necessary to determine precisely how and why the diameter, vertex degree, and other graph properties of $I$ affect the performance of an ICG system. It will also take further study to prove (or disprove) the relationships between expected convergence time for ICG and number of agents proposed in Table 2.

4.2 Iterated Prisoner's Dilemma
It was seen in Section 3 that IPD behaves quite differently from ICG with respect to the structure of $I$. For $r = 1$, IPD on $C_{N,r}$ converges quite rapidly, but for large $r$, it does not converge on any reasonable time scale. We can get some intuition for why this is if we think in terms of "stable" cooperative agents. A cooperative agent is stable at a given time if it is guaranteed to interact with a cooperative agent. On $K_N$, we have an all-or-none situation: an agent can only be stable if all of the agents are cooperative. In contrast, on $C_N$ an agent need only have its two neighbors be cooperative to be stable. As yet, we have not extended this notion to a formal analysis. However, for IPD on $K_N$, we can give an analytical argument for the relatively poor performance of HCR in our experiments (recall that $I = K_N$ shows a dramatic increase in convergence time as the number of agents is increased, as seen in Figure 2).
Figure 5: Expected change in the number of cooperative agents, $\langle \Delta N_C \rangle$, as a function of the convergence level for IPD, $I = K_N$.
We begin by computing the expected change in the number of cooperative agents as a function of the convergence level. Since we are considering agents with memory size $\mu = 0$, whenever two agents using strategy D meet, they will both switch to using strategy C (because they will both get negative feedback, as seen in Table 1). When an agent using strategy C encounters an agent using strategy D, the agent with strategy C will switch to strategy D. When two agents with strategy C meet, nothing will happen. Now we can compute the expected change in the number of cooperative agents as a function of the probabilities of each of these meetings taking place:

$$\langle \Delta N_C \rangle = 2 \, p(\mathrm{DD}) - 1 \, [p(\mathrm{CD}) + p(\mathrm{DC})].$$

These probabilities are functions of the number of agents using strategy C and hence the convergence level of the system. Thus, we can compute the expected change in the number of cooperative agents as a function of the convergence level; this function is plotted in Figure 5. Note that for $C > 0.5$ we can actually expect the number of cooperative agents to decrease. This partially explains the reluctance of the IPD system to converge: as the system gets closer to total convergence it tends to move away, towards 50% convergence. Note that before converging, the IPD system must pass through a state in which just two agents have strategy D. We can calculate that in this state, the probability that the system will move away from convergence is a factor of $O(N)$ greater than the probability that the system will move towards convergence.
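The curve in Figure 5 can be reproduced directly from the meeting probabilities. The sketch below assumes that the interacting pair is drawn uniformly at random from all ordered pairs of distinct agents on $K_N$; the function name is ours.

    def expected_delta_cooperators(n_agents, convergence):
        """Expected change in the number of cooperative agents after one interaction,
        <Delta N_C> = 2 p(DD) - [p(CD) + p(DC)], for IPD with mu = 0 on K_N."""
        n_c = convergence * n_agents          # agents currently playing C
        n_d = n_agents - n_c                  # agents currently playing D
        pairs = n_agents * (n_agents - 1)     # ordered pairs of distinct agents
        p_dd = n_d * (n_d - 1) / pairs
        p_cd_plus_dc = 2 * n_c * n_d / pairs
        return 2 * p_dd - p_cd_plus_dc

    # Consistent with Figure 5: the expected change turns negative once convergence exceeds one half.
    for c in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(c, expected_delta_cooperators(100, c))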
Thus, we see that there is essentially a probabilistic barrier between the fully converged state and less converged states. We can approximate this situation by assuming that the system repeatedly attempts to cross the barrier all at once. The expected time for the system to achieve the fully convergent state is then inversely proportional to the probability of the system succeeding in an attempt: $\langle T_{\mathrm{conv}} \rangle \propto 1/p_{\mathrm{conv}}$. A straightforward analysis shows that $p_{\mathrm{conv}} \in O(c^{-N})$ for some $c > 1$, so $\langle T_{\mathrm{conv}} \rangle \in \Omega(c^N)$, which correlates with what we saw experimentally.
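For completeness, the step from a per-attempt success probability to an expected waiting time is just the mean of a geometric random variable; this elaboration is ours, under the approximation above that successive attempts are independent:

$$\langle T_{\mathrm{conv}} \rangle = \sum_{k \ge 1} k \, p_{\mathrm{conv}} (1 - p_{\mathrm{conv}})^{k-1} = \frac{1}{p_{\mathrm{conv}}}, \qquad \text{so} \quad p_{\mathrm{conv}} \in O(c^{-N}) \implies \langle T_{\mathrm{conv}} \rangle \in \Omega(c^{N}).$$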
On $C_N$, analysis is complicated by the fact that not all states with the same convergence level are isomorphic (for example, consider the case where agents have alternating strategies around the lattice versus the case where agents $1, \ldots, N/2$ have one strategy and agents $N/2 + 1, \ldots, N$ have the other strategy). Thus, our analysis would require a different methodology than that used for $K_N$. Experimentally, we see that the order imposed by $I$ allows the formation of stable groups of cooperating agents. These groups tend to grow and merge, until eventually all agents are cooperative. It is hoped that continued work with the idea of "stable" agents will lead to a more complete understanding of the relationship between performance and the topology of $I$.

As a final note, Shoham and Tennenholtz have proven a general lower bound of $N \log N$ on the convergence time of systems such as these [6], which appears to contradict our assertion that $T_{90\%}$ appears to be proportional to $N$. However, in the general case $T_{100\%}$ need not be proportional to $T_{90\%}$, because the final stages of convergence might be much less likely to take place. Experimental data for IPD on $C_N$ indicate that while $T_{90\%} \propto N$, $T_{100\%} \propto N \log N$.
The arguments presented in this section are an attempt to explain the results of the simulations; they are more theory than theorem. Deriving tight bounds on the performance of any of these systems is an open problem which will most likely require an appeal to both rigorous algorithmic analysis and dynamical systems theory.
5 Discussion
We have seen that a wide variety of interesting and often surprising behaviors can result from a system which is quite simple in concept. Further analytic investigation is necessary to gain a clear theoretical understanding of the origins and ramifications of the complexity inherent in this multi-agent framework. To move from the current system to practical applications will also require adding features to the model that reflect real-world situations, such as random and systemic noise and feedback due to other environmental factors. A number of possible applications to test the viability of this framework are presently under consideration, including distributed memory management and automated load balancing in multiprocessor systems.
The systems discussed here have ties to work in a number of other areas both within and outside of computer science. Co-learning has fundamental ties to the machine learning subfield of artificial intelligence. The Highest Current Reward strategy update rule provides another link to machine learning, as it is essentially a basic form of reinforcement learning. This leads to another possibly fruitful avenue of investigation: systems of substantially more sophisticated agents. Schaerf et al. have used co-learning with a more sophisticated learning rule to investigate load balancing without central control in a model multi-agent system [4]. Our present framework also has ties to theoretical computer science, especially when we view either individual agents or the entire system as finite state machines.
Readers familiar with the Prisoner's Dilemma and its treatment in game theory and economics have probably noticed that our approach is markedly different. Our emphasis on feedback-based learning techniques violates some of the basic assumptions of economic cooperation theory [1, 6]. In particular, we do not allow for any meta-reasoning by agents; that is, our agents do not have access to the payoff matrix and thus can only make decisions based on their experience. Furthermore, agents do not know the specific source of their feedback. They do not see the actions which other agents take and, indeed, have no means of distinguishing feedback due to interactions with other agents from feedback due to other environmental factors. In our framework, agents must learn solely based on the outcome of their actions. In some respects, this may limit our systems, but it also allows for a more general approach to learning in a dynamic environment. The current interest in economics in "bounded rationality" has led to some work which is closer in spirit to our model [3].
The systems discussed in this paper (and multi-agent systems in general) are also related to various other dynamical systems. Ties to population genetics are suggested both by the resemblance of the spread of convention information through an agent society to the spread of genetic information through a population and by the possible similarity of the selection of behavioral conventions to the selection of phenotypic traits. There are also links to statistical mechanics, which are exploited more thoroughly in other models of multi-agent systems which have been called "computational ecologies" [2]. For a more thorough discussion of the relationship of the present framework to other complex dynamic systems, see [6].
6 Conclusion
We have seen that the proper global structure is required if conventions are to arise successfully in our model multi-agent system, and that this optimal structure depends upon the nature of the interactions in the agent society. Social structures which readily allow conventions of one sort to arise may be completely inadequate with regard to other conventions. Designing multi-agent systems with the capacity to automatically and locally develop behavioral conventions has its own unique difficulties and challenges; emergent conventions are not simply a panacea for the problems of off-line and centralized social legislation methods. However, the study of emergent conventions is in its earliest stages and still has potential for improving the functionality of multi-agent systems. Furthermore, the framework presented here invites creative design and investigative efforts which may ultimately borrow ideas from, and share ideas with, the broad range of subjects loosely grouped under the heading "complex systems."
Acknowledgements
I would like to thank Yoav Shoham for introducing me to this interesting topic, and Marko Balabanović and Tomás Uribe for reading and commenting on a draft of this paper.
References
[1] R. Axelrod. The Evolution of Cooperation. New York: Basic Books, 1984.

[2] Bernardo Huberman and Tad Hogg. The behavior of computational ecologies. In Bernardo Huberman, editor, The Ecology of Computation. Elsevier Science Publishers B.V., 1988.

[3] M. Kandori, G. Mailath, and R. Rob. Learning, mutation, and long run equilibria in games. Econometrica, 61:29-56, 1993.

[4] Andrea Schaerf, Moshe Tennenholtz, and Yoav Shoham. Adaptive load balancing: a study in co-learning. Draft manuscript, 1993.

[5] Yoav Shoham and Moshe Tennenholtz. Emergent conventions in multi-agent systems: initial experimental results and observations. In KR-92, 1992.

[6] Yoav Shoham and Moshe Tennenholtz. Co-learning and the evolution of social activity. Submitted for publication, 1993.

[7] Karl Sigmund. Games of Life: Explorations in Ecology, Evolution, and Behaviour. Oxford University Press, 1993.

[8] Steven Skiena. Implementing Discrete Mathematics. Addison-Wesley Publishing Co., 1990.