EWA Model - Ideas Archive

Experience Weighted Attraction Model
Parameterization for the Traveler’s Dilemma
Richard Schwinn
April 29, 2015
Abstract
As a component of my candidature for the Researcher position at NUS, I estimated the experience weighted attraction (EWA) model for Capra, Goeree, Gomez, and Holt [1999]'s experimental data. In order to give a transparent and honest appraisal of my abilities, I did not consult any person, forum, or other resource apart from the following published references: Capra, Goeree, Gomez, and Holt [1999], Camerer [2003], Camerer and Ho [1999], Ho, Camerer, and Chong [2002], Ichimura and Bracht [2001], Steenbergen [2006], Uwasu [2007], Xiaojing, Wei, Jia, Linjie, and Jingning [2012].¹
1 Overview
Belief learning incorporates people's imaginations into their decision making. By accounting for the payoffs of forgone strategies, belief learners base their decisions as much upon choices they might have pursued but did not as upon the choices they actually made. The reinforcement approach, in contrast, assumes that a person's propensity to pursue a strategy is determined purely by the payoffs previously experienced, so reinforcement learners consider only the past payoffs of the strategies they actually played. The EWA model combines the main features of belief learning and reinforcement learning. In this paper I estimate the Experience-Weighted Attraction (EWA) model for Capra et al. [1999]'s experimental data.
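For reference, the EWA updating and choice rules as given by Camerer and Ho [1999], written in the notation of the parameter tables in Section 2, where A_i^j(t) is player i's attraction to strategy s_i^j after period t, s_i(t) is the strategy player i actually chose, s_{-i}(t) is the opponent's choice, π_i is player i's payoff, and I(·,·) is the indicator function:

\[
N(t) = \rho\, N(t-1) + 1,
\]
\[
A_i^j(t) = \frac{\phi\, N(t-1)\, A_i^j(t-1) + \big[\delta + (1-\delta)\, I\big(s_i^j, s_i(t)\big)\big]\, \pi_i\big(s_i^j, s_{-i}(t)\big)}{N(t)},
\]
\[
P_i^j(t+1) = \frac{e^{\lambda A_i^j(t)}}{\sum_{k} e^{\lambda A_i^k(t)}}.
\]

The parameters (δ, ρ, φ, λ, N(0)) reported in Section 2 are obtained by maximizing the likelihood of the observed choices under the logit rule above.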
¹ On April 28, 2015, I accepted a position with the US Small Business Administration's Office of Advocacy. Nevertheless, I submit this paper since (1) the position I accepted is not guaranteed until I clear the Office of Personnel Management (OPM) and (2) I enjoyed this challenge and am interested in the possibility of collaborating with cutting-edge game theorists like Ho Teck-Hua and Ryan O. Murphy. I may be contacted via [email protected]
1.1 The Traveler's Dilemma
The following is a modified description of the Traveler's Dilemma story, based mostly on Capra et al. [1999]'s description:

    Suppose two travelers purchase identical antiques while on a tropical vacation. Their luggage is lost and the airline asks them to make independent claims for compensation. In anticipation of excessive claims, the airline announces that it will honor any claim between $80 and $200 (in $1 increments), but that each will be reimbursed an amount that equals the minimum of the two claims submitted. Additionally, if the two claims differ, a reward of $R will be paid to the person making the smaller claim and a penalty of $R will be deducted from the high claimant's reimbursement.²
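In payoff terms, with claims x_i, x_j ∈ {80, 81, ..., 200}, the description above corresponds to

\[
\pi_i(x_i, x_j) =
\begin{cases}
x_i & \text{if } x_i = x_j,\\
\min(x_i, x_j) + R & \text{if } x_i < x_j,\\
\min(x_i, x_j) - R & \text{if } x_i > x_j.
\end{cases}
\]

This is the payoff structure implemented by the matrices PR and PC in the R Code appendix.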
The Nash equilibrium of this strategic situation is for both players to claim $80: from any common claim above $80, a player can gain by undercutting the other by $1, since the new minimum plus the reward R exceeds the original claim, and iterating this argument drives both claims down to $80. Experimental data demonstrate that people often make claims higher than $80 and that these outcomes tend to evolve stably over time. The experimental data are used here to estimate the roles that imagination (i.e., beliefs) and reinforcement play in determining these outcomes. See Camerer and Ho [1999] for insight into the underlying theory.
1.2 Estimation Choices
Important notes on the estimations:
• In order to avoid over-parameterization, I employ an uninformed Dirichlet-like (uniform) distribution for the initial strategies. This means that the players view each discrete strategy from $80 to $200 as being equally attractive before the first round of play.
• I did not restrict the parameters, due to convergence and time issues.³ In many instances, this does not seem to have played a role in the results. If further interest in my work is indicated, I will gladly devote computing time to the slower, but more appropriate, limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS-B) constrained optimization; a sketch of such a call follows this list.
• The imagination factor, δ, measures how carefully agents incorporate forgone strategies into their decision making. A δ = 1 suggests complete imagination (i.e., belief learning), while δ = 0 corresponds to pure reinforcement learning.
² Values of R equal to 5, 10, 20, 25, 50, and 80 are considered in our data.
³ The code is far from optimized for speed.
• The data for R = $10, player 2, are corrupted. In order to offer a complete set of results, each bad entry was replaced with the ceiling of the mean of the strategies recorded immediately before and after it.

[Figure 1: Relative Attractiveness for R = 5. Smoothed surface of relative attraction over strategy and time.]
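As promised above, here is a minimal sketch of the box-constrained fit, assuming the negative log-likelihood function Tdilemma and the data objects from the R Code appendix are already in memory. The bounds shown are one reasonable choice (δ, ρ, φ in [0, 1], λ and N(0) strictly positive), not the settings used for the reported tables.

# Box-constrained estimation via L-BFGS-B (cf. the commented-out calls in the appendix).
# Assumes Tdilemma() is defined; the parameter order follows how DRPLN enters Tdilemma():
# c(delta, rho, phi, lambda, N0).
start <- c(0.5, 0.5, 0.5, 0.5, 0.5)
fit <- optim(par = start, fn = Tdilemma, method = "L-BFGS-B",
             lower = c(0, 0, 0, 1e-4, 1e-4),   # keep lambda and N(0) strictly positive
             upper = c(1, 1, 1, 10, 10))       # illustrative caps on lambda and N(0)
fit$par   # constrained parameter estimates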
2 Results
The tables below list the estimated parameters for each of the data sets. The figures contain smoothed surfaces displaying the average relative attractiveness matrices. Relative attractiveness for each strategy is calculated as the strategy's average attraction across players, expressed as a fraction of the sum of these averages over all strategies in each round of play.
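In symbols, with $n_p$ the number of players and $A_i^j(t)$ the attraction of strategy $j$ for player $i$ entering round $t$, the surfaces plot

\[
\bar{A}^{\,j}(t) = \frac{1}{n_p}\sum_{i=1}^{n_p} A_i^j(t),
\qquad
\mathrm{RelA}^{\,j}(t) = \frac{\bar{A}^{\,j}(t)}{\sum_{k=80}^{200} \bar{A}^{\,k}(t)},
\qquad j = 80, \dots, 200.
\]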
R      (reward)                     5
δ      (imagination factor)    0.7932
φ      (attraction decay)      0.5166
ρ      (experience decay)      0.6936
λ      (payoff sensitivity)    0.1055
N(0)   (initial experience)     1.105

Table 1: Parameters for EWA estimation with R = 5
[Figure 2: Relative Attractiveness for R = 10. Smoothed surface of relative attraction over strategy and time.]
R      (reward)                    10
δ      (imagination factor)    0.7231
φ      (attraction decay)       0.333
ρ      (experience decay)      0.7173
λ      (payoff sensitivity)   0.07345
N(0)   (initial experience)     1.108

Table 2: Parameters for EWA estimation with R = 10
[Figure 3: Relative Attractiveness for R = 20. Smoothed surface of relative attraction over strategy and time.]
R      (reward)                     20
δ      (imagination factor)      9311
φ      (attraction decay)    -0.01721
ρ      (experience decay)      0.6201
λ      (payoff sensitivity) -0.008463
N(0)   (initial experience)      2350

Table 3: Parameters for EWA estimation with R = 20
[Figure 4: Relative Attractiveness for R = 25. Smoothed surface of relative attraction over strategy and time.]
R      (reward)                    25
δ      (imagination factor)     0.466
φ      (attraction decay)       0.252
ρ      (experience decay)      0.5858
λ      (payoff sensitivity)   0.05516
N(0)   (initial experience)     1.001

Table 4: Parameters for EWA estimation with R = 25
[Figure 5: Relative Attractiveness for R = 50. Smoothed surface of relative attraction over strategy and time.]
R      (reward)                    50
δ      (imagination factor)    0.5105
φ      (attraction decay)     -0.1185
ρ      (experience decay)      0.8205
λ      (payoff sensitivity)   0.04019
N(0)   (initial experience)     1.187

Table 5: Parameters for EWA estimation with R = 50
[Figure 6: Relative Attractiveness for R = 80. Smoothed surface of relative attraction over strategy and time.]
R      (reward)                    80
δ      (imagination factor)    0.5355
φ      (attraction decay)    -0.03476
ρ      (experience decay)      0.7112
λ      (payoff sensitivity)   0.04146
N(0)   (initial experience)    0.9573

Table 6: Parameters for EWA estimation with R = 80
3 Conclusion
Future work would involve cross-validation, actual probability density surfaces, and further interpretation of the results, such as a discussion of how higher rewards, R, translate into strategies closer to the Nash equilibrium of $80.
R Code
The following code requires that the *.dat files are located in the R working directory. It loops over the six values of R, appends each parameter table to output.tex, and writes one wireframe surface PDF per treatment.
for (R in c(5, 10, 20, 25, 50, 80)) {

  #####
  # Data: CHR holds each player's own claims and CHC the matched opponents'
  # claims, recoded so that a claim of $80 maps to strategy index 1 and $200
  # to index 121. Rows track time and columns track players.
  CHR = floor(read.table(paste0('Td',  R, '.dat')) - 79)
  CHC = floor(read.table(paste0('Tdo', R, '.dat')) - 79)

  #####
  # Dimensions
  T  = nrow(CHR) + 1   # number of periods of play plus one
  NP = ncol(CHR)       # number of players

  od <- options(digits = 5)

  #####
  # Payoffs. After the construction below, PC[i, j] is the payoff to the
  # column player and PR[i, j] the payoff to the row player when the row
  # player claims 79 + i dollars and the column player claims 79 + j dollars:
  # both receive the minimum claim, plus R for the lower claimant and minus R
  # for the higher claimant.
  s = 80:200
  PR = matrix(rep(s, 121), 121, 121)
  PR[lower.tri(PR, FALSE)] <- 0
  PC = t(PR)
  diag(PC) = 0
  ones = matrix(rep(1, 121^2), 121, 121)
  ones[lower.tri(ones, TRUE)] = 0
  PC = PR + PC + R*t(ones) - R*ones
  rownames(PC) = 80:200
  colnames(PC) = 80:200
  PR = t(PC)   # PC is the payoff to the column player; PR is the payoff to the row player

  #####
  # Initial attractions: uniform ("Dirichlet-like") over the 121 strategies.
  # Rows represent players, columns strategies.
  A0 = matrix(rep(1/121, NP), NP, 121)

  #####
  # Negative log-likelihood of the EWA model. The parameter vector DRPLN is
  # used as c(delta, rho, phi, lambda, N0): DRPLN[1] weights forgone payoffs
  # (delta), DRPLN[2] drives the experience recursion (rho), DRPLN[3] decays
  # lagged attractions (phi), DRPLN[4] is the logit sensitivity (lambda), and
  # DRPLN[5] is the initial experience N(0).
  Tdilemma <- function(DRPLN) {

    # Experience equivalents: N[t] = N(0)*(1 + rho + ... + rho^(t-1))
    NN = DRPLN[2]^((1:T) - 1)*DRPLN[5]
    N  = cumsum(NN)

    # Attraction updating. Alist[[t]] holds the attractions entering period t,
    # so Alist[[1]] = A0. (The original listing wrote out the ten period
    # updates separately and used PC rather than PR in periods 9 and 10;
    # PR is used throughout here for consistency with periods 1-8.)
    Alist = vector('list', T)
    Alist[[1]] = A0
    for (t in 1:(T - 1)) {
      Aprev = Alist[[t]]
      Anew  = Aprev
      Nlag  = if (t == 1) DRPLN[5] else N[t - 1]
      for (i in 1:NP) {
        for (j in 1:121) {
          hit = if (j == CHR[t, i]) 1 else 0
          Anew[i, j] = (DRPLN[3]*Nlag*Aprev[i, j] +
                        (DRPLN[1] + (1 - DRPLN[1])*hit)*PR[j, CHC[t, i]])/N[t]
        }
      }
      Alist[[t + 1]] = Anew
    }
    A <<- Alist[1:(T - 1)]   # A[[t]] = attractions entering period t (A0, ..., A9)

    # Log-likelihood of the observed choices under the logit response rule
    g = sum(sapply(1:(T - 1), function(t)
          sum(sapply(1:NP, function(i)
            log(sum(sapply(1:121, function(j)
              ifelse(j == CHR[t, i], 1, 0)*
                exp(DRPLN[4]*A[[t]][i, j])/
                sum(sapply(1:121, function(k) exp(DRPLN[4]*A[[t]][i, k]))))))))))

    return(-g)
  }

  #####
  # Estimation: unconstrained nlm(); the commented-out optim() calls show the
  # slower box-constrained L-BFGS-B alternatives discussed in Section 1.2.
  Results = nlm(Tdilemma, DRPLN <- c(0.5, 0.5, 0.5, 0.5, 0.5))
  # Results = optim(DRPLN <- c(0.5, 0.5, 0.5, 0.5, 0.5), fn = Tdilemma,
  #                 method = 'L-BFGS-B', lower = c(0.0001, 0, 0, 0.0001, 0),
  #                 upper = c(10, 1, 1, 10, 1/(1 - DRPLN[2])))
  # Results = optim(DRPLN <- c(0.5, 0.5, 0.5, 0.5, 0.5), fn = Tdilemma,
  #                 method = 'L-BFGS-B', lower = c(0.0001, 0, 0, 0.0001, 0))
  # Results = optim(DRPLN <- c(0.5, 0.5, 0.5, 0.5, 0.5), fn = Tdilemma)

  #####
  # Write the parameter table to output.tex (Tables 1-6).
  library(stargazer)
  Results.Table = cbind(
    c('$R$', '$\\delta$', '$\\phi$', '$\\rho$', '$\\lambda$', '$N(0)$'),
    c('(reward)', '(imagination factor)', '(attraction decay)',
      '(experience decay)', '(payoff sensitivity)', '(initial experience)'),
    c(R, signif(Results[[2]], 4)))   # Results[[2]] is the nlm() estimate vector
  write(stargazer(Results.Table,
                  title = paste('Parameters for EWA estimation with R =', R)),
        'output.tex', append = TRUE)

  #####
  # Relative attractiveness: average the attractions across players, then
  # normalise within each round so that the strategies sum to one.
  mA = matrix(1, T, 121)
  for (t in 1:(T - 1)) {
    for (j in 1:121) {
      mA[t, j] = mean(A[[t]][1:NP, j])   # 1:10 in the original listing
    }
  }
  for (t in 1:T) {
    mA[t, ] = mA[t, ]/sum(mA[t, ])
  }
  mA = mA[-T, ]   # drop the unused final row (row 11 in the original listing)

  #####
  # Smooth the relative-attraction surface with a tensor-product spline and
  # draw the wireframe surfaces shown in Figures 1-6.
  df <- data.frame(x = rep(seq_len(ncol(mA)), each  = nrow(mA)),
                   y = rep(seq_len(nrow(mA)), times = ncol(mA)),
                   z = c(mA))
  require('mgcv')
  mod <- gam(z ~ te(x, y), data = df)
  density <- matrix(fitted(mod), ncol = 121)
  require('lattice')
  lattice.options(axis.padding = list(factor = 0.5))
  surface = wireframe(t(density), shade = TRUE,
                      aspect = c(0.8, 1),
                      light.source = c(10, 0, 10),
                      ylab = 'Time',
                      xlab = 'Strategy',
                      zlab = 'Rel. Attraction',
                      cex.lab = 0.5,
                      main = 'Experience Weighted Learning',
                      sub = paste(c('R = ', R))
                      # par.settings = list(axis.line = list(col = 'transparent'))
                      # par.box = c(col = 'transparent')
                      )
  print(surface)   # preview on the current device
  pdf(paste('R = ', R, 'surface.pdf'), family = 'Times', width = 8, height = 8)
  print(surface)   # explicit print() so the lattice plot is written inside the loop
  dev.off()
}
References

C. F. Camerer. Behavioral Game Theory. Princeton University Press, 2003. ISBN 0691090394, 9780691090399.

Colin F. Camerer and Teck-Hua Ho. Experience-weighted attraction learning in normal-form games. Econometrica, 67:827–874, 1999.

C. Monica Capra, Jacob K. Goeree, Rosario Gomez, and Charles A. Holt. Anomalous Behavior in a Traveler's Dilemma? American Economic Review, 89(3):678–690, 1999. doi: 10.1257/aer.89.3.678.

Teck-Hua Ho, Colin F. Camerer, and Juin-Kuan Chong. Functional EWA: A One-parameter Theory of Learning in Games. Working paper, 2002.

Hidehiko Ichimura and Juergen Bracht. Estimation of Learning Models on Experimental Game Data. Working paper, 2001.

Marco R. Steenbergen. Maximum Likelihood Programming in R. Tutorial notes, January 2006.

Michinori Uwasu. Essays on Environmental Cooperation. 2007.

Wang Xiaojing, Tong Wei, Ren Jia, Ding Linjie, and Liu Jingning. Weighted Fairness Resource Allocation. 2(3):1–10, 2012.