Experience Weighted Attraction Model Parameterization for the Traveler's Dilemma

Richard Schwinn
April 29, 2015

Abstract

As a component of my candidature for the Researcher position at NUS, I estimated the experience-weighted attraction (EWA) model on the experimental data of Capra, Goeree, Gomez, and Holt [1999]. In order to give a transparent and honest appraisal of my abilities, I did not consult any person, forum, or other resource apart from the following published references: Capra, Goeree, Gomez, and Holt [1999], Camerer [2003], Camerer and Ho [1999], Ho, Camerer, and Chong [2002], Ichimura and Bracht [2001], Steenbergen [2006], Uwasu [2007], and Xiaojing, Wei, Jia, Linjie, and Jingning [2012].[1]

[1] On April 28, 2015, I accepted a position with the US Small Business Administration's Office of Advocacy. Nevertheless, I submit this paper since (1) the position I accepted is not guaranteed until I clear the Office of Personnel Management (OPM) and (2) I enjoyed this challenge and am interested in the possibility of collaborating with cutting-edge game theorists like Ho Teck-Hua and Ryan O. Murphy. I may be contacted via [email protected].

1 Overview

Belief learning incorporates people's imaginations into their decision making. By accounting for the payoffs of forgone strategies, belief learners base their decisions as much on the choices they might have pursued but did not as on the choices they actually made. The reinforcement approach instead assumes that a person's propensity to pursue a strategy is determined purely by the payoffs that person has personally experienced; reinforcement learners therefore consider only the past payoffs of the specific strategies they played. The EWA model combines the main features of belief learning and reinforcement learning. In this paper I estimate the experience-weighted attraction (EWA) model for Capra et al. [1999]'s experimental data.

1.1 The Traveler's Dilemma

The following is a modified description of the Traveler's Dilemma story, based mostly on Capra et al. [1999]'s account. Suppose two travelers purchase identical antiques while on a tropical vacation. Their luggage is lost and the airline asks them to make independent claims for compensation. In anticipation of excessive claims, the airline announces that it will honor any claim between $80 and $200 (in $1 increments), but that each traveler will be reimbursed an amount equal to the minimum of the two claims submitted. Additionally, if the two claims differ, a reward of $R will be paid to the person making the smaller claim and a penalty of $R will be deducted from the higher claimant's reimbursement.[2]

[2] Values of R equal to 5, 10, 20, 25, 50, and 80 are considered in the data.

The Nash equilibrium for this strategic situation is for both players to claim $80. Experimental data demonstrate that people often make claims higher than $80 and that these outcomes tend to remain stable over time. The experimental data are used to estimate the roles that imagination (i.e., beliefs) and reinforcement play in determining these outcomes. See Camerer and Ho [1999] for insight into the underlying theory.
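To make the payoff rule concrete, here is a minimal R sketch (added for illustration only; the function name is mine and appears neither in the experiment's materials nor in the appendix code):

# Payoff to a traveler who claims `own` when the other traveler claims `other`.
td_payoff <- function(own, other, R) {
  min(own, other) + R * sign(other - own)
}

td_payoff(100, 120, R = 5)   # lower claimant:  100 + 5 = 105
td_payoff(120, 100, R = 5)   # higher claimant: 100 - 5 = 95
td_payoff(150, 150, R = 5)   # equal claims:    150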
1.2 Estimation Choices

Important notes on the estimations:

• In order to avoid over-parameterization, I employ an uninformed Dirichlet-like distribution over the initial strategies. This means that the players view each discrete strategy from $80 to $200 as equally attractive before the first round of play.

• I did not restrict the parameters, due to convergence and time issues.[3] In many instances this does not appear to have affected the results. If further interest in my work is indicated, I will gladly devote computing time to the slower, but more appropriate, constrained optimization with the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS-B).

• The imagination factor, δ, measures how fully agents incorporate forgone strategies into their decision making. A δ = 1 corresponds to complete imagination (i.e., belief learning).

• The data for R = $10, player 2, are corrupted. In order to offer a complete set of results, each bad entry was replaced by the ceiling of the mean of the strategies immediately before and after it.

[3] The code is far from optimized for speed.

2 Results

The tables below list the estimated parameters for each of the data sets. The figures contain smoothed surfaces displaying the average relative attractiveness matrices. Relative attractiveness for each strategy is calculated as the strategy's average attraction across players, expressed as a fraction of the sum of attractions over all strategies in that round of play.

  R     (reward)                5
  δ     (imagination factor)    0.7932
  φ     (attraction decay)      0.5166
  ρ     (experience decay)      0.6936
  λ     (payoff sensitivity)    0.1055
  N(0)  (initial experience)    1.105

Table 1: Parameters for EWA estimation with R = 5

[Figure 1: Relative Attractiveness for R = 5. Surface titled "Experience Weighted Learning" with axes Strategy, Time, and Rel. Attraction.]

  R     (reward)                10
  δ     (imagination factor)    0.7231
  φ     (attraction decay)      0.333
  ρ     (experience decay)      0.7173
  λ     (payoff sensitivity)    0.07345
  N(0)  (initial experience)    1.108

Table 2: Parameters for EWA estimation with R = 10

[Figure 2: Relative Attractiveness for R = 10.]

  R     (reward)                20
  δ     (imagination factor)    9311
  φ     (attraction decay)      -0.01721
  ρ     (experience decay)      0.6201
  λ     (payoff sensitivity)    -0.008463
  N(0)  (initial experience)    2350

Table 3: Parameters for EWA estimation with R = 20

[Figure 3: Relative Attractiveness for R = 20.]

  R     (reward)                25
  δ     (imagination factor)    0.466
  φ     (attraction decay)      0.252
  ρ     (experience decay)      0.5858
  λ     (payoff sensitivity)    0.05516
  N(0)  (initial experience)    1.001

Table 4: Parameters for EWA estimation with R = 25

[Figure 4: Relative Attractiveness for R = 25.]

  R     (reward)                50
  δ     (imagination factor)    0.5105
  φ     (attraction decay)      -0.1185
  ρ     (experience decay)      0.8205
  λ     (payoff sensitivity)    0.04019
  N(0)  (initial experience)    1.187

Table 5: Parameters for EWA estimation with R = 50

[Figure 5: Relative Attractiveness for R = 50.]

  R     (reward)                80
  δ     (imagination factor)    0.5355
  φ     (attraction decay)      -0.03476
  ρ     (experience decay)      0.7112
  λ     (payoff sensitivity)    0.04146
  N(0)  (initial experience)    0.9573

Table 6: Parameters for EWA estimation with R = 80

[Figure 6: Relative Attractiveness for R = 80.]

3 Conclusion

Future work would involve cross-validation, actual probability-density surfaces, and further interpretation of the results, such as a discussion of how higher rewards, R, translate into strategies closer to the Nash equilibrium of $80.

R Code

The following code requires that the *.dat files are located in the R working directory.
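As a reading aid for the listing, the recursion it implements can be restated as follows (this restatement is mine, following the general form of the EWA model in Camerer and Ho [1999]; the notation matches the parameter tables above). Player i's attraction to strategy j after period t is

\[
A_i^j(t) \;=\; \frac{\phi\, N(t-1)\, A_i^j(t-1) \;+\; \bigl[\delta + (1-\delta)\, I\bigl(s^j, s_i(t)\bigr)\bigr]\, \pi_i\bigl(s^j, s_{-i}(t)\bigr)}{N(t)},
\]

where I(s^j, s_i(t)) equals one if player i actually played strategy j in period t and zero otherwise, and \pi_i is the Traveler's Dilemma payoff. The listing accumulates experience as N(t) = \rho\, N(t-1) + N(0) with N(1) = N(0), in place of the unit increment of the textbook formulation, and maps attractions into choice probabilities with the logit rule

\[
P_i^j(t+1) \;=\; \frac{e^{\lambda A_i^j(t)}}{\sum_{k} e^{\lambda A_i^k(t)}}.
\]

The function Tdilemma below returns the negative of the log-likelihood \(\sum_{t}\sum_i \log P_i^{\,s_i(t)}(t)\), which nlm then minimizes over (\delta, \rho, \phi, \lambda, N(0)).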
for (R in c(5, 10, 20, 25, 50, 80)) {

  #####
  # Data (rows track time, columns track players)
  if (R == 5) {
    CHR = floor(read.table('Td5.dat') - 79)
    CHC = floor(read.table('Tdo5.dat') - 79)
  }
  if (R == 10) {
    CHR = floor(read.table('Td10.dat') - 79)
    CHC = floor(read.table('Tdo10.dat') - 79)
  }
  if (R == 20) {
    CHR = floor(read.table('Td20.dat') - 79)
    CHC = floor(read.table('Tdo20.dat') - 79)
  }
  if (R == 25) {
    CHR = floor(read.table('Td25.dat') - 79)
    CHC = floor(read.table('Tdo25.dat') - 79)
  }
  if (R == 50) {
    CHR = floor(read.table('Td50.dat') - 79)
    CHC = floor(read.table('Tdo50.dat') - 79)
  }
  if (R == 80) {
    CHR = floor(read.table('Td80.dat') - 79)
    CHC = floor(read.table('Tdo80.dat') - 79)
  }

  #####
  # phi    = 0.8
  # rho    = phi       # depreciation of experience
  # delta  = 0.5       # weight given to forgone strategies
  # lambda = 0.1
  # DRPLN[5] = 1       # initial observation equivalents
  # DRPLN = c(phi, rho, delta, lambda, DRPLN[5])

  #####
  # Parameters
  # R = 50             # reward
  T  = nrow(CHR) + 1   # number of periods
  NP = ncol(CHR)       # number of players

  od <- options(digits = 5)

  #####
  # Payoffs
  s = 80:200
  PR = matrix(rep(s, 121), 121, 121)
  PR[lower.tri(PR, FALSE)] <- 0
  PC = t(PR)
  diag(PC) = 0
  ones = matrix(rep(1, 121^2), 121, 121)
  ones[lower.tri(ones, TRUE)] = 0
  PC = PR + PC + R * t(ones) - R * ones
  rownames(PC) = 80:200
  colnames(PC) = 80:200
  PR = t(PC)   # PC is the payoff to the Column player; PR is the payoff to Row
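  # Added illustration (not part of the original listing): a quick check of the
  # payoff construction above. If Row claims 100 and Column claims 120, the lower
  # claimant (Row) should receive 100 + R and the higher claimant 100 - R.
  stopifnot(PR["100", "120"] == 100 + R,
            PC["100", "120"] == 100 - R)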
  #####
  # Attraction matrices (rows represent players, columns strategies)
  A0 = matrix(rep(1/121, NP), NP, 121)
  A1 = A0   # the following simply allot memory for the future attraction tables
  A2 = A0
  A3 = A0
  A4 = A0
  A5 = A0
  A6 = A0
  A7 = A0
  A8 = A0
  A9 = A0
  A10 = A0

  Tdilemma <- function(DRPLN) {

    #####
    # Experience equivalents
    N  = NA
    NN = NA
    NN[1:T] = DRPLN[2]^((1:T) - 1) * DRPLN[5]
    for (i in 1:T) {N[i] = sum(NN[1:i])}

    # First period attractiveness
    for (i in 1:NP) { for (j in 1:121) {
      A1[i, j] = (DRPLN[3] * DRPLN[5] * A0[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[1, i]) {1} else {0})) *
                  PR[j, CHC[1, i]]) / N[1]
    }}

    # Second period attractiveness
    for (i in 1:NP) { for (j in 1:121) {
      A2[i, j] = (DRPLN[3] * N[1] * A1[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[2, i]) {1} else {0})) *
                  PR[j, CHC[2, i]]) / N[2]
    }}

    # Third period attractiveness
    for (i in 1:NP) { for (j in 1:121) {
      A3[i, j] = (DRPLN[3] * N[2] * A2[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[3, i]) {1} else {0})) *
                  PR[j, CHC[3, i]]) / N[3]
    }}

    # Fourth period attractiveness
    for (i in 1:NP) { for (j in 1:121) {
      A4[i, j] = (DRPLN[3] * N[3] * A3[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[4, i]) {1} else {0})) *
                  PR[j, CHC[4, i]]) / N[4]
    }}

    # Fifth period attractiveness
    for (i in 1:NP) { for (j in 1:121) {
      A5[i, j] = (DRPLN[3] * N[4] * A4[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[5, i]) {1} else {0})) *
                  PR[j, CHC[5, i]]) / N[5]
    }}

    # Sixth period attractiveness
    for (i in 1:NP) { for (j in 1:121) {
      A6[i, j] = (DRPLN[3] * N[5] * A5[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[6, i]) {1} else {0})) *
                  PR[j, CHC[6, i]]) / N[6]
    }}

    # Seventh period attractiveness
    for (i in 1:NP) { for (j in 1:121) {
      A7[i, j] = (DRPLN[3] * N[6] * A6[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[7, i]) {1} else {0})) *
                  PR[j, CHC[7, i]]) / N[7]
    }}

    # Eighth period attractiveness
    for (i in 1:NP) { for (j in 1:121) {
      A8[i, j] = (DRPLN[3] * N[7] * A7[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[8, i]) {1} else {0})) *
                  PR[j, CHC[8, i]]) / N[8]
    }}

    # Ninth period attractiveness (note: uses PC here, unlike PR in earlier periods)
    for (i in 1:NP) { for (j in 1:121) {
      A9[i, j] = (DRPLN[3] * N[8] * A8[i, j] +
                  (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[9, i]) {1} else {0})) *
                  PC[j, CHC[9, i]]) / N[9]
    }}

    # Tenth period attractiveness (note: uses PC, as in the ninth period)
    for (i in 1:NP) { for (j in 1:121) {
      A10[i, j] = (DRPLN[3] * N[9] * A9[i, j] +
                   (DRPLN[1] + (1 - DRPLN[1]) * (if (j == CHR[10, i]) {1} else {0})) *
                   PC[j, CHC[10, i]]) / N[10]
    }}
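    # --------------------------------------------------------------------------
    # Added illustration (not part of the original listing): the ten per-period
    # blocks above differ only in the period index, so the same attraction path
    # can be built in a single loop. DRPLN positions are read as above:
    # [1] weights forgone payoffs, [2] depreciates experience, [3] depreciates
    # attractions, [4] is the logit sensitivity, [5] the initial experience.
    # PR is used in every period here, whereas the blocks above switch to PC in
    # periods nine and ten; the result Aloop is not used anywhere below.
    Aloop = vector("list", 11)
    Aloop[[1]] = A0
    for (tt in 1:10) {
      prevN = if (tt == 1) DRPLN[5] else N[tt - 1]
      Anew  = Aloop[[tt]]
      for (i in 1:NP) { for (j in 1:121) {
        Anew[i, j] = (DRPLN[3] * prevN * Aloop[[tt]][i, j] +
                      (DRPLN[1] + (1 - DRPLN[1]) * (j == CHR[tt, i])) *
                      PR[j, CHC[tt, i]]) / N[tt]
      }}
      Aloop[[tt + 1]] = Anew
    }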
    A <<- list(A0, A1, A2, A3, A4, A5, A6, A7, A8, A9)   # global, so the plotting code below can reuse the attraction path

    # Log-likelihood of the observed choices under the logit rule
    g = sum(sapply(1:(T - 1), function(t)
          sum(sapply(1:NP, function(i)
            log(sum(sapply(1:121, function(j)
              ifelse(j == CHR[t, i], 1, 0) *
                exp(DRPLN[4] * A[[t]][i, j]) /
                sum(sapply(1:121, function(k) exp(DRPLN[4] * A[[t]][i, k])))
            )))
          ))
        ))

    return(-g)   # nlm minimizes, so return the negative log-likelihood
  }

  #####
  # Estimation
  Results = nlm(Tdilemma, DRPLN <- c(0.5, 0.5, 0.5, 0.5, 0.5))
  # Results = optim(DRPLN <- c(0.5, 0.5, 0.5, 0.5, 0.5), fn = Tdilemma, method = "L-BFGS-B",
  #                 lower = c(0.0001, 0, 0, 0.0001, 0), upper = c(10, 1, 1, 10, 1/(1 - DRPLN[2])))
  # Results = optim(DRPLN <- c(0.5, 0.5, 0.5, 0.5, 0.5), fn = Tdilemma, method = "L-BFGS-B",
  #                 lower = c(0.0001, 0, 0, 0.0001, 0))
  # Results = optim(DRPLN <- c(0.5, 0.5, 0.5, 0.5, 0.5), fn = Tdilemma)

  library(stargazer)

  Results.Table = cbind(c('$R$', '$\\delta$', '$\\phi$', '$\\rho$', '$\\lambda$', '$N(0)$'),
                        c('(reward)', '(imagination factor)', '(attraction decay)',
                          '(experience decay)', '(payoff sensitivity)', '(initial experience)'),
                        c(R, signif(Results[[2]], 4)))
  write(stargazer(Results.Table,
                  title = paste('Parameters for EWA estimation with R =', R)),
        'output.tex', append = TRUE)

  # Average relative attraction by period (rows) and strategy (columns),
  # averaging over the first ten players
  mA = matrix(1, T, 121)
  for (t in 1:(T - 1)) { for (j in 1:121) {
    mA[t, j] = mean(A[[t]][1:10, j])
  }}
  for (t in 1:T) {mA[t, ] = mA[t, ] / sum(mA[t, ])}
  mA = mA[-11, ]

  df <- data.frame(x = rep(seq_len(ncol(mA)), each = nrow(mA)),
                   y = rep(seq_len(nrow(mA)), times = ncol(mA)),
                   z = c(mA))
  require("mgcv")
  mod <- gam(z ~ te(x, y), data = df)
  density <- matrix(fitted(mod), ncol = 121)

  require("lattice")
  lattice.options(axis.padding = list(factor = 0.5))
  surface = wireframe(t(density), shade = TRUE, aspect = c(0.8, 1),
                      light.source = c(10, 0, 10),
                      ylab = 'Time', xlab = 'Strategy', zlab = 'Rel. Attraction',
                      cex.lab = 0.5,
                      main = 'Experience Weighted Learning',
                      sub = paste(c('R = ', R))
                      # par.settings = list(axis.line = list(col = "transparent")),
                      # par.box = c(col = "transparent")
                      )
  print(surface)   # lattice objects must be printed explicitly inside a loop

  pdf(paste('R = ', R, ' surface.pdf'), family = "Times", width = 8, height = 8)
  print(surface)
  dev.off()
}
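Assuming the twelve data files named in the listing (Td5.dat and Tdo5.dat through Td80.dat and Tdo80.dat) are present in the working directory, the script can be run in one step; the file name below is hypothetical.

source('ewa_travelers_dilemma.R')   # hypothetical name for the listing above
# Side effects per treatment: the parameter table is appended to output.tex and a
# surface plot is written to a 'R = <value> surface.pdf' file.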
References

C. F. Camerer. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton University Press, 2003. ISBN 0691090394.

Colin F. Camerer and Teck-Hua Ho. Experience-weighted attraction learning in normal-form games. Econometrica, 67(4):827–874, 1999.

C. Monica Capra, Jacob K. Goeree, Rosario Gomez, and Charles A. Holt. Anomalous behavior in a traveler's dilemma? American Economic Review, 89(3):678–690, 1999. doi: 10.1257/aer.89.3.678.

Teck-Hua Ho, Colin F. Camerer, and Juin-Kuan Chong. Functional EWA: A one-parameter theory of learning in games. Working paper, 2002.

Hidehiko Ichimura and Juergen Bracht. Estimation of learning models on experimental game data. 2001.

Marco R. Steenbergen. Maximum likelihood programming in R. 2006.

Michinori Uwasu. Essays on Environmental Cooperation. 2007.

Wang Xiaojing, Tong Wei, Ren Jia, Ding Linjie, and Liu Jingning. Weighted fairness resource allocation. 2(3):1–10, 2012.