
“Heckmeck am Bratwurmeck”
or
How to grill the maximum number of worms
Roland C. Seydel
24/05/2012
Overview
1. Introducing the dice game
   The basic rules
   Understanding the strategy
2. Computing the optimal strategy
   Value function and state space
   Computing the value function
3. The results
Introducing the dice game
The basic rules
The rules (I)
Goal of the game: Get the most worms by throwing high (spot) numbers.
1. There are eight dice with six sides each, numbered 1 to 6, where a worm is shown instead of the 6.
2. You can throw the remaining dice as often as you want; after each throw you choose a number which you haven’t chosen yet and put aside all the dice showing this number.
3. The numbers of the dice put aside are added, where a worm (6) counts as a 5. No 6 put aside ⇒ 0 points! You can take a piece from the shelf with a number ≤ your points and put it on top of your pile.
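The scoring rule in step 3 fits in a small function. A minimal Python sketch, assuming dice are encoded as integers 1 to 6 with 6 standing for the worm. The name `dice_points` is a hypothetical illustration, not taken from the talk:

```python
def dice_points(selected):
    """Points of the dice put aside: a worm (encoded as 6) counts as 5,
    and without at least one worm the turn is worth 0 points."""
    if 6 not in selected:
        return 0
    return sum(min(d, 5) for d in selected)


print(dice_points([6, 6, 5, 5, 5]))  # two worms and three 5s -> 25
print(dice_points([5] * 8))          # eight 5s but no worm -> 0
```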
The rules (II)
The winner is the player with the most worms in his pile.
Several additional rules which matter to us:
Not able to put aside a new number? ⇒ 0 points!
You can also take the top piece of a co-player’s pile if you match its number exactly.
If your points are not sufficient to pick a piece, you lose the piece on
top of your own pile.
Rules we ignore for now:
Lost pieces are put back on the shelf; the piece with the largest
available number is removed from the shelf.
Your co-players can also take pieces from you when it’s their turn!
The game is over when the shelf is empty.
Example game
Understanding the strategy
Optimal strategy: Easy to tell
Is selecting four 1s a good idea?
Are two 5s better, or two 6s?
Are two 4s better, or two 5s?
If I could lose four worms, should I bet or rather risk nothing?
Optimal strategy: Difficult to tell
Are three 5s better, or two 6s? And what about four 5s?
Could it be optimal to select one or more 3s in the first throw?
Should I take the five 5s, or is it too dangerous?
What is the expected / most likely outcome in terms of worms?
Should I stop with three 5s and two 6s, or continue?
Parallels to option pricing
Early exercise: The player can decide when to stop and exercise → American option
Optimal control: At each point in time, a decision has to be taken → Swing option
Knock out: If there is no 6, or you are not able to put aside a new number, your points are 0 → Barrier option
Conclusion
Compute the optimal solution by option pricing methods?
Computing the optimal strategy
Differences to option pricing
Option pricing                        Heckmeck
Continuous state space                Discrete state space
Continuous time                       Only up to 8 time steps
Typically 1d state space              Up to 8d state space
Same state space in time              Changing state space
Knockout happens at barrier line      No clear line for knockout
Random noise is added to process      Randomness is not added to the state
No decision needed                    Each throw needs a decision (control)
Assumptions
1. The player wants to maximize the expected number of worms in his turn.
2. Reactions of other players are not anticipated (single-player optimization).
3. Only a single turn of up to 6 throws is optimized, not the total outcome of the game!
Value function and state space
Value function
Situation
An array of pieces available on the shelf, on other players’ piles and on
your own pile is called a situation.
State space
A vector $x \in \{1, \dots, 6\}^t$ of already selected dice at time $t \in \{0, \dots, 8\}$ is called the state at time t. The state space for time t is the set of all possible selections $x \in \{1, \dots, 6\}^t$. We consider only sorted states, written x = {…}.
Value function
For a particular situation, the value function v(x) is the expected number of worms (on pieces), assuming that, starting from state x, the expectation-optimal decisions are taken.
Caveat: From now on, worms are only worms on pieces, not on dice!
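Since only sorted states are considered, a state at time t is a multiset of t spot values, and there are C(t + 5, 5) of them. A minimal Python sketch (the helper `sorted_states` is hypothetical) that enumerates the states and counts the whole state space:

```python
from itertools import combinations_with_replacement
from math import comb


def sorted_states(t):
    """All sorted states x in {1, ..., 6}^t, i.e. multisets of t selected dice."""
    return list(combinations_with_replacement(range(1, 7), t))


# Size of the state space for each time t (= number of dice already selected).
sizes = [len(sorted_states(t)) for t in range(9)]
print(sizes)       # [1, 6, 21, 56, 126, 252, 462, 792, 1287]
print(sum(sizes))  # 3003
assert sizes == [comb(t + 5, 5) for t in range(9)]
```

With only 3003 sorted states in total, visiting every state once is cheap.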
Value function: payoff
(Worm) payoff
For a particular situation let w (n) be the number of worms you would get
or lose for a sum of points n. Then the (worm) payoff p(x) of a state x is
defined by the number of worms you would get or lose upon termination.
In formulas,
$$p(x) = w\bigl(n(x)\bigr) = w\left(\begin{cases} \sum_i \min(x_i, 5) & \text{if } 6 \in x,\\ 0 & \text{else.} \end{cases}\right)$$
Each situation has a worst payoff w(0), equal to minus the number of worms on the top piece of your own pile.
Examples ({…} denotes a sorted vector; full shelf):
p({5, 5, 5, 5, 5, 5, 5, 5}) = w(0)   (no worm selected)
p({1, 2, 3, 4, 5, 6, 6, 6}) = 3   (30 points)
p({1, 5, 5, 5, 6}) = 1   (21 points)
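A minimal Python sketch of w and p for a single situation. It assumes the standard pieces 21 to 36 (1 worm on 21 to 24, 2 on 25 to 28, 3 on 29 to 32, 4 on 33 to 36), counts only your own worms, and lets you take the best shelf piece ≤ n or a co-player's top piece with number exactly n. All function and parameter names are hypothetical:

```python
def worms_on_piece(piece):
    """Worms on a piece, assuming the standard pieces 21..36 in groups of four."""
    return (piece - 21) // 4 + 1


def dice_points(x):
    """n(x): sum of the selected dice, a worm (6) counting as 5; 0 without a worm."""
    return sum(min(xi, 5) for xi in x) if 6 in x else 0


def w(n, shelf, own_top=None, opponent_tops=()):
    """Worms gained (or lost) with n points: best shelf piece with number <= n or a
    co-player's top piece with number == n; otherwise you lose your own top piece
    (w = 0 if your own pile is empty)."""
    gains = [worms_on_piece(q) for q in shelf if q <= n]
    gains += [worms_on_piece(q) for q in opponent_tops if q == n]
    if gains:
        return max(gains)
    return -worms_on_piece(own_top) if own_top is not None else 0


def p(x, **situation):
    """Payoff p(x) = w(n(x)) of a sorted state x."""
    return w(dice_points(x), **situation)


full_shelf = range(21, 37)
print(p((5, 5, 5, 5, 5, 5, 5, 5), shelf=full_shelf))  # no worm: w(0) = 0 (empty pile)
print(p((1, 2, 3, 4, 5, 6, 6, 6), shelf=full_shelf))  # 30 points -> 3
print(p((1, 5, 5, 5, 6), shelf=full_shelf))           # 21 points -> 1
```

For instance, with shelf [24, 33, 34], co-players' top pieces [21, 25, 29] and your own top piece 32, `p((5, 5, 5, 6, 6), ...)` is 2, because 25 points exactly match a co-player's piece.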
Value function on intermediate states
Intermediate state
For a state $x \in \{1, \dots, 6\}^t$ and a throw $y \in \{1, \dots, 6\}^{8-t}$, the tuple (x, y) is called an intermediate state, i.e., a state that still needs a decision.
We can also define the value function of intermediate states (x, y): Either there is a valid best choice $\tilde y \subset y$ (in particular $\tilde y \cap x = \emptyset$) such that
$$v((x, y)) = v(\{x, \tilde y\}),$$
or there is no valid choice, in which case v((x, y)) = w(0) (worst payoff).
Conclusion: It is sufficient to compute only the value function on normal
states!
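The case distinction for intermediate states can be sketched in Python, assuming states and throws are tuples of spot values. `valid_choices` and `intermediate_value` are hypothetical helper names, and `v` stands for any function on sorted states:

```python
from collections import Counter


def valid_choices(x, y):
    """All states {x, y~} reachable from the intermediate state (x, y): pick a spot
    value d shown in the throw y but not yet in x and put aside all dice showing d."""
    counts = Counter(y)
    return [tuple(sorted(x + (d,) * c)) for d, c in sorted(counts.items()) if d not in x]


def intermediate_value(x, y, v, w0):
    """v((x, y)): value of the best valid choice, or the worst payoff w(0) if none."""
    choices = valid_choices(x, y)
    return max(v(s) for s in choices) if choices else w0


print(valid_choices((5, 5, 5), (1, 3, 3, 4, 5)))  # [(1, 5, 5, 5), (3, 3, 5, 5, 5), (4, 5, 5, 5)]
print(valid_choices((5, 5, 5), (5, 5, 5, 5, 5)))  # [] -> knocked out
```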
Value function: optimal exercise
For states in $\{1, \dots, 6\}^8$, the value function is equal to the payoff (assuming a full shelf), e.g.:
v ({5, 5, 5, 5, 5, 5, 5, 5}) = p({5, 5, 5, 5, 5, 5, 5, 5}) = w (0)
v ({1, 2, 3, 4, 5, 6, 6, 6}) = p({1, 2, 3, 4, 5, 6, 6, 6}) = 3
We call this the terminal condition. There are other types of states for
which the value function is determined by the payoff function:
Optimal exercise
It is optimal to exercise in a state $x \in \{1, \dots, 6\}^t$ if v(x) = p(x), i.e., the
value function equals the payoff function. In this case the expected
optimal number of worms can be obtained immediately.
But we do not know the value function yet …
State space in “time” t: example
[Figure: example of the state space from t = 3 to t = 5, alternating between “Throw dice (random)” and “Decide”. From x = {5, 5, 5}: the throw y = {1, 3, 3, 4, 5} leads to a decision between x = {1, 5, 5, 5}, x = {4, 5, 5, 5} and x = {3, 3, 5, 5, 5}; the throw y = {5, 5, 5, 6, 6} leads to x = {5, 5, 5, 6, 6} (exercise?); the throw y = {5, 5, 5, 5, 5} means knock out!]
Computing the value function
Finding the value function: Possible approaches
Monte Carlo simulation? This is the approach implicitly chosen by experienced players:
Simulate dice throws on the computer.
Try different strategies in different (simulated) games.
Problem: Causality is difficult to establish because of the multitude of possible strategies ⇒ many simulations needed!
Use the recursion programming principle? The solution fastest to implement:
Uses
$$v(x) = \begin{cases} \max\bigl\{p(x),\ \mathbb{E}\bigl[\max_{y \subset Y} v((x, y))\bigr]\bigr\} & |x| < 8,\\ p(x) & |x| = 8, \end{cases} \qquad (1)$$
where Y follows a multinomial (dice) distribution.
One command v({}) starts the whole calculation recursively.
Problem: Takes an eternity, because most states are computed multiple times.
Backward induction! Start at terminal time and compute p(x), then go backwards in time using (1). Compute each state only once.
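The recursive approach becomes fast once every state is cached. This memoization is a swapped-in technique, not the backward induction used in the talk, although both compute each state only once. A Python sketch of recursion (1), assuming a full shelf 21 to 36, an empty own pile (so w(0) = 0) and no co-players' pieces. All names are hypothetical:

```python
from collections import Counter
from functools import lru_cache
from itertools import combinations_with_replacement
from math import factorial

N_DICE = 8
W0 = 0  # assumption: empty own pile, so the worst payoff w(0) is 0


def p(x):
    """Payoff p(x) with a full shelf 21..36: worms on the best piece <= n(x)."""
    n = sum(min(d, 5) for d in x) if 6 in x else 0
    return (min(n, 36) - 21) // 4 + 1 if n >= 21 else W0


def throws(m):
    """All sorted throws of m fair dice with their multinomial probabilities."""
    for y in combinations_with_replacement(range(1, 7), m):
        coef = factorial(m)
        for c in Counter(y).values():
            coef //= factorial(c)
        yield y, coef / 6 ** m


@lru_cache(maxsize=None)
def v(x):
    """Recursion (1) on sorted states x; the cache computes each state only once."""
    if len(x) == N_DICE:
        return p(x)
    expected = 0.0
    for y, prob in throws(N_DICE - len(x)):
        choices = [tuple(sorted(x + (d,) * c)) for d, c in Counter(y).items() if d not in x]
        # no valid choice means knock-out, i.e. the worst payoff w(0)
        expected += prob * (max(v(s) for s in choices) if choices else W0)
    return max(p(x), expected)


print(round(v(()), 4))  # expected worms of one turn under the optimal strategy
```

The single call `v(())` plays the role of the command v({}); without `lru_cache` the same code would recompute shared states many times.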
Needed: The multinomial distribution
Q: If you toss a coin 5 times, what is the probability of getting 5 − 4 heads and 4 tails?
A: Binomial distribution! $\frac{5!}{(5-4)!\,4!}\,0.5^{5-4}\,(1-0.5)^4 = \frac{5}{32} \approx 0.156$
Multinomial (dice) distribution
If you throw a k-sided die n times, then the probability of getting $x_i$ times the spot number i for i = 1, …, k is
$$f_M(x) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k},$$
where $\sum_{i=1}^{k} p_i = 1$ and $\sum_{i=1}^{k} x_i = n$.
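A direct Python transcription of f_M (the function name `multinomial_pmf` is hypothetical), checked on the coin example and on six fair dice showing every spot number once:

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """f_M(x) = n! / (x_1! ... x_k!) * p_1^x_1 * ... * p_k^x_k, with n = sum of x_i."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # exact: every partial quotient is an integer
    return coef * prod(pk ** ck for pk, ck in zip(probs, counts))


# Coin (k = 2): one head (5 - 4) and four tails in five tosses.
print(multinomial_pmf([1, 4], [0.5, 0.5]))              # 0.15625 = 5/32
# Fair die (k = 6): six dice showing every spot number exactly once.
print(round(multinomial_pmf([1] * 6, [1 / 6] * 6), 4))  # 720 / 6**6, about 0.0154
```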
Backward induction
The backward induction makes the implicit equation (1) explicit by making sure that the values on the right-hand side are already computed:
Algorithm: Backward induction
1. At time t = 8, determine the terminal payoff p(x) for all possible states $x \in \{1, \dots, 6\}^8$.
2. Go backwards in time t → t − 1; for each t-state x do:
   2.1 Compute the distribution of possible dice scenarios.
   2.2 Check for each scenario whether it is knocked out (no new spot numbers), or else to which future state it could lead.
   2.3 Take the expectation $\mathbb{E}_Y[\max_{y \subset Y} v((x, y))]$ with v from future times.
   2.4 Take the maximum with p(x) (exercise instead of throwing dice).
MATLAB implementation runs in just a few seconds on a normal PC!
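A Python sketch of the backward induction, assuming a full shelf by default, an empty own pile (w(0) = 0) and no co-players' pieces. It is an illustration in another language, not the MATLAB implementation from the talk, and all names are hypothetical. The comments refer to the numbered steps of the algorithm:

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial

N_DICE = 8
W0 = 0  # assumption: empty own pile, so the worst payoff w(0) is 0


def payoff(x, shelf):
    """p(x): worms on the best shelf piece <= n(x), else W0."""
    n = sum(min(d, 5) for d in x) if 6 in x else 0
    takeable = [q for q in shelf if q <= n]
    return (max(takeable) - 21) // 4 + 1 if takeable else W0


def backward_induction(shelf=range(21, 37)):
    """Value function v on all sorted states, computed from t = 8 down to t = 0."""
    states = {t: list(combinations_with_replacement(range(1, 7), t)) for t in range(N_DICE + 1)}
    # Step 1: terminal condition v(x) = p(x) at t = 8
    v = {x: payoff(x, shelf) for x in states[N_DICE]}
    # Step 2: go backwards in time t -> t - 1
    for t in range(N_DICE - 1, -1, -1):
        m = N_DICE - t
        # 2.1: distribution of the dice scenarios for the m remaining dice
        scenarios = []
        for y in combinations_with_replacement(range(1, 7), m):
            counts = Counter(y)
            coef = factorial(m)
            for c in counts.values():
                coef //= factorial(c)
            scenarios.append((counts, coef / 6 ** m))
        for x in states[t]:
            expected = 0.0
            for counts, prob in scenarios:
                # 2.2: future states of this scenario (empty list = knocked out)
                future = [tuple(sorted(x + (d,) * c)) for d, c in counts.items() if d not in x]
                # 2.3: best choice, using values of later times that are already known
                expected += prob * (max(v[s] for s in future) if future else W0)
            # 2.4: compare with exercising now
            v[x] = max(payoff(x, shelf), expected)
    return v


v = backward_induction()
print(round(v[()], 4), round(v[(5, 5, 5, 6, 6)], 4))
```

Every future state has more selected dice than x, so its value is already in `v` when it is needed. Passing a reduced shelf, e.g. `backward_induction([q for q in range(21, 37) if q != 32])`, changes the payoffs; a non-empty own pile or co-players' pieces would need a fuller payoff w.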
The results
Contour lines of value function, full shelf, initial throw
[Figure “Heckmeck initial throw”: contour lines of the value function (levels 0.5 to 4) over spot number (horizontal axis) and #dice selected (vertical axis)]
Figure: Contour lines of the value function in the initial throw, if the shelf starts at 21, dependent on the number of selected dice (vertical axis) of a particular spot number (horizontal axis).
Distribution of points
[Figure: two panels, “Dice sum distribution for full shelf” (left) and “Dice sum distribution for shelf starting at 30” (right); horizontal axis: dice sum (0 to 40), vertical axis: probability]
Figure: Distribution of the dice sum under the optimal strategy, from a Monte Carlo simulation with 1000 paths. Left: full shelf; right: shelf starting at 30.
Explanation: On the left there are almost no values in [5, 20], because for these dice sums a knockout is very likely.
Most likely selections
What is the most likely initial selection of dice, assuming we act optimally? Answer: two 6s
Spots \ Number of dice    1       2       3       4       5       6       7       8
1                    0.0003  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0
2                    0.0019  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0
3                    0.0113  0.0507  0.0009  0.0003  0.0001  0.0000  0.0000  0.0
4                    0.0724  0.1013  0.0531  0.0206  0.0037  0.0004  0.0000  0.0
5                    0.0037  0.1666  0.0982  0.0260  0.0042  0.0004  0.0000  0.0
6                    0.0150  0.2347  0.1035  0.0260  0.0042  0.0004  0.0000  0.0
Table: Probability that a number of dice (horizontal axis) carrying a certain
number of spots (vertical axis) is selected under the optimal strategy.
Optimal exercise decisions: Example
Original question: Stop with three 5s and two 6s?
>> [v,A,A_inv] = heckmeck_v4(21:36, [], []);
>> heckmeck_value([5,5,5,6,6],A_inv,v)
2.6277
>> [v,A,A_inv] = heckmeck_v4([21:31,33:36], [], [32]);
>> heckmeck_value([5,5,5,6,6],A_inv,v)
2.4352
>> [v,A,A_inv] = heckmeck_v4([24,33,34], [21, 25,29], [32]);
>> heckmeck_value([5,5,5,6,6],A_inv,v)
2
Optimal strategy in Monte Carlo simulation
[Figure “MC simulation vs. backward induction for initial throw, full shelf”: number of worms (vertical axis) against number of paths (horizontal axis, 0 to 10000); curves: mean of MC sim, value function (backward induction), value fct ± stdev]
Figure: Convergence of the mean of the Monte Carlo simulation depending on the number of paths (blue line) in the initial throw, if the shelf starts at 21. Green: backward induction.
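The convergence check in the figure can be mimicked in a few lines. A Python sketch, assuming a full shelf 21 to 36, an empty own pile (w(0) = 0) and no co-players' pieces. Here the value function is computed by memoized recursion of equation (1), and all names and the seed are hypothetical. Each simulated turn keeps throwing while v(x) > p(x) and always picks the choice with the highest value, so the sample mean should approach v({}):

```python
import random
from collections import Counter
from functools import lru_cache
from itertools import combinations_with_replacement
from math import factorial

N_DICE, W0 = 8, 0  # assumption: full shelf 21..36, empty own pile (w(0) = 0)


def p(x):
    """Payoff with a full shelf: worms on the best piece <= n(x), else W0."""
    n = sum(min(d, 5) for d in x) if 6 in x else 0
    return (min(n, 36) - 21) // 4 + 1 if n >= 21 else W0


def choices(x, y):
    """Sorted states reachable from the intermediate state (x, y)."""
    return [tuple(sorted(x + (d,) * c)) for d, c in Counter(y).items() if d not in x]


@lru_cache(maxsize=None)
def v(x):
    """Value function by recursion (1), each state computed once thanks to the cache."""
    if len(x) == N_DICE:
        return p(x)
    m, expected = N_DICE - len(x), 0.0
    for y in combinations_with_replacement(range(1, 7), m):
        coef = factorial(m)
        for c in Counter(y).values():
            coef //= factorial(c)
        nxt = choices(x, y)
        expected += coef / 6 ** m * (max(v(s) for s in nxt) if nxt else W0)
    return max(p(x), expected)


def simulate_turn(rng):
    """One turn under the optimal policy: throw while v(x) > p(x), pick the best choice."""
    x = ()
    while v(x) > p(x):
        y = [rng.randint(1, 6) for _ in range(N_DICE - len(x))]
        nxt = choices(x, y)
        if not nxt:
            return W0  # knocked out
        x = max(nxt, key=v)
    return p(x)


rng = random.Random(2012)
paths = 10000
mean = sum(simulate_turn(rng) for _ in range(paths)) / paths
print(f"MC mean: {mean:.4f}, value function v({{}}): {v(()):.4f}")
```

Since every outcome lies between 0 and 4 worms, the standard deviation is at most 2 and the standard error of the mean with 10000 paths is at most 0.02 worms.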
Optimal strategy in Monte Carlo simulation (2)
In the whole game, what is our probability of winning if others pursue
“average” strategies?
Test: Optimal strategy vs. fuzzy strategy
We test our results in a two-player game:
Player 1 follows the optimal strategy (derived from the value function
v)
Player 2 derives his strategy from a “value function” misestimated by
±0.1 worms (by randomly perturbing v with stdev of 0.1)
Result: Player 1 wins 17 out of 20 games!
Conclusion: Even a slight difference in optimality makes us win most of
the games (law of large numbers because of many rounds per game!)
Extensions / References
Possible extensions:
Incorporate risk in pricing? → Utility functions
Compute optimal strategies for the whole game
Reference:
Reiner Knizia: Heckmeck am Bratwurmeck (Pickomino), Zoch-Verlag 2005
Value function dependent on shelf, initial throw
[Figure “Value depending on shelf”: expected optimal worms (vertical axis, 0 to 3.5) against “Shelf starts at...” (horizontal axis, 20 to 36); one curve per initial choice die = 4, 5, 6 with #selected = 1, …, 5]
Figure: Expected number of worms under the optimal strategy for different initial dice choices, dependent on the minimal number available on the shelf.