Playing Draw Poker with Convolutional Neural Nets
Nikolai Yakovenko
4/22/15 for EE6894

What is Draw Poker?
•  Five-card poker with one exchange
•  10,000 machines in Las Vegas alone
•  Pays out $0.95 to $1.007 per dollar*…
*with perfect play
•  Add photo of payout table

100% payout with perfect play.

The Machine's Edge
What do you do here?
Rule #7: Draw 3 to a Royal Flush!
Worth $0.80 on average
Worth $1.80 on average

Easy to get 99.5% payout
Just follow these 25 easy rules...
•  Four of a kind, straight flush, royal flush
•  4 to a royal flush
•  Three of a kind, straight, flush, full house
•  4 to a straight flush
•  Two pair
•  High pair
•  3 to a royal flush
•  4 to a flush
•  Low pair
•  4 to an outside straight
•  3 to a straight flush (type 1)
•  AKQJ unsuited
•  2 suited high cards
•  4 to an inside straight with 3 high cards
•  3 to a straight flush (type 2)
•  KQJ unsuited
•  QJ unsuited
•  JT suited
•  KQ, KJ unsuited
•  QT suited
•  AK, AQ, AJ unsuited
•  KT suited
•  One high card
•  3 to a straight flush (type 3)
•  Discard everything
No wonder people make mistakes
Typical human plays 5% below expectation

Can learning do better?

Data Representation
All 32 sim results for hand [Kh,Ad,Kc,8h,Qc]:
[]:2000 sample: 0.27 ave 6.00 max
[Kh]:20000 sample: 0.33 ave 9.00 max
[Ad]:20000 sample: 0.45 ave 25.00 max
[Kc]:20000 sample: 0.33 ave 25.00 max
[8h]:2000 sample: 0.35 ave 25.00 max
[Qc]:20000 sample: 0.44 ave 50.00 max
[Kh,Ad]:2000 sample: 0.41 ave 9.00 max
[Kh,Kc]:2000 sample: 1.54 ave 25.00 max
…
[Kc,Qc]:20000 sample: 0.66 ave 976.00 max
[8h,Qc]:2000 sample: 0.34 ave 9.00 max
[Kh,Ad,Kc]:2000 sample: 1.38 ave 25.00 max
…
[Kh,Kc,8h]:2000 sample: 1.42 ave 25.00 max
[Kh,Kc,Qc]:2000 sample: 1.45 ave 25.00 max
…
[Kh,Ad,Kc,8h]:2000 sample: 1.23 ave 3.00 max
[Kh,Ad,Kc,Qc]:2000 sample: 1.21 ave 3.00 max
[Kh,Ad,8h,Qc]:2000 sample: 0.18 ave 1.00 max
[Kh,Kc,8h,Qc]:2000 sample: 1.21 ave 3.00 max
[Ad,Kc,8h,Qc]:2000 sample: 0.16 ave 1.00 max
[Kh,Ad,Kc,8h,Qc]:2000 sample: 1.00 ave 1.00 max

Input example, hand [4d,4h,Jh,5d,4c]:
x23456789TJQKA
c..1..........
d..11.........
h..1......1...
s.............
4x13 binary matrix, each card
32-length vector for all draws
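The card matrix and the list of 32 draws above can be reproduced in a few lines. This is a minimal sketch in plain Python, not code from the deep_draw repository: the names `encode_hand`, `render` and `all_draws` are illustrative, and it draws the whole hand in one 4x13 grid the way the slide prints it. How the real model splits the hand into per-card inputs and pads them is not shown here.

```python
from itertools import combinations

RANKS = "23456789TJQKA"  # column order, as in the matrix header on the slide
SUITS = "cdhs"           # row order, as in the matrix rows on the slide


def encode_hand(cards):
    """Return a 4x13 list of lists with a 1 wherever a card is present."""
    matrix = [[0] * len(RANKS) for _ in SUITS]
    for card in cards:
        rank, suit = card[0], card[1]
        matrix[SUITS.index(suit)][RANKS.index(rank)] = 1
    return matrix


def render(matrix):
    """Format the matrix the way the slide prints it ('.' for 0, '1' for 1)."""
    lines = ["x" + RANKS]
    for suit, row in zip(SUITS, matrix):
        lines.append(suit + "".join("1" if bit else "." for bit in row))
    return "\n".join(lines)


def all_draws(cards):
    """Enumerate the 2**5 = 32 subsets of cards to hold, smallest first."""
    return [list(held)
            for k in range(len(cards) + 1)
            for held in combinations(cards, k)]


hand = ["4d", "4h", "Jh", "5d", "4c"]
print(render(encode_hand(hand)))
print(len(all_draws(hand)))
```

Enumerating by subset size gives the same order as the sim listing: the empty hold first, then single cards, then pairs, up to holding all five.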
Best result for [Kh,Ad,Kc,8h,Qc]:
[Kh,Kc]:2000 sample: 1.54 ave 25.00 max

Why Convolutional Network?
•  Card games are visual
•  Learn properties like pairs, flushes, straights
•  Proximity in the inputs matters

[Kh,Qh,4h,3c,Jh]
x23456789TJQKA
c.1...........
d.............
h..1......111.
s.............

[2d,2h,7c,6c,5c]
x23456789TJQKA
c...111.......
d1............
h1............
s.............

ConvNet best practices
Karen Simonyan & Andrew Zisserman explain…
http://www.robots.ox.ac.uk/~karen/pdf/ILSVRC_2014.pdf

5-Card Draw Poker: Network Shape
input layer shape 100 x 5 x 17 x 17
convolution layer l_conv1. Shape (100, 16, 15, 15)
convolution layer l_conv1_1. Shape (100, 16, 13, 13)
maxPool layer l_pool1. Shape (100, 16, 7, 7)
convolution layer l_conv2. Shape (100, 32, 5, 5)
convolution layer l_conv2_2. Shape (100, 32, 3, 3)
maxPool layer l_pool2. Shape (100, 32, 2, 2)
hidden layer l_hidden1. Shape (100, 1024)
dropout layer l_hidden1_dropout. Shape (100, 1024)
final layer l_out, into 32 dimension. Shape (100, 32)

Training
Simple
•  Learn all 32 outputs
•  Loss = mean squared error
•  Update with Nesterov Momentum
   –  Easily gets to 67% accurate moves
Nuanced
•  Bias toward rare cases
•  Round off large values
   –  99% [0.0 – 4.0]
   –  <1% [10.0 – 900.0]
•  Switch to adaptive learning
   –  AdaDelta works well
   –  Initialize with working model

Results!
                                                    Real Return (100k hands)   Validation %
15x15 same-shape, 50k training size                 $0.930                     73%
17x17 valid-shape, 90k training size                $0.944                     70%
17x17 valid-shape, 150k training size               $0.955                     78%
17x17 valid-shape, 90k training (longer training)   $0.983                     77%

Don't trust the averages, since the payoff is hugely asymmetric.
Study your mistakes

Biggest Errors
500 hands took 2397.96s
406 no error
52 tiny error
25 small error
17 big error
biggest errors:
(1.14, '[4s,7h,Ah,3h,Jh]', '[7h,Ah,3h,Jh]', 1.26, '[4s,7h,Ah,Jh]', 0.12)
(1.12, '[Ts,Kc,Jc,3h,Qc]', '[Kc,Jc,Qc]', 2.015, '[Ts,Kc,Jc,Qc]', 0.897)
(0.85, '[Tc,2s,9c,8d,8c]', '[8d,8c]', 0.852, '[Tc,2s,9c,8c]', 0.0)
(0.83, '[3d,Td,9s,Ad,8d]', '[3d,Td,Ad,8d]', 1.272, '[Td,Ad,8d]', 0.442)
(0.82, '[6d,2c,3c,Tc,9c]', '[2c,3c,Tc,9c]', 1.167, '[]', 0.340)
(0.81, '[Jc,9h,Kh,Ah,2h]', '[9h,Kh,Ah,2h]', 1.278, '[Jc,Kh,Ah]', 0.465)
(0.75, '[2h,Qs,Jh,Kh,8h]', '[2h,Jh,Kh,8h]', 1.261, '[Jh,Kh]', 0.513)
(0.71, '[9s,Th,As,4s,6s]', '[9s,As,4s,6s]', 1.229, '[As]', 0.511)
…
(0.41, '[2s,Jh,Kd,Tc,Qs]', '[Jh,Kd,Tc,Qs]', 0.872, '[Jh,Kd,Qs]', 0.463)

Struggling with straights, flushes, straight flushes.
Will it learn with more time? With better examples?

Lessons Learned
Do
•  Keep network lean
   –  Deep but simple
•  Use adaptive learning
   –  Initialize with working model
•  Train for a long time
•  Bias toward difficult data
Don't do
•  Endlessly fiddle with network shape
•  Fiddle with learning rate
•  Permute input data
   –  Better to get fresh samples

Digital or Analog?
DeepMind's Atari AI, attempting to approximate an exact value.
Neural Nets do bad "Exact Math" for games

Obvious Improvements
•  Train on errors
   –  Directly, or look for similar cases
   –  Generate more data, permute known cases
•  Run much longer, on more data
   –  Total training: 400k cases, down-sampled to 150k
•  Train multiple models, and vote on result

Beyond Draw Video Poker
•  Different video payout
   –  Start training on current model
•  Triple Draw
   –  3 rounds, so train 3 models
   –  Same network shape, same output
   –  One big model?
•  Incorporate betting, opponent hand information

Questions
•  Different network shape?
•  How to handle input padding?
•  Retrain on rare cases?
   –  Or a specialty network?
   –  (Backgammon AIs include 2-5 different networks)

Thank you!

Bibliography
•  GitHub: https://github.com/moscow25/deep_draw
   –  Ping me if you want to run it. Needs bit of cleanup.
•  Lasagne: http://lasagne.readthedocs.org/en/latest/
•  Network shape (for images)
   –  http://www.robots.ox.ac.uk/~karen/pdf/ILSVRC_2014.pdf
   –  http://vision.stanford.edu/teaching/cs231n/slides/lecture8.pdf
•  AdaDelta: http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
•  DeepMind Atari: http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
•  PokerSnowie: https://www.pokersnowie.com/about/weaknesses.html
•  Wizard of Odds: http://wizardofodds.com/games/video-poker/tables/jacks-or-better/
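
A note on the Network Shape slide: the spatial sizes it lists (17, 15, 13, 7, 5, 3, 2) can be checked with simple arithmetic. The sketch below assumes 3x3 filters with stride 1 and no padding for every convolution, and 2x2 max pooling that keeps a partial window at the border. Those settings are inferred from the shapes, not stated on the slide, and the helper names are hypothetical.

```python
import math


def conv_valid(size, kernel=3):
    """Output width of a stride-1 convolution with no padding (assumed 3x3)."""
    return size - kernel + 1


def pool_ceil(size, window=2):
    """Output width of non-overlapping pooling that keeps a partial window."""
    return math.ceil(size / window)


def trace_shapes(size=17):
    """Follow the spatial size through the layers named on the slide."""
    layers = [
        ("l_conv1", conv_valid),
        ("l_conv1_1", conv_valid),
        ("l_pool1", pool_ceil),
        ("l_conv2", conv_valid),
        ("l_conv2_2", conv_valid),
        ("l_pool2", pool_ceil),
    ]
    shapes = []
    for name, op in layers:
        size = op(size)
        shapes.append((name, size))
    return shapes


for name, size in trace_shapes(17):
    print(f"{name}: {size} x {size}")

# 32 feature maps of 2 x 2 are flattened before the 1024-unit hidden layer.
print("flattened features:", 32 * 2 * 2)
```

Under these assumptions the 13 to 7 and 3 to 2 pooling steps only work out if the pooling rounds up, which is why `pool_ceil` uses a ceiling rather than floor division.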