Playing Draw Poker with Convolutional Neural Nets
Nikolai Yakovenko
4/22/15 for EE6894

What is Draw Poker?
• Five-card poker with one exchange
• 10,000 machines in Las Vegas alone
• Pays out $0.95 to $1.007 per dollar*…
*with perfect play
[Photo of payout table]
100% payout with perfect play.

The Machine’s Edge
What do you do here?
Rule #7: Draw 3 to a Royal Flush!
Worth $0.80 on average
Worth $1.80 on average

Easy to get 99.5% payout
Just follow these 25 easy rules...
No wonder people make mistakes
Typical human plays 5% below expectation
• Four of a kind, straight flush, royal flush
• 4 to a royal flush
• Three of a kind, straight, flush, full house
• 4 to a straight flush
• Two pair
• High pair
• 3 to a royal flush
• 4 to a flush
• Low pair
• 4 to an outside straight
• 3 to a straight flush (type 1)
• AKQJ unsuited
• 2 suited high cards
• 4 to an inside straight with 3 high cards
• 3 to a straight flush (type 2)
• KQJ unsuited
• QJ unsuited
• JT suited
• KQ, KJ unsuited
• QT suited
• AK, AQ, AJ unsuited
• KT suited
• One high card
• 3 to a straight flush (type 3)
• Discard everything
Can learning do better?

Data Representation
All 32 sim results for hand [Kh,Ad,Kc,8h,Qc]:
[]:2000 sample: 0.27 ave 6.00 max

[4d,4h,Jh,5d,4c]
x23456789TJQKA
c..1..........
d..11.........
h..1......1...
s.............
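The suit-by-rank grid shown for [4d,4h,Jh,5d,4c], and the count of 32 possible draws for a five-card hand, can both be reproduced in a few lines. This is a minimal sketch, not code from the talk: the names hand_to_grid, render and all_holds are hypothetical, and the ordering of the 32 holds (by number of cards kept) is an assumption that happens to match the order of the simulation listing.

```python
from itertools import combinations

RANKS = "23456789TJQKA"   # column order used on the slide
SUITS = "cdhs"            # row order used on the slide

def hand_to_grid(hand):
    """Encode cards such as '4d' as a 4x13 grid of 0/1 (rows: suits, columns: ranks)."""
    grid = [[0] * len(RANKS) for _ in SUITS]
    for card in hand:
        rank, suit = card[0], card[1]
        grid[SUITS.index(suit)][RANKS.index(rank)] = 1
    return grid

def render(grid):
    """Print-friendly form: header row, then one row per suit, '1' for a card and '.' otherwise."""
    lines = ["x" + RANKS]
    for suit, row in zip(SUITS, grid):
        lines.append(suit + "".join("1" if bit else "." for bit in row))
    return "\n".join(lines)

def all_holds(hand):
    """Every subset of cards that could be kept: 2**5 = 32 for a five-card hand."""
    return [list(c) for k in range(len(hand) + 1) for c in combinations(hand, k)]

print(render(hand_to_grid(["4d", "4h", "Jh", "5d", "4c"])))
print(len(all_holds(["Kh", "Ad", "Kc", "8h", "Qc"])))
```

Each of the 32 holds corresponds to one entry in the vector of simulated values, so a hand becomes a grid input paired with a 32-value target.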
4x13 binary matrix, each card
32-length vector for all draws
[Kh]:20000 sample: 0.33 ave 9.00 max
[Ad]:20000 sample: 0.45 ave 25.00 max
[Kc]:20000 sample: 0.33 ave 25.00 max
[8h]:2000 sample: 0.35 ave 25.00 max
[Qc]:20000 sample: 0.44 ave 50.00 max
[Kh,Ad]:2000 sample: 0.41 ave 9.00 max
[Kh,Kc]:2000 sample: 1.54 ave 25.00 max
…
[Kc,Qc]:20000 sample: 0.66 ave 976.00 max
[8h,Qc]:2000 sample: 0.34 ave 9.00 max
[Kh,Ad,Kc]:2000 sample: 1.38 ave 25.00 max
…
[Kh,Kc,8h]:2000 sample: 1.42 ave 25.00 max
[Kh,Kc,Qc]:2000 sample: 1.45 ave 25.00 max
…
[Kh,Ad,Kc,8h]:2000 sample: 1.23 ave 3.00 max
[Kh,Ad,Kc,Qc]:2000 sample: 1.21 ave 3.00 max
[Kh,Ad,8h,Qc]:2000 sample: 0.18 ave 1.00 max
[Kh,Kc,8h,Qc]:2000 sample: 1.21 ave 3.00 max
[Ad,Kc,8h,Qc]:2000 sample: 0.16 ave 1.00 max
[Kh,Ad,Kc,8h,Qc]:2000 sample: 1.00 ave 1.00 max
best result: [Kh,Kc]:2000 sample: 1.54 ave 25.00 max

Why Convolutional Network?
• Card games are visual
• Learn properties like pairs, flushes, straights
• Proximity in the inputs matters

[Kh,Qh,4h,3c,Jh]
x23456789TJQKA
c.1...........
d.............
h..1......111.
s.............

[2d,2h,7c,6c,5c]
x23456789TJQKA
c...111.......
d1............
h1............
s.............

ConvNet best practices
Karen Simonyan & Andrew Zisserman explain…
http://www.robots.ox.ac.uk/~karen/pdf/ILSVRC_2014.pdf

5-Card Draw Poker: Network Shape
input layer shape 100 x 5 x 17 x 17
convolution layer l_conv1. Shape (100, 16, 15, 15)
convolution layer l_conv1_1. Shape (100, 16, 13, 13)
maxPool layer l_pool1. Shape (100, 16, 7, 7)
convolution layer l_conv2. Shape (100, 32, 5, 5)
convolution layer l_conv2_2. Shape (100, 32, 3, 3)
maxPool layer l_pool2. Shape (100, 32, 2, 2)
hidden layer l_hidden1. Shape (100, 1024)
dropout layer l_hidden1_dropout. Shape (100, 1024)
final layer l_out, into 32 dimension.
Shape (100, 32)

Training

Simple
• Learn all 32 outputs
• Loss = mean squared error
• Update with Nesterov Momentum
  – Easily gets to 67% accurate moves

Nuanced
• Bias toward rare cases
• Round off large values
  – 99% [0.0 – 4.0]
  – <1% [10.0 – 900.0]
• Switch to adaptive learning
  – AdaDelta works well
  – Initialize working model

Results!
                                                    Real Return (100k hands)   Validation %
15x15 same-shape, 50k training size                 $0.930                     73%
17x17 valid-shape, 90k training size                $0.944                     70%
17x17 valid-shape, 150k training size               $0.955                     78%
17x17 valid-shape, 90k training (longer training)   $0.983                     77%
Don’t trust the averages, since hugely asymmetric payoff.
Study your mistakes

Biggest Errors
500 hands took 2397.96s
406 no error
52 tiny error
25 small error
17 big error
biggest errors:
(1.14, '[4s,7h,Ah,3h,Jh]', '[7h,Ah,3h,Jh]', 1.26, '[4s,7h,Ah,Jh]', 0.12)
(1.12, '[Ts,Kc,Jc,3h,Qc]', '[Kc,Jc,Qc]', 2.015, '[Ts,Kc,Jc,Qc]', 0.897)
(0.85, '[Tc,2s,9c,8d,8c]', '[8d,8c]', 0.852, '[Tc,2s,9c,8c]', 0.0)
(0.83, '[3d,Td,9s,Ad,8d]', '[3d,Td,Ad,8d]', 1.272, '[Td,Ad,8d]', 0.442)
(0.82, '[6d,2c,3c,Tc,9c]', '[2c,3c,Tc,9c]', 1.167, '[]', 0.340)
(0.81, '[Jc,9h,Kh,Ah,2h]', '[9h,Kh,Ah,2h]', 1.278, '[Jc,Kh,Ah]', 0.465)
(0.75, '[2h,Qs,Jh,Kh,8h]', '[2h,Jh,Kh,8h]', 1.261, '[Jh,Kh]', 0.513)
(0.71, '[9s,Th,As,4s,6s]', '[9s,As,4s,6s]', 1.229, '[As]', 0.511)
…
(0.41, '[2s,Jh,Kd,Tc,Qs]', '[Jh,Kd,Tc,Qs]', 0.872, '[Jh,Kd,Qs]', 0.463)
Struggling with straights, flushes, straight flushes.
Will it learn with more time? With better examples?

Lessons Learned

Do
• Keep network lean
  – Deep but simple
• Use adaptive learning
  – Initialize with working model
• Train for a long time
• Bias toward difficult data

Don’t do
• Endlessly fiddle with network shape
• Fiddle with learning rate
• Permute input data
  – Better to get fresh samples

Digital or Analog?
DeepMind’s Atari AI, attempting to approximate an exact value.
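Because the network only approximates each hold's value, one natural check is how much simulated payout its top-ranked hold gives up against the simulated best. The figures in the Biggest Errors listing are consistent with that gap (for example 1.26 − 0.12 = 1.14). A minimal sketch of the calculation follows, under stated assumptions: the function names are hypothetical and not taken from deep_draw, the two simulated averages are the ones listed for [4s,7h,Ah,3h,Jh], and the predicted values are made-up numbers chosen only to make the model pick the weaker hold.

```python
def pick_hold(values):
    """Return the hold (a tuple of cards) with the highest value in the dict."""
    return max(values, key=values.get)

def value_given_up(simulated, predicted):
    """Simulated value lost by playing the hold that the model ranks highest."""
    best = pick_hold(simulated)      # hold with the best simulated average
    chosen = pick_hold(predicted)    # hold the model would actually play
    return best, chosen, simulated[best] - simulated[chosen]

# Simulated averages for two holds of [4s,7h,Ah,3h,Jh], from the error listing.
simulated = {
    ("7h", "Ah", "3h", "Jh"): 1.26,
    ("4s", "7h", "Ah", "Jh"): 0.12,
}
# Hypothetical model outputs (illustrative only, not results from the talk).
predicted = {
    ("7h", "Ah", "3h", "Jh"): 0.9,
    ("4s", "7h", "Ah", "Jh"): 1.0,
}

best, chosen, gap = value_given_up(simulated, predicted)
print(best, chosen, round(gap, 2))
```

Averaging this gap over many hands gives a per-hand cost of the model's mistakes, which is less sensitive to rare large payouts than the raw return.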
Neural Nets do bad “Exact Math” for games

Obvious Improvements
• Train on errors
  – Directly, or look for similar cases
  – Generate more data, permute known cases
• Run much longer, on more data
  – Total training: 400k cases, down-sampled to 150k
• Train multiple models, and vote on result

Beyond Draw Video Poker
• Different video payout
  – Start training on current model
• Triple Draw
  – 3 rounds, so train 3 models
  – Same network shape, same output
  – One big model?
• Incorporate betting, opponent hand information

Questions
• Different network shape?
• How to handle input padding?
• Retrain on rare cases?
  – Or a specialty network?
  – (Backgammon AIs include 2-5 different networks)

Thank you!

Bibliography
• GitHub: https://github.com/moscow25/deep_draw
  – Ping me if you want to run it. Needs bit of cleanup.
• Lasagne: http://lasagne.readthedocs.org/en/latest/
• Network shape (for images)
  – http://www.robots.ox.ac.uk/~karen/pdf/ILSVRC_2014.pdf
  – http://vision.stanford.edu/teaching/cs231n/slides/lecture8.pdf
• AdaDelta: http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
• DeepMind Atari: http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
• PokerSnowie: https://www.pokersnowie.com/about/weaknesses.html
• Wizard of Odds: http://wizardofodds.com/games/video-poker/tables/jacks-or-better/