Blackjack as a Test Bed for Learning Strategies
Blackjack as a Test Bed for Learning Strategies in Neural Networks
A. Perez-Uribe and E. Sanchez
Swiss Federal Institute of Technology
IEEE IJCNN'98
Blackjack
• Against the dealer
• Dealer hits until 17
• Limited Implementation
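The dealer rule above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `hand_value` and `dealer_should_hit` are hypothetical, cards are assumed to be passed as integers with aces as 1 and face cards as 10, and the dealer is assumed to stand on a soft 17 (the slide does not say).

```python
def hand_value(cards):
    """Return (total, usable_ace) for a list of card values (ace = 1, faces = 10)."""
    total = sum(cards)
    # One ace may count as 11 if that does not bust the hand.
    usable_ace = 1 in cards and total + 10 <= 21
    return (total + 10 if usable_ace else total), usable_ace


def dealer_should_hit(cards):
    """Fixed dealer rule: keep hitting until the total reaches 17.

    Assumption: the dealer stands on a soft 17.
    """
    total, _ = hand_value(cards)
    return total < 17
```

For example, `dealer_should_hit([10, 6])` asks for another card, while `dealer_should_hit([10, 7])` stands.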
Reinforcement Learning
• Learning by Interaction
• Similar to MDP
• Delayed Reward
• Trial and Error Search
• Exploitation vs. Exploration
• Temporal Difference (TD)
Neural Network Basics
• Neurons fire when the weighted sum of their inputs is above a threshold.
• Hidden layers of perceptrons.
Implementation
• SARSA
– TD
– Bootstrapping Mechanism
– Updates Q(s, a) using the quintuple (s, a, r, s', a')
– Directly approximates optimal Q*
• Q-Learning
– TDerr = r + γ max_{a'} Q(s', a') - Q(s, a)
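The two update rules differ only in how they bootstrap from the next state. The sketch below uses a tabular Q stored in a dictionary as a stand-in; that is an assumption made for brevity, since the paper works with a neural network. The function names, the `terminal` flag, and the dictionary layout are hypothetical; the discount factor 0.9 and step size 0.01 are the values given on the Learning Constants slide.

```python
from collections import defaultdict

GAMMA = 0.9   # discount factor
ALPHA = 0.01  # step size


def sarsa_update(Q, s, a, r, s_next, a_next, terminal=False):
    """SARSA: bootstrap from the action a' actually chosen in s'."""
    target = r if terminal else r + GAMMA * Q[(s_next, a_next)]
    td_err = target - Q[(s, a)]
    Q[(s, a)] += ALPHA * td_err
    return td_err


def q_learning_update(Q, s, a, r, s_next, actions, terminal=False):
    """Q-learning: bootstrap from the best action available in s'."""
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    td_err = r + GAMMA * best_next - Q[(s, a)]
    Q[(s, a)] += ALPHA * td_err
    return td_err


# Usage: Q defaults to 0.0 for unseen (state, action) pairs.
Q = defaultdict(float)
sarsa_update(Q, 15, "hit", 0.0, 18, "stand")
```

With an ε-greedy behaviour policy the two rules give different targets whenever the exploratory action a' is not the greedy one.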
Setup
• No Ace: {4, 5, 6, …, 20}
• Ace: {22, 23, …, 31}
• Terminals: 21 and -1 (bust)
• ε-greedy policy
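A minimal sketch of ε-greedy action selection over the state set listed above. The state lists are copied from the slide; how a hand maps onto the ace states is not specified there, so no encoding function is given. The action names `"hit"` and `"stand"`, the dictionary-based Q, and the function name `epsilon_greedy` are assumptions for illustration.

```python
import random

NO_ACE_STATES = list(range(4, 21))   # {4, 5, ..., 20}
ACE_STATES = list(range(22, 32))     # {22, 23, ..., 31}
TERMINALS = [21, -1]                 # 21 and -1 (bust)
ACTIONS = ("hit", "stand")           # assumed action names


def epsilon_greedy(Q, state, epsilon=0.01, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Unseen (state, action) pairs are treated as having value 0.0.
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))


# Usage with a hand-filled Q table:
Q = {(15, "hit"): -0.2, (15, "stand"): 0.1}
action = epsilon_greedy(Q, 15)
```

With ε = 0.01 the agent explores in roughly one decision out of a hundred and otherwise exploits its current estimates.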
Learning Constants
• Reward r = -1 if loss, +1 if win, 0 after every hit
• Discount factor γ = 0.9
• Step size α = 0.01
• ε = 0.01
Strategy Learned:
– Only hit if score < 11 or ace held.
– Very conservative
– Double value of ace.
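The reward signal and the learned rule above are simple enough to write down directly. This is a sketch under assumptions: the outcome labels and function names are hypothetical, and a draw is mapped to 0 because the slide gives no reward for it.

```python
def reward(outcome):
    """Reward from the slide: -1 for a loss, +1 for a win, 0 after a hit.

    Assumption: any other outcome (for example a draw) also yields 0.
    """
    return {"win": 1.0, "loss": -1.0}.get(outcome, 0.0)


def learned_policy(score, holds_ace):
    """Learned strategy as summarized: hit only if score < 11 or an ace is held."""
    return "hit" if (score < 11 or holds_ace) else "stand"
```

For instance, `learned_policy(15, False)` stands, which is more conservative than the dealer's rule of hitting below 17.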
Fixed Strategies vs. Learned Strategy

Strategy        Avg(%)  Max(%)  Min(%)
dealer's        40.7    57      25
hold            38.3    51      24
random          31.5    46      18

ε               Avg(%)  Max(%)  Min(%)
0.1             39.9    54      26
0.01            41.9    56      26
0.1 * 0.99^r    40.9    53      26

• Over 1000 trials of 100 games each
• Thorp's Strategy can approach 49%
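The evaluation protocol (1000 trials of 100 games, reporting average, maximum and minimum win percentage) can be sketched as follows. Everything about the game model here is an assumption: an infinite deck, a tie counted as a non-win, "hold" read as always standing, "random" read as hitting with probability 0.5, and all function names. The sketch is not expected to reproduce the figures in the table.

```python
import random


def draw(rng):
    # Infinite-deck assumption: ranks 1..13, face cards count 10, ace is 1.
    return min(rng.randint(1, 13), 10)


def total(cards):
    # Count one ace as 11 when that does not bust the hand.
    t = sum(cards)
    return t + 10 if 1 in cards and t + 10 <= 21 else t


def play_game(policy, rng):
    """Play one game; return True only if the player beats the dealer."""
    player = [draw(rng), draw(rng)]
    dealer = [draw(rng), draw(rng)]
    while total(player) <= 21 and policy(player, rng):
        player.append(draw(rng))
    if total(player) > 21:
        return False                      # player busts
    while total(dealer) < 17:             # dealer hits until 17
        dealer.append(draw(rng))
    return total(dealer) > 21 or total(player) > total(dealer)


def dealer_policy(cards, rng):
    return total(cards) < 17              # mimic the dealer


def hold_policy(cards, rng):
    return False                          # never hit


def random_policy(cards, rng):
    return rng.random() < 0.5             # hit half of the time


def evaluate(policy, trials=1000, games=100, seed=0):
    """Return (avg, max, min) win percentage over `trials` runs of `games` games."""
    rng = random.Random(seed)
    pct = [
        100.0 * sum(play_game(policy, rng) for _ in range(games)) / games
        for _ in range(trials)
    ]
    return sum(pct) / len(pct), max(pct), min(pct)
```

Calling `evaluate(dealer_policy)` returns the three statistics in the same order as the table columns.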
Other Experiments
• Probabilistic Strategies
– Nonstationary tasks
– r = -1 for loss, 0 otherwise
– Achieved 49.14% win while learning
• Three Players
– Second player had option to watch first.
– Stayed Conservative
– Very low win percentage
References
• A. Perez-Uribe and E. Sanchez, "Blackjack as a Test Bed for Learning Strategies in Neural Networks", Proceedings of the IEEE International Joint Conference on Neural Networks, IJCNN'98
• Frederic Meyer, "Java Blackjack and Reinforcement Learning", http://lslwww.epfl.ch/~anperez/BlackJack/classes/RLJavaBJ.html
• Wikipedia
– http://en.wikipedia.org/wiki/Artificial_neural_network
– http://en.wikipedia.org/wiki/Blackjack
– http://en.wikipedia.org/wiki/Edward_O._Thorp