Research idea: Gerald Tesauro applied temporal difference (TD) learning to backgammon with spectacular success. Can we apply TD learning successfully to poker?
Articles that we will read and discuss:
- Temporal difference learning and TD-gammon, by Gerald Tesauro.
- Practical issues in temporal difference learning, by Gerald Tesauro.
- Learning to predict by the methods of temporal differences, by Richard Sutton.
- Using probabilistic knowledge and simulation to play poker, by Darse Billings, Lourdes Pena, Jonathan Schaeffer and Duane Szafron.
- Using selective-sampling simulations in poker, by Darse Billings, Denis Papp, Lourdes Pena, Jonathan Schaeffer and Duane Szafron.
- Opponent modeling in poker, by Darse Billings, Denis Papp, Jonathan Schaeffer and Duane Szafron.
- Representations and solutions for game-theoretic problems, by Daphne Koller and Avi Pfeffer.
Schedule: Each section has two groups. Each group meets twice a week. Group 1 will meet on Monday & Thursday, except as noted. Group 2 will meet on Tuesday & Friday, except as noted.
- Seminar overview (all, Monday).
- TD learning (Tesauro: backgammon), presentation.
- TD learning (Tesauro: backgammon), presentation and discussion.
- Poker (Schaeffer et al.: Loki), presentation and discussion.
- Poker (Koller & Pfeffer: Gala), presentation and discussion.
- Summary discussion. Begin developing the project details.
- Complete development of project details.
- Team meeting (work on project).
- Status report.
- Status report.
- Status report.
- Final reports (oral presentation; all, Thursday and Friday, both 8th and 9th periods where possible).
Paper presentations: the team of six will be divided into three pairs. One pair will present TD learning, one Poker (Loki) and one Poker (Gala).
Final reports: the team of six will be divided into two trios. One trio will deliver an oral presentation, while the other delivers a written presentation.