Project 1




Research idea: Gerald Tesauro applied temporal difference (TD) learning to backgammon with spectacular success. Can we apply TD learning successfully to poker?

Articles that we will read and discuss:

    Temporal difference learning and TD-Gammon, by Gerald Tesauro.
    Practical issues in temporal difference learning, by Gerald Tesauro.
    Learning to predict by the methods of temporal differences, by Richard Sutton.
    Using probabilistic knowledge and simulation to play poker, by Darse Billings, Lourdes Pena, Jonathan Schaeffer and Duane Szafron.
    Using selective-sampling simulations in poker, by Darse Billings, Denis Papp, Lourdes Pena, Jonathan Schaeffer and Duane Szafron.
    Opponent modeling in poker, by Darse Billings, Denis Papp, Jonathan Schaeffer and Duane Szafron.
    Representations and solutions for game-theoretic problems, by Daphne Koller and Avi Pfeffer.

Schedule: Each section has two groups. Each group meets twice a week. Group 1 will meet on Monday and Thursday, except as noted. Group 2 will meet on Tuesday and Friday, except as noted.

    Seminar overview (all, Monday).
    TD learning (Tesauro: backgammon), presentation.
    TD learning (Tesauro: backgammon), presentation and discussion.
    Poker (Schaeffer et al.: Loki), presentation and discussion.
    Poker (Koller & Pfeffer: Gala), presentation and discussion.
    Summary discussion.
    Begin developing the project details.
    Complete development of project details.
    Team meeting (work on project).
    Status report.
    Status report.
    Status report.
    Final reports (oral presentation) (all, Thursday and Friday, both 8th and 9th periods where possible).

Paper presentations: The team of six will be divided into three pairs. One pair will present TD learning, one Poker (Loki), and one Poker (Gala).

Final reports: The team of six will be divided into two trios. One trio will deliver an oral presentation, while the other delivers a written presentation.
