Stat234: Sequential Decision Making
Draft Course Outline
Date |
Topic |
Reference |
Lecture Notes |
01/23 |
Intro |
Read Ch. 1 of Sutton & Barto, Ch. 13 of Mitchell, Ch. 1 of Littman’s Thesis; Description of HeartSteps V1 Study |
|
01/25 |
Bandits |
Read Ch. 2 of Sutton & Barto; Description of Bariatric Surgery Study |
|
01/30 |
Bandits and Intro to Mobile Health |
Mash’s protocol paper on SARA; draft of Pedja’s outcome paper for HeartSteps V1; Shawna’s paper; Billie’s paper. These papers are in Canvas. |
|
02/01 |
Intro to Mobile Health and Bandits, MDPs, Linear-Quadratic Control |
Read Ch. 3, 4 of Sutton & Barto and Ch. 2, sections 2.2, 2.6 of Littman’s Thesis. We skipped the material below: Wiki, ILQR by Li and Todorov, 2004, and section 3 of Fu, Levine, Abbeel, 2016. Instead we reviewed the SARA mobile health study. |
|
02/06 |
Intro to Mobile Health and Bandits, MDPs, Linear-Quadratic Control |
Read Ch. 3, 4 of Sutton & Barto and Ch. 2, sections 2.2, 2.6 of Littman’s Thesis. We skipped the material below: Wiki, ILQR by Li and Todorov, 2004, and section 3 of Fu, Levine, Abbeel, 2016. Instead we reviewed the SARA mobile health study. |
|
02/08 |
MDPs |
Read Ch. 3-5 of Sutton & Barto |
|
02/13 |
MDPs |
Read Ch. 3-5 of Sutton & Barto |
|
02/15 |
Least Squares Methods in RL |
Lagoudakis, Parr, Littman, 2002; Lagoudakis, Parr, 2003 |
|
02/20 |
Off-Policy Learning **Initial Project Proposal Due** |
Jiang and Li, 2016; Munos, Stepleton, Harutyunyan, Bellemare, 2016; Thomas, Brunskill, 2016 |
|
02/22 & 02/27 & 03/01 |
Off-Policy Learning |
Jiang and Li, 2016; Munos, Stepleton, Harutyunyan, Bellemare, 2016; Thomas, Brunskill, 2016 |
|
03/06 |
Finish up eligibility traces and off-policy learning |
Jiang and Li, 2016; Munos, Stepleton, Harutyunyan, Bellemare, 2016; Thomas, Brunskill, 2016 |
|
03/08 |
HeartSteps V2 |
Brainstorm re Plans for HeartSteps V2 |
|
03/20 |
Recap of semester so far |
|
|
03/22 |
Regularization via a Planning Horizon |
|
|
03/27 |
NO CLASS |
|
|
03/29 |
Experience Replay & Prioritized Experience Replay |
Mnih et al., 2015 & Schaul, Quan, Antonoglou, Silver, 2016; focus on the use of experience replay to speed up learning |
|
04/03 |
Experience Replay & Prioritized Experience Replay |
Mnih et al., 2015 & Schaul, Quan, Antonoglou, Silver, 2016; focus on the use of experience replay to speed up learning |
|
04/05 |
Separating the modeling of the Advantage from the Value function |
Wang et al., 2016; focus on the use of both experience replay and the separation of the models for the advantage function from the value function |
|
04/10 |
Hindsight Experience Replay (Susan is out of town) |
Andrychowicz et al., 2017; Tamar et al., 2017 |
Mash and Walter will lead this class |
04/12 |
NO CLASS (Susan is out of town) |
|
|
04/17 |
Hierarchical Reinforcement Learning |
Barto & Mahadevan, 2003; Hengst, 2017 |
|
04/19 |
Poster Session! |
|
|
04/24 |
Meta-Learning Shared Hierarchies & Projects Due |
Frans et al., 2017 |
|