Stat234: Sequential Decision Making

 

Draft Course Outline

 

Date

Topic

Reference

Lecture Notes

01/23

Intro

Read Ch. 1 of Sutton & Barto, Ch. 13 of Mitchell, Ch. 1 of Littman’s Thesis

Description of HeartSteps V1 Study

 

01/25

Bandits

Read Ch. 2 of Sutton & Barto

 

Description of Bariatric Surgery Study

 

01/30

Bandits and Intro to Mobile Health

Mash’s protocol paper on SARA; draft of Pedja’s outcome paper for HeartSteps V1; Shawna’s paper; Billie’s paper

These papers are on Canvas

 

02/01

Intro to Mobile Health and Bandits, MDPs

 

Linear-Quadratic Control

Read Ch. 3 and 4 of Sutton & Barto and Ch. 2, Sections 2.2 and 2.6, of Littman’s Thesis

 

We skipped the material below:

Wiki; iLQR by Li and Todorov, 2004; and Section 3 of Fu, Levine, Abbeel, 2016

 

Instead, we reviewed the SARA mobile health study

 

02/06

Intro to Mobile Health and Bandits, MDPs

 

Linear-Quadratic Control

Read Ch. 3 and 4 of Sutton & Barto and Ch. 2, Sections 2.2 and 2.6, of Littman’s Thesis

 

We skipped the material below:

Wiki; iLQR by Li and Todorov, 2004; and Section 3 of Fu, Levine, Abbeel, 2016

 

Instead, we reviewed the SARA mobile health study

 

02/08

MDPs

Read Ch. 3-5 of Sutton & Barto

 

02/13

MDPs

Read Ch. 3-5 of Sutton & Barto

 

02/15

Least Squares Methods in RL

Lagoudakis, Parr, Littman, 2002

Lagoudakis, Parr, 2003

 

02/20

Off-Policy Learning

**Initial Project Proposal Due**

Jiang and Li, 2016; Munos, Stepleton, Harutyunyan, Bellemare, 2016; Thomas, Brunskill, 2016

 

02/22, 02/27 & 03/01

Off-Policy Learning

Jiang and Li, 2016; Munos, Stepleton, Harutyunyan, Bellemare, 2016; Thomas, Brunskill, 2016

 

03/06

Finish up eligibility traces and off-policy learning

Jiang and Li, 2016; Munos, Stepleton, Harutyunyan, Bellemare, 2016; Thomas, Brunskill, 2016

 

03/08

HeartSteps V2

Brainstorm on plans for HeartSteps V2

 

03/20

Recap of semester so far

 

 

03/22

Regularization via a Planning Horizon

Jiang, Kulesza, Singh, Lewis, 2015

 

03/27

NO CLASS

 

03/29

Experience Replay & Prioritized Experience Replay

Mnih et al., 2015 & Schaul, Quan, Antonoglou, Silver, 2016; focus on the use of experience replay to speed up learning

 

04/03

Experience Replay & Prioritized Experience Replay

Mnih et al., 2015 & Schaul, Quan, Antonoglou, Silver, 2016; focus on the use of experience replay to speed up learning

 

04/05

Separating the modeling of the Advantage from the Value function

Wang et al., 2016; focus on both the use of experience replay and the separation of the model for the advantage function from the model for the value function

 

04/10

Hindsight Experience Replay

Susan is out of town

Andrychowicz et al., 2017; Tamar et al., 2017

Mash and Walter will lead this class

04/12

NO CLASS (Susan is out of town)

 

04/17

Hierarchical Reinforcement Learning

Barto & Mahadevan, 2003; Hengst, 2017

 

04/19

Poster Session!

 

 

04/24

Meta-Learning Shared Hierarchies & Projects Due

Frans et al., 2017