Stat234: Sequential Decision Making

Draft Course Outline

Each entry below lists the date, topic, reading/reference, and instructor (with lecture notes where applicable).

**01/29 (Susan): Intro.** Reading: Sutton & Barto, Ch. 1; Mitchell, Ch. 13. Description of some mobile health studies.

**01/31 (Susan): Bandits.** Reading: Sutton & Barto, Ch. 2.
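As a companion to the bandit session, here is a minimal sketch of the epsilon-greedy action-value method covered in Sutton & Barto, Ch. 2. The arm means, parameter values, and function name are illustrative, not from the course materials:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, eps=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit, keeping
    incremental sample-average estimates of each arm's value."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # action-value estimates
    n = [0] * k    # pull counts
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                    # explore
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]              # incremental mean update
    return q, n

q, n = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough steps the best arm collects the large majority of pulls, and its value estimate converges to the true mean.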

**02/05 (Walter Dempsey; Susan is out of town): MDPs.** Reading: Sutton & Barto, Ch. 3-5.

**02/07 (Susan): MDPs.** Reading: Sutton & Barto, Ch. 3-5.

**02/12 (Susan): Two decision-making problems.** The learning algorithm (bandit algorithm/RL algorithm) and the policy that solves the MDP.

Each student must arrange a Skype/BlueJeans call with Susan between 02/19 and 02/28 to discuss initial project ideas.

**02/14 (Susan; we covered eligibility traces instead): Intro to Thompson Sampling.** TS as a randomization algorithm; connection to L_2 penalization. Reference: Russo, Van Roy, Kazerouni, Osband, and Wen, 2017.

**02/19 (Serena Yeung; Susan is out of town): Thompson Sampling need not be viewed as Bayesian.** Reference: Abeille and Lazaric, 2017.

**02/21 (Walter Dempsey; Susan is out of town): Least Squares Methods in RL.** Bellman equation → LSTD, LSPI, Fitted Q-Iteration, LSVI (see Algorithm 3 in the appendix).
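To accompany the least-squares session, here is a minimal LSTD sketch: solve A θ = b with A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ r φ(s). The toy chain MDP and names are illustrative; with one-hot (tabular) features, LSTD recovers the value function exactly:

```python
import numpy as np

def lstd(transitions, phi, dim, gamma=0.9):
    """Least-Squares Temporal Difference: accumulate the normal
    equations over observed (s, r, s') transitions and solve for theta."""
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for s, r, s_next in transitions:
        x = phi(s)
        # Terminal states contribute a zero feature vector.
        x_next = phi(s_next) if s_next is not None else np.zeros(dim)
        A += np.outer(x, x - gamma * x_next)
        b += r * x
    return np.linalg.solve(A, b)

# Deterministic 3-state chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
phi = lambda s: np.eye(3)[s]
theta = lstd([(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)], phi, 3, gamma=0.9)
```

Here theta matches the discounted values V = (0.81, 0.9, 1.0) term by term.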

**02/26 (Walter Dempsey; Susan is out of town): Off-Policy Learning and the influence of the behavior policy.**

**02/28 (Walter Dempsey; Susan is out of town): Continuation of the last three classes.**

**03/05 (Peng Liao): A mix between a TS bandit and a full RL algorithm.** Revision of Peng's protocol paper.

**Initial Project Proposal Due 03/05**

**03/07 (Susan): Intro to Thompson Sampling.** TS as a randomization algorithm; connection to L_2 penalization. Reference: Russo, Van Roy, Kazerouni, Osband, and Wen, 2017.
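For the Thompson Sampling session, here is a minimal Beta-Bernoulli TS sketch in the spirit of the Russo et al. tutorial: maintain a Beta posterior per arm, draw one sample from each posterior, and play the arm with the largest draw. The arm means and names are illustrative:

```python
import random

def thompson_bernoulli(true_means, steps=5000, seed=1):
    """Beta-Bernoulli Thompson Sampling with uniform Beta(1, 1) priors.
    Randomization comes from sampling the posterior, not from epsilon-noise."""
    rng = random.Random(seed)
    k = len(true_means)
    succ = [0] * k  # observed successes per arm
    fail = [0] * k  # observed failures per arm
    for _ in range(steps):
        # One posterior draw per arm; act greedily with respect to the draws.
        draws = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(k)]
        a = max(range(k), key=lambda i: draws[i])
        if rng.random() < true_means[a]:
            succ[a] += 1
        else:
            fail[a] += 1
    return succ, fail

succ, fail = thompson_bernoulli([0.3, 0.6, 0.9])
pulls = [s + f for s, f in zip(succ, fail)]
```

As the posteriors concentrate, the best arm is selected almost exclusively, which is the sense in which TS randomizes "just enough" to explore.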

**03/12 (Marianne Menictas & Susan): Intro to Bayesian RL.** References: Vlassis, Ghavamzadeh, Mannor, and Poupart, 2012; Ghavamzadeh, Mannor, Pineau, and Tamar, 2015; Russo, Van Roy, Kazerouni, Osband, and Wen, 2017.

**03/14 (Sabina Tomkins & Susan): Bootstrapped Thompson Sampling and Deep Exploration.** References: Osband and Van Roy, 2015; Russo, Van Roy, Kazerouni, Osband, and Wen, 2017.

**Revised Project Proposal Due 03/14**

**03/26 (Sabina Tomkins & Susan): Continuation of the last two sessions.**

**03/28 (Sabina Tomkins, Peng Liao & Serena Yeung): Pooled Reinforcement Learning in Mobile Health.**

**04/02 (Susan): (More) Efficient RL via Posterior Sampling.** Reference: Osband, Van Roy, and Russo, 2013.

**04/04 (Susan): (More) Efficient RL via Posterior Sampling.** Reference: Osband, Van Roy, and Russo, 2013.

**04/09 (Susan): On Optimistic versus Randomized Exploration in RL; summarize the class so far.** Reference: Osband and Van Roy, 2017a.

**04/11 (Susan): Why is Posterior Sampling Better than Optimism for RL?** Reference: Osband and Van Roy, 2017b.

**04/16 (Susan, Celine Liang & Serena Yeung): RLSVI (Randomized Least-Squares Value Iteration).** References: Osband, Van Roy, and Wen, 2016; Osband, Van Roy, Russo, and Wen, 2018.

**04/18 (Susan, Celine Liang & Serena Yeung): RLSVI.** References: Osband, Van Roy, and Wen, 2016; Osband, Van Roy, Russo, and Wen, 2018.

**04/23 (Susan, Celine Liang & Serena Yeung): Continuation of the last two sessions.** Discuss how these ideas might be used in mobile health.

**Projects Due 04/23**

**04/25: Poster Session!** Maxwell Dworkin lobby.

**04/30 (Tianchen Qian): Statistical Process Control for monitoring an RL algorithm.** Review of a variety of control charts for monitoring the online performance of an RL algorithm; the role of the ARL (average run length); discussion of whether this is useful in mHealth. References: Wikipedia; Ulkhaq and Dewanta, 2017; MS thesis by Demirkol, 2008.
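As a companion to the SPC session, here is a minimal Shewhart-chart sketch with a Monte Carlo ARL estimate. The 3-sigma rule and the ARL definition are textbook SPC; the function names, parameters, and simulated process are illustrative, not tied to any particular mHealth monitor:

```python
import random
import statistics

def shewhart_run_length(stream, mu, sigma, limit=3.0):
    """Number of observations until the chart signals, i.e. until a
    value falls outside mu +/- limit*sigma. Returns None if it never does."""
    for t, x in enumerate(stream, start=1):
        if abs(x - mu) > limit * sigma:
            return t
    return None

def average_run_length(mu=0.0, sigma=1.0, shift=0.0, reps=200, horizon=10000, seed=2):
    """Monte Carlo ARL estimate: average run length over repeated streams
    from an in-control (shift=0) or shifted (shift != 0) Gaussian process."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(reps):
        stream = (rng.gauss(mu + shift, sigma) for _ in range(horizon))
        rl = shewhart_run_length(stream, mu, sigma)
        lengths.append(rl if rl is not None else horizon)
    return statistics.mean(lengths)

arl_in_control = average_run_length(shift=0.0)  # theory: roughly 370 for 3-sigma limits
arl_shifted = average_run_length(shift=2.0)     # a shifted process signals much sooner
```

The trade-off the session discusses shows up directly here: a large in-control ARL means few false alarms, while a small out-of-control ARL means a degrading RL algorithm is flagged quickly.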