Stat234: Sequential Decision Making
Draft Course Outline
| Date | Topic | Reference | Instructor/Lecture Notes |
|---|---|---|---|
| 01/29 | Intro | Susan | |
| 01/31 | Bandit | Read Ch. 2 of Sutton & Barto | Susan |
| 02/05 | MDPs (Susan is out of town) | Read Ch. 3-5 of Sutton & Barto | Walter Dempsey |
| 02/07 | MDPs | Read Ch. 3-5 of Sutton & Barto | Susan |
| 02/12 | Two decision-making problems: the learning algorithm (bandit algorithm/RL algorithm) and the policy that solves the MDP | | Susan |
| Each student must arrange a Skype/BlueJeans call to discuss initial project ideas with Susan between 02/19 and 02/28 | | | |
| 02/14 | Intro to Thompson Sampling. TS is a randomization algorithm. Connect to L_2 penalization | Susan (we covered eligibility traces instead) | |
| 02/19 | Don’t really need to view T-S as Bayesian (Susan is out of town) | Serena Yeung | |
| 02/21 | Least Squares Methods in RL (Susan is out of town) | Bellman equation → LSTD, LSPI, Fitted Q-iteration, LSVI (see Algorithm 3 in appendix) | Walter Dempsey |
| 02/26 | Off-Policy Learning and influence of behavior policy (Susan is out of town) | Walter Dempsey | |
| 02/28 | Continuation of last 3 classes (Susan is out of town) | | Walter Dempsey |
| 03/05 | A mix between a T-S Bandit and a full RL algorithm | Revision of Peng’s protocol paper | Peng Liao |
| **Initial Project Proposal Due 03/05** | | | |
| 03/07 | Intro to Thompson Sampling. TS is a randomization algorithm. Connect to L_2 penalization | Susan | |
| 03/12 | Intro to Bayesian RL | Marianne Menictas & Susan | |
| 03/14 | Bootstrapped Thompson Sampling and Deep Exploration | Sabina Tomkins & Susan | |
| **Revised Project Proposal Due 03/14** | | | |
| 03/26 | Continuation of last 2 sessions | Sabina Tomkins & Susan | |
| 03/28 | Pooled Reinforcement Learning in Mobile Health | | Sabina Tomkins, Peng Liao & Serena Yeung |
| 04/02 | (More) Efficient RL via Posterior Sampling | Susan | |
| 04/04 | (More) Efficient RL via Posterior Sampling | Susan | |
| 04/09 | On Optimistic versus Randomized Exploration in RL; summarize class so far | Susan | |
| 04/11 | Why is Posterior Sampling Better than Optimism for RL? | Susan | |
| 04/16 | RLSVI | Susan & Celine Liang & Serena Yeung | |
| 04/18 | RLSVI | Susan & Celine Liang & Serena Yeung | |
| 04/23 | Continuation of last 2 sessions | Discuss how these ideas might be used in mobile health | Susan & Celine Liang & Serena Yeung |
| **Projects Due 04/23** | | | |
| 04/25 | Poster Session! | Maxwell Dworkin lobby | |
| 04/30 | Statistical Process Control for monitoring an RL algorithm | Review of a variety of control charts for monitoring the online performance of an RL algorithm. Review the role of ARL (average run length). Discussion of whether this is useful in mHealth. References: Wikipedia; Ulkhaq and Dewanta, 2017; MS thesis by Demirkol, 2008. | Tianchen Qian |