Stat234: Sequential Decision Making

Draft Course Outline

Each entry below lists the date, topic, reading/reference, and instructor (with lecture notes where applicable).

**01/29 (Susan): Intro.** Reading: Sutton & Barto, Ch. 1; Mitchell, Ch. 13. Description of some mobile health studies.

**01/31 (Susan): Bandits.** Reading: Sutton & Barto, Ch. 2.
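As a companion to the bandit session, here is a minimal sketch of the epsilon-greedy action-value method covered in Sutton & Barto, Ch. 2. The arm means, parameter values, and function name are illustrative, not from the course materials:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, eps=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit, keeping
    incremental sample-average estimates of each arm's value."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # action-value estimates
    n = [0] * k    # pull counts
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                    # explore
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]              # incremental mean update
    return q, n

q, n = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough steps the best arm collects the large majority of pulls, and its value estimate converges to the true mean.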

**02/05 (Walter Dempsey; Susan is out of town): MDPs.** Reading: Sutton & Barto, Ch. 3-5.

**02/07 (Susan): MDPs.** Reading: Sutton & Barto, Ch. 3-5.

**02/12 (Susan): Two decision-making problems.** The learning algorithm (bandit algorithm/RL algorithm) and the policy that solves the MDP.

Each student must arrange a Skype/BlueJeans call with Susan between 02/19 and 02/28 to discuss initial project ideas.

**02/14 (Susan; we covered eligibility traces instead): Intro to Thompson Sampling.** TS as a randomization algorithm; connection to L_2 penalization. Reference: Russo, Van Roy, Kazerouni, Osband, and Wen, 2017.

**02/19 (Serena Yeung; Susan is out of town): Thompson Sampling need not be viewed as Bayesian.** Reference: Abeille and Lazaric, 2017.

**02/21 (Walter Dempsey; Susan is out of town): Least Squares Methods in RL.** Bellman equation → LSTD, LSPI, Fitted Q-Iteration, LSVI (see Algorithm 3 in the appendix).
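To accompany the least-squares session, here is a minimal LSTD sketch: solve A θ = b with A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ r φ(s). The toy chain MDP and names are illustrative; with one-hot (tabular) features, LSTD recovers the value function exactly:

```python
import numpy as np

def lstd(transitions, phi, dim, gamma=0.9):
    """Least-Squares Temporal Difference: accumulate the normal
    equations over observed (s, r, s') transitions and solve for theta."""
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for s, r, s_next in transitions:
        x = phi(s)
        # Terminal states contribute a zero feature vector.
        x_next = phi(s_next) if s_next is not None else np.zeros(dim)
        A += np.outer(x, x - gamma * x_next)
        b += r * x
    return np.linalg.solve(A, b)

# Deterministic 3-state chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
phi = lambda s: np.eye(3)[s]
theta = lstd([(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)], phi, 3, gamma=0.9)
```

Here theta matches the discounted values V = (0.81, 0.9, 1.0) term by term.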

**02/26 (Walter Dempsey; Susan is out of town): Off-Policy Learning and the influence of the behavior policy.**

**02/28 (Walter Dempsey; Susan is out of town): Continuation of the last three classes.**

**03/05 (Peng Liao): A mix between a TS bandit and a full RL algorithm.** Revision of Peng's protocol paper.

**Initial Project Proposal Due 03/05**

**03/07 (Susan): Intro to Thompson Sampling.** TS as a randomization algorithm; connection to L_2 penalization. Reference: Russo, Van Roy, Kazerouni, Osband, and Wen, 2017.
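For the Thompson Sampling session, here is a minimal Beta-Bernoulli TS sketch in the spirit of the Russo et al. tutorial: maintain a Beta posterior per arm, draw one sample from each posterior, and play the arm with the largest draw. The arm means and names are illustrative:

```python
import random

def thompson_bernoulli(true_means, steps=5000, seed=1):
    """Beta-Bernoulli Thompson Sampling with uniform Beta(1, 1) priors.
    Randomization comes from sampling the posterior, not from epsilon-noise."""
    rng = random.Random(seed)
    k = len(true_means)
    succ = [0] * k  # observed successes per arm
    fail = [0] * k  # observed failures per arm
    for _ in range(steps):
        # One posterior draw per arm; act greedily with respect to the draws.
        draws = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(k)]
        a = max(range(k), key=lambda i: draws[i])
        if rng.random() < true_means[a]:
            succ[a] += 1
        else:
            fail[a] += 1
    return succ, fail

succ, fail = thompson_bernoulli([0.3, 0.6, 0.9])
pulls = [s + f for s, f in zip(succ, fail)]
```

As the posteriors concentrate, the best arm is selected almost exclusively, which is the sense in which TS randomizes "just enough" to explore.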

**03/12 (Marianne Menictas & Susan): Intro to Bayesian RL.** References: Vlassis, Ghavamzadeh, Mannor, and Poupart, 2012; Ghavamzadeh, Mannor, Pineau, and Tamar, 2015; Russo, Van Roy, Kazerouni, Osband, and Wen, 2017.

**03/14 (Sabina Tomkins & Susan): Bootstrapped Thompson Sampling and Deep Exploration.** References: Osband and Van Roy, 2015; Russo, Van Roy, Kazerouni, Osband, and Wen, 2017.

**Revised Project Proposal Due 03/14**

**03/26 (Sabina Tomkins & Susan): Continuation of the last two sessions.**

**03/28 (Sabina Tomkins, Peng Liao & Serena Yeung): Pooled Reinforcement Learning in Mobile Health.**

**04/02 (Susan): (More) Efficient RL via Posterior Sampling.** Reference: Osband, Van Roy, and Russo, 2013.

**04/04 (Susan): (More) Efficient RL via Posterior Sampling.** Reference: Osband, Van Roy, and Russo, 2013.

**04/09 (Susan): On Optimistic versus Randomized Exploration in RL; summarize the class so far.** Reference: Osband and Van Roy, 2017a.

**04/11 (Susan): Why is Posterior Sampling Better than Optimism for RL?** Reference: Osband and Van Roy, 2017b.

**04/16 (Susan, Celine Liang & Serena Yeung): RLSVI (Randomized Least-Squares Value Iteration).** References: Osband, Van Roy, and Wen, 2016; Osband, Van Roy, Russo, and Wen, 2018.

**04/18 (Susan, Celine Liang & Serena Yeung): RLSVI.** References: Osband, Van Roy, and Wen, 2016; Osband, Van Roy, Russo, and Wen, 2018.

**04/23 (Susan, Celine Liang & Serena Yeung): Continuation of the last two sessions.** Discuss how these ideas might be used in mobile health.

**Projects Due 04/23**

**04/25: Poster Session!** Maxwell Dworkin lobby.

**04/30 (Tianchen Qian): Statistical Process Control for monitoring an RL algorithm.** Review of a variety of control charts for monitoring the online performance of an RL algorithm; the role of the ARL (average run length); discussion of whether this is useful in mHealth. References: Wikipedia; Ulkhaq and Dewanta, 2017; MS thesis by Demirkol, 2008.
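As a companion to the SPC session, here is a minimal Shewhart-chart sketch with a Monte Carlo ARL estimate. The 3-sigma rule and the ARL definition are textbook SPC; the function names, parameters, and simulated process are illustrative, not tied to any particular mHealth monitor:

```python
import random
import statistics

def shewhart_run_length(stream, mu, sigma, limit=3.0):
    """Number of observations until the chart signals, i.e. until a
    value falls outside mu +/- limit*sigma. Returns None if it never does."""
    for t, x in enumerate(stream, start=1):
        if abs(x - mu) > limit * sigma:
            return t
    return None

def average_run_length(mu=0.0, sigma=1.0, shift=0.0, reps=200, horizon=10000, seed=2):
    """Monte Carlo ARL estimate: average run length over repeated streams
    from an in-control (shift=0) or shifted (shift != 0) Gaussian process."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(reps):
        stream = (rng.gauss(mu + shift, sigma) for _ in range(horizon))
        rl = shewhart_run_length(stream, mu, sigma)
        lengths.append(rl if rl is not None else horizon)
    return statistics.mean(lengths)

arl_in_control = average_run_length(shift=0.0)  # theory: roughly 370 for 3-sigma limits
arl_shifted = average_run_length(shift=2.0)     # a shifted process signals much sooner
```

The trade-off the session discusses shows up directly here: a large in-control ARL means few false alarms, while a small out-of-control ARL means a degrading RL algorithm is flagged quickly.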