Statistics 234

Spring, 2024

This graduate course will focus on reinforcement learning algorithms and sequential decision-making methods with special attention to how these methods can be used in digital health. Reinforcement learning (RL) is the area of machine learning which is concerned with sequential decision making. We will focus on the areas of sequential decision making that concern both how to select optimal treatment actions as well as how to evaluate the impact of these actions.

Digital health is an area that lies within multiple scientific disciplines including statistical science, computer science, behavioral science, and cognitive neuroscience. This makes for very exciting interdisciplinary work! Smartphones and wearable devices have remarkable sensing capabilities, allowing us to understand the context a person is in at a given moment. These devices can also deliver treatment actions tailored to the specific needs of users in a given location at a given time. Figuring out which treatment actions to deliver, when, and in which context can assist people in achieving their longer-term health goals. In many classes we will brainstorm about how the methods discussed in that class might be useful in digital health.

This course will cover the following topics: Markov Decision Processes, on-policy and off-policy RL, least squares methods in RL, and Bayesian RL, namely posterior sampling. Most of the course will focus on Bayesian RL via posterior sampling. Bayesian RL is particularly useful in mobile health as posterior sampling facilitates off-policy and continual learning. Further, the Bayesian paradigm facilitates the use of prior data in initializing an RL algorithm. If time permits, we will spend some time at the end of the semester on hierarchical RL, as this area provides a way to start thinking about managing multiple types of mHealth treatments, each targeting a different reward. Other topics from statistics, machine learning and RL that are potentially important in digital health but that we won’t cover (you could consider them for your class project) include: 1) transfer learning (using data on other similar users to enable faster learning); 2) non-stationarity (dealing with slowly changing or abrupt changes in user behavior); 3) interpretability of policies (enabling communication with behavioral scientists by making connections to behavioral theories); 4) using approximate system dynamics models to speed up learning; 5) multi-agent RL; and 6) multi-task learning.
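To give a flavor of posterior sampling before we reach it in the schedule, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta priors. It is an illustration under stated assumptions, not the algorithm used in any study discussed in class: the function name `thompson_sampling`, the simulated environment given by `true_probs`, and the optional `prior` argument (one `(alpha, beta)` pair per action, for example built from success and failure counts in earlier data) are all hypothetical.

```python
import random


def thompson_sampling(true_probs, n_rounds, prior=None, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: success probability of each action (simulation only).
    prior: optional list of (alpha, beta) pairs, one per action;
           defaults to a uniform Beta(1, 1) prior for every action.
    Returns (times each action was chosen, final posterior parameters).
    """
    rng = random.Random(seed)
    n_actions = len(true_probs)
    params = [list(p) for p in (prior or [(1.0, 1.0)] * n_actions)]
    counts = [0] * n_actions
    for _ in range(n_rounds):
        # Draw one plausible success probability per action from its posterior.
        draws = [rng.betavariate(a, b) for a, b in params]
        # Act greedily with respect to the sampled values.
        action = max(range(n_actions), key=lambda i: draws[i])
        # Simulated binary reward (hypothetical environment).
        reward = 1 if rng.random() < true_probs[action] else 0
        # Conjugate update: successes add to alpha, failures add to beta.
        params[action][0] += reward
        params[action][1] += 1 - reward
        counts[action] += 1
    return counts, params


if __name__ == "__main__":
    counts, params = thompson_sampling([0.2, 0.8], n_rounds=2000)
    print("times each action was chosen:", counts)
```

Passing `prior=[(3.0, 1.0), (1.0, 1.0)]`, say, would start the first action with an optimistic belief; this is one simple way prior data can initialize the algorithm.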

Professor: Susan Murphy (samurphy@g.harvard.edu).

Class Times: Monday and Wednesday 1:30pm-2:45pm Science Center 222. No class 2/19, 3/11, 3/13.

TF:

Daiqi Gao (dgao@fas.harvard.edu)

Ziping Xu (zipingxu@fas.harvard.edu)

Office Hours:

Susan Murphy’s Office Hours: By appointment at 5:15pm EST on Thursdays in SEC 2.335 except 3/14

Ziping Xu’s Office Hours: 3-4pm Mondays in SC 316.06 except 3/11

Daiqi Gao’s Office Hours: 3:30-4:30pm on Wednesdays in SC 316.06 except 3/13

Canvas

Website

Book: Sutton R. & Barto A. (2020). Reinforcement Learning: An Introduction (2nd Edition). Cambridge: The MIT Press. No purchase is necessary; you can download a pdf copy here.

Ch. 21 of Russell & Norvig, Artificial Intelligence: A Modern Approach, 3rd edition (on Canvas in the Files section). The 4th edition can be found via the Harvard Library HOLLIS.

Required Papers: A variety of papers will be assigned; see below.

Prerequisites: Recommended prerequisites are the equivalent of STAT 210 and COMPSCI 181.

Typical Class:

1:30pm: Finish turning in your Quiz on Canvas

1:30pm: Sit with your breakout group.

1:30-2:00pm: 30 Min. Lecture

2:00-2:10pm: Breakout with your group (Discuss quiz and question posed in Lecture)

2:10-2:20pm: Class Discussion (one of the groups leads the discussion)

2:20-2:45pm: 25 Min. Lecture

Course Outline: This outline will be constantly updated—please check prior to each class!

Date	Topic	Reading Assignments	Presenter
01/22	Intro	Ch. 1-2 of Sutton & Barto, Description of some mobile health studies
01/24	Intro	Ch. 1-2 of Sutton & Barto, Description of some mobile health studies OptimalPolicyStationary.pdf On Canvas under Files> Discussions and Other Helpful Material
01/29	Bandit	Ch. 2 of Sutton & Barto
01/31	Bandit	Ch. 2 of Sutton & Barto
02/05	MDPs	Ch. 3 of Sutton & Barto
02/07	MDPs	Ch. 3,4,5 of Sutton & Barto Files on Canvas: OptimalPolicyStationary.pdf and M_Z_Estimating Functions.pdf
02/12	Two decision making problems: The learning algorithm (Bandit alg./RL algorithm) and the policy that solves MDP	TemporalCreditExplorationExploitationDiscussion.pdf This file is on Canvas under Files> Discussions and Other Helpful Material
02/14	TD Learning & Control	Ch. 6 of Sutton & Barto. TemporalCreditExplorationExploitationDiscussion.pdf On Canvas under Files> Discussions and Other Helpful Material
02/21	TD Learning	Ch. 6 of Sutton & Barto. EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf On Canvas under Files> Discussions and Other Helpful Material
Each student must arrange a 30 min. meeting with Daiqi, Ziping or Susan between 02/26 and 03/02 to discuss initial project ideas.
02/26	Control Least Squares Methods in RL	Sections 6.4-6.6 of Sutton & Barto.
02/28	Batch, Off-Policy RL	Discount_Factor.pdf, on Canvas under Files> Discussions and Other Helpful Material. Also read Review of Batch RL
03/04	Oralytics	An RL algorithm for the Oralytics digital app. Oralytics RL.pdf, on Canvas under Files> Discussions and Other Helpful Material	Anna Trella
03/06	LSPI, More Batch, Off-Policy RL	LSPI, LSVI (only need to understand Algorithm 3 in the appendix). M_Z_Estimating Functions.pdf, on Canvas under Files> Discussions and Other Helpful Material. Review of Batch RL
Initial Project Proposal Due in Canvas 03/19, 5pm EST; see the STAT 234 Project Evaluation in Canvas Files.
03/18	LSPI, Fitted Q Iteration, More Batch, Off-Policy RL	LSPI, FQI
03/20	Fitted Q Iteration, More Batch, Off-Policy RL	FQI, LSVI (only need to understand Algorithm 3 in the appendix)
03/25	Thompson Sampling	Tutorial on Thompson Sampling (Sections 1-4, 7.1, 7.5, 8.1.3, Example 8.4). File on Canvas: LSVI_Notes.pdf
03/27	Thompson Sampling. Connect to L_2 penalization	Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, Example 8.4)
04/01	Thompson Sampling. Connect to L_2 penalization	Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, Example 8.4)
Revised Project Proposal Due in Canvas on 04/02, 5pm EST; see the STAT 234 Project Evaluation in Canvas Files
04/03	Information theory viewpoint of TS	Some Information Theory. An Information-Theoretic Analysis of Thompson Sampling by Russo and Van Roy.	Ziping Xu
04/08	Information theory viewpoint of TS	An Information-Theoretic Analysis of Thompson Sampling by Russo and Van Roy.	Ziping Xu
04/10	Carryover from 04/08 + projects	Be prepared to be randomly selected to discuss your project plans!
04/15	Information theory & RL	Learning to Optimize via Information-Directed Sampling	Ziping Xu
04/17	MiWaves	An RL algorithm for the MiWaves app	Susobhan Ghosh
04/22	Information theory, Satisficing actions	Deciding What to Learn: A Rate Distortion Approach	Ziping Xu
Posters Due in Canvas on 04/23 at 5pm EST; see the STAT 234 Project Evaluation in Canvas Files
04/24	Poster Session!	Poster Session at Science Center Library
Projects Due 05/02 in Canvas 5pm EST; see the STAT 234 Project Evaluation in Canvas Files

Grading: Course grades will be based on a weighted average of quizzes (30%), participation (10%), and the final project (60%). The 60% credit for the project will be split as follows:

a. Project proposal (5%)

b. In-person poster presentation on 04/24 (5%)

c. Poster (10%)

d. Project final report (40%)

See the file STAT 234 Project Evaluation in Canvas Files for how project grades are determined.

Quizzes: Each quiz covers the assigned reading and/or prior class material. Assigned readings are provided above in the Course Outline. To help with various circumstances (expected or unexpected), your lowest three (3) quizzes will be dropped. Monday’s Quiz is available on Canvas starting at 1:30pm EST on Sunday and closes at 1:30pm EST on Monday at the beginning of class; similarly, Wednesday’s Quiz is available on Canvas starting at 1:30pm EST on Tuesday and closes at 1:30pm EST on Wednesday at the beginning of class. Once you start the quiz, you have 30 minutes to complete it. Collaboration on Quizzes is not permitted. Use of generative AI is not permitted.
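As a rough illustration of how the weights and the quiz-drop rule above combine, here is a minimal sketch. It assumes every component is scored as a fraction between 0 and 1 and that the kept quizzes are weighted equally; neither assumption is stated in this syllabus, and the function `course_grade` is hypothetical. Actual grades are determined as described in the STAT 234 Project Evaluation file and by the teaching staff.

```python
def course_grade(quiz_scores, participation, proposal, presentation,
                 poster, report, n_dropped=3):
    """Combine component scores (fractions in [0, 1]) into a 0-100 grade.

    Assumes equally weighted quizzes after dropping the lowest n_dropped.
    """
    if len(quiz_scores) <= n_dropped:
        raise ValueError("need more quizzes than the number dropped")
    # Drop the lowest n_dropped quiz scores, then average the rest.
    kept = sorted(quiz_scores)[n_dropped:]
    quiz_avg = sum(kept) / len(kept)
    # Weights: quizzes 30, participation 10, and the project's 60 split as
    # proposal 5, poster presentation 5, poster 10, final report 40.
    return (30 * quiz_avg + 10 * participation + 5 * proposal
            + 5 * presentation + 10 * poster + 40 * report)


if __name__ == "__main__":
    quizzes = [1.0, 0.9, 0.8, 0.0, 0.0, 0.5, 1.0]
    print(course_grade(quizzes, participation=1.0, proposal=1.0,
                       presentation=1.0, poster=0.9, report=0.85))
```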

Projects: An important component of the course is a final project, which can be either a survey of some actively developing sub-topic within sequential decision making or a research project that contributes something novel (a theoretical result, statistical method, or computational algorithm) to the area of sequential decision making. Example projects from prior years are here. Note that projects need not pertain to digital health.

Surveys must be written individually. However, teams of up to 2 students can be formed for a research project. To get full credit, surveys must be of very high quality: they should read like a publishable survey article in a top journal. The bar for research projects will be lower because of the time constraint and the inherent uncertainty in the research process. While you’re not required to deliver publication-quality research work by the end of the semester, you are encouraged to do so. We will provide some suggestions for research projects, but you should feel free to work on ANY problem in sequential decision making that interests you. The papers must be written according to the submission rules at ICML: https://icml.cc/Conferences/2023/StyleAuthorInstructions. It is easiest to use LaTeX with the style files ICML provides. These are 8-page papers.

Using generative AI tools such as ChatGPT to help with your project is allowed. In my own testing of ChatGPT, it often makes good points but also makes many mistakes. It could be a useful tool for suggesting ideas (and for chatting about the material or comparing answers) but it is error-prone (particularly concerning technical/mathematical material). The references are often completely fictitious. Furthermore, working hard on the project is crucial for learning and for helping you grow in independence. So even if ChatGPT were excellent at writing ICML-style papers, relying on it too much would be harmful for you. In any case, your project and poster must reflect your own understanding of the material, explained in your own way, rather than being copied from any other source.

Posters: On 4/24 we will hold a poster session during class. Your poster should provide a summary of your project. Posters will be due in Canvas at 5pm on Tuesday 4/23. An example poster can be found here. Here is a pptx format for the poster. You should make 6 highly informative (non-dense) slides and display them in 2 x 3 or 3 x 2 format as in this pptx file. Other creative formats are also welcome.

Breakout Groups and Participation: Active participation is expected, through attending class (Mondays & Wednesdays), completing quizzes and engaging in classroom discussions/breakout groups. Every two weeks you will be assigned a new breakout group. The first breakout groups will be formed Monday, Jan. 22, then Feb. 2, Feb. 16, March 1, March 22 and April 5. Stat 234 is a challenging course covering subtle concepts, so let's all try to help create a supportive, collaborative community.

Accommodations: Harvard University’s goal is to remove barriers for disabled students related to inaccessible elements of instruction or design in this course. If reasonable accommodations are necessary to provide access, please contact the Disability Access Office (DAO). Accommodations do not alter fundamental requirements of the course and are not retroactive. Students should request accommodations as early as possible, since they may take time to implement. Students should notify DAO at any time during the semester if adjustments to their communicated accommodation plan are needed.

Where to get tech help: The Academic Resources Center has resources. For tech help you can chat with Daiqi or Ziping to see if either of them can help and/or you can call the HUIT help desk.