Statistics 234

Spring, 2021

This graduate course will focus on reinforcement learning algorithms and sequential decision making methods, with special attention to how these methods can be used in mobile health. Reinforcement learning is the area of machine learning concerned with sequential decision making. We will focus on the areas of sequential decision making that concern both how to select optimal treatment actions and how to evaluate the impact of these actions.

Mobile health is an area that spans multiple scientific disciplines, including statistical science, computer science, behavioral science and cognitive neuroscience. This makes for very exciting interdisciplinary science! Smartphones and wearable devices have remarkable sensing capabilities, allowing us to understand the context a person is in at a given moment. These devices also have the ability to deliver treatment actions tailored to the specific needs of users in a given location at a given time. Figuring out which treatment actions to deliver, when, and in which context can assist people in achieving their longer term health goals. In the last 15-20 minutes of many of the classes we will brainstorm about how the methods we discussed during that class might be useful in mobile health.

This course will cover the following topics: Markov Decision Processes, on-policy and off-policy RL, least squares methods in RL, and Bayesian RL, namely posterior sampling. Most of the course will focus on Bayesian RL via posterior sampling. Bayesian RL is particularly useful in mobile health because posterior sampling facilitates off-policy and continual learning. The Bayesian paradigm also facilitates the use of prior data in initializing an RL algorithm, and hierarchical RL provides a way to start thinking about managing multiple mHealth treatments, each targeting a different reward. Other topics from statistics, machine learning and RL that are potentially important in mobile health but that we won’t cover (you could consider them for your class project) include: 1) transfer learning (using data on other similar users to enable faster learning); 2) non-stationarity (dealing with slow or abrupt changes in user behavior); 3) interpretability of policies (enabling communication with behavioral scientists by making connections to behavioral theories); 4) using approximate system dynamic models to speed up learning; 5) hierarchical RL; and 6) multi-task learning.
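
To make the idea of posterior sampling concrete, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta priors. It is an illustration only, not course code: the function name, the simulated success probabilities, and the prior counts are all hypothetical. The prior counts stand in for "prior data" used to initialize the algorithm.

```python
import random


def thompson_sampling(true_probs, prior_successes, prior_failures, n_steps, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs:       hypothetical success probability of each action (simulation only)
    prior_successes:  successes per action observed in prior data
    prior_failures:   failures per action observed in prior data
    """
    rng = random.Random(seed)
    # Start from a uniform Beta(1, 1) prior and fold in the prior data.
    alpha = [1.0 + s for s in prior_successes]
    beta = [1.0 + f for f in prior_failures]
    pulls = [0] * len(true_probs)

    for _ in range(n_steps):
        # Draw one plausible success probability per action from its posterior.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        # Act greedily with respect to the sampled values.
        action = max(range(len(samples)), key=samples.__getitem__)
        # Simulate a binary reward and update that action's posterior.
        reward = 1 if rng.random() < true_probs[action] else 0
        alpha[action] += reward
        beta[action] += 1 - reward
        pulls[action] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling(
        true_probs=[0.2, 0.8],
        prior_successes=[0, 0],
        prior_failures=[0, 0],
        n_steps=2000,
    )
    print("pulls per action:", pulls)
```

Because actions are chosen by sampling from the posterior rather than by a fixed exploration schedule, the algorithm explores less as the posteriors concentrate. Passing nonzero prior counts gives a warm start.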

Professor: Susan Murphy (samurphy@fas.harvard.edu).

Time Zones: Times listed below and on the course webpage are Eastern Time. Note that Daylight Saving Time begins on March 14 in most of the US, and clocks in Massachusetts will move forward one hour then.

Class Times: Wednesday 12pm-2:45pm EST on Zoom. Class and the Chat will be recorded. No class on 03/31. On 02/10 and 02/17 we will run a two-part workshop: 02/10 will be a lecture overview of AI 4 Mobile Health, and 02/17 will be a practicum on AI 4 Mobile Health.

TFs: Kelly Zhang (kellywzhang@seas.harvard.edu) & Peng Liao (pengliao@g.harvard.edu)

Office Hours:

Susan Murphy’s Office Hours: By appointment after class

Kelly Zhang’s Office Hours: 11am-12pm on Mondays

Peng Liao’s Office Hours: 5pm-6pm on Fridays

Course Webpage: https://canvas.harvard.edu/courses/78908

Book: Sutton R. & Barto A. (2020). Reinforcement Learning: An Introduction (2nd Edition). Cambridge: The MIT Press. No purchase is necessary; you can download a pdf copy here.

Required Papers: A variety of scientific papers will be assigned; see below.

Prerequisites: Recommended prerequisites are the equivalent of Stat 210 and CompSci 181.

Typical Class:

12:00-12:10EST: Short Review. Discuss questions from prior class, Quiz

12:10-12:40EST: Lecture

12:40-12:50EST: Breakout with your group (Discuss Quiz and question posed in Lecture)

12:50-1:00EST: Class Discussion (one of the groups leads the discussion)

1:00-1:10EST: Break

1:10-1:40EST: Lecture

1:40-1:50EST: Breakout with your group (Discuss Quiz and question posed in Lecture)

1:50-2:00EST: Class Discussion (one of the groups leads the discussion)

2:00-2:10EST: Break

2:10-2:30EST: Lecture

2:30-2:45EST: Wrap-up

Technology Outage: If our class suffers a Zoom technology outage, please sign off Zoom. Wait 10 min and then sign back in. We will attempt to resume class 10 minutes after an outage and Susan will attempt to email all registered students. If class has to be cancelled because Susan’s Zoom is not working, then we will hold a makeup class on Wed 05/05 from 12:00pm-2:45pmEST. You will be notified via Canvas if this occurs.

Course Outline: This outline will be constantly updated—please check prior to each class!

Date	Topic	Reading Assignments
01/27	Intro	Ch. 1-2 of Sutton & Barto; description of some mobile health studies
02/03	Bandit	Ch. 1-2 of Sutton & Barto
02/10	Lecture AI in Health Workshop	12 noon to 3pm. This lecture was the first of a two-part workshop at AI4Health School. Susan conducts the lecture.
02/17	Practicum AI in Health Workshop	12 noon to 3pm. This practicum was the second of a two-part workshop at AI4Health School. Walter Dempsey at University of Michigan conducts the practicum.
02/24	Bandits, MDPs & Two decision making problems: The learning algorithm (Bandit alg./RL algorithm) and the policy that solves the MDP	Ch. 2-3 of Sutton & Barto; Ch. 21 of Russell & Norvig (3rd edition is on Canvas in the Files Section; 4th edition can be found via the Harvard Library Hollis).
Each student must arrange a 30 min zoom call with Kelly or Peng or Susan to discuss initial project ideas between 02/28-03/05
03/03	MDPs	Ch. 4-6 of Sutton & Barto. Files on Canvas: TemporalCreditExplorationExploitationDiscussion.pdf and M_Z_Estimating Functions.pdf
03/10	Control	Ch. 6 of Sutton & Barto. Files on Canvas: EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf
03/17	Least Squares Methods in RL	Bellman equation → LSTD, LSPI, LSVI (see algorithm 3 in appendix); Files on Canvas: M_Z_Estimating Functions.pdf; Review of Batch RL
Initial Project Proposal Due 03/19, 5pmEST
03/24	Finish LSPI. Thompson Sampling.	LSPI, LSVI (see algorithm 3 in appendix); Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4)
03/31	WELLNESS DAY!
04/07	Thompson Sampling. Connect to L_2 penalization	Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4)
Revised Project Proposal Due 04/09, 5pmEST
04/14	(More) Efficient RL via Posterior Sampling and On Optimistic versus Randomized Exploration in RL	Osband, van Roy and Russo, 2013; Osband and van Roy, 2017a (a theoretical reference is Osband and van Roy, 2017b)
04/21	(More) Efficient RL via Posterior Sampling and On Optimistic versus Randomized Exploration in RL	Osband, van Roy and Russo, 2013; Osband and van Roy, 2017a (a theoretical reference is Osband and van Roy, 2017b)
04/28	Poster Session!	Posters are due in Canvas with a link on the google doc by 5pmEST on 04/27; Questions are due on the google doc by 5pmEST on 04/28
Projects Due 05/06 5pm EST

Grading: Course grades will be based on a weighted average of quizzes (20%), participation (10%) and the final project (70%). The 70% credit for the project will be split as follows:

a. Project proposal (5%)

b. Project poster presentation in gather.town on 04/28 (10%)

c. Poster (15%)

d. Project final report (40%)
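
The weighting above can be sketched as a short calculation. This is an illustration under stated assumptions, not an official grading script: it assumes every component is scored on a 0-100 scale and that quizzes count equally, and the function and key names are hypothetical. The quiz helper applies the drop-the-lowest-three rule from the Quizzes policy in this syllabus.

```python
# Weights taken from the Grading section; the key names are hypothetical.
WEIGHTS = {
    "quizzes": 0.20,
    "participation": 0.10,
    "proposal": 0.05,
    "poster_presentation": 0.10,
    "poster": 0.15,
    "final_report": 0.40,
}


def quiz_average(scores, n_dropped=3):
    """Average quiz score after dropping the n_dropped lowest scores."""
    kept = sorted(scores)[n_dropped:]
    if not kept:
        raise ValueError("need more quiz scores than the number dropped")
    return sum(kept) / len(kept)


def course_grade(component_scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    scores = {
        "quizzes": quiz_average([100, 90, 80, 70, 60, 0, 0]),
        "participation": 100,
        "proposal": 90,
        "poster_presentation": 85,
        "poster": 80,
        "final_report": 88,
    }
    print(round(course_grade(scores), 2))
```

The four project components carry 5% + 10% + 15% + 40% = 70% of the total, matching the project share stated above.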

Project grades will be based on:

1. Was the problem stated clearly?

2. In the introduction, did the author(s) clearly communicate the problem in a way that non-specialists can understand?

3. Was there a high quality summary of extant literature?

4. If a review/commentary, then

a. Did the review discuss multiple approaches and contrast these approaches?

b. Were the conclusions well justified (via implementing the approaches or using theoretical arguments)?

5. If a research problem, then

a. Was the solution stated clearly?

b. Is the feasibility of the solution clearly evaluated and justified (via implementing the approaches or using theoretical arguments)?

Quizzes: Each quiz covers the assigned reading and/or prior class material. Assigned readings are provided above in the Course Outline. To help with various circumstances (expected or unexpected), your lowest three (3) quizzes will be dropped. Quizzes open on Canvas at 12:00pm EST on Tuesday and close at 12:00pm EST on Wednesday, at the beginning of class. Collaboration on quizzes is discouraged.

Projects: An important component of the course is a final project which can either be a survey of some actively developing sub-topic within sequential decision making, or a detailed commentary on a classic paper on sequential decision making, or a research project involving contributing novel research (theoretical result, statistical method, computational algorithm) to the area of sequential decision making. Example projects from prior years are here.

Surveys have to be written individually. However, teams of up to 2 students can be formed for a research project. To get full credit, surveys have to be very high quality: they should be similar to a publishable survey article in a top journal. The bar for research projects will be lower because of the time constraint and the inherent uncertainty in the research process. While you’re not required to deliver publication-quality research work by the end of the semester, you are encouraged to do so. We will provide some suggestions for research projects but you should feel free to work on any problem in the area of sequential decision making that interests you. The papers must be written according to the submission rules at ICML: https://icml.cc/Conferences/2020/StyleAuthorInstructions . It is easiest to use LaTeX with the style files ICML provides. These are 8 page papers.

Posters: On 4/28 we will hold a poster session during class. Your poster should provide a summary of your project. Posters will be due in Canvas at 5pm on Tuesday 4/27. An example poster can be found here. This poster used the template at https://github.com/anishathalye/gemini

Participation: Active participation is expected, through attending class (Wednesdays), completing quizzes and engaging in breakout room discussions. Unless you are speaking, please mute your microphone. Please keep your camera on during class and section if you are comfortable doing so, though there may be occasional privacy or bandwidth reasons for turning your camera off. You will get more out of Stat 234 if you close all other applications on your computer during class. Stat 234 is a challenging course covering subtle concepts, and there are further challenges from being remote and the difficult times we are in, so let's all try to help create a supportive, collaborative community.

Accommodations: Students needing academic adjustments or accommodations because of a documented disability must present their Faculty Letter from the Accessible Education Office (AEO) and speak with the professor by the end of the second week of the term. Failure to do so may result in the Course Head's inability to respond in a timely manner. All discussions will remain confidential, although Faculty are invited to contact AEO to discuss appropriate implementation.

Zoom basics and where to get tech help: The Academic Resources Center has resources. For tech help you can chat with Kelly or Peng to see if they can help and/or you can call the HUIT help desk.