Statistics 234

Spring, 2022

This graduate course will focus on reinforcement learning algorithms and sequential decision-making methods, with special attention to how these methods can be used in digital health. Reinforcement learning (RL) is the area of machine learning that is concerned with sequential decision making. We will focus on the areas of sequential decision making that concern both how to select optimal treatment actions and how to evaluate the impact of these actions.

 

Digital health lies at the intersection of multiple scientific disciplines, including statistical science, computer science, behavioral science and cognitive neuroscience. This makes for very exciting interdisciplinary work! Smartphones and wearable devices have remarkable sensing capabilities, allowing us to understand the context a person is in at a given moment. These devices can also deliver treatment actions tailored to the specific needs of users in a given location at a given time. Determining which treatment actions to deliver, when, and in which context can help people achieve their longer-term health goals. In the last 15-20 minutes of many classes, we will brainstorm about how the methods discussed that day might be useful in digital health.

 

This course will cover the following topics: Markov Decision Processes, on-policy and off-policy RL, least squares methods in RL, and Bayesian RL, namely posterior sampling. Most of the course will focus on Bayesian RL via posterior sampling. Bayesian RL is particularly useful in mobile health because posterior sampling facilitates off-policy and continual learning, and the Bayesian paradigm facilitates the use of prior data in initializing an RL algorithm. If time permits, we will spend some time at the end of the semester on hierarchical RL, as this area provides a way to start thinking about managing multiple types of mHealth treatments, each targeting a different reward. Other topics from statistics, machine learning and RL that are potentially important in digital health but that we won’t cover (you could consider them for your class project) include: 1) transfer learning (using data on other similar users to enable faster learning); 2) non-stationarity (dealing with slow or abrupt changes in user behavior); 3) interpretability of policies (enabling communication with behavioral scientists by making connections to behavioral theories); 4) using approximate system dynamic models to speed up learning; 5) multi-agent RL; and 6) multi-task learning.
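As a rough illustration of the posterior sampling idea mentioned above, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta priors. It is not taken from the course materials: the function name `thompson_sampling`, the arm success probabilities, and the `prior` argument are all hypothetical choices made for this example. The `prior` pseudo-counts show one simple way prior data could initialize the algorithm.

```python
import random


def thompson_sampling(true_probs, n_rounds, prior=None, seed=0):
    """Beta-Bernoulli Thompson sampling for a K-armed bandit (illustrative only).

    true_probs: hypothetical success probability of each arm (unknown to the learner).
    prior: optional list of (alpha, beta) pseudo-counts per arm, e.g. from prior data.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Default to a uniform Beta(1, 1) prior on each arm's success probability.
    params = [list(p) for p in (prior or [(1.0, 1.0)] * k)]
    pulls = [0] * k
    for _ in range(n_rounds):
        # Sample one plausible success rate per arm from its current posterior.
        draws = [rng.betavariate(a, b) for a, b in params]
        # Act greedily with respect to the sampled rates.
        arm = max(range(k), key=draws.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: successes add to alpha, failures add to beta.
        params[arm][0] += reward
        params[arm][1] += 1 - reward
        pulls[arm] += 1
    return pulls, params


# Three hypothetical treatment options with made-up success rates.
pulls, posterior = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

Because actions are chosen by sampling from the posterior, exploration fades naturally as the posterior concentrates, and the same update rule can keep running as new data arrive.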

 

Professor: Susan Murphy (samurphy@fas.harvard.edu).

 

Class Times: Monday and Wednesday 1:30pm-2:45pm at the Science Center, room 705.  No class 2/21, 3/14, 3/16.      

 

TF:

Eura Shin (eurashin@g.harvard.edu)  

Raaz Dwivedi (dwivediraaz@gmail.com)

 

Office Hours:

Susan Murphy’s Office Hours: By appointment at 5:15pm on Thursdays in SEC 2.335

Raaz Dwivedi’s Office Hours: 3-4pm Wednesday, location SC 316.06

Eura Shin’s Office Hours:  3-4pm Monday, location SC 316.06

 

Website:

Canvas

Website

 

Book: Sutton R. & Barto A. (2020). Reinforcement Learning: An Introduction (2nd Edition). Cambridge: The MIT Press.   No purchase is necessary; you can download a pdf copy here.   

Ch. 21 of Russell & Norvig, Artificial Intelligence: A Modern Approach (the 3rd edition is on Canvas in the Files section; the 4th edition can be found via Harvard Library's HOLLIS catalog).

 

Required Papers:  A variety of papers will be assigned; see below.

 

Prerequisites: Recommended prerequisites are the equivalent of stat210 and compsci181.

 

Typical Class:

1:30pm: Quiz assigned on Canvas is due

1:30pm: Sit with your group.

1:30-2:00pm: 30 Min. Lecture

2:00-2:10pm: Breakout with your group (Discuss quiz and question posed in Lecture)

2:10-2:20pm: Class Discussion (one of the groups leads the discussion)

2:20-2:45pm: 25 Min. Lecture

 

Course Outline:  This outline will be constantly updated—please check prior to each class!

| Date | Topic | Reading Assignments |
|---|---|---|
| 01/24 | Intro | Ch. 1-2 of Sutton & Barto; description of some mobile health studies |
| 01/26 | Bandit | Ch. 2 of Sutton & Barto |
| 01/31 | Bandit | Ch. 2 of Sutton & Barto |
| 02/02 | MDPs | Ch. 3 of Sutton & Barto; file on Canvas: TemporalCreditExplorationExploitationDiscussion.pdf |
| 02/07 | MDPs | Ch. 3-4 of Sutton & Barto; file on Canvas: M_Z_Estimating Functions.pdf |
| 02/09 | MDPs | Ch. 5 of Sutton & Barto |
| 02/14 | Two decision making problems: the learning algorithm (bandit algorithm/RL algorithm) and the policy that solves the MDP | Maybe Ch. 21 of Russell & Norvig, Artificial Intelligence: A Modern Approach (the 3rd edition is on Canvas in the Files section; the 4th edition can be found via Harvard Library's HOLLIS catalog) |
| 02/16 | Control | Ch. 6 of Sutton & Barto; files on Canvas: EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf |
| 02/23 | Least Squares Methods in RL | Bellman equation → LSTD, LSPI, LSVI (see algorithm 3 in appendix); file on Canvas: M_Z_Estimating Functions.pdf; Review of Batch RL |
| 02/28 | Least Squares Methods in RL | Bellman equation → LSTD, LSPI, LSVI (see algorithm 3 in appendix); file on Canvas: M_Z_Estimating Functions.pdf; Review of Batch RL; guest speaker |
| 03/02 | Finish LSPI. Thompson Sampling. | LSPI, LSVI (see algorithm 3 in appendix); Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, Example 8.4) |
| | Each student must arrange a meeting with Eura, Raaz or Susan to discuss initial project ideas between 02/28-03/04 | |
| 03/07 | | An RL algorithm for the Oralytics digital app; Anna Trella and Kelly Zhang |
| 03/09 | Thompson Sampling. | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, Example 8.4) |
| 03/21 | Thompson Sampling. Connect to L_2 penalization | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, Example 8.4) |
| 03/23 | (More) Efficient RL via Posterior Sampling and On Optimistic versus Randomized Exploration in RL | Osband, van Roy and Russo, 2013; Osband and van Roy, 2017a (a theoretical reference is Osband and van Roy, 2017b) |
| | **Initial Project Proposal Due in Canvas 03/25, 5pm EST** | |
| 03/28 | (More) Efficient RL via Posterior Sampling and On Optimistic versus Randomized Exploration in RL | Osband, van Roy and Russo, 2013; Osband and van Roy, 2017a (a theoretical reference is Osband and van Roy, 2017b) |
| 03/30 | (More) Efficient RL via Posterior Sampling and On Optimistic versus Randomized Exploration in RL | Osband, van Roy and Russo, 2013; Osband and van Roy, 2017a (a theoretical reference is Osband and van Roy, 2017b) |
| 04/04 | | |
| 04/06 | | |
| | **Revised Project Proposal Due in Canvas on 04/08, 5pm EST** | |
| 04/11 | | |
| 04/13 | | |
| 04/18 | | |
| 04/20 | guest speaker | |
| 04/25 | | |
| | Posters Due in Canvas on 04/26 at 5pm EST | |
| 04/27 | Poster Session! | Poster Session at Science Center Library |
| | **Projects Due 05/05 in Canvas** 5pm EST | |

 

Grading: Course grades will be based on a weighted average of quizzes (30%), participation (10%), and the final project (60%). The 60% credit for the project will be split as follows:

a.      Project proposal (5%)

b.      In-person poster presentation on 04/27 (5%)

c.      Poster (10%)

d.      Project final report (40%)
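The weighting above, together with the drop-the-lowest-three rule described in the Quizzes section, can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name `course_grade` is hypothetical, every component is assumed to be scored as a fraction between 0 and 1, and the remaining quizzes are assumed to count equally. The syllabus does not specify these details.

```python
def course_grade(quiz_scores, participation, proposal, presentation,
                 poster, report, n_dropped=3):
    """Weighted course grade out of 100 (illustrative sketch only).

    All scores are assumed to be fractions in [0, 1]; quizzes kept after
    dropping the lowest `n_dropped` are assumed to be equally weighted.
    """
    kept = sorted(quiz_scores)[n_dropped:]  # drop the lowest scores
    if not kept:
        raise ValueError("need more quizzes than the number dropped")
    quiz_avg = sum(kept) / len(kept)
    return 100 * (
        0.30 * quiz_avg         # quizzes
        + 0.10 * participation  # participation
        + 0.05 * proposal       # project proposal
        + 0.05 * presentation   # in-person poster presentation
        + 0.10 * poster         # poster
        + 0.40 * report         # project final report
    )


# Made-up scores for one hypothetical student.
example = course_grade(
    quiz_scores=[0.6, 0.9, 1.0, 0.8, 0.7, 1.0],
    participation=1.0, proposal=0.9, presentation=1.0,
    poster=0.8, report=0.85,
)
print(round(example, 1))
```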

 

Project grades will be based on:

1.      Was the problem stated clearly?

2.      In the introduction did the author(s) clearly communicate the problem in an understandable way for non-specialists? 

3.      Was there a high-quality summary of the literature?

4.      If a review, then

a.      Did the review discuss multiple approaches and contrast these approaches?

b.      Were the conclusions well justified (via implementing the approaches or using theoretical arguments)?

5.      If a research problem, then

a.      Was the solution stated clearly?

b.      Is the feasibility of the solution clearly evaluated and justified (via implementing the approaches or using theoretical arguments)?

 

Quizzes: Each quiz covers the assigned reading and/or prior class material. Assigned readings are provided above in the Course Outline. To help with various circumstances (expected or unexpected), your lowest three (3) quiz scores will be dropped. Monday’s quiz is available on Canvas starting at 1:30pm EST on Sunday and closes at 1:30pm EST on Monday at the beginning of class; similarly, Wednesday’s quiz is available on Canvas starting at 1:30pm EST on Tuesday and closes at 1:30pm EST on Wednesday at the beginning of class. Once you start the quiz, you have 30 minutes to complete it. Collaboration on quizzes is not permitted.

 

Projects: An important component of the course is a final project, which can be either a survey of some actively developing sub-topic within sequential decision making or a research project that contributes novel work (theoretical result, statistical method, computational algorithm) to the area of sequential decision making. Example projects from prior years are here.

 

Surveys must be written individually. However, teams of up to 2 students can be formed for a research project. To get full credit, surveys must be of very high quality: they should be similar to a publishable survey article in a top journal. The bar for research projects will be lower because of the time constraint and the inherent uncertainty in the research process. While you’re not required to deliver publication-quality research work by the end of the semester, you are encouraged to do so. We will provide some suggestions for research projects, but you should feel free to work on any problem in the area of sequential decision making that interests you. The papers must be written according to the submission rules at ICML: https://icml.cc/Conferences/2022/StyleAuthorInstructions. It is easiest to use LaTeX with the style files ICML provides. These are 8-page papers.

 

Posters: On 4/27 we will hold a poster session during class. Your poster should provide a summary of your project. Posters will be due in Canvas at 5pm on Tuesday 4/26. An example poster can be found here. This poster used the template at https://github.com/anishathalye/gemini

 

Participation: Active participation is expected, through attending class (Mondays and Wednesdays), completing quizzes and engaging in classroom discussions. Stat 234 is a challenging course covering subtle concepts, and there are further challenges due to the difficult times we are in, so let's all try to help create a supportive, collaborative community.

 

Accommodations: Students needing academic adjustments or accommodations because of a documented disability must present to me their Faculty Letter from the Accessible Education Office (AEO) and speak with me by the end of the second week of the term (Friday, 2/4/22). Failure to do so may result in my inability to respond in a timely manner. All discussions will remain confidential, although I may contact AEO to discuss appropriate implementation.

 

Where to get tech help: The Academic Resources Center has resources. For tech help you can chat with Eura to see if she can help and/or you can call the HUIT help desk.