Statistics 234
Spring 2025
This graduate course will focus on reinforcement learning algorithms and sequential decision-making methods, with special attention to how these methods can be used in digital health. Reinforcement learning (RL) is the area of machine learning concerned with sequential decision making. We will focus on two aspects of sequential decision making: how to select optimal treatment actions and how to evaluate the impact of those actions.
Digital health is an area that spans multiple scientific disciplines, including statistical science, computer science, behavioral science, and cognitive neuroscience. This makes for very exciting interdisciplinary work! Smartphones and wearable devices have remarkable sensing capabilities, allowing us to understand the context a person is in at a given moment. These devices can also deliver treatment actions tailored to the specific needs of users in a given location at a given time. Figuring out when, in which contexts, and which treatment actions to deliver can help people achieve their longer-term health goals. Across multiple classes we will brainstorm about how the methods discussed in that class might be useful in digital health.
This course will cover the following topics: Markov Decision Processes, on-policy and off-policy RL, least squares methods in RL, and Bayesian RL, namely posterior sampling. Most of the course will focus on Bayesian RL via posterior sampling. Bayesian RL is particularly useful in digital health because posterior sampling facilitates off-policy and continual learning. Further, the Bayesian paradigm facilitates the use of prior data in initializing an RL algorithm. At the end of the semester we will work through a TBN theoretical paper in RL. Other topics from statistics, machine learning, and RL that are potentially important in digital health but that we may or may not cover (and that you could consider for your class project) include: 1) transfer learning (using data on other similar users to enable faster learning); 2) non-stationarity (dealing with slow or abrupt changes in user behavior); 3) interpretability of policies (enabling communication with behavioral scientists by making connections to behavioral theories); 4) using RL to train LLMs; 5) multi-agent RL; and 6) multi-task learning.
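Since posterior sampling is the centerpiece of the course, here is a minimal, purely illustrative sketch (in Python, not part of the assigned material) of posterior (Thompson) sampling for a two-armed Bernoulli bandit with conjugate Beta priors. The reward probabilities and prior counts below are made up for illustration; in a digital health setting the priors could instead be initialized from prior data, as noted above.

```python
# Minimal sketch of posterior (Thompson) sampling for a two-armed Bernoulli bandit.
# All numbers here (reward probabilities, prior counts) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.3, 0.6])   # unknown in practice; simulated here
alpha = np.ones(2)                  # Beta prior "successes" (could be set from prior data)
beta = np.ones(2)                   # Beta prior "failures"

for t in range(500):
    theta = rng.beta(alpha, beta)            # one posterior draw per arm
    arm = int(np.argmax(theta))              # act greedily on the sampled draws
    reward = rng.binomial(1, true_probs[arm])
    alpha[arm] += reward                     # conjugate Beta-Bernoulli update
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```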
Professor: Susan Murphy (samurphy@g.harvard.edu).
Classroom and Times: Science Ctr 116, Monday and Wednesday 1:30pm-2:45pm. No class 2/17, 3/17, 3/19.
TF: Benedikt Koch benedikt_koch@g.harvard.edu
CA: Ian Moore ian_moore@college.harvard.edu
Office Hours:
Susan's Office Hours: SEC 2.335, by appointment only, at 9:30am on Thursdays. NO OFFICE HOURS 3/6, 3/20.
Benedikt's Office Hours: Science Ctr 104, 10:30am-11:30am on Tuesdays.
Ian's Office Hours: SEC, in the seating area in front of room 2.348, Tuesdays 2:15-3:15pm.
Book: Sutton, R. & Barto, A. (2020). Reinforcement Learning: An Introduction (2nd Edition). Cambridge: The MIT Press. No purchase is necessary; you can download a pdf copy here.
Also: Ch. 21 of Russell & Norvig, Artificial Intelligence: A Modern Approach (the 3rd edition is on Canvas in the Files section; the 4th edition can be found via the Harvard Library Hollis).
Required Papers: A variety of papers will be assigned; see below.
Prerequisites: Recommended prerequisites are the equivalent of stat210 and compsci181.
Typical Class:
1:30pm: Finish turning in your Quiz on Canvas.
1:30pm: Sit with your breakout group.
1:30-2:00pm: 30 min. lecture.
2:00-2:10pm: Breakout with your group (discuss the quiz and the question posed in lecture).
2:10-2:20pm: Class discussion (one of the groups leads the discussion).
2:20-2:45pm: 25 min. lecture.
Course Outline: This outline will be constantly updated; please check prior to each class!
Date | Topic | Reading Assignments
01/27 | Intro |
01/29 | Intro | OptimalPolicyStationary.pdf (on Canvas under Files > Discussions and Other Helpful Material)
02/03 | Bandit | Ch. 2 of Sutton & Barto
02/05 | Oralytics | Trella et al. (2024). Deployed Online Reinforcement Learning Algorithm In An Oral Health Clinical Trial. RL algorithm is here.
02/10 | Bandit | Ch. 2 of Sutton & Barto
02/12 | MDPs | Ch. 3 of Sutton & Barto
02/19 | MDPs. Two decision-making problems: the learning algorithm (bandit alg./RL algorithm) and the policy that solves the MDP | Ch. 4 of Sutton & Barto; TemporalCreditExplorationExploitationDiscussion.pdf (on Canvas under Files > Discussions and Other Helpful Material)
02/24 | MDPs | Ch. 5 of Sutton & Barto; on Canvas: OptimalPolicyStationary.pdf and M_Z_Estimating Functions.pdf
02/26 | TD Learning & Control | Ch. 6 of Sutton & Barto; TemporalCreditExplorationExploitationDiscussion.pdf (on Canvas under Files > Discussions and Other Helpful Material)
Each student must arrange a 30 min. meeting with Benedikt, Ian, or Susan to discuss initial project ideas between 02/24 and 03/05.
03/03 | Oralytics future | Previously deployed RL algorithm is here.
03/05 | TD Learning | Ch. 6 of Sutton & Barto; EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf (on Canvas under Files > Discussions and Other Helpful Material)
03/10 | Control: Least Squares Methods in RL | Sections 6.4-6.6 of Sutton & Barto
03/12 | Control: Least Squares Methods in RL | Sections 6.4-6.6 of Sutton & Barto; Discount_Factor.pdf (on Canvas under Files > Discussions and Other Helpful Material)
**Initial Project Proposal Due in Canvas 03/14, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files.
03/24 | Batch, Off-Policy RL | Discount_Factor.pdf (on Canvas under Files > Discussions and Other Helpful Material); read the Review of Batch RL
03/26 | Social pJITAIs | Reinforcement Learning on AYA Dyads to Enhance Medication Adherence
**Revised Project Proposal Due in Canvas on 03/28, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files.
03/31 | MiWaves | reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use
04/02 | LSPI, More Batch, Off-Policy RL | LSPI; LSVI (only need to understand Algorithm 3 in the appendix); M_Z_Estimating Functions.pdf (on Canvas under Files > Discussions and Other Helpful Material)
04/07 | Thompson Sampling; connection to L_2 penalization | Russo, Van Roy, Kazerouni, Osband, and Wen (2017, revised 2020): Sections 1-4, 7.1, 7.5, 8.1.3, Example 8.4
04/09 | HeartSteps pJITAI past/future | Gao, D., Lai, H., Klasnja, P., & Murphy, S. (2024). Harnessing Causality in Reinforcement Learning With Bagged Decision Times.
04/14 | Thompson Sampling; connection to L_2 penalization | Russo, Van Roy, Kazerouni, Osband, and Wen (2017, revised 2020): Sections 1-4, 7.1, 7.5, 8.1.3, Example 8.4
04/16 | Multi-Task RL | Distral: Robust Multitask Reinforcement Learning (Sections 1-3, 6) | Ian Moore and Susan Murphy
04/21 | Multi-Task RL | QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing | Ian Moore and Susan Murphy
04/23 | LSPI, Fitted Q Iteration, More Batch, Off-Policy RL |
04/28 | Fitted Q Iteration, More Batch, Off-Policy RL |
**Posters Due in Canvas on 04/29 at 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files.
04/30 | Poster Session! | Poster Session at the Science Center Library
**Projects Due 05/08 in Canvas, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files.
Grading: Course grades will be based on a weighted average of quizzes (30%), participation (10%), and the final project (60%). Note that the participation grade concerns showing up for class and participating in class discussions. The 60% credit for the project will be split as follows:
a. Project proposal (5%)
b. In-person poster presentation on 04/30 (5%)
c. Poster (10%)
d. Project final report (40%)
See the file STAT 234 Project Evaluation in Canvas Files for how project grades are determined.
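For concreteness, here is how the stated weights combine into a course score. This is an illustrative calculation only; the component scores below are made up.

```python
# Illustrative only: combining the stated grade weights; scores are hypothetical (0-100 scale).
quiz, participation = 85, 100
proposal, presentation, poster, report = 90, 95, 88, 92

final = (0.30 * quiz + 0.10 * participation        # quizzes 30%, participation 10%
         + 0.05 * proposal + 0.05 * presentation   # project components sum to the 60%
         + 0.10 * poster + 0.40 * report)
print(f"final course score: {final:.1f}")
```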
Quizzes: Each quiz is about the assigned reading and/or prior class material. Assigned readings are provided above in the Course Outline. To help with various circumstances (expected or unexpected), your lowest three (3) quizzes will be dropped. Monday's quiz is available on Canvas starting at 1:30pm EST on Sunday and closes at 1:30pm EST on Monday at the beginning of class; similarly, Wednesday's quiz is available on Canvas starting at 1:30pm EST on Tuesday and closes at 1:30pm EST on Wednesday at the beginning of class. Once you start the quiz, you have 30 minutes to complete it. Collaboration on quizzes is not permitted. Use of generative AI is not permitted.
Projects: An important component of the course is a final project, which can be either a survey of an actively developing sub-topic within sequential decision making or a research project contributing novel research (a theoretical result, statistical method, or computational algorithm) to the area of sequential decision making. Example projects from prior years are here. Note that projects need not pertain to digital health. See the STAT 234 Project Evaluation in Canvas Files.
Surveys must be written individually. However, teams of up to 2 students can be formed for a research project. To get full credit, a survey must be of very high quality: it should read like a publishable survey article in a top journal. The bar for research projects will be lower because of the time constraint and the inherent uncertainty of the research process. While you're not required to deliver publication-quality research by the end of the semester, you are encouraged to do so. We will provide some suggestions for research projects, but you should feel free to work on ANY problem in sequential decision making that interests you. The papers must be written according to the submission rules at ICML: https://icml.cc/Conferences/2024/AuthorInstructions. It is easiest to use LaTeX with the style files ICML provides. These are 8-page papers.
Using generative AI tools such as ChatGPT to help with your project is allowed. In my own testing of ChatGPT, it often has good points but makes many mistakes. It can be a useful tool for suggesting ideas (and for chatting about the material or comparing answers), but it is error-prone, particularly concerning technical/mathematical material, and the references it provides can be completely fictitious. Furthermore, working hard on the project is crucial for learning and for helping you grow in independence. So even if ChatGPT were excellent at writing ICML-style papers, relying on it too much would be harmful for you. In any case, your project and poster must reflect your own understanding of the material, explained in your own way, rather than being copied from any other source.
Posters: On 4/30 we will hold a poster session during class. Your poster should provide a summary of your project. Posters will be due in Canvas at 5pm on Tuesday 4/29. An example poster can be found here. Here is a pptx format for the poster. You should make 6 highly informative (non-dense) slides and display them in a 2 x 3 or 3 x 2 layout, as in this pptx file. Other creative formats are also welcome.
Breakout Groups and Participation: Active participation is expected through attending class (Mondays & Wednesdays), completing quizzes, and engaging in classroom discussions/breakout groups. Every two weeks you will be assigned to a new breakout group. The first breakout groups will be formed Monday evening, January 27; new groups form Feb. 10, Feb. 24, March 10, March 24, and April 7. Stat 234 covers subtle concepts, so let's all try to help create a supportive, collaborative community.
Accommodations: Harvard University's goal is to remove barriers for disabled students related to inaccessible elements of instruction or design in this course. If reasonable accommodations are necessary to provide access, please contact the Disability Access Office (DAO). Accommodations do not alter fundamental requirements of the course and are not retroactive. Students should request accommodations as early as possible, since they may take time to implement. Students should notify the DAO at any time during the semester if adjustments to their communicated accommodation plan are needed.
Where to get tech help: The Academic Resource Center has resources. For tech help, you can chat with Benedikt or Ian to see if either of them can help, and/or you can call the HUIT help desk.