Statistics 234, Spring 2024
This
graduate course will focus on reinforcement learning algorithms and sequential
decision-making methods with special attention to how these methods can be used
in digital health. Reinforcement
learning (RL) is the area of machine learning which is concerned with
sequential decision making. We will
focus on the areas of sequential decision making that concern both how to
select optimal treatment actions as well as how to evaluate the impact of these
actions.
Digital
health is an area that lies within multiple scientific disciplines including
statistical science, computer science, behavioral science, and cognitive
neuroscience. This makes for very exciting interdisciplinary work! Smartphones
and wearable devices have remarkable sensing capabilities allowing us to
understand the context in which a person is at a given moment. These devices
also can deliver treatment actions tailored to the specific needs of users in a given location at a given time. Determining
which treatment actions to deliver, and when and in which context to deliver them, can assist people in
achieving their longer-term health goals.
In the last 15-20 minutes of many of the classes we will brainstorm
about how the methods we discussed during that class might be useful in digital
health.
This course will cover the following topics: Markov Decision Processes,
on-policy and off-policy RL, least squares methods in RL, and Bayesian RL
(namely, posterior sampling). Most of the course will focus on Bayesian RL
via posterior sampling. Bayesian RL is particularly useful in mobile health as
posterior sampling facilitates off-policy and continual learning. The Bayesian
paradigm facilitates the use of prior data in initializing an RL
algorithm. If time permits, we will
spend some time at the end of the semester on hierarchical RL as this area
provides a way to start thinking about managing multiple types of mHealth
treatments, each targeting a different reward. Other topics from statistics,
machine learning and RL that are potentially important in digital health but
that we won’t cover (you could consider them in your class project) include: 1)
transfer learning (using data on other similar users to enable faster
learning); 2) non-stationarity (dealing with slowly changing or abrupt changes
in user behavior); 3) interpretability of policies (enabling communication with
behavioral scientists by making connections to behavioral theories); 4) using
approximate system dynamic models to speed up learning; 5) multi-agent RL; and 6) multi-task learning.
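To make the point about prior data concrete, here is a minimal sketch of posterior sampling (Thompson sampling) for a two-action Bernoulli bandit. It is an illustration only, not course material: the function names, the Beta(1, 1) base prior, and the example success probabilities are all assumptions chosen for the sketch. Counts from earlier data are simply folded into the starting Beta posterior, which is one simple way prior data can initialize the algorithm.

```python
import random


def choose_arm(successes, failures, rng):
    """Sample a success probability for each arm from its Beta posterior
    (uniform Beta(1, 1) base prior) and pick the arm with the largest sample."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_thompson(true_probs, prior_successes, prior_failures, n_steps, seed=0):
    """Run Thompson sampling on a simulated Bernoulli bandit.

    prior_successes / prior_failures are hypothetical counts from earlier
    data; they initialize the posterior before any new data is collected.
    """
    rng = random.Random(seed)
    successes = list(prior_successes)
    failures = list(prior_failures)
    pulls = [0] * len(true_probs)
    for _ in range(n_steps):
        arm = choose_arm(successes, failures, rng)
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward      # posterior update for the chosen arm
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, successes, failures


if __name__ == "__main__":
    # Hypothetical setting: arm 1 is truly better; prior data hints at this.
    pulls, s, f = run_thompson([0.1, 0.9], [1, 4], [4, 1], n_steps=2000)
    print("pulls per arm:", pulls)
```

Because each action is chosen by sampling from the current posterior, the same update works whether the counts come from earlier data or from the algorithm's own experience.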
Professor: Susan Murphy (samurphy@g.harvard.edu).
Class Times: Monday and Wednesday 1:30pm-2:45pm TBN.
No class 2/19, 3/11, 3/13.
TF: Ziping Xu (zipingxu@fas.harvard.edu), TBN
Office Hours:
Susan Murphy’s Office Hours: By appointment at 5:00pm EST on Thursdays in SEC 2.335 except 3/13
Ziping Xu’s Office Hours: TBN
TBN’s Office Hours: TBN
Website:
Book: Sutton R. & Barto A. (2020). Reinforcement Learning:
An Introduction (2nd Edition). Cambridge: The MIT Press. No purchase is necessary; you can download a
pdf copy here.
Ch. 21 of Russell & Norvig, Artificial Intelligence: A Modern Approach
(the 3rd edition is on Canvas in the Files Section).
The 4th edition can be found via the Harvard Library Hollis.
Required Papers: A variety of papers will be assigned; see below.
Prerequisites: Recommended prerequisites are the equivalent of stat210 and compsci181.
Typical Class:
1:30pm: Finish turning in your Quiz on Canvas
1:30pm: Sit with your group.
1:30-2:00pm: 30 Min. Lecture
2:00-2:10pm: Breakout with your group (Discuss quiz and question posed in Lecture)
2:10-2:20pm: Class Discussion (one of the groups leads the discussion)
2:20-2:45pm: 25 Min. Lecture
Course
Outline: This
outline will be constantly updated—please check prior to each class!
| Date | Topic | Reading | Assignments |
|---|---|---|---|
| 01/22 | Intro | | |
| 01/24 | Intro | | |
| 01/29 | Bandit | Ch. 2 of Sutton & Barto | |
| 01/31 | Bandit | Ch. 2 of Sutton & Barto | |
| 02/05 | MDPs | Ch. 3 of Sutton & Barto | |
| 02/07 | MDPs | Ch. 3 of Sutton & Barto. Files on Canvas: OptimalPolicyStationary.pdf and M_Z_Estimating Functions.pdf | |
| 02/12 | MDPs | Ch. 4-5 of Sutton & Barto. File on Canvas: M_Z_Estimating Functions.pdf | |
| 02/14 | Two decision making problems: The learning algorithm (Bandit alg./RL algorithm) and the policy that solves MDP | Section 21.3 of Russell & Norvig, (Artificial Intelligence A Modern Approach, 3rd edition is on Canvas in the Files Section). Note that U is used to denote the value function, V. File on Canvas: TemporalCreditExplorationExploitationDiscussion.pdf | |
| 02/21 | TD Learning & Control | Ch. 6 of Sutton & Barto. Files on Canvas: EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf | |
| 02/26 | Batch, Off-Policy RL | File on Canvas: Discount_Factor.pdf | |
| Each student must arrange a 30 min. meeting with Ziping, TBN or Susan to discuss initial project ideas between 02/26-03/02 | | | |
| 02/28 | TD Learning | Ch. 6 of Sutton & Barto. Files on Canvas: EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf | |
| 03/04 | Oralytics (???) | An RL algorithm for the Oralytics digital app. Files on Canvas: Slides03.07.22.Oralytics.pdf and Oralytics_Supplementary_Material.pdf | TBD |
| 03/06 | Control; Least Squares Methods in RL | Sections 6.4-6.6 of Sutton & Barto. Bellman equation → LSTDQ | |
| 03/18 | LSPI | LSPI, LSVI (only need to understand algorithm 3 in appendix). Files on Canvas: M_Z_Estimating Functions.pdf | |
| **Initial Project Proposal Due in Canvas 03/19, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files. | | | |
| 03/20 | Thompson Sampling. | Tutorial on Thompson-Sampling (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4). File on Canvas: LSVI_Notes.pdf | |
| 03/25 | Thompson Sampling. Connect to L_2 penalization | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4) | |
| 03/27 | Thompson Sampling. Connect to L_2 penalization | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4) | |
| 04/01 | TBD | | |
| **Revised Project Proposal Due in Canvas on 04/02, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files | | | |
| 04/03 | TBN | | |
| 04/08 | TBD | | |
| 04/10 | TBD | | |
| 04/15 | TBD | | |
| 04/17 | TBD | | |
| 04/22 | TBD | | |
| Posters Due in Canvas on 04/23 at 5pm EST; see the STAT 234 Project Evaluation in Canvas Files | | | |
| 04/24 | Poster Session! | Poster Session at Science Center Library | |
| **Projects Due 05/02 in Canvas** 5pm EST; see the STAT 234 Project Evaluation in Canvas Files | | | |
Grading: Course grades will be based on a weighted average of quizzes (30%),
participation (10%), final project (60%). The 60% credit for the project will
be split as follows:
a. Project proposal (5%)
b. In-person poster presentation on 04/24 (5%)
c. Poster (10%)
d. Project final report (40%)
See the file STAT 234 Project
Evaluation in Canvas Files for how project grades are determined.
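As a rough illustration of the weighting above, the following sketch computes a course grade from component scores. It is not the official grade calculation: it assumes every component is scored on a 0-100 scale, that the quizzes remaining after dropping the lowest three are averaged with equal weight, and the function and parameter names are made up for this example.

```python
def course_grade(quizzes, participation, proposal, presentation, poster, report,
                 n_dropped=3):
    """Weighted course grade on a 0-100 scale (illustrative assumptions only).

    quizzes: list of quiz scores; the lowest n_dropped are dropped and the
    rest are averaged with equal weight (an assumption of this sketch).
    """
    kept = sorted(quizzes)[n_dropped:]
    if not kept:
        raise ValueError("need more quizzes than the number dropped")
    quiz_avg = sum(kept) / len(kept)
    return (
        0.30 * quiz_avg         # quizzes (30%)
        + 0.10 * participation  # participation (10%)
        + 0.05 * proposal       # a. project proposal (5%)
        + 0.05 * presentation   # b. in-person poster presentation (5%)
        + 0.10 * poster         # c. poster (10%)
        + 0.40 * report         # d. project final report (40%)
    )


if __name__ == "__main__":
    # Hypothetical scores; the three zeros are the dropped quizzes.
    print(course_grade([0, 0, 0, 80, 90, 100], 95, 90, 85, 88, 92))
```

The four project weights sum to the 60% project share, so the six weights together sum to 100%.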
Quizzes: The quiz is about the assigned reading and/or
prior class material. Assigned Readings
are provided above in the Course Outline. To help with various circumstances
(expected or unexpected), your lowest three (3) quizzes will be dropped. Monday’s Quiz is available on Canvas starting
at 1:30pm EST on Sunday and closing at 1:30pm EST on Monday at the beginning of
class; similarly, Wednesday’s Quiz is available on Canvas starting at 1:30pm
EST on Tuesday and closing at 1:30pm EST on Wednesday at the beginning of
class. Once you start the quiz, you have 30 minutes to complete it. Collaboration on Quizzes is not permitted.
Use of generative AI is not permitted.
Projects: An important component of the course is a final project which
can either be a survey of some actively developing sub-topic within
sequential decision making or a research project involving contributing
novel research (theoretical result, statistical method, computational
algorithm) to the area of sequential decision making. Example projects from prior years are here.
Surveys must be written individually.
However, teams of up to 2 students can be formed for a research project. To get
full credit, surveys must be very high quality: they should be like a
publishable survey article in a top journal. The bar for research projects will
be lower because of the time constraint and the inherent uncertainty in the
research process. While you’re not required to deliver publication quality
research work by the end of the semester, you are encouraged to do so. We will provide
some suggestions for research projects, but you should feel free to work on ANY
problem in sequential decision making that interests you. The papers must be written according to the
submission rules at ICML: https://icml.cc/Conferences/2023/StyleAuthorInstructions. It is easiest to use LaTeX
with the style files ICML provides. These are 8-page papers.
Using generative AI tools such as
ChatGPT to help with your project is allowed.
In my own testing of ChatGPT, it often has good points but makes many
mistakes. It could be a useful tool for suggesting ideas (and to chat about the
material with or compare answers with) but it is error-prone (particularly
concerning technical/mathematical material). The references are often
completely fictitious. Furthermore,
working hard on the project is crucial for learning and to help you grow in
independence. So even if ChatGPT were excellent at writing ICML-style papers,
relying on it too much would be harmful for you. In any case, your project and poster must reflect
your own understanding of the material, explained in your own way, rather than
being copied from any other source.
Posters:
On 4/24 we will hold a poster session during class. Your poster
should provide a summary of your project.
Posters will be due in Canvas at 5pm on Tuesday 4/23. An example poster can be found here. Here is a pptx format
for the poster.
You can make 6 highly informative (non-dense) slides and display them in 2 x 3
or 3 x 2 format as in this pptx file. Other creative formats are also welcome.
Participation: Active participation is expected, through attending class
(Mondays & Wednesdays), completing quizzes and engaging in classroom
discussions. Stat 234 is a challenging
course covering subtle material, so let's all try to help create a supportive,
collaborative community.
Accommodations: Students needing academic adjustments or
accommodations because of a documented disability must present to me their Faculty
Letter from the Accessible Education Office
(AEO) and
speak with me by the end of the second week of the term (Friday, 2/2/24).
Failure to do so may result in my inability to respond in a timely manner. All
discussions will remain confidential, although I may contact AEO to discuss
appropriate implementation.
Where to get tech help: The Academic Resources
Center has resources. For tech help you can chat with Ziping to see if he can help and/or you can call the HUIT help desk.