Statistics 234
Spring 2024
This graduate course will focus on reinforcement learning algorithms and
sequential decision-making methods, with special attention to how these methods
can be used in digital health. Reinforcement learning (RL) is the area of
machine learning concerned with sequential decision making. We will focus on
the areas of sequential decision making that concern both how to select optimal
treatment actions and how to evaluate the impact of these actions.
Digital health is an area that lies within multiple scientific disciplines,
including statistical science, computer science, behavioral science, and
cognitive neuroscience. This makes for very exciting interdisciplinary work!
Smartphones and wearable devices have remarkable sensing capabilities, allowing
us to understand a person's context at a given moment. These devices can also
deliver treatment actions tailored to the specific needs of users in a given
location at a given time. Figuring out which treatment actions to deliver, when,
and in which context can help people achieve their longer-term health goals.
In several classes we will brainstorm about how the methods discussed in that
class might be useful in digital health.
This course will cover the following topics: Markov Decision Processes,
on-policy and off-policy RL, least squares methods in RL, and Bayesian RL,
namely posterior sampling. Most of the course will focus on Bayesian RL via
posterior sampling. Bayesian RL is particularly useful in mobile health as
posterior sampling facilitates off-policy and continual learning. Further, the
Bayesian paradigm facilitates the use of prior data in initializing an RL
algorithm. If time permits, we will spend some time at the end of the semester
on hierarchical RL, as this area provides a way to start thinking about managing
multiple types of mHealth treatments, each targeting a different reward. Other
topics from statistics, machine learning, and RL that are potentially important
in digital health but that we won’t cover (you could consider them for your
class project) include: 1) transfer learning (using data on other similar users
to enable faster learning); 2) non-stationarity (dealing with slowly changing
or abrupt changes in user behavior); 3) interpretability of policies (enabling
communication with behavioral scientists by making connections to behavioral
theories); 4) using approximate system dynamics models to speed up learning;
5) multi-agent RL; and 6) multi-task learning.
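As a rough illustration of posterior sampling and of initializing an algorithm with prior data, here is a minimal sketch of Thompson sampling for a Beta-Bernoulli bandit. It is not the algorithm used in the course or in any particular study; the function name, the binary reward, and the pseudo-count arguments `prior_alpha` and `prior_beta` are assumptions made for illustration only.

```python
import numpy as np


def thompson_sampling(true_probs, n_steps, prior_alpha, prior_beta, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch only).

    true_probs: success probability of each action (unknown to the learner).
    prior_alpha, prior_beta: Beta prior pseudo-counts per action; prior data
        can be encoded here as (1 + past successes, 1 + past failures).
    """
    rng = np.random.default_rng(seed)
    true_probs = np.asarray(true_probs, dtype=float)
    alpha = np.array(prior_alpha, dtype=float)
    beta = np.array(prior_beta, dtype=float)
    actions = np.empty(n_steps, dtype=int)

    for t in range(n_steps):
        # Draw one plausible success probability per action from the posterior.
        theta = rng.beta(alpha, beta)
        # Act greedily with respect to the sampled model.
        a = int(np.argmax(theta))
        # Observe a simulated binary reward.
        r = int(rng.random() < true_probs[a])
        # Conjugate posterior update for the chosen action.
        alpha[a] += r
        beta[a] += 1 - r
        actions[t] = a

    return alpha, beta, actions


# Uniform Beta(1, 1) priors on two hypothetical treatment actions.
alpha, beta, actions = thompson_sampling(
    true_probs=[0.2, 0.8], n_steps=500, prior_alpha=[1, 1], prior_beta=[1, 1]
)
```

Replacing the uniform prior with pseudo-counts computed from earlier data is one simple way to warm-start the learner.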
Professor: Susan Murphy (samurphy@g.harvard.edu).
Class Times: Monday and Wednesday 1:30pm-2:45pm, Science Center 222. No class 2/19, 3/11, 3/13.
TFs:
Daiqi Gao (dgao@fas.harvard.edu)
Ziping Xu (zipingxu@fas.harvard.edu)
Office Hours:
Susan Murphy’s Office Hours: By appointment at 5:15pm EST on Thursdays in SEC 2.335, except 3/14
Ziping Xu’s Office Hours: 3-4pm on Mondays in SC 316.06, except 3/11
Daiqi Gao’s Office Hours: 3:30-4:30pm on Wednesdays in SC 316.06, except 3/13
Book: Sutton R. & Barto A. (2020). Reinforcement Learning: An Introduction
(2nd Edition). Cambridge: The MIT Press. No purchase is necessary; you can
download a pdf copy here.
Ch. 21 of Russell & Norvig, Artificial Intelligence: A Modern Approach. The 3rd
edition is on Canvas in the Files section; the 4th edition can be found via
Harvard Library's HOLLIS.
Required Papers: A variety of papers will be assigned; see below.
Prerequisites: Recommended prerequisites are the equivalent of STAT 210 and COMPSCI 181.
Typical Class:
1:30pm: Finish turning in your Quiz on Canvas
1:30pm: Sit with your breakout group
1:30-2:00pm: 30 Min. Lecture
2:00-2:10pm: Breakout with your group (discuss quiz and question posed in lecture)
2:10-2:20pm: Class Discussion (one of the groups leads the discussion)
2:20-2:45pm: 25 Min. Lecture
Course Outline: This outline will be constantly updated; please check prior to each class!
Date | Topic | Reading Assignments | |
01/22 | Intro | | |
01/24 | Intro | Description of some mobile health studies. OptimalPolicyStationary.pdf, on Canvas under Files > Discussions and Other Helpful Material | |
01/29 | Bandit | Ch. 2 of Sutton & Barto | |
01/31 | Bandit | Ch. 2 of Sutton & Barto | |
02/05 | MDPs | Ch. 3 of Sutton & Barto | |
02/07 | MDPs | Ch. 3, 4, 5 of Sutton & Barto. Files on Canvas: OptimalPolicyStationary.pdf and M_Z_Estimating Functions.pdf | |
02/12 | Two decision making problems: the learning algorithm (bandit alg./RL algorithm) and the policy that solves the MDP | TemporalCreditExplorationExploitationDiscussion.pdf, on Canvas under Files > Discussions and Other Helpful Material | |
02/14 | TD Learning & Control | Ch. 6 of Sutton & Barto. TemporalCreditExplorationExploitationDiscussion.pdf, on Canvas under Files > Discussions and Other Helpful Material | |
02/21 | TD Learning | Ch. 6 of Sutton & Barto. EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf, on Canvas under Files > Discussions and Other Helpful Material | |
Each student must arrange a 30 min. meeting with Daiqi, Ziping, or Susan between 02/26 and 03/02 to discuss initial project ideas | | | |
02/26 | Control, Least Squares Methods in RL | Sections 6.4-6.6 of Sutton & Barto | |
02/28 | Batch, Off-Policy RL | Discount_Factor.pdf, on Canvas under Files > Discussions and Other Helpful Material. Also read Review of Batch RL | |
03/04 | Oralytics | An RL algorithm for the Oralytics digital app: Oralytics RL.pdf, on Canvas under Files > Discussions and Other Helpful Material | Anna Trella |
03/06 | LSPI, More Batch, Off-Policy RL | LSPI, LSVI (only need to understand algorithm 3 in appendix). M_Z_Estimating Functions.pdf, on Canvas under Files > Discussions and Other Helpful Material | |
**Initial Project Proposal Due in Canvas 03/19, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files | | | |
03/18 | LSPI, Fitted Q Iteration, More Batch, Off-Policy RL | | |
03/20 | Fitted Q Iteration, More Batch, Off-Policy RL | | |
03/25 | Thompson Sampling | Tutorial on Thompson-Sampling (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4). File on Canvas: LSVI_Notes.pdf | |
03/27 | Thompson Sampling. Connect to L_2 penalization | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4) | |
04/01 | Thompson Sampling. Connect to L_2 penalization | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4) | |
**Revised Project Proposal Due in Canvas on 04/02, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files | | | |
04/03 | Information theory viewpoint of TS | Some Information Theory. An Information-Theoretic Analysis of Thompson Sampling by Russo and Van Roy | Ziping Xu |
04/08 | Information theory viewpoint of TS | An Information-Theoretic Analysis of Thompson Sampling by Russo and Van Roy | Ziping Xu |
04/10 | Carryover from 4/08 + projects | Be prepared to be randomly selected to discuss your project plans!!! | |
04/15 | Information theory & RL | Ziping Xu | |
04/17 | MiWaves | Susobhan Ghosh | |
04/22 | Information theory, Satisficing actions | Ziping Xu | |
Posters Due in Canvas on 04/23 at 5pm EST; see the STAT 234 Project Evaluation in Canvas Files | | | |
04/24 | Poster Session! | Poster Session at Science Center Library | |
**Projects Due 05/02 in Canvas** 5pm EST; see the STAT 234 Project Evaluation in Canvas Files | | | |
Grading: Course grades will be based on a weighted average of quizzes
(30%), participation (10%), final project (60%). The 60% credit for the project
will be split as follows:
a. Project proposal (5%)
b. In-person poster presentation on 04/24 (5%)
c. Poster (10%)
d. Project final report (40%)
See the file STAT 234 Project
Evaluation in Canvas Files for how project grades are determined.
Quizzes: Each quiz covers the assigned reading and/or prior class material.
Assigned readings are provided above in the Course Outline. To help with
various circumstances (expected or unexpected), your lowest three (3) quizzes
will be dropped. Monday’s Quiz is available on Canvas starting at 1:30pm EST on
Sunday and closes at 1:30pm EST on Monday at the beginning of class; similarly,
Wednesday’s Quiz is available on Canvas starting at 1:30pm EST on Tuesday and
closes at 1:30pm EST on Wednesday at the beginning of class. Once you start the
quiz, you have 30 minutes to complete it. Collaboration on Quizzes is not
permitted. Use of generative AI is not permitted.
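The grading weights and the drop-the-lowest-three quiz rule can be sketched as a short calculation. This is only an illustration, not the official grade computation: it assumes every component is scored on a 0-100 scale and that the quizzes kept are weighted equally, neither of which is stated in the syllabus. The function name and arguments are hypothetical.

```python
def course_grade(quiz_scores, participation, proposal, presentation,
                 poster, report, n_dropped=3):
    """Weighted course grade on a 0-100 scale (illustrative sketch only)."""
    if len(quiz_scores) <= n_dropped:
        raise ValueError("need more quizzes than the number dropped")

    # Drop the lowest n_dropped quiz scores and average the rest
    # (equal weighting of the kept quizzes is an assumption).
    kept = sorted(quiz_scores)[n_dropped:]
    quiz_avg = sum(kept) / len(kept)

    # Weights from the Grading section: quizzes 30%, participation 10%,
    # project 60% split as proposal 5%, presentation 5%, poster 10%, report 40%.
    return (0.30 * quiz_avg
            + 0.10 * participation
            + 0.05 * proposal
            + 0.05 * presentation
            + 0.10 * poster
            + 0.40 * report)


# Three missed quizzes (scored 0) are dropped and do not lower the grade.
example = course_grade([0, 0, 0, 90, 100], participation=100, proposal=100,
                       presentation=100, poster=100, report=100)
```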
Projects: An important component of the course is a final project, which can
be either a survey of some actively developing sub-topic within sequential
decision making or a research project that contributes novel research (a
theoretical result, statistical method, or computational algorithm) to the area
of sequential decision making. Example projects from prior years are here. Note
that projects need not pertain to digital health. Surveys must be written
individually. However, teams of up to 2 students can be formed for a research
project. To get full credit, surveys must be of very high quality: they should
read like a publishable survey article in a top journal. The bar for research
projects will be lower because of the time constraint and the inherent
uncertainty in the research process. While you’re not required to deliver
publication-quality research work by the end of the semester, you are
encouraged to do so. We will provide some suggestions for research projects,
but you should feel free to work on ANY problem in sequential decision making
that interests you. The papers must be written according to the submission
rules at ICML: https://icml.cc/Conferences/2023/StyleAuthorInstructions. It is
easiest to use LaTeX with the style files ICML provides. These are 8-page
papers.
Using generative AI tools such as ChatGPT to help with your project is allowed.
In my own testing of ChatGPT, it often has good points but makes many mistakes.
It could be a useful tool for suggesting ideas (and to chat about the material
with or compare answers with), but it is error-prone (particularly concerning
technical/mathematical material). The references are often completely
fictitious. Furthermore, working hard on the project is crucial for learning
and for helping you grow in independence. So even if ChatGPT were excellent at
writing ICML-style papers, relying on it too much would be harmful for you. In
any case, your project and poster must reflect your own understanding of the
material, explained in your own way, rather than being copied from any other
source.
Posters: On 4/24 we will hold a poster session during class. Your poster
should provide a summary of your project.
Posters will be due in Canvas at 5pm on Tuesday 4/23. An example poster can be found here. Here is a pptx format
for the poster.
You should make 6 highly informative (non-dense) slides and display them in 2 x
3 or 3 x 2 format as in this pptx file. Other creative formats are also
welcome.
Breakout Groups and Participation: Active participation is expected, through
attending class (Mondays & Wednesdays), completing quizzes, and engaging in
classroom discussions/breakout groups. Every two weeks you will be assigned a
new breakout group. The first breakout groups will be formed Monday, January
22, with new groups on Feb. 2, Feb. 16, March 1, March 22, and April 5. Stat 234
is a challenging course covering subtle concepts, so let's all try to help
create a supportive, collaborative community.
Accommodations: Harvard University’s goal is to remove barriers for disabled
students related to inaccessible elements of instruction or design in this
course. If reasonable accommodations are necessary to provide access, please
contact the Disability Access Office (DAO). Accommodations do not alter fundamental requirements of
the course and are not retroactive. Students should request accommodations as
early as possible, since they may take time to implement. Students should
notify DAO at any time during the semester if adjustments to their communicated
accommodation plan are needed.
Where to get tech help: The Academic
Resources Center has resources.
For tech help you can chat with Daiqi or Ziping to see if either of them can
help and/or you can call the HUIT help
desk.