Statistics 234
Spring 2022
This graduate course focuses on reinforcement learning algorithms and sequential decision-making methods, with special attention to how these methods can be used in digital health. Reinforcement learning (RL) is the area of machine learning concerned with sequential decision making. We will focus on the areas of sequential decision making that concern both how to select optimal treatment actions and how to evaluate the impact of these actions.
Digital health is an area that spans multiple scientific disciplines, including statistical science, computer science, behavioral science and cognitive neuroscience. This makes for very exciting interdisciplinary work! Smartphones and wearable devices have remarkable sensing capabilities, allowing us to understand the context a person is in at a given moment. These devices can also deliver treatment actions tailored to the specific needs of users in a given location at a given time. Figuring out which treatment actions to deliver, when, and in which context can assist people in achieving their longer-term health goals. In the last 15-20 minutes of many of the classes we will brainstorm about how the methods we discussed during that class might be useful in digital health.
This course will cover the following topics: Markov Decision Processes, on-policy and off-policy RL, least squares methods in RL, and Bayesian RL, namely posterior sampling. Most of the course will focus on Bayesian RL via posterior sampling. Bayesian RL is particularly useful in mobile health because posterior sampling facilitates off-policy and continual learning, and the Bayesian paradigm facilitates the use of prior data in initializing an RL algorithm. If time permits, we will spend some time at the end of the semester on hierarchical RL, as this area provides a way to start thinking about managing multiple types of mHealth treatments, each targeting a different reward. Other topics from statistics, machine learning and RL that are potentially important in digital health but that we won’t cover (you could consider them for your class project) include: 1) transfer learning (using data on other similar users to enable faster learning); 2) non-stationarity (dealing with slow or abrupt changes in user behavior); 3) interpretability of policies (enabling communication with behavioral scientists by making connections to behavioral theories); 4) using approximate system dynamic models to speed up learning; 5) multi-agent RL; and 6) multi-task learning.
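As a rough illustration of the posterior sampling idea mentioned above, here is a minimal Python sketch of Thompson sampling for a two-action Bernoulli bandit. It is not taken from the course materials: the function name `thompson_sampling`, the `prior_counts` argument, and the Beta-Bernoulli model are all assumptions chosen for illustration. The `prior_counts` argument shows one simple way prior data could be used to initialize the algorithm.

```python
import random


def thompson_sampling(true_probs, n_steps, prior_counts=None, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch only).

    true_probs   -- hypothetical success probability of each action
    prior_counts -- optional list of (successes, failures) per action,
                    standing in for data collected before deployment
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)

    # Start every action from a uniform Beta(1, 1) prior.
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms

    # Assumption: prior data is folded in as pseudo-counts.
    if prior_counts is not None:
        for a, (successes, failures) in enumerate(prior_counts):
            alpha[a] += successes
            beta[a] += failures

    pulls = [0] * n_arms
    for _ in range(n_steps):
        # Draw one sample from each action's posterior and act greedily
        # with respect to the sampled values.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        action = max(range(n_arms), key=lambda a: samples[a])

        # Observe a simulated binary reward and update the posterior.
        reward = 1 if rng.random() < true_probs[action] else 0
        alpha[action] += reward
        beta[action] += 1 - reward
        pulls[action] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.1, 0.9], n_steps=2000)
    print("times each action was chosen:", pulls)
```

Because each decision only requires a draw from the current posterior, the same update can keep running as new data arrive, which is one way to read the continual-learning remark above.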
Professor: Susan Murphy (samurphy@fas.harvard.edu).
Class Times: Monday and Wednesday 1:30pm-2:45pm at the Science Center, room 705. No class 2/21, 3/14, 3/16.
TFs:
Eura Shin (eurashin@g.harvard.edu)
Raaz Dwivedi (dwivediraaz@gmail.com)
Office Hours:
Susan Murphy’s Office Hours: By appointment at 5:15pm on Thursdays in SEC 2.335
Raaz Dwivedi’s Office Hours: 3-4pm Wednesday, location SC 316.06
Eura Shin’s Office Hours: 3-4pm Monday, location SC 316.06
Website:
Book: Sutton R. & Barto A. (2020). Reinforcement Learning: An Introduction (2nd Edition). Cambridge: The MIT Press. No purchase is necessary; you can download a pdf copy here.
Ch. 21 of Russell & Norvig, Artificial Intelligence: A Modern Approach (the 3rd edition is on Canvas in the Files section). The 4th edition can be found via the Harvard Library HOLLIS.
Required Papers: A variety of papers will be assigned; see below.
Prerequisites: Recommended prerequisites are the equivalent of Stat 210 and CompSci 181.
Typical Class:
1:30pm: Quiz assigned on Canvas is due
1:30pm: Sit with your group.
1:30-2:00pm: 30 Min. Lecture
2:00-2:10pm: Breakout with your group (discuss quiz and question posed in lecture)
2:10-2:20pm: Class Discussion (one of the groups leads the discussion)
2:20-2:45pm: 25 Min. Lecture
Course Outline: This outline will be constantly updated; please check prior to each class!
| Date | Topic | Reading Assignments | |
|---|---|---|---|
| 01/24 | Intro | | |
| 01/26 | Intro | | |
| 01/31 | Bandit | Ch. 2 of Sutton & Barto | |
| 02/02 | Bandit | Ch. 2 of Sutton & Barto | |
| 02/07 | MDPs | Ch. 3 of Sutton & Barto | |
| 02/09 | MDPs | Ch. 3 of Sutton & Barto. File on Canvas: OptimalPolicyStationary.pdf | |
| 02/14 | MDPs | Ch. 4-5 of Sutton & Barto. File on Canvas: M_Z_Estimating Functions.pdf | |
| 02/16 | Two decision making problems: the learning algorithm (bandit alg./RL algorithm) and the policy that solves the MDP | Section 21.3 of Russell & Norvig, Artificial Intelligence: A Modern Approach (3rd edition is on Canvas in the Files section). Note that U is used to denote the value function, V. File on Canvas: TemporalCreditExplorationExploitationDiscussion.pdf | |
| 02/23 | TD Learning & Control | Ch. 6 of Sutton & Barto. Files on Canvas: EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf | |
| 02/28 | Batch, Off-Policy RL | File on Canvas: Discount_Factor.pdf | Raaz will be our guest speaker! |
| Each student must arrange a 30 min. meeting with Eura, Raaz or Susan to discuss initial project ideas between 02/28-03/04 | | | |
| 03/02 | TD Learning | Ch. 6 of Sutton & Barto. Files on Canvas: EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf | |
| 03/07 | Oralytics | An RL algorithm for the Oralytics digital app | Anna Trella and Kelly Zhang |
| 03/09 | Control; Least Squares Methods in RL | Sections 6.4-6.6 of Sutton & Barto. Bellman equation → LSTDQ | |
| 03/21 | LSPI | LSPI, LSVI (only need to understand algorithm 3 in appendix). File on Canvas: M_Z_Estimating Functions.pdf | |
| 03/23 | Thompson Sampling | Tutorial on Thompson Sampling (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4). File on Canvas: LSVI_Notes.pdf | |
| **Initial Project Proposal Due in Canvas 03/25, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files. | | | |
| 03/28 | Thompson Sampling. Connect to L_2 penalization | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4) | |
| 03/30 | Thompson Sampling. Connect to L_2 penalization | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4) | |
| 04/04 | Hierarchical RL | In Canvas see the files Feudal_RL.pdf and Safe_Delivery_App.pdf | Eura Lectures! |
| 04/06 | Hierarchical RL | In Canvas see the files Feudal_RL.pdf and Safe_Delivery_App.pdf | Eura Lectures! |
| **Revised Project Proposal Due in Canvas on 04/08, 5pm EST**; see the STAT 234 Project Evaluation in Canvas Files | | | |
| 04/11 | Bootstrapped TS | | |
| 04/13 | Bootstrapped TS | | Hsyin-Yu Visits: HeartSteps |
| 04/18 | Bayesian Regret Bounds | Osband, Van Roy and Russo, 2013. In Canvas see the file: BayesRLRegret04.11.22.pdf | Raaz Lectures! |
| 04/20 | Bayesian Regret Bounds | Osband, Van Roy and Russo, 2013. In Canvas see the file: BayesRLRegret04.11.22.pdf | Raaz Lectures! |
| 04/25 | Bayesian Regret Bounds | Osband, Van Roy and Russo, 2013. In Canvas see the file: BayesRLRegret04.11.22.pdf | Raaz Lectures! |
| Posters Due in Canvas on 04/26 at 5pm EST; see the STAT 234 Project Evaluation in Canvas Files | | | |
| 04/27 | Poster Session! | Poster Session at Science Center Library | |
| **Projects Due 05/05 in Canvas** 5pm EST; see the STAT 234 Project Evaluation in Canvas Files | | | |
Grading: Course grades will be based on a weighted average of quizzes
(30%), participation (10%), final project (60%). The 60% credit for the project
will be split as follows:
a. Project proposal (5%)
b. In-person poster presentation on 04/27 (5%)
c. Poster (10%)
d. Project final report (40%)
See the file STAT 234 Project Evaluation in Canvas Files for how project grades
are determined.
Quizzes: Each quiz covers the assigned reading and/or prior class material. Assigned readings are provided above in the Course Outline. To help with various circumstances (expected or unexpected), your lowest three (3) quizzes will be dropped. Monday’s quiz is available on Canvas starting at 1:30pm EST on Sunday and closes at 1:30pm EST on Monday at the beginning of class; similarly, Wednesday’s quiz is available on Canvas starting at 1:30pm EST on Tuesday and closes at 1:30pm EST on Wednesday at the beginning of class. Once you start the quiz, you have 30 minutes to complete it. Collaboration on quizzes is not permitted.
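For students who want to estimate their standing, the weights in the Grading section and the drop-the-lowest-three quiz rule can be combined roughly as in the Python sketch below. This is only an illustration, not the official grade calculation: it assumes every component is scored on a 0-100 scale and that all kept quizzes count equally, and the function and dictionary names are made up for this example.

```python
# Weights in percent, taken from the Grading section of this syllabus.
WEIGHTS = {
    "quizzes": 30,
    "participation": 10,
    "proposal": 5,
    "poster_presentation": 5,
    "poster": 10,
    "final_report": 40,
}


def quiz_average(quiz_scores, n_dropped=3):
    """Average of quiz scores after dropping the n_dropped lowest ones."""
    kept = sorted(quiz_scores)[n_dropped:]
    if not kept:
        raise ValueError("need more quizzes than the number dropped")
    return sum(kept) / len(kept)


def course_grade(quiz_scores, participation, proposal,
                 poster_presentation, poster, final_report):
    """Weighted average on a 0-100 scale (assumed scale, see lead-in)."""
    components = {
        "quizzes": quiz_average(quiz_scores),
        "participation": participation,
        "proposal": proposal,
        "poster_presentation": poster_presentation,
        "poster": poster,
        "final_report": final_report,
    }
    total = sum(WEIGHTS[name] * score for name, score in components.items())
    return total / 100


if __name__ == "__main__":
    # Hypothetical scores, purely for illustration.
    quizzes = [0, 40, 55, 80, 90, 100, 85, 95]
    print(course_grade(quizzes, participation=100, proposal=90,
                       poster_presentation=90, poster=85, final_report=88))
```

The project components sum to the 60% project share (5 + 5 + 10 + 40), so the six weights add up to 100.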
Projects: An important component of the course is a final project, which can be either a survey of some actively developing sub-topic within sequential decision making or a research project contributing novel research (theoretical result, statistical method, computational algorithm) to the area of sequential decision making. Example projects from prior years are here. Surveys must be written individually. However, teams of up to 2 students can be formed for a research project. To get full credit, surveys must be of very high quality: they should be similar to a publishable survey article in a top journal. The bar for research projects will be lower because of the time constraint and the inherent uncertainty in the research process. While you’re not required to deliver publication-quality research work by the end of the semester, you are encouraged to do so. We will provide some suggestions for research projects, but you should feel free to work on any problem in the area of sequential decision making that interests you. The papers must be written according to the submission rules at ICML: https://icml.cc/Conferences/2022/StyleAuthorInstructions. It is easiest to use LaTeX with the style files ICML provides. These are 8-page papers.
Posters: On 4/27 we will hold a poster session during class. Your poster should provide a summary of your project. Posters will be due in Canvas at 5pm on Tuesday 4/26. An example poster can be found here. Here is a pptx format for the poster. You can make 6 highly informative (non-dense) slides and display them in 2 x 3 or 3 x 2 format as in this pptx file. Other creative formats are also welcome.
Participation: Active participation is expected, through attending class (Mondays & Wednesdays), completing quizzes and engaging in classroom discussions. Stat 234 is a challenging course covering subtle concepts and there are further challenges due to the difficult times we are in, so let's all try to help create a supportive, collaborative community.
Accommodations: Students needing academic adjustments or accommodations because of a documented disability must present to me their Faculty Letter from the Accessible Education Office (AEO) and speak with me by the end of the second week of the term (Friday, 2/4/22). Failure to do so may result in my inability to respond in a timely manner. All discussions will remain confidential, although I may contact AEO to discuss appropriate implementation.
Where to get tech help: The Academic Resources Center has resources. For tech help you can chat with Eura to see if she can help and/or you can call the HUIT help desk.