Statistics 234, Spring 2021
This
graduate course will focus on reinforcement learning algorithms and sequential
decision making methods with special attention to how these methods can be used
in mobile health. Reinforcement learning
is the area of machine learning which is concerned with sequential decision
making. We will focus on the areas of
sequential decision making that concern both how to select optimal treatment
actions and how to evaluate the impact of these actions.
Mobile
health is an area that lies within multiple scientific disciplines including:
statistical science, computer science, behavioral science and cognitive
neuroscience. This makes for very exciting interdisciplinary science!
Smartphones and wearable devices have remarkable sensing capabilities allowing
us to understand the context in which a person is at a given moment. These
devices also have the ability to deliver treatment actions tailored to the
specific needs of users in a given location at a given time. Determining which
treatment actions to deliver, when, and in which context can assist people in
achieving their longer term health goals.
In the last 15-20 minutes of many of the classes we will brainstorm
about how the methods we discussed during that class might be useful in mobile
health.
This course
will cover the following topics: Markov
Decision Processes, on-policy and off-policy RL, least squares methods in RL
and Bayesian RL, namely posterior sampling.
Most of the course will focus on Bayesian RL via posterior sampling.
Bayesian RL is particularly useful in mobile health because posterior sampling
facilitates off-policy and continual learning. The Bayesian paradigm also
facilitates the use of prior data in initializing an RL algorithm. Hierarchical RL
provides a way to start thinking about managing multiple mHealth treatments, each
targeting a different reward. Other topics from statistics, machine learning and RL
that are potentially important in mobile health but that we won’t cover (though you
could consider them for your class project) include: 1) transfer learning (using data on
other similar users to enable faster learning); 2) non-stationarity (dealing
with slow or abrupt changes in user behavior); 3) interpretability
of policies (enabling communication with behavioral scientists by making
connections to behavioral theories); 4) using approximate system dynamic models
to speed up learning; 5) hierarchical RL;
and 6) multi-task learning.
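To make the role of prior data in posterior sampling concrete, here is a minimal sketch of Thompson sampling for a two-action Bernoulli bandit. It is an illustration under stated assumptions, not the course's algorithm: the function name, the Beta(1, 1) base prior, the simulated reward probabilities and the prior counts are all hypothetical. Prior data enters only through the starting Beta parameters.

```python
import random


def thompson_sampling(true_probs, prior_successes, prior_failures, steps, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_probs      -- hypothetical success probability of each action (simulation only)
    prior_successes -- successes per action observed in earlier (prior) data
    prior_failures  -- failures per action observed in earlier (prior) data
    """
    rng = random.Random(seed)
    # Start from a Beta(1, 1) prior and fold in the prior data as pseudo-counts.
    alpha = [1 + s for s in prior_successes]
    beta = [1 + f for f in prior_failures]
    pulls = [0] * len(true_probs)

    for _ in range(steps):
        # Draw one sample from each action's posterior and act greedily on the draws.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        action = max(range(len(samples)), key=samples.__getitem__)

        # Simulate a binary reward and update that action's posterior.
        reward = 1 if rng.random() < true_probs[action] else 0
        alpha[action] += reward
        beta[action] += 1 - reward
        pulls[action] += 1

    return pulls, alpha, beta


# Hypothetical example: action 1 (e.g., "send a prompt") is truly better,
# and the prior data already leans in its favor.
pulls, alpha, beta = thompson_sampling(
    true_probs=[0.2, 0.8],
    prior_successes=[2, 6],
    prior_failures=[8, 4],
    steps=2000,
)
print("pulls per action:", pulls)
```

Because exploration comes from the randomness of the posterior draws, the algorithm needs no separate exploration schedule; more informative prior counts simply make the early draws more concentrated.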
Professor: Susan Murphy (samurphy@fas.harvard.edu).
Time
Zones: Times listed below and on the course
webpage are Eastern Time. Note that Daylight Saving Time begins on March 14 in
most of the US, and clocks in Massachusetts will move forward one hour then.
Class Times: Wednesdays, 12:00pm-2:45pm EST, on Zoom. Class and the Chat will be
recorded. No class on 03/31. On 02/10 and 02/17 we will run a
workshop: 02/10 will be a lecture
overview of AI 4 Mobile Health and 02/17 will be a practicum on AI 4 Mobile
Health.
TFs: Kelly Zhang (kellywzhang@seas.harvard.edu) and Peng Liao (pengliao@g.harvard.edu)
Office Hours:
Susan Murphy’s Office Hours: By appointment after class
Kelly Zhang’s Office Hours: 11am-12pm on Mondays
Peng Liao’s Office Hours: 5pm-6pm on Fridays
Course
Webpage: https://canvas.harvard.edu/courses/78908
Book: Sutton R. & Barto A. (2020). Reinforcement Learning:
An Introduction (2nd Edition). Cambridge: The MIT Press. No purchase is necessary; you can download a
pdf copy here.
Required
Papers:
A variety of scientific papers will be assigned; see below.
Prerequisites: Recommended
prerequisites are the equivalent of stat210 and compsci181.
Typical Class:
12:00-12:10EST: Short Review. Discuss questions from prior class, Quiz
12:10-12:40EST: Lecture
12:40-12:50EST: Breakout with your group (Discuss Quiz and question posed in Lecture)
12:50-1:00EST: Class Discussion (one of the groups leads the discussion)
1:00-1:10EST: Break
1:10-1:40EST: Lecture
1:40-1:50EST: Breakout with your group (Discuss Quiz and question posed in Lecture)
1:50-2:00EST: Class Discussion (one of the groups leads the discussion)
2:00-2:10EST: Break
2:10-2:30EST: Lecture
2:30-2:45EST: Wrap-up
Technology
Outage: If our class suffers a Zoom technology
outage, please sign off Zoom. Wait 10
min and then sign back in. We will
attempt to resume class 10 minutes after an outage and Susan will attempt to
email all registered students. If class
has to be cancelled because Susan’s Zoom is not working, then we will hold a
makeup class on Wed 05/05 from 12:00pm-2:45pmEST. You will be notified via Canvas if this
occurs.
Course Outline: This outline will be
constantly updated—please check prior to each class!
Date | Topic | Reading Assignments | |
01/27 | Intro | | |
02/03 | Bandit | Ch. 1-2 of Sutton & Barto | |
02/10 | Lecture AI in Health Workshop | 12 noon to 3pm. This lecture was the first of a two part workshop at AI4Health School | Susan conducts the lecture |
02/17 | Practicum AI in Health Workshop | 12 noon to 3pm. This practicum was the second of a two part workshop at AI4Health School | Walter Dempsey at University of Michigan conducts the practicum. |
02/24 | Bandits, MDPs & Two decision making problems: The learning algorithm (Bandit alg./RL algorithm) and the policy that solves MDP | Ch. 2-3 of Sutton & Barto; Ch. 21 of Russell & Norvig (3rd edition is on Canvas in the Files Section). 4th edition can be found via the Harvard Library Hollis. | |
Each student must arrange a 30 min zoom call with Kelly or Peng or Susan to discuss initial project ideas between 02/28-03/05 | | | |
03/03 | MDPs | File on Canvas: TemporalCreditExplorationExploitationDiscussion.pdf. Ch. 4-6 of Sutton & Barto. Files on Canvas: M_Z_Estimating Functions.pdf | |
03/10 | Control | Ch. 6 of Sutton & Barto. Files on Canvas: EligibilityTracesDiscussion.pdf and M_Z_Estimating Functions.pdf | |
03/17 | Least Squares Methods in RL | Bellman equation → LSTD, LSPI, LSVI (see algorithm 3 in appendix). Files on Canvas: M_Z_Estimating Functions.pdf | |
**Initial Project Proposal Due 03/19, 5pmEST** | | | |
03/24 | Finish LSPI. Thompson Sampling. | LSPI, LSVI (see algorithm 3 in appendix). Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4) | |
03/31 | WELLNESS DAY! | | |
04/07 | Thompson Sampling. Connect to L_2 penalization | Russo, Van Roy, Kazerouni, Osband and Wen, 2017, revised 2020 (Sections 1-4, 7.1, 7.5, 8.1.3, example 8.4) | |
04/14 | (More) Efficient RL via Posterior Sampling and On Optimistic versus Randomized Exploration in RL | Osband, van Roy and Russo, 2013; Osband and van Roy, 2017a (a theoretical reference is Osband and van Roy, 2017b) | |
**Revised Project Proposal Due 04/09, 5pmEST** | | | |
04/21 | (More) Efficient RL via Posterior Sampling and On Optimistic versus Randomized Exploration in RL | Osband, van Roy and Russo, 2013; Osband and van Roy, 2017a (a theoretical reference is Osband and van Roy, 2017b) | |
04/28 | Poster Session! | Posters are due in Canvas with a link on the google doc by 5pmEST on 04/27; Questions are due on the google doc by 5pmEST on 04/28 | |
**Projects Due 05/06** 5pm EST |
Grading: Course grades will be based on a weighted average of quizzes
(20%), participation (10%), final project (70%). The 70% credit for the project
will be split as follows:
a. Project proposal (5%)
b. Project poster presentation in gather.town on 04/28 (10%)
c. Poster (15%)
d. Project final report (40%)
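As a worked illustration of the weighted average, the sketch below combines the stated weights with the drop-the-lowest-three quiz rule described under Quizzes. It assumes every component is scored on a 0-100 scale and that quizzes are weighted equally; the function name and the example scores are hypothetical, and the instructors' actual computation may differ.

```python
def course_grade(quizzes, participation, proposal, presentation, poster, report,
                 n_dropped=3):
    """Weighted course grade on a 0-100 scale (illustrative sketch)."""
    if len(quizzes) <= n_dropped:
        raise ValueError("need more quiz scores than the number dropped")

    # Drop the lowest n_dropped quiz scores, then average the rest equally.
    kept = sorted(quizzes)[n_dropped:]
    quiz_avg = sum(kept) / len(kept)

    # Weights from the syllabus: quizzes 20%, participation 10%, and a 70%
    # project split into proposal 5%, presentation 10%, poster 15%, report 40%.
    return (0.20 * quiz_avg
            + 0.10 * participation
            + 0.05 * proposal
            + 0.10 * presentation
            + 0.15 * poster
            + 0.40 * report)


# Hypothetical scores: the two missed quizzes (0) and the 60 are dropped.
grade = course_grade(
    quizzes=[100, 90, 0, 80, 0, 70, 60, 100],
    participation=100,
    proposal=100,
    presentation=90,
    poster=80,
    report=85,
)
print(round(grade, 1))  # 87.6
```

In this example the kept quizzes average 88, so the quiz component contributes 17.6 points and the project components contribute 60 of a possible 70.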
Project grades will be based on:
1. Was the problem stated clearly?
2. In the introduction did the author(s) clearly communicate the problem in an understandable way for non-specialists?
3. Was there a high quality summary of extant literature?
4. If a review/commentary, then
   a. Did the review discuss multiple approaches and contrast these approaches?
   b. Were the conclusions well justified (via implementing the approaches or using theoretical arguments)?
5. If a research problem, then
   a. Was the solution stated clearly?
   b. Is the feasibility of the solution clearly evaluated and justified (via implementing the approaches or using theoretical arguments)?
Quizzes: Each quiz covers the assigned reading and/or prior class material.
Assigned readings are provided above in the Course Outline. To
help with various circumstances (expected or unexpected), your lowest three (3)
quiz scores will be dropped. Quizzes are
available on Canvas starting at 12:00pm EST on Tuesday and close at 12:00pm EST
on Wednesday, at the beginning of class. Collaboration
on quizzes is discouraged.
Projects: An important component of the course is a final project which
can either be a survey of some actively developing sub-topic within sequential
decision making, or a detailed commentary on a classic paper on sequential
decision making, or a research project involving contributing novel research
(theoretical result, statistical method, computational algorithm) to the area
of sequential decision making. Example
projects from prior years are here.
Surveys have to be written individually.
However, teams of up to 2 students can be formed for a research project. To get
full credit, surveys have to be very high quality: they should be similar to a
publishable survey article in a top journal. The bar for research projects will
be lower because of the time constraint and the inherent uncertainty in the
research process. While you’re not required to deliver publication quality
research work by the end of the semester, you are encouraged to do so. We will
provide some suggestions for research projects but you should feel free to work
on any problem in the area of sequential decision making that interests you. The papers must be written according to the
submission rules at ICML (https://icml.cc/Conferences/2020/StyleAuthorInstructions). It is easiest to use LaTeX
with the style files ICML provides. These are 8-page papers.
Posters:
On 4/28 we will hold a poster session during class. Your poster
should provide a summary of your project.
Posters will be due in Canvas at 5pm on Tuesday 4/27. An example poster can be found here. This poster used the
template at https://github.com/anishathalye/gemini
Participation: Active participation is expected, through attending class (Wednesdays),
completing quizzes and engaging in breakout room discussions. Unless you are
speaking please mute your microphone. Please
keep your camera on during class and section if you are comfortable doing so,
though there may be occasional privacy or bandwidth reasons for turning your
camera off. You will get more out of
Stat 234 if you close all other applications on your computer during
class. Stat 234 is a challenging course
covering subtle concepts and there are further challenges from being remote and
the difficult times we are in, so let's all try to help create a supportive,
collaborative community.
Accommodations: Students needing academic adjustments or
accommodations because of a documented disability must present their Faculty
Letter from the Accessible Education Office
(AEO)
and speak with the professor by the end of the second week of the term.
Failure to do so may result in the Course Head's inability
to respond in a timely manner. All discussions will remain confidential,
although Faculty are invited to contact AEO to discuss appropriate
implementation.
Zoom basics and
where to get tech help: The Academic Resources
Center
has resources. For tech help you can chat with Kelly or Peng to see if they can
help and/or you can call the HUIT help desk.