Stat234: Sequential Decision Making



1.    Course Description


This graduate course will focus on reinforcement learning algorithms and sequential decision making methods, with special attention to how these methods can be used in mobile health. Reinforcement learning is the area of machine learning that is concerned with sequential decision making. We will focus on the areas of sequential decision making that concern both how to select optimal actions and how to evaluate the impact of these actions. The choice of action is operationalized via a policy. A policy is a (stochastic) deterministic mapping from the available data at each time t into (a probability space over) the set of actions. We will consider both off-line and on-line methods for learning good policies.
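As a sketch of the policy definition above, in notation that is assumed here rather than taken from the course materials: write H_t for the data available at time t (taking values in a space \mathcal{H}_t), \mathcal{A} for the set of actions, and \Delta(\mathcal{A}) for the set of probability distributions over \mathcal{A}. The two cases in the definition can then be written as:

```latex
\begin{align*}
\text{deterministic policy:}\quad & \pi_t : \mathcal{H}_t \to \mathcal{A},
  & A_t &= \pi_t(H_t), \\
\text{stochastic policy:}\quad & \pi_t : \mathcal{H}_t \to \Delta(\mathcal{A}),
  & A_t &\sim \pi_t(\,\cdot \mid H_t).
\end{align*}
```

A deterministic policy is the special case of a stochastic policy that puts all of its probability on a single action.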


Mobile health lies at the intersection of multiple scientific disciplines, including statistical science, computer science, behavioral science and cognitive neuroscience. This makes for very exciting interdisciplinary science! Smartphones and wearable devices have remarkable sensing capabilities, allowing us to understand the context in which a person is at a given moment. These devices also have the ability to deliver treatment actions tailored to the specific needs of users in a given location at a given time. Figuring out which treatment actions to deliver, and when and in which context to deliver them, can assist people in achieving their longer-term health goals. In the last 15-20 minutes of each class we will brainstorm about how the methods we discussed during that class might be useful in mobile health.


This course will cover the following topics: Markov Decision Processes, on-policy and off-policy RL, and topics in RL that currently appear most useful if one is interested in mobile health (Experience Replay, Hierarchical RL). Most of these topics have to do with speeding up the rate at which we can learn a good policy. Other topics from RL that are important in mobile health but that we won't cover (you could consider them in your class project) are: 1) transfer learning (using data on other similar users to enable faster learning); 2) non-stationarity (dealing with slow or abrupt changes in user behavior); 3) interpretability of policies (enabling communication with behavioral scientists by making connections to behavioral theories); 4) using approximate system dynamic models to speed up learning; and 5) Bayesian RL.


2.   Draft Course Outline


3.   Grading



4.   Teaching


Susan Murphy is Professor of Statistics and Computer Science and Radcliffe Alumnae Professor at the Radcliffe Institute, Harvard University. She earned her Ph.D. at the University of North Carolina, Chapel Hill. She is a MacArthur Fellow, a Fellow of the National Academy of Sciences and a Fellow of the National Academy of Medicine. Her work is focused on sequential decision making, causal inference and new ways of carrying out sequential experimentation, with a particular focus on mobile health. Her email address is

Susan Murphy's office hours are by appointment, 7-8pm on Wednesdays and 5-5:30pm on Fridays, in Science Center 703.







Our head teaching fellow is Ryan Lee. Ryan is a G2 student in the Department of Statistics, Harvard. His email address is

Ryan's office hours are TBA.







5.    Postdoctoral Fellows and Graduate Students




Members of the Statistical Reinforcement Learning Lab will be giving some of the lectures. They include:



Walter Dempsey. Walter received his Ph.D. in Statistics at the University of Chicago. His email address is









Mashfiqui (Mash) Rabbi. Mash received his Ph.D. in Information Science at Cornell University. His email address is






Peng Liao. Peng is a 4th-year Ph.D. student at the University of Michigan and an Associate in the Department of Statistics at Harvard University. His email address is






Tianchen Qian. Tianchen received his Ph.D. in Biostatistics from Johns Hopkins University. His email address is