Statistics 186
Spring 2022
Causal Inference concerns the challenging problem of addressing
questions such as, "Would vaccinating children 16 and younger against
COVID-19 lead to fewer deaths among public school teachers?" and "Would
providing Harvard students access to a mobile health application designed to
help them manage school stress lead to improved school performance?" This
class will include four modules. The first module introduces the nuanced
world of causal inference along with a fundamental tool: the language of
potential outcomes. The second module covers randomized experiments
and how data from randomized experiments can be used to make causal
statements. The third module introduces the rather tricky problem of using
observational (non-randomized) data to attempt to make causal
statements. The final module introduces a new and challenging area
in which the goal is to make causal inference about the effect of sequences of
treatments.
Professor: Susan Murphy (samurphy@fas.harvard.edu).
Class Times: MW 4:30 pm-5:45 pm (EST) at the Science Center, room 705. No class 2/21, 3/14, 3/16.
TF: Dae Woong (David) Ham (daewoongham@g.harvard.edu)
Sections and Office Hours:
Susan Murphy’s Office Hours: 4:15pm-5:15pm EST on Thursday in 2.335 SEC except for 3/17.
David Ham’s Office Hours: 1:30pm-2:30pm EST on Monday in SC 706, and 3:00pm-4:00pm EST on Wednesday in SC 705.
Course Webpage: https://canvas.harvard.edu/courses/89128
Book: Imbens, G., & Rubin, D. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge: Cambridge University Press. doi:10.1017/CBO9781139025751. No purchase is necessary; you can download PDF copies of the book chapters from the “Library Reserves” section of the Stat 186 Canvas site. You can also purchase the hard copy or an Adobe e-book at the Harvard Coop Bookstore.
Other Papers: A variety of scientific papers will be assigned.
Recommended Texts: Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. See https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
Prerequisites: Stat 110, Stat 111, Stat 139.
Probability and statistical inference are needed extensively, and statistical
linear models are often used.
Computing and Simulation: Some homework problems will mainly involve statistical reasoning and probability, whereas other homework problems will require programming in R. Students may use other software such as Python and Matlab, but we will only provide support for R. I recommend RStudio as an interface for R. Both R and RStudio are freely available.
Typical Class:
4:30pm: Quiz assigned on Canvas is due
4:30pm: Sit with your group.
4:30-5:00pm: 30 Min. Lecture
5:00-5:10pm: Breakout with your group (Discuss quiz and questions posed in Lecture)
5:10-5:20pm: Class Discussion (one of the groups leads the discussion)
5:20-5:45pm: 25 Min. Lecture
Topics Covered:
Module 1: [Potential Outcomes, Assignment Mechanisms]
Tentative Dates: 1/24, 1/26, 1/31
This module provides the crucial underpinning for the entire course. It will help you think critically about statements made in everyday life about cause and effect. We provide a language for investigating causality. This language will help you translate statements such as “Students who walk at least 10,000 steps per day are less likely to be stressed” into mathematical statements and then, if needed, reframe these statements to enhance precision. This in turn allows us to be precise in how we use data to investigate causality.
Assigned Reading Material: Chapters 1 & 3 of Imbens and Rubin
Module 2: [Randomized Experiments and Associated Data Analyses]
Much of our course focuses on Module 2. This module will help you understand why randomized experiments facilitate causal inference and how to reason about and conduct causal inference with experimental data. It also provides first hints about how you might conduct causal inference when you have observational data instead of experimental data.
1. Classical Randomized Experiments
Dates: 1/31, 2/02
This section concerns experimental settings in which we determine the assignment mechanism, that is, the probability distribution of the randomized treatment assignment. We will learn about some of the pros and cons of different approaches to randomization.
Assigned Reading Material: Chapter 4 of Imbens and Rubin.
Other interesting material: “Statistical Properties of Randomization in Clinical Trials” by Lachin, “Properties of Simple Randomization in Clinical Trials” by Lachin and “Randomization in Clinical Trials: Conclusions and Recommendations” by Lachin, Matts and Wei. These papers are under Files on Canvas.
2. Fisher's Approach to Causal Reasoning about Treatment Effects for the Population of N Individuals in the Sample
Tentative Dates: 2/02, 2/07, 2/09
This section concerns finite sample inference: you will learn how to conduct causal inference about the N units (individuals) in the experiment. Randomization tests are crucial tools.
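As an illustration only (not taken from the course materials), a randomization test can be sketched as follows. The sketch is in Python for convenience, although the course supports R; it assumes a completely randomized experiment with a fixed number of treated units and uses the absolute difference in means as the test statistic. The function name and the data are hypothetical.

```python
import itertools


def fisher_randomization_test(outcomes, treated):
    """Exact p-value under the sharp null of no effect for any unit.

    Assumes a completely randomized design: every assignment with the
    same number of treated units as observed is equally likely.
    """
    n = len(outcomes)
    n_treated = sum(treated)

    def diff_in_means(assignment):
        y_t = [y for y, w in zip(outcomes, assignment) if w == 1]
        y_c = [y for y, w in zip(outcomes, assignment) if w == 0]
        return sum(y_t) / len(y_t) - sum(y_c) / len(y_c)

    observed = abs(diff_in_means(treated))
    as_extreme = 0
    total = 0
    # Under the sharp null the outcomes are fixed, so we can recompute the
    # statistic for every assignment the experiment could have produced.
    for treated_idx in itertools.combinations(range(n), n_treated):
        idx = set(treated_idx)
        assignment = [1 if i in idx else 0 for i in range(n)]
        total += 1
        if abs(diff_in_means(assignment)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


# Hypothetical data: six units, three treated.
outcomes = [7.0, 9.0, 8.0, 3.0, 4.0, 2.0]
treated = [1, 1, 1, 0, 0, 0]
p_value = fisher_randomization_test(outcomes, treated)
print(p_value)
```

Enumerating all assignments is feasible only for small N; for larger experiments one would typically sample assignments at random instead.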
Assigned Reading Material: Chapter 5 of Imbens and Rubin; Section 5.1 is critical but very dense. I suggest rereading Section 5.1 as you go through the other sections so that you gradually begin to understand it.
Other interesting material: “Statistical Properties of Randomization in Clinical Trials” by Lachin, “Properties of Simple Randomization in Clinical Trials” by Lachin and “Randomization in Clinical Trials: Conclusions and Recommendations” by Lachin, Matts and Wei. These papers are under Files on Canvas.
3. Neyman's Approach to Causal Reasoning about Treatment Effects for the Population of N Units in the Sample and its Extension to Using the Sample to Conduct Causal Inference about Treatment Effects in a Large Population.
Tentative Dates: 02/14, 02/16, 02/23
Often we aim to use the sample of N units (individuals) to inform decisions about a larger population (for example, should we provide the new take-home chemotherapy to all adolescents recovering from leukemia as opposed to the current take-home chemotherapy?). You will learn why this type of causal inference both requires more assumptions and at the same time is less restrictive than finite sample causal inference. You will learn first statistical approaches to conducting this type of causal inference.
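For orientation, here is a minimal sketch (in Python, with hypothetical data and a hypothetical function name) of the kind of calculation this section builds toward: the difference in sample means as an estimate of the average treatment effect, together with the usual conservative variance estimate (the sum of the within-arm sample variances, each divided by its arm size) and a normal-approximation interval. It assumes a completely randomized experiment with at least two units in each arm.

```python
import math
import statistics


def neyman_estimate(outcomes, treated, z=1.96):
    """Difference in means with a conservative standard error.

    Returns (estimate, standard error, approximate 95% interval).
    """
    y_t = [y for y, w in zip(outcomes, treated) if w == 1]
    y_c = [y for y, w in zip(outcomes, treated) if w == 0]
    estimate = statistics.mean(y_t) - statistics.mean(y_c)
    # Sample variances (divisor n - 1) within each treatment arm.
    var_hat = (statistics.variance(y_t) / len(y_t)
               + statistics.variance(y_c) / len(y_c))
    se = math.sqrt(var_hat)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical data: six units, three treated.
outcomes = [7.0, 9.0, 8.0, 3.0, 4.0, 2.0]
treated = [1, 1, 1, 0, 0, 0]
estimate, se, interval = neyman_estimate(outcomes, treated)
print(estimate, se, interval)
```

With only six units the normal approximation is not trustworthy; the tiny data set is used purely to keep the arithmetic easy to follow.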
Assigned Reading Material: Chapter 6 of Imbens and Rubin
4. Using Regression to Conduct Causal Inference.
Tentative Dates: 02/28, 03/02, 03/07
Regression is one of the earliest, and continues to be one of the most common, tools used to conduct causal inference. In regression we often add outside knowledge about the form of the mean of the outcome conditional on covariates. In this section you will learn how you can use covariates to improve your ability to detect and conduct inference about causal effects. You will learn about the consequences of misspecifying the regression.
Assigned Reading Material: Chapter 7 of Imbens and Rubin; additional reading material may be assigned.
Module 3: [Observational Studies]
In many areas of science, experiments are unethical; for example, we might be interested in the causal effect of parental divorce on children’s elementary school performance. Or, for monetary or societal reasons, data from experiments are not available. These are all settings in which the “assignment mechanism” is unknown. In this module you will learn about first approaches to conducting causal inference in these thorny problems.
1. Unconfounded Treatment Assignment.
Tentative Dates: 03/09, 03/21, 03/23, 03/28
In this section you will learn about the
critical role the propensity score plays in conducting causal inference, in particular for use in settings in which science along
with high quality observational data can be harnessed to explain the assignment
mechanism.
Assigned Reading Material: Chapter 12 of Imbens and Rubin; additional reading material may be assigned.
2. Estimating the Propensity Score.
Tentative Dates: 03/30, 04/04
In the analysis of observational data with propensity scores you will need to estimate the propensity score. You will learn about methods for doing this and how to think about estimation when the goal is to reduce confounding as opposed to fitting a good model.
Assigned Reading Material: Chapter 13 of Imbens and Rubin; additional reading material may be assigned.
3. Using the Propensity Score to Conduct Causal Inference in Observational Studies.
Tentative Dates: 04/06, 04/11, 04/13, 04/18
Here we discuss how to use
the propensity score via stratification/blocking in causal inference. If time permits, we will discuss a second
approach, namely propensity score weighting.
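To make the idea of stratification/blocking concrete, here is a rough sketch in Python with hypothetical data. It assumes the propensity scores have already been estimated and that the block cutpoints are given; it then averages the within-block differences in means, weighting each block by its share of the sample. The function name, cutpoints, and numbers are illustrative assumptions, not the procedure from the assigned chapter.

```python
def stratified_estimate(outcomes, treated, scores, cutpoints):
    """Blocking estimator: weighted average of within-block differences in means.

    `scores` are (already estimated) propensity scores; `cutpoints` are the
    sorted block boundaries, e.g. [0.5] gives two blocks.
    """
    blocks = [[] for _ in range(len(cutpoints) + 1)]
    for y, w, e in zip(outcomes, treated, scores):
        block_index = sum(e >= c for c in cutpoints)
        blocks[block_index].append((y, w))

    n = len(outcomes)
    estimate = 0.0
    for block in blocks:
        if not block:
            continue  # an empty block contributes nothing
        y_t = [y for y, w in block if w == 1]
        y_c = [y for y, w in block if w == 0]
        if not y_t or not y_c:
            raise ValueError("each non-empty block needs treated and control units")
        tau_block = sum(y_t) / len(y_t) - sum(y_c) / len(y_c)
        estimate += (len(block) / n) * tau_block
    return estimate


# Hypothetical data: units with high scores are treated more often and
# also tend to have higher outcomes, so the raw comparison is confounded.
outcomes = [5.0, 3.0, 3.0, 3.0, 10.0, 10.0, 10.0, 6.0]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
scores = [0.2, 0.3, 0.25, 0.1, 0.7, 0.8, 0.6, 0.9]
blocked = stratified_estimate(outcomes, treated, scores, cutpoints=[0.5])
print(blocked)
```

In this toy example the unadjusted difference in means is 5, while comparing treated and control units only within blocks of similar propensity score gives a smaller estimate.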
Assigned Reading Material: Chapter 17 of Imbens and Rubin; additional reading material may be assigned.
Module 4: [Dynamic Treatment Regimes & Sequential Experimentation]
Tentative Dates: 04/20, 04/25, 04/27
This is causal inference on steroids!! In
this module you will learn about how to reason about potential outcomes when
treatments are sequential. When treatments are sequential, it is easy for the
analysis method to accidentally introduce confounding even though the
treatments are randomized.
Assigned Reading Material: MRTs for Developing Digital Interventions.
This paper is written for behavioral scientists.
Grading: Course grades will be based on a weighted average of homework
scores (40%), quizzes (20%), participation (10%) and a final exam on a date to
be determined (30%). Additional information about each of these components is
below. The course is letter-graded by default,
but you have my permission to switch to SAT/UNSAT grading if you prefer. If you
are considering SAT/UNSAT you should discuss it with your advisor, and check whether
it would count for what you want it to count for. A grade of SAT corresponds to
a letter grade of C- or above.
Quizzes: Prior to each class there will be a quiz on Canvas
about the assigned reading. Assigned
Readings are listed above in each section.
To help with various circumstances (COVID, expected or unexpected), your lowest three (3) quiz scores will be dropped. Quizzes for Monday are available on Canvas starting at 4:30pm on Sunday and close at 4:30pm (EST) on Monday. Similarly, quizzes for Wednesday are available on Canvas starting at 4:30pm on Tuesday and close at 4:30pm (EST) on Wednesday. Once you start a quiz you will have 10 minutes to complete it. Collaboration on quizzes is not permitted.
Homework: Problem sets will be assigned every other Thursday at 4pm via Canvas and will be due two weeks later, on Thursday at 4pm EST, in Canvas. The first assignment will appear in Canvas at 4pm on 1/27 and is due in Canvas at 4pm on 2/10.
Homework must be submitted via the
Canvas course website; no submissions on paper or by email will be accepted. Your submission must be a single PDF file,
no more than 20 MB in size, except that computer code can be uploaded in a separate
supplementary file if that is more convenient for you (i.e., a
.R or .Rmd file with your R code).
The outputs from your code,
e.g., plots and summary statistics, must still be in your main PDF file. Your homework can be typeset, written using a tablet, or
scanned from handwritten work, but must be clear and easily legible (not blurry
or faint), and correctly rotated (e.g., not upside down). Always check your
submission: download it after uploading it in Canvas, and make sure that it is
the correct file and that it got uploaded successfully.
Late homework submissions are not accepted. To help with various circumstances
(COVID, expected or unexpected), your lowest two homework scores will be dropped.
Unless otherwise specified, please show
your work, simplify fully, and give clear, careful justifications for your answers (using words and sentences to explain your
logic, not just formulas).
Homework Collaboration Policy: Beginning the first week, students are randomly divided into collaborative groups every other Thursday at 4pm. This is your discussion and homework group for the next two weeks. Each student individually submits their homework solution with a list of who was in their assigned collaboration group. You must
write up your solutions yourself and in your own words. Copying someone
else's solution, or just making trivial changes for the sake of not copying
verbatim, is not acceptable. For example, in problems where you must
make up a story or example, two students should not have the exact same answer,
or almost the same answer except one has an example with dogs chasing cats and
the other has an example with cats chasing mice, with the same structure and
the same numbers. I
highly recommend starting problem sets early enough so that you have time to
work hard on the problems on your own first, before discussing them with your
group. But in any case, your solutions must reflect your own understanding of
the material, explained in your own way.
Participation: Active participation is expected, through attending class (Mondays
and Wednesdays), completing quizzes and engaging in discussions. Stat 186 is a
challenging course covering subtle concepts and there are further challenges
from being remote and the difficult times we are in, so let's all try to help
create a supportive, collaborative community.
Final: The final is 9am-12 noon EST on May 11. You can bring two 8.5 by 11 sheets of notes (using both front and back) with you to the final. Otherwise the final is closed book, with no internet access and no computer access.
Accommodations: Students needing academic adjustments or
accommodations because of a documented disability must present their Faculty Letter
from the Accessible Education Office (AEO)
and speak with the professor by the end of the second week of the term, (fill
in specific date). Failure to do so may result in the Course Head's inability to
respond in a timely manner. All discussions will remain confidential, although
Faculty are invited to contact AEO to discuss appropriate implementation.
Where to get tech help: The Academic Resources Center has resources. For tech help you can chat with David to see if he can help
and/or you can call the HUIT help desk.