Ph.D. candidate in computer science at Harvard's School of Engineering and Applied Sciences
I work in the field of statistical machine learning, developing spatiotemporal models and approximate inference methods. My application areas range from astronomy to healthcare to sports analytics. I am a member of the Harvard Intelligent Probabilistic Systems (HIPS) group, advised by Ryan Adams (now at Princeton), and I also work with Luke Bornn in the statistics department (now at Simon Fraser). [CV]


  • Reducing Reparameterization Gradient Variance

    Andrew C. Miller, Nicholas J. Foti, Alexander D'Amour, and Ryan P. Adams
    Advances in Neural Information Processing Systems (NIPS), 2017
    [abstract] [arxiv] [code]

    Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparameterization gradients, or gradient estimates computed via the “reparameterization trick,” represent a class of noisy gradients often used in Monte Carlo variational inference (MCVI). However, when these gradient estimators are too noisy, the optimization procedure can be slow or fail to converge. One way to reduce noise is to use more samples for the gradient estimate, but this can be computationally expensive. Instead, we view the noisy gradient as a random variable, and form an inexpensive approximation of the generating procedure for the gradient sample. This approximation has high correlation with the noisy gradient by construction, making it a useful control variate for variance reduction. We demonstrate our approach on non-conjugate multi-level hierarchical models and a Bayesian neural net where we observed gradient variance reductions of multiple orders of magnitude (20-2,000x).
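    The control-variate idea at the heart of the paper can be illustrated on a toy Monte Carlo estimator (a hypothetical sketch, not the paper's gradient approximation): any cheap quantity with known mean that correlates with the noisy samples can be subtracted off to reduce variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Monte Carlo estimator of E[f(eps)], eps ~ N(0, 1).
f = lambda eps: np.exp(0.1 * eps) * eps   # noisy samples we want to average
g = lambda eps: eps                       # cheap control variate with known mean 0

eps = rng.standard_normal(100_000)
fs, gs = f(eps), g(eps)

# The coefficient a* = Cov(f, g) / Var(g) minimizes Var(f - a * g).
a = np.cov(fs, gs)[0, 1] / np.var(gs)
cv = fs - a * gs                          # corrected samples: same mean, lower variance

print(fs.var(), cv.var())
```

    Because the control variate is highly correlated with the samples by construction, the corrected estimator has the same expectation but a small fraction of the variance.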
  • Variational Boosting: Iteratively Refining Posterior Approximations

    Andrew C. Miller, Nicholas J. Foti, and Ryan P. Adams
    International Conference on Machine Learning (ICML), 2017
    Early version in AABI 2016 (NIPS Workshop)
    [abstract] [arxiv] [code]

    Abstract: We present a black-box variational inference (BBVI) method to approximate intractable posterior distributions with an increasingly rich approximating class. Using mixture distributions as the approximating class, we first describe how to apply the re-parameterization trick and existing BBVI methods to mixtures. We then describe a method, termed Variational Boosting, that iteratively refines an existing approximation by defining and solving a sequence of optimization problems, allowing the practitioner to trade computation time for increased accuracy.
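    The reparameterization trick mentioned in the abstract can be sketched for a single diagonal Gaussian component (a toy illustration, not the paper's mixture machinery): the sample is written as a deterministic function of the variational parameters and an auxiliary noise variable, so gradients can flow through it.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gaussian(mean, log_std, n_samples, rng):
    """Reparameterized sampling: z = mean + std * eps with eps ~ N(0, I).

    All randomness lives in eps, so z is a differentiable function of
    (mean, log_std) and gradient estimates can pass through the sample.
    """
    eps = rng.standard_normal((n_samples, mean.shape[0]))
    return mean + np.exp(log_std) * eps

mean, log_std = np.array([1.0, -2.0]), np.array([0.0, 0.5])
z = sample_gaussian(mean, log_std, 50_000, rng)
print(z.mean(axis=0))  # close to [1.0, -2.0]
```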
  • Possession Sketches: Mapping NBA Strategies

    Andrew C. Miller and Luke Bornn
    MIT Sloan Sports Analytics Conference, 2017
    third place, MIT SSAC Research Paper Competition
    [abstract] [pdf]

    We develop a method for automatically organizing and exploring basketball possessions, represented as player tracks, by offensive structure. Our method centers on building a data-driven dictionary of individual player actions, then fitting a global hierarchical model to all possessions, yielding a concise summary, or sketch, of the collective action taken by the offensive team in a basketball possession.
  • Learning a Similarity Measure for Dynamic Point Clouds

    Andrew C. Miller and Luke Bornn
    In submission

    Abstract: We develop a novel measure of similarity between two dynamic point clouds, where a dynamic point cloud is a collection of spatiotemporal trajectories representing multiple agents moving and interacting. Certain types of variation in trajectory data make this challenging; two dynamic point clouds may describe the same joint action, but sub-actions may occur at different speeds, spatial locations, or may be performed by different agents. As such, for the purposes of clustering and classification, known similarity measures fail. To solve this problem we construct a similarity measure in two parts. We first construct a novel distance metric between two sets of points (i.e. static point clouds). We then integrate this distance metric into dynamic time warping, yielding a similarity measure between dynamic point clouds. The resulting similarity measure is invariant to permutation of the agents and robust to spatiotemporal variation. Importantly, we describe how to differentiate through dynamic time warping in order to learn a similarity measure specific to an objective function. We use our method to describe the similarity of basketball sequences using player-tracking data from the National Basketball Association.
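    The two-part construction in the abstract can be sketched with simple stand-ins: a permutation-invariant distance between static point clouds (here, brute-force optimal agent matching), then dynamic time warping over per-frame distances. This is a simplified, non-differentiable sketch of the idea, not the paper's learned metric.

```python
from itertools import permutations

import numpy as np

def set_distance(A, B):
    """Permutation-invariant distance between two small static point clouds
    (n_points x dim): best agent matching by brute force, fine for a
    handful of agents."""
    return min(
        float(np.linalg.norm(A - B[list(p)], axis=1).sum())
        for p in permutations(range(len(B)))
    )

def dtw(seq_a, seq_b, frame_dist):
    """Dynamic time warping: aligns two frame sequences, absorbing speed
    differences, and returns the total aligned frame distance."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Same joint action: agents relabeled and the sequence played at half speed.
t = np.linspace(0.0, 1.0, 10)
seq_a = [np.array([[ti, 0.0], [0.0, ti], [ti, ti]]) for ti in t]
seq_b = [frame[[2, 0, 1]] for frame in seq_a for _ in range(2)]

print(dtw(seq_a, seq_b, set_distance))  # 0.0: invariant to relabeling and speed
```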
  • Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems

    Scott W. Linderman, Matthew J. Johnson, Andrew C. Miller, Ryan P. Adams, David M. Blei, and Liam Paninski
    AISTATS, 2017
    [abstract] [arxiv] [aistats]

    Abstract: Many natural systems, such as neurons firing in the brain or basketball teams traversing a court, give rise to time series data with complex, nonlinear dynamics. We can gain insight into these systems by decomposing the data into segments that are each explained by simpler dynamic units. Building on switching linear dynamical systems (SLDS), we present a new model class that not only discovers these dynamical units, but also explains how their switching behavior depends on observations or continuous latent states. These "recurrent" switching linear dynamical systems provide further insight by discovering the conditions under which each unit is deployed, something that traditional SLDS models fail to do. We leverage recent algorithmic advances in approximate inference to make Bayesian inference in these models easy, fast, and scalable.
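    The model class builds on the vanilla switching linear dynamical system, which is easy to simulate (a hypothetical toy; the recurrent variant in the paper additionally makes the switch probabilities depend on the continuous state):

```python
import numpy as np

rng = np.random.default_rng(0)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Two discrete modes, each with its own 2-d linear dynamics.
A = [0.99 * rotation(0.05), 0.99 * rotation(-0.3)]  # per-mode dynamics matrices
P = np.array([[0.98, 0.02], [0.02, 0.98]])          # mode transition probabilities

T, x, z = 200, np.array([1.0, 0.0]), 0
xs, zs = [], []
for _ in range(T):
    z = rng.choice(2, p=P[z])                        # discrete switch
    x = A[z] @ x + 0.01 * rng.standard_normal(2)     # linear dynamics + noise
    xs.append(x)
    zs.append(int(z))
xs = np.array(xs)
```

    Inference reverses this generative story: given only `xs`, recover the mode sequence `zs` and the per-mode dynamics, which is where the segments and their switching behavior come from.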
  • A Gaussian Process Model of Quasar Spectral Energy Distributions

    Andrew C. Miller, Albert Wu, Jeffrey Regier, Jon McAuliffe, Dustin Lang, Prabhat, David Schlegel, and Ryan P. Adams
    Advances in Neural Information Processing Systems (NIPS), 2015
    [abstract] [pdf]

    Abstract: We propose a method for combining two sources of astronomical data, spectroscopy and photometry, which carry information about sources of light (e.g., stars, galaxies, and quasars) at extremely different spectral resolutions. Our model treats the spectral energy distribution (SED) of the radiation from a source as a latent variable, hierarchically generating both photometric and spectroscopic observations. We place a flexible, nonparametric prior over the SED of a light source that admits a physically interpretable decomposition, and allows us to tractably perform inference. We use our model to predict the distribution of the redshift of a quasar from five-band (low spectral resolution) photometric data, the so-called "photo-z" problem. Our method shows that tools from machine learning and Bayesian statistics allow us to leverage multiple resolutions of information to make accurate predictions with well-characterized uncertainties.
  • Advances in nowcasting influenza-like illness rates using search query logs

    Vasileios Lampos, Andrew C. Miller, Steve Crossan, and Christian Stefansen
    Scientific Reports, 2015
    [abstract] [pdf]

    Description: This paper presents an improvement on the Google Flu Trends model, an epidemiological surveillance tool for estimating the current rate of influenza-like illness (ILI) in the population. These methods relate patterns in user search queries to historical influenza estimates to obtain real-time ILI estimates. We develop a non-linear model based on Gaussian processes and a family of autoregressive models, compare it to many previously proposed methods by assessing predictive performance over five years of flu seasons (2008-2013), and show that it achieves state-of-the-art predictive performance.
  • Celeste: Variational inference for a generative model of astronomical images

    Jeffrey Regier, Andrew C. Miller, Jon McAuliffe, Ryan Adams, Matt Hoffman, Dustin Lang, David Schlegel, and Prabhat
    International Conference on Machine Learning, 2015
    [abstract] [link]

    Abstract: We present a new, fully generative model of optical telescope image sets, along with a variational procedure for inference. Each pixel intensity is treated as a Poisson random variable, with a rate parameter dependent on latent properties of stars and galaxies. Key latent properties are themselves random, with scientific prior distributions constructed from large ancillary data sets. We check our approach on synthetic images. We also run it on images from a major sky survey, where it exceeds the performance of the current state-of-the-art method for locating celestial bodies and measuring their colors.
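    The generative story for a pixel (a Poisson count whose rate depends on latent source properties) can be illustrated in miniature (a hypothetical toy, not the Celeste model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image model: each source spreads its flux over pixels via a Gaussian
# blob, and each pixel count is Poisson with the summed rate plus sky noise.
H, W = 32, 32
ys, xs = np.mgrid[0:H, 0:W]
sources = [          # latent properties: (row, col, total brightness)
    (10.0, 12.0, 500.0),
    (22.0, 20.0, 300.0),
]
sky = 0.3
rate = np.full((H, W), sky)
for r, c, flux in sources:
    blob = np.exp(-0.5 * ((ys - r) ** 2 + (xs - c) ** 2) / 2.0)
    rate += flux * blob / blob.sum()   # each source contributes its flux
image = rng.poisson(rate)              # observed pixel counts
```

    Variational inference in this setting amounts to inverting the process: given only `image`, infer the posterior over source locations and brightnesses.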
  • Characterizing the Spatial Structure of Defensive Skill in Professional Basketball

    Alexander Franks, Andrew Miller, Luke Bornn, and Kirk Goldsberry
    The Annals of Applied Statistics, 2015
    [abstract] [AoAS] [arxiv]

    Description: We develop a spatial model to analyze the defensive ability of professional basketball players. We first define two preprocessing steps to find a representation of players and possessions, and then we define a parametric model with effects that correspond to interpretable defensive ability.
  • Counterpoints: Advanced Defensive Metrics for NBA Basketball

    Alexander Franks*, Andrew Miller*, Luke Bornn, and Kirk Goldsberry
    MIT Sloan Sports Analytics Conference, 2015
    best paper award, MIT SSAC Research Paper Competition
    [more] [pdf] [talk]

    Description: This paper develops new advanced defensive metrics for measuring the ability of professional basketball players, derived from player-tracking data. We use a who's-guarding-whom model to define a new suite of metrics designed to measure how suppressive and disruptive players are, both on average and throughout the entire possession.
  • Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball

    Andrew Miller, Luke Bornn, Ryan Adams and Kirk Goldsberry
    ICML, 2014
    [more] [arxiv]

    Description: We develop a dimensionality reduction method that can be applied to collections of point processes on a common space. Using this representation, we analyze the shooting habits of professional basketball players, create a new characterization of offensive player types and model shooting efficiency.
  • A Heterogeneous Framework for Large-Scale Dense 3-d Reconstruction from Aerial Imagery

    Andrew Miller, Vishal Jain and Joseph L. Mundy

    This paper presents a scalable system that uses multiple GPUs and CPUs to reconstruct dense 3-d models. It is a continuation of Miller et al. (2011), which constructed models of roughly 1 billion voxels, and extends the system to models in the 50-100 billion voxel range. Results are shown for building a 3-d model of an area of about 2 square kilometers (< 1 meter resolution), represented by 50 billion voxels across 4 GPUs, in near real-time.

  • A Multi-sensor Fusion Framework in 3-D

    Vishal Jain, Andrew Miller and Joseph L. Mundy
    2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
    [more] [pdf]

    This paper presents a system that fuses optical and infrared imagery to build a volumetric model. We develop a technique to tightly register multiple volumetric models, and show the benefits of the multi-modal data source by developing classifiers to label high-level features of the landscape (roads, sidewalks, pavement, buildings, etc.).
  • Real-time rendering and dynamic updating of 3-d volumetric data

    Andrew Miller, Vishal Jain and Joseph L. Mundy
    Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, ASPLOS 2011
    [more] [pdf]

    We develop and optimize a parallel, ray-tracing-inspired algorithm for both constructing and rendering a high-fidelity 3-d volumetric model from aerial imagery. The paper details the engineering effort behind an 800x speedup over serial implementations using a single GPU.


  • pydtw

    Simple, lightweight dynamic time warping implementation (and visualization) in numpy/python/cython.

  • CelestePy

    A python module for astronomical source discovery and classification.

  • Sampyl

    Sampyl is a package for sampling from probability distributions using MCMC methods. Where PyMC3 uses Theano to compute gradients, Sampyl uses autograd; you are also free to supply your own gradient functions, so autograd is not required. This project was started as a way to use MCMC samplers by defining models purely with Python and numpy.
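    The idea of defining a model as a plain Python log-probability function and sampling from it can be sketched with a generic random-walk Metropolis sampler (a minimal illustration of the workflow, not Sampyl's actual API):

```python
import numpy as np

def metropolis(logp, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis: the model is just a Python function
    returning an unnormalized log density."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = logp(x)
    chain = np.empty((n_samples, x.size))
    for i in range(n_samples):
        prop = x + step * rng.standard_normal(x.size)   # propose a move
        lp_prop = logp(prop)
        if np.log(rng.random()) < lp_prop - lp:         # accept/reject
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# The model: a standard normal log density, written in plain numpy.
logp = lambda x: -0.5 * np.sum(x ** 2)
chain = metropolis(logp, x0=np.zeros(1), n_samples=20_000)
print(chain[5000:].mean(), chain[5000:].std())  # roughly 0 and 1
```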