Andrew Miller

I am a Ph.D. candidate in computer science at Harvard's School of Engineering and Applied Sciences, focusing on statistics and machine learning.
My research centers on probabilistic modeling and approximate inference methods, with applications ranging from astronomy to basketball. I am a member of the Harvard Intelligent Probabilistic Systems (HIPS) group, advised by Ryan Adams, and I also work closely with Luke Bornn in the statistics department.


  • A Gaussian Process Model of Quasar Spectral Energy Distributions

    Andrew Miller, Albert Wu, Jeffrey Regier, Jon McAuliffe, Dustin Lang, Mr Prabhat, David Schlegel, Ryan Adams

    Neural Information Processing Systems 2015 [pdf]

    We propose a method for combining two sources of astronomical data, spectroscopy and photometry, which carry information about sources of light (e.g., stars, galaxies, and quasars) at extremely different spectral resolutions. Our model treats the spectral energy distribution (SED) of the radiation from a source as a latent variable, hierarchically generating both photometric and spectroscopic observations. We place a flexible, nonparametric prior over the SED of a light source that admits a physically interpretable decomposition and allows us to perform inference tractably. We use our model to predict the distribution of the redshift of a quasar from five-band (low spectral resolution) photometric data, the so-called "photo-z" problem. Our method shows that tools from machine learning and Bayesian statistics allow us to leverage multiple resolutions of information to make accurate predictions with well-characterized uncertainties.
  • Advances in nowcasting influenza-like illness rates using search query logs

    Vasileios Lampos, Andrew Miller, Steve Crossan, and Christian Stefansen

    Scientific Reports [pdf]

    This paper presents an improvement on the Google Flu Trends model, an epidemiological surveillance tool for measuring the current rate of influenza-like illness (ILI) in the population. These methods relate patterns in user search queries to historical influenza estimates to obtain real-time ILI estimates. We develop a non-linear model based on Gaussian processes and a family of autoregressive models. We compare it to many previously proposed methods, assessing predictive performance over five years of flu seasons, 2008-2013, and show that it obtains state-of-the-art predictive performance.
  • Celeste: Variational inference for a generative model of astronomical images

    Jeffrey Regier, Andrew Miller, Jon McAuliffe, Ryan Adams, Matt Hoffman, Dustin Lang, David Schlegel, Mr Prabhat

    Proceedings of The 32nd International Conference on Machine Learning, pp. 2095–2103, 2015 [link]

    We present a new, fully generative model of optical telescope image sets, along with a variational procedure for inference. Each pixel intensity is treated as a Poisson random variable, with a rate parameter dependent on latent properties of stars and galaxies. Key latent properties are themselves random, with scientific prior distributions constructed from large ancillary data sets. We check our approach on synthetic images. We also run it on images from a major sky survey, where it exceeds the performance of the current state-of-the-art method for locating celestial bodies and measuring their colors.
  • Characterizing the Spatial Structure of Defensive Skill in Professional Basketball

    Alexander Franks, Andrew Miller, Luke Bornn, and Kirk Goldsberry

    The Annals of Applied Statistics [AoAS] [arxiv]

    We develop a spatial model to analyze the defensive ability of professional basketball players. We first define two preprocessing steps to find a representation of players and possessions, and then we define a parametric model with effects that correspond to interpretable defensive ability.
  • Counterpoints: Advanced Defensive Metrics for NBA Basketball

    Alexander Franks*, Andrew Miller*, Luke Bornn, and Kirk Goldsberry

    MIT Sloan Sports Analytics Conference, 2015 [pdf] [talk]

    Best paper award.

    This paper describes some advanced defensive metrics for NBA basketball, derived from player tracking data. We use the "who's guarding whom" model from this paper to define a new suite of metrics designed to measure how suppressive and disruptive players are, both on average and throughout the entire possession.
  • Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball

    Andrew Miller, Luke Bornn, Ryan Adams and Kirk Goldsberry

    International Conference on Machine Learning (ICML), 2014 [arxiv]

    We develop a dimensionality reduction method that can be applied to collections of point processes on a common space. Using this representation, we analyze the shooting habits of professional basketball players, create a new characterization of offensive player types and model shooting efficiency.
  • A Heterogeneous Framework for Large-Scale Dense 3-d Reconstruction from Aerial Imagery

    Andrew Miller, Vishal Jain and Joseph L. Mundy

    IEEE Transactions on Parallel and Distributed Systems (submitted for review)

    This paper presents a scalable system of multiple GPUs and CPUs to reconstruct dense 3-d models. This is a continuation of Miller 2011 (which constructed models of size ~1 billion voxels) that extends the system to models in the 50-100 billion voxel range. Results are shown for building a 3-d model of an area of about 2 square kilometers (< 1 meter resolution) represented by 50 billion voxels over 4 GPUs in near real-time.
  • A Multi-sensor Fusion Framework in 3-D

    Vishal Jain, Andrew Miller and Joseph L. Mundy

    2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) [pdf]

    This paper presents a system that fuses optical and infrared imagery to build a volumetric model. We develop a technique to tightly register multiple volumetric models, and show the benefits of the multi-modal data source by developing classifiers to label high-level features of the landscape (road, sidewalk, pavement, buildings, etc.).
  • Real-time rendering and dynamic updating of 3-d volumetric data

    Andrew Miller, Vishal Jain and Joseph L. Mundy

    Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, ASPLOS 2011 [pdf]

    We develop and optimize a parallel ray tracing-inspired algorithm for both constructing and rendering a high-fidelity 3-d volumetric model from aerial imagery. This paper describes the engineering effort to gain an 800x speedup over serial implementations using a single GPU.


  • pydtw

    Simple, lightweight dynamic time warping implementation (and visualization) in numpy/python/cython.

  • CelestePy

    A Python module for astronomical source discovery and classification.

  • Sampyl

    Sampyl is a package for sampling from probability distributions using MCMC methods. Just as PyMC3 uses theano to compute gradients, Sampyl uses autograd. However, autograd is not required: you are free to write your own gradient functions. This project was started as a way to use MCMC samplers by defining models purely with Python and numpy.


(in progress)