Yaron Singer
contact teaching papers research about

For the past few years I have been motivated by the following question: what are the powers and limitations of algorithms when the input is learned from data? This question led to independent lines of work that include exponential acceleration of optimization algorithms, optimization under noise, and robust machine learning. The description below is a summary of my research statement.

Limitations and possibilities of data-driven optimization

The traditional approach in optimization assumes there is an objective function that is either known or accessible via black-box to the algorithm designer, and the goal is to optimize the objective function under some constraints. In a routing problem, for example, we are given a weighted graph which models congestion times between intersections in a traffic network. The objective function that we aim to optimize is the time from one source in the network to a destination. When we design an algorithm to solve a routing problem we assume the objective is known.

In practice, however, we often do not actually know the true objective functions we wish to optimize since they depend on the behavior of the world generating the model. When solving routing problems in the real world, we do not know our objective function since it depends on congestion times that will realize in the future.

Since we do not know the true objective function, we learn a surrogate function from data and then run an optimization algorithm on the surrogate function and use the solution of the surrogate as the solution for the true objective function we aim to optimize. But what guarantees do we have then? In other words, can we actually optimize objective functions from the training data we use to learn them?

We initiated the study of this question a few years ago and proved a series of stark impossibilities: there are classes of objective functions that are statistically learnable, amendable to optimization, and heavily used in practice, but any algorithm that obtains a reasonable approximation guarantee to the optimal solution requires exponentially-many (in the function's dimension) training samples. This result holds for maximizing cover and submodular objectives (STOC 2017), problems whose optimal solution can be obtained in polynomial time like submodular minimization (NIPS 2017) and continuous optimization problems like convex minimization (COLT 2017).

These depressing impossibilities beg the question: when and how can we optimize objectives learned from data? To address this question, we have taken several approaches. We showed we can obtain strong approximation guarantees when we have assumptions about the objective functions such as bounded curvature (NIPS 2016), or community structure and stochastic block models in social networks (NIPS 2017). Another approach is to consider different optimization criteria to capture optimal solutions with respect to a distribution (ICML 2018). The latest approach we considered evolved into the adaptive complexity model which led to our results on exponential acceleration of combinatorial optimization algorithms.

Exponential acceleration of optimization algorithms

The questions about limitations and possibilities of data-driven optimization gave birth to a major line of work we initiated and have been pushing in the past two years. This line of work is on algorithms that achieve exponential speedups in parallel runtime for submodular maximization and other related problems. Until our work, for the past 40 years the best known parallel running time to obtain a constant-factor approximation for canonical problems in submodular maximization was linear in the size of the data. Our first work (STOC 2018) showed how to obtain strong constant factor approximations for such problems in parallel time that is logarithmic in the size of the data. Consequently, this shows that an extremeley broad range of optimization problems can now be solved exponentially faster.

Following this work we improved the approximation guarantee (ICML 2018) and later showed how to obtain optimal approximation guarantees with exponential acceleration for cannoical problems of submodular maximization under cardinality constaints (SODA 2019) and the notoriously harder problem of submodular maximization under matroid constraints (STOC 2019). We also showed that exponential acceleration with desirable approximation guarantees is achievable for problems that are not submodular like feature selection (NeurIPS 2019).

Beyond the theory, we've been pushing for the design of optimization algorithms whose actual parallel runtime on real problem instances is fast. The problem is that when we design algorithms for optimal theoretical guarantees, the algorithms typically depend on huge constants and polynomial dependencies in precision parameters that make them computationally infeasible. In our recent work we design algorithms with optimal approximation guarantees that are exponentially faster in theory and are three orders of magnitute faster than the state-of-the-art.

The line of work on adaptivity emerged from our investigation of data-driven optimization. The technical reason that we cannot optimize functions learned from data is due to the fact that the optimization algorithms become non-adaptive in this setting. That is, the algorithm is given samples of the objective function without the ability to adapt its queries to the function values it observes. The natural question was then: how many rounds of adaptivity (or samples) does an algorithm need to solve an optimization problem. Our work shows that the answer is logarithmic in the dimension of the function, and not linear as was previouly conjectured. In our original work (STOC 2018) we also proved a lower bound, showing that logarthmically-many rounds is not only sufficient but necessary to achieve reasonable (i.e. constant) approximation guarantees.

Optimization under noise

In order to understand data-driven optimization, we began exploring optimization of functions subject to noise. The motivation is simple: when we learn an objective function, we generate a noisy surrogate function which approximates the original objective function generating the data. There has been a long line of research on learning submodular functions where the goal is to an adversarially noisy surrogate of the submodular function. Since we are interested in optimizing functions learned from data, we would hope that a surrogate learned from samples can be used to optimize the original function that generates the samples.

We first showed that small adversarial errors in function evaluation makes standard optimization problems infeasible. This includes convex minimization (NIPS 2015) submodular optimization (COLT 2017), even when the error vanishes as the size of the data grows (NIPS 2016). The main result however is that if the noise is not adversarially contructed then optimal guarantees are achievable. We first showed for most regimes of the problem this with a novel combinatorial optimization technique called smoothing (COLT 2017) and later for all regimes of the problem using a new technique we call sampled mean approximation (NIPS 2018). This line of work also touch on stochastic noise, which led us to prove robust guarantees for stochastic greedy algorithms (ICML 2017).

Our work on optimization under noise also led us to investiage the fundamental problem in statisitcs of selecting a best arm in a multiarm bandit setting. The gap between the best upper and lower bound for this problem is a constant factor of around 300,000. In recent work we give a new optimal algorithm and lower bound for selecting a best arm which reduce this gap to 1.

Robust Machine Learning

Our work on noise robust optimization continued to developing machine learning models that are robust to adversarial examples. There has recently been a great deal of work showing that small, adversarial, pertubations in data can lead to failure of state-of-the-art machine learning models. Machine learning models are trained by optimizing a loss function defined by the data, and the reason the models fail when the data is pertubed is because the optimization algorithms are not robust to noise.

We showed how to use boosting to design robust classifiers (NIPS 2017) as well as for designing optimal adversarial attacks on machine learning models.

Recently we founded Robust Intelligence to confront real-world problems in this space and build scalable soltuions that can allow for safe deployment of machine learning.