CS 229r: Algorithms for Big Data
Big data is data so large that it does not fit in the main memory of a
single machine. The need to process such data with efficient algorithms
arises in Internet search, network traffic monitoring, machine learning,
scientific computing, signal processing, and several other areas. This
course will cover mathematically rigorous models for developing such
algorithms, as well as some provable limitations of algorithms operating
in those models.
This course is intended for both graduate students and advanced undergraduate students who satisfy the prerequisites below.
Some topics we will cover:
- Sketching and Streaming. Extremely small-space data
structures that can be updated on the fly in a fast-moving stream
- Dimensionality reduction. General techniques and
impossibility results for reducing data dimension while still
preserving geometric structure.
- Numerical linear algebra. Algorithms for big matrices
(e.g. a user/product rating matrix for Netflix or
Amazon). Regression, low rank approximation, matrix completion,
- Compressed sensing. Recovery of (approximately) sparse
signals based on few linear measurements.
- External memory and cache-obliviousness. Algorithms
and data structures minimizing I/Os for data that does not fit in
memory but does fit on disk. B-trees, buffer trees, multiway
- MapReduce/Hadoop. Mathematical models for
designing/analyzing algorithms for these systems.
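As a rough illustration of the small-space streaming summaries named in the first topic above, here is a minimal sketch of the Misra-Gries frequent-items algorithm. It is chosen here only as a well-known example and is not necessarily an algorithm covered in lecture. The function name `misra_gries`, the parameter `k`, and the sample stream are assumptions made for this example. With at most k - 1 counters, each stored count undercounts the true frequency of its item by at most n/k, where n is the stream length.

```python
def misra_gries(stream, k):
    """Summarize a stream using at most k - 1 counters (hypothetical helper).

    For every item, the returned count c satisfies f - n/k <= c <= f,
    where f is the item's true frequency and n is the stream length.
    Items missing from the summary are treated as having count 0.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count it.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A counter is free: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and
            # drop any that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Illustrative stream (made up for this example).
stream = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
summary = misra_gries(stream, k=3)
print(summary)
```

The summary is updated one item at a time and never holds more than k - 1 entries, no matter how long the stream is. The price is that the counts are only approximate, which is typical of the trade-off between space and accuracy in this setting.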
- More scribe notes for each lecture here, courtesy of Sam Elder.
- The class now has a Piazza site for Q&A.
- Sign up for the course mailing list here.
Office hours: Mondays 10am-12pm, Maxwell Dworkin 125 (Jelani).
Mondays 4-6pm, Maxwell Dworkin 138 (Thomas).
- Lecture time: Tuesday & Thursday
- First lecture: Tuesday, September 3, 2013
- Lecture room: Pound Hall 201
- Harvard College/GSAS Catalog Number: 3730
- Contact: Email
Mathematical maturity and comfort with algorithms (e.g. CS 124), discrete probability, and linear algebra.
- Scribing lectures (10%). See
- Homework (40%). See
- Final project, paper (40%) and presentation (10%).
See project page.
Homework solutions, scribe notes, and final projects must be
typeset in LaTeX. If you are not familiar with LaTeX, see
this introduction. The lecture and assignment pages also have
templates to get you started.
There is no textbook for this class (we will rely on our scribe
notes). Here is an incomplete list of courses at other institutions
that cover overlapping material and have scribe notes:
This website's layout and some course policies have been borrowed
from this course.