CS 229r: Algorithms for Big Data
Big data is data so large that it does not fit in the main memory of a single
machine. The need to process such data with efficient algorithms arises in Internet search, network traffic
monitoring, machine learning, scientific computing, signal
processing, and several other areas. This course will cover
mathematically rigorous models for developing such algorithms, as well as some provable limitations of
algorithms operating in those models. Some topics we will cover
include:
 Sketching and Streaming. Extremely small-space data
structures that can be updated on the fly in a fast-moving stream
of input.
 Dimensionality reduction. General techniques and
impossibility results for reducing data dimension while still
preserving geometric structure.
 Numerical linear algebra. Algorithms for big matrices
(e.g. a user/product rating matrix for Netflix or
Amazon). Regression, low rank approximation, matrix completion,
...
 Compressed sensing. Recovery of (approximately) sparse
signals based on few linear measurements.
 External memory and cache-obliviousness. Algorithms
and data structures minimizing I/Os for data not fitting in
memory but fitting on disk. B-trees, buffer trees, multiway
mergesort, ...
 MapReduce/Hadoop. Mathematical models for
designing/analyzing algorithms for these systems.
This course is intended for both graduate students and advanced undergraduate students who satisfy the prerequisites below.
Announcements
 More scribe notes for each lecture here, courtesy of Sam Elder.
 The class now has a Piazza site for Q&A.
 Sign up for the course mailing list here.

Office hours: Mondays 10am-12pm, Maxwell Dworkin 125 (Jelani).
Mondays 4-6pm, Maxwell Dworkin 138 (Thomas).
Specifics
 Lecture time: Tuesday & Thursday
11:30–1
 First lecture: Tuesday, September 3, 2013
 Lecture room: Pound Hall 201
 Harvard College/GSAS Catalog Number: 3730
 Contact: Email
cs229rf13staff@seas.harvard.edu
Prerequisites
Mathematical maturity and comfort with algorithms (e.g. CS 124), discrete probability, and linear algebra.
Grading
 Scribing lectures (10%). See
lectures page.
 Homework (40%). See
assignments page.
 Final project, paper (40%) and presentation (10%).
See project page.
Homework solutions, scribe notes, and final projects must be
typeset in LaTeX. If you are not familiar with LaTeX, see
this introduction. The lecture and assignment pages also have
templates to get you started.
Textbook
There is no textbook for this class (we will rely on our scribe
notes). Also, here is a(n incomplete) list of courses with scribe notes for
overlapping material taught at other institutions:
This website's layout and some course policies have been borrowed
from this course.