CS 229r: Algorithms for Big Data
Prof. Jelani Nelson TF: Jarosław Błasiok
Big data is data so large that it does not fit in the main memory
of a single machine. The need to process such data with efficient
algorithms arises in Internet search, network traffic monitoring,
machine learning, scientific computing, signal processing, and
several other areas. This course will cover mathematically rigorous
models for developing such algorithms, as well as some provable
limitations of algorithms operating in those models. Some topics we
will cover include:
 Sketching and Streaming. Extremely small-space data
structures that can be updated on the fly in a fast-moving stream
of input.
 Dimensionality reduction. General techniques and
impossibility results for reducing data dimension while still
preserving geometric structure.
 Numerical linear algebra. Algorithms for big matrices
(e.g. a user/product rating matrix for Netflix or
Amazon). Regression, low rank approximation, matrix completion,
...
 Compressed sensing. Recovery of (approximately) sparse
signals based on few linear measurements.
 External memory and cache-obliviousness. Algorithms
and data structures minimizing I/Os for data that does not fit in
memory but does fit on disk. B-trees, buffer trees, multiway
mergesort, ...
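To give a flavor of the first topic, here is a minimal sketch of the Misra-Gries frequent-items summary, one classic small-space structure that is updated as each stream item arrives. It is an illustrative choice only and not necessarily the algorithm presented in lecture; the function name `misra_gries` and the parameter `k` are assumptions made for this example.

```python
def misra_gries(stream, k):
    """Summarize a stream using at most k - 1 counters.

    For a stream of length n, each returned count underestimates the
    item's true frequency by at most n / k; items absent from the
    summary have true frequency at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # "a" appears 6 times in a stream of length 10; with k = 3 the
    # summary keeps at most 2 counters.
    print(misra_gries("aaabacadae", 3))  # prints {'a': 4}
```

The summary never stores more than `k - 1` items regardless of the stream length, which is the sense in which such structures are "small-space".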
This course is intended for both graduate students and advanced undergraduate students who satisfy the prerequisites below.
Announcements

The course now has a Piazza site.

The course time and room have changed; see below.

Email us at
cs229rf15staff@seas.harvard.edu
to be added to the course mailing list.

Office hours: Mondays 4-6pm, Maxwell Dworkin 125 (Jelani).
Fridays 2-4pm, Maxwell Dworkin 138 (Jarosław).
Specifics
 Lecture time: Tuesday & Thursday
11:30am–1pm
 First lecture: Thursday, September 3, 2015
 Lecture room: Maxwell Dworkin G115
 Harvard College/GSAS Catalog Number: 3730
 Contact: Email
cs229rf15staff@seas.harvard.edu
Prerequisites
Mathematical maturity and comfort with algorithms (e.g. CS 124), discrete probability, and linear algebra.
Grading
 Scribing lectures (10%). See
lectures page.
 Homework (40%). See
assignments page.
 Final project, paper (40%) and presentation (10%).
See project page.
Homework solutions, scribe notes, and final projects must be
typeset in LaTeX. If you are not familiar with LaTeX, see
this introduction. The lecture and assignment pages also have
templates to get you started.
Textbook
There is no textbook for this class; we will rely on our scribe
notes. Here is an incomplete list of courses at other institutions
with scribe notes covering overlapping material:
This website's layout and some course policies have been borrowed
from this course.