CS 229r: Algorithms for Big Data
Prof. Jelani Nelson TF: Jarosław Błasiok
Big data is data so large that it does not fit in the main memory
of a single machine. The need to process such data with efficient
algorithms arises in Internet search, network traffic monitoring,
machine learning, scientific computing, signal processing, and
several other areas. This course will cover mathematically rigorous
models for developing such algorithms, as well as some provable
limitations of algorithms operating in those models. Some topics we
will cover include:
- Sketching and Streaming. Extremely small-space data
structures that can be updated on the fly in a fast-moving stream
of input.
- Dimensionality reduction. General techniques and
impossibility results for reducing data dimension while still
preserving geometric structure.
- Numerical linear algebra. Algorithms for big matrices
(e.g. a user/product rating matrix for Netflix or
Amazon). Regression, low rank approximation, matrix completion,
...
- Compressed sensing. Recovery of (approximately) sparse
signals based on few linear measurements.
- External memory and cache-obliviousness. Algorithms
and data structures that minimize I/Os for data that does not fit
in memory but fits on disk. B-trees, buffer trees, multiway
mergesort, ...
This course is intended for both graduate students and advanced undergraduate students who satisfy the prerequisites below.
Announcements
- The course now has a Piazza site.
- The course time and room have changed; see below.
- Email us at cs229r-f15-staff@seas.harvard.edu to be added to the
course mailing list.
- Office hours: Mondays 4-6pm, Maxwell Dworkin 125 (Jelani).
Fridays 2-4pm, Maxwell Dworkin 138 (Jarosław).
Specifics
- Lecture time: Tuesday & Thursday, 11:30am–1pm
- First lecture: Thursday, September 3, 2015
- Lecture room: Maxwell Dworkin G115
- Harvard College/GSAS Catalog Number: 3730
- Contact: Email cs229r-f15-staff@seas.harvard.edu
Prerequisites
Mathematical maturity and comfort with algorithms (e.g. CS 124), discrete probability, and linear algebra.
Grading
- Scribing lectures (10%). See lectures page.
- Homework (40%). See assignments page.
- Final project, paper (40%) and presentation (10%). See project page.
Homework solutions, scribe notes, and final projects must be
typeset in LaTeX. If you are not familiar with LaTeX, see
this introduction. The lecture and assignment pages also have
templates to get you started.
Textbook
There is no textbook for this class; we will rely on our scribe
notes. Also, here is an incomplete list of courses with scribe notes for
overlapping material taught at other institutions:
This website's layout and some course policies have been borrowed
from this course.