Category Archives: 10 Lectures and 42 Open Problems

18.S096: Johnson-Lindenstrauss Lemma and Gordon’s Theorem

The fifth set of Lecture notes for my course is available here. They are about dimension reduction, the Johnson-Lindenstrauss Lemma, and Gordon's Escape Through a Mesh Theorem, and they include three open problems. As usual, I will document the open problems here, while referring to the notes for a much more detailed description of the problems, including partial progress.
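Though the details are in the notes, the flavor of the Johnson-Lindenstrauss Lemma is easy to demonstrate numerically: a random Gaussian projection nearly preserves all pairwise distances between a set of points. A minimal sketch of my own (not from the notes; the dimensions n, d, k below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 2_000, 500  # 50 points in dimension 2,000, projected down to 500

X = rng.standard_normal((n, d))  # a random point cloud

# Gaussian projection, scaled so squared lengths are preserved in expectation.
G = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ G

def pairwise_dists(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

i, j = np.triu_indices(n, 1)
ratios = pairwise_dists(Y)[i, j] / pairwise_dists(X)[i, j]
print(ratios.min(), ratios.max())  # all distance ratios concentrate near 1
```

The lemma makes this quantitative: a target dimension of order log(n)/ε² suffices for all pairwise distances to be preserved up to a factor 1 ± ε.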



18.S096: Concentration Inequalities, Scalar and Matrix Versions

The fourth set of Lecture notes for my course is available here. This one is about large deviation and concentration inequalities for sums of independent scalar or matrix random variables. It also includes five open problems involving the concentration of certain random matrices. As usual, I will document the open problems here, while referring to the notes for a much more detailed description of the problems, including partial progress.
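As a quick illustration of the scalar case (a toy example of my own, not taken from the notes), Hoeffding's inequality bounds the tail of an average of independent bounded random variables, and a simulation shows the empirical tail sitting below the bound:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, t = 200, 20_000, 0.1

# Averages of n independent Rademacher (+/-1) variables, over many trials.
S = rng.choice([-1.0, 1.0], size=(trials, n)).mean(axis=1)

empirical = (S >= t).mean()
# Hoeffding: P(mean >= t) <= exp(-n t^2 / 2) for variables in [-1, 1].
hoeffding = np.exp(-n * t ** 2 / 2)
print(empirical, hoeffding)  # empirical tail frequency is below the bound
```

The matrix versions in the notes replace the average of scalars by an average of independent random matrices, with the tail bound controlling the operator norm of the deviation.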


18.S096: Spectral Clustering and Cheeger’s Inequality

A new set of Lecture notes for my course is available here. This one is about spectral clustering and Cheeger's Inequality. In a nutshell, spectral clustering can be seen as attempting to cluster a graph by clustering the corresponding points of its Diffusion Maps embedding, and Cheeger's Inequality provides a performance guarantee (for the case of two clusters). Take a look!
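To make the nutshell concrete, here is a minimal sketch of two-way spectral clustering on a graph with two planted communities (a toy stochastic-block-model instance of my own; the edge densities 0.8 and 0.05 are arbitrary choices). The sign pattern of the second-smallest eigenvector of the normalized Laplacian proposes the cut:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
labels = np.array([0] * 20 + [1] * 20)  # two planted clusters of 20 nodes

# Random graph: edge probability 0.8 within clusters, 0.05 across.
P = np.where(labels[:, None] == labels[None, :], 0.8, 0.05)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T  # symmetric adjacency, no self-loops

# Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
deg = A.sum(axis=1)
Dinv = np.diag(1.0 / np.sqrt(deg))
L = np.eye(n) - Dinv @ A @ Dinv

# The sign pattern of the second-smallest eigenvector gives a two-way cut.
eigvals, eigvecs = np.linalg.eigh(L)
cut = eigvecs[:, 1] > 0

# Agreement with the planted clusters, up to relabeling the two sides.
agree = max((cut == labels).mean(), (cut != labels).mean())
print(agree)  # close to 1 on this well-separated instance
```

Cheeger's Inequality relates the second eigenvalue of L to the conductance of the best cut, which is what guarantees this procedure cannot do too badly.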

As usual, I will document the open problems here. I remind the reader that there is a much more detailed description of the problems in the notes, including partial progress.


18.S096: Graphs, Diffusion Maps, and Semi-supervised Learning

The second set of Lecture notes for my course is now available here. This week's notes are about graphs, embeddings of graphs in Euclidean space (focusing on Diffusion Maps), and relations between the behavior of a graph-based semi-supervised learning method and the Sobolev Embedding Theorem. Given the nature of these topics, these notes have many more images than usual; take a look!
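As a toy illustration of a Diffusion Maps-style embedding (my own sketch, not from the notes): for the cycle graph, the two eigenvectors of the normalized adjacency just below the trivial one embed the vertices on a circle, recovering the geometry of the graph:

```python
import numpy as np

n = 60
# Adjacency matrix of the cycle graph on n vertices.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

# Symmetrically normalized adjacency D^{-1/2} A D^{-1/2} (every degree is 2 here).
deg = A.sum(axis=1)
M = A / np.sqrt(np.outer(deg, deg))

# Eigenvectors come back in ascending order; the last is the trivial
# (constant) one, and the next two give the 2-D embedding coordinates.
eigvals, eigvecs = np.linalg.eigh(M)
phi = eigvecs[:, -3:-1]

# For the cycle, the embedded points lie on a circle: constant distance to origin.
radii = np.linalg.norm(phi, axis=1)
print(radii.std() / radii.mean())  # essentially zero
```

For less symmetric graphs the same recipe, with eigenvectors weighted by powers of their eigenvalues, gives the Diffusion Maps embedding discussed in the notes.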

The notes also describe three open problems that I would like to document here; there is a much more detailed description of the problems in the notes, including partial progress.


10 Lectures and 42 Open Problems in Mathematics of Data Science

This upcoming fall, I am teaching a special topics course at the Math Department at MIT, called Topics in Mathematics of Data Science. This will be a mostly self-contained, research-oriented course focusing on the theoretical aspects of algorithms that aim to extract information from data.

I have divided the content of the class into ten topics (or “lectures”), which I describe below. Perhaps the biggest novelty is that I have decided to present a number of open problems in each of these lectures. Given that this list of problems (and their descriptions) may be of interest to the readers of this blog, I plan to include short versions of the lecture notes as blog posts (linking to the full lecture notes) and to describe a total of forty open problems over the course of ten future posts. I am hoping interesting discussions about some of these problems arise from comments on these posts!

This “post zero” serves as an announcement for the class (if you are a student at MIT, think about taking the class!) and a warm-up for the open problems; I am including two below. But first, the content of the class:
