I have just added an extra open problem (4.6) to the fourth set of lecture notes. I am documenting it here.

# 18.S096: Community Detection and the Stochastic Block Model

A new set of lecture notes about community detection and recovery in the stochastic block model is available here, including five open problems. As usual, I will document the open problems here, while referring the reader to the notes for a much more detailed description of the problems, including partial progress.

Continue reading 18.S096: Community Detection and the Stochastic Block Model

# 18.S096: Group Testing and Error-Correcting Codes

A new set of lecture notes is available here. These are about group testing and contain a very brief “crash course” on error-correcting codes. They also include five open problems. As usual, I will document the open problems here, while referring the reader to the notes for a much more detailed description of the problems.

Continue reading 18.S096: Group Testing and Error-Correcting Codes

# 18.S096: Johnson-Lindenstrauss Lemma and Gordon’s Theorem

The fifth set of lecture notes for my course is available here. They are about dimension reduction, the Johnson-Lindenstrauss Lemma, and Gordon’s Escape Through a Mesh Theorem, and they include three open problems. As usual, I will document the open problems here, while referring the reader to the notes for a much more detailed description of the problems, including partial progress.
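As a quick illustration (my own toy sketch, not taken from the notes), the Johnson-Lindenstrauss Lemma guarantees that a random linear map into a much lower dimension approximately preserves pairwise distances. A Gaussian random projection makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 300  # 50 points in R^1000, projected down to R^300

X = rng.standard_normal((n, d))
# Gaussian projection with variance 1/k, so squared norms are preserved in expectation.
Pi = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ Pi.T

# Compare one pairwise distance before and after projection.
i, j = 0, 1
orig = np.linalg.norm(X[i] - X[j])
proj = np.linalg.norm(Y[i] - Y[j])
ratio = proj / orig
print(f"distance ratio after projection: {ratio:.3f}")  # close to 1
```

The target dimension k here is only for illustration; the lemma says a dimension logarithmic in the number of points already suffices for a fixed distortion.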

Continue reading 18.S096: Johnson-Lindenstrauss Lemma and Gordon’s Theorem

# 18.S096: Concentration Inequalities, Scalar and Matrix Versions

The fourth set of lecture notes for my course is available here. This one is about large deviation and concentration inequalities for sums of independent scalar or matrix random variables. It also has five open problems involving concentration of certain random matrices. As usual, I will document the open problems here, while referring the reader to the notes for a much more detailed description of the problems, including partial progress.
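To give a flavor of the scalar case (an illustrative sketch of my own, not code from the notes), Hoeffding’s inequality bounds the tail of an average of independent bounded random variables; for Rademacher signs it can be checked against simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, t = 200, 20000, 0.1

# Empirical estimate of P(|mean of n Rademacher variables| >= t).
samples = rng.choice([-1.0, 1.0], size=(trials, n)).mean(axis=1)
empirical = np.mean(np.abs(samples) >= t)

# Hoeffding for variables in [-1, 1]: P(|S_n / n| >= t) <= 2 exp(-n t^2 / 2).
bound = 2 * np.exp(-n * t**2 / 2)
print(f"empirical tail: {empirical:.3f}, Hoeffding bound: {bound:.3f}")
```

The bound is loose at this scale but, crucially, decays exponentially in n, which is the phenomenon the matrix versions in the notes generalize.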

Continue reading 18.S096: Concentration Inequalities, Scalar and Matrix Versions

# 18.S096: Spectral Clustering and Cheeger’s Inequality

A new set of lecture notes for my course is available here. This one is about spectral clustering and Cheeger’s Inequality. In a nutshell, spectral clustering can be seen as attempting to cluster a graph by clustering the corresponding points of its Diffusion Maps embedding, and Cheeger’s Inequality provides a performance guarantee (for the case of two clusters). Take a look!
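For the two-cluster case, this can be sketched in a few lines (a toy illustration of my own, not code from the notes): partition the vertices by the sign of the second eigenvector of the normalized Laplacian, the object Cheeger’s Inequality controls.

```python
import numpy as np

# Toy graph: two 4-node cliques joined by a single weak edge.
A = np.zeros((8, 8))
for block in (range(4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0  # bridge between the two communities

deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(8) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian

# Eigenvector of the second-smallest eigenvalue; its sign pattern
# gives the two-way partition.
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
print(labels)
```

On this toy graph the sign pattern recovers the two cliques exactly; in general the relaxation is only approximate, which is exactly what Cheeger’s Inequality quantifies.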

As usual, I will document the open problems here. I remind the reader that there is a much more detailed description of the problems in the notes, including partial progress.

Continue reading 18.S096: Spectral Clustering and Cheeger’s Inequality

# 18.S096: Graphs, Diffusion Maps, and Semi-supervised Learning

The second set of lecture notes for my course is now available here. This week’s notes are about graphs, embeddings of graphs in Euclidean space (focusing on Diffusion Maps), and relations between the behavior of a graph-based semi-supervised learning method and the Sobolev Embedding Theorem. Given the nature of these topics, the notes have a lot more images than usual; take a look!
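As a minimal sketch (my own toy example, not from the notes), a Diffusion Maps embedding is built from eigenvectors of the random-walk matrix of the graph, scaled by powers of the corresponding eigenvalues:

```python
import numpy as np

# Toy path graph on 6 nodes; diffusion maps embed its vertices in R^2.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

deg = A.sum(axis=1)
P = A / deg[:, None]  # random-walk transition matrix

# Right eigenvectors of P, sorted by decreasing eigenvalue;
# the top one is the trivial constant eigenvector and is skipped.
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
lam, phi = vals.real[order], vecs.real[:, order]

t = 2  # diffusion time
# 2-dimensional diffusion map: i -> (lam_1^t phi_1(i), lam_2^t phi_2(i))
embedding = np.column_stack([(lam[1] ** t) * phi[:, 1],
                             (lam[2] ** t) * phi[:, 2]])
print(embedding.shape)
```

Raising the eigenvalues to the power t damps the higher-frequency coordinates, so the embedding emphasizes the large-scale connectivity of the graph as t grows.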

The notes also describe three open problems that I would like to document here; there is a much more detailed description of the problems in the notes, including partial progress.

Continue reading 18.S096: Graphs, Diffusion Maps, and Semi-supervised Learning

# 18.S096: Principal Component Analysis in High Dimensions and the Spike Model

I just made available here the first set of lecture notes for my course this fall. The notes include three open problems that I will present below. In short, the content of the lecture is as follows:

Continue reading 18.S096: Principal Component Analysis in High Dimensions and the Spike Model

# 10 Lectures and 42 Open Problems in Mathematics of Data Science

This upcoming fall, I am teaching a special topics course at the Math Department at MIT, called Topics in Mathematics of Data Science. This will be a mostly self-contained research-oriented course focusing on the theoretical aspects of algorithms that aim to extract information from data.

I have divided the content of the class into ten topics (or “lectures”), which I describe below. Perhaps the biggest novelty is that I have decided to present a number of *open problems* in each of these lectures. Given that this list of problems (and their descriptions) may be of interest to readers of this blog, I plan to include short versions of the lecture notes as blog posts (linking to the full lecture notes) and describe a total of forty open problems over the course of ten future posts. I am hoping that interesting discussions about some of these problems arise from comments on these posts!

This “post zero” serves as an announcement for the class (*if you are a student at MIT, think about taking the class!*) and a warm-up for the open problems; I am including two below. But first, the content of the class:

Continue reading 10 Lectures and 42 Open Problems in Mathematics of Data Science

# Courant Institute of Mathematical Sciences

Finally, I am no longer in the job market! I am excited to announce that I will join the Courant Institute of Mathematical Sciences as an Assistant Professor in the Department of Mathematics with a joint appointment in the Center for Data Science!

I will join Courant in the Summer of 2016; until then, I am spending a year in the Department of Mathematics at MIT as an Instructor of Applied Mathematics.

Warm thanks to all the other departments that hosted me this Spring and to all of the people who helped me enjoy each and every visit, rendering my final decision extremely difficult! I had an amazing time, albeit a completely exhausting one.