I just made available here the first lecture note for my course this fall. It includes three open problems that I will present below. In short, the content of the lecture is as follows:

It starts with a description (and derivation) of Principal Component Analysis and follows by (using Random Matrix Theory) attempting to understand the performance of Principal Component Analysis by looking into the distribution of the spectrum of the the sample covariance matrix when the true distributional covariance is the identity, arriving at the Marchenko-Pastur distribution.

A natural question is then what happens if the distributional covariance matrix has a low dimensional structure, and a particularly simple such example is a rank 1 spike . We derive, in that case, for which values of we expect to see also a spike on the eigenvalue distribution of the sample covariance matrix (an eigenvalue popping out of the support of the Marchenko-Pastur), and realize that there is a critical value of for which this starts happening (this is generally referred to as the BPP transition).

Take a look at it here!

Now for the Open problems:

**Open Problem 1.1.**

A problem by Mallat and Zeitouni that I discussed on this blog, see here.

**Open Problem 1.2.**

A problem regarding the monotonicity of the average singular value of a Gaussian matrix, also discussed previously in this blog, see here.

**Open Problem 1.3.**

**What follows is a problems posed by Andrea Montanari and Subhabrata Sen**

Let denote a symmetric Wigner matrix with i.i.d. entries . Also, given symmetric, define:

Define as

What is the value of , defined as

It is known that (see Montanari and Sen’s paper and/or my notes). A reasonable conjecture is that it is equal to . Remarkably , this would imply that the SDP relaxation for clustering under the Stochastic Block Model on 2 clusters is also optimal for detection.

### Like this:

Like Loading...

*Related*

Hi

It might be silly, But am I ask how do we know that \sum \beta_i = 0.

Thanks

Hi nikbakhtian,

Thanks for the question. The point is that if then the term can be absorbed into . Say we take , then and we can just update by . Setting is a natural way to avoid this ambiguity.

Does this make sense?

Best,

Afonso

Yes it makes sense. But in fact $sum \beta_{i} = 0$. I think I found a way to demonstrate it. Restating the minimisation of error like a least sure regression, adding the translation part into matrix V, a column of 1s, normalised, calling the new matrix A. The last column of $AA_{T}$ has to be perpendicular to affine subspace, and that is enough demonstrate that the affine subspace has to pass through the centroid , $\mu_{n} \$.

Hi nikbakhtian,

In general there is an ambiguity, let’s say that , then we could take and it is clear that, for any taking would give a solution with no error. But clearly, all these solution are equivalent, asking that is a natural way of removing that ambiguity.

Best,

Afonso