18.S096: Principal Component Analysis in High Dimensions and the Spike Model

I just made available here the first lecture note for my course this fall. It includes three open problems that I will present below. In short, the content of the lecture is as follows:

It starts with a description (and derivation) of Principal Component Analysis. It then uses Random Matrix Theory to attempt to understand the performance of Principal Component Analysis by looking into the distribution of the spectrum of the sample covariance matrix when the true distributional covariance is the identity, arriving at the Marchenko-Pastur distribution.
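As a quick illustration (my own sketch, not part of the notes; the dimensions, seed, and plotting choices are arbitrary), one can compare the spectrum of a simulated sample covariance matrix against the Marchenko-Pastur density:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative parameters (not from the notes): aspect ratio gamma = p/n = 1/4
n, p = 4000, 1000
gamma = p / n
rng = np.random.default_rng(0)

X = rng.standard_normal((p, n))   # n samples x_i ~ N(0, I_p), stored as columns
S = X @ X.T / n                   # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

# Marchenko-Pastur density, supported on [(1 - sqrt(gamma))^2, (1 + sqrt(gamma))^2]
lam_minus = (1 - np.sqrt(gamma)) ** 2
lam_plus = (1 + np.sqrt(gamma)) ** 2
t = np.linspace(lam_minus, lam_plus, 400)
density = np.sqrt((lam_plus - t) * (t - lam_minus)) / (2 * np.pi * gamma * t)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="sample spectrum")
plt.plot(t, density, label="Marchenko-Pastur")
plt.legend()
plt.show()
```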

A natural question is then what happens if the distributional covariance matrix has a low dimensional structure; a particularly simple such example is a rank 1 spike \Sigma = I + \beta vv^T. We derive, in that case, for which values of \beta we expect to see also a spike in the eigenvalue distribution of the sample covariance matrix (an eigenvalue popping out of the support of the Marchenko-Pastur distribution), and find that there is a critical value of \beta at which this starts to happen (this is generally referred to as the BBP transition).
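For concreteness, the dichotomy can be summarized as follows (this is the standard spiked-covariance statement with aspect ratio \gamma = p/n, quoted here from memory; the notes contain the precise statement and derivation). The largest eigenvalue of the sample covariance satisfies

\lambda_{\max} \rightarrow \begin{cases} (1+\sqrt{\gamma})^2 & \text{if } \beta \leq \sqrt{\gamma} \text{ (the spike is buried in the bulk)},\\ (1+\beta)\left(1+\frac{\gamma}{\beta}\right) & \text{if } \beta > \sqrt{\gamma} \text{ (the spike pops out)}, \end{cases}

so the critical value is \beta = \sqrt{\gamma}.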

Take a look at it here!

Now for the Open problems:

Open Problem 1.1.

A problem by Mallat and Zeitouni that I discussed on this blog, see here.

Open Problem 1.2.

A problem regarding the monotonicity of the average singular value of a Gaussian matrix, also discussed previously in this blog, see here.

Open Problem 1.3.

What follows is a problem posed by Andrea Montanari and Subhabrata Sen.

Let {W} denote a symmetric Wigner matrix with i.i.d. entries {W_{ij}\sim \mathcal{N}(0,1)}. Also, given {B\in\mathbb{R}^{n\times n}} symmetric, define:

Q(B) = \max\left\{tr(BX): X\succeq 0, X_{ii}=1 \right\}.

Define {q(\xi)} as

q(\xi) = \lim_{n\rightarrow\infty} \frac1n\mathbb{E} Q\left( \frac{\xi}n\mathbf{1}\mathbf{1}^T + \frac1{\sqrt{n}}W \right).

What is the value of {\xi_\ast}, defined as

\displaystyle \xi_\ast = \inf\{ \xi\geq 0: q(\xi)>2\}.


It is known that {1\leq \xi_\ast <2} (see Montanari and Sen’s paper and/or my notes). A reasonable conjecture is that it is equal to {1}. Remarkably, this would imply that the SDP relaxation for clustering under the Stochastic Block Model on 2 clusters is also optimal for detection.
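For those who want to experiment with this numerically, here is a minimal finite-n sketch (my own illustration, not from Montanari and Sen’s paper; it assumes the cvxpy package, and the choices n = 60 and \xi = 1.5 are arbitrary, so the finite-size proxy for q(\xi) should be taken with a grain of salt):

```python
import numpy as np
import cvxpy as cp

def Q(B):
    """Solve the SDP  Q(B) = max { tr(BX) : X PSD, X_ii = 1 }."""
    n = B.shape[0]
    X = cp.Variable((n, n), PSD=True)
    prob = cp.Problem(cp.Maximize(cp.trace(B @ X)), [cp.diag(X) == 1])
    prob.solve()
    return prob.value

# Finite-n Monte Carlo proxy for q(xi); n and xi are illustrative choices
n, xi = 60, 1.5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
W = np.triu(A) + np.triu(A, 1).T               # symmetric, entries W_ij ~ N(0, 1)
B = (xi / n) * np.ones((n, n)) + W / np.sqrt(n)
print(Q(B) / n)                                # compare against the threshold value 2
```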


4 thoughts on “18.S096: Principal Component Analysis in High Dimensions and the Spike Model”

    1. Hi nikbakhtian,

      Thanks for the question. The point is that if \sum \beta_i \neq 0 then the term \sum \beta_i can be absorbed into \mu. Say we take \beta_i' = \beta_i - \frac1n\sum \beta_i, then \sum \beta_i' = 0 and we can just update \mu by \mu' = \mu + V\left( \frac1n\sum \beta_i \right). Setting \sum \beta_i = 0 is a natural way to avoid this ambiguity.

      Does this make sense?

      Best,
      Afonso

      1. Yes, it makes sense. But in fact \sum \beta_i = 0. I think I found a way to demonstrate it. Restating the minimisation of the error as a least squares regression, adding the translation part into the matrix V as a normalised column of 1s, and calling the new matrix A: the last column of AA^T has to be perpendicular to the affine subspace, and that is enough to demonstrate that the affine subspace has to pass through the centroid \mu_n.

      2. Hi nikbakhtian,
        In general there is an ambiguity: let’s say that d=p; then we could take V=I and it is clear that, for any \mu, taking \beta_i = x_i - \mu would give a solution with no error. But clearly, all these solutions are equivalent, and asking that \sum_{i=1}^n \beta_i = 0 is a natural way of removing that ambiguity.
        Best,
        Afonso
