The Löwner-Heinz Theorem and a simple result that should have a simpler proof

A few months ago, while trying to show that the analysis in this paper is optimal (I discussed the paper in a past post), I was faced with the following question:

Given {d} non-negative variance values {v_1,\dots,v_d} that sum to {d}, consider a {d\times d} random matrix whose entries are independent Gaussians with mean zero, the entry in position {(i,j)} having variance {v_j}. For which choice of {v_1,\dots,v_d} is the expected value of the average singular value of this matrix maximized?

After thinking about it a bit, I conjectured that this was maximized when {v_1,\dots,v_d} were all equal to {1}, and thought this had to have an elementary proof. The result is indeed true, but I wasn't able to come up with an elementary proof for it. I ended up proving it using a non-elementary result that I didn't know about before, the Löwner-Heinz Theorem. The purpose of this blog post is twofold: 1) to challenge the reader to find an elementary proof of this fact, and 2) to briefly present this cool and powerful matrix inequality by showing how it allowed me to solve the problem above.

Here is a more precise version of the statement:

Theorem 1 Let {G} be a {d\times d} random matrix with i.i.d. standard Gaussian entries, and let {\sigma_i(M)} denote the {i}-th singular value of {M}. Then,
\displaystyle \max_{ \substack{D \text{ diagonal} \\ \|D\|_F^2 = d \\ D \succeq 0 } }\mathbb{E} \sum_{i=1}^{d}\sigma_i(GD) = \mathbb{E} \sum_{i=1}^{d}\sigma_i(G).

I suggest that the reader think about this theorem a bit and try to come up with an elementary proof before scrolling down and reading the proof based on the Löwner-Heinz Theorem. If you do come up with an elementary proof, I would love to hear about it.
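If you would like to experiment with the statement first, here is a minimal Monte Carlo sanity check (a sketch, not a proof; it assumes numpy, and the dimension, the trial count, and the particular variance profiles are arbitrary choices for illustration):

```python
# A minimal Monte Carlo sanity check of Theorem 1 -- a sketch, not a proof.
import numpy as np

rng = np.random.default_rng(0)
d, trials = 20, 500  # arbitrary illustrative choices

def expected_singular_value_sum(v):
    """Monte Carlo estimate of E[sum_i sigma_i(G D)] with D = diag(sqrt(v))."""
    D = np.diag(np.sqrt(v))
    return np.mean([
        np.linalg.svd(rng.standard_normal((d, d)) @ D, compute_uv=False).sum()
        for _ in range(trials)
    ])

v_flat = np.ones(d)            # v_1 = ... = v_d = 1
v_skew = rng.random(d)
v_skew *= d / v_skew.sum()     # non-uniform profile, still sums to d

print("flat profile  :", expected_singular_value_sum(v_flat))
print("skewed profile:", expected_singular_value_sum(v_skew))
# Up to Monte Carlo noise, the flat profile should come out larger.
```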

Proof sketch (for the full proof see Theorem 18 in this paper): Recall that we want to show:

\displaystyle \max_{ \substack{D \text{ diagonal} \\ \|D\|_F^2 = d \\ D \succeq 0 } }\mathbb{E} \sum_{i=1}^{d}\sigma_i(GD) = \mathbb{E} \sum_{i=1}^{d}\sigma_i(G),

where {G} is a {d\times d} matrix with i.i.d. entries {\mathcal{N}\left(0,1\right)}. By taking {V = D^2}, and using the definition of singular value, we can rewrite the equality as
\displaystyle \max_{ \substack{V \text{ diagonal} \\ Tr(V) = d \\ V \succeq 0 } }\mathbb{E} Tr\left[\left( GVG^T \right)^{\frac12}\right]= \mathbb{E} Tr\left[\left( GG^T \right)^{\frac12}\right]. \ \ \ \ \ (1)
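(To spell out the rewriting: the singular values of a matrix {M} are the eigenvalues of {\left(MM^T\right)^{\frac12}}, so

\displaystyle \sum_{i=1}^{d}\sigma_i(GD) = Tr\left[\left( (GD)(GD)^T \right)^{\frac12}\right] = Tr\left[\left( GD^2G^T \right)^{\frac12}\right] = Tr\left[\left( GVG^T \right)^{\frac12}\right],

and the constraints {\|D\|_F^2 = d}, {D \succeq 0} translate into {Tr(V) = d}, {V \succeq 0}.)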

We will use concavity arguments. The idea is to show that if {V^{(\ast)}} and {V^{(\ast\ast)}} are maximizers, then so is their average. Once we show this fact, the rest of the proof is easy: out of all the maximizers, consider one with the smallest Frobenius norm, {V^{(\ast)}}. If {V^{(\ast)}} has constant diagonal we are done; otherwise we can swap two of its distinct diagonal elements to create {V^{(\ast\ast)}}. By rotation invariance of the Gaussian, {V^{(\ast\ast)}} also has to be a maximizer, and since {V^{(\ast\ast)}} and {V^{(\ast)}} are distinct matrices with the same Frobenius norm, their average has strictly smaller Frobenius norm, which, together with the concavity, leads to a contradiction (both claims are made precise in the two computations below).
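Indeed, if {P} is the permutation matrix swapping the two diagonal entries, then {V^{(\ast\ast)} = P V^{(\ast)} P^T}, and since {GP} has the same distribution as {G},

\displaystyle \mathbb{E}\, Tr\left[\left( G V^{(\ast\ast)} G^T \right)^{\frac12}\right] = \mathbb{E}\, Tr\left[\left( (GP) V^{(\ast)} (GP)^T \right)^{\frac12}\right] = \mathbb{E}\, Tr\left[\left( G V^{(\ast)} G^T \right)^{\frac12}\right].

And by the parallelogram law, since {\|V^{(\ast)}\|_F = \|V^{(\ast\ast)}\|_F} and {V^{(\ast)} \neq V^{(\ast\ast)}},

\displaystyle \left\| \frac{V^{(\ast)}+V^{(\ast\ast)}}2 \right\|_F^2 = \frac{\|V^{(\ast)}\|_F^2 + \|V^{(\ast\ast)}\|_F^2}2 - \left\| \frac{V^{(\ast)}-V^{(\ast\ast)}}2 \right\|_F^2 < \|V^{(\ast)}\|_F^2.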

We will proceed by contradiction: suppose (1) does not hold. Since the optimization space is compact and the function continuous, the maximum must be attained by some {V\neq I_{d\times d}}. Out of all maximizers {V}, let {V^{(\ast)}} be one with the smallest possible Frobenius norm; the idea will be to use concavity arguments to build a maximizer {V^{(\#)}} with strictly smaller Frobenius norm, arriving at a contradiction and hence proving the theorem.

Note that, by linearity of expectation, to show that the average of two maximizers {V^{(\ast)}} and {V^{(\ast\ast)}} is itself a maximizer, it suffices to show that

\displaystyle Tr\left[\left( \frac{GV^{(\ast)}G^T+GV^{(\ast\ast)}G^T }2 \right)^{\frac12}\right] - \frac{ Tr\left[\left( GV^{(\ast)} G^T \right)^{\frac12}\right] + Tr\left[\left( GV^{(\ast\ast)} G^T \right)^{\frac12}\right] }2,

has non-negative expectation. Note that, if the exponent {\frac12} were outside of the trace, then concavity of the square-root function would give us what we want. Since the trace is also linear, this is equivalent to showing that the trace of

\displaystyle \left( \frac{GV^{(\ast)}G^T+GV^{(\ast\ast)}G^T }2 \right)^{\frac12}- \frac{ \left( GV^{(\ast)} G^T \right)^{\frac12} + \left( GV^{(\ast\ast)} G^T \right)^{\frac12} }2, \ \ \ \ \ (2)

has non-negative expectation. It turns out that a much stronger statement is true: for any realization of {G}, all the eigenvalues of the matrix in (2) are non-negative (which in particular implies that its trace is non-negative). In fact, this follows from a property of the square-root function called operator concavity: given two {d\times d} matrices {A\succeq 0} and {B\succeq 0}, and any {0\leq t\leq 1}, the following holds:
\displaystyle \left( (1-t)A + tB \right)^{\frac12} - \left( (1-t)A^{\frac12} + tB^{\frac12} \right) \succeq 0. \ \ \ \ \ (3)

The Löwner-Heinz Theorem (whose proof can be found in these very nice notes) establishes operator concavity (or convexity) of many functions, including the square-root function.
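As a quick illustration (not part of the proof), here is a small numerical check of inequality (3); it assumes numpy, and the dimension, the value of {t}, and the way {A} and {B} are sampled are arbitrary choices:

```python
# A numerical check of the operator concavity inequality (3) -- a sketch.
import numpy as np

rng = np.random.default_rng(1)
d, t = 10, 0.3  # arbitrary illustrative choices

def psd_sqrt(M):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

# Two random PSD matrices.
X = rng.standard_normal((d, d)); A = X @ X.T
Y = rng.standard_normal((d, d)); B = Y @ Y.T

diff = psd_sqrt((1 - t) * A + t * B) - ((1 - t) * psd_sqrt(A) + t * psd_sqrt(B))
print("smallest eigenvalue:", np.linalg.eigvalsh(diff).min())
# Up to floating-point error, this should be non-negative.
```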
