Norm of random matrices with independent entries

A few weeks ago, Ramon van Handel and I uploaded a paper entitled “Sharp non asymptotic bounds on the norm of random matrices with independent entries” to the arxiv. In this blog post, I’ll attempt to describe and motivate the results there.

The main motivation comes from the vast number of applications for bounds on the spectral norm of random matrices, some of which I have described on this blog (see herehere, and here). In many (applied) math problems one needs to control the spectral norm of a certain application-specific random matrix. A common approach, which has found great success, is to introduce randomness in the models and control this quantity on certain random matrices. Perhaps the simplest example of how randomness can help is the spectral norm of an {n\times n} standard Wigner matrix, a symmetric matrix with iid standard gaussian entries. As each entry of the matrix is essentially of constant order the spectral norm of such a matrix can go up to {\Omega(n)}, however, due to a concentration phenomenon, it is almost always roughly {2\sqrt{n}}. Unfortunately, random matrix in many applications do not correspond to this simple model.

Several fairly general techniques exists to bound the spectral norm of random matrices. A particularly notable one is a non-commutative version of Khintchine inequality (see here). Recently, there has been some effort in developing an alternative approach to this problem, via “matrix concentration inequalities”, essentially matrix versions of Chernoff-type bounds. The first instance of this is by Ahlswede and Winter (see here) and was improved by Oliveira and Tropp. I recommend Tropp’s monograph for a particularly elegant exposition and proof of this type of results and a vast description of interesting examples.

Although the results mentioned above apply in more general settings we will restrict our attention to matrices with independent entries. We will focus on bounding the expected value of the spectral norm of matrices with gaussian entries — in fact, it is not difficult to adapt our results to deal with either different distributions or to create tail bounds (see the paper for more details).

More specifically, consider a symmetric random matrix {X} with independent entries distributed as

\displaystyle X_{ij} \sim \mathcal{N}\left(0,b_{ij}^2\right).

The results described above, when applied to this setting, give the following inequality

\displaystyle \mathbb{E} \|X\| \leq c \sigma \sqrt{\log n}, \quad \text{ with }\quad \sigma = \max_j\sqrt{\sum_i b_{ij}^2},

for some universal constant {c} (the constants tend to be better in the matrix concentration approach).

Unfortunately, this bounds fails to capture the right asymptotic order already for the simple Wigner case, where the bound gives {\mathbb{E}\|X\| \leq c \sqrt{n\log n}} while {\mathbb{E}\|X\| \sim \sqrt{n}}. On the other hand, the logarithmic term is needed in some cases, when {X} is a diagonal matrix ({b_{ij} = \delta_{ij}}) when {\sigma = 1} while {\|X\|} is distributed as the maximum as the maximum of {n} iid standard gaussians and so {\mathbb{E}\|X\| \sim \sqrt{\log n}}.

It is intuitive that if we decrease the variance of some entries of a Wigner matrix, {\mathbb{E}\|X\|} should not increase. Indeed, such intuition can be made precise and gives the following bound (often attributed to Gordon, see here).

\displaystyle \mathbb{E}\|X\| \leq c \sigma_\ast \sqrt{n}, \quad \text{ with }\quad \sigma_\ast = \max_{ij}\left| b_{ij} \right|,

for some universal constant {c}. While this explains the Wigner matrix case it is worst than the bound above in most other situations.

The purpose of our paper is to provide a general sharp bound for this quantity. To help guiding intuition let us consider an example that interpolates between the two above, given {d}, let us consider a symmetric {n\times n} matrix {X^{(d)}} consisting of {\frac{n}{d}\times \frac{n}{d}} blocks of size {d\times d}. On each of the diagonal blocks the entries are iid standard gaussians, and the rest of the matrix is equal to zero. {X^{(n)}} corresponds to the Wigner matrix example and {X^{(1)}} to the diagonal example. It is also easy to see that {\sigma = \sqrt{d}} and {\sigma_\ast = 1}. In this family of examples {\|X\|} is the maximum spectral norm among {\frac{n}{d}} independent {d\times d} standard Wigner matrices. In this case, it is not very hard to show that

\displaystyle \mathbb{E}\|X\| \leq c \left( \sqrt{d} + \sqrt{\log n} \right),

for some universal constant {c}.

Not only this observation serves to motivate the inequality that we prove, but, in fact, we prove our inequality using a combinatorial comparison to reduce the general problem to this case (see the paper for more details, I really like the proof!). More specifically, we show, for {X} a symmetric {n\times n} random matrix with gaussian entries and for any {\epsilon>0},

\displaystyle \mathbb{E}\|X\| \leq (2+\epsilon) \sigma + c_\epsilon \sigma_\ast\sqrt{\log n},

for some constant {c_\epsilon}, depending only on {\epsilon}.

This bound is indeed sharp in a very natural sense. As {\sigma \sim \max_k\mathbb{E}\|Xe_k\|}, where {e_k} is the {k}-th element of the canonical basis, {\sigma} is a lower bound. Also, as long as there are {\mathrm{poly}(n)} entries with variance comparable to {\sigma_\ast^2}, the {\sigma_\ast\sqrt{\log n}} is also needed. Also, the gaussian assumption on the entries can be dropped and it is easy to obtain tail bounds on {\|X\|} using these bounds on {\mathbb{E}\|X\|} and standard concentration techniques.

There are many fascinating open problems left unresolved. Perhaps the most important is to provide general sharp bounds for the setting on which the entries of {X} are correlated. Another interesting question is related with bounded random variables, if instead of gaussian random variables we have Rademacher random variables (weighted by {b_{ij}} say), then the {\sqrt{\log n}} term is not always needed. It is known that even when {d} is constant, if the {b_{ij}}‘s correspond to an adjacency matrix of a {d}-regular graph, then {\mathbb{E}\|X\| \sim \sqrt{d}}. However, there are examples (such as the block example above) where the logarithmic terms are needed. It would be interesting to understand this phenomenon better, perhaps it is related to combinatorial properties of the underlying graph (described by the {b_{ij}}‘s).

Another interesting question, posed by Riemer and Schuett, is wether it is true that, when {X} has independent gaussian entries,

\displaystyle \mathbb{E} \|X\| \sim \mathbb{E}\left[ \max_{k}\|Xe_k\| \right].

Note that our result forces that any counter-example for this equality needs to be very inhomogeneous, as whenever there are {\mathrm{poly}(n)} entries with variance comparable to {\sigma_\ast^2}, our bound implies this equality.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s