# The little Grothendieck problem over O(d)

Today’s post is about a recent paper of mine with Christopher Kennedy and Amit Singer entitled “Approximating the Little Grothendieck Problem over the Orthogonal and Unitary Groups”. Earlier this week we finished a major revision of the paper (even the title changed slightly), and I decided to take the opportunity to blog about it (in a previous blog post I discussed an open problem raised by this work).

— Generalized orthogonal Procrustes Problem —

In this post I will motivate the problem using the generalized orthogonal Procrustes problem: given ${n}$ point clouds in ${\mathbb{R}^d}$ of ${k}$ points each, find ${n}$ orthogonal transformations that best simultaneously align them. If the points are represented as the columns of matrices ${A_1,\dots,A_n}$, where ${A_i \in \mathbb{R}^{d\times k}}$, then the problem consists of solving

$\displaystyle \min_{O_1,\dots,O_n \in O(d)} \sum_{i,j=1}^n \|O_i^T A_i - O_j^T A_j\|_F^2. \ \ \ \ \ (1)$

Since ${\|O_i^T A_i - O_j^T A_j\|_F^2 = \|A_i\|_F^2+\|A_j\|_F^2 - 2\mathrm{tr}\left( (A_iA_j^T)^T O_i O_j^T \right)}$, (1) has the same solution as the complementary version of the problem

$\displaystyle \max_{O_1,\dots,O_n \in O(d)} \sum_{i,j=1}^n \mathrm{tr}\left( C_{ij}^T O_i O_j^T \right), \ \ \ \ \ (2)$

where ${C\in\mathbb{R}^{dn\times dn}}$ has ${d\times d}$ blocks given by ${C_{ij} = A_iA_j^T}$. Note that ${C}$ is positive semidefinite.
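Both facts above are easy to verify numerically. The following is a minimal numpy sketch (with hypothetical dimensions ${d=3}$, ${k=5}$, ${n=4}$) checking the trace identity behind the equivalence of (1) and (2), and that ${C}$ is positive semidefinite, since it factors as ${C = BB^T}$ for ${B}$ the matrix stacking the ${A_i}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 3, 5, 4  # illustrative sizes, not from the paper

# Random point clouds A_i in R^{d x k} and random orthogonal matrices O_i
A = [rng.standard_normal((d, k)) for _ in range(n)]
O = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]

# Check  ||O_i^T A_i - O_j^T A_j||_F^2
#      = ||A_i||_F^2 + ||A_j||_F^2 - 2 tr((A_i A_j^T)^T O_i O_j^T)
i, j = 0, 1
lhs = np.linalg.norm(O[i].T @ A[i] - O[j].T @ A[j], "fro") ** 2
rhs = (np.linalg.norm(A[i], "fro") ** 2 + np.linalg.norm(A[j], "fro") ** 2
       - 2 * np.trace((A[i] @ A[j].T).T @ O[i] @ O[j].T))
assert np.isclose(lhs, rhs)

# The block matrix C with C_ij = A_i A_j^T equals B B^T, hence C >= 0
B = np.vstack(A)          # dn x k
C = B @ B.T               # dn x dn, positive semidefinite
assert np.min(np.linalg.eigvalsh(C)) > -1e-9
```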

In fact, we will focus on problems of the form (2) for any positive semidefinite ${C\in\mathbb{R}^{dn\times dn}}$ (this encodes a few other problems; see more here).

— The little Grothendieck problem (${\mathbf{d=1}}$) —

For ${d=1}$, (2) reduces to the well-studied little Grothendieck problem in combinatorial optimization (see this reference)

$\displaystyle \max_{x_i\in \{\pm1\}}\sum_{i=1}^n\sum_{j=1}^nC_{ij}x_ix_j, \ \ \ \ \ (3)$

where ${C}$ is an ${n\times n}$ positive semidefinite matrix.

Problem (3) is known to be NP-hard (if ${C}$ is the Laplacian matrix of a graph, then (3) reduces to the Max-Cut problem). In a previous blog post I briefly described a semidefinite relaxation for (3), proposed by Goemans and Williamson in the context of the Max-Cut problem:

$\displaystyle \max_{\substack{ X_i\in\mathbb{R}^{n} \\ \|X_i\|^2=1}} \sum_{i=1}^n\sum_{j=1}^n C_{ij}X_i^TX_j. \ \ \ \ \ (4)$

Note that (4) is equivalent to the semidefinite program:

$\displaystyle \max_{\substack{ G\in\mathbb{R}^{n\times n} \\ G_{ii}=1 ,\ G\succeq 0 }} \mathrm{tr}(CG), \ \ \ \ \ (5)$

and vectors ${X_i}$ satisfying ${X_i^TX_j=G_{ij}}$ can be recovered from ${G}$ via its Cholesky decomposition, so (4) can be solved, up to arbitrary precision, in polynomial time.
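The recovery step is a one-liner in practice. A small numpy sketch (with an arbitrary feasible ${G}$ built for illustration): the rows of the Cholesky factor ${L}$ of ${G}$ are unit vectors ${X_i}$ with ${X_i^TX_j = G_{ij}}$, turning a solution of (5) into one of (4):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Build a PSD matrix G with unit diagonal (a feasible point of (5)):
# take the Gram matrix of n random unit vectors.
V = rng.standard_normal((n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)
G = V @ V.T

# The rows of the Cholesky factor L are unit vectors X_i
# with X_i^T X_j = G_ij, i.e. a feasible point of (4).
L = np.linalg.cholesky(G)
assert np.allclose(L @ L.T, G)
assert np.allclose(np.linalg.norm(L, axis=1), 1.0)
```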

In the same blog post I explained how a simple rounding technique (to compute a solution ${x_i\in\{\pm1\}}$ to (3) from a solution ${X_i}$ to (4)) is guaranteed to produce a solution whose objective value is, in expectation, at least ${\frac2\pi\min_{0\leq \theta \leq \pi} \frac{\theta}{1-\cos\theta} \approx 0.878}$ of the optimum. It turns out that one can show an approximation ratio of ${\frac2\pi\approx 0.637}$ (instead of 0.878) for the general ${C\succeq 0}$ case using the same relaxation. This implies, in particular, that the value of (3) can never be smaller than ${\frac2{\pi}}$ times the value of (4). Interestingly, this was already known from the influential work of Grothendieck on norms of tensor products of Banach spaces.
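The ${\frac2\pi}$ guarantee for general ${C \succeq 0}$ can be checked numerically. The rounding is ${x_i = \mathrm{sign}(X_i^T r)}$ for a Gaussian vector ${r}$; since ${\mathbb{E}[x_i x_j] = \frac2\pi \arcsin(X_i^T X_j)}$, the expected rounded value has a closed form, and it dominates ${\frac2\pi}$ times the relaxation value because ${\arcsin(G) - G \succeq 0}$ entrywise-applied. A sketch with a random PSD ${C}$ and an arbitrary feasible point of (4):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# A random PSD cost matrix C and a feasible point of the relaxation (4)
M = rng.standard_normal((n, n))
C = M @ M.T
X = rng.standard_normal((n, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-norm rows X_i
G = X @ X.T

# Closed form for the expected rounded value, via the arcsin identity;
# it is at least (2/pi) times the relaxation value tr(CG).
exact_mean = (2 / np.pi) * np.sum(C * np.arcsin(np.clip(G, -1, 1)))
assert exact_mean >= (2 / np.pi) * np.trace(C @ G) - 1e-9

# Monte Carlo over random hyperplanes r agrees with the closed form
trials = 50000
r = rng.standard_normal((n, trials))
x = np.sign(X @ r)                                  # n x trials of +-1 signs
vals = np.einsum("it,ij,jt->t", x, C, x)            # rounded objective values
assert abs(vals.mean() - exact_mean) < 3.0
```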

— Little Grothendieck problem over the Orthogonal Group —

In our paper we propose and analyze a semidefinite relaxation of problem (2) that naturally generalizes the ${d=1}$ case:

$\displaystyle \max_{\substack{ X_iX_i^T=I_{d\times d} \\ X_i\in\mathbb{R}^{d\times dn} }} \sum_{i=1}^n\sum_{j=1}^n\mathrm{Tr}\left(C_{ij}^TX_iX_j^T\right). \ \ \ \ \ (6)$

Just as in the ${d=1}$ case, (6) is equivalent to a semidefinite program:

$\displaystyle \max_{\substack{ G\in\mathbb{R}^{dn\times dn} \\ G_{ii}=I_{d\times d} ,\ G\succeq 0 }} \mathrm{tr}(CG). \ \ \ \ \ (7)$

We propose the following approximation algorithm.

Algorithm. Compute ${\{X_i\}_{i=1}^n}$, a solution to (6). Let ${R\in\mathbb{R}^{nd\times d}}$ be a random matrix with i.i.d. Gaussian entries. Compute ${V_i\in O(d)}$, a feasible solution to (2), as

$\displaystyle V_i = \mathcal{P}(X_iR),$

where ${\mathcal{P}(X) = \mathrm{argmin}_{Z\in O(d)}\left\|Z-X\right\|_F}$.

Note that the Polar Decomposition ${\mathcal{P}(X)}$ can be efficiently computed via the Singular Value Decomposition ${X=U\Sigma V^T}$ as ${\mathcal{P}(X)=UV^T}$.
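The whole rounding step fits in a few lines of numpy. A minimal sketch (with illustrative sizes ${d=3}$, ${n=4}$, and an arbitrary feasible point of (6) in place of an actual optimizer): compute ${\mathcal{P}}$ via the SVD and check that each rounded ${V_i}$ lands in ${O(d)}$:

```python
import numpy as np

def polar(X):
    """P(X) = argmin_{Z in O(d)} ||Z - X||_F, via the SVD X = U S V^T."""
    U, _, Vt = np.linalg.svd(X)
    return U @ Vt

rng = np.random.default_rng(3)
d, n = 3, 4  # illustrative sizes

# A feasible point of (6): X_i in R^{d x dn} with X_i X_i^T = I
# (take the d-row blocks of a random dn x dn orthogonal matrix).
Q = np.linalg.qr(rng.standard_normal((d * n, d * n)))[0]
Xs = [Q[i * d:(i + 1) * d, :] for i in range(n)]

# Rounding step of the algorithm: V_i = P(X_i R) with Gaussian R
R = rng.standard_normal((d * n, d))
Vs = [polar(Xi @ R) for Xi in Xs]
for V in Vs:
    assert np.allclose(V @ V.T, np.eye(d))   # each V_i is orthogonal
```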

One of the main contributions of the paper is showing that this algorithm achieves a constant-factor approximation ratio of ${\alpha_{\mathbb{R}}(d)^2}$ for (2), where ${\alpha_{\mathbb{R}}(d)}$ is defined as

$\displaystyle \alpha_{\mathbb{R}}(d) := \mathbb{E}\left[ \frac1d \sum_{j=1}^d \sigma_j(G)\right], \ \ \ \ \ (8)$

where ${G \in \mathbb{R}^{d\times d}}$ is a gaussian random matrix with i.i.d entries ${\mathcal{N}\left(0,d^{-1}\right)}$ and ${\sigma_j(G)}$ is the ${j}$-th singular value of ${G}$.
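The constant (8) is easy to estimate by Monte Carlo. A numpy sketch (sample size and seed are arbitrary choices): for ${d=1}$ the matrix is a scalar ${g\sim\mathcal{N}(0,1)}$, so ${\alpha_{\mathbb{R}}(1) = \mathbb{E}|g| = \sqrt{2/\pi}}$, and ${\alpha_{\mathbb{R}}(1)^2 = \frac2\pi}$ recovers the approximation ratio of the ${d=1}$ case:

```python
import numpy as np

def alpha_R(d, trials=200000, seed=0):
    """Monte Carlo estimate of alpha_R(d) = E[(1/d) sum_j sigma_j(G)],
    with G a d x d matrix of i.i.d. N(0, 1/d) entries."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((trials, d, d)) / np.sqrt(d)
    s = np.linalg.svd(G, compute_uv=False)   # batched singular values
    return s.mean(axis=1).mean()

# Sanity check: alpha_R(1) = E|g| = sqrt(2/pi) for g ~ N(0,1)
assert abs(alpha_R(1) - np.sqrt(2 / np.pi)) < 0.01
```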

Theorem 1 Let ${C\succeq 0}$ be real. Let ${V_1,\dots,V_n\in O(d)}$ be the (random) output of the algorithm above. Then

$\displaystyle \mathbb{E} \left[ \sum_{i=1}^n\sum_{j=1}^n\mathrm{Tr}\left(C_{ij}^TV_iV_j^T\right) \right] \geq \alpha_{\mathbb{R}}(d)^2 \max_{\substack{ X_iX_i^T=I_{d\times d} \\ X_i\in\mathbb{R}^{d\times dn} }} \sum_{i=1}^n\sum_{j=1}^n\mathrm{Tr}\left(C_{ij}^TX_iX_j^T\right),$

where ${\alpha_{\mathbb{R}}(d)}$ is the constant defined in (8).

Moreover, we can also show a matching integrality gap, meaning that this constant is optimal. I don’t want to go into much detail here, but it essentially means that no other algorithm based on this natural semidefinite relaxation can have a better approximation ratio. This does not rule out the existence of a different efficient approximation algorithm achieving a better ratio, but for these sorts of problems it tends to be the case that beating the approximation ratio of the natural semidefinite relaxation is hard.

One of the interesting questions raised by this work is the behavior of ${\alpha_{\mathbb{R}}(d)}$. This was the subject of a previous blog post and I won’t discuss it much here. In a nutshell, we can give good lower bounds for this constant (which, in particular, show that our approximation ratio is larger than the best previously known, for all dimensions ${d}$) and conjecture that it is monotonically increasing with ${d}$. I find it quite counter-intuitive that problem (2) should get easier to approximate as the dimension ${d}$ increases! Perhaps even stranger, if we do all of the above working in ${\mathbb{C}}$, considering the unitary group ${U(d)}$, and define the analogous constant ${\alpha_{\mathbb{C}}(d)}$, it seems to be decreasing with the dimension instead! We discussed this strange phenomenon in this blog post.