This is a guest post by Nicolas Boumal, a friend and collaborator from Université catholique de Louvain (Belgium), now at Inria in Paris (France), who develops Manopt: a toolbox for optimization on manifolds.
Optimization on manifolds is about solving problems of the form

$$\min_{x \in \mathcal{M}} f(x),$$

where $\mathcal{M}$ is a nice, known manifold. By “nice”, I mean a smooth, finite-dimensional Riemannian manifold.
Practical examples include the following (and all possible products of these):
- Euclidean spaces
- The sphere (set of vectors or matrices with unit Euclidean norm)
- The Stiefel manifold (set of orthonormal matrices)
- The Grassmann manifold (set of linear subspaces of a given dimension; this is a quotient space)
- The rotation group (set of orthogonal matrices with determinant +1)
- The manifold of fixed-rank matrices
- The same, further restricted to positive semidefinite matrices
- The cone of (strictly) positive definite matrices
Conceptually, the key point is to think of optimization on manifolds as unconstrained optimization: we do not think of $\mathcal{M}$ as being embedded in a Euclidean space. Rather, we think of $\mathcal{M}$ as being “the only thing that exists,” and we strive for intrinsic methods. Besides making for elegant theory, it also makes it clear how to handle abstract spaces numerically (such as the Grassmann manifold for example); and it gives algorithms the “right” invariances (computations do not depend on an arbitrarily chosen representation of the manifold).
There are at least two reasons why this class of problems is getting much attention lately. First, optimization problems over the aforementioned sets (mostly matrix sets) come up pervasively in applications, and at some point it became clear that the intrinsic viewpoint leads to better algorithms than general-purpose constrained optimization methods (where $\mathcal{M}$ is considered as a subset of a Euclidean space $\mathbb{R}^n$, and algorithms move in $\mathbb{R}^n$ while penalizing the distance to $\mathcal{M}$). Second, as I will argue momentarily, Riemannian manifolds are “the right setting” to talk about unconstrained optimization. And indeed, there is a beautiful book by P.-A. Absil (my PhD advisor), R. Sepulchre (his advisor) and R. Mahony, called Optimization Algorithms on Matrix Manifolds (freely available), that shows how the classical methods for unconstrained optimization (gradient descent, Newton, trust-regions, conjugate gradients…) carry over seamlessly to the more general Riemannian framework. So we can have one theory that covers optimization over a huge class of sets.
To help practitioners and researchers experiment with these ideas, with as gentle a learning curve as we could manage, Bamdev Mishra and I, together with our former advisors, develop Manopt: a Matlab toolbox that comes with online documentation, flexible features, friendly debugging tools and a help forum.
Optimization on manifolds is just as easy as unconstrained nonlinear optimization
Consider the gradient descent method on a Euclidean space $\mathbb{R}^n$ for a moment. The method proceeds, from a starting point $x_0$, by iterating:

$$x_{k+1} = x_k - \alpha_k \nabla f(x_k),$$

where $\nabla f(x_k)$ is the gradient of $f$ at $x_k$ and $\alpha_k > 0$ is the step size. It is well known that if $\alpha_k$ is chosen appropriately, under mild conditions, the iterates converge to critical points, i.e., $\nabla f(x_k) \to 0$ as $k \to \infty$. The rationale underlying this method is: at the current iterate $x_k$, among all possible directions, let us select the locally most promising one (the negative gradient), and move along that one for an appropriate distance.
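As a sanity check, the Euclidean iteration above is only a few lines of code. Here is a minimal numpy sketch (the quadratic cost, step size and iteration count are my own illustrative choices):

```python
import numpy as np

# Illustrative cost: f(x) = ||x - b||^2 / 2, with gradient x - b
# and unique minimizer b.
b = np.array([1.0, -2.0, 3.0])

def f(x):
    return 0.5 * np.sum((x - b) ** 2)

def grad_f(x):
    return x - b

x = np.zeros(3)      # starting point x_0
alpha = 0.1          # fixed step size, small enough for this cost
for _ in range(200):
    x = x - alpha * grad_f(x)   # x_{k+1} = x_k - alpha * grad f(x_k)

print(np.linalg.norm(grad_f(x)))  # close to 0: x is near a critical point
```

With this step size the gradient norm contracts by a constant factor at every iteration, which is the linear convergence the classical theory predicts for strongly convex quadratics.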
Nothing about this procedure truly requires the search space to be a vector space.
This time, suppose $\mathcal{M}$ is a sphere. We would like to move away from the current point $x_k$ to improve our predicament. Bearing in mind that, ultimately, we want to remain on the sphere, it certainly makes no sense to consider directions that move radially away from the sphere. So we’ll restrict our attention to the vectors at $x_k$ that are tangent to the sphere (they form the tangent space: an intrinsic notion). Among all of these, we want to select the (locally) most promising direction. This requires the ability to compare tangent vectors. What better way to do this than to equip the tangent space with an inner product? Doing so in a principled way (so that nearby tangent spaces have “similar” inner products) endows the sphere with a Riemannian structure. This gives rise to a notion of gradient on the sphere, called the Riemannian gradient, denoted by $\operatorname{grad} f(x_k)$. Finally, it wouldn’t be okay to move to $x_k - \alpha_k \operatorname{grad} f(x_k)$: $x_k$ doesn’t live in a vector space, so this subtraction makes no sense. Instead, we want to follow a geodesic, starting at $x_k$ and aligned with the chosen direction $-\operatorname{grad} f(x_k)$. In practice though, geodesics can be expensive to compute. Fortunately, we can make do with a crude approximation of geodesics, through the concept of retractions:

$$x_{k+1} = \mathrm{R}_{x_k}\!\left(-\alpha_k \operatorname{grad} f(x_k)\right).$$
With the obvious Riemannian geometry on the sphere (obtained by restricting the canonical Euclidean metric to the individual tangent spaces), the Riemannian gradient is simply the orthogonal projection of the classical gradient onto the tangent space:

$$\operatorname{grad} f(x) = \left(I - xx^\top\right)\nabla f(x),$$
and a typical retraction is to renormalize:

$$\mathrm{R}_x(v) = \frac{x + v}{\|x + v\|}.$$
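Both maps are one-liners in code. A small numpy sketch (the function names and test vectors are mine, for illustration only):

```python
import numpy as np

def proj_tangent(x, u):
    # Orthogonal projection of u onto the tangent space at the unit
    # vector x: this is the map (I - x x^T) u from the gradient formula.
    return u - np.dot(x, u) * x

def retract(x, v):
    # Renormalization retraction: take a tangent step, then rescale
    # back to the sphere.
    y = x + v
    return y / np.linalg.norm(y)

x = np.array([1.0, 0.0, 0.0])      # a point on the unit sphere
u = np.array([0.5, 1.0, -1.0])     # an arbitrary ambient vector
v = proj_tangent(x, u)

print(np.dot(x, v))                    # ~0: v is tangent at x
print(np.linalg.norm(retract(x, v)))   # ~1: the retraction lands on the sphere
```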
That’s it: with these simple ingredients, the general theory provides global and local convergence results to first-order critical points (under mild conditions on $f$). The first-order necessary optimality conditions are simply $\operatorname{grad} f(x) = 0$ (compare this with the extra work needed if we go through Lagrange multipliers to establish KKT conditions). Manopt includes a library of manifolds to pick from, which come with metrics, projections, retractions and the like, for a quick and easy start.
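To see the ingredients work together, here is a hedged numpy sketch of Riemannian gradient descent on the sphere for the Rayleigh-quotient cost $f(x) = x^\top A x$ (the matrix, random seed, step size and iteration count are my own illustrative choices; Manopt itself is a Matlab toolbox and would handle the step size and stopping criteria for you):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem: minimize f(x) = x^T A x over the unit sphere.
# Critical points are exactly the eigenvectors of A.
n = 5
B = rng.standard_normal((n, n))
A = B + B.T                      # symmetric matrix

def egrad(x):                    # Euclidean gradient of x^T A x
    return 2 * A @ x

def rgrad(x):                    # Riemannian gradient: project onto tangent space
    g = egrad(x)
    return g - np.dot(x, g) * x

x = rng.standard_normal(n)
x /= np.linalg.norm(x)           # start on the sphere
alpha = 0.05                     # fixed step size, illustrative
for _ in range(5000):
    x = x - alpha * rgrad(x)     # tangent step...
    x /= np.linalg.norm(x)       # ...followed by the renormalization retraction

# First-order optimality: grad f(x) = 0 means A x is parallel to x,
# i.e., x is an (approximate) eigenvector of A.
print(np.linalg.norm(rgrad(x)))  # small: x is near a critical point
print(x @ A @ x)                 # Rayleigh quotient, close to an eigenvalue of A
```

Note how the optimality check is just the norm of the Riemannian gradient, with no multipliers in sight: the constraint has been absorbed into the geometry.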
Notice that an optimization problem is posed on a set, not on a Riemannian manifold. In effect, the geometry we put on the set affects the algorithm (much like preconditioning), but is not intrinsic to a specific problem. Finding out which geometries work best is an active area of research.
It’s also just as hard
Manifolds of interest are typically nonconvex, so very little is known when it comes to computing global optimizers. In practice, it often works, though. One nice touch is to use spectral or convex relaxations to obtain a good initial iterate, and to refine it with optimization on manifolds. See for example this paper about noisy sensor network localization and a short one about synchronization of rotations.
There are some early successes where, under specific assumptions on the distribution of the data, researchers were able to show convergence to global optimizers with high probability, for example in low-rank matrix completion and dictionary learning; see also results involving geodesic convexity. The former are problems for which convex relaxations are known to work too, but solving those relaxations is often too expensive. Since in comparison the manifold approach (nonconvex but low-dimensional) is faster and often yields the same answer, understanding the limits of this technique is a challenge worth undertaking.
With Manopt, one of our goals is to make it easy to experiment with optimization on manifolds, to help find (and push) its limits.