Optimizing in smooth waters: optimization on manifolds

This is a guest post by Nicolas Boumal, a friend and collaborator from Université catholique de Louvain (Belgium), now at Inria in Paris (France), who develops Manopt: a toolbox for optimization on manifolds.

Optimization on manifolds is about solving problems of the form

\mathrm{minimize}_{x\in\mathcal{M}} f(x),

where \mathcal{M} is a nice, known manifold. By “nice”, I mean a smooth, finite-dimensional Riemannian manifold.

Practical examples include the following (and all possible products of these):

  • Euclidean spaces
  • The sphere (set of vectors or matrices with unit Euclidean norm)
  • The Stiefel manifold (set of orthonormal matrices)
  • The Grassmann manifold (set of linear subspaces of a given dimension; this is a quotient space)
  • The rotation group (set of orthogonal matrices with determinant +1)
  • The manifold of fixed-rank matrices
  • The same, further restricted to positive semidefinite matrices
  • The cone of (strictly) positive definite matrices

Conceptually, the key point is to think of optimization on manifolds as unconstrained optimization: we do not think of \mathcal{M} as being embedded in a Euclidean space. Rather, we think of \mathcal{M} as being “the only thing that exists,” and we strive for intrinsic methods. Besides making for elegant theory, it also makes it clear how to handle abstract spaces numerically (such as the Grassmann manifold for example); and it gives algorithms the “right” invariances (computations do not depend on an arbitrarily chosen representation of the manifold).

There are at least two reasons why this class of problems is getting much attention lately. First, it is because optimization problems over the aforementioned sets (mostly matrix sets) come up pervasively in applications, and at some point it became clear that the intrinsic viewpoint leads to better algorithms, as compared to general-purpose constrained optimization methods (where \mathcal{M} is considered as being inside a Euclidean space \mathcal{E}, and algorithms move in \mathcal{E}, while penalizing distance to \mathcal{M}). The second is that, as I will argue momentarily, Riemannian manifolds are “the right setting” to talk about unconstrained optimization. And indeed, there is a beautiful book by P.-A. Absil (my PhD advisor), R. Sepulchre (his advisor) and R. Mahony, called Optimization algorithms on matrix manifolds (freely available), that shows how the classical methods for unconstrained optimization (gradient descent, Newton, trust-regions, conjugate gradients…) carry over seamlessly to the more general Riemannian framework. So we can have one theory to cover optimization over a huge class of sets.

To help practitioners and researchers experiment with these ideas, with as gentle a learning curve as we could manage, Bamdev Mishra and I, together with our former advisors, develop Manopt: a Matlab toolbox that comes with online documentation, flexible features, friendly debugging tools and a help forum.

Optimization on manifolds is just as easy as unconstrained nonlinear optimization

steepestdescent_compare_euclidean_sphere

Consider the gradient descent method on a Euclidean space \mathcal{E} for a moment. The method proceeds, from a starting point x_0 \in \mathcal{E}, by iterating:

x_{k+1} = x_k - \alpha_k \nabla f(x_k),

where \nabla f(x_k) is the gradient of f at x_k and \alpha_k > 0 is the step size. It is well-known that if \alpha_k is chosen appropriately, under mild conditions, the iterates converge to critical points, i.e., \nabla f(x_k) \to 0 as k \to \infty. The rationale underlying this method is: at the current iterate x_k, among all possible directions, let us select the locally most promising one (the negative gradient), and move along that one for an appropriate distance.

Nothing about this procedure truly requires the search space to be a vector space.

This time, consider \mathcal{M} is a sphere. We would like to move away from the current point x_k to improve our predicament. Bearing in mind that, ultimately, we want to remain on the sphere, it certainly makes no sense to consider directions that move radially away from the sphere. So we’ll restrict our attention to the vectors at x_k that are tangent to the sphere (they form the tangent space: an intrinsic notion). Among all of these, we want to select the (locally) most promising direction. This requires the ability to compare tangent vectors. What better way to do this than to equip the tangent space with an inner product? Doing so in a principled way (so that tangent spaces close-by have “similar” inner products) endows the sphere with a Riemannian structure. This gives rise to a notion of gradient on the sphere, called the Riemannian gradient, denoted by \mathrm{grad} f(x_k). Finally, it wouldn’t be okay to move to x_k - \alpha_k \mathrm{grad} f(x_k): x_k doesn’t live on a vector space, so that this subtraction makes no sense. Instead, we want to follow a geodesic, starting at x_k and aligned with the chosen direction -\mathrm{grad} f(x_k). In practice though, geodesics can be expensive to compute. Fortunately, we can make do with a crude approximation of geodesics, through the concept of retractions:

x_{k+1} = \mathrm{Retraction}_{x_k}(-\alpha_k \mathrm{grad} f(x_k)).

With the obvious Riemannian geometry on the sphere (obtained by restricting the canonical Euclidean metric to the individual tangent spaces), the Riemannian gradient is simply the orthogonal projection of the classical gradient:

\mathrm{grad} f(x) = (I - xx^T) \nabla f(x),

and a typical retraction is to renormalize:

\mathrm{Retraction}_x(u) = \frac{x+u}{\|x+u\|}.

That’s it: the general theory provides global and local convergence results to first-order critical points with these simple ingredients (under mild conditions on f). The first-order necessary optimality conditions are simply \mathrm{grad} f(x) = 0 (compare this with the extra work needed if we go through Lagrange multipliers to establish KKT conditions). Manopt includes a library of manifolds to pick from, that come with metrics, projections, retractions and the likes, for a quick and easy start.

Notice that an optimization problem is posed on a set, not on a Riemannian manifold. In effect, the geometry we put on the set affects the algorithm (much like preconditioning), but is not intrinsic to a specific problem. Finding out which geometries work best is an active area of research.

It’s also just as hard

Manifolds of interest are typically non convex, so that very little is known when it comes to computing global optimizers. In practice, it often works, though. One nice touch is to use spectral or convex relaxations to obtain a good initial iterate, and to refine it with optimization on manifolds. See for example this paper about noisy sensor network localization and a short one about synchronization of rotations.

There are some early successes where, under specific assumptions on the distribution of the data, researchers were able to show convergence to global optimizers with high probability, for example in low-rank matrix completion and dictionary learning; see also results involving geodesic convexity. The former are problems for which convex relaxations are known to work too, but solving those relaxations is often too expensive. Since in comparison the manifold approach (non convex but low dimensional) is faster and often yields the same answer, understanding the limits of this technique is a challenge worth undertaking.

With Manopt, one of our goals is to make it easy to experiment with optimization on manifolds, to help find (and push) its limits.

Advertisements

8 thoughts on “Optimizing in smooth waters: optimization on manifolds

  1. I have a question: Is it possible to run stochastic gradient descent instead of standard gradient descent on sphere?
    Another question is that: in the minimization problem, the retraction map will be used with -grad f(x) or grad f(x) direction.

    Thanks!

    1. Hello,

      Regarding SGD on manifolds, there is a nice paper by Silvère Bonnabel here:
      http://arxiv.org/abs/1111.5280
      That would of course apply to the sphere as well.

      About your second question: you are right, there is a typo in the blog post. The retraction is applied to -\alpha_k grad f(x_k). Sorry about that! (Afonso, do you think you could fix the typo? Thanks!)

      Best,
      Nicolas

  2. Is there a (nice) way of addressing problems which include non-manifold constraints in addition to manifold constraint(s), and get those passed into a constrained version of solver, for instance, without losing (all the) benefits of the Riemannian geometry apparatus? I.e, instead of solving an optimization problem which is unconstrained in a Riemannan manifold, solve an optimization problem which is (further) constrained in a Riemannian manifold. Or would that hopelessly muck up the Riemmanian geometry stuff? if doable, I suppose some,kinds of constraints are easier to incorporate than others?

    I don’t have an example problem to offer at this time, but perhaps I will think of one. What can I say, I like incorporating constraints, of all kinds, into optimization problems.

    1. Hello,

      That is a good question, and one that comes up quite a bit in applications actually. At the moment, I do not know about any paper that treats the general problem of constrained optimization over manifolds, though such work would certainly interest a lot of people.

      A natural framework to try and generalize to manifolds might be the Augmented Lagrangian Method, which is a type of quadratic penalty method (in terms of the constraint violation), but with an extra linear term whose main positive effect (from a numerical standpoint) is that it allows to get good constraint satisfaction without increasing the penalty parameter all the way to infinity.

      I had a few home trials with ALM on a sphere the other day (for equality constraints on top of the sphere constraints, where the latter was enforced via Riemannian optimization and the former were (approximately) enforced by ALM), and this worked excellently. I conducted these tests using Manopt, it took just an hour to set it up (http://www.manopt.org).

      So in short: I don’t know of any existing paper that does constrained Riemannian optimization (though it is possibly an oversight on my part), but I do think many things should be doable.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s