Later today I am giving a lecture at IMPA, Brazil. It is part of a special course on Randomness, Matrices and High Dimensional Problems (given together with Roberto Oliveira). Below are the notes for today's lecture. See here for a printer-friendly version. The content is based on a result found here.
1. The problem we will focus on
Let $n$ be an even positive integer. Given two sets of $\frac{n}{2}$ nodes, consider the following random graph $G$: for each pair $(i,j)$ of nodes, $(i,j)$ is an edge of $G$ with probability $p$ if $i$ and $j$ are in the same set, and with probability $q$ if they are in different sets. Each edge is drawn independently and $p > q$.
(Think of the nodes as fans of Fluminense and Flamengo, and of the edges as representing friendships; in this model, fans of the same club are more likely to be friends.)
For which values of $p$ and $q$ can we recover the partition, with an efficient algorithm, from only looking at the graph $G$ (with high probability)?
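To make the model concrete, here is a minimal sampling sketch in Python (numpy assumed; the function name and interface are my own, not part of the lecture):

```python
import numpy as np

def sample_sbm(n, p, q, rng=None):
    """Sample the two-community random graph described above.

    Nodes 0..n/2-1 form one community and nodes n/2..n-1 the other.
    Each within-community pair is an edge with probability p and each
    across-community pair with probability q, all independently.
    Returns the adjacency matrix A and the true labels g (+1/-1).
    """
    rng = np.random.default_rng(rng)
    g = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
    same = np.outer(g, g) > 0                    # True iff same community
    probs = np.where(same, p, q)                 # edge probability per pair
    upper = np.triu(rng.random((n, n)) < probs, k=1).astype(float)
    A = upper + upper.T                          # symmetric, zero diagonal
    return A, g
```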
2. The interesting regime
If $p = o\!\left(\frac{\log n}{n}\right)$, then it is easy to see that each cluster will not be connected (with high probability) and so recovery is not possible. In fact, the interesting regime is when

$$p = \frac{\alpha \log n}{n} \quad \text{and} \quad q = \frac{\beta \log n}{n},$$

for constants $\alpha > \beta > 0$.
Let $A$ be the adjacency matrix of $G$, meaning that

$$A_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E(G), \\ 0 & \text{otherwise}. \end{cases}$$

Let $x \in \mathbb{R}^n$ with $x_i = \pm 1$ represent a partition (note there is an ambiguity in the sense that $x$ and $-x$ represent the same partition). Then, if we did not worry about efficiency, our guess (which corresponds to the Maximum Likelihood Estimator) would be the solution of

$$\max_x \ x^T A x \quad \text{s.t.} \quad x_i = \pm 1 \ \text{for all } i, \quad \mathbf{1}^T x = 0, \qquad (4)$$

as this maximizes the number of edges within the clusters minus the number of edges across the clusters (the clusters being the two sets of the partition given by $x$).
In fact, one can show (though it will not be the focus of this lecture; see here for a proof) that if

$$\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}, \qquad (5)$$

then, with high probability, (4) recovers the true partition. Moreover, if $\sqrt{\alpha} - \sqrt{\beta} < \sqrt{2}$, no algorithm (efficient or not) can, with high probability, recover the true partition.
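To make the estimator concrete, here is a brute-force sketch of (4) (my own illustration; it enumerates all balanced sign vectors, so it is only feasible for very small $n$):

```python
import itertools
import numpy as np

def mle_partition(A):
    """Solve (4) by exhaustive search (exponential time; tiny n only).

    Enumerates all balanced +/-1 labelings and returns one maximizing
    x^T A x, i.e. within-cluster edges minus across-cluster edges.
    """
    n = A.shape[0]
    best_val, best_x = -np.inf, None
    # Fixing x[0] = +1 removes the global sign ambiguity (x vs -x).
    for signs in itertools.product([1.0, -1.0], repeat=n - 1):
        x = np.array((1.0,) + signs)
        if x.sum() != 0:          # enforce the balance constraint 1^T x = 0
            continue
        val = x @ A @ x
        if val > best_val:
            best_val, best_x = val, x
    return best_x
```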
In this lecture we are interested in understanding when the true partition can be recovered, using an efficient algorithm.
3. The algorithm
Note that if we remove the constraint $\mathbf{1}^T x = 0$ in (4), then the optimal solution becomes $x = \mathbf{1}$. Let us define $B = 2A - \left(\mathbf{1}\mathbf{1}^T - I\right)$, meaning that

$$B_{ij} = \begin{cases} 0 & \text{if } i = j, \\ 1 & \text{if } (i,j) \in E(G), \\ -1 & \text{otherwise}. \end{cases}$$

It is clear that the problem

$$\max_x \ x^T B x \quad \text{s.t.} \quad x_i = \pm 1 \ \text{for all } i \qquad (10)$$

has the same solution as (4). However, when the constraint $\mathbf{1}^T x = 0$ is dropped, $x = \mathbf{1}$ is no longer an optimal solution. Intuitively, there are enough "$-1$" contributions to discourage unbalanced partitions. In fact, (10) is the problem we will set ourselves to solve.
Unfortunately, (10) is in general NP-hard (one can encode, for example, Max-Cut by picking the right $B$). We will relax it to an easier problem by a technique known as lifting. If we write $X = xx^T$, then we can formulate the objective value of (10) as

$$x^T B x = \operatorname{Tr}\!\left(x^T B x\right) = \operatorname{Tr}\!\left(B\,xx^T\right) = \operatorname{Tr}(BX);$$

also, the condition $x_i = \pm 1$ can be translated into $X_{ii} = x_i^2 = 1$. This means that (10) is equivalent to

$$\max \ \operatorname{Tr}(BX) \quad \text{s.t.} \quad X_{ii} = 1 \ \text{for all } i, \quad X = xx^T \ \text{for some } x \in \mathbb{R}^n.$$
The fact that $X = xx^T$ for some $x \in \mathbb{R}^n$ is equivalent to $\operatorname{rank}(X) = 1$ and $X \succeq 0$ ($X \succeq 0$ means that $X$ is positive semidefinite, i.e., symmetric with all its eigenvalues non-negative). This means that (10) is equivalent to

$$\max \ \operatorname{Tr}(BX) \quad \text{s.t.} \quad X_{ii} = 1 \ \text{for all } i, \quad X \succeq 0, \quad \operatorname{rank}(X) = 1.$$
(But wait, if it is equivalent, then why isn't this problem just as hard?) It is just as hard. The idea is that we now relax the problem and remove one constraint, the rank constraint:

$$\max \ \operatorname{Tr}(BX) \quad \text{s.t.} \quad X_{ii} = 1 \ \text{for all } i, \quad X \succeq 0. \qquad (13)$$

Now, (13) is an optimization problem with a linear objective and a convex feasibility set; moreover, the feasibility set is a nice convex set. Problems of this type are known as semidefinite programs and can be solved (up to arbitrary precision) in polynomial time.
(But wait, if we removed a constraint, then why would the solution of this problem have that particular form?) Indeed, it won't in general. What we will show is that, for some values of $\alpha$ and $\beta$, with high probability, the solution to (13) not only satisfies the rank constraint but coincides with $X = gg^T$, where $g$ corresponds to the true partition. After computing such an $X$, recovering $g$ from it is trivial (take the leading eigenvector).
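As an illustration, here is a sketch of how one might solve (13) with an off-the-shelf SDP solver (cvxpy assumed; this is my own illustrative code, not the implementation from the paper):

```python
import numpy as np
import cvxpy as cp

def sdp_recover(A):
    """Solve the relaxation (13) and read a partition off the solution.

    B = 2A - (11^T - I) has entries +1 for edges, -1 for non-edges and
    0 on the diagonal, matching (10).
    """
    n = A.shape[0]
    B = 2 * A - (np.ones((n, n)) - np.eye(n))
    X = cp.Variable((n, n), symmetric=True)
    problem = cp.Problem(cp.Maximize(cp.trace(B @ X)),
                         [X >> 0, cp.diag(X) == 1])
    problem.solve()
    # When the solver returns (numerically) g g^T, the leading
    # eigenvector of X recovers g up to a global sign flip.
    eigenvalues, eigenvectors = np.linalg.eigh(X.value)
    return np.sign(eigenvectors[:, -1])
```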
Remark 1 There are other types of results that, instead of assuming stochastic input, assume worst-case input and analyze the performance of rounding procedures that take solutions to the relaxation and "round them" to solutions of the original problem. I recommend taking a look at a very nice relaxation for Max-Cut here.
4. The analysis
WLOG we can assume that $g = (1, \dots, 1, -1, \dots, -1)^T$, meaning that the true partition corresponds to the first $\frac{n}{2}$ nodes on one side and the other $\frac{n}{2}$ on the other. Nothing changes, but it makes things easier to think about.
4.1. Some preliminary definitions
Recall that the degree matrix $D$ of a graph $G$ is a diagonal matrix where each diagonal coefficient $D_{ii}$ corresponds to the number of neighbours of vertex $i$, and that $\lambda_2(M)$ is the second smallest eigenvalue of a symmetric matrix $M$.
Definition 1 Let $G^+$ (resp. $G^-$) be the subgraph of $G$ that includes the edges that link two nodes in the same community (resp. in different communities) and $A$ the adjacency matrix of $G$. We denote by $D^+$ (resp. $D^-$) the degree matrix of $G^+$ (resp. $G^-$) and define the Stochastic Block Model Laplacian to be

$$L_{SBM} = D^+ - D^- - A.$$
4.2. Convex Duality
A standard technique to show that a candidate solution is the optimal one for a convex problem is to use convex duality.
The dual problem of (13) is

$$\min \ \operatorname{Tr}(Z) \quad \text{s.t.} \quad Z \text{ diagonal}, \quad Z - B \succeq 0. \qquad (15)$$

The objective value of the dual is always larger than or equal to that of the primal. In fact, let $X$ and $Z$ be, respectively, feasible points of (13) and (15); then, since $Z - B \succeq 0$ and $X \succeq 0$,

$$0 \le \operatorname{Tr}\left[(Z - B)X\right] = \operatorname{Tr}(ZX) - \operatorname{Tr}(BX) = \operatorname{Tr}(Z) - \operatorname{Tr}(BX),$$

where the last equality uses the fact that $Z$ is diagonal and $X_{ii} = 1$,
as we wanted. Recall that we want to show that $gg^T$ is the optimal solution of (13). Then, if we find $Z$ diagonal such that $Z - B \succeq 0$ and $\operatorname{Tr}\left[(Z - B)\,gg^T\right] = 0$, then $gg^T$ must be an optimal solution of (13), since its objective value matches the dual value $\operatorname{Tr}(Z)$. To ensure that $gg^T$ is the unique solution we just have to ensure that the nullspace of $Z - B$ only has dimension 1 (which corresponds to multiples of $g$). Essentially, if this is the case, then for any other possible solution $X$ one could not satisfy complementary slackness ($\operatorname{Tr}\left[(Z - B)X\right] = 0$ would force the columns of $X$ to lie in the nullspace of $Z - B$, i.e., in the span of $g$).
This means that if we can find $Z$ such that:

(1) $Z$ is diagonal,
(2) $\operatorname{Tr}\left[(Z - B)\,gg^T\right] = 0$,
(3) $Z - B \succeq 0$,
(4) $\lambda_2(Z - B) > 0$,

then $gg^T$ is the unique optimum of (13), and so recovery of the true partition is possible (with an efficient algorithm). $Z$ is known as the dual certificate, or dual witness.
4.3. Building the dual certificate
The idea to build $Z$ is to construct it to satisfy properties (1) and (2) and try to show that it satisfies (3) and (4) using concentration. If indeed $Z - B \succeq 0$, then (2) becomes equivalent to $(Z - B)g = 0$. This means that we need to construct $Z$ such that $Z_{ii} = g_i (Bg)_i$ (recall that $g_i = \pm 1$). Since $B = 2A - \left(\mathbf{1}\mathbf{1}^T - I\right)$ and $\mathbf{1}^T g = 0$, we have

$$(Bg)_i = 2(Ag)_i + g_i,$$

meaning that

$$Z_{ii} = 2\,g_i (Ag)_i + 1 = 2\left(\deg^+(i) - \deg^-(i)\right) + 1,$$

where $\deg^+(i)$ (resp. $\deg^-(i)$) is the number of neighbours of $i$ in its own community (resp. the other community). This means that

$$Z = 2\left(D^+ - D^-\right) + I.$$

It trivially follows (by construction) that $(Z - B)g = 0$. This means that

$$Z - B = 2\left(D^+ - D^- - A\right) + \mathbf{1}\mathbf{1}^T = 2\,L_{SBM} + \mathbf{1}\mathbf{1}^T.$$
Lemma 2 If

$$\lambda_2\!\left(2\,L_{SBM} + \mathbf{1}\mathbf{1}^T\right) > 0,$$

then the relaxation recovers the true partition.
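As a sanity check, here is a small numerical test of the condition in Lemma 2 (my own sketch, reusing the naming from the sampler above):

```python
import numpy as np

def lemma2_holds(A, g, tol=1e-9):
    """Numerically check the condition of Lemma 2.

    Builds L_SBM = D^+ - D^- - A from the adjacency matrix A and the
    true labels g, then tests whether the second smallest eigenvalue
    of 2 L_SBM + 11^T is positive (g itself always lies in the
    nullspace, by construction).
    """
    n = A.shape[0]
    same = np.outer(g, g) > 0
    deg_plus = (A * same).sum(axis=1)      # neighbours in the same community
    deg_minus = (A * ~same).sum(axis=1)    # neighbours across communities
    L = np.diag(deg_plus - deg_minus) - A  # the SBM Laplacian
    eigenvalues = np.linalg.eigvalsh(2 * L + np.ones((n, n)))
    return eigenvalues[1] > tol            # eigenvalues come sorted ascending
```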
Note that $L_{SBM}$ is a random matrix and so, now, this is "only" an exercise in random matrix theory.
4.4. Matrix Concentration
An easy calculation gives $\mathbb{E}\,D^+ = \left(\frac{n}{2} - 1\right)p\,I$ and $\mathbb{E}\,D^- = \frac{n}{2}\,q\,I$, and $\mathbb{E}\,A$ is a matrix with four $\frac{n}{2} \times \frac{n}{2}$ blocks, where the diagonal blocks have entries $p$ and the off-diagonal blocks have entries $q$ (with zeros on the diagonal of $\mathbb{E}\,A$ itself). We can write this as

$$\mathbb{E}\,A = \frac{p+q}{2}\,\mathbf{1}\mathbf{1}^T + \frac{p-q}{2}\,gg^T - p\,I.$$
This means that

$$2\,\mathbb{E}\,L_{SBM} + \mathbf{1}\mathbf{1}^T = n(p-q)\,I + \bigl(1 - (p+q)\bigr)\mathbf{1}\mathbf{1}^T - (p-q)\,gg^T.$$

Since $g \perp \mathbf{1}$, we can ignore what happens in the span of $\mathbf{1}$, and it is not hard to see that, on the orthogonal complement of $\mathbf{1}$, the expected matrix has eigenvalue $0$ (with eigenvector $g$) and eigenvalue $n(p-q)$ (with multiplicity $n-2$). This means that it is enough to show that

$$\left\| L_{SBM} - \mathbb{E}\,L_{SBM} \right\| < \frac{n}{2}(p-q), \qquad (17)$$

which is a large deviations inequality ($\|\cdot\|$ denotes the operator norm). I will not describe the details here, but the idea is to write $L_{SBM} - \mathbb{E}\,L_{SBM}$ as a sum of independent random matrices and to use Matrix Bernstein, see here. In fact, using that strategy one can show that, with high probability, (17) holds as long as

$$(\alpha - \beta)^2 > 8(\alpha + \beta) + \frac{8}{3}(\alpha - \beta). \qquad (18)$$
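To see (17) in action, here is a small Monte Carlo sketch (my own; it builds $\mathbb{E}\,L_{SBM}$ from the formulas above and compares the operator-norm deviation against $\frac{n}{2}(p-q)$ over a few draws):

```python
import numpy as np

def deviation_vs_bound(n, alpha, beta, trials=20, seed=0):
    """Monte Carlo look at the large-deviations inequality (17).

    Returns the largest observed ||L_SBM - E L_SBM|| (operator norm)
    over a few draws, together with the bound n(p-q)/2 from (17).
    """
    rng = np.random.default_rng(seed)
    p, q = alpha * np.log(n) / n, beta * np.log(n) / n
    g = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
    same = np.outer(g, g) > 0
    # E L_SBM = (n/2)(p-q) I - ((p+q)/2) 11^T - ((p-q)/2) g g^T
    exp_L = ((n / 2) * (p - q) * np.eye(n)
             - (p + q) / 2 * np.ones((n, n))
             - (p - q) / 2 * np.outer(g, g))
    deviations = []
    for _ in range(trials):
        upper = np.triu(rng.random((n, n)) < np.where(same, p, q), k=1)
        A = upper.astype(float) + upper.astype(float).T
        deg_diff = (A * same).sum(axis=1) - (A * ~same).sum(axis=1)
        L = np.diag(deg_diff) - A
        deviations.append(np.linalg.norm(L - exp_L, ord=2))
    return max(deviations), n * (p - q) / 2
```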
4.5. Comparison with phase transition
To compare (18) with (5), we note that the latter can be rewritten as

$$(\alpha - \beta)^2 > 2\left(\sqrt{\alpha} + \sqrt{\beta}\right)^2,$$

and, since $\left(\sqrt{\alpha} + \sqrt{\beta}\right)^2 \le 2(\alpha + \beta)$, the relaxation achieves exact recovery almost at the threshold, essentially only suboptimal by a constant factor.
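For concreteness, here are the two conditions side by side, as stated above (a sketch; the function names are mine):

```python
import numpy as np

def sdp_condition(alpha, beta):
    """Sufficient condition (18) for the relaxation to succeed."""
    return (alpha - beta) ** 2 > 8 * (alpha + beta) + (8 / 3) * (alpha - beta)

def recovery_condition(alpha, beta):
    """Information-theoretic condition (5) for exact recovery."""
    return np.sqrt(alpha) - np.sqrt(beta) > np.sqrt(2)

# At (alpha, beta) = (10, 2) recovery is information-theoretically possible
# but the sufficient condition (18) fails; at (30, 2) both conditions hold.
for a, b in [(10, 2), (30, 2)]:
    print(a, b, recovery_condition(a, b), sdp_condition(a, b))
```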