Later today I am giving a lecture in IMPA, Brazil . It is part of a special course on Randomness, Matrices and High Dimensional Problems (given together with Roberto Oliveira). Below are the notes for today’s lecture. See here for a printer friendly version. The content is based on a result in here.

**1. The problem we will focus on**

Let be an even positive integer. Given two sets of nodes consider the following random graph : For each pair of nodes, is an edge of with probability if and are in the same set and if they are in different sets. Each edge is drawn independently and .

(Think nodes as fans of Fluminense and Flamengo and edges representing friendships, in this model, fans of the same club are more likely to be friends)

For which values of and can we recover the partition, with an efficient algorithm, from only looking at the graph (with high probability)?

**2. The interesting regime**

If then it is easy to see that each cluster will not be connected (with high probability) and so recovery is not possible. In fact, the interesting regime is when

for constants .

Let be the adjacency matrix of , meaning that

Let with represent a partition (note there is an ambiguity in the sense that and represent the same partition). Then, if we did not worry about efficiency then our guess (which corresponds to the Maximum Likelihood Estimator) would be the solution of

as this maximizes the number of edges within the clusters minus the number of edges across the clusters (the clusters being each set of the partition given by ).

In fact, one can show (but will not be the focus of this lecture, see here for a proof) that if

then, with high probability, (4) recovers the true partition. Moreover, if

no algorithm (efficient or not) cannot, with high probability, recover the true partition.

In this lecture we are interested in understanding when the true partition can be recovered, using an efficient algorithm.

**3. The algorithm**

Note that if we remove the constraint that in (4) then the optimal solution becomes . Let us define , meaning that

It is clear that the problem

has the same solution has (4). However, when the constraint is dropped,

is no longer an optimal solution. Intuitively, there is enough “” contributions to discourage unbalanced partitions. In fact, (10) is the problem we’ll set ourselves to solve.

Unfortunately (10) is in general NP-hard (one can encode, for example, \texttt{Max-Cut} by picking the right ). We will relax it to an easier problem by a technique known as lifting. If we write then we can formulate the objective value of (10) as

also, the condition can be translated in . This means that (10) is equivalent to

The fact that for some is equivalent to and ( is Positive semidefinite, meaning that it is symmetric and all it’s eigenvalues are non-negative). This means that (10) is equivalent to

(But wait, if it is equivalent then why isn’t this problem just as hard?) — It is just as hard. The idea is that now we relax the problem and remove one constraint, the rank constraint

Now, (13) is an optimization problem with a linear objective and convex feasibility set, moreover the feasibility set is a nice convex set, this type of problems are known as Semidefinite Programs and can be solved (up to arbitrary precision) in polynomial time.

(But wait, if we removed a constraint then why would the solution of this problem have that particular form?) — Indeed, it won’t in general. What we will show is that, for some values of and , with high probability, the solution to (13) not only satisfies the rank constraint but it coincides with where corresponds to the true partition. After computing such , recovering from it is trivial (leading eigenvector).

Remark 1 There are other types of results that, instead of assuming stochastic input, assume worst-case input and analyze the performance of rounding procedures that graph solutions to the relaxation and “round them” to solution of the original problems. I recommend taking a look at a very nice relaxation for \texttt{Max-Cut} here.

**4. The analysis**

WLOG we can assume that , meaning that the true partition corresponds to the first nodes on one side and the other on the other. Nothing changes but makes it easier to think about.

**4.1. Some preliminary definition**s

Recall that the degree matrix of a graph is a diagonal matrix where each diagonal coefficient corresponds to the number of neighbours of vertex and that is the second smallest eigenvalue of a symmetric matrix .

Definition 1 Let (resp. ) be the subgraph of that includes the edges that link two nodes in the same community (resp. in different communities) and the adjacency matrix of . We denote by (resp. ) the degree matrix of (resp. ) and define the Stochastic Block Model Laplacian to be

**4.2. Convex Duality**

A standard technique to show that a candidate solution is the optimal one for a convex problem is to use convex duality.

The dual problem of (13) is

The objective value of the dual is always larger or equal to the primal. In fact, let be respectively a feasible point of (13) and (15), then:

Since and then which means that

as we wanted. Recall that we want to show that the optimal solution of (13). Then, if we find diagonal, such that and

then must be an optimal solution of (13). To ensure that is the unique solution we just have to ensure that the nullspace of only has dimension 1 (which corresponds to multiples of ). Essentially, if this is the case, then for any other possible solution one could not satisfy complementary slackness.

This means that if we can find such that:

is diagonal

,

then is the unique optima of (13) and so recovery of the true partition is possible (with an efficient algorithm). is known as the dual certificate, or dual witness.

**4.3. Building the dual certificate**

The idea to build is to construct it to satisfy properties (1) and (2) and try to show that it satisfies (3) and (4) using concentration. If indeed then (2) becomes equivalent to . This means that we need to construct such that . Since we have

meaning that

This means that

It trivially follows (by construction) that

This means that

Lemma 2 If

then the relaxation recovers the true partition.

Note that is a random matrix and so, now, this is “only” an exercise in random matrix theory.

**4.4. Matrix Concentration**

And easy calculation gives

and , , and is a matrix such with blocks where the diagonal blocks have and the non-diagonal have . We can write this as

This means that

Since we can ignore what happens in the span of and it is not hard to see that

This means that it is enough to show that

which is a large deviations inequality. ( denotes operator norm) I will not describe the details here but the idea is to write as a sum of independent random matrices and use Matrix Bernestein in here. In fact, using that strategy one can show that, with high probability, 17 holds as long as

**4.5. Comparison with phase transition**

To compare (18) with (5) we note that the latter can be rewritten as

and so the relaxation achieves exact recovery almost at the threshold, essentially only suboptimal by a factor of .

## One thought on “Lecture on Semidefinite Relaxation for Community Detection”