Cheeger Inequality and Spectral Clustering

Today’s post will be concerned with the clustering problem. Suppose one has a data set and wants to split the set into two “meaningful” clusters. We will show how the eigenvectors of the graph Laplacian can provide such a clustering, giving an efficient method for clustering. The result that guarantees the performance of this method is inspired in an isoperimetric inequality proved in the late 60’s. This post is tightly related to my previous post about Spectral Clustering and Diffusion Maps.

Diffusion Maps and Clustering on Graphs

It has been a while since I posted here (not counting the announcement I did a few days ago). I have decided to revive the blog from the ashes and start posting with compliance to the Boris law of Blogging (the time elapsed between any two consecutive posts should be never shorter than 1 week nor longer than 2 weeks).

Now the math, this is based on part of a talk I gave last Tuesday on the PACM Graduate Student Seminar.

Suppose we are given a data set we want to understand (I’ll be more precise later). Let’s say we have a notion of affinity between two data points. One popular way of describing this type of data set is through a weighted graph ${G = ( V , E , W)}$ where each vertex ${v\in V}$ represents a data point and, for each edge ${(i,j)\in E}$, the weight ${w_{i,j} = W[i,j] }$ represents the affinity between node i and node j. The matrix W is often called the weighted adjacency matrix of the graph. We make the reasonable assumption of symmetry in the weights, ${w_{ij} = w_{ji}}$.