# Cheeger Inequality and Spectral Clustering

Today’s post will be concerned with the clustering problem. Suppose one has a data set and wants to split the set into two “meaningful” clusters. We will show how the eigenvectors of the graph Laplacian can provide such a clustering, giving an efficient method for clustering. The result that guarantees the performance of this method is inspired in an isoperimetric inequality proved in the late 60’s. This post is tightly related to my previous post about Spectral Clustering and Diffusion Maps.

Suppose we are given a data set we want to understand (I’ll be more precise later). Let’s say we have a notion of affinity between two data points. One popular way of describing this type of data set is through a weighted graph ${G = ( V , E , W)}$ where each vertex ${v\in V}$ represents a data point and, for each edge ${(i,j)\in E}$, the weight ${w_{i,j} = W[i,j] }$ represents the affinity between node i and node j. The matrix W is often called the weighted adjacency matrix of the graph. We make the reasonable assumption of symmetry in the weights, ${w_{ij} = w_{ji}}$.