# 18.S096: Graphs, Diffusion Maps, and Semi-supervised Learning

The second set of Lecture notes for my course is now available here.  This week’s notes are about graphs, embedding of graphs in Euclidean space (focusing in Diffusion Maps) and relations between behavior of a graph based semi-supervised learning method and Sobolev Embedding Theorem. Given the nature of these topics, these notes have a lot more images than normal, take a look!

The notes also describe three open problems that I would like to document here, there is a much more detailed description of the problems on the notes, including description of partial progress.

Open Problem 2.1.

Given a positive integer $r$, the Ramsey number $R(r)$ denotes the smallest integer $n$ such that any graph on $n$ nodes has either a clique or an independent set of size $r$. The following are open questions:

1. What is $R(5)$? It is known that $43 \leq R(5) \leq 49$.
2. What are the asymptotics of $R(r)$? The current best bounds are subexponential improvements of $2^{r/2} \lesssim R(r) \gtrsim 2^{2r}$.
3.  Construct a family of graphs $G = (V,E)$ with increasing number of nodes for which there exists $\epsilon>0$ such that $|v| \lesssim (1+\epsilon)^r$.

The current world’s record for deterministic constructions of Ramsey graphs produces graphs for which the the clique or independence number is $\approx 2^{(\log\log|V|)^{O(1)}}$ (see here and here), while we know (by the bounds above) that there exist graphs for which this number is $\approx \log|V|$.

Open Problem 2.2. (Erdos-Hajnal conjecture)

For any finite graph $H$, there exists a constant $\delta_H > 0$ such that any graph on $n$ nodes that does not contain $H$ as a subgraph (is an $H$-free graph) must have

$r(G) \gtrsim n^{\delta_H}$,

where $r(G)$ is the maximum among the clique and independence number of $G$.

I think this is one of the most fascinating conjectures in graph theory. It is known (see this survey) that $r(G)\gtrsim \exp\left( c_H \sqrt{\log n} \right)$, for some constant $c_H >0$. Note that this lower bound already shows that $H$-free graphs need to have considerably larger $r(G)$. This is an amazing local to global effect, where imposing a constraint on small groups of vertices are connected (being $H$-free is a local property) creates extremely large cliques or independence sets — much larger than $\log(n)$ as in random Erd\H{o}s-Reny\’i graphs (see, for example, the notes).

Open Problem 2.3. (The planted clique problem)

Let $G$ be a random graph constructed by taking a $G\left( n, \frac12\right)$ and planting a clique of size $\omega$:

1. Is there a polynomial time algorithm that is able to find the largest clique of $G$ (with high probability) for $\omega \ll \sqrt{n}$? For example, for $\omega \approx \frac{\sqrt{n}}{\log n}$?
2. Is there a polynomial time algorithm that is able to distinguish, with high probability, $G$ from a draw of $G\left( n, \frac12\right)$ for $\omega \ll \sqrt{n}$? For example, for $\omega \approx \frac{\sqrt{n}}{\log n}$?
3. Is there a quasi-linear time algorithm able to find the largest clique of $G$ (with high probability) for $\omega\leq \left(\frac1{\sqrt{e}}-\epsilon \right)\sqrt{n}$, for some $\epsilon>0$? (This question was posed here)

Questions 1 and 2 are central questions in average case complexity. In fact, the hypothesis that finding planted cliques for $\omega \ll \sqrt{n}$ is hard is used as the basis of several cryptographic protocols and average case hardness results, see the notes for some references and a brief explanation of why the problem is easy if $\omega \approx \sqrt{n}$.