Approximation Algorithms for Graph Homomorphism Problems
Chaitanya Swamy, University of Waterloo
Joint work with Michael Langberg (Open University) and Yuval Rabani (Technion)

Maximum Graph Homomorphism
Given: graphs G = (V_G, E_G) and H = (V_H, E_H), find a mapping π : V_G → V_H that maximizes the number of edges of G mapped to edges of H
Goal: Maximize |{(u, v) ∈ E_G : (π(u), π(v)) ∈ E_H}|, the value of the mapping π
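A minimal Python sketch of the objective, not taken from the talk: `mapping_value` computes the value of a mapping and `brute_force_opt` (a hypothetical helper, exponential in |V_G|) finds the optimum on tiny instances. It assumes undirected graphs given as edge lists.

```python
from itertools import product

def mapping_value(g_edges, h_edges, mapping):
    """Count edges (u, v) of G whose image (mapping[u], mapping[v]) is an edge of H."""
    # H is undirected, so store every edge in both orientations.
    h = {(a, b) for a, b in h_edges} | {(b, a) for a, b in h_edges}
    return sum((mapping[u], mapping[v]) in h for u, v in g_edges)

def brute_force_opt(g_nodes, g_edges, h_nodes, h_edges):
    """Try all |V_H|^|V_G| mappings; only feasible for very small graphs."""
    best = 0
    for labels in product(h_nodes, repeat=len(g_nodes)):
        mapping = dict(zip(g_nodes, labels))
        best = max(best, mapping_value(g_edges, h_edges, mapping))
    return best

# G is a triangle and H is a single edge, so the value of a mapping is a cut size.
triangle = [(0, 1), (1, 2), (0, 2)]
single_edge = [("a", "b")]
print(brute_force_opt([0, 1, 2], triangle, ["a", "b"], single_edge))
```

With H a single edge, the value of a mapping is the number of edges of G crossing the induced bipartition, so the optimum here is the maximum cut of the triangle.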

• H = a single edge: Max-Cut problem on G
  ⇒ Problem is NP-hard, and APX-hard even for fixed H
• Optimization version of H-coloring: decide if there is a mapping π of value |E_G| (such a π is a homomorphism)
  e.g., when H is a k-clique, H-coloring ≡ k-coloring problem, and maximum graph homomorphism (MGH) ≡ Max-k-Cut
  H-coloring is NP-complete if H is not bipartite and does not contain a self-loop (Hell & Nešetřil)

Related Work
MGH problem appears to be new.
• H-coloring: well-studied problem; Hell & Nešetřil proved that H-coloring is either in P or NP-complete
  – restrictive/list H-coloring: various restrictions placed on π, e.g., restrictions on π(u) for u ∈ V_G, or on π^{-1}(i) for i ∈ V_H
  – counting versions of these problems: Dyer & Greenhill proved a dichotomy theorem for counting the number of H-colorings
  – sampling a random H-coloring
• Minimum cost homomorphism: find π that minimizes (cost of assigning labels to nodes) + (weights of images of E_G); studied by Cohen et al., Gutin et al., Aggarwal et al.
  – if edge weights in H form a metric, this is the metric labeling problem (Kleinberg & Tardos)

Related Work (contd.)
• Maximum common subgraph: given graphs G, H, find their largest common subgraph ≡ essentially MGH where π is required to be one-to-one
MGH can be reduced to this problem:
  – blow up each i ∈ V_H to an independent set of size |V_G|
  – replace each edge (i, j) ∈ E_H by a complete bipartite graph
Kann: (B+1)-approximation when degrees in G, H are ≤ B.
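The blow-up step of this reduction can be sketched as follows. This is an illustrative sketch only: the function name `blow_up` and the `(label, copy)` node encoding are assumptions, and H is assumed to have no self-loops.

```python
def blow_up(h_nodes, h_edges, size):
    """Replace each node of H by `size` independent copies and each edge
    (i, j) of H by a complete bipartite graph between the copies of i and j."""
    nodes = [(i, c) for i in h_nodes for c in range(size)]
    edges = [((i, a), (j, b))
             for i, j in h_edges
             for a in range(size)
             for b in range(size)]
    return nodes, edges

# Blowing up a single edge with size 3 gives the complete bipartite graph K_{3,3}.
nodes, edges = blow_up(["x", "y"], [("x", "y")], 3)
print(len(nodes), len(edges))
```

In the reduction, `size` would be |V_G|, so that any mapping of G into H can be turned into a one-to-one mapping into the blown-up graph by sending nodes with the same label to distinct copies.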

A Trivial 0.5-approximation
1) Fix an edge (i, j) of H
2) Map each u ∈ V_G to i or j, each with probability ½
OPT_MGH(G, H) ≥ MaxCut(G) ≥ |E_G|/2
Each edge of G is mapped to (i, j) with probability ½
⇒ expected value of mapping = |E_G|/2
⇒ get a 0.5-approximation algorithm (can derandomize)
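A hedged sketch of the two steps and of one standard way to derandomize them (the method of conditional expectations, which here becomes the greedy Max-Cut heuristic). The talk only says "can derandomize", so the specific greedy rule below is an assumption. G is assumed simple (no self-loops), with i ≠ j.

```python
import random

def random_edge_mapping(g_nodes, edge, rng):
    """Step 2 of the slide: map each node to i or j with probability 1/2."""
    i, j = edge
    return {u: rng.choice((i, j)) for u in g_nodes}

def greedy_edge_mapping(g_nodes, g_edges, edge):
    """Derandomized variant: place each node opposite the majority of its
    already-placed neighbours, so at least half of the edges are cut."""
    i, j = edge
    adj = {u: [] for u in g_nodes}
    for u, v in g_edges:
        adj[u].append(v)
        adj[v].append(u)
    mapping = {}
    for u in g_nodes:
        on_i = sum(1 for v in adj[u] if mapping.get(v) == i)
        on_j = sum(1 for v in adj[u] if mapping.get(v) == j)
        mapping[u] = j if on_i >= on_j else i
    return mapping

def cut_value(g_edges, mapping):
    """Number of edges of G mapped onto the fixed edge (i, j) of H."""
    return sum(mapping[u] != mapping[v] for u, v in g_edges)

cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
greedy = greedy_edge_mapping(range(5), cycle5, ("a", "b"))
print(cut_value(cycle5, greedy), "of", len(cycle5), "edges mapped to (a, b)")
```

Each edge is decided when its second endpoint is placed, and the greedy rule cuts at least half of the edges decided at each step, which gives the |E_G|/2 guarantee deterministically.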

More generally, for a subset N ⊆ V_H define its density ρ(N) = 2|E(N)| / |N|^2
Mapping each u ∈ V_G randomly to a label in N maps ρ(N)·|E_G| edges of G in expectation
⇒ gives a ρ(N)-approximation algorithm
e.g., if H has a triangle, get a 2/3-approximation; if H has a k-clique, get a (1 - 1/k)-approximation
In general, a factor of 0.5 might be the best possible!
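The density formula can be checked on the examples from the slide with a short sketch. The function name `density` is illustrative; H is assumed to have no self-loops and each undirected edge is assumed to be listed once.

```python
from fractions import Fraction

def density(subset, h_edges):
    """rho(N) = 2|E(N)| / |N|^2, where E(N) is the set of edges of H inside N."""
    s = set(subset)
    inside = sum(1 for a, b in h_edges if a in s and b in s)
    return Fraction(2 * inside, len(s) ** 2)

triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(density([0, 1], triangle), density([0, 1, 2], triangle), density(range(4), k4))
```

The factor 2 accounts for both orientations of an edge: a fixed edge (u, v) of G lands on an ordered pair of labels chosen uniformly from N × N, and 2|E(N)| of those |N|^2 pairs are adjacent.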

Informal Statement of Result
There is no (0.5+ε)-approximation algorithm for MGH, unless certain average-case instances of subgraph isomorphism can be solved in polynomial time.
G_{n,p} ≡ distribution on n-vertex graphs where each edge is chosen independently with probability p
Our average-case instances are related to G_{n,p}
Main question: how hard is subgraph isomorphism on a pair of random graphs G ∈ G_{n,p}

The Roadmap
Main Lemma: If H is triangle-free with k nodes, and G ∈ G_{n,p} where p = c·ln(k)/n with n, c suitably large, then with high probability (over all G’s), OPT(G, H) ≤ (1+δ)|E_G|/2
So there is a factor 2 gap:
  if G is a subgraph of H, OPT(G, H) = |E_G|
  if G is not a subgraph of H, OPT(G, H) ≤ (1+δ)|E_G|/2 whp.
• A (0.5+ε)-approximation algorithm can be used to distinguish between these two cases
• Inapproximability result based on the assumption that this is hard when G, H are drawn from a suitable distribution on triangle-free graphs
• Formulate this precisely as a refutation problem

The Refutation Problem
Let D_{n,p} = distribution on n-node Δ-free (triangle-free) graphs obtained by taking G ∈ G_{n,p} and removing edges randomly till no Δs remain
Refutation problem: Find a poly-time algorithm that, given G ∈ D_{n,p} and H ∈ D_{n,q}, where q >> p = c·ln(n)/n,
(a) returns “yes” if G ⊆ H,
(b) returns “no” with probability ≥ ½
[With very high probability G will not be a subgraph of H.]
A (0.5+ε)-approx. algorithm A yields a refutation algorithm:
• if G ⊆ H, then A(G, H) ≥ (0.5+ε)|E_G|
• otherwise, let G be obtained by removing edges from G' ∈ G_{n,p}
  OPT(G, H) ≤ OPT(G', H) ≤ (1+δ)|E_G'|/2 ≈ (1+δ)c·n ln(n)/4
  |E_G'| ≈ c·n ln(n)/2 and (# of Δs in G') ≤ c^3·ln^3(n)·n^{1/2} whp.
  ⇒ A(G, H) ≤ OPT(G, H) ≤ (1+4δ)|E_G|/2 whp.
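The way an approximation algorithm would be turned into a refutation algorithm can be sketched as a thresholding wrapper. Everything here is hypothetical: `approx_alg` stands for an assumed (0.5+ε)-approximation algorithm that returns the value of the mapping it finds, and no such algorithm is claimed to exist.

```python
def refutation_answer(g_edges, h_edges, approx_alg, eps):
    """Answer "yes" when the value returned by the (assumed) approximation
    algorithm reaches (0.5 + eps)|E_G|, and "no" otherwise."""
    value = approx_alg(g_edges, h_edges)
    return "yes" if value >= (0.5 + eps) * len(g_edges) else "no"

# Stand-in algorithms for illustration only: one that maps every edge of G
# to an edge of H, and one that maps only about half of them.
maps_all_edges = lambda g_edges, h_edges: len(g_edges)
maps_half_edges = lambda g_edges, h_edges: len(g_edges) // 2

g = [(u, u + 1) for u in range(10)]
print(refutation_answer(g, [], maps_all_edges, 0.1))
print(refutation_answer(g, [], maps_half_edges, 0.1))
```

If G ⊆ H then OPT(G, H) = |E_G| and the threshold is always met, so the answer is "yes"; otherwise the slide's bound OPT(G, H) ≤ (1+4δ)|E_G|/2 keeps the value below the threshold whp. once 4δ < 2ε.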

Refutation Problem (contd.)
• Feige initiated the use of average-case complexity to prove hardness results, where average-case hardness translates to hardness of a refutation problem
• Can make the refutation problem harder and more robust: require the algorithm to say “yes” if G has a subgraph of size (1-ε)|E_G| isomorphic to H
• How hard is the refutation problem? Open. But local analysis (return “yes” iff all “small” subgraphs of G are subgraphs of H) does not work. Also can make G have Ω(ln(n)/ln ln(n)) girth.
• We set q >> p to be “far” from graph isomorphism, which is poly-time solvable for random graphs

Main Lemma and Proof
Lemma: Let δ ≤ 0.5. If H is triangle-free with k nodes, and G ∈ G_{n,p} where p = c·ln(k)/n with n ≥ n_0(δ), c ≥ c_0(δ), then whp.
(a) OPT(G, H) ≤ (1+δ)c·n ln(k)/4,
(b) |E_G| ≥ (1-δ)c·n ln(k)/2, so
(c) OPT(G, H) ≤ (1+4δ)|E_G|/2
Proof: (a) Fix a mapping π. For a random G ∈ G_{n,p},
Value of π = V(π) = ∑_{(i,j)∈E_H} ∑_{u,v∈V_G : π(u)=i, π(v)=j} X_uv, where X_uv = 1 if (u, v) ∈ E_G and 0 otherwise
E[V(π)] = p·∑_{(i,j)∈E_H} |π^{-1}(i)|·|π^{-1}(j)|

Turán’s Theorem
An n-node graph that is K_{r+1}-free has at most (1 - 1/r)·n^2/2 edges.
Corollary: Let H be a graph that is K_{r+1}-free. Let w: V_H → Z+ be a weight function such that ∑_i w_i = n. Then ∑_{(i,j)∈E_H} w_i·w_j ≤ (1 - 1/r)·n^2/2
Proof: Blow up each i ∈ V_H to an independent set of size w_i to get H’. H’ has n nodes and is also K_{r+1}-free, so use Turán on H’.
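The corollary can be checked numerically on a small triangle-free graph (r = 2). This is an illustrative sketch: the 5-cycle and the weights are arbitrary choices, and the helper names are assumptions.

```python
from fractions import Fraction

def weighted_edge_sum(h_edges, w):
    """Left-hand side of the corollary: sum of w_i * w_j over edges (i, j) of H."""
    return sum(w[i] * w[j] for i, j in h_edges)

def blow_up_edge_count(h_edges, w):
    """Number of edges of H', built explicitly as in the proof."""
    edges = {((i, a), (j, b))
             for i, j in h_edges
             for a in range(w[i])
             for b in range(w[j])}
    return len(edges)

def turan_bound(n, r):
    """Right-hand side: (1 - 1/r) * n^2 / 2."""
    return Fraction(r - 1, r) * Fraction(n * n, 2)

c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # the 5-cycle is triangle-free
w = {0: 3, 1: 1, 2: 2, 3: 2, 4: 2}
n = sum(w.values())
print(weighted_edge_sum(c5, w), blow_up_edge_count(c5, w), turan_bound(n, 2))
```

The weighted sum equals the edge count of the blown-up graph H’, which is exactly why Turán's theorem applied to H’ gives the bound.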

Main Lemma and Proof
Lemma: Let δ ≤ 0.5. If H is triangle-free with k nodes, and G ∈ G_{n,p} where p = c·ln(k)/n with n ≥ n_0(δ), c ≥ c_0(δ), then whp.
(a) OPT(G, H) ≤ (1+δ)c·n ln(k)/4,
(b) |E_G| ≥ (1-δ)c·n ln(k)/2, so
(c) OPT(G, H) ≤ (1+4δ)|E_G|/2
Proof: (a) Fix a mapping π. For a random G ∈ G_{n,p},
Value of π = V(π) = ∑_{(i,j)∈E_H} ∑_{u,v∈V_G : π(u)=i, π(v)=j} X_uv, where X_uv = 1 if (u, v) ∈ E_G and 0 otherwise
E[V(π)] = p·∑_{(i,j)∈E_H} |π^{-1}(i)|·|π^{-1}(j)| ≤ p·n^2/4 (by Turán)
V(π) is a sum of independent random variables, so Pr[V(π) > (1+δ)E[V(π)]] ≤ e^{-Ω(n ln(k))}
There are k^n mappings in total, so by the union bound, whp. V(π) ≤ (1+δ)c·n ln(k)/4 for all π
⇒ OPT(G, H) ≤ (1+δ)c·n ln(k)/4 whp.

(b) E[|E_G|] = p·n(n-1)/2 ≈ c·n ln(k)/2
By Chernoff bounds, |E_G| ≥ (1-δ)c·n ln(k)/2 whp.
(c) Therefore, OPT(G, H) ≤ (1+4δ)|E_G|/2
Refutation problem: Find a poly-time algorithm that, given G ∈ D_{n,p} and H ∈ D_{n,q}, where q >> p = c·ln(n)/n,
(a) returns “yes” if G ⊆ H,
(b) returns “no” with probability ≥ ½
A (0.5+ε)-approx. algorithm yields an algorithm for the refutation problem
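The step from (a) and (b) to (c) is not spelled out on the slide; one way to fill it in, using only the two bounds and the assumption 0 ≤ δ ≤ 0.5, is the following worked derivation.

```latex
\begin{align*}
\mathrm{OPT}(G,H)
  &\le (1+\delta)\,\frac{c\,n\ln k}{4}
   = \frac{1+\delta}{1-\delta}\cdot\frac{(1-\delta)\,c\,n\ln k}{4}
   \le \frac{1+\delta}{1-\delta}\cdot\frac{|E_G|}{2}
   && \text{by (a), then (b)} \\
\frac{1+\delta}{1-\delta} \le 1+4\delta
  &\iff 1+\delta \le (1+4\delta)(1-\delta) = 1+3\delta-4\delta^{2}
   \iff 4\delta^{2} \le 2\delta
   \iff \delta \le \tfrac12 .
\end{align*}
```

Combining the two lines gives OPT(G, H) ≤ (1+4δ)|E_G|/2, which is where the hypothesis δ ≤ 0.5 in the lemma is used.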

Other Results
• Can get a 0.5 + Ω(1/(|V_H| ln(|V_H|)))-approximation using SDP
  – gives improvements for any fixed H
• Prelabeled MGH: a partial labeling π’: U → V_H is also given, and the output π has to be an extension of π’.
  Encodes the Multiway-Uncut problem: given G and a terminal set T ⊆ V_G, partition V_G into |T| parts with one terminal in each part, to maximize (# uncut edges)
  Here H consists of |T| self-loops, and π’: T → V_H is a bijection
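The Multiway-Uncut encoding can be illustrated with a brute-force sketch on a tiny instance. This is not the talk's algorithm: the function name is hypothetical, and the search is exponential in the number of unlabeled nodes. Since H has only self-loops, an edge of G is mapped to an edge of H exactly when both endpoints get the same label.

```python
from itertools import product

def prelabeled_opt_self_loops(g_nodes, g_edges, prelabel):
    """Best extension of `prelabel` when H is one self-loop per terminal label:
    an edge counts iff both endpoints receive the same label (it is uncut)."""
    labels = sorted(set(prelabel.values()))
    free = [u for u in g_nodes if u not in prelabel]
    best = 0
    for choice in product(labels, repeat=len(free)):
        mapping = dict(prelabel)
        mapping.update(zip(free, choice))
        best = max(best, sum(mapping[u] == mapping[v] for u, v in g_edges))
    return best

# Path 0-1-2-3 with terminals 0 and 3: one edge must be cut, so 2 stay uncut.
path = [(0, 1), (1, 2), (2, 3)]
print(prelabeled_opt_self_loops([0, 1, 2, 3], path, {0: "s", 3: "t"}))
```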

Open Questions
• Hardness of refutation problem: is subgraph isomorphism solvable in polynomial time when G ∈ G_{n,p} and H ∈ G_{n,q}?
• Dense instances: G has Ω(n^2) edges, H is arbitrary; can one get a PTAS? Can get a quasi-PTAS, and a PTAS for Max-k-Cut and in general when H is vertex-transitive
• Directed setting: improve upon the trivial 0.25-approximation. Encodes Max-Acyclic-Subgraph (nothing better than 0.5 known).
• Prelabeled MGH: improve upon the 1/3-approximation.

Thank You.