Lower Bounds for Property Testing
Luca Trevisan
U. C. Berkeley
Sub-linear Time Algorithms
• This talk:
  – algorithms that run in less than linear time (they cannot read the entire input)
  – no preprocessing (unstructured data)
  – must be probabilistic and approximate
• For optimization problems:
  – compute a numerical approximation of the optimum cost (and an implicit representation of an approximate solution?)
• For decision problems:
  – what is the right notion of approximation for decision problems?
(Graph) Property Testing
Testing a property P with accuracy $\epsilon$ in the adjacency matrix representation:
• Given a graph G that has property P, accept with probability > 3/4
• Given a graph G that is $\epsilon$-far from property P, accept with probability < 1/4
$\epsilon$-far = one must change an $\epsilon$-fraction of the adjacency matrix to get property P (add/remove more than $\epsilon n^2$ edges)
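To make the definition of $\epsilon$-far concrete for bipartiteness, here is a brute-force sketch (exponential in n, so only for toy graphs) that computes the fewest edge changes needed, normalized by $n^2$ as on this slide. It relies on one observation: adding edges never helps bipartiteness, so the cheapest fix deletes the monochromatic edges of the best 2-coloring. The function names are our own, hypothetical choices.

```python
from itertools import product

def distance_to_bipartite(n, edges):
    """Fewest edge changes to make the graph bipartite, divided by n^2.

    Brute force over all 2^n colorings, so only usable for tiny n.
    Deleting every monochromatic edge of a 2-coloring makes the graph
    bipartite, and adding edges never helps, so the minimum over colorings
    of the monochromatic-edge count is the exact number of changes needed.
    """
    best = len(edges)
    for coloring in product((0, 1), repeat=n):
        mono = sum(1 for u, v in edges if coloring[u] == coloring[v])
        best = min(best, mono)
    return best / n ** 2

def is_eps_far_from_bipartite(n, edges, eps):
    # Slide's normalization: more than eps * n^2 edges must change
    return distance_to_bipartite(n, edges) > eps

# The triangle needs one edge deletion, so its distance is 1/9.
print(distance_to_bipartite(3, [(0, 1), (1, 2), (0, 2)]))
```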
Example [GGR, AK]
Testing bipartiteness of a given graph G
• Pick $(1/\epsilon)\,\mathrm{polylog}(1/\epsilon)$ vertices and check whether they induce a bipartite graph; if so accept, otherwise reject
• If G is bipartite, the algorithm accepts with probability 1
• If G is $\epsilon$-far from bipartite, then w.h.p. the algorithm discovers an odd cycle (non-trivial to prove)
• Running time: $O((1/\epsilon^2)\,\mathrm{polylog}(1/\epsilon))$
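A minimal Python sketch of this tester, assuming the graph is given as a list of neighbor sets that plays the role of the adjacency matrix (each `v in adj[u]` check is one query, and only pairs inside the sample are queried). The sample size `(1/eps) * log(1/eps)**2` is a hypothetical instantiation of the polylog factor, not the constant from [GGR, AK].

```python
import math
import random
from collections import deque

def induces_bipartite(sample, adj):
    """BFS 2-coloring of the subgraph induced by `sample`.

    Only pairs inside the sample are queried (`v in adj[u]`),
    mirroring adjacency-matrix queries.
    """
    vs = list(sample)
    color = {}
    for s in vs:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in vs:
                if v == u or v not in adj[u]:
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle inside the sample
    return True

def bipartiteness_tester_dense(adj, eps, rng=random):
    """One-sided tester: accept iff a random vertex sample induces a bipartite graph."""
    n = len(adj)
    # Hypothetical choice of (1/eps) * polylog(1/eps)
    size = math.ceil((1 / eps) * max(1.0, math.log(1 / eps)) ** 2)
    sample = rng.sample(range(n), min(n, size))
    return induces_bipartite(sample, adj)
```

Because every induced subgraph of a bipartite graph is bipartite, the sketch accepts bipartite graphs with probability 1, matching the one-sided error stated above.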
Paleontologist’s approach
Lower Bounds [BT]
• Alon-Krivelevich's algorithm
  – has one-sided error, is non-adaptive, and has running time $(1/\epsilon^2)\,\mathrm{polylog}(1/\epsilon)$
• Lower bounds:
  – $\Omega(1/\epsilon^2)$ for non-adaptive algorithms
  – $\Omega(1/\epsilon^{1.5})$ for adaptive algorithms
  – both results hold even for two-sided error
Two Distributions
• $G_{\mathrm{far}}$: every edge exists with probability $\epsilon$
  – w.h.p. it is $\epsilon/3$-far from bipartite
• $G_{\mathrm{bip}}$: pick a random partition, then every edge that crosses the partition exists with probability $2\epsilon$
• Indistinguishable by non-adaptive algorithms making $o(1/\epsilon^2)$ queries
• Indistinguishable by adaptive algorithms making $o(1/\epsilon^{1.5})$ queries
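Both distributions have about the same expected number of edges ($\epsilon\binom{n}{2} \approx \epsilon n^2/2$ versus $2\epsilon$ times roughly $n^2/4$ crossing pairs), which is part of why a few queries cannot separate them. The sketch below samples each distribution; the function names are hypothetical.

```python
import random

def sample_g_far(n, eps, rng=random):
    """G_far: every pair {u, v} is an edge independently with probability eps."""
    return {(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < eps}

def sample_g_bip(n, eps, rng=random):
    """G_bip: random partition; each crossing pair is an edge with probability 2*eps."""
    side = [rng.randrange(2) for _ in range(n)]
    edges = {(u, v) for u in range(n) for v in range(u + 1, n)
             if side[u] != side[v] and rng.random() < 2 * eps}
    return edges, side
```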
Bounded Degree Graphs
Testing a property P with accuracy $\epsilon$ in the adjacency-list representation:
• Given a graph G that has property P, accept with probability > 3/4
• Given a graph G that is $\epsilon$-far from property P, accept with probability < 1/4
$\epsilon$-far = one must change an $\epsilon$-fraction of the adjacency-list entries to get property P (add/remove more than $\epsilon d n$ edges)
Bipartiteness [GR]
Testing bipartiteness:
• Repeat polylog n times:
  – start at a random vertex and take $\sqrt{n}$ random walks of length polylog n; if two of them combine to form an odd cycle reject, otherwise accept
• Analysis:
  – in a graph from which a constant fraction of the edges must be removed to make it bipartite, the algorithm finds an odd cycle
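The sketch below follows the structure of this tester under simplifying assumptions: each walk moves to a uniformly random neighbor at every step (the walks in [GR] are lazy, which we omit), and the number of repetitions, walks, and steps are parameters, with `suggested_parameters` as a hypothetical instantiation. An odd cycle is certified when some vertex is reached from the start vertex with both parities, since that yields an odd closed walk, which always contains an odd cycle. Consequently the sketch never rejects a bipartite graph.

```python
import math
import random

def walk_with_parities(adj, start, length, rng):
    """Yield (vertex, parity of steps taken) along a random walk from `start`.

    Simplification: each step moves to a uniformly random neighbor;
    the walks analyzed in [GR] are lazy, which this sketch omits.
    """
    v = start
    yield v, 0
    for step in range(1, length + 1):
        if not adj[v]:
            return  # isolated vertex: the walk cannot move
        v = rng.choice(adj[v])
        yield v, step % 2

def bipartiteness_tester_bounded_degree(adj, rounds, walks, length, rng=random):
    """Reject only when an odd closed walk (hence an odd cycle) has been found."""
    n = len(adj)
    for _ in range(rounds):
        start = rng.randrange(n)
        parities = {}  # vertex -> parities with which walks from `start` reached it
        for _ in range(walks):
            for v, p in walk_with_parities(adj, start, length, rng):
                parities.setdefault(v, set()).add(p)
                if len(parities[v]) == 2:
                    return False  # v reached with both parities: odd closed walk
    return True

def suggested_parameters(n):
    # Hypothetical stand-ins for polylog(n) rounds, sqrt(n) walks,
    # and polylog(n) steps per walk; [GR] use different constants.
    log_n = max(1, math.ceil(math.log2(n)))
    return log_n, math.ceil(math.sqrt(n)), log_n ** 2
```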
Matching Lower Bound [GR]
• Define two distributions of graphs:
  – $G_{\mathrm{far}}$: a random Hamiltonian circuit plus a random matching (w.h.p. 1/100-far from bipartite)
  – $G_{\mathrm{bip}}$: a random Hamiltonian circuit plus a random matching, conditioned on the graph being bipartite
• $G_{\mathrm{far}}$ and $G_{\mathrm{bip}}$ are indistinguishable by algorithms of query complexity $o(\sqrt{n})$.
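For even n, the Hamiltonian circuit alone has exactly one 2-coloring up to swapping the sides (alternate positions along the circuit), so conditioning the matching on bipartiteness amounts to pairing the even-position vertices with the odd-position vertices uniformly at random. The sketch assumes n is even and at least 4, and returns a multigraph edge list (a matching edge may duplicate a circuit edge), so every vertex has degree exactly 3. All function names are hypothetical.

```python
import random
from collections import deque

def _random_hamiltonian_circuit(n, rng):
    order = list(range(n))
    rng.shuffle(order)
    return order, [(order[i], order[(i + 1) % n]) for i in range(n)]

def sample_g_far_bounded(n, rng=random):
    """Random Hamiltonian circuit plus a uniformly random perfect matching."""
    assert n % 2 == 0 and n >= 4
    _, circuit = _random_hamiltonian_circuit(n, rng)
    perm = list(range(n))
    rng.shuffle(perm)
    matching = [(perm[i], perm[i + 1]) for i in range(0, n, 2)]
    return circuit + matching

def sample_g_bip_bounded(n, rng=random):
    """Same, conditioned on bipartiteness: match even positions to odd positions."""
    assert n % 2 == 0 and n >= 4
    order, circuit = _random_hamiltonian_circuit(n, rng)
    evens, odds = order[0::2], order[1::2]
    rng.shuffle(odds)
    return circuit + list(zip(evens, odds))

def two_coloring(n, edges):
    """Return a proper 2-coloring as a list, or None if the graph has an odd cycle."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [None] * n
    for s in range(n):
        if color[s] is not None:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None
    return color
```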
Sublinear Time Approximation
Problems restricted to dense instances:
• Max CUT and other graph problems can be approximated within $(1+\epsilon)$ in graphs with at least $\alpha n^2$ edges in time $2^{\mathrm{poly}(1/\epsilon\alpha)}$ [GGR]
• Max 3SAT can be approximated within $(1+\epsilon)$ in instances with at least $\alpha n^3$ clauses in time $2^{\mathrm{poly}(1/\epsilon\alpha)}$, and similar results hold for other satisfiability problems [AFKK]
Sub-linear Time Approximation
Problems on bounded-degree instances:
• Minimum spanning tree
  – given a connected weighted graph of degree d with weights in the range $\{1, \ldots, w\}$, one can approximate the MST weight within $(1+\epsilon)$ in time about $O(dw/\epsilon^2)$ [Chazelle, Rubinfeld, T]
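One way to see why a sublinear algorithm is possible: let $c_i$ be the number of connected components of the subgraph formed by the edges of weight at most i, with $c_0 = n$. The MST has exactly $c_i - 1$ edges of weight greater than i, and summing over $i = 0, \ldots, w-1$ gives $\mathrm{MST} = n - w + \sum_{i=1}^{w-1} c_i$. Component counts can in turn be estimated by sampling vertices and exploring their components. As we recall, this identity underlies the [Chazelle, Rubinfeld, T] estimator; the sketch below only checks the identity exactly against Kruskal's algorithm and does not implement the sampling step. Function names are hypothetical.

```python
def _find(parent, x):
    # Union-find "find" with path halving
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def count_components(n, edges):
    parent = list(range(n))
    count = n
    for u, v, _ in edges:
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            count -= 1
    return count

def mst_weight_kruskal(n, edges):
    parent = list(range(n))
    total = 0
    for u, v, weight in sorted(edges, key=lambda e: e[2]):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            total += weight
    return total

def mst_weight_from_components(n, edges, w):
    """MST weight = n - w + sum_{i=1}^{w-1} c_i for integer weights in 1..w.
    A sublinear algorithm would estimate each c_i by sampling; here it is exact."""
    c = [count_components(n, [e for e in edges if e[2] <= i]) for i in range(1, w)]
    return n - w + sum(c)

edges = [(0, 1, 3), (1, 2, 1), (2, 3, 2), (0, 3, 1), (0, 2, 3)]
print(mst_weight_kruskal(4, edges), mst_weight_from_components(4, edges, 3))  # 4 4
```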
General Goals
• When looking for polynomial-time algorithms:
  – several algorithmic techniques of general applicability
  – a general technique to “prove” impossibility (NP-completeness)
• For sublinear-time algorithms:
  – general algorithmic techniques?
  – impossibility results?
Dense Graphs
Some general algorithmic results:
• All problems with a certain logical representation are testable in time dependent only on $\epsilon$ [AFKS]
• All regular languages are testable in time dependent only on $\epsilon$ [AFNS]
• Essentially only one one-sided error algorithm [GT] (pick a random subgraph and check that it is consistent with the property)
  – adaptivity does not help
  – an “only one algorithm” result also holds for two-sided error
Few lower bounds
Bounded-Degree Graphs
Fewer and less general algorithms. Some results differ from the dense case:
• Adaptivity helps
  – no property is testable with $o(\sqrt{n})$ non-adaptive queries; several problems are testable with $O(1)$ adaptive queries
• Two-sided error is better than one-sided error for natural monotone properties
  – the property “being a forest” has no $o(\sqrt{n})$-query one-sided algorithm, but has an $O(1)$-query two-sided algorithm
Few lower bounds
Testing 3-Colorability
• Easy in the adjacency matrix representation
• NP-hard in the adjacency list representation
• Only for small enough $\epsilon$
  – one can find a 3-coloring good for 80% of the edges of a 3-colorable graph using SDP
  – it is NP-hard to find a 3-coloring good for a 98% (?) fraction of the edges
• Implies a non-tight, and conditional, lower bound on the query complexity
Other problems
• The query complexity of the following problems is equivalent to the query complexity of testing 3-colorability:
  – testing satisfiability of a 3SAT instance
    • every variable occurs in $O(1)$ clauses; “adjacency list” representation
  – approximating max cut, vertex cover, independent set, ..., in bounded-degree graphs
  – approximating Max SAT, Max 2SAT, ...
• A lower bound of $\sqrt{n}$ for all these problems is implied by the [GR] lower bound for testing bipartiteness
Some Results from [BOT]
• For one-sided error algorithms:
  – $\Omega(n)$ query complexity to distinguish 3-colorable graphs from graphs that are $(1/3 - \delta)$-far
  – the lower bound applies to testing problems that are solvable in polynomial time
• For two-sided error algorithms:
  – for some $\epsilon$, $\Omega(n)$ query complexity to distinguish 3-colorable graphs from graphs that are $\epsilon$-far
Additional Results
• Unconditionally, algorithms running in time $o(n)$ cannot:
  – approximate Max 3SAT better than 7/8
  – approximate Max Cut in bounded-degree graphs better than 16/17
  – ...
• Hastad '97 proved that approximating the above problems better than these ratios is NP-hard
The 3-Coloring Lower Bound
• Consider first one-sided error algorithms
• It is enough to find a graph G that is $(1/3 - \delta)$-far from 3-colorable, but in which every subgraph of size $< \alpha n$ is 3-colorable
  – (for every $\delta$ there is an $\alpha$ such that ...)
• Then an algorithm of query complexity $< \alpha n$ either accepts G (which is wrong) or rejects some 3-colorable graph (which means that the algorithm does not have one-sided error), since its view of G is always consistent with some 3-colorable graph
The Graph
• Pick a graph of degree $O(1/\delta^2)$ at random (take that many random matchings)
• Then w.h.p. it is $(1/3 - \delta)$-far from 3-colorable
• But, for some $\alpha$, w.h.p. every subgraph induced by $k < \alpha n$ vertices contains $< 1.5k$ edges
• In a minimal non-3-colorable graph, every vertex has degree at least 3 (so a minimal non-3-colorable graph on k vertices has at least $1.5k$ edges)
• Hence every subgraph induced by $< \alpha n$ vertices is 3-colorable [Erdos]
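The last two bullets amount to a degeneracy argument: a subgraph on k vertices with fewer than $1.5k$ edges has average degree below 3, hence a vertex of degree at most 2. Repeatedly peeling such vertices and then coloring them in reverse order never needs more than 3 colors. A minimal sketch (hypothetical function name) that returns a 3-coloring when the peeling succeeds and `None` when it gets stuck on a subgraph of minimum degree at least 3:

```python
def three_color_by_peeling(n, edges):
    """Peel vertices of degree <= 2, then color them greedily in reverse order.

    Returns a dict vertex -> color in {0, 1, 2}, or None if at some point every
    remaining vertex has degree >= 3 (a subgraph of minimum degree 3 exists).
    Assumes a simple graph (no self-loops or repeated edges).
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(adj[v]) for v in range(n)}
    remaining = set(range(n))
    order = []
    while remaining:
        v = next((x for x in remaining if degree[x] <= 2), None)
        if v is None:
            return None
        order.append(v)
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    color = {}
    # When v is colored, its already-colored neighbors are those peeled after it;
    # there are at most 2 of them, so one of the 3 colors is free.
    for v in reversed(order):
        used = {color[u] for u in adj[v] if u in color}
        color[v] = min(c for c in range(3) if c not in used)
    return color
```

A `None` result is not a proof of non-3-colorability (for example, the 3-dimensional cube is 3-regular yet bipartite); it only signals that the degeneracy argument does not apply.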
Explicit Construction
Can the previous construction be derandomized?
• For constants d, $\epsilon$, $\alpha$, and for every sufficiently large n, we can explicitly construct a graph
  – on n vertices, with max degree d,
  – $\epsilon$-far from 3-colorable,
  – in which every subset of $\alpha n$ vertices induces a 3-colorable subgraph.
Explicit Construction
• We construct a 3SAT formula such that, for constants k, $\epsilon'$, $\alpha'$:
  – every variable occurs k times
  – no assignment satisfies more than a $1 - \epsilon'$ fraction of the clauses
  – every $\alpha'$ fraction of the clauses is satisfiable
• Then we use a (slightly new) reduction from 3SAT to 3-Coloring
The Formula
• Fix a degree-d expander graph $G = (V, E)$ such that for every cut $(S, V - S)$ at least $\min\{|S|, |V - S|\}$ edges cross the cut ($d = 14$ suffices)
• Have two variables $x_{uv}$ and $x_{vu}$ for each edge (u, v)
• For every vertex v, have (the 3SAT equivalent of) the constraint
  – $\sum_u x_{uv} = 1 + \sum_w x_{vw}$, where u and w range over the neighbors of v
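Structure of the Analysis
• It is impossible to satisfy more than a $1 - 1/(d+1)$ fraction of the constraints (summed over all vertices, total inflow equals total outflow, and each violated constraint can offset at most d satisfied ones)
• Any set of fewer than $|V|/2$ constraints can be satisfied simultaneously:
  – define an auxiliary network
  – show that the auxiliary network has no small cut, because of expansion
  – then there is a large flow
  – use the large flow to find an assignment satisfying the chosen subset of constraints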
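A brute-force sketch of the constraint system from the formula slide on a toy graph, assuming each constraint is read as the integer equation "inflow = 1 + outflow" over the 0/1 edge variables (consistent with the 0/1 flow interpretation used in the analysis). The toy graph $K_4$ (d = 3) is not an expander of the required kind; it only illustrates the counting bound, which allows at most $d|V|/(d+1) = 3$ of its 4 constraints to hold. Function names are hypothetical.

```python
from itertools import product

def satisfied_count(n, edges, x):
    """Count vertices v with sum_u x[(u, v)] == 1 + sum_w x[(v, w)].

    Sums range over the neighbors of v; the equation is read over the
    integers (an assumption consistent with the 0/1-flow interpretation).
    """
    neighbors = {v: [] for v in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    count = 0
    for v in range(n):
        inflow = sum(x[(u, v)] for u in neighbors[v])
        outflow = sum(x[(v, w)] for w in neighbors[v])
        count += inflow == 1 + outflow
    return count

def max_satisfied(n, edges):
    """Exhaustive search over all 0/1 values of the 2|E| variables (toy sizes only)."""
    arcs = list(edges) + [(v, u) for u, v in edges]
    return max(satisfied_count(n, edges, dict(zip(arcs, bits)))
               for bits in product((0, 1), repeat=len(arcs)))

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(max_satisfied(4, k4))  # 3, matching the bound d*n/(d+1) for d = 3, n = 4
```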
Flow Argument
• We want to satisfy the constraints corresponding to the vertices in C, with $|C| < |V|/2$
[Figure: vertex sets $V - C$ and C, with source s and sink t]
• Construct a flow network with a new source s, a sink t obtained by collapsing $V - C$, and the vertices in C
Flow Argument
[Figure: a cut between s and t, labeled with A, $C - A$, “|A| edges” and “|C − A| edges”]
• Every cut has size at least $|C|$
• Hence there is a 0/1 flow of value at least $|C|$
• Interpreted as an assignment, this flow satisfies all the constraints in C
Two-Sided Error Algorithms
We need to define two distributions of graphs $G_{\mathrm{col}}$ and $G_{\mathrm{far}}$ such that:
• graphs in $G_{\mathrm{col}}$ are (almost) always 3-colorable
• graphs in $G_{\mathrm{far}}$ are (almost) always far from 3-colorable
• to an algorithm of bounded query complexity, $G_{\mathrm{col}}$ and $G_{\mathrm{far}}$ look (almost) the same
Main Step
• Define two distributions $D_{\mathrm{sat}}$ and $D_{\mathrm{far}}$ of instances of E3LIN-2 (systems of linear equations over GF(2) with 3 variables per equation)
  – systems in $D_{\mathrm{sat}}$ are always satisfiable
  – systems in $D_{\mathrm{far}}$ are (almost) always $(1/2 - \delta)$-far from satisfiable
  – to an algorithm of bounded query complexity, $D_{\mathrm{sat}}$ and $D_{\mathrm{far}}$ look the same
• We get $G_{\mathrm{col}}$ and $G_{\mathrm{far}}$ using a reduction from approximate E3LIN-2 to approximate 3-coloring
E3LIN-2
$X_1 + X_3 + X_{10} = 0 \pmod 2$
$X_2 + X_3 + X_4 = 1 \pmod 2$
$X_1 + X_2 + X_9 = 0 \pmod 2$
...
Main Building Block
• We show that for every c there is an $\alpha$ such that there exists a left-hand side with
  – n variables, cn equations, 3 variables per equation, and every variable occurring in 3c equations
  – every set of $\alpha n$ equations linearly independent
• Pick the left-hand side at random:
  – repeat 3c times: pick at random a set of n/3 disjoint triples of variables
• Explicit construction?
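A sketch of the random left-hand side (3c rounds, each splitting the n variables into n/3 disjoint triples), together with a GF(2) Gaussian-elimination check that a given list of equations is linearly independent. Verifying the property for every set of $\alpha n$ equations would require examining exponentially many subsets, so the check handles one set at a time. Names are hypothetical; n is assumed divisible by 3.

```python
import random

def random_left_hand_side(n, c, rng=random):
    """3c rounds; each round splits the n variables into n/3 random disjoint triples.

    Gives cn equations in which every variable occurs exactly 3c times.
    """
    assert n % 3 == 0
    equations = []
    for _ in range(3 * c):
        perm = list(range(n))
        rng.shuffle(perm)
        equations.extend(tuple(perm[i:i + 3]) for i in range(0, n, 3))
    return equations

def linearly_independent(equations):
    """Gaussian elimination over GF(2); each row is an int bitmask of its variables."""
    basis = {}  # leading bit -> stored row with that leading bit
    for eq in equations:
        row = 0
        for var in eq:
            row ^= 1 << var
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]
        else:
            return False  # row reduced to zero: it is a sum of earlier rows
    return True
```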
Distributions
• The left-hand side is always as before
• In $D_{\mathrm{sat}}$, pick a random assignment to the variables and set the right-hand side consistently
  – always satisfiable
• In $D_{\mathrm{far}}$, pick the right-hand side uniformly at random
  – w.h.p. $(1/2 - O(1/\sqrt{c}))$-far from satisfiable
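Both distributions can be sampled from the same left-hand side, given as a list of variable-index triples. The sketch below (hypothetical names) does this and counts the fraction of equations a given assignment satisfies. Checking the $D_{\mathrm{far}}$ claim would require maximizing that fraction over all assignments, which the sketch does not attempt.

```python
import random

def sample_d_sat_rhs(lhs, n, rng=random):
    """D_sat: plant a uniformly random assignment and set each right-hand side
    to the parity it produces, so the system is always satisfiable."""
    planted = [rng.randrange(2) for _ in range(n)]
    rhs = [planted[i] ^ planted[j] ^ planted[k] for i, j, k in lhs]
    return rhs, planted

def sample_d_far_rhs(lhs, rng=random):
    """D_far: independent uniform right-hand sides."""
    return [rng.randrange(2) for _ in lhs]

def fraction_satisfied(lhs, rhs, assignment):
    ok = sum(1 for (i, j, k), b in zip(lhs, rhs)
             if (assignment[i] ^ assignment[j] ^ assignment[k]) == b)
    return ok / len(lhs)
```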
Indistinguishability
• The two distributions differ only in the right-hand side
• In $D_{\mathrm{far}}$ it is uniformly distributed
• In $D_{\mathrm{sat}}$ it is $\alpha n$-wise independent
  – linear independence implies statistical independence
• So they look the same to an algorithm that sees fewer than $\alpha n$ equations
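Why linear independence gives statistical independence: if k rows of the left-hand side are linearly independent over GF(2), the map from assignments to their right-hand sides is onto $\{0,1\}^k$, and every right-hand side is produced by exactly $2^{n-k}$ assignments. So under a uniformly random planted assignment, those k right-hand sides are uniform, exactly as in $D_{\mathrm{far}}$. The exhaustive check below (toy n only; hypothetical function name) illustrates this and shows that a dependent set of rows gives a non-uniform pattern.

```python
from collections import Counter
from itertools import product

def rhs_distribution(lhs, n):
    """Tally the right-hand-side vector produced by each of the 2^n assignments."""
    counts = Counter()
    for a in product((0, 1), repeat=n):
        counts[tuple(a[i] ^ a[j] ^ a[k] for i, j, k in lhs)] += 1
    return counts

independent = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
dependent = [(0, 1, 2), (3, 4, 5), (0, 1, 3), (2, 4, 5)]  # rows sum to zero
print(sorted(set(rhs_distribution(independent, 5).values())))  # [4]: uniform over 8 patterns
print(len(rhs_distribution(dependent, 6)))                     # 8 of 16 patterns occur
```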
Conclusion of the Argument
• No algorithm of “query complexity” $o(n)$ can distinguish satisfiable instances of E3LIN-2 from instances that are $(1/2 - \delta)$-far from satisfiable
• For some $\epsilon$, no algorithm of query complexity $o(n)$ can distinguish 3-colorable graphs from graphs that are $\epsilon$-far from 3-colorable
• No algorithm of query complexity $o(n)$ can approximate Max 3SAT better than 7/8
• ...
Open Questions
• Show that distinguishing 3-colorable graphs from $(1/3 - \delta)$-far graphs requires query complexity $\Omega(n)$
  – we can only prove it for one-sided error
• Show that approximating Max SAT better than 3/4 and Max CUT better than 1/2 requires query complexity $\Omega(n)$
  – we only know $\Omega(\sqrt{n})$ [implicit in GR]
  – would “explain” why we need SDP
Some more open questions
• In the adjacency matrix representation, most interesting problems are solvable in constant (in $\epsilon$) time
• For some problems (e.g., testing triangle-freeness) the analysis uses Szemerédi's regularity lemma, and the constant is hyper-exponential in $1/\epsilon$
• The lower bound is only $(1/\epsilon)^{\log(1/\epsilon)}$, and only for one-sided error
• Alternative analysis / stronger lower bounds?