Algorithmic High-Dimensional Geometry 2 Alex Andoni Microsoft Research

  • Slides: 40
Algorithmic High-Dimensional Geometry 2 Alex Andoni (Microsoft Research SVC)

The NNS prism [diagram: high-dimensional geometry viewed through NNS, with facets: dimension reduction, space partitions, small dimension, embedding, sketching, …]

Small Dimension

“effectively”

Doubling dimension

NNS for small doubling dimension

Embeddings

General Theory: embeddings
  • Distances: Hamming distance; Euclidean distance (ℓ2); edit distance between two strings; Earth-Mover (transportation) Distance
  • Problems: compute distance between two points; Nearest Neighbor Search; diameter/close-pair of a set S; clustering, MST, etc.

Embeddings: landscape

Earth-Mover Distance (images courtesy of Kristen Grauman)

High level embedding [figure: grids with cell labels (00, 02, 00, 11, 12, 01, 01, 22, 20, 00) mapped to vectors 0002… 0011… 0100… 0000… and 0100… 0011… 0000… 1100…]

Main Approach
  • Idea: decompose EMD over [Δ]^2 into EMDs over smaller grids
  • Recursively reduce to Δ = O(1)
  [diagram: one large grid ≈ several small grids + one coarse grid]

EMD over small grid
  • Suppose Δ = 3
  • f(A) has nine coordinates, counting # points in each joint
  • f(A) = (2, 1, 1, 0, 0, 0, 1, 0, 0)
  • f(B) = (1, 1, 0, 0, 2, 0, 0, 0, 1)
  • Gives O(1) distortion
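
The count-vector map on a 3×3 grid can be sketched in a few lines. This is only an illustration under assumptions not stated on the slide: the nine coordinates are read in row-major order, the ground distance is Manhattan distance between cells, and the exact EMD is computed by brute force over all matchings (feasible only for tiny sets). The helper names are hypothetical.

```python
from itertools import permutations

SIDE = 3  # grid side length; nine coordinates correspond to a 3x3 grid

def cells_from_counts(counts):
    """Expand a count vector into (row, col) cells, assuming row-major order."""
    pts = []
    for idx, c in enumerate(counts):
        pts.extend([divmod(idx, SIDE)] * c)
    return pts

def l1(u, v):
    """l1 distance between two count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def emd_bruteforce(A, B):
    """Exact min-cost perfect matching under Manhattan distance (tiny inputs only)."""
    assert len(A) == len(B)
    return min(
        sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(A, perm))
        for perm in permutations(B)
    )

f_A = (2, 1, 1, 0, 0, 0, 1, 0, 0)
f_B = (1, 1, 0, 0, 2, 0, 0, 0, 1)
embedded = l1(f_A, f_B)  # distance after the embedding
exact = emd_bruteforce(cells_from_counts(f_A), cells_from_counts(f_B))
print(embedded, exact)
```

On a grid of constant side, every unit of mass that the count vectors disagree on moves a distance between 1 and the grid diameter, which is why the l1 distance between count vectors tracks EMD up to a constant factor.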

Decomposition Lemma [I 07]: lower bound and upper bound on cost [figure labels: k, Δ/k]

Part 1: lower bound
  • For a randomly-shifted cut-grid G of side length k, we have:
      EEMD_Δ(A, B) ≤ EEMD_k(A_1, B_1) + EEMD_k(A_2, B_2) + … + k·EEMD_{Δ/k}(A_G, B_G)
  • Extract a matching from the matchings on the right-hand side
  • For each a ∈ A, with a ∈ A_i, it is either:
      - matched in EEMD(A_i, B_i) to some b ∈ B_i
      - or not matched within cell i, and it is matched in EEMD(A_G, B_G) to some b ∈ B_j
  • Match cost in the 2nd case:
      - move a to the center of its cell: paid by EEMD(A_i, B_i)
      - move from cell i to cell j: paid by k·EEMD_{Δ/k}(A_G, B_G)
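
The randomly-shifted grid that splits a point set into cells can be sketched as follows. This is a minimal illustration, assuming integer coordinates and a shift drawn uniformly from {0, …, k−1} in each axis; the function name and the sample points are hypothetical, and the EEMD terms themselves are not computed here.

```python
import random
from collections import defaultdict

def shifted_grid_cells(points, k, rng):
    """Partition integer points by a grid of side k with a uniform random shift."""
    sx, sy = rng.randrange(k), rng.randrange(k)
    cells = defaultdict(list)
    for (x, y) in points:
        # Points sharing a key fall in the same grid cell of side k.
        cells[((x + sx) // k, (y + sy) // k)].append((x, y))
    return dict(cells)

rng = random.Random(0)
A = [(0, 0), (1, 5), (7, 7), (3, 2), (6, 6)]
cells = shifted_grid_cells(A, k=4, rng=rng)
for key, members in sorted(cells.items()):
    print(key, members)
```

Applying the same shift to both A and B yields the per-cell subsets that play the role of (A_i, B_i) in the inequality.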

Parts 2 & 3: upper bound

Part 2: Cost? [figure labels: k, dx]

Wrap-up of EMD Embedding

Metric / Upper bound
  • Ulam (edit distance between permutations)
  • Block edit distance
  • Examples: edit(banana, ananas) = 2; edit(1234567, 7123456) = 2
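
The two edit-distance examples can be checked with the standard dynamic program. This sketch assumes the usual Levenshtein definition (insertions, deletions, and substitutions each cost 1); the function name is chosen for illustration.

```python
def edit_distance(s, t):
    """Levenshtein distance: min insertions, deletions, substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # match or substitute
            ))
        prev = cur
    return prev[-1]

print(edit_distance("banana", "ananas"))
print(edit_distance("1234567", "7123456"))
```

In both examples one deletion and one insertion suffice: drop the leading "b" and append "s", or move the "7" from the end to the front.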

Metric / Upper bound / Lower bounds
  • Ulam (edit distance between permutations)
  • Block edit distance: lower bound 4/3 [Cor 03]

Non-embeddability proofs

Other good host spaces?
  • What is “good”: …, etc.
      - is algorithmically tractable
      - is rich (can embed into it)
  • sq-ℓ2 = real space with distance ||x−y||_2^2
  • Metric / lower bounds for sq-ℓ2 and hosts with very good LSH (via communication complexity): [AK’07]; Ulam (edit distance between permutations) [AK’07]; [AIK’08]
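
The sq-ℓ2 distance defined above is simple to compute. The sketch below also shows, on three collinear points chosen purely for illustration, that it can violate the triangle inequality, so it is a distance function rather than a metric.

```python
def sq_l2(x, y):
    """Squared Euclidean distance ||x - y||_2^2 between equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

p, q, r = (0.0,), (1.0,), (2.0,)
direct = sq_l2(p, r)                # going straight from p to r
via_q = sq_l2(p, q) + sq_l2(q, r)   # going through the midpoint q
print(direct, via_q)
```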

Other good host spaces?
  • What is “good”: …, etc.
      - algorithmically tractable
      - rich (can embed into it)
  • But: combination sometimes works!

Meet our new host [A-Indyk-Krauthgamer’09]
  • Iterated product space
  [figure: three nested levels with dimensions α, β, γ and distances d_1, d_{∞,1}, d_{2²,∞,1}]
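
One way to read the iterated product distance is as three nested aggregations: a sum (ℓ1) at the innermost level, a max (ℓ∞) in the middle, and a sum of squares (sq-ℓ2) on the outside. The sketch below assumes points are stored as nested lists indexed [outer][middle][inner]; the function names and the sample data are illustrative only.

```python
def d_1(u, v):
    """Innermost level: l1 distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def d_inf_1(U, V):
    """Middle level: max over blocks of the inner l1 distances."""
    return max(d_1(u, v) for u, v in zip(U, V))

def d_sq2_inf_1(X, Y):
    """Outer level: sum of squares of the middle-level distances."""
    return sum(d_inf_1(U, V) ** 2 for U, V in zip(X, Y))

X = [[[0, 0], [1, 1]], [[2, 0], [0, 0]]]
Y = [[[1, 0], [1, 4]], [[0, 0], [0, 1]]]
dist = d_sq2_inf_1(X, Y)
print(dist)
```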

[Indyk’02, A-Indyk-Krauthgamer’09]
  • Algorithmically tractable
  • Rich: edit distance between permutations, e.g., ED(1234567, 7123456) = 2

Sketching

Computational view [diagram: x ↦ F(x) and y ↦ F(y) under the same map F]

Why?
  1) Beyond embeddings: can do more if “embed” into computational space
  2) A waypoint to get embeddings: computational perspective can give actual embeddings
  3) Connection to informational/computational notions: communication complexity

Beyond Embeddings:

Waypoint to get embeddings [diagram: edit(X, Y) computed from X and Y through layers: sum of squares (sq-ℓ2), max (ℓ∞), sum (ℓ1)]

Ulam: algorithmic characterization [Ailon-Chazelle-Commandur-Lu’04, Gopalan-Jayram-Krauthgamer-Kumar’07, A-Indyk-Krauthgamer’09]
  • Lemma: Ulam(x, y) approximately equals the number of “faulty” characters a satisfying:
      - there exists K ≥ 1 (prefix-length) s.t.
      - the set of K characters preceding a in x differs much from the set of K characters preceding a in y
  • E.g., a = 5; K = 4, with x = 123456789 and y = 123467895: compare X[5; 4] and Y[5; 4]
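
The sets X[a; K] and Y[a; K] from the example can be computed directly. In the sketch below, "differs much" is interpreted as the symmetric difference exceeding a fraction of 2K; that threshold, the function names, and the handling of characters with fewer than K predecessors are assumptions made for illustration, not the lemma's exact statement.

```python
def preceding_set(s, a, K):
    """Set of (up to) K characters immediately preceding character a in s."""
    i = s.index(a)
    return set(s[max(0, i - K):i])

def is_faulty(x, y, a, threshold=0.5):
    """Hypothetical test: some K makes the two preceding sets differ a lot."""
    for K in range(1, len(x)):
        X, Y = preceding_set(x, a, K), preceding_set(y, a, K)
        # Assumed meaning of "differs much": symmetric difference > threshold * 2K.
        if len(X ^ Y) > threshold * 2 * K:
            return True
    return False

x, y = "123456789", "123467895"
print(sorted(preceding_set(x, "5", 4)), sorted(preceding_set(y, "5", 4)))
print(is_faulty(x, y, "5"), is_faulty(x, y, "1"))
```

For a = 5 and K = 4 the two sets are disjoint, which matches the intuition that 5 is the character that was moved.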

Connection to communication complexity
  • Enter the world of Alice and Bob… [diagram: Alice and Bob, with shared randomness, send CC bits to a Referee]
  • Communication complexity model:
      - Two-party protocol
      - Shared randomness
      - Promise (gap) version
      - c = approximation ratio
      - CC = min. # bits to decide (for 90% success)
  • Sketching model:
      - Referee decides based on sketch(x), sketch(y)
      - SK = min. sketch size to decide
  • Fact: SK ≥ CC
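
The shape of the sketching model can be illustrated with a toy problem. The sketch below uses equality testing of bit vectors, not the gap distance problem from the slide: both parties derive the same random masks from a shared seed, each sends k parity bits, and the referee compares them. All names and parameters are hypothetical.

```python
import random

def make_sketcher(n, k, seed):
    """Shared randomness: both parties derive the same k random 0/1 masks."""
    rng = random.Random(seed)
    masks = [[rng.randrange(2) for _ in range(n)] for _ in range(k)]

    def sketch(x):
        # Each sketch bit is the parity of x restricted to one random mask.
        return tuple(sum(m & b for m, b in zip(mask, x)) % 2 for mask in masks)

    return sketch

def referee(sk_x, sk_y):
    """Decide 'equal' using only the two sketches."""
    return sk_x == sk_y

sketch = make_sketcher(n=16, k=32, seed=1)
x = [1, 0] * 8
y = list(x)
y[3] ^= 1  # flip one bit so that y differs from x
print(referee(sketch(x), sketch(x)), referee(sketch(x), sketch(y)))
```

Equal inputs always produce equal sketches; unequal inputs disagree on each parity bit with probability 1/2, so k bits leave an error probability of 2^-k. Any such sketch also yields a protocol with the same number of bits, which is the direction behind SK ≥ CC.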

Communication Complexity

High dimensional geometry ? ? ?

Closest Pair
  • [figure: points p_1, p_2, …, p_n stacked as the rows of a matrix M]
  • Find max entry of MM^t using subcubic MM algorithms
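
The reduction from closest pair to the largest entry of MM^t can be sketched as follows. Two assumptions are made here that the slide does not spell out: the points are ±1 vectors of equal norm, so a larger inner product means a smaller distance, and only off-diagonal entries are considered. The naive cubic product below is a stand-in for a subcubic matrix multiplication algorithm.

```python
def gram(M):
    """Compute M M^t naively (stand-in for a fast matrix multiplication)."""
    n = len(M)
    return [[sum(a * b for a, b in zip(M[i], M[j])) for j in range(n)]
            for i in range(n)]

def closest_pair_via_gram(M):
    """Return the index pair (i, j), i < j, with the largest inner product."""
    G = gram(M)
    n = len(M)
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    return max(pairs, key=lambda ij: G[ij[0]][ij[1]])

M = [
    (1, 1, 1, 1),
    (1, 1, 1, -1),
    (-1, -1, 1, 1),
    (-1, 1, -1, 1),
]
pair = closest_pair_via_gram(M)
print(pair)
```

For equal-norm vectors, ||p − q||² = ||p||² + ||q||² − 2⟨p, q⟩, so maximizing the inner product over distinct pairs minimizes the distance.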

What I didn’t talk about:

High dimensional geometry via NNS prism [diagram: high-dimensional geometry viewed through NNS, with facets: dimension reduction, space partitions, small dimension, embedding, sketching, +++]