Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Princeton/CCI → MSR SVC) Barriers II August 30, 2010
Nearest Neighbor Search (NNS)
- Preprocess: a set D of points in R^d
- Query: given a new point q, report a point p ∈ D with the smallest distance to q
Motivation
- Generic setup:
  - Points model objects (e.g. images)
  - Distance models a (dis)similarity measure
- Application areas: machine learning, data mining, speech recognition, image/video/music clustering, bioinformatics, etc.
- Distance can be: Euclidean, Hamming, ℓ∞, edit distance, Ulam, Earth-mover distance, etc.
- Primitive for other problems: find the closest pair in a set D, MST, clustering, …
Plan for today 1. NNS for basic distances 2. NNS for advanced distances: embeddings 3. NNS via product spaces
2D case
- Compute the Voronoi diagram
- Given a query q, perform point location
- Performance:
  - Space: O(n)
  - Query time: O(log n)
High-dimensional case
- All exact algorithms degrade rapidly with the dimension d:

  Algorithm                   Query time    Space
  Full indexing               O(d·log n)    n^O(d)  (Voronoi diagram size)
  No indexing (linear scan)   O(dn)         –

- When d is high, the state of the art is unsatisfactory: even in practice, query time tends to be linear in n
Approximate NNS
- c-approximate r-near neighbor: given a new point q, report a point p ∈ D with ||p-q|| ≤ cr, as long as there exists a point at distance ≤ r from q
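As a reference point, this guarantee can be met trivially by a linear scan (a minimal sketch in Python; the function name and interface are mine, not from the talk):

```python
import math

def approx_near_neighbor(points, q, r, c):
    """Linear-scan reference for the c-approximate r-near neighbor
    problem: if some point lies within distance r of q, return a
    point within distance c*r; otherwise None may be returned."""
    best, best_dist = None, math.inf
    for p in points:
        d = math.dist(p, q)
        if d < best_dist:
            best, best_dist = p, d
    # Any point within c*r is an acceptable answer whenever the
    # true nearest neighbor is within r.
    if best is not None and best_dist <= c * r:
        return best
    return None
```

The point of the approximate formulation is that sublinear-time data structures can match this guarantee without scanning all of D.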
Approximation Algorithms for NNS
- A vast literature:
  - With exp(d) space or Ω(n) time: [Arya-Mount et al.], [Kleinberg'97], [Har-Peled'02], …
  - With poly(n) space and o(n) time: [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'98,'01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A-Indyk'06], …
The landscape: algorithms

  Space (comment)                       Time         Approximation     Reference
  n^(4/ε²) + nd  (poly space,           O(d·log n)   c = 1+ε           [KOR'98, IM'98]
    logarithmic query)
  n^(1+ρ) + nd   (small poly space,     dn^ρ         ρ ≈ 1/c           [IM'98, Cha'02, DIIM'04]
    close to linear; sublinear query)                ρ = 1/c² + o(1)   [AI'06]
  nd·log n       (near-linear space,    dn^ρ         ρ = 2.09/c        [Ind'01, Pan'06]
    sublinear query)                                 ρ = O(1/c²)       [AI'06]
Locality-Sensitive Hashing [Indyk-Motwani'98]
- Random hash function g: R^d → Z s.t. for any points p, q:
  - If ||p-q|| ≤ r, then Pr[g(p)=g(q)] is "not-so-small" (call it P1)
  - If ||p-q|| > cr, then Pr[g(p)=g(q)] is "small" (call it P2)
- Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2)
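The canonical LSH family for Hamming space is bit sampling from [IM'98]: g concatenates a few random coordinates, so collision probability decays with Hamming distance. A minimal sketch (helper names are mine):

```python
import random

def make_hamming_lsh(d, k, seed=None):
    """Bit-sampling LSH for Hamming space (the classic [IM'98] family):
    g(p) concatenates k randomly chosen coordinates of p, so
    Pr[g(p) = g(q)] = (1 - ham(p, q)/d) ** k."""
    rng = random.Random(seed)
    idx = [rng.randrange(d) for _ in range(k)]
    return lambda p: tuple(p[i] for i in idx)

def collision_rate(d, k, dist, trials=2000):
    """Empirical Pr[g(p) = g(q)] over random g and random pairs at
    Hamming distance `dist` -- closer pairs collide more often."""
    rng = random.Random(0)
    hits = 0
    for t in range(trials):
        g = make_hamming_lsh(d, k, seed=t)
        p = [rng.randrange(2) for _ in range(d)]
        q = p[:]
        for i in rng.sample(range(d), dist):  # flip `dist` bits
            q[i] ^= 1
        hits += g(p) == g(q)
    return hits / trials
```

For d=20, k=3 the expected collision rates are about (1-1/20)³ ≈ 0.86 at distance 1 versus (1-10/20)³ ≈ 0.13 at distance 10, which is exactly the gap the hash tables exploit.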
Example of hash functions: grids [Datar-Immorlica-Indyk-Mirrokni'04]
- Pick a regular grid:
  - Shift and rotate it randomly
- Hash function:
  - g(p) = index of the cell containing p
- Gives ρ ≈ 1/c
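A simplified sketch of the grid hash (random shift only; the full [DIIM'04] construction also applies a random projection, and the function name is mine):

```python
import math
import random

def make_grid_hash(d, w, seed=0):
    """Randomly shifted regular grid in R^d with cell width w.
    g(p) = index of the grid cell containing p, so nearby points
    tend to share a cell while far-apart points do not."""
    rng = random.Random(seed)
    shift = [rng.uniform(0, w) for _ in range(d)]
    return lambda p: tuple(math.floor((p[i] + shift[i]) / w)
                           for i in range(d))
```

Points farther than w apart in some coordinate can never share a cell, while a pair at distance much smaller than w collides under most random shifts.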
State-of-the-art LSH [A-Indyk'06]
- Regular grid → grid of balls
  - p can hit empty space, so take more such grids until p is in a ball
- Need (too) many grids of balls
  - Start by reducing the dimension to t
- Analysis gives ρ → 1/c²
- Choice of reduced dimension t?
  - Tradeoff between
    - number of hash tables, n^ρ, and
    - time to hash, t^O(t)
- Total query time: dn^(1/c²+o(1))
Proof idea
- Claim: ρ = log(1/P(r)) / log(1/P(cr)) → 1/c², where
  - P(r) = probability of collision when ||p-q|| = r
- Intuitive proof:
  - Ignore the effects of reducing the dimension
  - P(r) = intersection / union (of the regions of ball centers capturing p and q)
  - P(r) ≈ probability that a random point u lands beyond the dashed line
  - The x-coordinate of u has a nearly Gaussian distribution → P(r) ≈ exp(-A·r²)
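The Gaussian estimate yields the claimed exponent in one line (same P(r) and constant A as above):

```latex
P(r) \approx e^{-A r^2}
\quad\Longrightarrow\quad
\rho \;=\; \frac{\log\bigl(1/P(r)\bigr)}{\log\bigl(1/P(cr)\bigr)}
\;\approx\; \frac{A r^2}{A c^2 r^2} \;=\; \frac{1}{c^2}.
```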
The landscape: lower bounds
- Space n^(4/ε²) + nd, query O(d·log n), c = 1+ε [KOR'98, IM'98]
  - Lower bound: with n^o(1/ε²) space, ω(1) memory lookups are needed [AIP'06]
- Space n^(1+ρ) + nd (small poly, close to linear), query dn^ρ; ρ ≈ 1/c [IM'98, Cha'02, DIIM'04], ρ = 1/c² + o(1) [AI'06]
  - Lower bound: ρ ≥ 1/c² for LSH [MNP'06, OWZ'10]; with n^(1+o(1/c²)) space, ω(1) memory lookups are needed [PTW'08, PTW'10]
- Space nd·log n (near-linear), query dn^ρ; ρ = 2.09/c [Ind'01, Pan'06], ρ = O(1/c²) [AI'06]
Challenge 1:
- Design a space partitioning of R^t that is
  - efficient: point location in poly(t) time
  - qualitative: regions are "sphere-like", i.e.
    [Prob. needle of length 1 is uncut]^(c²) ≥ [Prob. needle of length c is uncut]
    (equivalently, ρ ≤ 1/c² for the needle-collision probabilities)
NNS for ℓ∞ distance [Indyk'98]
- Thm: for any ρ > 0, NNS for ℓ∞^d with
  - O(d·log n) query time
  - n^(1+ρ) space
  - O(log_(1+ρ) log d) approximation
- The approach:
  - A deterministic decision tree, similar to kd-trees
  - Each node of the tree tests "q_i < t"
  - One difference: the algorithm goes down the tree only once (while tracking the list of possible neighbors)
- [ACP'08]: optimal for decision trees!
Plan for today 1. NNS for basic distances 2. NNS for advanced distances: embeddings 3. NNS via product spaces
What do we have?
- Classical ℓp distances: Hamming, Euclidean, ℓ∞
- How about other distances, like edit distance?
  - ed(x, y) = number of substitutions/insertions/deletions to transform string x into y

  Metric           Space         Time         Approximation     Reference
  Hamming (ℓ1)     n^(1+ρ)+nd    dn^ρ         ρ = 1/c           [IM'98, Cha'02, DIIM'04]
                                              ρ ≥ 1/c           [MNP'06, OWZ'10, PTW'08,'10]
  Euclidean (ℓ2)                              ρ ≈ 1/c²          [AI'06]
                                              ρ ≥ 1/c²          [MNP'06, OWZ'10, PTW'08,'10]
  ℓ∞               n^(1+ρ)       O(d·log n)   c ≈ log_ρ log d   [I'98]; [ACP'08] optimal for decision trees
NNS via Embeddings
- An embedding of M into a host metric (H, d_H) is a map f: M → H
  - f has distortion A ≥ 1 if for all x, y ∈ M:
    d_M(x, y) ≤ d_H(f(x), f(y)) ≤ A·d_M(x, y)
- Why? If H is Euclidean space, then we obtain NNS for the original space M!
- Popular host: H = ℓ1
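The distortion bound above can be checked empirically on a finite sample of points (a minimal sketch; the function name and interface are mine):

```python
from itertools import combinations

def distortion(points, d_M, d_H, f):
    """Empirical distortion of a map f: (M, d_M) -> (H, d_H) on a
    finite sample: the smallest A such that, after rescaling f,
    d_M(x, y) <= d_H(f(x), f(y)) <= A * d_M(x, y) for all pairs."""
    ratios = [d_H(f(x), f(y)) / d_M(x, y)
              for x, y in combinations(points, 2) if d_M(x, y) > 0]
    return max(ratios) / min(ratios)
```

An isometry (all ratios equal) has distortion exactly 1; any map that stretches some pairs more than others scores strictly above 1.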
Embeddings of various metrics
- Embeddings into ℓ1:

  Metric                                                Upper bound
  Edit distance over {0,1}^d                            2^O(√log d)      [OR'05]
  Ulam (edit distance between permutations)             O(log d)         [CK'06]
  Block edit distance                                   O(log d)         [MS'00, CM'07]
  Earth-mover distance (s-sized sets in the 2D plane)   O(log s)         [Cha'02, IT'03]
  Earth-mover distance (s-sized sets in {0,1}^d)        O(log s·log d)   [AIK'08]

- Challenge 2: Improve the distortion of embedding edit distance into ℓ1
OK, but where’s the barrier?
A barrier: ℓ1 non-embeddability
- Embeddings into ℓ1:

  Metric                                                Upper bound                      Lower bound
  Edit distance over {0,1}^d                            2^O(√log d)      [OR'05]         Ω(log d)    [KN'05, KR'06]
  Ulam (edit distance between permutations)             O(log d)         [CK'06]         Ω̃(log d)   [AK'07]
  Block edit distance                                   O(log d)         [MS'00, CM'07]  4/3         [Cor'03]
  Earth-mover distance (s-sized sets in the 2D plane)   O(log s)         [Cha'02, IT'03] Ω(log^(1/2) s)  [NS'07]
  Earth-mover distance (s-sized sets in {0,1}^d)        O(log s·log d)   [AIK'08]        Ω(log s)    [KN'05]
Other good host spaces?
- What is "good": (ℓ2)², ℓ∞, etc.:
  - algorithmically tractable
  - rich (can embed into it)
- (ℓ2)² = real space with distance ||x-y||²₂ — a host with very good LSH
- But similar lower bounds hold (sketching lower bounds via communication complexity):
  - Edit distance over {0,1}^d: Ω(log d) [AK'07]
  - Ulam (edit distance between orderings): Ω̃(log d) [AK'07]
  - Earth-mover distance (s-sized sets in {0,1}^d): Ω(log s) [AIK'08]
Plan for today 1. NNS for basic distances 2. NNS for advanced distances: embeddings 3. NNS via product spaces
Meet a new host
- Iterated product space Ρ_22,∞,1: start from ℓ1 of dimension α (distance d_1), take the ℓ∞ product of β copies of it (distance d_∞,1), then the (ℓ2)² product of γ copies of that (distance d_22,∞,1)
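Spelled out, the iterated distance combines coordinates by ℓ1, then max, then a sum of squares. A minimal sketch (the function name is mine; α, β, γ are the dimensions from the slide):

```python
def d_22_inf_1(x, y):
    """Distance in the iterated product space P_{22,inf,1}: the points
    x, y are gamma x beta x alpha nested lists; combine coordinates by
    l1 over the innermost index, l_inf (max) over the middle one, and
    the square of l2 (a sum of squares) over the outermost one."""
    total = 0.0
    for xg, yg in zip(x, y):
        block = max(sum(abs(a - b) for a, b in zip(xb, yb))
                    for xb, yb in zip(xg, yg))
        total += block ** 2
    return total
```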
Why Ρ_22,∞,1? [A-Indyk-Krauthgamer'09, Indyk'02]
- Because we can…
- Embedding: can embed Ulam (edit distance between permutations) into Ρ_22,∞,1 with constant distortion (and small dimensions)
- NNS: any t-iterated product space admits NNS on n points with
  - (log log n)^O(t) approximation
  - near-linear space and sublinear query time
- Corollary: NNS for Ulam with O((log log n)²) approximation
  - cf. each ℓp part alone has a logarithmic lower bound!
Embedding into Ρ_22,∞,1
- Theorem: Can embed the Ulam metric into Ρ_22,∞,1 with constant distortion
  - Dimensions: α = β = γ = d
- Proof intuition:
  - Characterize Ulam distance "nicely": "Ulam distance between x and y equals the number of characters that satisfy a simple property"
  - "Geometrize" this characterization
Ulam: a characterization
- Lemma: Ulam(x, y) approximately equals the number of characters a satisfying:
  - there exists K ≥ 1 (a prefix length) s.t.
  - the set X[a; K] of K characters preceding a in x differs much from the set Y[a; K] of K characters preceding a in y
- E.g., for a=5, K=4: x = 123456789, y = 123467895, so X[5;4] = {1,2,3,4} while Y[5;4] = {6,7,8,9}
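For comparison, the exact Ulam distance that the lemma approximates can be computed directly via the standard longest-increasing-subsequence reduction (a hedged sketch, not from the talk; the function name is mine):

```python
from bisect import bisect_left

def ulam(x, y):
    """Exact Ulam distance between two permutations of the same symbols:
    the minimum number of character moves, which equals len(x) minus the
    longest common subsequence, computed via a patience-sorting LIS."""
    pos = {c: i for i, c in enumerate(x)}
    seq = [pos[c] for c in y]   # y rewritten in x's coordinate order
    tails = []                  # tails[i] = smallest tail of an
                                # increasing subsequence of length i+1
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(x) - len(tails)
```

On the slide's example, y = 123467895 is obtained from x = 123456789 by moving the single character 5 to the end, so the distance is 1.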
Ulam: the embedding
- "Geometrizing" the characterization (illustrated on x = 123456789, y = 123467895 with the sets X[5;4], Y[5;4]) gives the embedding
A view on product metrics
- Gives a more computational view of embeddings: edit(P, Q) is approximated by a sum of squares (ℓ22) of maxima (ℓ∞) of sums (ℓ1)
- The Ulam characterization is related to work in the context of property testing & streaming [EKKRV'98, ACCL'04, GJKK'07, GG'07, EJ'08]
Challenges 3, …
- Embedding into product spaces?
  - Of edit distance, EMD, …
- NNS for any norm (Banach space)?
  - Would help for EMD (a norm)
  - A first target: Schatten norms (e.g., the trace norm of a matrix)
- Other uses of embeddings into product spaces?
  - Related work: sketching of product spaces [JW'09, AIK'08, AKO]
Some aspects I didn't mention yet
- NNS for spaces with low intrinsic dimension: [Clarkson'99], [Karger-Ruhl'02], [Hildrum-Kubiatowicz-Ma-Rao'04], [Krauthgamer-Lee'04,'05], [Indyk-Naor'07], …
- Cell-probe lower bounds for deterministic and/or exact NNS: [Borodin-Ostrovsky-Rabani'99], [Barkol-Rabani'00], [Jayram-Khot-Kumar-Rabani'03], [Liu'04], [Chakrabarti-Chazelle-Gum-Lvov'04], [Pătraşcu-Thorup'06], …
- NNS for the average case: [Alt-Heinrich-Litan'01], [Dubiner'08], …
- Other problems via reductions from NNS: [Eppstein'92], [Indyk'00], …
- Many others!
Summary of challenges
1. Design a qualitative, efficient space partitioning
2. Embeddings with improved distortion: edit distance into ℓ1
3. NNS for any norm: e.g., the trace norm?
4. Embedding into product spaces: say, of EMD