
Tight Lower Bounds for Data Dependent Locality Sensitive Hashing
Alexandr Andoni (Columbia), Ilya Razenshteyn (MIT CSAIL)

Near Neighbor Search

Approximate Near Neighbor Search (ANN)

Locality Sensitive Hashing (LSH)
• From the definition of ANN
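As a minimal sketch of what an LSH family looks like, the snippet below uses bit sampling for Hamming distance, the classical family from (Indyk-Motwani 1998). The function names, the dimension `d`, the radius `r`, and the approximation factor `c` are illustrative assumptions, not values from the talk. It computes the collision probabilities `p1` (near pair, distance `r`) and `p2` (far pair, distance `c * r`) and the resulting quality exponent ρ = log(1/p1) / log(1/p2).

```python
import math
import random

def sample_bit_hash(d, rng):
    """Draw one hash function: read a uniformly random coordinate."""
    i = rng.randrange(d)
    return lambda x: x[i]

def exact_collision_probability(x, y):
    """For bit sampling, Pr[h(x) == h(y)] is the fraction of agreeing coordinates."""
    agree = sum(1 for a, b in zip(x, y) if a == b)
    return agree / len(x)

def lsh_exponent(p1, p2):
    """Quality exponent rho = log(1/p1) / log(1/p2)."""
    return math.log(1 / p1) / math.log(1 / p2)

# Hypothetical parameters chosen only for illustration.
d, r, c = 1000, 50, 2
origin = [0] * d
near = [1] * r + [0] * (d - r)            # Hamming distance r from origin
far = [1] * (c * r) + [0] * (d - c * r)   # Hamming distance c*r from origin

p1 = exact_collision_probability(origin, near)   # 1 - r/d
p2 = exact_collision_probability(origin, far)    # 1 - c*r/d
rho = lsh_exponent(p1, p2)

# Empirical check of p1 by sampling hash functions from the family.
rng = random.Random(0)
trials = 20000
hits = 0
for _ in range(trials):
    h = sample_bit_hash(d, rng)
    hits += h(origin) == h(near)
empirical_p1 = hits / trials
```

With these parameters `rho` comes out slightly below 1/c, in line with the 1/2 bound for Hamming distance at c = 2.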

From LSH to ANN

Bounds on LSH
Distance metric      Exponent ρ at c = 2   Reference
Euclidean            1/4                   (Andoni-Indyk 2006), (O’Donnell-Wu-Zhou 2011)
Manhattan/Hamming    1/2                   (Indyk-Motwani 1998), (O’Donnell-Wu-Zhou 2011)
Can one improve upon LSH? Yes! (Andoni-Indyk-Nguyen-R 2014), (Andoni-R 2015)

How to do better than LSH?

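Bounds on data dependent LSH
Distance metric      Scheme               Exponent ρ at c = 2   Reference
Euclidean            LSH                  1/4                   (Andoni-Indyk 2006), (O’Donnell-Wu-Zhou 2011)
Euclidean            Data dependent LSH   1/7                   (Andoni-R 2015) Optimal!
Manhattan/Hamming    LSH                  1/2                   (Indyk-Motwani 1998), (O’Donnell-Wu-Zhou 2011)
Manhattan/Hamming    Data dependent LSH   1/3                   (Andoni-R 2015) Optimal!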
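The values 1/4, 1/2, 1/7, and 1/3 quoted above match the known closed-form exponents evaluated at approximation factor c = 2. The sketch below is a quick arithmetic check under that assumption. The function name and the metric and scheme labels are illustrative, and lower-order o(1) terms in the exponents are ignored.

```python
from fractions import Fraction

def exponent(metric, scheme, c):
    """Closed-form exponent rho, ignoring o(1) terms.

    metric: "euclidean" or "hamming" (also covers Manhattan)
    scheme: "lsh" (data independent) or "data_dependent"
    """
    c = Fraction(c)
    if metric == "euclidean":
        return 1 / c**2 if scheme == "lsh" else 1 / (2 * c**2 - 1)
    if metric == "hamming":
        return 1 / c if scheme == "lsh" else 1 / (2 * c - 1)
    raise ValueError(f"unknown metric: {metric}")

# Evaluate all four combinations at c = 2.
table = {
    (metric, scheme): exponent(metric, scheme, 2)
    for metric in ("euclidean", "hamming")
    for scheme in ("lsh", "data_dependent")
}
for (metric, scheme), rho in table.items():
    print(f"{metric:10s} {scheme:15s} rho = {rho}")
```

For any c > 1 the data-dependent exponent is strictly smaller than the data-independent one, which is the sense in which data-dependent hashing improves on LSH.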

The main result
The data-dependent space partitions for the Euclidean and Manhattan/Hamming distances from (Andoni-R 2015) are optimal*
* After proper formalization

Hard instance

Fine print
• Voronoi diagram: a perfect partition
• Useless: hard to locate points
• Need to define what is allowed properly to rule it out
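The Voronoi example can be made concrete with a small sketch. The part containing a point is the cell of its nearest data point, so a query always shares a part with its nearest neighbor. Without further structure, though, computing that part is a linear scan over the dataset, which is nearest neighbor search itself. The function name and the toy points below are hypothetical.

```python
import math

def voronoi_part(point, dataset):
    """Index of the Voronoi cell containing `point`.

    Computed by brute force: a linear scan for the nearest data point,
    which is exactly the problem the partition was meant to speed up.
    """
    return min(range(len(dataset)), key=lambda i: math.dist(point, dataset[i]))

# Toy dataset, for illustration only.
dataset = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
query = (4.2, 0.7)

part = voronoi_part(query, dataset)
nearest = dataset[part]   # the query's part is identified by its nearest neighbor
```

Every data point lies in its own cell, so the partition is "perfect", but locating the part of a query costs as much as answering the query directly.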

Formalizing the model
• Restricted computational complexity: data structure lower bounds
• Bounded number of parts: can tweak the Voronoi diagram example

The main result

Proof outline

Conclusions
• Space / query time
• Can prove matching data-independent lower bounds for the hard instance (in an appropriate model)
• What about data-dependent?
Questions?