Tight Lower Bounds for Data Dependent Locality Sensitive Hashing
Alexandr Andoni (Columbia), Ilya Razenshteyn (MIT CSAIL)
Near Neighbor Search
Approximate Near Neighbor Search (ANN)
Locality Sensitive Hashing (LSH)
• From the definition of ANN
From LSH to ANN
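The step from an LSH family to an ANN data structure can be sketched as follows. This is a minimal illustration, not the construction from the talk: it assumes binary points under Hamming distance and uses bit sampling as the hash family, with the textbook parameter choice of k concatenated hashes per table and L tables. The names `lsh_parameters`, `BitSamplingIndex`, `p1` and `p2` are hypothetical and chosen only for this example.

```python
import math
import random


def hamming(a, b):
    """Number of coordinates in which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def lsh_parameters(n, p1, p2):
    """Textbook parameter choice (an assumption of this sketch).

    p1: collision probability for points at distance <= r
    p2: collision probability for points at distance >= c*r
    Returns the exponent rho, hashes per table k, and number of tables L.
    """
    rho = math.log(1 / p1) / math.log(1 / p2)
    k = math.ceil(math.log(n) / math.log(1 / p2))
    L = math.ceil(n ** rho)
    return rho, k, L


class BitSamplingIndex:
    """L hash tables, each keyed by k randomly sampled coordinates."""

    def __init__(self, points, k, L, seed=0):
        rng = random.Random(seed)
        d = len(points[0])
        self.points = points
        # One list of k sampled coordinate positions per table.
        self.projections = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = []
        for coords in self.projections:
            table = {}
            for idx, p in enumerate(points):
                key = tuple(p[i] for i in coords)
                table.setdefault(key, []).append(idx)
            self.tables.append(table)

    def query(self, q, radius):
        """Return the index of some stored point within `radius` of q, or None."""
        for coords, table in zip(self.projections, self.tables):
            key = tuple(q[i] for i in coords)
            for idx in table.get(key, []):
                if hamming(self.points[idx], q) <= radius:
                    return idx
        return None


if __name__ == "__main__":
    pts = [(0,) * 8, (1,) * 8, (1, 1, 1, 1, 0, 0, 0, 0)]
    index = BitSamplingIndex(pts, k=3, L=4, seed=1)
    print(index.query(pts[2], radius=0))
```

With this parameter choice a query inspects about n^ρ tables and the index stores about n^(1+ρ) entries, which is why the exponent ρ is the quantity the bounds are stated for.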
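Bounds on LSH
Exponent ρ of the best LSH family (up to lower-order terms), with its value at approximation c = 2:
• Euclidean: ρ ≤ 1/c², i.e. 1/4 (Andoni-Indyk 2006); ρ ≥ 1/c² (O’Donnell-Wu-Zhou 2011)
• Manhattan/Hamming: ρ ≤ 1/c, i.e. 1/2 (Indyk-Motwani 1998); ρ ≥ 1/c (O’Donnell-Wu-Zhou 2011)
Can one improve upon LSH? Yes! (Andoni-Indyk-Nguyen-R 2014), (Andoni-R 2015)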
How to do better than LSH?
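Bounds on data dependent LSH
Exponent ρ (up to lower-order terms), with its value at approximation c = 2:
• Euclidean, LSH: 1/c², i.e. 1/4 (Andoni-Indyk 2006), (O’Donnell-Wu-Zhou 2011)
• Euclidean, data dependent: 1/(2c² − 1), i.e. 1/7 (Andoni-R 2015). Optimal!
• Manhattan/Hamming, LSH: 1/c, i.e. 1/2 (Indyk-Motwani 1998), (O’Donnell-Wu-Zhou 2011)
• Manhattan/Hamming, data dependent: 1/(2c − 1), i.e. 1/3 (Andoni-R 2015). Optimal!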
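The values 1/4, 1/7, 1/2 and 1/3 on this slide can be reproduced with a few lines of exact arithmetic. The sketch assumes the standard closed forms for the exponents with lower-order terms dropped (1/c² and 1/c for data-independent LSH, 1/(2c² − 1) and 1/(2c − 1) for the data-dependent schemes) and assumes the slide's numbers are taken at approximation factor c = 2. The function name `lsh_exponents` is hypothetical.

```python
from fractions import Fraction


def lsh_exponents(c):
    """Exponents rho for approximation factor c, ignoring o(1) terms.

    The closed forms are an assumption of this sketch, stated in the lead-in.
    """
    c = Fraction(c)
    return {
        ("Euclidean", "data-independent"): 1 / c**2,
        ("Euclidean", "data-dependent"): 1 / (2 * c**2 - 1),
        ("Manhattan/Hamming", "data-independent"): 1 / c,
        ("Manhattan/Hamming", "data-dependent"): 1 / (2 * c - 1),
    }


if __name__ == "__main__":
    for (metric, kind), rho in lsh_exponents(2).items():
        print(f"{metric:18s} {kind:17s} rho = {rho}")
```

For any c > 1 the data-dependent exponent is strictly smaller than the data-independent one for the same metric, which is the sense in which the data-dependent schemes improve upon LSH.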
The main result
The data-dependent space partitions for the Euclidean and Manhattan/Hamming distances from (Andoni-R 2015) are optimal*
* After proper formalization
Hard instance
Fine print
• Voronoi diagram: a perfect partition
• Useless: hard to locate points
• Need to define what is allowed properly to rule it out
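The Voronoi objection can be made concrete with a small sketch. It assumes binary points under Hamming distance and a hypothetical helper `voronoi_cell`; it is an illustration of the point, not part of the lower-bound model. The partition is perfect in that a query always lands in the cell of its nearest data point, but evaluating the partition on a query costs one distance computation per data point, which is exactly the brute-force search the data structure was meant to avoid.

```python
def hamming(a, b):
    """Number of coordinates in which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def voronoi_cell(q, points):
    """Locate q in the Voronoi partition induced by `points`.

    Returns (index of the cell, number of distance evaluations).
    Ties go to the point with the smallest index.
    """
    best_idx, best_dist = None, None
    evaluations = 0
    for idx, p in enumerate(points):
        dist = hamming(p, q)
        evaluations += 1
        if best_dist is None or dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, evaluations


if __name__ == "__main__":
    pts = [(0, 0, 0, 0), (1, 1, 1, 1), (1, 1, 0, 0)]
    cell, cost = voronoi_cell((1, 1, 1, 0), pts)
    print(cell, cost)
```

The cost of locating a cell always equals the number of data points here, so a formal model has to exclude partitions like this one, for example by restricting how the partition is computed or how many parts it may have.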
Formalizing the model
• Restricted computational complexity: data structure lower bounds
• Bounded number of parts: can tweak the Voronoi diagram example
The main result
Proof outline
Conclusions
• Space Query time
• Can prove matching data-independent lower bounds for the hard instance (in an appropriate model)
• What about data-dependent?
Questions?