Sketching, Sampling and other Sublinear Algorithms: Euclidean space: dimension reduction and NNS
Alex Andoni (MSR SVC)

A Sketching Problem
- "To sketch or not to sketch" → sketch 010110
- "To be or not to be" → sketch 010101
- Are the two texts similar? Decide by checking whether their sketches are similar.
![Sketch from LSH [Broder '97]: for Jaccard coefficient](http://slidetodoc.com/presentation_image_h/7c58da138c98d9d109e68d009a99ab59/image-3.jpg)
Sketch from LSH [Broder '97]: for Jaccard coefficient
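A minimal sketch of a min-wise hashing sketch for the Jaccard coefficient, in the spirit of [Broder '97]. It relies on the fact that, for a random hash function, the probability that two sets share the same minimum hash value equals their Jaccard coefficient. The function names, the use of BLAKE2b as a stand-in for a random hash, and the choice of 256 hash functions are illustrative assumptions, not taken from the slides.

```python
import hashlib


def _hash(seed: int, token: str) -> int:
    """Seeded 64-bit hash; BLAKE2b stands in for a random hash function (assumption)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(tokens, num_hashes=256):
    """Return one minimum hash value per seed over the set of tokens."""
    items = set(tokens)
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(seed, t) for t in items) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of coordinates on which the two sketches agree."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    return sum(a == b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)


def jaccard(a, b):
    """Exact Jaccard coefficient |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


doc_a = "to sketch or not to sketch".split()
doc_b = "to be or not to be".split()

print("exact:", jaccard(doc_a, doc_b))
print("estimate:", estimate_jaccard(minhash_sketch(doc_a), minhash_sketch(doc_b)))
```

The estimate is an average of 256 agreement indicators, so it fluctuates around the exact value; more hash functions tighten it at the cost of a longer sketch.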

General Theory: embeddings
Metrics:
- Hamming distance
- Euclidean distance (ℓ2)
- Edit distance between two strings
- Earth-Mover (transportation) Distance
Problems P:
- Compute distance between two points
- Nearest Neighbor Search
- Diameter/Close-pair of set S
- Clustering, MST, etc
An embedding f reduces problem <P under hard metric> to <P under simpler metric>.

Embeddings: landscape

Dimension Reduction

Main intuition

1D embedding

1D embedding (continued)
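A hedged illustration of a 1D embedding, assuming it refers to the standard Gaussian projection f(x) = Σ g_i x_i with independent g_i ~ N(0, 1); this is an assumption about what the slide showed. For that map, f is linear and E[f(x)²] = ‖x‖², which the code checks empirically. All function names are hypothetical.

```python
import random


def gaussian_vector(dim, rng):
    """Draw g = (g_1, ..., g_dim) with independent standard Gaussian entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def embed_1d(g, x):
    """The assumed 1D embedding f(x) = sum_i g_i * x_i."""
    return sum(gi * xi for gi, xi in zip(g, x))


def mean_squared_projection(x, trials, seed=0):
    """Average f(x)^2 over independent draws of g; should approach ||x||^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = gaussian_vector(len(x), rng)
        total += embed_1d(g, x) ** 2
    return total / trials


x = [3.0, 4.0]  # ||x||^2 = 25
print("empirical E[f(x)^2]:", mean_squared_projection(x, trials=20000))
```

A single projection is correct only in expectation; its value for any one draw of g can be far from ‖x‖, which is why several independent projections are combined in practice.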

Full Dimension Reduction

Concentration
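A small sketch of full dimension reduction, assuming the Johnson-Lindenstrauss style construction: stack k independent Gaussian projections and scale by 1/√k, so that squared lengths are preserved in expectation and concentrate around ‖x‖² as k grows. The dimensions d = 500 and k = 200, and all function names, are illustrative assumptions rather than values from the slides.

```python
import math
import random


def random_projection(d, k, rng):
    """k x d matrix with N(0, 1) entries scaled by 1/sqrt(k)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def apply_map(matrix, x):
    """Matrix-vector product, i.e. the k-dimensional image of x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def dist(x, y):
    """Euclidean distance between x and y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def distortion(d, k, seed=0):
    """Ratio (distance after projection) / (distance before) for one random pair."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    matrix = random_projection(d, k, rng)
    return dist(apply_map(matrix, x), apply_map(matrix, y)) / dist(x, y)


print("distortion for d=500, k=200:", distortion(500, 200))
```

The ratio deviates from 1 by roughly 1/√(2k) in typical runs, which is the concentration behaviour that lets k depend on the target accuracy and the number of points rather than on d.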

Dimension Reduction: wrap-up
![NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni '04]](http://slidetodoc.com/presentation_image_h/7c58da138c98d9d109e68d009a99ab59/image-13.jpg)
NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni '04]
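A minimal sketch of the hash family commonly associated with [Datar-Immorlica-Indyk-Mirrokni '04]: project onto a random Gaussian direction, shift by a random offset, and cut the line into cells of width w, i.e. h(x) = ⌊(a·x + b) / w⌋. The width w = 4, the test points, and the function names are assumptions chosen for illustration. The code estimates how often nearby and distant points fall into the same cell.

```python
import math
import random


def make_hash(dim, w, rng):
    """Sample h(x) = floor((a . x + b) / w) with Gaussian a and uniform offset b."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)

    return h


def collision_rate(x, y, w=4.0, trials=2000, seed=0):
    """Fraction of independently sampled hash functions with h(x) == h(y)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(len(x), w, rng)
        hits += h(x) == h(y)
    return hits / trials


p = [0.0] * 10
near = [1.0] + [0.0] * 9   # at distance 1 from p
far = [10.0] + [0.0] * 9   # at distance 10 from p

print("near collision rate:", collision_rate(p, near))
print("far collision rate:", collision_rate(p, far))
```

The gap between the two collision rates is what an LSH index exploits: concatenating several such functions per table and using several tables amplifies it.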
![Near-Optimal LSH [A-Indyk '06]](http://slidetodoc.com/presentation_image_h/7c58da138c98d9d109e68d009a99ab59/image-14.jpg)
Near-Optimal LSH [A-Indyk '06]
- Regular grid → grid of balls
  - p can hit empty space, so take more such grids until p is in a ball
- Need (too) many grids of balls
  - Start by projecting in dimension t
- Analysis gives [formula missing]
- Choice of reduced dimension t?
  - Tradeoff between # hash tables, $n^{\rho}$, and time to hash, $t^{O(t)}$
- Total query time: $d\,n^{1/c^2+o(1)}$
Figure labels: 2D; p; $R^t$
![Open question](http://slidetodoc.com/presentation_image_h/7c58da138c98d9d109e68d009a99ab59/image-15.jpg)
Open question: $\Pr[\text{needle of length } 1 \text{ is not cut}] \ge \left(\Pr[\text{needle of length } c \text{ is not cut}]\right)^{1/c^2}$
![Time-Space Trade-offs](http://slidetodoc.com/presentation_image_h/7c58da138c98d9d109e68d009a99ab59/image-16.jpg)
Time-Space Trade-offs

| Space | Query time | Comment | Reference |
|---|---|---|---|
| low | high | | [Ind '01, Pan '06], [AI '06] |
| medium | medium | | [IM '98], [DIIM '04, AI '06] |
| $n^{1+o(1/c^2)}$ | | $\omega(1)$ memory lookups | [PTW '08, PTW '10] |
| high | low | one hash table lookup! | [KOR '98, IM '98, Pan '06] |
| $n^{o(1/\varepsilon^2)}$ | | $\omega(1)$ memory lookups | [AIP '06] |

NNS beyond LSH

Finale