Fast Johnson-Lindenstrauss Transform(s)
Nir Ailon, Edo Liberty, Bernard Chazelle
Bertinoro Workshop on Sublinear Algorithms, May 2011
JL – Distribution Version
Find a random mapping Φ from R^n to R^k (n big, k small) such that for every x ∈ R^n with ‖x‖_2 = 1, with probability at least 1 − exp{−kε^2}:
‖Φx‖_2 = 1 ± O(ε)   (0 < ε < 1)
This k is tight for the probabilistic guarantee [Jayram, Woodruff 2011]
JL – Metric Embedding Version
If you have N vectors x_1, …, x_N ∈ R^n: set k = O(ε^(-2) log N); by a union bound, for all i, j:
‖Φx_i − Φx_j‖ ≈ ‖x_i − x_j‖
A low-distortion metric embedding. The target dimension k is almost tight [Alon 2003]
Solution: Johnson-Lindenstrauss (JL)
Φ = a dense k × n random matrix
So what’s the problem?
• Running time: Ω(kn)
• Number of random bits: Ω(kn)
• Can we do better?
Fast JL [Ailon, Chazelle 2006]
Φ = Sparse · Hadamard (Fourier) · Diagonal   (k × n)
Time = O(k^3 + n log n), Randomness = O(k^3 log n + n)
Beats the JL Ω(kn) bound for log n < k < n^(1/2)
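The Sparse · Hadamard · Diagonal structure can be sketched as follows. This is a schematic illustration, not the paper's exact construction: the sparsity rate q and the Gaussian variance below are stand-in choices, and the fast Walsh-Hadamard transform plays the role of the Fourier step.

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform (orthonormal), O(n log n); n a power of 2.
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)

def fjlt(x, k, rng):
    # Schematic FJLT: Phi = P . H . D
    # D: random +-1 diagonal, H: normalized Hadamard, P: sparse random matrix.
    n = len(x)
    d = rng.choice([-1.0, 1.0], size=n)        # D
    y = fwht(d * x)                            # H D x in O(n log n)
    q = min(1.0, (np.log(n) ** 2) / n)         # sparsity rate (stand-in choice)
    mask = rng.random((k, n)) < q              # support of sparse P
    P = mask * rng.normal(0.0, 1.0 / np.sqrt(q * k), size=(k, n))
    return P @ y

rng = np.random.default_rng(1)
n, k = 1024, 256
x = rng.normal(size=n)
x /= np.linalg.norm(x)
y = fjlt(x, k, rng)
```

The point of H · D is to "spread" the mass of x so that a very sparse P still preserves the norm; without the preconditioning, a sparse P would fail on spiky inputs.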
Improvement [Ailon, Liberty 2008]
• O(n log n) time for k < n^(1/2)
• O(n) random bits
Algorithm (works for k = O(d^(1/2))) [Ailon, Liberty 2007]
Φ = B · D_1 · H · D_2 · H · D_3 …
B = a k × n error-correcting code matrix; each D_i = a random ±1 diagonal matrix
Analysis [Ailon, Liberty 2007]
Assume D_1 = diag(ε_1 … ε_d). Then B D_1 x = Σ_i ε_i x_i B^(i), a Rademacher r.v. in k dimensions.
• Tail of Z = ‖B D_1 x‖_2 bounded using Talagrand: Pr[|Z − μ| > t] ≤ exp{−t^2/σ^2}, where σ = ‖B diag(x)‖_{2→2} ≤ O(1) · ‖x‖_4 (Cauchy-Schwarz)
• Best we could hope for: ‖x‖_4 = d^(-1/4) = k^(-1/2)
Properties of B (k × n error-correcting code):
• each element is ±1/√k
• the row set is a subset of the rows of a Hadamard matrix ⇒ O(n log n) runtime
• the columns are 4-wise independent
For the inner blocks: H D_i x is a Rademacher r.v. in d dimensions; Z = ‖H D_i x‖_4 is bounded using Talagrand: Pr[|Z − μ| > t] ≤ exp{−t^2/σ^2}, with σ ≤ ‖H‖_{4/3→4} · ‖x‖_4 (Cauchy-Schwarz). Use Hausdorff-Young and the assumption on k to “make progress” (shrink ‖x‖_4) at each i.
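A schematic NumPy sketch of the B · D_1 · H · D_2 · H · D_3 structure. For illustration only, B is replaced here by a random choice of k Hadamard rows scaled to entries ±1/√k; the real construction uses a fixed error-correcting code with 4-wise independent columns.

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform (orthonormal), O(d log d); d a power of 2.
    x = x.copy()
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(d)

def trimmed_walsh_hadamard(x, k, rng, rounds=2):
    # Phi = B . D_1 . H . D_2 . H . D_3 ... :
    # alternate random-sign diagonals with Hadamard transforms, then apply B.
    d = len(x)
    for _ in range(rounds):                      # the (H D_i) preconditioning blocks
        x = fwht(rng.choice([-1.0, 1.0], size=d) * x)
    x = rng.choice([-1.0, 1.0], size=d) * x      # D_1
    rows = rng.choice(d, size=k, replace=False)
    # B stand-in: k Hadamard rows with entries +-1/sqrt(k),
    # i.e. sqrt(d/k) times rows of the orthonormal transform.
    return np.sqrt(d / k) * fwht(x)[rows]

rng = np.random.default_rng(2)
d = 1024
x = rng.normal(size=d)
x /= np.linalg.norm(x)
y = trimmed_walsh_hadamard(x, k=32, rng=rng)     # k = sqrt(d), as in the restriction
```

Each H D_i pass drives ‖x‖_4 toward the ideal d^(-1/4), which is what makes the final code-matrix projection concentrate.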
In the meantime…
• Compressed sensing for sparse signal recovery: find a k × n mapping Φ s.t. the equation y = Φx can be solved efficiently and exactly for s-sparse x
• The R.I.P. property is sufficient (Candès + Tao): ‖Φx‖ ≈ ‖x‖ for all s-sparse x
• You also want Φ to be efficiently computable, for the recovery algorithm
Why J.L. ⇒ R.I.P.
• The “number” of s-sparse x’s is exp{s log n} [Baraniuk]
• Therefore k ≈ s log n / ε^2 measurements are enough, using distributional JL, to get (1+ε)-R.I.P. for s-sparse vectors
• But fast R.I.P. was known without restriction on k
  – Rudelson, Vershynin: take k ≈ s log^3 n randomly chosen rows from the Fourier transform
  – No restriction of the form k < n^(1/2)
• Does R.I.P. ⇒ J.L.?
  – That would be a good way to kill the restriction on k!
Rudelson + Vershynin’s R.I.P.
• If Φ is a random choice of k = s · t · log^4 n rows from the Fourier (Hadamard) matrix, then with constant probability the matrix Φ is (1/√t)-R.I.P. for s-sparse vectors
Rudelson + Vershynin’s R.I.P. ⇒ (almost) metric J.L.
• The analysis is not black box
• Had to extend nitty-gritty details
The Transformation
Φx = k^(-1/2) · M · D · x, where D = diag(±1) and M = k rows of the unnormalized n × n Hadamard matrix
• k = O(log^4 n · log N / ε^4), with no restriction on k
• Split x = x_H + x_L: x_H = the s = log N / ε^2 heaviest coordinates; x_L = the n − s lightest coordinates, each bounded by 1/√s = ε/√(log N)

The Analysis
• ‖Φ x_H‖_2 = ‖x_H‖_2 (1 ± O(ε)) “directly” from R.V.’s R.I.P. (x_H is s-sparse)
• Φ D x_L is a Rademacher r.v.: Z = ‖k^(-1/2) M D x_L‖_2 is concentrated using Talagrand, with σ^2 = ‖k^(-1/2) M diag(x_L)‖_{2→2}^2 = O(ε^2 / log N)
• R.V. proved that, with constant probability over the row choice, uniformly for all vectors x = “s ones and the rest zeros”, ‖k^(-1/2) M diag(x)‖_{2→2} is bounded (this is R.I.P.). The two parameters that govern the bound are:
  1. ‖x‖_∞ = 1, and
  2. for any vector y s.t. ‖y‖_2 = 1: ‖diag(x) y‖_1 ≤ T
• What we have is:
  1. ‖x_L‖_∞ ≤ ε/√(log N)
  2. for any vector y s.t. ‖y‖_2 = 1: ‖diag(x_L) y‖_1 ≤ 1
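A minimal NumPy sketch of this transformation, assuming the row selection is uniform without replacement as in R.V.'s setup (the k^(-1/2)-scaled unnormalized Hadamard rows appear below as √(n/k) times rows of the orthonormal fast transform):

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform (orthonormal), O(n log n); n a power of 2.
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)

def srht(x, k, rng):
    # Phi x = k^(-1/2) * M * D * x:
    # random +-1 signs, then k randomly chosen rows of the unnormalized
    # Hadamard matrix, scaled by k^(-1/2).
    n = len(x)
    d = rng.choice([-1.0, 1.0], size=n)       # D
    y = fwht(d * x)                           # full Hadamard pass, O(n log n)
    rows = rng.choice(n, size=k, replace=False)
    return np.sqrt(n / k) * y[rows]           # = k^(-1/2) * (unnormalized H rows)

rng = np.random.default_rng(4)
n, k = 2048, 512
x = rng.normal(size=n)
x /= np.linalg.norm(x)
y = srht(x, k, rng)
```

Note the whole map uses only O(n) random bits (the signs plus the row choice) and runs in O(n log n), with no restriction of the form k < n^(1/2).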
More…
• Krahmer and Ward (2010) prove R.I.P. ⇒ JL, black-box! This fixes the ε^(-4) problem and replaces it with the correct ε^(-2)! Proof technique:
• Kane, Nelson: “sparse JL”
• Lots of work on derandomization
• Can we get rid of polylog n? If we go via R.I.P. then we need at least one log n factor, which JL doesn’t seem to need.
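The Krahmer-Ward reduction is easy to state in code: precompose any R.I.P. matrix with a fresh random ±1 diagonal, and the result behaves as a distributional JL map. A minimal sketch, using a Gaussian matrix as a stand-in R.I.P. matrix for illustration:

```python
import numpy as np

def rip_to_jl(A, x, rng):
    # Krahmer-Ward: if A satisfies the R.I.P., then x -> A @ (xi * x) with a
    # fresh random +-1 sign vector xi is a (distributional) JL map.
    n = A.shape[1]
    xi = rng.choice([-1.0, 1.0], size=n)
    return A @ (xi * x)

rng = np.random.default_rng(3)
n, k = 2048, 256
A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n))  # stand-in R.I.P. matrix
x = rng.normal(size=n)
x /= np.linalg.norm(x)
y = rip_to_jl(A, x, rng)
```

Applied to the subsampled-Hadamard matrices above, this gives a fast JL transform with no k < n^(1/2) restriction.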