Bregman Information Bottleneck • Koby Crammer, Hebrew University of Jerusalem • Noam Slonim, Princeton University • NIPS '03, Whistler, December 2003
Motivation • Extend the IB to a broad family of representations (figure labels: "Hello, world", Multinomial distribution, Vectors) • Relation to the exponential family
Outline • Rate-Distortion Formulation • Bregman Divergences • Bregman IB • Statistical Interpretation • Summary
Information Bottleneck • Variables: X, T, Y • X represented by [p(y=1|X), …, p(y=n|X)] • T represented by [p(y=1|T), …, p(y=n|T)]
Rate-Distortion Formulation • Input • Variables • Distortion
Self-Consistent Equations • Boltzmann Distribution: • Markov + Bayes • Marginal
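The equations themselves did not survive on this slide. For orientation, the self-consistent equations of the original Information Bottleneck are commonly written as below. The symbols β (trade-off parameter) and Z(x, β) (partition function) are assumptions about notation, not taken from the slide.

```latex
\begin{align}
p(t \mid x) &= \frac{p(t)}{Z(x,\beta)}
  \exp\!\bigl(-\beta\, D_{\mathrm{KL}}[\,p(y \mid x)\,\|\,p(y \mid t)\,]\bigr)
  && \text{(Boltzmann distribution)} \\
p(y \mid t) &= \frac{1}{p(t)} \sum_x p(x)\, p(t \mid x)\, p(y \mid x)
  && \text{(Markov + Bayes)} \\
p(t) &= \sum_x p(x)\, p(t \mid x)
  && \text{(marginal)}
\end{align}
```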
Bregman Divergences • f: S → R • B_f(v||u) = f(v) - (f(u) + f'(u)(v-u)) • Figure: graph of f with the points (v, f(v)) and (u, f(u)), and the tangent at u evaluated at v: (v, f(u) + f'(u)(v-u))
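The definition can be read as the vertical gap, at v, between f and its tangent line taken at u. A minimal one-dimensional sketch follows; the function names and the choice f(x) = x^2 are illustrative assumptions, not from the slides.

```python
def bregman_divergence(f, f_prime, v, u):
    """B_f(v||u): gap at v between f and the tangent to f taken at u."""
    tangent_at_v = f(u) + f_prime(u) * (v - u)
    return f(v) - tangent_at_v


# Illustrative convex generator (an assumption for this example): f(x) = x^2.
def square(x):
    return x * x


def square_prime(x):
    return 2.0 * x


# For f(x) = x^2 the divergence reduces to (v - u)^2.
d = bregman_divergence(square, square_prime, v=3.0, u=1.0)
d_self = bregman_divergence(square, square_prime, v=1.0, u=1.0)
```

For a convex f the tangent never lies above the graph, so the divergence is non-negative, and it is zero when v = u.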
Bregman IB: Rate-Distortion Formulation • Functional • Bregman Function • Input • Variables • Distortion
Self-Consistent Equations • Boltzmann Distribution: • Prototypes: convex combination of input vectors • Marginal
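The three equations on this slide can be iterated as a fixed-point sweep. The sketch below is one plausible reading, not the authors' algorithm: it assumes the assignment has the form p(t|x) ∝ p(t) exp(-β d(x, t)), uses a fixed β, and picks the divergence of f(x) = (1/2)||x||^2 for the example. All function and variable names are hypothetical.

```python
import math


def half_sq_euclid(v, u):
    """Bregman divergence generated by f(x) = 0.5 * ||x||^2 (example choice)."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(v, u))


def bregman_ib_sweep(xs, p_x, protos, p_t, beta, div):
    """One pass over the three self-consistent equations."""
    n_clusters = len(protos)
    # Boltzmann distribution: p(t|x) proportional to p(t) * exp(-beta * d(x, t)).
    p_t_given_x = []
    for x in xs:
        w = [pt * math.exp(-beta * div(x, c)) for pt, c in zip(p_t, protos)]
        z = sum(w)  # partition function for this x
        p_t_given_x.append([wi / z for wi in w])
    # Marginal: p(t) = sum_x p(x) p(t|x).
    new_p_t = [sum(px * row[t] for px, row in zip(p_x, p_t_given_x))
               for t in range(n_clusters)]
    # Prototypes: convex combination of the inputs with weights p(x|t).
    dim = len(xs[0])
    new_protos = []
    for t in range(n_clusters):
        p_x_given_t = [px * row[t] / new_p_t[t]
                       for px, row in zip(p_x, p_t_given_x)]
        new_protos.append([sum(w * x[k] for w, x in zip(p_x_given_t, xs))
                           for k in range(dim)])
    return p_t_given_x, new_protos, new_p_t


# Toy data (made up for illustration): two well-separated pairs of points.
xs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
p_x = [0.25] * 4
protos, p_t = [[1.0, 1.0], [9.0, 9.0]], [0.5, 0.5]
for _ in range(20):
    q, protos, p_t = bregman_ib_sweep(xs, p_x, protos, p_t,
                                      beta=1.0, div=half_sq_euclid)
```

On this toy data each prototype settles at the mean of its pair of points. Swapping `div` for another Bregman divergence leaves the prototype update unchanged, since it is a weighted mean in every case.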
Special Cases • Information Bottleneck: § Bregman function: f(x) = x log(x) - x § Domain: Simplex § Divergence: Kullback-Leibler • Soft K-means: § Bregman function: f(x) = (1/2) x^2 § Domain: R^n § Divergence: (squared) Euclidean distance [Still, Bialek, Bottou, NIPS 2003]
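Both special cases follow by substituting the Bregman function into B_f(v||u) = f(v) - f(u) - ⟨∇f(u), v - u⟩. The worked steps below apply the functions coordinate-wise and summed over i, which is an assumption about how the slide's scalar notation extends to vectors.

```latex
\begin{align}
f(x) &= \sum_i \bigl(x_i \log x_i - x_i\bigr), \quad \nabla f(u)_i = \log u_i: \\
B_f(v\|u) &= \sum_i \Bigl( v_i \log \frac{v_i}{u_i} - v_i + u_i \Bigr)
  = D_{\mathrm{KL}}[v\|u]
  \quad \text{when } \textstyle\sum_i v_i = \sum_i u_i = 1, \\[4pt]
f(x) &= \tfrac12 \|x\|^2, \quad \nabla f(u) = u: \\
B_f(v\|u) &= \tfrac12\|v\|^2 - \tfrac12\|u\|^2 - \langle u,\, v - u\rangle
  = \tfrac12 \|v - u\|^2 .
\end{align}
```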
Bregman IB • Diagram relating: Rate-Distortion, Information Bottleneck, Bregman Clustering, Exponential Family
Exponential Family • Expectation parameters: • Examples (single dimension): § Normal § Poisson
Exponential Family and Bregman Divergences • Expectation parameters: § • Properties : §
Illustration
Back to Distributional Clustering • Distortion: • Data vectors and prototypes: expectation parameters • Question: For which exponential-family distribution does this distortion arise? Answer: Poisson
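The Poisson answer can be checked numerically in one dimension. The sketch below assumes the distortion in question is the Bregman divergence of f(x) = x log(x) - x, that is v log(v/u) - v + u, and compares it with the log-likelihood gap between a Poisson whose mean equals the observation and a Poisson with mean mu. The function names and the values of k and mu are illustrative.

```python
import math


def poisson_log_pmf(k, mu):
    """log p(k; mu) for a Poisson distribution with mean (expectation parameter) mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def generalized_kl(v, u):
    """One-dimensional Bregman divergence of f(x) = x*log(x) - x."""
    return v * math.log(v / u) - v + u


k, mu = 7, 3.5
# Log-likelihood lost by explaining the observation k with mean mu instead of mean k.
gap = poisson_log_pmf(k, k) - poisson_log_pmf(k, mu)
divergence = generalized_kl(k, mu)
```

The two quantities agree because the log(k!) term cancels in the difference, leaving k log(k/mu) - k + mu.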
Illustration • Figure: a sequence of symbols a and b, shown under a Product of Poisson Distributions and under a Multinomial Distribution
Back to Distributional Clustering • Information Bottleneck: § Distributional clustering of Poisson distributions • (Soft) k-means: § (Soft) clustering of Normal distributions
Maximum Likelihood Perspective • Distortion • Input: § Observations • Output: § Parameters of Distribution • IB functional: EM [Elidan & Friedman, before]
Back to Self-Consistent Equations • Posterior: • Partition Function: Weighted β-norm of the Likelihood • β → ∞: the most likely cluster governs • β → 0: clusters collapse into a single prototype
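The two limits can be seen directly on a single data point. The sketch below assumes a posterior of the form p(t|x) ∝ p(t) exp(-β d(x, t)), writing β for the trade-off parameter. The prior and the distortion values are made up for illustration.

```python
import math


def posterior(p_t, distortions, beta):
    """p(t|x) proportional to p(t) * exp(-beta * d(x, t)) for one fixed x."""
    w = [pt * math.exp(-beta * d) for pt, d in zip(p_t, distortions)]
    z = sum(w)  # partition function
    return [wi / z for wi in w]


p_t = [0.3, 0.7]          # cluster prior (illustrative)
distortions = [1.0, 2.0]  # d(x, t) for t = 0, 1 (illustrative)

near_zero = posterior(p_t, distortions, beta=1e-6)  # stays close to the prior
large = posterior(p_t, distortions, beta=50.0)      # concentrates on the closest cluster
```

As the parameter goes to 0 the posterior no longer depends on x, so every prototype becomes the same weighted mean of the data. As it grows, the cluster with the smallest distortion takes all the mass even though its prior here is the smaller one.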
Summary • Bregman Information Bottleneck § Clustering/Compression for many representations and divergences • Statistical Interpretation § Clustering of distributions from the exponential family § EM-like formulation • Current Work: § Algorithms § Characterize distortion measures which also yield Boltzmann distributions § General distortion measures