Bregman Information Bottleneck • Koby Crammer, Noam Slonim • Hebrew University of Jerusalem, Princeton University • NIPS '03, Whistler, December 2003

Motivation • Extend the IB to a broad family of representations (multinomial distributions, vectors) • Relation to the exponential family

Outline • Rate-Distortion Formulation • Bregman Divergences • Bregman IB • Statistical Interpretation • Summary

Information Bottleneck • Variables: X, T, Y • X is represented by [p(y=1|X) … p(y=n|X)] • T is represented by [p(y=1|T) … p(y=n|T)]

Rate-Distortion Formulation • Input • Variables • Distortion

Self-Consistent Equations • Boltzmann Distribution: • Markov + Bayes • Marginal

Bregman Divergences • $f: S \to \mathbb{R}$ • $B_f(v\|u) = f(v) - \bigl(f(u) + f'(u)(v-u)\bigr)$ • Figure: the graph of $f$ with the points $(v, f(v))$, $(u, f(u))$ and $(v, f(u)+f'(u)(v-u))$ marked
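The one-dimensional definition above can be tried out directly. The following is a minimal sketch, not code from the talk: the helper name `bregman` and the two example functions are illustrative assumptions, and the caller is assumed to pass a convex, differentiable f together with its derivative.

```python
import math

def bregman(f, df, v, u):
    """B_f(v||u): gap at v between f and its tangent line taken at u."""
    return f(v) - (f(u) + df(u) * (v - u))

# Illustrative convex functions (assumptions, chosen only for the demo).
square = lambda x: x * x
d_square = lambda x: 2.0 * x

# For f(x) = x^2 the divergence reduces to (v - u)^2.
gap_square = bregman(square, d_square, 3.0, 1.0)

# For f(x) = exp(x) the divergence is asymmetric in its two arguments.
gap_exp_vu = bregman(math.exp, math.exp, 1.0, 0.0)
gap_exp_uv = bregman(math.exp, math.exp, 0.0, 1.0)
```

Because f is convex, its graph lies above every tangent line, so the value is non-negative and vanishes when v = u.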

Bregman IB: Rate-Distortion Formulation • Functional • Bregman Function • Input • Variables • Distortion

Self-Consistent Equations • Boltzmann Distribution: • Prototypes: convex combination of input vectors • Marginal
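The equations themselves did not survive on this slide, so the sketch below is a reconstruction under stated assumptions rather than the authors' algorithm. It assumes the usual rate-distortion form: an assignment p(t|x) proportional to p(t) exp(-β d(x, u_t)), prototypes u_t equal to the p(x|t)-weighted average of the inputs, and the marginal p(t) = Σ_x p(x) p(t|x). The function names, the toy data, and the use of half the squared Euclidean distance as the divergence are all illustrative choices.

```python
import math

def half_sq_dist(x, u):
    # Stand-in divergence for the demo (an assumption): half squared Euclidean distance.
    return 0.5 * sum((xi - ui) ** 2 for xi, ui in zip(x, u))

def bregman_ib_step(xs, px, protos, pt, beta, div=half_sq_dist):
    """One pass over the three self-consistent updates (assumed form)."""
    n, k, dim = len(xs), len(protos), len(xs[0])
    # Boltzmann-form assignment: p(t|x) proportional to p(t) * exp(-beta * d(x, u_t)).
    post = []
    for x in xs:
        w = [pt[t] * math.exp(-beta * div(x, protos[t])) for t in range(k)]
        z = sum(w)  # partition function for this x
        post.append([wi / z for wi in w])
    # Marginal: p(t) = sum_x p(x) p(t|x).
    new_pt = [sum(px[i] * post[i][t] for i in range(n)) for t in range(k)]
    # Prototypes: convex combination of the inputs with weights p(x|t).
    new_protos = []
    for t in range(k):
        w = [px[i] * post[i][t] / new_pt[t] for i in range(n)]
        new_protos.append([sum(w[i] * xs[i][d] for i in range(n)) for d in range(dim)])
    return post, new_protos, new_pt

# Toy data: two well-separated pairs of points with a uniform p(x).
xs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
px = [0.25] * 4
protos = [[0.0, 0.5], [4.0, 4.0]]  # arbitrary starting prototypes
pt = [0.5, 0.5]
for _ in range(20):
    post, protos, pt = bregman_ib_step(xs, px, protos, pt, beta=2.0)
```

The sketch does not guard against underflow or an empty cluster (a p(t) of zero); a practical version would work with log-weights.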

Special Cases • Information Bottleneck: § Bregman function: f(x) = x log(x) - x § Domain: Simplex § Divergence: Kullback-Leibler • Soft K-means: § Bregman function: f(x) = (1/2) x^2 § Domain: ℝ^n § Divergence: Euclidean distance [Still, Bialek, Bottou, NIPS 2003]
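Both special cases can be checked by hand. The derivation below is a worked sketch that assumes the Bregman function is applied coordinate-wise and summed over the n coordinates, with the gradient replacing the derivative (an assumption; the slide only states the one-dimensional functions).

```latex
% Case 1: f(x) = \sum_i (x_i \log x_i - x_i), so \partial f / \partial u_i = \log u_i
\begin{align*}
B_f(v\|u) &= \sum_i \bigl(v_i \log v_i - v_i\bigr) - \sum_i \bigl(u_i \log u_i - u_i\bigr)
             - \sum_i \log u_i \,(v_i - u_i) \\
          &= \sum_i v_i \log\frac{v_i}{u_i} - \sum_i v_i + \sum_i u_i .
\end{align*}
% On the simplex, \sum_i v_i = \sum_i u_i = 1, so the last two sums cancel:
\[
B_f(v\|u) = \sum_i v_i \log\frac{v_i}{u_i} = D_{\mathrm{KL}}(v\|u).
\]

% Case 2: f(x) = \tfrac12 \|x\|^2, so \nabla f(u) = u
\begin{align*}
B_f(v\|u) &= \tfrac12\|v\|^2 - \tfrac12\|u\|^2 - u\cdot(v-u) \\
          &= \tfrac12\|v\|^2 - u\cdot v + \tfrac12\|u\|^2 = \tfrac12\|v-u\|^2 .
\end{align*}
```

Under this reading, the "Euclidean distance" of the soft K-means case is half the squared Euclidean distance.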

Bregman IB • Diagram: Rate-Distortion, Information Bottleneck, Bregman Clustering, Exponential Family

Exponential Family • Expectation parameters: • Examples (single dimension): § Normal § Poisson

Exponential Family and Bregman Divergences • Expectation parameters: • Properties:

Illustration

Exponential Family and Bregman Divergences (continued) • Expectation parameters: • Properties:

Back to Distributional Clustering • Distortion: • Data vectors and prototypes: expectation parameters • Question: For which exponential-family distribution do we have ? Answer: Poisson
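The formula in the question is missing from the slide. One plausible reading (an assumption) is that it asks for the exponential-family member whose KL divergence, written in expectation parameters, equals the Bregman divergence of f(x) = x log(x) - x. The sketch below checks that identity numerically for two Poisson distributions; the function names, the means 3 and 5, and the truncation point are illustrative.

```python
import math

def poisson_logpmf(k, lam):
    # log of exp(-lam) * lam^k / k!
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def kl_poisson(lam, mu, kmax=200):
    """KL(Poisson(lam) || Poisson(mu)), summed over k = 0..kmax (truncated)."""
    total = 0.0
    for k in range(kmax + 1):
        lp = poisson_logpmf(k, lam)
        total += math.exp(lp) * (lp - poisson_logpmf(k, mu))
    return total

def bregman_xlogx(v, u):
    """Bregman divergence of f(x) = x log(x) - x, whose derivative is log(x)."""
    f = lambda x: x * math.log(x) - x
    return f(v) - (f(u) + math.log(u) * (v - u))

lam, mu = 3.0, 5.0
kl_value = kl_poisson(lam, mu)
bregman_value = bregman_xlogx(lam, mu)
closed_form = lam * math.log(lam / mu) - lam + mu
```

For small means the truncation at `kmax=200` discards a negligible tail; larger means would need a larger cutoff.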

Illustration • Figure: Product of Poisson Distributions; Multinomial Distribution

Back to Distributional Clustering • Information Bottleneck: § Distributional clustering of Poisson distributions • (Soft) k-means: § (Soft) clustering of Normal distributions

Maximum Likelihood Perspective • Distortion • Input: § Observations • Output: § Parameters of Distribution • IB functional: EM [Elidan & Friedman, before]

Back to Self-Consistent Equations • Posterior: • Partition Function: Weighted β-norm of the Likelihood • β → ∞: most likely cluster governs • β → 0: clusters collapse into a single prototype
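The two limits can be seen numerically. This is a small sketch that assumes the posterior has the Boltzmann form p(t|x) proportional to p(t) exp(-β d_t), where d_t is the distortion between x and prototype t; the prior and the distortion values are made-up numbers for illustration.

```python
import math

def posterior(pt, dists, beta):
    """p(t|x) proportional to p(t) * exp(-beta * d_t), shifted by min(d) for stability."""
    m = min(dists)
    w = [p * math.exp(-beta * (d - m)) for p, d in zip(pt, dists)]
    z = sum(w)
    return [wi / z for wi in w]

pt = [0.3, 0.7]     # cluster marginal p(t), illustrative
dists = [0.2, 1.0]  # distortion of one input x to each prototype, illustrative

large_beta = posterior(pt, dists, beta=1000.0)  # mass concentrates on the closest cluster
small_beta = posterior(pt, dists, beta=1e-9)    # posterior falls back to p(t)
```

At small β every input receives the same posterior p(t), so each prototype becomes the same weighted average of all inputs, which is the collapse into a single prototype described above.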

Summary • Bregman Information Bottleneck § Clustering/Compression for many representations and divergences • Statistical Interpretation § Clustering of distributions from the exponential family § EM-like formulation • Current Work: § Algorithms § Characterize distortion measures which also yield Boltzmann distributions § General distortion measures