Multivariate Information Bottleneck
Nir Friedman, Ori Mosenzon, Noam Slonim, Naftali Tishby
Hebrew University
Data Analysis Statistics Population
Information Bottleneck
- Cluster "age" into clusters that are predictive of education level?
- Also cluster education attained to be predictive of age?
[Figure: age (17, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74) against education level (None, High school, Some college, Bachelor's degree, PhD)]
Our contribution
Generalize Information Bottleneck:
- Generic principle for specifying systems of interacting clusters
- Characterization of the solution for these specifications
- General purpose methods for constructing solutions
Information Bottleneck [Tishby, Pereira & Bialek 99]
- Input: P(A, B)
- Soft clustering: P(T|A), which induces P(T, B)
- Minimize: I(T; A) - β I(T; B), where β sets the tradeoff
  - I(T; A): compression (information lost about A)
  - I(T; B): preserved information about B
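The objective on this slide can be evaluated directly for small discrete distributions. The following is a minimal sketch, not the authors' code: the function names, the nested-list layout, and the tradeoff weight `beta` (as in the original bottleneck formulation) are all assumptions made for illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats for a joint distribution given as joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:
                total += p * math.log(p / (px[x] * py[y]))
    return total

def ib_objective(p_ab, p_t_given_a, beta):
    """Return I(T;A) - beta * I(T;B) for a soft clustering P(T|A).

    p_ab[a][b] is the input joint; p_t_given_a[a][t] is the clustering.
    T is assumed to depend on B only through A.
    """
    n_a, n_b, n_t = len(p_ab), len(p_ab[0]), len(p_t_given_a[0])
    p_a = [sum(row) for row in p_ab]
    # P(t, a) = P(a) P(t|a)
    p_ta = [[p_a[a] * p_t_given_a[a][t] for a in range(n_a)]
            for t in range(n_t)]
    # P(t, b) = sum_a P(a, b) P(t|a)
    p_tb = [[sum(p_ab[a][b] * p_t_given_a[a][t] for a in range(n_a))
             for b in range(n_b)]
            for t in range(n_t)]
    return mutual_information(p_ta) - beta * mutual_information(p_tb)
```

With a single cluster both terms vanish, so the objective is zero; keeping every value of A in its own cluster pays the full I(T; A) in exchange for preserving all of I(A; B).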
Information Bottleneck Reexamined
[Figure: two DAGs, G_in and G_out, over A, B, and T]
- G_in: actual distribution (input and parameters)
- G_out: desired independencies
Example: Symmetric Bottleneck
[Figure: G_in and G_out over A, B, T_A, and T_B]
Simultaneous clustering of both A and B:
- P(T_A|A)
- P(T_B|B)
So that:
- T_A captures the information A contains about B
- T_B captures the information B contains about A
General Principle
Input:
- P(X_1, …, X_n)
- G_in: compression
  - T_j clusters values of pa_j
- G_out: desired (conditional) independencies
Goal:
- Find P(T_j|pa_j) in G_in to "match" G_out
[Figure: graph over X_1, X_2, …, X_n and T_1, …, T_k]
Multi-information
- The information that random variables jointly contain about each other
- Generalizes mutual information
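For a small discrete joint, multi-information can be computed as the KL divergence between the joint and the product of its single-variable marginals, which is the standard definition. The sketch below assumes a hypothetical dictionary layout (tuples of values mapped to probabilities); the function name is illustrative only.

```python
import math

def multi_information(joint):
    """Multi-information in nats.

    joint maps tuples (x1, ..., xn) to probabilities. The result is
    D_KL( P(X1..Xn) || P(X1) * ... * P(Xn) ).
    """
    n = len(next(iter(joint)))
    # Accumulate the single-variable marginals P(Xi).
    marginals = [{} for _ in range(n)]
    for values, p in joint.items():
        for i, v in enumerate(values):
            marginals[i][v] = marginals[i].get(v, 0.0) + p
    total = 0.0
    for values, p in joint.items():
        if p > 0:
            indep = math.prod(marginals[i][v] for i, v in enumerate(values))
            total += p * math.log(p / indep)
    return total
```

For two variables this reduces to ordinary mutual information, and it is zero exactly when all the variables are independent.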
Graph Projection
- Let G be a DAG
- Define: the projection of P onto G
[Figure: P among all possible distributions, with the subset of distributions consistent with G]
- Proposition: relates the real multi-info to the multi-info computed as though P is consistent with G
Multi-information & Bayesian Networks
- Proposition: if P is consistent with G, then the multi-information is a sum of local interactions
- Define: this sum of local interactions for any P and G
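The quantities on the multi-information and graph-projection slides can be written out as below. This is a sketch in assumed notation: the symbols $\mathcal{I}$ (multi-information), $\mathbf{Pa}^{G}_{i}$ (parents of $X_i$ in $G$), $\mathcal{I}^{G}$ (the sum of local interactions), and $D_{KL}(P\,\|\,G)$ (the projection distance) are choices made here and may differ from the ones used in the talk.

```latex
\begin{align*}
\mathcal{I}(X_1,\dots,X_n) &= D_{KL}\big(P(X_1,\dots,X_n)\,\|\,P(X_1)\cdots P(X_n)\big) \\
\mathcal{I}^{G} &= \sum_{i} I\big(X_i;\mathbf{Pa}^{G}_{i}\big) \\
D_{KL}(P\,\|\,G) &= \min_{Q \text{ consistent with } G} D_{KL}(P\,\|\,Q)
  \;=\; \mathcal{I}(X_1,\dots,X_n) - \mathcal{I}^{G}
\end{align*}
```

When P is itself consistent with G the projection distance is zero, so the multi-information equals $\mathcal{I}^{G}$.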
Optimizing Criteria
Two goals:
- Lose information with respect to G_in
- Attain the conditional independencies in G_out
Optimization objective:
- Force clusters to compress
- Minimize violations of conditional independencies in G_out
Additional Interpretation
- Using properties of the graph projection, we can rewrite the objective
- Thus, we can instead minimize a tradeoff that will:
  - Minimize information in G_in
  - Maximize information in G_out
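The rewrite can be sketched as follows, under assumptions about notation that the text above does not preserve: $\gamma$ weights the independence violations, $\mathcal{I}^{G}$ is the sum of local mutual-information terms for graph $G$, and $D_{KL}(P\,\|\,G_{out})$ is the distance from P to the distributions consistent with $G_{out}$. Because P is constructed to be consistent with $G_{in}$, its multi-information equals $\mathcal{I}^{G_{in}}$, which is what allows the second step.

```latex
\begin{align*}
\mathcal{L} &= \mathcal{I}^{G_{in}} + \gamma\, D_{KL}(P \,\|\, G_{out}) \\
            &= \mathcal{I}^{G_{in}} + \gamma \left( \mathcal{I}^{G_{in}} - \mathcal{I}^{G_{out}} \right) \\
            &= (1+\gamma)\, \mathcal{I}^{G_{in}} - \gamma\, \mathcal{I}^{G_{out}}
\end{align*}
```

Dividing by $(1+\gamma)$ gives the equivalent objective $\mathcal{I}^{G_{in}} - \beta\, \mathcal{I}^{G_{out}}$ with $\beta = \gamma / (1+\gamma)$.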
Minimization Objective - Example: Symmetric Bottleneck
[Figure: G_in and G_out over A, B, T_A, and T_B]
- Recall the objective: some terms depend on parameters we can control, others only on the input (fixed)
Characterization of Solutions
- Thm: minimal point if and only if:
- d(t_j, pa_j): measure of "distortion" between t_j and pa_j
- For example, in the symmetric bottleneck:
Finding Solutions
How can we find solutions?
Asynchronous update:
- Pick an index j
- Update P(T_j|pa_j)
Theorem:
- Asynchronous updates converge to (local) minima
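To make the update loop concrete, here is a minimal sketch for the simplest case only: the original single-variable bottleneck, where there is one T and so "pick an index j" is trivial. It uses the self-consistent update of the original bottleneck, P(t|a) ∝ P(t) exp(-β D_KL(P(B|a) ‖ P(B|t))), rather than the multivariate procedure from the talk. The function names, the random initialization, and the fixed iteration count are assumptions for illustration.

```python
import math
import random

def kl(p, q):
    """D_KL(p || q) in nats; infinite if q lacks support where p has it."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def iterative_ib(p_ab, n_clusters, beta, n_iters=200, seed=0):
    """Iterate the single-bottleneck update; returns P(t|a) as rows.

    Assumes every row of p_ab has positive total probability.
    """
    rng = random.Random(seed)
    n_a, n_b = len(p_ab), len(p_ab[0])
    p_a = [sum(row) for row in p_ab]
    p_b_given_a = [[p / p_a[a] for p in p_ab[a]] for a in range(n_a)]
    # Random soft initialization of P(t|a).
    p_t_given_a = []
    for _ in range(n_a):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        s = sum(row)
        p_t_given_a.append([r / s for r in row])
    for _ in range(n_iters):
        # P(t) = sum_a P(a) P(t|a)
        p_t = [sum(p_a[a] * p_t_given_a[a][t] for a in range(n_a))
               for t in range(n_clusters)]
        # P(b|t) = sum_a P(a) P(t|a) P(b|a) / P(t); empty clusters get zeros.
        p_b_given_t = []
        for t in range(n_clusters):
            if p_t[t] == 0:
                p_b_given_t.append([0.0] * n_b)
                continue
            p_b_given_t.append([
                sum(p_a[a] * p_t_given_a[a][t] * p_b_given_a[a][b]
                    for a in range(n_a)) / p_t[t]
                for b in range(n_b)])
        # P(t|a) proportional to P(t) exp(-beta * KL(P(b|a) || P(b|t)))
        for a in range(n_a):
            w = [p_t[t] * math.exp(-beta * kl(p_b_given_a[a], p_b_given_t[t]))
                 for t in range(n_clusters)]
            z = sum(w)
            if z > 0:  # keep the previous row if every weight underflows
                p_t_given_a[a] = [wi / z for wi in w]
    return p_t_given_a
```

As with the theorem on the slide, a loop of this kind should only be expected to reach a local minimum, so results can depend on the initialization.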
Example - 20 Newsgroups
- 20,000 messages from 20 newsgroups [Lang 1995]
  - A: newsgroup of the message
  - B: word in the message
- P(a, b): probability that choosing a random position in the corpus would select word b in a message in newsgroup a
- We applied the symmetric bottleneck to both attributes
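The definition of P(a, b) amounts to normalizing (newsgroup, word) occurrence counts by the total number of word positions. A minimal sketch follows, assuming a hypothetical input of already tokenized messages; the function name and data layout are illustrative and not taken from the talk.

```python
from collections import Counter

def joint_from_messages(messages):
    """Estimate P(a, b) from an iterable of (newsgroup, list_of_words).

    Each word position counts once, so P(a, b) is the chance that a
    uniformly chosen position holds word b in a message of newsgroup a.
    """
    counts = Counter()
    for group, words in messages:
        for word in words:
            counts[(group, word)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}
```

For example, two positions holding "car" in rec.autos out of four positions overall give P(rec.autos, car) = 0.5.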
20 Newsgroups: Symmetric Bottleneck
[Figure: P(T_D, T_W), newsgroup clusters against word clusters]
- Newsgroup clusters shown:
  - comp.*, misc.forsale, sci.crypt, sci.electronics
  - alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*
- Word clusters shown:
  - car, turkish, game, team, jesus, gun, hockey, …
  - x, file, image, encryption, window, dos, mac, …
20 Newsgroups: Symmetric Bottleneck
[Figure: P(T_D, T_W), newsgroup clusters against word clusters]
- Newsgroup cluster: alt.atheism, soc.religion.christian, talk.religion.misc
- Word cluster: atheists, christianity, jesus, bible, sin, faith, …
Discussion
General framework:
- Defines a new family of optimization problems … and solutions
Future directions:
- Additional algorithms: agglomerative solutions
- Relation to generative models
- Parametric constraints in G_out
Example: Parallel Bottleneck
[Figure: G_in and G_out over A, B, T_1, and T_2]