Topic Significance Ranking for LDA Generative Models
Loulwah AlSumait, James Gentle, Daniel Barbará, Carlotta Domeniconi
ECML PKDD, Bled, Slovenia, September 7-11, 2009

Agenda
• Introduction
• Junk/insignificant topic definitions
• Distance measures
• 4-phase Weighted Combination Approach
• Experimental results
• Conclusions and future work

Latent Dirichlet Allocation (LDA) Model
Blei, Ng, & Jordan (2003)
• Probabilistic generative model
• Hidden variables (topics) are associated with the observed text
  - Dirichlet priors on document and topic distributions
• Generative process
• Inference process
  - Exact inference is intractable
  - Approximation approaches
  - Input: K. Output: Φ, θ
[Figure: LDA plate diagram; recovered labels: d, z_i, w_i, N_d, D, K]
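
The generative process named on this slide can be illustrated with a minimal sketch of the standard LDA generative story from Blei, Ng, & Jordan (2003). It uses only the Python standard library. The function names, the symmetric priors `alpha` and `beta`, and the fixed document length are assumptions made for illustration; they do not come from the slides.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [g / total for g in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.5, seed=0):
    """Sample topics, document mixtures, and words (hypothetical helper)."""
    rng = random.Random(seed)
    # phi[k] is topic k's distribution over the vocabulary.
    phi = [sample_dirichlet(beta, vocab_size, rng) for _ in range(num_topics)]
    theta, docs = [], []
    for _ in range(num_docs):
        # theta_d is document d's distribution over topics.
        theta_d = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta_d)[0]
            w = rng.choices(range(vocab_size), weights=phi[z])[0]
            words.append(w)
        theta.append(theta_d)
        docs.append(words)
    return phi, theta, docs
```

Inference runs in the opposite direction: given only the words and K, it estimates Φ and θ, which is where the approximation approaches come in.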

Topic Significance Ranking
• The setting of K has a critical effect on the inferred topics
• Most previous work examines the topics manually
• Goal: quantify the semantic significance of topics
  - How different is a topic's distribution from junk/insignificant topic distributions?

Topic Significance Ranking
• Example: 20 Newsgroups
The Volgenau School of Information Technology and Engineering, Department of Computer Science

Junk/Insignificant Topic Definitions
• Uniform Distribution Over Words
  - [definition formula not recovered]
  - Uniformity of a topic: [formula not recovered]
• Vacuous Semantic Distribution
  - p(w_i|k) = φ_ik, [remainder of formula not recovered]
  - Vacuousness of a topic: [formula not recovered]
• Background Distribution
  - [definition formula not recovered]
  - Background of a topic: [formula not recovered]
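
The three junk/insignificant (J/I) reference distributions named on this slide can be sketched as follows. The slide's formulas did not survive extraction, so this is one plausible reading and not necessarily the authors' exact definitions. It assumes the uniform reference puts equal mass on every word, the vacuous reference is the corpus-wide marginal word distribution (the topics mixed by their probabilities), and the background reference is uniform over documents, to be compared with a topic's distribution over documents. The helper names and the `topic_probs` argument are hypothetical.

```python
def uniform_reference(size):
    """Equal probability for each of `size` outcomes (words or documents)."""
    return [1.0 / size] * size


def vacuous_reference(phi, topic_probs):
    """Marginal word distribution: sum over topics k of p(k) * phi[k][i].

    Assumption: `topic_probs` holds p(k) for each topic and sums to 1.
    """
    vocab_size = len(phi[0])
    return [
        sum(p_k * phi_k[i] for p_k, phi_k in zip(topic_probs, phi))
        for i in range(vocab_size)
    ]


def topic_over_documents(theta, k):
    """Normalize column k of theta into topic k's distribution over documents."""
    column = [theta_d[k] for theta_d in theta]
    total = sum(column)
    return [value / total for value in column]
```

A topic close to any of these references carries little semantic content: it either spreads over all words, mirrors the corpus as a whole, or appears equally in every document.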

Distance Measures
• Symmetric KL-Divergence
  - [formula not recovered]
  - Applied to: Uniformity, Background, W-Vacuous
• Cosine Dissimilarity
  - [formula not recovered]
  - Applied to: Uniformity, W-Vacuous, Background
• Correlation Coefficient
  - [formula not recovered]
  - Applied to: Uniformity, W-Vacuous, Background
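
The three measures listed on this slide can be sketched with textbook definitions. The slide's own formulas were not recovered, so the details here are assumptions: the KL divergence is symmetrized by averaging both directions, a small `eps` guards against zero probabilities, and the correlation coefficient is Pearson's.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Average of KL(p||q) and KL(q||p); eps (an assumption) avoids log(0)."""
    def kl(a, b):
        return sum(x * math.log((x + eps) / (y + eps))
                   for x, y in zip(a, b) if x > 0)
    return 0.5 * (kl(p, q) + kl(q, p))


def cosine_dissimilarity(p, q):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return 1.0 - dot / norm


def correlation_coefficient(p, q):
    """Pearson correlation; NaN when either vector has zero variance."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((x - mean_p) * (y - mean_q) for x, y in zip(p, q))
    var_p = sum((x - mean_p) ** 2 for x in p)
    var_q = sum((y - mean_q) ** 2 for y in q)
    if var_p == 0 or var_q == 0:
        return float("nan")
    return cov / math.sqrt(var_p * var_q)
```

Note that Pearson correlation is undefined against an exactly constant vector, so a correlation-based uniformity measure needs some special handling; this sketch only signals the case with NaN.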

Topic Significance Ranking: Multi-Criteria Weighted Combination
• 4 phases
  - Standardization procedure
    - Transform distances into standardized measures
      - Scores
      - Weights
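
The slide does not preserve the formulas used to standardize the distances. As an illustration only, the sketch below uses min-max rescaling, a common way to bring measures from different criteria onto a shared [0, 1] range. The choice of min-max and the function name are assumptions, not the authors' stated procedure.

```python
def min_max_standardize(values):
    """Rescale per-topic distances to [0, 1] (assumed standardization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All topics score the same, so no topic stands out on this measure.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Standardizing first keeps a measure with a large numeric range, such as KL divergence, from dominating bounded ones such as cosine dissimilarity when they are later combined.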

Topic Significance Ranking: 4 phases (continued)
  - Intra-Criterion Weighted Combination
    - Combine standardized measures of each J/I definition
    - Uniformity scores: S1_Uk, S2_Uk
    - W-Vacuous scores: S1_Vk, S2_Vk
    - Background scores: S1_Bk, S2_Bk
  - Inter-Criteria Weighted Combination
    - Combine J/I scores and weights
    - [Diagram: scores and weights meet at a node marked X and yield the topic rank (TSR)]
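
The two combination phases can be sketched as plain weighted averages. The slide does not give the combination formulas or the weights, so the linear form, the function names, and the dictionary layout below are all assumptions for illustration, not the authors' method.

```python
def weighted_combination(values, weights):
    """Weighted average of several measures for one topic."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def rank_topics(criterion_scores, criterion_weights):
    """Combine per-criterion scores and sort topics, most significant first.

    criterion_scores maps a criterion name (for example "uniformity") to a
    list with one score per topic; criterion_weights maps the same names to
    hypothetical weights.
    """
    names = list(criterion_scores)
    num_topics = len(criterion_scores[names[0]])
    combined = [
        weighted_combination(
            [criterion_scores[name][k] for name in names],
            [criterion_weights[name] for name in names],
        )
        for k in range(num_topics)
    ]
    order = sorted(range(num_topics), key=lambda k: combined[k], reverse=True)
    return combined, order
```

The same `weighted_combination` helper can serve the intra-criterion phase (merging the standardized measures of one J/I definition) and the inter-criteria phase (merging the three definitions), which is why it is kept separate from the ranking step.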

Experimental Results: Simulated Data
[Slide content (figure or table) not recovered]

20 Newsgroups: Top 10 Significant Topics
[Slide content (figure or table) not recovered]

20 Newsgroups: Lowest 10 Significant Topics
[Slide content (figure or table) not recovered]

NIPS: Top 10 Significant Topics
[Slide content (figure or table) not recovered]

NIPS: Lowest 10 Significant Topics
[Slide content (figure or table) not recovered]

Individual vs. Combined Score: Simulated Data
[Slide content (figure or table) not recovered]

Individual vs. Combined Score: 20 Newsgroups
[Slide content (figure or table) not recovered]

Conclusions and Future Work
• Unsupervised numerical quantification of the topics' semantic significance
• Novel post-analysis in LDA modeling
• Three J/I topic distributions
• Four-level weighted combination approach
• Future directions:
  - Analysis of TSR sensitivity to the approach, K, and weight settings
  - More J/I definitions
  - Tool to visualize topic evolution in an online setting