Topic Model: Latent Dirichlet Allocation
Ouyang Ruofei
May 10, 2013
Introduction
data = latent pattern + noise
The parameters describe the latent pattern; inference estimates them from the observed data.
Introduction
Parametric model: the number of parameters is fixed with respect to the sample size.
Nonparametric model: the number of parameters grows with the sample size (an infinite-dimensional parameter space).

Problem              Parameter
Density estimation   Distributions
Regression           Functions
Clustering           Partitions
Clustering
1. Ironman
2. Thor
3. Hulk
Each data point has an indicator variable that records which cluster it belongs to.
Dirichlet process
Ironman: 3 times
Thor: 2 times
Hulk: 2 times
Even without the likelihood, we know:
1. There are three clusters.
2. The distribution over the three clusters.
Then new data arrives.
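A minimal sketch of the counting idea above, assuming seven data points whose cluster indicators are already known (the list `z` is hypothetical and simply reproduces the 3/2/2 counts). The share of each cluster is its count divided by the total number of points.

```python
from collections import Counter

# Hypothetical indicator variables: the cluster of each of the 7 data points.
z = ["Ironman", "Ironman", "Ironman", "Thor", "Thor", "Hulk", "Hulk"]

def cluster_distribution(indicators):
    """Fraction of data points assigned to each cluster."""
    counts = Counter(indicators)
    total = sum(counts.values())
    return {cluster: c / total for cluster, c in counts.items()}

dist = cluster_distribution(z)
print(len(dist), "clusters:", dist)
```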
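Dirichlet process
Example: a Dirichlet distribution over the three clusters, Dir(Ironman, Thor, Hulk)
pdf: $\mathrm{Dir}(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$
mean: $E[\theta_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}$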
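A sketch that checks the mean formula numerically, assuming pseudo counts α = (3, 2, 2) for (Ironman, Thor, Hulk), matching the counts used in this example. It draws from Dir(α) by normalizing independent Gamma(α_k, 1) draws, a standard construction, and compares the Monte Carlo average with α_k / Σα_j.

```python
import random

def dirichlet_mean(alpha):
    """Mean of Dir(alpha): E[theta_k] = alpha_k / sum_j alpha_j."""
    total = sum(alpha)
    return [a / total for a in alpha]

def sample_dirichlet(alpha, rng=random):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

alpha = [3, 2, 2]   # assumed pseudo counts for (Ironman, Thor, Hulk)
rng = random.Random(0)
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]
mc_mean = [sum(d[k] for d in draws) / len(draws) for k in range(len(alpha))]
print(dirichlet_mean(alpha))   # exact mean
print(mc_mean)                 # Monte Carlo estimate, close to the exact mean
```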
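Dirichlet process
Conjugate prior
Dirichlet distribution (prior): $\theta \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$
Multinomial distribution (likelihood): $n \mid \theta \sim \mathrm{Mult}(\theta)$, where $n_k$ is the number of observations in cluster $k$
Posterior: $\theta \mid n \sim \mathrm{Dir}(\alpha_1 + n_1, \dots, \alpha_K + n_K)$

Example:
              Ironman   Thor   Hulk
Prior             3        2      2
Likelihood      100      300    200
Posterior       103      302    202

The prior parameters act as pseudo counts that are added to the observed counts.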
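Under conjugacy the update is just addition of counts. A minimal sketch reproducing the table (the function name `dirichlet_posterior` is ours, not from any library):

```python
def dirichlet_posterior(prior, counts):
    """Conjugate update: Dir(alpha) prior + multinomial counts n -> Dir(alpha + n)."""
    if len(prior) != len(counts):
        raise ValueError("prior and counts must have the same length")
    return [a + n for a, n in zip(prior, counts)]

prior = [3, 2, 2]            # pseudo counts: Ironman, Thor, Hulk
counts = [100, 300, 200]     # observed counts
posterior = dirichlet_posterior(prior, counts)
print(posterior)             # [103, 302, 202]
posterior_mean = [p / sum(posterior) for p in posterior]
```

With 600 observations against 7 pseudo counts, the posterior mean is dominated by the data; the prior matters most when the observed counts are small.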
Dirichlet process
In our Avengers model, K = 3 (Ironman, Thor, Hulk).
However, then this guy shows up, and he belongs to none of the three clusters.
A Dirichlet distribution with a fixed K cannot model him.
Dirichlet process: K = infinity.
Here, nonparametric means an unbounded (infinite) number of clusters.
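Dirichlet process
$G \sim \mathrm{DP}(\alpha, G_0)$
α: pseudo counts in each cluster (the concentration parameter)
$G_0$: base distribution of each cluster (the distribution template)
Given any finite partition $(A_1, \dots, A_K)$ of the space:
$(G(A_1), \dots, G(A_K)) \sim \mathrm{Dir}(\alpha G_0(A_1), \dots, \alpha G_0(A_K))$
A Dirichlet process is therefore a distribution over distributions.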
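A small numeric sketch of the partition property, assuming a hypothetical base distribution G_0 that puts masses 0.5, 0.3 and 0.2 on three cells of a partition. It returns the Dirichlet parameters α·G_0(A_k), the mean E[G(A_k)] = G_0(A_k), and the variance G_0(A_k)(1 - G_0(A_k))/(α + 1), which shows that a larger α concentrates G more tightly around the template G_0.

```python
def dp_partition_marginal(alpha, base_masses):
    """For G ~ DP(alpha, G0) and a partition A_1..A_K with G0(A_k) = base_masses[k]:
    (G(A_1), ..., G(A_K)) ~ Dir(alpha * G0(A_1), ..., alpha * G0(A_K)).
    Returns the Dirichlet parameters, E[G(A_k)], and Var[G(A_k)]."""
    if abs(sum(base_masses) - 1.0) > 1e-9:
        raise ValueError("G0 must assign total mass 1 to the partition")
    params = [alpha * m for m in base_masses]
    means = [p / alpha for p in params]          # parameters sum to alpha, so mean = G0(A_k)
    variances = [m * (1 - m) / (alpha + 1) for m in base_masses]
    return params, means, variances

for alpha in (1.0, 100.0):
    params, means, variances = dp_partition_marginal(alpha, [0.5, 0.3, 0.2])
    print(alpha, params, means, variances)
```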
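Dirichlet process
Constructing a Dirichlet process with the Chinese restaurant process (CRP):
A restaurant has an infinite number of tables.
Customer 1 sits at an unoccupied table with p = 1.
Customer N sits at occupied table k with $p = \frac{n_k}{N - 1 + \alpha}$, where $n_k$ is the number of customers already at table k, and at a new table with $p = \frac{\alpha}{N - 1 + \alpha}$.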
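A minimal simulation of the seating rule above, using Python's standard `random` module; `alpha` and the number of customers are arbitrary illustrative values. Occupied tables get weight n_k and a new table gets weight α; `random.choices` normalizes the weights, which gives exactly the probabilities n_k/(N-1+α) and α/(N-1+α).

```python
import random

def chinese_restaurant_process(n_customers, alpha, rng=random):
    """Seat customers one at a time; return each customer's table and the table sizes."""
    assignments, table_sizes = [], []
    for _ in range(n_customers):
        weights = table_sizes + [alpha]          # occupied tables, then one new table
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):                # the new table was chosen
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes

rng = random.Random(1)
seats, sizes = chinese_restaurant_process(100, alpha=1.0, rng=rng)
print(len(sizes), "tables; sizes:", sizes)
```

Tables are numbered in order of first use, so the first customer always opens table 0. The expected number of occupied tables grows roughly like α·log N, which is the sense in which the number of clusters grows with the sample size.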
Dirichlet process
Customers: data
Tables: clusters
Dirichlet process
Train the model by Gibbs sampling.
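Gibbs sampling
Gibbs sampling is an MCMC method for obtaining a sequence of samples from a multivariate distribution.
The intuition is to turn a multivariate problem into a sequence of univariate problems: each variable is sampled in turn from its conditional distribution given the current values of all the others.
In the Dirichlet process:
Multivariate: $p(z_1, \dots, z_N \mid x_1, \dots, x_N)$
Univariate: $p(z_i \mid z_{-i}, x_1, \dots, x_N)$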
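To make the "sequence of univariate problems" idea concrete, here is a sketch on a textbook target rather than the Dirichlet process: a standard bivariate normal with correlation ρ, whose full conditionals are x | y ~ N(ρy, 1-ρ²) and y | x ~ N(ρx, 1-ρ²). The burn-in length and sample count are arbitrary choices.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, rng=random):
    """Sample (x, y) from a standard bivariate normal with correlation rho
    using only the two univariate conditionals."""
    sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)   # update x with y held fixed
        y = rng.gauss(rho * x, sd)   # update y with the new x held fixed
        if t >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000, rng=random.Random(0))
n = len(samples)
mean_x = sum(x for x, _ in samples) / n
mean_y = sum(y for _, y in samples) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
corr = cov / (var_x * var_y) ** 0.5
print(corr)   # should be close to 0.8
```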
Gibbs sampling pseudo code
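A minimal sketch of Gibbs sampling for a Dirichlet process mixture, assuming one-dimensional data, Gaussian clusters with a known variance `sigma2`, and a Gaussian base distribution G_0 = N(0, `tau2`). These modeling choices and all names are ours, picked so that the cluster means can be integrated out in closed form. Each step resamples one indicator z_i from p(z_i | z_-i, x): the CRP weight (n_k for an existing cluster, α for a new one) times the predictive density of x_i under that cluster.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def dp_mixture_gibbs(data, alpha=1.0, sigma2=1.0, tau2=100.0, sweeps=100, rng=random):
    """Collapsed Gibbs sampler for a 1-D DP mixture of Gaussians (sketch)."""
    z = [0] * len(data)                    # start with every point in one cluster
    stats = {0: [len(data), sum(data)]}    # cluster label -> [count, sum of points]
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove x_i from its cluster; drop the cluster if it becomes empty.
            k_old = z[i]
            stats[k_old][0] -= 1
            stats[k_old][1] -= x
            if stats[k_old][0] == 0:
                del stats[k_old]
            labels, weights = [], []
            for k, (n_k, s_k) in stats.items():
                prec = 1.0 / tau2 + n_k / sigma2     # posterior precision of mu_k
                mean = (s_k / sigma2) / prec         # posterior mean of mu_k
                labels.append(k)
                weights.append(n_k * normal_pdf(x, mean, 1.0 / prec + sigma2))
            labels.append(next_label)                # option: open a new cluster
            weights.append(alpha * normal_pdf(x, 0.0, tau2 + sigma2))
            k_new = rng.choices(labels, weights=weights)[0]
            if k_new == next_label:
                stats[k_new] = [0, 0.0]
                next_label += 1
            stats[k_new][0] += 1
            stats[k_new][1] += x
            z[i] = k_new
    return z

data = [-5.1, -4.9, -5.0, 4.9, 5.0, 5.1]
labels = dp_mixture_gibbs(data, rng=random.Random(0))
print(labels)
```

With two well-separated groups like these, the sampler should keep the negative and positive points in different clusters; the label values themselves are arbitrary and can differ between runs.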
Topic model
A document is a mixture of topics.
Topics are latent variables, but we can read the words.
document → topics → words
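Topic model
$p(z_{ij} = k \mid \mathbf{z}_{-ij}, \mathbf{x}) \propto \left(n_{ik}^{-ij} + \alpha\right) \frac{n_{k x_{ij}}^{-ij} + \beta}{n_{k}^{-ij} + V\beta}$
$z_{ij}$: topic of $x_{ij}$ (word j of document i)
$x_{ij}$: observed word
$n_{k x_{ij}}$: word/topic count, the number of times word $x_{ij}$ is assigned to topic k ($n_k$ sums this over all words; V is the vocabulary size)
$n_{ik}$: topic/doc count, the number of words in document i assigned to topic k
superscript $-ij$: counts taken over the other topic assignments and other words, excluding $x_{ij}$ itself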
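The conditional above translates directly into a few lines of code. In this sketch the counts are assumed to already exclude the word being resampled. The usage numbers are our reading of the LDA example's tables for the word joker in d3 after its old topic t3 is removed, with α = β = 0.1 chosen arbitrarily.

```python
def lda_topic_conditional(n_dk, n_kw, n_k, alpha, beta, vocab_size):
    """Return p(z_ij = k | z_-ij, x) for every topic k.

    n_dk[k]: topic/doc count, words of this document assigned to topic k
    n_kw[k]: word/topic count, occurrences of this word assigned to topic k
    n_k[k]:  total number of words assigned to topic k
    All counts must already exclude the word being resampled.
    """
    weights = [
        (n_dk[k] + alpha) * (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
        for k in range(len(n_dk))
    ]
    total = sum(weights)
    return [w / total for w in weights]

# joker in d3, after removing its current topic t3
probs = lda_topic_conditional(n_dk=[1, 0, 1], n_kw=[0, 0, 0], n_k=[4, 5, 2],
                              alpha=0.1, beta=0.1, vocab_size=7)
print([round(p, 3) for p in probs])
```

With these assumed hyperparameters, t1 and t3, which already hold words of d3, receive most of the probability mass.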
Topic model
Applying the Dirichlet process to the topic model:
Learn the distribution of topics in a document:
Document   Topic 1   Topic 2   Topic 3
           P1        P2        P3
Learn the distribution of topics for a word:
Word       Topic 1   Topic 2   Topic 3
           Q1        Q2        Q3
Topic model
[Figure: a topic/doc table (documents d1 to d3 by topics t1 to t3) and a word/topic table (words by topics t1 to t3)]
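Topic model
Latent Dirichlet allocation (generative process):
for each topic k: $\phi_k \sim \mathrm{Dir}(\beta)$
for each document d: $\theta_d \sim \mathrm{Dir}(\alpha)$
for each word position n in document d: $z_{dn} \sim \mathrm{Mult}(\theta_d)$, $w_{dn} \sim \mathrm{Mult}(\phi_{z_{dn}})$
Dirichlet mixture model: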
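A sketch of the LDA generative process as a sampler, assuming symmetric hyperparameters α and β and a fixed document length; the vocabulary is the one from the LDA example, and every other value is illustrative.

```python
import random

def sample_dirichlet(alpha, rng):
    """theta ~ Dir(alpha) via normalized Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def generate_lda_corpus(n_docs, doc_len, vocab, n_topics, alpha, beta, rng):
    """Sample a toy corpus from the LDA generative process; returns (word, topic) pairs."""
    # One word distribution per topic: phi_k ~ Dir(beta)
    phis = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        theta = sample_dirichlet([alpha] * n_topics, rng)          # theta_d ~ Dir(alpha)
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]     # z ~ Mult(theta_d)
            w = rng.choices(vocab, weights=phis[z])[0]             # w ~ Mult(phi_z)
            doc.append((w, z))
        corpus.append(doc)
    return corpus

vocab = ["ipad", "apple", "itunes", "mirror", "queen", "joker", "ladygaga"]
corpus = generate_lda_corpus(n_docs=4, doc_len=3, vocab=vocab, n_topics=3,
                             alpha=0.5, beta=0.5, rng=random.Random(0))
for doc in corpus:
    print(doc)
```

In real use only the words are observed; the topic in each pair is exactly the latent variable that inference has to recover.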
LDA example
Documents:
d1: ipad apple itunes
d2: apple mirror queen
d3: queen joker ladygaga
d4: queen ladygaga mirror
Vocabulary w: ipad, apple, itunes, mirror, queen, joker, ladygaga
Topics: t1: product, t2: story, t3: poker
In fact, the topics are latent.
LDA example
Each word starts with a topic assignment (topic number in parentheses):
d1: ipad(1) apple(2) itunes(3)
d2: apple(2) mirror(1) queen(2)
d3: queen(3) joker(3) ladygaga(1)
d4: queen(2) ladygaga(1) mirror(2)

Topic/doc table:
      t1   t2   t3
d1     1    1    1
d2     1    2    0
d3     1    0    2
d4     1    2    0

Word/topic table:
      ipad  apple  itunes  mirror  queen  joker  ladygaga  sum
t1       1      0       0       1      0      0         2    4
t2       0      2       0       1      2      0         0    5
t3       0      0       1       0      1      1         0    3

LDA example: one Gibbs sampling step
1. Pick a word to resample, here joker in d3, and remove its current topic (t3).
2. Decrement the counts: topic/doc (d3, t3) 2 → 1, word/topic (t3, joker) 1 → 0, sum for t3 3 → 2.
3. Sample a new topic for joker from its conditional distribution; here t2 is drawn.
4. Increment the counts: topic/doc (d3, t2) 0 → 1, word/topic (t2, joker) 0 → 1, sum for t2 5 → 6.
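The walkthrough above is one step of collapsed Gibbs sampling. A sketch of the full loop on the same four documents, starting from the initial topic assignments as we read them from the slides (shifted to 0-based indices, t1 becomes 0) and assuming α = β = 0.1; the helper names `build_counts` and `gibbs_sweep` are hypothetical.

```python
import random

def build_counts(docs, z, K):
    """Topic/doc counts n_dk, word/topic counts n_kw, and per-topic sums n_k."""
    n_dk = [[0] * K for _ in docs]
    n_kw = [{} for _ in range(K)]
    n_k = [0] * K
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] = n_kw[k].get(w, 0) + 1
            n_k[k] += 1
    return n_dk, n_kw, n_k

def gibbs_sweep(docs, z, counts, K, alpha, beta, V, rng=random):
    """One pass of collapsed Gibbs sampling; updates z and counts in place."""
    n_dk, n_kw, n_k = counts
    for d, doc in enumerate(docs):
        for j, w in enumerate(doc):
            k = z[d][j]                                  # 1) remove the current topic
            n_dk[d][k] -= 1
            n_kw[k][w] -= 1
            n_k[k] -= 1
            weights = [(n_dk[d][t] + alpha) * (n_kw[t].get(w, 0) + beta)
                       / (n_k[t] + V * beta) for t in range(K)]   # 2) p(z = t | rest)
            k = rng.choices(range(K), weights=weights)[0]         # 3) sample a topic
            z[d][j] = k                                  # 4) add the new topic back
            n_dk[d][k] += 1
            n_kw[k][w] = n_kw[k].get(w, 0) + 1
            n_k[k] += 1

docs = [["ipad", "apple", "itunes"], ["apple", "mirror", "queen"],
        ["queen", "joker", "ladygaga"], ["queen", "ladygaga", "mirror"]]
z = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [1, 0, 1]]   # initial assignments, 0-based
K = 3
V = len({w for doc in docs for w in doc})          # 7 distinct words
counts = build_counts(docs, z, K)
print(counts[0], counts[2])                        # topic/doc table and topic sums
rng = random.Random(0)
for _ in range(100):
    gibbs_sweep(docs, z, counts, K, alpha=0.1, beta=0.1, V=V, rng=rng)
print(z)
```

Removing the word's own assignment before scoring (step 1) is what makes the counts in the conditional exclude the word being resampled; skipping it would bias every word toward its current topic.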
Further
Dirichlet distribution prior: K topics; the number of topics is supplied by the user ("supervised").
Dirichlet process prior: infinitely many topics; the number of topics is learned from the data ("unsupervised").
Alpha mainly controls the probability of a topic that has few training words in the document.
Beta mainly controls the probability of a word that has few training occurrences under a topic.
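A small sketch of how α acts on one document's topic distribution, using the usual smoothed estimate θ_k = (n_dk + α)/(n_d + Kα); the counts are hypothetical. A small α leaves a topic with no words in the document nearly impossible, while a large α pulls the estimate toward uniform. β plays the same role for the word distribution of each topic.

```python
def doc_topic_estimate(n_dk, alpha):
    """Smoothed topic distribution of one document: (n_dk + alpha) / (n_d + K * alpha)."""
    K = len(n_dk)
    n_d = sum(n_dk)
    return [(n + alpha) / (n_d + K * alpha) for n in n_dk]

counts = [1, 0, 2]   # hypothetical topic/doc counts; the second topic has no words here
for alpha in (0.01, 0.1, 1.0, 10.0):
    print(alpha, [round(p, 3) for p in doc_topic_estimate(counts, alpha)])
```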
Further
The bag-of-words assumption is unrealistic: see TNG (topical n-grams) and bi-LDA.
LDA loses the power-law behavior of word frequencies: see the Pitman-Yor language model.
David Blei has done an extensive survey of topic models: http://home.etf.rs/~bfurlan/publications/SURVEY-1.pdf
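The Pitman-Yor process extends the Chinese restaurant process with a discount d in [0, 1): an occupied table k gets weight n_k - d and a new table gets weight θ + d·K, where K is the number of occupied tables. A sketch with arbitrary parameter values, showing that a positive discount produces many more tables than the ordinary CRP (d = 0); with d > 0 the number of tables grows roughly like N^d, the power-law behavior mentioned above.

```python
import random

def pitman_yor_crp(n_customers, discount, concentration, rng=random):
    """Seat customers under the Pitman-Yor Chinese restaurant process.
    discount = 0 recovers the ordinary CRP with alpha = concentration."""
    if not 0 <= discount < 1 or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    table_sizes = []
    for _ in range(n_customers):
        K = len(table_sizes)
        if K == 0:                   # the first customer always opens a table
            table_sizes.append(1)
            continue
        weights = [n - discount for n in table_sizes] + [concentration + discount * K]
        k = rng.choices(range(K + 1), weights=weights)[0]
        if k == K:
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
    return table_sizes

rng = random.Random(0)
crp_tables = pitman_yor_crp(2000, discount=0.0, concentration=1.0, rng=rng)
py_tables = pitman_yor_crp(2000, discount=0.5, concentration=1.0, rng=rng)
print(len(crp_tables), len(py_tables))
```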
Q&A