Topic Model: Latent Dirichlet Allocation
Ouyang Ruofei, May 10, 2013

Introduction
data = latent pattern + noise. The parameters describe the latent pattern; inference recovers them from the data.

Introduction
Parametric model: the number of parameters is fixed with respect to sample size.
Nonparametric model: the number of parameters grows with sample size; the parameter space is infinite-dimensional.

Problem              Parameter
Density estimation   Distributions
Regression           Functions
Clustering           Partitions

Clustering
1. Ironman  2. Thor  3. Hulk
An indicator variable assigns each data point to one of the clusters.

Dirichlet process
Ironman: 3 times, Thor: 2 times, Hulk: 2 times. Without the likelihood, we already know: 1. there are three clusters; 2. the distribution over the three clusters. For a new data point: which cluster should it join?

Dirichlet process
Example: Dirichlet distribution Dir(Ironman, Thor, Hulk) with pseudo counts $\alpha = (\alpha_1, \alpha_2, \alpha_3)$
pdf: $p(\theta \mid \alpha) = \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1}$
mean: $\mathbb{E}[\theta_i] = \alpha_i / \sum_j \alpha_j$
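As a quick numerical check of the pdf and mean above, a minimal sketch using SciPy (the library choice and the concentration values are mine, not the slides'):

```python
# Check the Dirichlet pdf and mean formulas numerically.
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([3.0, 2.0, 2.0])     # pseudo counts for Ironman, Thor, Hulk

print(dirichlet.mean(alpha))          # mean: alpha_i / sum(alpha) = [3/7, 2/7, 2/7]

theta = np.array([0.5, 0.3, 0.2])     # a point on the probability simplex
print(dirichlet.pdf(theta, alpha))    # pdf value at theta
```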

Dirichlet process: conjugate prior
The Dirichlet distribution is the conjugate prior of the multinomial distribution: the posterior is again Dirichlet, with the prior parameters acting as pseudo counts added to the observed counts.

Example       Ironman  Thor  Hulk
Prior               3     2     2
Likelihood        100   300   200
Posterior         103   302   202
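The update is just addition of counts; a minimal sketch of the table above (NumPy is my choice, not the slides'):

```python
# Dirichlet-multinomial conjugate update: posterior = prior pseudo counts + data counts.
import numpy as np

prior = np.array([3, 2, 2])          # Dir prior over (Ironman, Thor, Hulk)
counts = np.array([100, 300, 200])   # observed multinomial counts
posterior = prior + counts
print(posterior)                     # [103 302 202]
print(posterior / posterior.sum())   # posterior mean of the cluster weights
```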

Dirichlet process
In our Avengers model, K = 3 (Ironman, Thor, Hulk). However, this guy comes along... and a Dirichlet distribution cannot model him. Dirichlet process: K = infinity. "Nonparametric" here means an infinite number of clusters.

Dirichlet process: $G \sim \mathrm{DP}(\alpha, G_0)$
α: pseudo counts in each cluster. $G_0$: base distribution of each cluster (a distribution template). Given any partition $(A_1, \dots, A_K)$ of the space,
$(G(A_1), \dots, G(A_K)) \sim \mathrm{Dir}(\alpha G_0(A_1), \dots, \alpha G_0(A_K))$.
A Dirichlet process is a distribution over distributions.
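The partition property relies on the aggregation behavior of the Dirichlet distribution: merging blocks of a partition merges the corresponding parameters. A Monte Carlo sanity check of that property, with made-up concentration values:

```python
# If (p1, p2, p3) ~ Dir(a1, a2, a3), then (p1 + p2, p3) ~ Dir(a1 + a2, a3).
import numpy as np

rng = np.random.default_rng(0)
a = np.array([3.0, 2.0, 2.0])
samples = rng.dirichlet(a, size=100_000)

merged = samples[:, 0] + samples[:, 1]   # merge the first two blocks
b = np.array([a[0] + a[1], a[2]])        # predicted Dir(5, 2), i.e. Beta(5, 2)

print(merged.mean(), b[0] / b.sum())     # empirical vs predicted mean (~0.714)
print(merged.var(),
      b[0] * b[1] / (b.sum() ** 2 * (b.sum() + 1)))  # variance (~0.0255)
```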

Dirichlet process: constructing a Dirichlet process by the CRP
Chinese restaurant process: in a restaurant there is an infinite number of tables. Customer 1 sits at an unoccupied table with probability 1. Customer N sits at an occupied table k with probability $\frac{n_k}{N-1+\alpha}$, where $n_k$ is the number of customers already at table k, and at a new table with probability $\frac{\alpha}{N-1+\alpha}$.
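A minimal simulation of the seating rule above (alpha = 1.0 and the customer count are arbitrary choices):

```python
# Chinese restaurant process: customer n+1 joins table k with probability
# n_k / (n + alpha) and opens a new table with probability alpha / (n + alpha).
import random

def crp(n_customers, alpha=1.0, seed=0):
    rnd = random.Random(seed)
    tables = []                          # tables[k] = customers at table k
    for n in range(n_customers):         # n customers already seated
        if rnd.random() < alpha / (n + alpha):
            tables.append(1)             # open a new table
        else:
            r = rnd.random() * n         # pick an occupied table by size
            acc = 0
            for k, size in enumerate(tables):
                acc += size
                if r < acc:
                    tables[k] += 1
                    break
    return tables

print(crp(100))   # typically a few large tables and many small ones
```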

Dirichlet process
[Figure-only slides: step-by-step illustration of customers being seated under the CRP]

Dirichlet process
Customers: data points. Tables: clusters.

Dirichlet process
Train the model by Gibbs sampling.

Gibbs sampling is an MCMC method for obtaining a sequence of samples from a multivariate distribution. The intuition is to turn one multivariate problem into a sequence of univariate problems. In the Dirichlet process model:
Multivariate: $p(z_1, \dots, z_N \mid x_1, \dots, x_N)$
Univariate: $p(z_i \mid z_{-i}, x_1, \dots, x_N)$, resampling one indicator $z_i$ at a time.

Gibbs sampling pseudo code:
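The pseudo code on the original slide is an image. A minimal Python skeleton of the sweep it describes; `conditional_sample` is a hypothetical hook for the model-specific univariate conditional (e.g. the CRP-based cluster conditional), not notation from the slides:

```python
# Generic Gibbs sweep: repeatedly resample each variable from its
# conditional given all the others (and the data closed over by the hook).
import random

def gibbs(z, conditional_sample, n_iters=100, seed=0):
    rnd = random.Random(seed)
    for _ in range(n_iters):
        for i in range(len(z)):
            z[i] = conditional_sample(i, z, rnd)   # draw z_i | z_{-i}, data
    return z
```

An LDA-specific instantiation of this loop appears in the worked example below.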

Topic model
A document is a mixture of topics. The topics are latent variables, but we can read the words: topics generate words, so the observed words let us infer the topics.

Topic model
[Figure-only slides]

Topic model
Collapsed Gibbs update for $z_{ij}$, the topic of the observed word $x_{ij}$ (token $j$ of document $i$):
$p(z_{ij} = k \mid z_{-ij}, x) \propto (n^{(i)}_{k,-ij} + \alpha) \cdot \dfrac{n^{(x_{ij})}_{k,-ij} + \beta}{n_{k,-ij} + W\beta}$
where $n^{(i)}_k$ is the topic/doc count, $n^{(w)}_k$ is the word/topic count, $W$ is the vocabulary size, and the subscript $-ij$ means the counts are taken over the other topics and other words, excluding the current token.
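A sketch of this conditional as a function over all K topics; the count-table layout (documents × topics, topics × words) and the names are my assumptions:

```python
# Unnormalized collapsed Gibbs conditional for one token, assuming the
# current token has already been removed from all three count tables.
import numpy as np

def topic_conditional(d, w, n_dk, n_kw, n_k, alpha, beta):
    """p(z = k | z_-ij, x) for all k, as a normalized length-K vector."""
    W = n_kw.shape[1]                                   # vocabulary size
    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + W * beta)
    return p / p.sum()
```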

Topic model: apply the Dirichlet process in the topic model.
Learn the distribution over topics in a document:
Document   Topic 1  Topic 2  Topic 3
               P1       P2       P3
Learn the distribution over topics for a word:
Word       Topic 1  Topic 2  Topic 3
               Q1       Q2       Q3

Topic model
The sampler maintains two count tables: a topic/doc table (documents d1–d3 by topics t1–t3) and a word/topic table (words by topics t1–t3). Each token's current topic assignment contributes one count to each table.

Topic model
Latent Dirichlet allocation (each word token gets its own topic):
$\theta_i \sim \mathrm{Dir}(\alpha)$, $\varphi_k \sim \mathrm{Dir}(\beta)$, $z_{ij} \sim \mathrm{Cat}(\theta_i)$, $x_{ij} \sim \mathrm{Cat}(\varphi_{z_{ij}})$
Dirichlet mixture model (one topic per document):
$\theta \sim \mathrm{Dir}(\alpha)$, $z_i \sim \mathrm{Cat}(\theta)$, $x_{ij} \sim \mathrm{Cat}(\varphi_{z_i})$
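A hedged sketch of the LDA generative process written out above, with made-up toy sizes and hyperparameters:

```python
# Sample a tiny synthetic corpus from the LDA generative model.
import numpy as np

rng = np.random.default_rng(0)
K, W, D, N = 3, 7, 4, 3                  # topics, vocab, docs, words per doc
alpha, beta = 0.5, 0.1

phi = rng.dirichlet(np.full(W, beta), size=K)      # topic -> word distributions
docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))       # doc -> topic distribution
    z = rng.choice(K, size=N, p=theta)             # a topic for every token
    docs.append([rng.choice(W, p=phi[k]) for k in z])  # a word for every topic
print(docs)                              # word indices, one list per document
```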

LDA example
d1: ipad apple itunes
d2: apple mirror queen
d3: queen joker ladygaga
d4: queen ladygaga mirror
Vocabulary w: ipad, apple, itunes, mirror, queen, joker, ladygaga
Topics: t1 = product, t2 = story, t3 = poker. In fact, the topics are latent.

LDA example: initial topic assignments, one topic per word token:
d1: ipad(1) apple(2) itunes(3)
d2: apple(2) mirror(1) queen(2)
d3: queen(3) joker(3) ladygaga(1)
d4: queen(2) ladygaga(1) mirror(2)

Topic/doc table (documents × topics):
      t1  t2  t3
d1     1   1   1
d2     1   2   0
d3     1   0   2
d4     1   2   0

Word/topic table (words × topics):
          t1  t2  t3
ipad       1   0   0
apple      0   2   0
itunes     0   0   1
mirror     1   1   0
queen      0   2   1
joker      0   0   1
ladygaga   2   0   0
sum        4   5   3

One Gibbs step, resampling the topic of "queen" in d3:
1. Remove its current assignment (topic 3): decrement d3/t3 from 2 to 1, queen/t3 from 1 to 0, and the t3 column sum from 3 to 2.
2. Compute the conditional over topics from the remaining counts and sample a new topic; suppose topic 2 is drawn.
3. Add the new assignment back: increment d3/t2 from 0 to 1 and queen/t2 from 2 to 3, so d3's assignments become queen(2) joker(3) ladygaga(1).
Repeat over all tokens until the assignments stabilize.
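Putting the pieces together, a hedged end-to-end sketch of collapsed Gibbs sampling on this toy corpus (hyperparameters, seed, and iteration count are arbitrary, and real corpora would need far more data for clean topics):

```python
# Collapsed Gibbs sampling for LDA on the four toy documents.
import numpy as np

vocab = ["ipad", "apple", "itunes", "mirror", "queen", "joker", "ladygaga"]
raw = [["ipad", "apple", "itunes"],
       ["apple", "mirror", "queen"],
       ["queen", "joker", "ladygaga"],
       ["queen", "ladygaga", "mirror"]]
docs = [[vocab.index(w) for w in doc] for doc in raw]
K, W = 3, len(vocab)
alpha, beta = 0.5, 0.1
rng = np.random.default_rng(0)

# Random initial assignments and the three count tables.
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, W)); n_k = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]; n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

for _ in range(200):                       # Gibbs sweeps over every token
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                    # 1. remove the current assignment
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + W * beta)
            k = int(rng.choice(K, p=p / p.sum()))  # 2. sample a new topic
            z[d][i] = k                    # 3. add the new assignment back
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

for k in range(K):                         # top words for each learned topic
    top = np.argsort(n_kw[k])[::-1][:3]
    print(f"t{k + 1}:", [vocab[w] for w in top])
```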

Further
Dirichlet distribution prior: K topics, fixed in advance (the number of topics is supervised).
Dirichlet process prior: an unbounded number of topics (unsupervised).
Alpha mainly controls the probability of assigning a topic that has few counts in the document; beta mainly controls the probability of assigning a word to a topic in which that word has few counts.

Further
The bag-of-words assumption is unrealistic: see TNG (topical n-grams) and biLDA.
LDA loses power-law behavior: see the Pitman-Yor language model.
David Blei has done an extensive survey on topic models: http://home.etf.rs/~bfurlan/publications/SURVEY-1.pdf

Q&A