Variants of LDA

Latent Dirichlet Allocation (LDA)
Pros:
• The Dirichlet distribution is in the exponential family and conjugate to the multinomial distribution, so variational inference is tractable.
• The topic proportions θ are document-specific, so their variational parameters can be regarded as the representation of a document: the feature set is reduced.
• Topic assignments z are sampled repeatedly within a document, so one document can be associated with multiple topics.
Cons:
• Because of the independence assumption implicit in the Dirichlet distribution, LDA is unable to capture the correlation between different topics.
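As a reminder of the conjugacy the first bullet relies on, here is the standard relation between the Dirichlet prior and the multinomial likelihood, written with topic counts n_k (notation assumed here, not taken from the slides):

```latex
% Dirichlet prior over topic proportions \theta on the (K-1)-simplex
\mathrm{Dir}(\theta \mid \alpha) \;=\;
  \frac{\Gamma\!\big(\sum_{k} \alpha_k\big)}{\prod_{k} \Gamma(\alpha_k)}
  \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}
% Multinomial likelihood with topic counts n_k
p(n \mid \theta) \;\propto\; \prod_{k=1}^{K} \theta_k^{\,n_k}
% Conjugacy: the posterior is again a Dirichlet, which keeps variational updates in closed form
p(\theta \mid n, \alpha) \;=\; \mathrm{Dir}\big(\theta \mid \alpha_1 + n_1, \dots, \alpha_K + n_K\big)
```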

Correlated Topic Models (CTM) (1)
Key point: the topic proportions are drawn from a logistic normal distribution rather than a Dirichlet distribution.
Definition of the logistic normal distribution: let R^{k-1} denote (k-1)-dimensional real space and S_k the positive (k-1)-dimensional simplex. Suppose η follows a multivariate normal distribution over R^{k-1}. The logistic transformation from R^{k-1} to S_k can then be used to define a logistic normal distribution over the simplex.

Correlated Topic Models (CTM) (2)
Logistic transformation: θ_i = exp(η_i) / (1 + Σ_{j=1}^{k-1} exp(η_j)) for i = 1, …, k-1, with θ_k = 1 / (1 + Σ_{j=1}^{k-1} exp(η_j)).
Log-ratio (inverse) transformation: η_i = log(θ_i / θ_k) for i = 1, …, k-1.
The density function of θ follows from the normal density of η by a change of variables. The logistic normal distribution is defined over the same simplex as the Dirichlet distribution, and it allows correlation between components.
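A minimal numerical sketch of this construction (the values of K, mu, and Sigma are illustrative assumptions, not from the slides):

```python
# Draw eta from a multivariate Gaussian, then map it to the simplex with the
# logistic (softmax) transformation. The covariance Sigma is what lets the
# components of theta be correlated, unlike a Dirichlet draw.
import numpy as np

rng = np.random.default_rng(0)

K = 4                                  # number of topics (illustrative)
mu = np.zeros(K - 1)                   # mean of the Gaussian over R^{K-1}
Sigma = 0.5 * np.eye(K - 1) + 0.5      # covariance with positive correlations

eta = rng.multivariate_normal(mu, Sigma)           # eta in R^{K-1}
eta_full = np.append(eta, 0.0)                     # fix the last coordinate to 0
theta = np.exp(eta_full) / np.exp(eta_full).sum()  # logistic transformation -> simplex

print(theta, theta.sum())  # a point on the simplex; components sum to 1
```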

Correlated Topic Models (CTM) (3)
Generative process for each document w in a corpus D:
1. Choose η ~ N(μ, Σ) and set the topic proportions θ = f(η), the logistic transformation of η.
2. For each of the N words:
(a) Choose a topic z_n ~ Mult(θ);
(b) Choose a word w_n ~ Mult(β_{z_n}).
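A hedged code sketch of that generative process for a single document (the parameter values mu, Sigma, and beta are illustrative placeholders, not the authors' settings):

```python
import numpy as np

rng = np.random.default_rng(1)

K, V, N = 3, 10, 8                        # topics, vocabulary size, words per document
mu = np.zeros(K)
Sigma = np.eye(K)
beta = rng.dirichlet(np.ones(V), size=K)  # each row: word distribution of one topic

# 1. Choose eta ~ N(mu, Sigma) and map it to topic proportions theta
eta = rng.multivariate_normal(mu, Sigma)
theta = np.exp(eta) / np.exp(eta).sum()

# 2. For each of the N words:
words = []
for _ in range(N):
    z = rng.choice(K, p=theta)            # (a) choose a topic z_n ~ Mult(theta)
    w = rng.choice(V, p=beta[z])          # (b) choose a word  w_n ~ Mult(beta_{z_n})
    words.append(w)

print(words)                              # word indices of the sampled document
```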

Correlated Topic Models (CTM) (4)
Posterior inference (for the per-document latent variables η and z_{1:N}): variational inference with a factorized distribution over η and the topic assignments.
Difficulty: the logistic normal is not conjugate to the multinomial, so the expected log normalizer of the topic proportions has no closed form.
Solution: bound that term with a first-order Taylor expansion of the (concave) log function, which preserves a lower bound on the likelihood (see the bound below).
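One standard way to write that bound (λ_i, ν_i² denote the variational Gaussian means and variances and ζ an extra variational parameter; the symbols are assumed here):

```latex
% From the concavity of \log: \log x \le \zeta^{-1} x + \log\zeta - 1 for all x, \zeta > 0.
% Taking expectations under the variational Gaussian q gives
\mathbb{E}_q\!\left[\log \sum_{i=1}^{K} \exp(\eta_i)\right]
  \;\le\; \zeta^{-1} \sum_{i=1}^{K} \mathbb{E}_q\!\left[\exp(\eta_i)\right] + \log\zeta - 1
  \;=\; \zeta^{-1} \sum_{i=1}^{K} \exp\!\big(\lambda_i + \nu_i^{2}/2\big) + \log\zeta - 1 .
% Substituting this upper bound for the intractable term preserves a lower bound
% on the likelihood, which is then maximized over \lambda, \nu^2, \zeta.
```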

Correlated Topic Models (CTM) (5)
Parameter estimation (for the model parameters μ, Σ, and β of the entire corpus of documents): maximize the likelihood by variational EM.
1. (E-step) For each document, maximize the lower bound with respect to the variational parameters.
2. (M-step) Maximize the lower bound on the likelihood of the entire corpus with respect to the model parameters μ, Σ, and β.
A high-level sketch of this loop is given below.
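The loop, sketched with the E-step, M-step, and bound computations left as placeholder callables (hypothetical names, not an actual CTM implementation):

```python
def variational_em(corpus, model, e_step, m_step, elbo, tol=1e-4, max_iter=100):
    """Generic variational EM skeleton matching the two steps on the slide.

    e_step(doc, model)        -> per-document variational parameters (E-step)
    m_step(corpus, vps)       -> updated model parameters mu, Sigma, beta (M-step)
    elbo(corpus, vps, model)  -> value of the lower bound, used to check convergence
    These callables are placeholders supplied by whatever CTM implementation is used.
    """
    prev_bound = float("-inf")
    for _ in range(max_iter):
        var_params = [e_step(doc, model) for doc in corpus]   # E-step, per document
        model = m_step(corpus, var_params)                    # M-step, corpus level
        bound = elbo(corpus, var_params, model)
        if bound - prev_bound < tol:                          # stop when the bound stalls
            break
        prev_bound = bound
    return model, var_params
```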

Experimental Results (1) Example: Modeling Science

Experimental Results (2) Comparison with LDA - Document modeling

Experimental Results (3) Comparison with LDA – Collaborative filtering. To evaluate how well the models predict the remaining words of a document after observing only a portion of it, a predictive measure over the held-out words is needed; lower numbers denote more predictive power.
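The slide does not name the measure; one common choice consistent with "lower numbers denote more predictive power" is per-word predictive perplexity on the held-out words, written out here as an assumption:

```latex
% Per-word predictive perplexity over the held-out portion of each document d
\mathrm{perplexity} \;=\;
  \exp\!\left\{ - \frac{\sum_{d=1}^{D} \sum_{w \in \mathrm{held}(d)} \log p\big(w \mid \mathrm{observed}(d)\big)}
                       {\sum_{d=1}^{D} |\mathrm{held}(d)|} \right\}
% Lower perplexity corresponds to more predictive power on the unseen words.
```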

Conclusions
• The main contribution of this paper is that the CTM directly models correlation between topics via the logistic normal distribution.
• At the same time, the non-conjugacy of the logistic normal distribution adds complexity to the variational inference procedure.
• Like LDA, the CTM allows multiple topics per document, and its variational parameters can serve as features of the document.

References:
J. Aitchison and S. M. Shen. Logistic-Normal Distributions: Some Properties and Uses. Biometrika, vol. 67, no. 2, pp. 261–272, 1980.
D. Blei, A. Ng and M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

Outline
1. Introduction
2. Exchangeable topic models (L. Fei-Fei et al., CVPR 2005)
3. Dynamic topic models (D. Blei et al., ICML 2006)

Introduction
Topic models: tools for automatically organizing, searching, and browsing large collections (of documents, images, etc.). The discovered patterns often reflect the underlying topics that combine to form the documents of a corpus.
Exchangeable (static) topic models: the words (patches) of each document (image) are assumed to be independently drawn from a mixture of multinomials; the mixture components (topics) are shared by all documents.
Dynamic topic models: capture the evolution of topics in a sequentially organized corpus of documents (images).

Exchangeable topic models (CVPR 2005)
Used for learning natural scene categories. A key idea is to use intermediate representations (themes) before classifying scenes, which avoids using manually labeled or segmented images to train the system. Local regions are first clustered into different intermediate themes, and then into categories. No supervision is needed apart from a single category label per training image.
• The algorithm provides a principled approach to learning relevant intermediate representations of scenes without supervision.
• The model is able to group categories of images into a sensible hierarchy.

Exchangeable topic models (CVPR 2005)

Exchangeable topic models (CVPR 2005)

Exchangeable topic models (CVPR 2005)
Notation: a patch x is the basic unit of an image; an image is a sequence of N patches; a category is a collection of I images; K is the total number of intermediate themes (theme assignments are K-dimensional unit vectors); T is the total number of codewords.

Exchangeable topic models (CVPR 2005)
Bayesian decision: an image is assigned to the category with the highest posterior probability. For convenience, the prior distribution over categories is always assumed to be a fixed uniform distribution (see the decision rule below).
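In symbols, the decision rule this describes (standard Bayes rule; the notation is assumed here):

```latex
% Bayesian decision rule for classifying an image \mathbf{x} = (x_1, \dots, x_N)
c^{*} \;=\; \arg\max_{c} \; p(c \mid \mathbf{x}) \;\propto\; p(\mathbf{x} \mid c)\, p(c)
% With a fixed uniform prior p(c), this reduces to picking the category whose
% learned theme model assigns the highest likelihood p(\mathbf{x} \mid c) to the image.
```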

Exchangeable topic models (CVPR 2005) Learning: Variational inference:

Exchangeable topic models (CVPR 2005)
Features and codebook:
1. Evenly sampled grid
2. Random sampling
3. Kadir & Brady saliency detector
4. Lowe's DoG detector
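A rough sketch of how such a codebook could be built and used (scikit-learn k-means on hypothetical patch descriptors; not the paper's exact pipeline):

```python
# Cluster patch descriptors (however they were extracted: grid, random sampling,
# saliency, or DoG) into T codewords, then represent each image as a sequence of
# codeword indices. `descriptors` is a hypothetical (num_patches x dim) array.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, T=200, seed=0):
    """Cluster patch descriptors into T codewords (cluster centers)."""
    return KMeans(n_clusters=T, random_state=seed, n_init=10).fit(descriptors)

def quantize_image(codebook, image_descriptors):
    """Map each patch descriptor of one image to its nearest codeword index."""
    return codebook.predict(image_descriptors)

# Example with synthetic descriptors standing in for real patch features:
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5000, 128))          # e.g. 128-dim patch features
codebook = build_codebook(descriptors, T=50)
codewords = quantize_image(codebook, descriptors[:60])  # one image's 60 patches
print(codewords[:10])
```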

Exchangeable topic models (CVPR 2005) Experimental setup and results: A model for each category was obtained from the training images.

Exchangeable topic models (CVPR 2005) Experimental setup and results:

Dynamic topic models (ICML 2006)
Topic models: tools for automatically organizing, searching, and browsing large collections (of documents, images, etc.). The discovered patterns often reflect the underlying topics that combine to form the documents.
Exchangeable (static) topic models: the words (patches) of each document (image) are assumed to be independently drawn from a mixture of multinomials; the mixture components (topics) are shared by all documents.
Dynamic topic models: capture the evolution of topics in a sequentially organized corpus of documents (images).

Dynamic topic models (ICML 2006)
Static topic model review. Each document (image) is assumed drawn from the following generative process:
1. Choose topic proportions θ from a distribution over the (K-1)-simplex, such as a Dirichlet.
2. For each word (patch):
- choose a topic assignment z_n ~ Mult(θ);
- choose a word (patch) w_n ~ Mult(β_{z_n}).
This process assumes that documents (images) are drawn exchangeably from the same set of topics (see the symbolic summary below). In a dynamic topic model, we suppose instead that the data are divided by time slice, for example by year. The documents (images) of each slice are modeled with a K-component topic model, where the topics associated with slice t evolve from the topics associated with slice t-1.
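The same static per-document process in symbols (standard notation, assumed here):

```latex
% Exchangeable (static) per-document generative process
\theta_d \sim \mathrm{Dir}(\alpha), \qquad
z_{d,n} \mid \theta_d \sim \mathrm{Mult}(\theta_d), \qquad
w_{d,n} \mid z_{d,n}, \beta \sim \mathrm{Mult}\big(\beta_{z_{d,n}}\big)
% The topics \beta_{1:K} are shared, unchanged, across all documents.
```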

Dynamic topic models (ICML 2006) Dynamic topic models: Extension of the logistic normal distribution to time-series simplex data
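In the paper's state-space formulation (symbols assumed here, up to notational differences), the natural parameters of the topic and proportion models are chained through time with Gaussian noise:

```latex
% Topics and proportion means evolve from slice t-1 to slice t
\beta_{t,k} \mid \beta_{t-1,k} \;\sim\; \mathcal{N}\!\big(\beta_{t-1,k},\, \sigma^{2} I\big), \qquad
\alpha_{t} \mid \alpha_{t-1} \;\sim\; \mathcal{N}\!\big(\alpha_{t-1},\, \delta^{2} I\big)
% Within slice t, each document draws
\eta \sim \mathcal{N}(\alpha_t, a^{2} I), \qquad
z_n \sim \mathrm{Mult}\big(\pi(\eta)\big), \qquad
w_n \sim \mathrm{Mult}\big(\pi(\beta_{t, z_n})\big)
% where \pi(\cdot) is the logistic transformation mapping natural parameters to the simplex.
```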

Dynamic topic models (ICML 2006)
Approximate inference: in the dynamic topic model, the latent variables are the topics β_{t,k}, the per-document mixture proportions θ, and the per-word topic indicators z. The free parameters of a variational distribution over these latent variables are optimized so that the distribution is close, in KL divergence, to the true posterior (see the objective below). The full derivations are given in the paper.
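The generic variational objective behind that statement, written out (standard formulation, not the paper's specific derivations):

```latex
% Choose the free parameters of q to minimize the KL divergence to the true posterior
q^{*} \;=\; \arg\min_{q} \;
  \mathrm{KL}\!\Big( q\big(\beta_{1:T}, \theta, z\big) \,\Big\|\, p\big(\beta_{1:T}, \theta, z \mid \text{data}\big) \Big)
% This is equivalent to maximizing a lower bound (ELBO) on the log likelihood of the data.
```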

Dynamic topic models (ICML 2006)
Experimental setup and results: a subset of 30,000 articles from the journal Science, 250 from each of the 120 years between 1881 and 1999. The corpus is made up of approximately 7.5 million words. To explore the corpus and its themes, a 20-component dynamic topic model was estimated.

Dynamic topic models (ICML 2006)

Dynamic topic models (ICML 2006) Discussion: A sequential topic model for discrete data was developed by using Gaussian time series on the natural parameters of the multinomial topics and logistic normal topic proportion models. The most promising extension to the method presented here is to incorporate a model of how new topics in the collection appear or disappear over time, rather than assuming a fixed number of topics.

References:
1. Blei, D., Ng, A., and Jordan, M. (JMLR 2003). “Latent Dirichlet allocation”
2. Blei, D. and Lafferty, J. D. (NIPS 2006). “Correlated topic models”
3. Fei-Fei, L. and Perona, P. (IEEE CVPR 2005). “A Bayesian hierarchical model for learning natural scene categories”
4. Blei, D. and Lafferty, J. D. (ICML 2006). “Dynamic topic models”