SophiaXueyao Liang CPSC 503 Final Project Olympic vancouver

  • Slides: 18
Download presentation
Sophia(Xueyao) Liang CPSC 503 Final Project

Sophia(Xueyao) Liang CPSC 503 Final Project

Olympic, vancouver Snow, cold K=3 Moon light, spider man P( |d) Unsupervised P( |d)

Olympic, vancouver Snow, cold K=3 Moon light, spider man P( |d) Unsupervised P( |d)

Expectation: Maximization:

Expectation: Maximization:

Parameter Inference: No closed form solution for expectation step Efficient Algorithm: Expectation (PLSA) Maximization(PLSA)

Parameter Inference: No closed form solution for expectation step Efficient Algorithm: Expectation (PLSA) Maximization(PLSA) The result of the previous steps may not ends in better value for O

 Potential Problems of the model -10000 100 Parameter Inference Higher time complexity and

Potential Problems of the model -10000 100 Parameter Inference Higher time complexity and slower to converge

 Cora Data version 1. 0 30000 scientific papers, with citation information Important files:

Cora Data version 1. 0 30000 scientific papers, with citation information Important files: papers (ID-name, link, author…. . ) citations (ID-cited ID) classifications (link-category) directory: extractions (post-script form of the papers) Cited paper not in the corpus No abstract for some post-script files Too many categories Duplicated or isolated papers

 Cora Data version 1. 0 Papers in category Machine Learning About 2700 papers

Cora Data version 1. 0 Papers in category Machine Learning About 2700 papers 1400 Frequent Words (stop words removed, stemmed) Theory Reinforcement Geneti Algorithms Neural Networks Probabilistic Case based Rule Learning 315 217 418 818 426 298 180

PHITS Overall Accuracy 0. 470 PLSA 0. 501 Net. PLSA 0. 562 Overall Accuracy

PHITS Overall Accuracy 0. 470 PLSA 0. 501 Net. PLSA 0. 562 Overall Accuracy (A) Accuracy (B) Recall Accuray and Recall for each category

 Justified the claim that adding network structure into the model could improve the

Justified the claim that adding network structure into the model could improve the result of topic modeling Modeled the network on a scale of articles Inherent problem exists in the picked framework The result is still far from satisfactory

 How to model the network structure of blog articles, especially considering model them

How to model the network structure of blog articles, especially considering model them on a scale of articles Bag-of-words matrix extraction Better integral model, maybe LDA based Efficiency of the algorithm Recommendation based on topic communtiy discovery