
Automatic Keywords Extraction of Chinese Document Using Small World Structure
國立雲林科技大學 National Yunlin University of Science and Technology
Advisor: Dr. Hsu
Graduate: Chien-Shing Chen
Authors: Zhu Mengxiao, Cai Zhi, Cai Qingsheng
Source: Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003
Intelligent Database Systems Lab

N. Y. U. S. T. I. M.

Outline
  • Motivation
  • Objective
  • Introduction
  • Small world and co-occurrence network
  • Chinese document keywords extraction
  • Experimental example
  • Conclusion
  • Opinion

Motivation
  • Human language networks have been shown to be neither completely random nor completely regular; instead, they exhibit small-world properties.

Objective
  • Keywords extraction

Introduction
  • Human language networks are not
      • completely random
      • completely regular
  • Analyze the co-occurrence
  • Extracting keywords vs. TFIDF

2.1 Small World Characteristics
  1. High clustering coefficient
      • the likelihood that two people who have a common friend are also friends with each other
  2. Short characteristic path length
      • the average shortest distance over all pairs of nodes

2.1 Small World Characteristics
  • $N$: total number of nodes
  • $k_i$: number of neighbors of the $i$th node
  • $e_i$: number of existing links among node $i$'s $k_i$ neighbors
  • $C$: the clustering coefficient, averaged over all nodes
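The formula on this slide did not survive extraction. Assuming it is the standard Watts-Strogatz definition, which matches the symbols listed above, each node has $C_i = 2 e_i / (k_i (k_i - 1))$ and $C$ is the mean of $C_i$ over all $N$ nodes. The sketch below computes $C$ and the characteristic path length for an undirected graph stored as a dictionary of neighbor sets. The function names and the toy graph are illustrative, not taken from the paper.

```python
from collections import deque


def clustering_coefficient(adj):
    """Mean of C_i = 2*e_i / (k_i*(k_i-1)) over all N nodes."""
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # assumption: C_i = 0 for a node with fewer than two neighbors
        nbrs = list(neighbors)
        # e_i: links that actually exist between pairs of this node's neighbors
        e = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nbrs[b] in adj[nbrs[a]]
        )
        total += 2.0 * e / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Average shortest-path length over all reachable pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(clustering_coefficient(toy), characteristic_path_length(toy))
```

For the toy graph, nodes a and b each have $C_i = 1$, node c has $C_i = 1/3$, and node d contributes 0, so $C = 7/12$; the six node pairs have a total distance of 8, giving a path length of $4/3$.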

2.1 Small World Characteristics
  • P increasing from 0 to 1
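The figure for this slide is not preserved. Assuming it shows the Watts-Strogatz rewiring procedure, in which a regular ring lattice becomes a random graph as the rewiring probability P goes from 0 to 1, the procedure can be sketched as follows. The function names, the integer node labels, and the fixed seed are assumptions made for illustration.

```python
import random


def ring_lattice(n, k):
    """Regular ring: each node is linked to its k nearest neighbors (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, seed=0):
    """Rewire each edge with probability p; the number of edges stays fixed."""
    rng = random.Random(seed)
    n = len(adj)
    new = {u: set(vs) for u, vs in adj.items()}
    edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
    for u, v in edges:
        if rng.random() < p:
            # Pick a new endpoint that is neither u itself nor already linked to u.
            candidates = [w for w in range(n) if w != u and w not in new[u]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            new[u].discard(v)
            new[v].discard(u)
            new[u].add(w)
            new[w].add(u)
    return new


def edge_count(adj):
    return sum(len(vs) for vs in adj.values()) // 2


lattice = ring_lattice(10, 4)
for p in (0.0, 0.1, 1.0):
    print(p, edge_count(rewire(lattice, p)))
```

With P = 0 the lattice is returned unchanged (completely regular); with P = 1 every edge is rewired (close to completely random). Small-world networks are expected at small intermediate values of P.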

2.1 Small World Characteristics
  • Examples from reference [2]

2.2 Co-occurrence Network
  • Each word is treated as a node and is connected to its closest neighbor in the same sentence.

2.2 Co-occurrence Network
  • Useless words
      • conjunctions
  • Pre-processing

2.2 Co-occurrence Network
  • http://www.people.com.cn/GB/paper464/

3. Chinese Document Keywords Extraction
  • 3.1 Pre-processing
  • 3.2 Keywords extraction

3.1 Pre-processing
  1. Format processing
  2. Sentence recognizing
  3. Word splitting
      • N-Shortest-Paths method
  4. Part-of-speech marking
  5. Stop-word removing
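Two of the steps above, sentence recognizing and stop-word removing, can be sketched in a few lines. This is a minimal illustration only: it assumes the text has already been split into words separated by spaces (the N-Shortest-Paths word splitting is not implemented here), it skips format processing and part-of-speech marking, and the stop-word list and function names are hypothetical.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"的", "和", "了", "在", "是"}

# Sentence-ending punctuation (Chinese and ASCII forms).
SENTENCE_END = re.compile(r"[。！？!?]+")


def split_sentences(text):
    """Sentence recognizing: split on sentence-ending punctuation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def preprocess(text):
    """Return one list of content words per sentence.

    Assumes the words in `text` are already separated by spaces.
    """
    sentences = []
    for sentence in split_sentences(text):
        tokens = [t for t in sentence.split() if t not in STOP_WORDS]
        if tokens:
            sentences.append(tokens)
    return sentences


sample = "语言 网络 是 小世界 网络。关键词 在 网络 的 中心！"
print(preprocess(sample))
```

The output is a list of token lists, one per sentence, which is a convenient input for building a co-occurrence network.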

3.2 Keywords extraction
  • The two nearest words are connected, and each word is also connected to its neighbor's neighbor.
  • Keywords extraction process: bridge words
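The linking rule above amounts to connecting every pair of words that lie at most two positions apart in the same sentence. A minimal sketch, assuming each sentence is already a list of content words; the function name and the `max_distance` parameter are illustrative, not from the paper.

```python
def build_cooccurrence_network(sentences, max_distance=2):
    """Build an undirected co-occurrence network as a dict of neighbor sets.

    Words at most `max_distance` positions apart in the same sentence are
    linked: distance 1 is the nearest neighbor, distance 2 is the
    neighbor's neighbor.
    """
    adj = {}
    for tokens in sentences:
        for word in tokens:
            adj.setdefault(word, set())
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + max_distance + 1, len(tokens))):
                other = tokens[j]
                if other != word:  # no self-loops for repeated words
                    adj[word].add(other)
                    adj[other].add(word)
    return adj


network = build_cooccurrence_network([["A", "B", "C", "D"], ["C", "E"]])
for word in sorted(network):
    print(word, sorted(network[word]))
```

In the example, A is linked to B and C but not to D, which is three positions away, and C links the two sentences because it occurs in both.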

3.2 Keywords extraction
  • Definition 1: CN is the original co-occurrence network constructed.
  • Definition 2: $CN_i$ is the co-occurrence network with the $i$th node absent.

3.2 Keywords extraction
  • $\Delta L_i = L_i - L$
      • $L$: the average shortest path length over all pairs of nodes
      • $L_i$: the characteristic path length of the new network $CN_i$
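The quantity $\Delta L_i$ can be computed by removing each node in turn and recomputing the path length. The sketch below ranks nodes by $\Delta L_i$, on the reading that bridge words are those whose removal lengthens paths the most. The slides do not say how pairs that become disconnected are handled; charging them a penalty distance equal to the number of remaining nodes is an assumption made here, as are the function names.

```python
from collections import deque


def average_path_length(adj, removed=None):
    """Average shortest-path length, optionally with one node left out."""
    nodes = [n for n in adj if n != removed]
    if len(nodes) < 2:
        return 0.0
    penalty = len(nodes)  # assumption: distance charged to an unreachable pair
    total = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v != removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values()) + penalty * (len(nodes) - len(dist))
    return total / (len(nodes) * (len(nodes) - 1))


def rank_by_delta_l(adj):
    """Return (node, delta_L_i) pairs, largest delta_L_i = L_i - L first."""
    base = average_path_length(adj)
    deltas = {
        node: average_path_length(adj, removed=node) - base for node in adj
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Two triangles (a-b-c and e-f-g) joined only through the bridge node d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"},
    "f": {"e", "g"},
    "g": {"e", "f"},
}
for node, delta in rank_by_delta_l(graph):
    print(node, round(delta, 3))
```

In this toy graph the bridge node d ranks first, followed by c and e, while removing any of the remaining nodes slightly shortens the average path.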

4. Experimental Example
  • Data from http://www.people.com.cn/GB/paper464/9567/883861.html
  • 67 words
  • Average path length is just 2.97
  • Clustering coefficient is 0.119


5. Conclusion
  • The extracted keywords relate more closely to the contents.

Opinion