Tree-structured Conditional Random Fields for Semantic Annotation Jie Tang, Mingcai Hong, Juanzi Li, and Bangyong Liang Knowledge Engineering Group (KEG) Department of Computer Science and Technology Tsinghua University Nov. 5, 2006
Outline • Motivation and Problem Description • Related Work • Our Approach • Experimental Results • Future work & Summary
Introduction • Semantic web requires annotating existing web content according to particular ontologies • Application of semantic annotation – Personal profile annotation – Product information annotation – Image annotation – Company annual report annotation –…
Example of Semantic Annotation
Task: 1) Identifying target entities & relations 2) Populating the ontology base
Source text: October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
Metadata: Person (Name, Title), workIn, Organization (Name)
Instances:
• Person#1: Bill Gates, CEO, workIn Organization#1 (Microsoft)
• Person#2: Richard Stallman, Founder, workIn Organization#2 (Free Software Foundation)
• Person#3: Bill Veghte, VP, workIn Organization#1 (Microsoft)
Hierarchical Semantic Annotation
• Hierarchical dependency
Source text:
3. Company Directorate Info
Company directorate secretary: Haokui Zhou
Representative of directorate: He Zhang
Address: No. 583-14, Road Linling, Shanghai, China
Zipcode: 200030
Email: ajcoob@mail2.online.sh.cn
Phone: 021-64396600
Fax: 021-64392118
4. Company Registration Info
Company registration address: No. 838, Road Zhang Yang, Shanghai, China
Zipcode: 200122
Company office address: No. 583-14, Road Linling, Shanghai, China
Zipcode: 200030
Email: ajcorp@online.sh.cn
Phone: 021-64396654
Metadata:
• Company Basic Info: has_directorate_info, has_registration_info
• Company Directorate Info: secretary, representative, address, zipcode, Email, phone, Fax
• Company Registration Info: reg_address, reg_zipcode, office_address, office_zipcode, Email, Phone
Questions:
– How to make use of the dependencies in annotation?
– How to formalize a unified model?
Outline • Motivation and Problem Description • Related Work • Our Approach • Experimental Results • Future work & Summary
Related Work: Semantic Annotation • Annotation using Rule Learning – Learning annotation rules – E.g. Ciravegna (2001), Handschuh et al. (2002), and Popov et al. (2003) • Annotation using Classification – Formalizing the annotation problem as that of classification – E.g. Hammond, Sheth, and Kochut (2002) • Annotation using Sequential Labeling – Sequential labeling can describe dependencies between targeted entities – E.g. Reeve (2004)
Related Work: Information Extraction • Classification Models – E.g. Cortes and Vapnik (1995), Collins (2002), and Finn (2004) • Dependent Models – E.g. Ghahramani and Jordan (1997), McCallum et al. (2000), and Lafferty et al. (2001) • Non-linear Dependent Models – E.g. Sutton et al. (2004), Zhu et al. (2005), and Bunescu and Mooney (2004)
Outline • Motivation and Problem Description • Related Work • Our Approach • Experimental Results • Future work & Summary
Our Approach • Hierarchical Semantic Annotation – Information is organized as a tree structure – E.g. HTML, XML • Tree-structured Conditional Random Field (TCRF) – Modeling hierarchical dependencies in a tree that may contain cycles – Performing parameter estimation by maximizing the log-likelihood objective function – Using the Tree-based Reparameterization (TRP) algorithm for inference during parameter estimation
Linear Conditional Random Fields
Source text:
3. Company Directorate Info
Company directorate secretary: Haokui Zhou
Representative of directorate: He Zhang
Address: No. 583-14, Road Linling, Shanghai, China
Zipcode: 200030
Email: ajcoob@mail2.online.sh.cn
Phone: 021-64396600
Fax: 021-64392118
4. Company Registration Info
Company registration address: No. 838, Road Zhang Yang, Shanghai, China
Zipcode: 200122
Company office address: No. 583-14, Road Linling, Shanghai, China
Zipcode: 200030
Email: ajcorp@online.sh.cn
Phone: 021-64396654
Linear Conditional Random Fields [Figure: linear-chain labeling of the same text. In the first section the tokens "Zipcode:", "200030", "Email:", "ajcoob@..." are labeled O, Z1, O, E1; in the second section the tokens "Zipcode:", "200030", "Email:", "ajcorp@..." are labeled O, Z2, O, E2.]
Tree-structured CRFs (TCRFs) • In TCRF, the dependencies are organized as a tree structure
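Modeling with TCRFs [Figure: tree-structured labeling of the same text. The upper-level nodes are labeled A, R, and D. Under "3. Company Directorate Info" the tokens "Zipcode:", "200030", "Email:", "ajcoob@..." are labeled O, Z1, O, E1; under "4. Company Registration Info" the tokens "Zipcode:", "200030", "Email:", "ajcorp@..." are labeled O, Z2, O, E2.]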
TCRF Model • Model: [formula not preserved] • Annotation: [formula not preserved] • How to estimate the parameters?
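The two formulas on this slide did not survive extraction. As a sketch only, the usual CRF form can be written with the symbols the neighboring slides use: edge features t_j with weights λ_j, and vertex features s_k with weights μ_k. The edge set E, vertex set V, normalizer Z(x), and the exact indexing are assumptions, not a transcription of the slide.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big(
    \sum_{e \in E} \sum_{j} \lambda_j \, t_j(e, y|_e, x)
  + \sum_{v \in V} \sum_{k} \mu_k \, s_k(v, y|_v, x)
\Big)

y^{*} = \arg\max_{y} \; p(y \mid x)
```

Here Z(x) sums the exponential over all label assignments y, and annotation picks the most probable assignment.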
Parameter Estimation
(1) With training data D = {(x(i), y(i))}, define the log-likelihood objective function over the parameters Θ = {λ1, λ2, …; μk, μk+1, …}.
(2) Take the derivative of the objective function with respect to each λj. The derivative involves the probabilities p(y|x), p(yp, yc|x), p(yc, yp|x), and p(ys, ys|x). Here:
- f denotes both the edge feature t and the vertex feature s;
- c (clique) denotes both edge e and vertex v;
- λ denotes the two kinds of parameters λ and μ.
(3) With the objective function and its derivative, any gradient-based method (e.g. L-BFGS) can be used to solve the optimization problem and so estimate the parameters.
Calculating the Marginal Probabilities
• Tree-based Reparameterization (TRP)
– TRP is based on the fact that any exact algorithm for optimal inference on trees actually computes marginal distributions for pairs of neighboring nodes.
• TRP Algorithm
– Step 1: Initialization
– Step 2: Updates
a) Generating a spanning tree
b) Propagation on the spanning tree
c) Stop if the termination conditions are met
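The objective and its derivative are missing from the extracted slide. A sketch of the standard forms, using the slide's own conventions (f for either feature type, c for either clique type, λ for either parameter type), is given below. The notation y'_c for a candidate labeling of clique c is an assumption introduced for this illustration.

```latex
L_{\Theta} = \sum_{i} \log p_{\Theta}\big(y^{(i)} \mid x^{(i)}\big)

\frac{\partial L_{\Theta}}{\partial \lambda_j}
  = \sum_{i} \Big[
      \sum_{c} f_j\big(c, y^{(i)}|_c, x^{(i)}\big)
    - \sum_{c} \sum_{y'_c} p\big(y'_c \mid x^{(i)}\big) \, f_j\big(c, y'_c, x^{(i)}\big)
    \Big]
```

The first term counts each feature on the training labels; the second is its expectation under the model, which is where the clique marginals such as p(yp, yc|x) enter.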
TRP Step 1: Initialization [Figure: graph with observations x1…x6 and label nodes y1…y6.]
• Node potentials: T^0_i = k·exp(s(x_i, y_i)) for i = 1, …, 6
• Edge potentials: T^0_ij = k·exp(t(x, y_i, y_j)) for the edges (1,2), (1,4), (2,3), (2,5), (3,6), (4,5), (5,6)
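A minimal sketch of this initialization, assuming that k acts as a normalizing constant for each table. The label set and the two scoring functions are hypothetical stand-ins for s and t, chosen only so the example runs.

```python
import math

# Edges of the six-node graph shown on the slide.
EDGES = [(1, 2), (1, 4), (2, 3), (2, 5), (3, 6), (4, 5), (5, 6)]
LABELS = ["O", "Z"]  # hypothetical two-label set


def vertex_score(i, y):
    """Hypothetical stand-in for the vertex feature score s(x_i, y_i)."""
    return 0.5 if (i % 2 == 0) == (y == "Z") else -0.5


def edge_score(i, j, yi, yj):
    """Hypothetical stand-in for the edge feature score t(x, y_i, y_j)."""
    return 0.3 if yi == yj else -0.3


def normalize(table):
    """Scale a table so its entries sum to 1 (the assumed role of k)."""
    total = sum(table.values())
    return {key: value / total for key, value in table.items()}


def init_potentials():
    """Build the initial node tables T^0_i and edge tables T^0_ij."""
    node = {
        i: normalize({y: math.exp(vertex_score(i, y)) for y in LABELS})
        for i in range(1, 7)
    }
    edge = {
        (i, j): normalize(
            {(a, b): math.exp(edge_score(i, j, a, b)) for a in LABELS for b in LABELS}
        )
        for (i, j) in EDGES
    }
    return node, edge


T0_node, T0_edge = init_potentials()
```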
TRP Step 2: a) Generating a spanning tree • Methods: edge cutting and edge adding [Figure: graph with observations x1…x6 and label nodes y1…y6.]
TRP Step 2: b) Propagation [Figure: the same graph, with propagation along the spanning tree.]
After the First Iteration [Figure: the node potentials are updated to T^1_1, …, T^1_6, while the edge potentials T^0_14 and T^0_36 remain at their initial values.]
Annotation [Figure: the TCRF tree over "3. Company Directorate Info" and "4. Company Registration Info", with the label of every node, including the tokens "Zipcode:", "200030", "Email:", "ajcoob@..." and "ajcorp@...", shown as "?" to be inferred.]
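One way to read "edge adding" is to start from an empty edge set and add edges in random order, skipping any edge that would close a cycle. The sketch below does this with a union-find structure. It is an assumption about the method, not the authors' implementation, and the edge list is the one read off the initialization slide.

```python
import random

EDGES = [(1, 2), (1, 4), (2, 3), (2, 5), (3, 6), (4, 5), (5, 6)]


def random_spanning_tree(nodes, edges, rng):
    """Add edges in random order, keeping those that join two components."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge does not close a cycle
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


tree = random_spanning_tree(range(1, 7), EDGES, random.Random(0))
```

A different seed gives a different tree, so successive iterations can cover every edge of the graph.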
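Propagation on a spanning tree can be done with exact sum-product message passing. The sketch below computes single-node marginals on a small tree. The three-node chain, its labels, and its potential values are hypothetical, and the step in which TRP writes the tree marginals back into the potentials is not shown.

```python
def tree_marginals(nodes, tree_edges, node_pot, edge_pot, labels):
    """Sum-product on a tree; returns normalized single-node marginals."""
    neighbors = {n: [] for n in nodes}
    for u, v in tree_edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def pairwise(u, v, yu, yv):
        # Edge tables are stored once per undirected edge.
        if (u, v) in edge_pot:
            return edge_pot[(u, v)][(yu, yv)]
        return edge_pot[(v, u)][(yv, yu)]

    messages = {}

    def message(src, dst):
        key = (src, dst)
        if key not in messages:
            # Product of messages flowing into src from everything except dst.
            incoming = {y: 1.0 for y in labels}
            for other in neighbors[src]:
                if other != dst:
                    m = message(other, src)
                    for y in labels:
                        incoming[y] *= m[y]
            messages[key] = {
                yd: sum(
                    node_pot[src][ys] * pairwise(src, dst, ys, yd) * incoming[ys]
                    for ys in labels
                )
                for yd in labels
            }
        return messages[key]

    marginals = {}
    for n in nodes:
        belief = {y: node_pot[n][y] for y in labels}
        for m in neighbors[n]:
            msg = message(m, n)
            for y in labels:
                belief[y] *= msg[y]
        z = sum(belief.values())
        marginals[n] = {y: b / z for y, b in belief.items()}
    return marginals


# Hypothetical three-node chain with two labels.
labels = ["O", "Z"]
nodes = [1, 2, 3]
tree = [(1, 2), (2, 3)]
node_pot = {
    1: {"O": 2.0, "Z": 1.0},
    2: {"O": 1.0, "Z": 1.0},
    3: {"O": 1.0, "Z": 3.0},
}
same = {("O", "O"): 2.0, ("O", "Z"): 1.0, ("Z", "O"): 1.0, ("Z", "Z"): 2.0}
edge_pot = {(1, 2): same, (2, 3): same}
marginals = tree_marginals(nodes, tree, node_pot, edge_pot, labels)
```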
Annotation (cont. ) • Viterbi algorithm
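The slide only names the Viterbi algorithm. As an illustration, here is a minimal linear-chain Viterbi decoder over log-scores. The labels and score values are hypothetical. Decoding on the tree-structured model would need the corresponding max-product pass over the tree, which is not shown.

```python
def viterbi(node_scores, trans_scores, labels):
    """Return the highest-scoring label sequence for a linear chain.

    node_scores: list of {label: log-score}, one dict per position.
    trans_scores: {(previous_label, label): log-score}.
    """
    best = [{y: node_scores[0][y] for y in labels}]
    back = []
    for t in range(1, len(node_scores)):
        current, pointers = {}, {}
        for y in labels:
            # Best predecessor for label y at position t.
            prev = max(labels, key=lambda p: best[-1][p] + trans_scores[(p, y)])
            current[y] = best[-1][prev] + trans_scores[(prev, y)] + node_scores[t][y]
            pointers[y] = prev
        best.append(current)
        back.append(pointers)
    # Trace the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


# Hypothetical scores for the tokens "Zipcode:", "200030", "Email:", "ajcoob@...".
labels = ["O", "Z", "E"]
node_scores = [
    {"O": 2.0, "Z": 0.0, "E": 0.0},
    {"O": 0.0, "Z": 2.0, "E": 0.0},
    {"O": 2.0, "Z": 0.0, "E": 0.0},
    {"O": 0.0, "Z": 0.0, "E": 2.0},
]
trans_scores = {(a, b): 0.0 for a in labels for b in labels}
path = viterbi(node_scores, trans_scores, labels)
```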
Outline • Motivation and Problem Description • Related Work • Our Approach • Experimental Results • Future work & Summary
Experimental Setup
• Data sets (the slide's Ontology column is not preserved)
Data Set    #Docs   #Annotation Tasks
Synthetic   62      4
Real        3726    10
• Baselines: (1) SVM (2) Linear-CRF
• Features
Category         Features
Edge feature     f(yp, yc), f(yc, yp), f(ys, ys)
Vertex feature   {wi}, {wp}, {wc}, {ws}, {wp, wi}, {wc, wi}, {ws, wi}
Annotation Results on Synthetic Data
Annotation Results on Real Data (precision, recall, and F1 in %)
Annotation Task         SVM Prec.  SVM Rec.  SVM F1   CRF Prec.  CRF Rec.  CRF F1   TCRF Prec.  TCRF Rec.  TCRF F1
Company_Chinese_Name    88.82      89.40     89.11    82.10      80.69     81.37    84.34       92.72      88.33
Company_English_Name    90.51      95.33     92.86    71.68      80.14     75.66    89.26       88.67      88.96
Legal_Representative    94.84      97.35     96.08    92.86      96.60     94.66    94.84       97.35      96.08
Company_Secretary       99.29      93.33     96.22    91.65      96.99     94.23    77.96       96.67      86.31
Secretary_Email         57.14      8.89      15.39    69.94      56.53     62.34    73.86       97.01      83.87
Registered_Address      98.66      96.71     97.68    94.75      87.20     90.80    84.05       90.13      86.98
Office_Address          70.41      97.54     81.78    77.41      87.06     81.94    86.93       89.86      88.37
Company_Email           0.00       0.00      0.00     84.57      85.64     85.09    95.20       90.84      92.97
Newspaper               100.0      99.34     99.67    94.51      91.97     93.21    98.69       100.0      99.34
Accounting_Agency       83.15      95.63     88.95    73.81      56.77     62.73    79.57       97.19      87.50
Average                 78.28      77.35     75.77    83.33      81.96     82.20    86.47       94.04      89.87
Time Complexity
Method   Training   Annotation
SVM      96 s       30 s
CRF      5 m 25 s   5 s
TCRF     50 m 40 s  50 s
Tested on a computer with two 2.8 GHz P4 CPUs and 3 GB of memory
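Each F1 value in a task row is the harmonic mean of that row's precision and recall. A small sketch that recomputes two entries from the table follows; the function name is chosen for this example.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


svm_chinese_name = round(f1(88.82, 89.40), 2)    # SVM, Company_Chinese_Name
tcrf_secretary_email = round(f1(73.86, 97.01), 2)  # TCRF, Secretary_Email
print(svm_chinese_name, tcrf_secretary_email)  # prints 89.11 83.87
```

The Average row holds the mean of the per-task values in each column, so its F1 entry is not the harmonic mean of the averaged precision and recall.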
Outline • Motivation and Problem Description • Related Work • Our Approach • Experimental Results • Future work & Summary
Questions • How to reduce the computational cost? – Parallelization – Incorporation of constraints from ontologies • How to incorporate other types of dependencies into the CRF model? – E.g. multiple dimensions – Long-distance dependencies –… • How to identify entities & relations in a unified model?
Summary • Investigated the problem of hierarchical semantic annotation • Proposed Tree-structured Conditional Random Fields (TCRFs) for incorporating the hierarchical dependencies • Employed Tree-based Reparameterization (TRP) for inference during parameter estimation • On the real data set, our approach outperforms the baseline methods: average F1 of 89.87% for TCRF versus 75.77% for SVM and 82.20% for CRF
Thanks! HP: http://keg.cs.tsinghua.edu.cn/persons/tj/