Supporting XML Query Processing with Complex Keyword Constraints

Advisor: Dr. 張雅惠
Graduate student: 吳政儀
2021/6/18 DBLAB @ NTOU

XQuery
  • An XQuery query has a FLWR (For-Let-Where-Return) structure.
  • Text value constraints
      • W3C XQuery Use Cases, Full Text: "ftcontains" with the ordered and distance syntax
      • To make the ordered and distance syntax easier to process, each keyword is additionally assigned a unique position.

  For $p in document("http://dblab.cs.ntou.edu.tw/book.xml")/catalog/item
  Where $p/description ftcontains ("database" and "design" ordered)
          ftand ("database" and "design" with distance at least 2 words)
    and $p//name ftcontains ("Peter" and "Rob" ordered)
  Return $p

Extended Dewey Encoding (continued)
  • DTD of the XML document
      <!ELEMENT catalog (item*)>
      <!ELEMENT item (title, author*, publisher, description)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (name)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT publisher (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
  • Advantages of extended Dewey encoding
      • A Dewey code can be converted directly back into the complete path, e.g. 1.1.8 -> /catalog/item/description
      • Path checks become faster
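The path recovery described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the rule that a component x selects child tag number ((x - 1) mod n) + 1 among the n child tags of its parent is an assumption inferred from the labels shown in these slides (for example 1.1.2 and 1.1.6 for author, 1.1.8 and 1.3.4 for description). The names DTD_CHILDREN, ROOT_TAG, and decode_path are hypothetical.

```python
# Child tags per element, in DTD order (taken from the DTD on the slide).
DTD_CHILDREN = {
    "catalog": ["item"],
    "item": ["title", "author", "publisher", "description"],
    "author": ["name"],
}
ROOT_TAG = "catalog"


def decode_path(label: str) -> str:
    """Turn an extended Dewey label such as '1.1.8' into a root-to-node path."""
    components = [int(c) for c in label.split(".")]
    tags = [ROOT_TAG]  # the first component labels the root element
    for x in components[1:]:
        children = DTD_CHILDREN[tags[-1]]
        # Assumed rule: x selects child tag number ((x - 1) mod n) + 1.
        tags.append(children[(x - 1) % len(children)])
    return "/" + "/".join(tags)


if __name__ == "__main__":
    for label in ["1.1.8", "1.1.2.1", "1.1.6.1", "1.3.4"]:
        print(label, "->", decode_path(label))
```

Because the tag names are recovered from the label and the DTD alone, a path such as /catalog/item/description can be checked against a query without visiting the ancestors of the node.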

XML Document (figure only)

Query Tree Module

  For $p in document("http://dblab.cs.ntou.edu.tw/book.xml")/catalog/item
  Where $p/description ftcontains ("database" and "design" ordered)
          ftand ("database" and "design" with distance at least 2 words)
    and $p//name ftcontains ("Peter" and "Rob" ordered)
  Return $p

  • Node type definitions
      • Leaf node (LN)
      • Glue node (GN)
      • Content-constraint node (VF)
      • Return-constraint node (RF)

Element Encoding Table (preprocess)

  Extended Dewey Code   Level   Tagname   Keyword           Position
  1.1.2.1               4       name      Peter, Rob        7, 8
  1.1.6.1               4       name      Carlos, Coronel   9, 10
  1.2.2.1               4       name      Peter, Rob        49, 50
  1.2.6.1               4       name      Elie, Semaan      51, 52
  1.3.2.1               4       name      Peter, Rob        83, 84

  Element encoding table for /catalog/item/author/name

Data Retrieval Module (figure only)

TJ_IR System Architecture: Information Retrieval Module
  • Order and distance constraints are handled using the position information (C_match)
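A minimal sketch of how position information can decide the two kinds of constraint is given below. It is not the C_match routine itself, whose details are not shown in the slides. Two assumptions are made: "ordered" holds when some occurrence of the first keyword precedes some occurrence of the second, and the distance between two occurrences is the number of words strictly between them. The function names are hypothetical, and the positions of "design" in the usage example are invented for illustration.

```python
from itertools import product


def ordered_match(first_positions, second_positions):
    """True if some occurrence of the first keyword precedes one of the second."""
    return any(a < b for a, b in product(first_positions, second_positions))


def distance_at_least(first_positions, second_positions, min_words):
    """True if some pair of occurrences has at least min_words words between them.

    Assumption: distance = number of words strictly between the two keywords.
    """
    return any(abs(a - b) - 1 >= min_words
               for a, b in product(first_positions, second_positions))


if __name__ == "__main__":
    # "Peter" and "Rob" in name 1.1.2.1 (positions from the element encoding table).
    print(ordered_match([7], [8]))
    # "database" in description 1.1.8 (keyword encoding table) against
    # hypothetical positions for "design".
    database, design = [14, 25, 34], [15, 30]
    print(ordered_match(database, design) and distance_at_least(database, design, 2))
```

Since the query joins the two selections on description with ftand, the sketch checks each selection independently rather than requiring one pair of occurrences to satisfy both.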

Information Retrieval Module

  For $p in document("http://dblab.cs.ntou.edu.tw/book.xml")/catalog/item
  Where $p/description ftcontains ("database" and "design" ordered)
          ftand ("database" and "design" with distance at least 2 words)
    and $p//name ftcontains ("Peter" and "Rob" ordered)
  Return $p

An example for TJFast

  Document (extended Dewey labels):
    catalog (1)
      item (1.1)
        author (1.1.2)
          name (1.1.2.1)
        description (1.1.8)
      item (1.2)
        author (1.2.2)
          name (1.2.2.1)
      item (1.3)
        author (1.3.2)
          name (1.3.2.1)
        description (1.3.4)

  Query: catalog / item, with branches name and description
  Set for the branching node item: {}

  DTD:
    catalog -> item*
    item -> title, author*, publisher*, description
    author -> name
    publisher -> name

  Tname: 1.1.2.1, 1.2.2.1, 1.3.2.1
  Tdescription: 1.1.8, 1.3.4

An example for TJFast (continued)

  Set: {}
  Tname: 1.1.2.1, 1.2.2.1, 1.3.2.1
  Tdescription: 1.1.8, 1.3.4

  derive 1.1.2.1 -> /catalog/item/author/name
  derive 1.1.8 -> /catalog/item/description

An example for TJFast (continued)

  Set: {}

  Because item (1.1) is a branching node (GN), item (1.1) is inserted into the set.

An example for TJFast (continued)

  Set: {item (1.1)}

  The Tname pointer is advanced from name (1.1.2.1) to name (1.2.2.1), and the matching path <item, author, name> is output.

An example for TJFast (continued)

  Set: {item (1.1)}

  derive 1.2.2.1 -> /catalog/item/author/name

  Because name (1.2.2.1) in Tname does not satisfy the structural constraint, it is not processed. Processing continues with the next node: the Tdescription pointer is advanced from description (1.1.8) to description (1.3.4), <item, description> is output, and the set is cleared.

An example for TJFast (continued)

  Set: {}

  Because the structural constraint is not satisfied, nothing is processed. Processing continues with the next node: the Tname pointer is advanced from name (1.2.2.1) to name (1.3.2.1).

An example for TJFast (continued)

  Set: {item (1.3)}

  Because item (1.3) is a branching node, item (1.3) is inserted into the set.

An example for TJFast (continued)

  Set: {item (1.3)}

  The Tname pointer is advanced to the end of the stream, and the matching path <item, author, name> is output.

An example for TJFast (continued)

  Set: {item (1.3)}

  The Tdescription pointer is advanced to the end of the stream, and the matching path <item, description> is output.

An example for TJFast (continued)

  Set: {}

  Finally, all matching paths are combined with a merge-join, and the matching answers are output.

An example for TJFast (continued)

  Document nodes that match:
    item (1.1): name (1.1.2.1), description (1.1.8)
    item (1.3): name (1.3.2.1), description (1.3.4)
  Set: {}

  Phase 1. Intermediate paths
    catalog/item/author//name:
      <catalog 1, item 1, author 2, name 1>,
      <catalog 1, item 3, author 2, name 1>
    catalog/item/description:
      <catalog 1, item 1, description 8>,
      <catalog 1, item 3, description 4>

  Phase 2. Final solutions (join), in the form <catalog, item, author, name, description>
      <catalog 1, item 1, author 2, name 1, description 8>,
      <catalog 1, item 3, author 2, name 1, description 4>
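The second phase can be illustrated with a small sort-merge join over the two lists of intermediate paths, joined on the label of the branching node item. This is a sketch under assumptions, not the code used in the system: each path is represented as a dictionary from tag to full extended Dewey label, and the names merge_join and item_key are hypothetical.

```python
def item_key(path):
    """Join key: the label of the branching node, as a tuple of integers."""
    return tuple(int(c) for c in path["item"].split("."))


def merge_join(left, right, key):
    """Sort-merge join of two path lists on key; equal-key groups are cross-combined."""
    left, right = sorted(left, key=key), sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end, j_end = i, j
            while i_end < len(left) and key(left[i_end]) == kl:
                i_end += 1
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    out.append({**l, **r})
            i, j = i_end, j_end
    return out


# Intermediate paths from Phase 1, written with full labels.
name_paths = [
    {"catalog": "1", "item": "1.1", "author": "1.1.2", "name": "1.1.2.1"},
    {"catalog": "1", "item": "1.3", "author": "1.3.2", "name": "1.3.2.1"},
]
description_paths = [
    {"catalog": "1", "item": "1.1", "description": "1.1.8"},
    {"catalog": "1", "item": "1.3", "description": "1.3.4"},
]

if __name__ == "__main__":
    for solution in merge_join(name_paths, description_paths, item_key):
        print(solution)
```

A path whose item has no partner on the other side, such as a name under item (1.2), simply finds no equal key and is dropped by the join.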

Keyword Encoding Table (preprocess)

  DeweyID   global_position
  1.1.1     8
  1.1.8     14
  1.1.8     25
  1.1.8     34
  1.2.1     38
  1.2.8     57
  1.3.1     82
  1.3.4     89

  Keyword encoding table for "database"

Experiments
  • Experimental environment
      • CPU: Pentium 4 3.0 GHz
      • Memory: 1.5 GB
      • Operating system: Windows XP
      • Implementation tools: Visual C++ 6.0 and Visual 2008
  • Data sets
      • DBLP, 10 MB-50 MB
      • XMark, 10 MB-50 MB

Experiments on Different Datasets

  Q1:
  For $p in document("http://dblab.cs.ntou.edu.tw/dblp.xml")/dblp/inproceedings
  where $p/booktitle ftcontains (System and System ordered)
        $p/title ftcontains (program and program ordered)
  return $p

  Charts (figures only): data retrieval time (including LCA computation), number of partial data, total time

Experiments on Different Datasets (continued)

  Q2:
  For $p in document("http://dblab.cs.ntou.edu.tw/xmark.xml")/site//item
  where $p/description//text ftcontains (master and master ordered)
        $p/mailbox//from ftcontains (Mehrdad and Mehrdad ordered)
  return $p

  Charts (figures only): data retrieval time (including LCA computation), number of partial data, total time

Experiments on Different Keyword Constraints

  Q3:
  For $p in document("http://dblab.cs.ntou.edu.tw/dblp.xml")/dblp/inproceedings
  where $p/booktitle ftcontains (System and System ordered)
          ftand (Advance and Course ordered)
        $p/title ftcontains (Language and Language ordered)
  return $p

  Q4:
  For $p in document("http://dblab.cs.ntou.edu.tw/dblp.xml")/dblp/inproceedings
  where $p/booktitle ftcontains (System and System ordered)
          ftand (Advance and Course distance <= 2 words)
        $p/title ftcontains (Language and Language ordered)
  return $p

  Q5:
  For $p in document("http://dblab.cs.ntou.edu.tw/dblp.xml")/dblp/inproceedings
  where $p/booktitle ftcontains (System and System ordered)
          ftand (Advance and Course distance <= 2 words)
        $p/title ftcontains (Language and Language ordered)
  return $p

Experiments on the Effect of Keyword Frequency

  Q6:
  For $p in document("http://dblab.cs.ntou.edu.tw/dblp.xml")/dblp/inproceedings
  where $p/booktitle ftcontains (System and System ordered)
        $p/title ftcontains (Support and Support ordered)
  return $p

  Q7:
  For $p in document("http://dblab.cs.ntou.edu.tw/dblp.xml")/dblp/inproceedings
  where $p/booktitle ftcontains (System and System ordered)
        $p/title ftcontains (language and language ordered)
  return $p

  Keyword frequencies by data size:

  Keyword    10 MB   20 MB   30 MB   40 MB   50 MB
  system     1334    2960    4595    5437    6741
  program    250     414     647     762     954
  Support    244     492     850     1006    1260
  language   769     1261    1839    2099    2552