Constructing Signature Graphs for Signature Files
Dr. Yangjun Chen
Dept. Applied Computer Science
University of Winnipeg
Canada

• Motivation
• Signature Files as Indexes
• Signature Graph and its Construction
  – Searching a Signature Graph
• Maintenance of Signature Graph
• Summary and Future Work

Motivation
• Establish indexes to speed up query evaluation
• B+-trees, inverted files, signature files
• Signature files: simple and easy to maintain
• Signature graphs: less time for searching

Signature Files as Indexes
§ Definition
  A signature for a key word or an attribute value is a hash-coded bit string.
§ Signature construction
  - Important parameters:
      m: number of 1s in the bit string
      F: length of the bit string
      D: size of a block (or average number of key words of an element)
  - Optimal choice of the parameters: F · ln 2 = m · D.
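As a worked instance of the relation F · ln 2 = m · D, the sketch below solves it for m given F and D. The function name and the sample values (512-bit signatures, 40 key words per block) are illustrative assumptions and do not come from the slides.

```python
import math

def optimal_weight(F: int, D: int) -> float:
    """Solve F * ln 2 = m * D for m, the number of 1s per signature."""
    return F * math.log(2) / D

# Hypothetical numbers: 512-bit signatures, 40 key words per block.
m = optimal_weight(512, 40)
print(round(m, 2))  # about 8.87, so roughly 9 bits would be set per key word
```

In practice m has to be an integer, so the computed value would be rounded.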

• Example: (constructing a signature for a word with m = 4 and F = 12)
  word: "database"
  letter triplets: dat, ata, tab, aba, bas, ase
  H(dat) = 5, H(ata) = 1, H(tab) = 8, H(aba) = 1, H(bas) = 10, H(ase) = 8.
  signature: 100 010 010 100 (bits 1, 5, 8 and 10 are set)
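The example can be replayed with a short sketch. The hash function H is not specified on the slide, so it is modelled here as a lookup table holding exactly the six values given. Stopping once m distinct positions are set is also an assumption, chosen because it agrees with the example.

```python
def triplets(word: str) -> list[str]:
    """All overlapping three-letter substrings of word."""
    return [word[i:i + 3] for i in range(len(word) - 2)]

# Hash values copied from the slide; the real H is unknown (assumption:
# a plain lookup table stands in for it).
H = {"dat": 5, "ata": 1, "tab": 8, "aba": 1, "bas": 10, "ase": 8}

def word_signature(word: str, F: int, m: int) -> str:
    """Set the bit positions named by H (1-based) until m distinct bits are set."""
    bits = ["0"] * F
    set_positions = set()
    for t in triplets(word):
        if len(set_positions) == m:
            break
        pos = H[t]
        set_positions.add(pos)
        bits[pos - 1] = "1"
    return "".join(bits)

print(word_signature("database", F=12, m=4))  # 100010010100
```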

Signature Files as Indexes
text: … SGML … database … information …

word signatures:
  SGML                    010000100110
  database                100010010100
  information             010100011000
  object signature (OS)   110110111110   (bitwise OR of the word signatures)

queries:      query signatures:   matching results:
  SGML          010000100110        match with OS
  XML           011000100100        no match with OS
  informatik    110100100000        false drop
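The matching in this example can be checked with a few lines of code. This is a minimal sketch assuming the object signature is the bitwise OR of the word signatures, which the bit strings on the slide bear out. The function names are illustrative.

```python
def superimpose(signatures: list[str]) -> str:
    """Bitwise OR of equal-length bit strings."""
    width = len(signatures[0])
    combined = 0
    for s in signatures:
        combined |= int(s, 2)
    return format(combined, f"0{width}b")

def matches(query_sig: str, object_sig: str) -> bool:
    """True if every 1 bit of the query is also set in the object signature."""
    q, o = int(query_sig, 2), int(object_sig, 2)
    return (q & o) == q

words = {"SGML": "010000100110",
         "database": "100010010100",
         "information": "010100011000"}
os_sig = superimpose(list(words.values()))
print(os_sig)  # 110110111110

queries = {"SGML": "010000100110",
           "XML": "011000100100",
           "informatik": "110100100000"}
for word, sig in queries.items():
    hit = matches(sig, os_sig)
    # A false drop: the signatures match although the word is not in the text.
    print(word, hit, hit and word not in words)
```

The last query shows why a signature match is only a filter: "informatik" passes the bit test, so the text itself still has to be checked.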

Example:
relation:
  name    sex
  John    male
  . . .   . . .
query: John ∧ male
query signature: 1010 0101

Signature Graph
Consider a signature s_i of length F. We denote it as s_i = s_i[1]s_i[2] ... s_i[F], where each s_i[j] ∈ {0, 1} (j = 1, ..., F). We also use s_i(j_1, ..., j_h) to denote a sequence of pairs w.r.t. s_i: (j_1, s_i[j_1])(j_2, s_i[j_2]) ... (j_h, s_i[j_h]), where 1 ≤ j_k ≤ F for k ∈ {1, ..., h}.

Definition (signature identifier)
Let S = s_1.s_2. ... .s_n denote a signature file. Consider s_i (1 ≤ i ≤ n). If there exists a sequence j_1, ..., j_h such that for any k ≠ i (1 ≤ k ≤ n) we have s_i(j_1, ..., j_h) ≠ s_k(j_1, ..., j_h), then we say that s_i(j_1, ..., j_h) identifies the signature s_i, or that s_i(j_1, ..., j_h) is an identifier of s_i.

Example:
s_8(5, 1, 4) = (5, 1)(1, 1)(4, 0)
(* For any i ≠ 8 we have s_i(5, 1, 4) ≠ s_8(5, 1, 4). For instance, s_5(5, 1, 4) = (5, 0)(1, 0)(4, 1) ≠ s_8(5, 1, 4), s_2(5, 1, 4) = (5, 1)(1, 1)(4, 1) ≠ s_8(5, 1, 4), and so on. *)
s_1(5, 4, 1) = (5, 0)(4, 1)(1, 1)
(* For any i ≠ 1 we have s_i(5, 4, 1) ≠ s_1(5, 4, 1). *)
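The identifier property can be tested mechanically. The sketch below uses the eight signatures of the running example (listed with the graph on the next slide). The helper names `project` and `identifies` are illustrative, not taken from the slides.

```python
# The eight 8-bit signatures s_1 .. s_8 of the running example.
S = ["10110110", "10111001", "10100111", "01110110",
     "01110101", "01011100", "11100100", "10101011"]

def project(sig: str, positions: list[int]) -> list[tuple[int, int]]:
    """s(j_1, ..., j_h): the pairs (j, s[j]) for 1-based positions j."""
    return [(j, int(sig[j - 1])) for j in positions]

def identifies(i: int, positions: list[int], sigs: list[str] = S) -> bool:
    """True if s_i(positions) differs from s_k(positions) for every k != i."""
    target = project(sigs[i - 1], positions)
    return all(project(s, positions) != target
               for k, s in enumerate(sigs, start=1) if k != i)

print(project(S[7], [5, 1, 4]))   # [(5, 1), (1, 1), (4, 0)]
print(identifies(8, [5, 1, 4]))   # True
print(identifies(1, [5, 4, 1]))   # True
print(identifies(8, [5, 1]))      # False: s_2 agrees with s_8 on bits 5 and 1
```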

Signature Graph
• Definition (signature graph)
A signature graph G for a signature file S = s_1.s_2. ... .s_n, where s_i ≠ s_j for i ≠ j and |s_k| = F for k = 1, ..., n, is a graph G = (V, E) such that
1. Each node v ∈ V is of the form (p, skip), where p is a pointer to a signature s in S, and skip is a non-negative integer i. If i > 0, it indicates that the i-th bit of the query signature s_q will be checked when searching. If i = 0, s will be compared with s_q.
2. Let e = (u, v) ∈ E. Then e is labeled with 0 or 1 and skip(u) > 0. Let skip(u) = i. If e is labeled with 0, the i-th bit of the signature pointed to by p(v) is 0. If e is labeled with 1, the i-th bit of the signature pointed to by p(v) is 1. A node v with skip(v) = 0 does not have any children.
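One possible in-memory form of a node (p, skip) is sketched below, together with a check of condition 2. This is an illustration under assumptions: the `Node` class, the `edges_consistent` helper and the small hand-built graph over four signatures are hypothetical. The graph is a plain tree, so the recursive check would not terminate on a graph in which a node can be reached a second time.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    p: int       # 1-based index of the signature this node points to
    skip: int    # bit position to check; 0 means "compare the whole signature"
    children: dict = field(default_factory=dict)  # edge label (0 or 1) -> Node

def edges_consistent(node: Node, sigs: list[str]) -> bool:
    """Check condition 2 on the tree below node (assumes no shared nodes)."""
    if node.skip == 0:
        return not node.children          # skip = 0 implies no children
    for label, child in node.children.items():
        # The skip-th bit of the child's signature must equal the edge label.
        if int(sigs[child.p - 1][node.skip - 1]) != label:
            return False
        if not edges_consistent(child, sigs):
            return False
    return True

# Hypothetical tree over four of the example signatures.
sigs = ["10110110", "10111001", "10100111", "01110110"]
root = Node(1, 1, {
    0: Node(4, 0),
    1: Node(1, 4, {
        0: Node(3, 0),
        1: Node(1, 5, {0: Node(1, 0), 1: Node(2, 0)}),
    }),
})
print(edges_consistent(root, sigs))  # True
```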

s_1: 1011 0110
s_2: 1011 1001
s_3: 1010 0111
s_4: 0111 0110
s_5: 0111 0101
s_6: 0101 1100
s_7: 1110 0100
s_8: 1010 1011

[Figure: signature graph for the signature file s_1, ..., s_8]

Construction of signature graph:
[Figure: the graph after each of the steps "Insert s_1", "Insert s_2", "Insert s_3" and "Insert s_4"]

[Figure: the graph after each of the steps "Insert s_5", "Insert s_6", "Insert s_7" and "Insert s_8"]

Signature Graph
Searching a signature graph
Denote by s_q(i) the i-th position of the query signature s_q. During the traversal of a signature graph, the inexact matching can be done as follows:
(i) Let v be the node encountered and s_q(i) be the position to be checked.
(ii) If s_q(i) = 1, we move to the right child of v.
(iii) If s_q(i) = 0, both the right and the left child of v will be visited.
(iv) A search along a path stops when a node without any child node is reached or a node is encountered for the second time.
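Steps (i) to (iv) can be sketched as an iterative traversal. Several points are assumptions: the right child is taken to be the one reached over the edge labeled 1, the signature a node points to is compared with s_q whenever a path stops, and the small example graph is a hand-built tree rather than the graph from the slides. The names `Node` and `search` are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    p: int       # 1-based index of a signature
    skip: int    # bit position to test, or 0
    children: dict = field(default_factory=dict)  # 0 -> left child, 1 -> right child

def search(root: Node, sq: str, sigs: list[str]) -> list[int]:
    """Indices of signatures that have a 1 wherever sq has a 1."""
    hits, seen, stack = set(), set(), [root]
    while stack:
        v = stack.pop()
        if id(v) in seen or not v.children:
            # Step (iv): the path stops here; compare p(v)'s signature with sq.
            s = sigs[v.p - 1]
            if all(b == "1" for b, q in zip(s, sq) if q == "1"):
                hits.add(v.p)
            continue
        seen.add(id(v))
        # Step (ii): a 1 in sq forces the right child; step (iii): a 0 allows both.
        labels = [1] if sq[v.skip - 1] == "1" else [0, 1]
        stack.extend(v.children[lab] for lab in labels if lab in v.children)
    return sorted(hits)

# Hypothetical tree over four of the example signatures.
sigs = ["10110110", "10111001", "10100111", "01110110"]
root = Node(1, 1, {
    0: Node(4, 0),
    1: Node(1, 4, {
        0: Node(3, 0),
        1: Node(1, 5, {0: Node(1, 0), 1: Node(2, 0)}),
    }),
})
print(search(root, "10100000", sigs))  # [1, 2, 3]
print(search(root, "10010000", sigs))  # [1, 2]
```

With the query 1010 0000 the left subtree of the root is never entered, because bit 1 of the query is set. This pruning is where the time saving over a sequential scan of the signature file comes from.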

Signature Graph
[Figure: the example signature graph with some of its nodes labeled "marked"]

Maintenance of Signature Graph
- Insertion of a signature s into G
  Same as the construction of a signature graph.
- Deletion of a signature s from G
  (i) Search G from the root until a node v is encountered that is marked or has skip(v) = 0.
  (ii) If skip(v) = 0, compare p(v) and s. If s matches p(v) exactly, do the following; otherwise, nothing is done.
       Let v_1 ... v_{k-1} v_k v be the path explored. Let u_1 be another child of v_k (not on the path). Remove v_{k-1} → v_k, v_k → u_1 and v, and generate a new edge v_{k-1} → u_1. skip(v_k) := 0.

Maintenance of Signature Graph
- Deletion of a signature s from G (continued)
  (iii) If skip(v) ≠ 0, compare p(v's father) and s. If s matches p(v's father) exactly, do the following; otherwise, nothing is done.
        Let v_1 ... v_{k-1} v_k v be the path explored.
        If v_k ≠ v, replace p(v) with p(v_k). Let u_1 be another child of v_k (not on the path). Let u_2 be another parent of v_k (not on the path). Replace v_{k-1} → v_k with v_k → u_1, and replace v_k → v with u_2 → v. Remove v_k. Note that u_2 can be found by searching G from v_k with the target signature being p(v_k).
        If v_k = v, replace v_k with v_{k-1} → u_1. Remove v_k.

Maintenance of Signature Graph
Illustration for (ii)
[Figure: the path v_1, …, v_{k-1}, v_k, v with the nodes u_1 and u_2; one part is labeled "To be removed"]

Example:
[Figure: the signature graph before and after the step "remove p_1"]

Maintenance of Signature Graph
Illustration for (iii)
[Figure: the path v_1, …, v_{k-1}, v_k, v with the nodes u_1 and u_2; one part is labeled "To be removed"]

Example:
[Figure: the signature graph before and after the step "remove p_8"]

Illustration for (iii)
[Figure: the path v_1, …, v_{k-1}, v with the node u_1; one part is labeled "To be removed"]

Example:
[Figure: the signature graph before and after the step "remove p_7"]

Summary and Future Work
- Signature and signature file
- Signature graph
    Construction of a signature graph
    Search of a signature graph
    Maintenance of a signature graph
- Future work: Apply signature techniques to the evaluation of path-oriented queries in document databases.