Parallel Algorithms for general Galois lattices building
Fatma BAKLOUTI, Gérard LEVY
CERIA
fatma.baklouti@dauphine.fr, gerardlevy@dauphine.fr
Workshop WDAS 2003, 9/9/2020
Plan
- Knowledge Discovery in Databases (KDD)
- One tool for data mining: Galois Lattices
- Problems and solutions:
  - Row-sharing
  - Column-sharing
- Conclusion
Knowledge Discovery in Databases (KDD)
"Knowledge Discovery in Databases" (KDD), or "Data Mining" (DM): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information (knowledge) or patterns from data in large databases or other information repositories [Fayyad et al., 1996].
- DM emergence factors
  - Large database volumes: from gigabytes to terabytes
  - Customer relationship
- Example
  - Market basket analysis in mass retail
    - Which groups of products are frequently bought together by a customer during one visit to a shop? The answer guides the arrangement of products on shelves.
  - Example, milk and bread: when a customer buys milk, does he buy bread too?
- Various applications
  - Medicine, finance, distribution, telecommunications, …
- Fields of research
  [Diagram: KDD at the centre, surrounded by databases, statistics, human-computer interaction (HCI), information science, learning, etc.]
KDD General Process
[Diagram: data acquisition (text, picture, sound) → data preparation (selection, cleaning, integration, transformations, editing, construction of attributes) → table → data mining (description, structure, explanation) → model, concept → evaluation, simplification, editing → knowledge → knowledge management, knowledge database]
- Books:
  - Data Mining, Han & Kamber (Morgan Kaufmann Pubs, 2001)
  - Mastering Data Mining, Berry & Linoff (Wiley Computer Publishing, 2000)
  - …
- Interesting sites:
  - http://www.kddnuggets.com
  - http://www.crisp-dm.org : CRoss-Industry Standard Process for Data Mining, a standardization effort
  - …
Galois Lattices
- Using Galois lattices (a mathematical structure) to solve data mining problems.
- References:
  - Birkhoff's Lattice Theory: 1940, 1973
  - Barbut & Monjardet: 1970
  - Wille: 1982
  - Chein, Norris, Ganter, Bordat, …
  - Diday, Duquenne, …
  - Emilion, Lévy, Diday, Lambert
- Basic concepts: context, Galois connection, concept.
Galois Lattices - Definition
- Context = (O, A, I):
  - O: finite set of examples
  - A: finite set of attributes
  - I: binary relation between O and A (I ⊆ O × A)
- Example:
  [Table: binary context with objects O = {1, 2, 3} as rows and attributes A = {a, b, c} as columns; a 1 in cell (o, a) means (o, a) ∈ I]
Galois Lattices - Definition
- Galois connection
  - For all Oi ⊆ O and Ai ⊆ A, we define f and g as follows:
  - f: P(O) → P(A), f(Oi) = {a ∈ A / (o, a) ∈ I, ∀ o ∈ Oi} (intention)
  - g: P(A) → P(O), g(Ai) = {o ∈ O / (o, a) ∈ I, ∀ a ∈ Ai} (extension)
  - f and g are decreasing maps
  - h = g ∘ f and k = f ∘ g are:
    - Increasing: O1 ⊆ O2 ⇒ h(O1) ⊆ h(O2)
    - Extensive: O1 ⊆ h(O1)
    - Idempotent: h(O1) = h ∘ h(O1)
  - h and k are closure operators.
- (f, g) = Galois connection between P(O) and P(A)
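The definitions above can be sketched in a few lines of Python. This is only an illustration: the names `f`, `g`, `h` follow the slide, while the small context `CONTEXT` and the helper `subsets` are hypothetical and not taken from the presentation.

```python
from itertools import combinations

# Hypothetical context (O, A, I): each object maps to the attributes it has.
CONTEXT = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a", "b", "c"},
}
OBJECTS = frozenset(CONTEXT)
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def f(objs):
    """Intention: attributes shared by every object of objs."""
    result = set(ATTRIBUTES)
    for o in objs:
        result &= CONTEXT[o]
    return frozenset(result)


def g(attrs):
    """Extension: objects that have every attribute of attrs."""
    return frozenset(o for o in OBJECTS if set(attrs) <= CONTEXT[o])


def h(objs):
    """Closure operator h = g o f on sets of objects."""
    return g(f(objs))


def subsets(items):
    """All subsets of items, as frozensets (fine for tiny examples only)."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


# Check the stated properties on every subset of O.
for x in subsets(OBJECTS):
    assert x <= h(x)             # extensive
    assert h(h(x)) == h(x)       # idempotent
    for y in subsets(OBJECTS):
        if x <= y:
            assert h(x) <= h(y)  # increasing
            assert f(y) <= f(x)  # f is decreasing
```

The exhaustive check is exponential in the number of objects, so it only makes sense on toy contexts like this one.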
Galois Lattices - Definition
- Concept
  - For Oi ⊆ O and Ai ⊆ A,
  - (Oi, Ai) is a concept iff Oi is the extension of Ai and Ai is the intention of Oi, that is, Oi = g(Ai) and Ai = f(Oi)
  - L = {(Oi, Ai) ∈ P(O) × P(A) / Oi = g(Ai) and Ai = f(Oi)}: the set of concepts.
- L is ordered by the relation ≤:
  - (O1, A1) ≤ (O2, A2) iff O1 ⊆ O2 (or, equivalently, A2 ⊆ A1).
- Galois lattice
  - T = (L, ≤), the ordered set of concepts.
Galois Lattices - Definition
- Concept: example
  - O1 = {6, 7}, f(O1) = {a, c} (intention)
  - A1 = {a, c}, g(A1) = {1, 2, 3, 4, 6, 7} (extension)
  - Remark: h(O1) = g ∘ f(O1) = g(A1) ≠ O1
  - ({6, 7}, {a, c}) ∉ L
  - ({1, 2, 3, 4, 6, 7}, {a, c}) ∈ L, because h({1, 2, 3, 4, 6, 7}) = g ∘ f({1, 2, 3, 4, 6, 7}) = g({a, c}) = {1, 2, 3, 4, 6, 7}
[Table: binary context with objects 1 to 7 as rows and attributes a to h as columns]
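The example can be checked mechanically. In the sketch below, the rows of `CONTEXT` are an assumption: the table itself did not survive, so each object's attribute set is taken from the smallest lattice node containing that object on the lattice slide (for instance, object 4 from the node "124, abcdf"). The helper `is_concept` is a hypothetical name.

```python
# Assumed context: rows inferred from the lattice slide, not copied from the table.
CONTEXT = {
    1: set("abcdefg"),
    2: set("abcdefh"),
    3: set("abcdegh"),
    4: set("abcdf"),
    5: set("abdeg"),
    6: set("abceh"),
    7: set("acf"),
}
ALL_ATTRIBUTES = set("abcdefgh")


def f(objs):
    """Intention of a set of objects."""
    result = set(ALL_ATTRIBUTES)
    for o in objs:
        result &= CONTEXT[o]
    return frozenset(result)


def g(attrs):
    """Extension of a set of attributes."""
    return frozenset(o for o, row in CONTEXT.items() if set(attrs) <= row)


def is_concept(objs, attrs):
    """(objs, attrs) is a concept iff objs = g(attrs) and attrs = f(objs)."""
    return g(attrs) == frozenset(objs) and f(objs) == frozenset(attrs)


O1 = {6, 7}
A1 = f(O1)
print(sorted(A1))             # ['a', 'c']
print(sorted(g(A1)))          # [1, 2, 3, 4, 6, 7]
print(is_concept(O1, A1))     # False
print(is_concept(g(A1), A1))  # True
```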
[Graph of the Galois lattice of the example; each node is written "extension, intention"]
1234567, a
123467, ac
123456, ab
12345, abd
12346, abc
12356, abe
1247, acf
1234, abcd
1235, abde
1236, abce
124, abcdf
135, abdeg
123, abcde
236, abceh
12, abcdef
13, abcdeg
23, abcdeh
1, abcdefg
2, abcdefh
3, abcdegh
Ø, abcdefgh
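The 21 nodes can be reproduced with a naive enumeration: every intention is an intersection of object rows, so closing the full attribute set under intersection with each row yields all of them. This is a brute-force sketch, not one of the algorithms cited in the talk (Bordat, Ganter, …). The rows of `CONTEXT` are assumed, inferred from the node list above, and `all_concepts` is a hypothetical name.

```python
# Assumed context: rows inferred from the nodes of the lattice.
CONTEXT = {
    1: set("abcdefg"),
    2: set("abcdefh"),
    3: set("abcdegh"),
    4: set("abcdf"),
    5: set("abdeg"),
    6: set("abceh"),
    7: set("acf"),
}
ALL_ATTRIBUTES = frozenset("abcdefgh")


def g(attrs):
    """Extension of a set of attributes."""
    return frozenset(o for o, row in CONTEXT.items() if attrs <= row)


def all_concepts():
    """Close {A} under intersection with every row, then attach extensions."""
    intents = {ALL_ATTRIBUTES}
    for row in CONTEXT.values():
        intents |= {intent & row for intent in intents}
    concepts = [(g(intent), intent) for intent in intents]
    # Largest extensions first, as on the slide.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


concepts = all_concepts()
for extent, intent in concepts:
    label = "".join(str(o) for o in sorted(extent)) or "Ø"
    print(f"{label}, {''.join(sorted(intent))}")
print(len(concepts))  # 21
```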
Generalized Galois Lattices
- Context: <I, F, d>
- T = <F, ∧, ∨, ≤>
- Tj = <Fj, ∧j, ∨j, ≤j> for all j in J, J = [1, n]
- d: I → F
- d(i) = (di1, …, dij, …, din): description of individual i with respect to the attributes j of J.
- For X ⊆ I: f(X) = ∧ {d(i) : i ∈ X} (intention)
- For z ∈ F: g(z) = {i ∈ I | z ≤ d(i)} (extension)
[Table: individuals of I as rows (…, i, …, k, …), attributes 1, …, j, …, n as columns; row i holds di1, …, dij, …, din]
General Galois Lattice - Example
F = F1 × F2 × F3
- F1, Size: short < medium < high, coded 1 < 2 < 3
- F2, Weight: thin < fat, coded 0 < 1
- F3, Age: child < adolescent < adult, coded 1 < 2 < 3

Individuals I   Size (F1)   Weight (F2)   Age (F3)
Marc            2           0             1
Cedric          1           1             2
Céline          2           0             3
Carine          3           1             2

f({Cedric, Carine}) = (1, 1, 2)
g((1, 1, 2)) = {Cedric, Carine}
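A short sketch of the generalized f and g on this example, assuming the componentwise order on F1 × F2 × F3 and the numeric coding shown in the table. The helper names `leq` and `meet` and the constant `TOP` (the greatest element of F, used as the meet of the empty set) are illustrative choices.

```python
DESCRIPTIONS = {
    "Marc":   (2, 0, 1),
    "Cedric": (1, 1, 2),
    "Céline": (2, 0, 3),
    "Carine": (3, 1, 2),
}
TOP = (3, 1, 3)  # greatest element of F = F1 x F2 x F3 (high, fat, adult)


def leq(z, w):
    """Componentwise order on F."""
    return all(a <= b for a, b in zip(z, w))


def meet(z, w):
    """Componentwise minimum: the infimum in the product lattice."""
    return tuple(min(a, b) for a, b in zip(z, w))


def f(individuals):
    """Intention: meet of the descriptions of the given individuals."""
    z = TOP
    for i in individuals:
        z = meet(z, DESCRIPTIONS[i])
    return z


def g(z):
    """Extension: individuals whose description dominates z."""
    return {i for i, d in DESCRIPTIONS.items() if leq(z, d)}


print(f({"Cedric", "Carine"}))  # (1, 1, 2)
print(sorted(g((1, 1, 2))))     # ['Carine', 'Cedric']
```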
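[Graph of the general Galois lattice of the example; individuals are numbered 1 = Marc, 2 = Cedric, 3 = Céline, 4 = Carine, and each node is written "extension, intention"]
Ø, 313
3, 203
4, 312
34, 202
24, 112
134, 201
234, 102
1234, 101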
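The eight nodes can be recomputed by brute force: starting from the greatest element of F, take meets with each description until nothing new appears, then attach the extension of each intention. This is a naive sketch under the assumption that individuals are numbered 1 to 4 in table order; it is not one of the lattice-building algorithms named in the talk.

```python
# Individuals numbered in table order: 1 Marc, 2 Cedric, 3 Céline, 4 Carine.
DESCRIPTIONS = {1: (2, 0, 1), 2: (1, 1, 2), 3: (2, 0, 3), 4: (3, 1, 2)}
TOP = (3, 1, 3)  # greatest element of F, intention of the empty set


def meet(z, w):
    return tuple(min(a, b) for a, b in zip(z, w))


def g(z):
    """Extension of z: individuals i with z <= d(i) componentwise."""
    return frozenset(i for i, d in DESCRIPTIONS.items()
                     if all(a <= b for a, b in zip(z, d)))


def closed_pairs():
    """All pairs (X, z) with X = g(z) and z = f(X)."""
    intents = {TOP}
    for d in DESCRIPTIONS.values():
        intents |= {meet(z, d) for z in intents}
    return sorted(((g(z), z) for z in intents),
                  key=lambda p: (len(p[0]), sorted(p[0])))


pairs = closed_pairs()
for x, z in pairs:
    label = "".join(str(i) for i in sorted(x)) or "Ø"
    print(f"{label}, {''.join(str(v) for v in z)}")
print(len(pairs))  # 8
```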
Problems
- Large data volume:
  - Partition the data over different server nodes
  - Process in parallel, locally
  - Group the results on one (client) node
  - Post-process
- Our tool: SDDS (Scalable Distributed Data Structures)
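The four steps can be illustrated with a toy pipeline. This sketch does not use SDDS: threads from Python's standard `concurrent.futures` stand in for server nodes, the local job is simply the meet of a chunk of descriptions, and the data in `ROWS` is made up. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Made-up descriptions; any non-empty list of equal-length tuples works.
ROWS = [(3, 1, 2), (2, 2, 2), (2, 1, 4), (5, 1, 3), (2, 3, 2)]


def meet(z, w):
    return tuple(min(a, b) for a, b in zip(z, w))


def partition(rows, n_nodes):
    """Step 1: split the rows into at most n_nodes contiguous chunks."""
    size = -(-len(rows) // n_nodes)  # ceiling division
    return [rows[k:k + size] for k in range(0, len(rows), size)]


def local_process(chunk):
    """Step 2: work done on one node, here the meet of its own rows."""
    return reduce(meet, chunk)


def distributed_meet(rows, n_nodes=2):
    chunks = partition(rows, n_nodes)  # assumes rows is non-empty
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial = list(pool.map(local_process, chunks))  # step 3: gather
    return reduce(meet, partial)  # step 4: post-process on the client


print(distributed_meet(ROWS, n_nodes=2))  # (2, 1, 2)
```

The post-processing step is valid here because the meet is associative and commutative, so combining per-node results gives the same answer as a single pass over all rows.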
Solutions: column-sharing and row-sharing
[Figure: a context C split two ways. Row-sharing cuts C into blocks of rows; column-sharing cuts C into blocks of columns.]
Row-sharing
- Machine M1 holds C1 and builds T1 = GL(C1), with closed pairs (X1, z1).
- Machine M2 holds C2 and builds T2 = GL(C2), with closed pairs (X2, z2).
- Machine M combines them into T = GL(C):
  - X = X1 ∪ X2, z = z1 ∧ z2
  - Tests: g1(z) = g1(z1) = X1 ? g2(z) = g2(z2) = X2 ?
- gj(z) = {i ∈ Ij : z ≤ d(i)}, for j = 1, 2.
Example
Context C1 (rows 1 to 4 of C), T1 = GL(C1):
i   j=1  j=2  j=3
1   1    0    2
2   2    1    0
3   0    3    1
4   1    1    1

Context C2 (rows 5 to 7 of C), T2 = GL(C2):
i   j=1  j=2  j=3
5   0    1    3
6   0    0    2
7   2    0    0

Is T = GL(C) equal to the horizontal product of lattices T1 = GL(C1) and T2 = GL(C2)?
We apply an algorithm (here Bordat's algorithm) to contexts C1 and C2 to build the lattices T1 = GL(C1) and T2 = GL(C2), respectively.
[Figures: graph of lattice T1 = GL(C1); graph of lattice T2 = GL(C2)]
Total number of closed pairs (X, z) of lattice T1 = GL(C1): 12.
pair(1): X = {}, z = (2, 3, 3)
pair(2): X = {1}, z = (1, 0, 2)
pair(3): X = {2}, z = (2, 1, 0)
pair(4): X = {3}, z = (0, 3, 1)
pair(5): X = {4}, z = (1, 1, 1)
pair(6): X = {1, 4}, z = (1, 0, 1)
pair(7): X = {2, 4}, z = (1, 1, 0)
pair(8): X = {3, 4}, z = (0, 1, 1)
pair(9): X = {1, 2, 4}, z = (1, 0, 0)
pair(10): X = {1, 3, 4}, z = (0, 0, 1)
pair(11): X = {2, 3, 4}, z = (0, 1, 0)
pair(12): X = {1, 2, 3, 4}, z = (0, 0, 0)

Total number of closed pairs (X, z) of lattice T2 = GL(C2): 5.
pair(1): X = {}, z = (2, 3, 3)
pair(2): X = {5}, z = (0, 1, 3)
pair(3): X = {7}, z = (2, 0, 0)
pair(4): X = {5, 6}, z = (0, 0, 2)
pair(5): X = {5, 6, 7}, z = (0, 0, 0)
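Both local lattices can be recomputed with a brute-force enumeration. This sketch is not Bordat's algorithm: it simply closes the greatest element of F under meets with the rows of each sub-context. The value `TOP = (2, 3, 3)` is taken from pair(1); the function names are hypothetical.

```python
C1 = {1: (1, 0, 2), 2: (2, 1, 0), 3: (0, 3, 1), 4: (1, 1, 1)}
C2 = {5: (0, 1, 3), 6: (0, 0, 2), 7: (2, 0, 0)}
TOP = (2, 3, 3)  # greatest element of F, as in pair(1)


def meet(z, w):
    return tuple(min(a, b) for a, b in zip(z, w))


def extent(context, z):
    """g_j(z): rows of this sub-context whose description dominates z."""
    return frozenset(i for i, d in context.items()
                     if all(a <= b for a, b in zip(z, d)))


def closed_pairs(context):
    """Naive enumeration of the closed pairs (X, z) of one sub-context."""
    intents = {TOP}
    for d in context.values():
        intents |= {meet(z, d) for z in intents}
    pairs = [(extent(context, z), z) for z in intents]
    return sorted(pairs, key=lambda p: (len(p[0]), sorted(p[0])))


T1 = closed_pairs(C1)
T2 = closed_pairs(C2)
print(len(T1), len(T2))  # 12 5
for x, z in T2:
    print(sorted(x), z)
```

On C2 the row of individual 6, (0, 0, 2), is dominated by the row of individual 5, so the singleton {6} is not closed, whereas {7} is closed with z = (2, 0, 0).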
Horizontal product of lattices T1 = GL(C1) and T2 = GL(C2)
X = X1 ∪ X2, z = z1 ∧ z2
[Table: one row per closed pair (X1, z1) of T1 and one column per closed pair (X2, z2) of T2, the columns being ({}, (2, 3, 3)), ({5}, (0, 1, 3)), ({5, 6}, (0, 0, 2)), ({5, 6, 7}, (0, 0, 0)) and ({7}, (2, 0, 0)). Each cell holds the pair X = X1 ∪ X2, z = z1 ∧ z2; for example, ({1}, (1, 0, 2)) combined with ({5}, (0, 1, 3)) gives ({1, 5}, (0, 0, 2)).]
We apply Bordat's algorithm to the full context C.
[Figure: graph of lattice T = GL(C)]
Total number of closed pairs (X, z) of T = GL(C): 15.
pair(1): X = {}, z = (2, 3, 3)
pair(2): X = {1}, z = (1, 0, 2)
pair(3): X = {2}, z = (2, 1, 0)
pair(4): X = {3}, z = (0, 3, 1)
pair(5): X = {4}, z = (1, 1, 1)
pair(6): X = {5}, z = (0, 1, 3)
pair(7): X = {1, 4}, z = (1, 0, 1)
pair(8): X = {1, 5, 6}, z = (0, 0, 2)
pair(9): X = {2, 4}, z = (1, 1, 0)
pair(10): X = {2, 7}, z = (2, 0, 0)
pair(11): X = {3, 4, 5}, z = (0, 1, 1)
pair(12): X = {1, 2, 4, 7}, z = (1, 0, 0)
pair(13): X = {1, 3, 4, 5, 6}, z = (0, 0, 1)
pair(14): X = {2, 3, 4, 5}, z = (0, 1, 0)
pair(15): X = {1, 2, 3, 4, 5, 6, 7}, z = (0, 0, 0)

T = GL(C) is the horizontal product of lattices T1 = GL(C1) and T2 = GL(C2).
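The row-sharing merge can be sketched and checked against a direct computation on C. The filter is the test from the row-sharing slide: a candidate pair (X1 ∪ X2, z1 ∧ z2) is kept when g1(z) = X1 and g2(z) = X2. The enumeration of local lattices is brute force rather than Bordat's algorithm, and the function names are hypothetical.

```python
C1 = {1: (1, 0, 2), 2: (2, 1, 0), 3: (0, 3, 1), 4: (1, 1, 1)}
C2 = {5: (0, 1, 3), 6: (0, 0, 2), 7: (2, 0, 0)}
TOP = (2, 3, 3)  # greatest element of F, as in pair(1)


def meet(z, w):
    return tuple(min(a, b) for a, b in zip(z, w))


def extent(context, z):
    """g(z) restricted to one context: rows whose description dominates z."""
    return frozenset(i for i, d in context.items()
                     if all(a <= b for a, b in zip(z, d)))


def closed_pairs(context):
    """Naive enumeration: close {TOP} under meets with every row."""
    intents = {TOP}
    for d in context.values():
        intents |= {meet(z, d) for z in intents}
    return {(extent(context, z), z) for z in intents}


def horizontal_product(c1, c2):
    """Combine the two local lattices and keep the pairs closed in C."""
    kept = set()
    for x1, z1 in closed_pairs(c1):
        for x2, z2 in closed_pairs(c2):
            z = meet(z1, z2)
            # Slide test: g1(z) = X1 and g2(z) = X2.
            if extent(c1, z) == x1 and extent(c2, z) == x2:
                kept.add((x1 | x2, z))
    return kept


merged = horizontal_product(C1, C2)
direct = closed_pairs({**C1, **C2})  # lattice built on the full context C
print(len(merged), merged == direct)  # 15 True
```

Of the 12 × 5 = 60 candidate pairs, 15 pass the test, and they coincide with the closed pairs listed for T = GL(C).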
Column-sharing
- Machine M1 holds C1 and builds T1 = GL(C1), with closed pairs (X1, z1).
- Machine M2 holds C2 and builds T2 = GL(C2), with closed pairs (X2, z2).
- Machine M combines them into T = GL(C):
  - X = X1 ∩ X2, z = (z1, z2)
  - Tests: f1(X) = z1 ? f2(X) = z2 ?
- f1(X) = ∧ {d1(i) : i ∈ X}, and f2(X) = ∧ {d2(i) : i ∈ X}.
Example
Context C1 (columns 1 and 2 of C), T1 = GL(C1), and context C2 (column 3 of C), T2 = GL(C2):
i   C1: j=1  j=2   C2: j=3
1       1    0         2
2       2    1         0
3       0    3         1
4       1    1         1
5       0    1         3
6       0    0         2
7       2    0         0

Is T = GL(C) equal to the vertical product of lattices T1 = GL(C1) and T2 = GL(C2)?
[Figures: graph of lattice T1 = GL(C1); graph of lattice T2 = GL(C2)]
Total number of closed pairs (X, z) of lattice T1 = GL(C1): 8.
pair(1): X = {}, z = (2, 3)
pair(2): X = {2}, z = (2, 1)
pair(3): X = {3}, z = (0, 3)
pair(4): X = {2, 4}, z = (1, 1)
pair(5): X = {2, 7}, z = (2, 0)
pair(6): X = {2, 3, 4, 5}, z = (0, 1)
pair(7): X = {1, 2, 4, 7}, z = (1, 0)
pair(8): X = {1, 2, 3, 4, 5, 6, 7}, z = (0, 0)

Total number of closed pairs (X, z) of lattice T2 = GL(C2): 4.
pair(1): X = {5}, z = (3)
pair(2): X = {1, 5, 6}, z = (2)
pair(3): X = {1, 3, 4, 5, 6}, z = (1)
pair(4): X = {1, 2, 3, 4, 5, 6, 7}, z = (0)
Vertical product of lattices T1 = GL(C1) and T2 = GL(C2)
X = X1 ∩ X2, z = (z1, z2)

Columns (X2, z2), in order: ({5}, (3)); ({1, 5, 6}, (2)); ({1, 3, 4, 5, 6}, (1)); ([1..7], (0)).
Each row below starts with (X1, z1) and then lists the four cells "X, z".

X1 = {}, z1 = (2, 3):            {}, (2, 3, 3) | {}, (2, 3, 2) | {}, (2, 3, 1) | {}, (2, 3, 0)
X1 = {2}, z1 = (2, 1):           {}, (2, 1, 3) | {}, (2, 1, 2) | {}, (2, 1, 1) | {2}, (2, 1, 0)
X1 = {3}, z1 = (0, 3):           {}, (0, 3, 3) | {}, (0, 3, 2) | {3}, (0, 3, 1) | {3}, (0, 3, 0)
X1 = {2, 4}, z1 = (1, 1):        {}, (1, 1, 3) | {}, (1, 1, 2) | {4}, (1, 1, 1) | {2, 4}, (1, 1, 0)
X1 = {2, 7}, z1 = (2, 0):        {}, (2, 0, 3) | {}, (2, 0, 2) | {}, (2, 0, 1) | {2, 7}, (2, 0, 0)
X1 = {2, 3, 4, 5}, z1 = (0, 1):  {5}, (0, 1, 3) | {5}, (0, 1, 2) | {3, 4, 5}, (0, 1, 1) | {2, 3, 4, 5}, (0, 1, 0)
X1 = {1, 2, 4, 7}, z1 = (1, 0):  {}, (1, 0, 3) | {1}, (1, 0, 2) | {1, 4}, (1, 0, 1) | {1, 2, 4, 7}, (1, 0, 0)
X1 = [1..7], z1 = (0, 0):        {5}, (0, 0, 3) | {1, 5, 6}, (0, 0, 2) | {1, 3, 4, 5, 6}, (0, 0, 1) | [1..7], (0, 0, 0)

T = GL(C) is the vertical product of lattices T1 = GL(C1) and T2 = GL(C2).
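The column-sharing merge can be sketched in the same spirit and checked against the lattice of the full context. A candidate pair (X1 ∩ X2, (z1, z2)) is kept when f1(X) = z1 and f2(X) = z2, which is the test from the column-sharing slide. The local lattices are enumerated by brute force, the greatest elements `TOP1` and `TOP2` are read off the pair lists, and the function names are hypothetical.

```python
C = {1: (1, 0, 2), 2: (2, 1, 0), 3: (0, 3, 1), 4: (1, 1, 1),
     5: (0, 1, 3), 6: (0, 0, 2), 7: (2, 0, 0)}
C1 = {i: d[:2] for i, d in C.items()}  # columns 1 and 2
C2 = {i: d[2:] for i, d in C.items()}  # column 3
TOP1, TOP2 = (2, 3), (3,)              # greatest elements of F1 x F2 and F3


def meet(z, w):
    return tuple(min(a, b) for a, b in zip(z, w))


def intent(context, top, xs):
    """f_j(X): meet of the descriptions of the individuals in X."""
    z = top
    for i in xs:
        z = meet(z, context[i])
    return z


def extent(context, z):
    """g_j(z): individuals whose description dominates z."""
    return frozenset(i for i, d in context.items()
                     if all(a <= b for a, b in zip(z, d)))


def closed_pairs(context, top):
    """Naive enumeration: close {top} under meets with every row."""
    intents = {top}
    for d in context.values():
        intents |= {meet(z, d) for z in intents}
    return {(extent(context, z), z) for z in intents}


def vertical_product(c1, top1, c2, top2):
    """Combine the two local lattices and keep the pairs closed in C."""
    kept = set()
    for x1, z1 in closed_pairs(c1, top1):
        for x2, z2 in closed_pairs(c2, top2):
            x = x1 & x2
            # Slide test: f1(X) = z1 and f2(X) = z2.
            if intent(c1, top1, x) == z1 and intent(c2, top2, x) == z2:
                kept.add((x, z1 + z2))  # z = (z1, z2), tuples concatenated
    return kept


merged = vertical_product(C1, TOP1, C2, TOP2)
direct = closed_pairs(C, TOP1 + TOP2)
print(len(merged), merged == direct)  # 15 True
```

Of the 8 × 4 = 32 cells of the table, 15 pass the test, matching the 15 closed pairs of T = GL(C) obtained in the row-sharing example.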
Conclusion
- Generalized Galois lattices.
- The problem of large databases can perhaps be addressed in this way.
- Splitting the context into sub-contexts (by rows or by columns).
- Possibility of building different architectures for networks of workstations.
Thank you for your attention
Fatma Baklouti, Gérard LEVY
fatma.baklouti@dauphine.fr, gerardlevy@dauphine.fr