# gApprox: Mining Frequent Approximate Patterns from a Massive Network

- Slides: 27

gApprox: Mining Frequent Approximate Patterns from a Massive Network

Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han [ICDM 2007]

Reporter: Che-Wei Liang, 10/16

## Outline

- Introduction
- Problem Formulation
- Algorithm
  - Pattern Space Exploration
  - Support Counting
- Experiment
- Conclusions

## Introduction

- A set of graphs vs. a single network
- Many recent applications produce graphs of massive size and complex structure:
  - biological networks, social networks, the Web;
  - these demand powerful data mining methods.
- We are now interested in patterns that appear frequently at many different places within a single network.

## Introduction

- Example: a Protein-Protein Interaction (PPI) network
- Δ = degree of approximation = 5

## Two major complications

1. Mining frequent patterns in a single network:
   - partition the network into regions,
   - each containing one occurrence of the pattern.
2. Because of inherent noise and data diversity, approximations must be taken into account so that all potentially interesting patterns can be captured.

## Problem Formulation

## Approximate Pattern Occurrences

- An injective function $m: V_P \to V_G$ maps each vertex $v \in V_P$ to a vertex $m(v) \in V_G$.
- Quantify the degree of approximation that $m$ incurs; approximations can only happen within the matchable list.
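The two constraints above can be illustrated with a minimal Python sketch. It assumes toy inputs: a pattern and a network given as dicts from vertex to label, and a hypothetical `matchable` dict that maps each label to the labels it may be approximated by. An exact label match is assumed to always be allowed. The function checks that `m` covers every pattern vertex, that it is injective, and that every label substitution stays within the matchable list. The degree it returns, one unit per substituted label, is only an illustrative stand-in and is not the approximation measure defined in gApprox.

```python
from collections.abc import Hashable
from typing import Dict, Optional, Set


def approximation_degree(
    pattern_labels: Dict[Hashable, str],
    network_labels: Dict[Hashable, str],
    matchable: Dict[str, Set[str]],
    m: Dict[Hashable, Hashable],
) -> Optional[int]:
    """Return a hypothetical approximation degree for m, or None if m is not allowed.

    m must be defined on every pattern vertex, be injective, and send each
    pattern vertex to a network vertex whose label equals the pattern label or
    appears in that label's matchable list.
    """
    if set(m) != set(pattern_labels):
        return None  # m must map every pattern vertex
    if len(set(m.values())) != len(m):
        return None  # not injective: two pattern vertices share an image
    degree = 0
    for v, image in m.items():
        if image not in network_labels:
            return None  # image is not a network vertex
        wanted, found = pattern_labels[v], network_labels[image]
        if found == wanted:
            continue  # exact label match costs nothing
        if found not in matchable.get(wanted, set()):
            return None  # approximation outside the matchable list
        degree += 1  # hypothetical cost: one unit per substituted label
    return degree


# Hypothetical toy data: label "B" may be approximated by "B'".
pattern = {"a": "A", "b": "B"}
network = {1: "A", 2: "B'", 3: "C"}
matchable = {"B": {"B'"}}
print(approximation_degree(pattern, network, matchable, {"a": 1, "b": 2}))  # 1
print(approximation_degree(pattern, network, matchable, {"a": 1, "b": 3}))  # None
```

Returning `None` keeps "not an approximate occurrence at all" distinct from "an occurrence with degree 0".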

## Pattern Support with Approximation

## Algorithm

- Two major issues:
  1. Pattern space exploration
  2. Support counting:
     - enumerate the approximate occurrences of each pattern in the network;
     - determine the maximum number of disjoint occurrences.

## Pattern Space Exploration

- Decompose the pattern space (the vertices of G are numbered 1, 2, …):
  - find all connected vertex sets in G that contain vertex 1;
  - remove vertex 1 from G, and find all connected vertex sets in the new graph G′ that contain vertex 2;
  - and so on.
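The decomposition can be checked by brute force on a tiny graph. This Python sketch uses a hypothetical adjacency-dict graph whose vertices are orderable numbers. It assigns every connected vertex set to the stage of its smallest vertex. A set whose smallest vertex is i contains i and none of the vertices removed before stage i, so that set is produced in stage i and only there. Enumerating all subsets takes exponential time, so the code is meant only to illustrate the partition and is not a substitute for Explore().

```python
from itertools import combinations


def is_connected(graph, vertices):
    """Return True if `vertices` induces a connected subgraph of `graph`."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:  # depth-first search restricted to `vertices`
        v = stack.pop()
        for u in graph[v]:
            if u in vertices and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == vertices


def decompose(graph):
    """Map each vertex i to the connected vertex sets produced in stage i.

    Stage i covers the sets that contain i once vertices smaller than i have
    been removed, i.e. the connected sets whose smallest vertex is i.
    Brute force over every subset: tiny graphs only.
    """
    order = sorted(graph)
    stages = {v: [] for v in order}
    for size in range(1, len(order) + 1):
        for subset in combinations(order, size):
            if is_connected(graph, subset):
                stages[min(subset)].append(frozenset(subset))
    return stages


# Hypothetical graph: vertex 1 is adjacent to 2, 5 and 6; there is also an edge 2-3.
graph = {1: [2, 5, 6], 2: [1, 3], 3: [2], 5: [1], 6: [1]}
stage_sizes = {v: len(sets) for v, sets in decompose(graph).items()}
print(stage_sizes)  # {1: 12, 2: 2, 3: 1, 5: 1, 6: 1}
```

Stage 1 does most of the work. Each later stage runs on a smaller graph, and no set is counted in two stages.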

## Pattern Space Exploration

- Example: generating all connected vertex sets that start from vertex 1.
  - Stage 1. Start from vertex 1 and mark it.
  - Stage 2. Expand from 1 to reach 2, 5, and 6, and mark them. This stage produces seven connected vertex sets in total: {1, 2}, {1, 5}, {1, 6}, {1, 2, 5}, {1, 2, 6}, {1, 5, 6}, {1, 2, 5, 6}.
  - Stage 3. Continue the expansion from each of the seven connected vertex sets produced in stage 2.
  - Stage 4. Stop when no unmarked vertices remain.
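The staged expansion can be sketched in Python as follows. This is a reconstruction from the example above, not the slides' Algorithm 1, and the graph and all function names are hypothetical. At each stage, the sketch takes the unmarked neighbours of the vertices added in the previous stage and marks all of them. It then extends the current set by every non-empty subset of those neighbours. Marking candidates that were not chosen is what stops any set from being generated twice.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def explore_from(graph, root, allowed):
    """Yield each connected vertex set inside `allowed` that contains `root`, once."""

    def expand(current, last_added, marked):
        # Next-stage candidates: allowed, unmarked neighbours of the vertices
        # added in the previous stage.
        candidates = {u for v in last_added for u in graph[v]
                      if u in allowed and u not in marked}
        if not candidates:
            return  # no unmarked vertex is reachable, so this branch stops
        # Mark every candidate, chosen or not; an unchosen one can never be added later.
        marked = marked | candidates
        for subset in nonempty_subsets(candidates):
            extended = current | subset
            yield extended
            yield from expand(extended, subset, marked)

    start = frozenset([root])
    yield start  # stage 1: the root alone, marked
    yield from expand(start, start, start)


def explore(graph):
    """Stage i: connected sets containing i after all vertices smaller than i are removed."""
    for root in sorted(graph):
        allowed = frozenset(v for v in graph if v >= root)
        yield from explore_from(graph, root, allowed)


# Hypothetical graph: vertex 1 is adjacent to 2, 5 and 6; there is also an edge 2-3.
graph = {1: [2, 5, 6], 2: [1, 3], 3: [2], 5: [1], 6: [1]}
from_vertex_1 = list(explore_from(graph, 1, frozenset(graph)))
all_sets = list(explore(graph))
print(len(from_vertex_1), len(all_sets))  # 12 17
```

Here, stage 2 from vertex 1 produces exactly the seven sets listed in the example. The edge 2-3 then lets stage 3 add vertex 3 to the four sets that contain 2. This gives 12 sets containing vertex 1 and 17 connected sets overall, with no duplicates.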

## Theorem 1

Explore() in Algorithm 1 is both complete and redundancy-free; i.e., given a network G, it:

1. generates only connected vertex sets in G;
2. generates all connected vertex sets in G;
3. never generates the same connected vertex set more than once.

## Support Counting

- A pattern P's support is defined as the maximum number of "disjoint" occurrences that can be chosen from P's approximate occurrences in the network.
  - Computing it is a maximum independent set problem, which is NP-complete.
- Algorithm 2 provides an upper bound on the support.
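The support definition can be illustrated with a small Python sketch. It assumes that two occurrences are "disjoint" when they share no network vertex, with each occurrence given as a set of network vertices. The exact support is then the size of a maximum independent set in the overlap graph of the occurrences, which the sketch finds by brute force (exponential time, toy inputs only). The upper bound shown is a simple vertex-counting bound swapped in for illustration and is not the slides' Algorithm 2. Its reasoning: k pairwise disjoint occurrences of a pattern with p vertices cover k·p distinct network vertices, so k cannot exceed ⌊(number of covered vertices) / p⌋.

```python
from itertools import combinations


def overlaps(a, b):
    # Assumption: two occurrences conflict when they share a network vertex.
    return not set(a).isdisjoint(b)


def max_disjoint_occurrences(occurrences):
    """Exact support: size of a maximum independent set in the overlap graph.

    Tries subsets from largest to smallest, so it runs in exponential time;
    fine for a handful of occurrences only.
    """
    occs = [frozenset(o) for o in occurrences]
    for size in range(len(occs), 0, -1):
        for chosen in combinations(occs, size):
            if not any(overlaps(a, b) for a, b in combinations(chosen, 2)):
                return size
    return 0


def vertex_count_upper_bound(occurrences, pattern_size):
    """Simple bound (not Algorithm 2): k disjoint occurrences need k * pattern_size vertices."""
    covered = set().union(*occurrences)
    return len(covered) // pattern_size


# Hypothetical occurrences of a 3-vertex pattern, given as sets of network vertices.
chain = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {8, 9, 10}]
star = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}]
print(max_disjoint_occurrences(chain), vertex_count_upper_bound(chain, 3))  # 3 3
print(max_disjoint_occurrences(star), vertex_count_upper_bound(star, 3))    # 1 2
```

In the `star` case, all three occurrences share vertex 1. The bound is loose there because it only counts vertices and ignores how the occurrences overlap.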

## gApprox

- gApprox combines pattern space exploration with support counting.
  - It adds a conditional branch at the 3rd line of the DFS_horizontal() function in Algorithm 1.

## Experiment

## Conclusions

- The paper proposes an approximation measure and shows its impact on mining:
  - a pattern's support is counted based on its approximate occurrences in the network.
- The techniques are general:
  - they can be applied to networks from other domains.
- The algorithm can be modified:
  - to reach bigger, more interesting patterns even faster,
  - at some sacrifice in the completeness of the mining results.
