
Frequent Neighborhood Patterns: Mining Algorithms and Applications
Jialong Han
Doctoral thesis work, supervised by Prof. Ji-Rong Wen

Outline
• Background
• Frequent Neighborhood Patterns: Definitions
• Mining Algorithm
• Applications
  • Knowledge Discovery in Graphs
  • Within-Network Classification
  • Reverse Top-k Queries
• Conclusions
2020/11/27

Graphs
• Molecule Structure Databases [1]
• Social Networks [2]
• Web Graphs [3]
• Academic Networks [4]
• Knowledge Bases [5]

Graph Databases: Two Settings [KK 05]
• Graph-transaction setting
  • Core concept: transactions
  • Molecule structure databases
  • Properties of a transaction depend on its structure.
  • Frequent subgraph mining
  • Applications
• Single-graph setting
  • Social networks, web graphs, academic networks, knowledge bases, …
  • Core concept: nodes
  • Persons, web pages, papers, general entities, …

Frequent Patterns for Nodes (in the Single-Graph Setting)?
• Properties of a node depend on its surrounding structure.
  • Academic networks: an author citing his own paper
  • Social networks: a person with a son and a daughter
  • Within a molecule structure: a carbon atom appearing on a cycle of length 6
• Problems to be answered in this thesis
  1. Is there a class of frequent patterns characterizing the common surrounding structures of many nodes?
  2. If yes, can these frequent patterns support any node-related applications?
“Mining Frequent Neighborhood Patterns in a Large Labeled Graph”, CIKM’13

Problem Formulation
[Figure: a single-graph database and a neighborhood pattern (NP) with its pivot. NP shown: authors once citing their own papers]
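The example pattern on this slide can be made concrete with a small sketch. The Python below counts which nodes can play the pivot role of the pattern "an author who cites their own paper" in a toy labeled graph. The graph data, the label names (`writes`, `cites`), the requirement that distinct pattern variables map to distinct nodes, and the choice to report support as the set of matching pivot nodes are all assumptions made for illustration; the slide itself does not spell out the formal definition.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, edge label, target node)

# Toy single-graph database (hypothetical data).
NODE_LABELS: Dict[str, str] = {
    "alice": "author", "bob": "author", "carol": "author",
    "p1": "paper", "p2": "paper", "p3": "paper", "p4": "paper",
}
EDGES: Set[Edge] = {
    ("alice", "writes", "p1"), ("alice", "writes", "p2"),
    ("bob", "writes", "p3"), ("carol", "writes", "p4"),
    ("p2", "cites", "p1"),  # alice's paper cites another paper by alice
    ("p3", "cites", "p1"),  # bob's paper cites alice's paper
    ("p4", "cites", "p3"),
}

# Pattern: variables with node labels, labeled edges between variables,
# and one variable marked as the pivot.
PATTERN_LABELS: Dict[str, str] = {"a": "author", "x": "paper", "y": "paper"}
PATTERN_EDGES: List[Edge] = [
    ("a", "writes", "x"), ("a", "writes", "y"), ("x", "cites", "y"),
]
PIVOT = "a"


def has_match(assign: Dict[str, str]) -> bool:
    """Backtracking search: try to extend a partial assignment to a full match."""
    unassigned = [v for v in PATTERN_LABELS if v not in assign]
    if not unassigned:
        return all((assign[s], lab, assign[t]) in EDGES
                   for s, lab, t in PATTERN_EDGES)
    var = unassigned[0]
    for node, label in NODE_LABELS.items():
        # Skip nodes with the wrong label or already used
        # (distinct variables map to distinct nodes: an assumption).
        if label != PATTERN_LABELS[var] or node in assign.values():
            continue
        if has_match({**assign, var: node}):
            return True
    return False


def supporting_nodes(pivot: str) -> List[str]:
    """Nodes that can be mapped to the pivot in at least one match."""
    return sorted(n for n, lab in NODE_LABELS.items()
                  if lab == PATTERN_LABELS[pivot] and has_match({pivot: n}))


print(supporting_nodes(PIVOT))
```

Counting pivot nodes rather than whole subgraph embeddings is what makes the pattern node-centric: an author with many self-citations still counts once.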

Mining Algorithm FNM (Frequent Neighborhood Mining)
• The Apriori Framework

Building Block Theorem of FNM
• Path Patterns
[Figure: search space extended level by level from φ (Level 0) to Level 1 and beyond, comparing frequent subgraph mining with FNM and marking building blocks (BBs) versus non-BBs]
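As a rough illustration of level-wise growth in the Apriori style, the sketch below mines frequent path patterns only. It is not FNM itself. A pattern here is a sequence of edge labels followed outward from the pivot, its support is the number of nodes from which such a walk exists, and only frequent patterns are extended to the next level. The toy graph, the outgoing-edges-only restriction, and the walk-based matching are assumptions made for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy edge-labeled graph (hypothetical data): (source, label, target).
EDGES = [
    ("alice", "writes", "p1"), ("alice", "writes", "p2"),
    ("bob", "writes", "p3"), ("carol", "writes", "p4"),
    ("p2", "cites", "p1"), ("p3", "cites", "p1"), ("p4", "cites", "p3"),
]

OUT: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
for src, label, dst in EDGES:
    OUT[src].append((label, dst))
NODES = {n for s, _, t in EDGES for n in (s, t)}
LABELS = sorted({lab for _, lab, _ in EDGES})

Path = Tuple[str, ...]  # sequence of edge labels starting at the pivot


def mine_frequent_paths(min_support: int, max_len: int) -> Dict[Path, int]:
    """Level-wise mining of label paths; support = number of pivot nodes."""
    frequent: Dict[Path, int] = {}
    # Level 0: the empty pattern; every node is a pivot and its own endpoint.
    level: Dict[Path, Dict[str, Set[str]]] = {(): {n: {n} for n in NODES}}
    for _ in range(max_len):
        next_level: Dict[Path, Dict[str, Set[str]]] = {}
        for path, ends_by_pivot in level.items():
            for label in LABELS:
                extended: Dict[str, Set[str]] = {}
                for pivot, ends in ends_by_pivot.items():
                    new_ends = {d for e in ends
                                for lab, d in OUT[e] if lab == label}
                    if new_ends:
                        extended[pivot] = new_ends
                # Apriori pruning: infrequent paths are dropped and never
                # extended, since a longer path cannot gain pivot nodes.
                if len(extended) >= min_support:
                    next_level[path + (label,)] = extended
                    frequent[path + (label,)] = len(extended)
        level = next_level
    return frequent


print(mine_frequent_paths(min_support=2, max_len=3))
```

Keeping the endpoint sets per pivot avoids re-walking the graph from scratch at every level, at the cost of memory proportional to the number of frequent paths times the number of supporting pivots.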

Application 1: Knowledge Discovery in Single-Graphs
• Frequent neighborhood patterns
  • have easy-to-interpret semantics, and
  • help discover hidden knowledge in single-graphs.

Application 2: Within-Network Classification
“Within-Network Classification Using Radius-Constrained Neighborhood Patterns”, CIKM’14

Preliminary Results and Problems
Label ratio = 50%
Methods: RL-RW-Deg
#Feature: -, 906.2, 4804.1, 7370.7, 7978.6
F1: 0.804, 0.824, 0.836
Time(s): 79.6, 3.1, 18.3, 28.8, 31.4

Markov Assumption for WNC [MP 07]

BB Theorem of Radius-Constrained FNM (r-FNM)

Advantages of r-FNM

Application 3: Reverse Top-k Queries
• Knowledge bases
  • A single-graph database
  • Access interface: structured query languages
  • Hard for ordinary users to formulate queries
• Can we find the query using representative partial answers?
  • “Representative”
  • Persons born in Europe
Question: Which chess player was born and died in the same place? [6]
Query: SELECT ?uri WHERE { ?uri :type :ChessPlayer . ?uri :birthPlace ?place . ?uri :deathPlace ?place }
Complete answers: M. Botvinnik, P. Morphy, …
Representative partial answers: M. Botvinnik
“Discovering Neighborhood Pattern Queries by Sample Answers in Knowledge Base”, ICDE’16

Reverse Top-k Neighborhood Pattern Queries
SELECT ?uri WHERE { ?uri :type :ChessPlayer . ?uri :birthPlace ?place . ?uri :deathPlace ?place }
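To see why this query is a neighborhood pattern, note that `?uri` acts as the pivot and the shared variable `?place` forces the birth and death edges to end at the same node. The sketch below evaluates that shape over a handful of triples in plain Python. The triple set is toy data written for this example, not DBpedia content, and the helper names are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Toy knowledge base (illustrative data only).
TRIPLES: Set[Triple] = {
    ("P. Morphy", "type", "ChessPlayer"),
    ("P. Morphy", "birthPlace", "New Orleans"),
    ("P. Morphy", "deathPlace", "New Orleans"),
    ("E. Lasker", "type", "ChessPlayer"),
    ("E. Lasker", "birthPlace", "Berlinchen"),
    ("E. Lasker", "deathPlace", "New York"),
    ("G. Kasparov", "type", "ChessPlayer"),
    ("G. Kasparov", "birthPlace", "Baku"),
}


def objects(subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is in the toy KB."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def born_and_died_in_same_place() -> List[str]:
    """Bindings of ?uri: chess players whose two place edges share a target."""
    players = {s for s, p, o in TRIPLES if p == "type" and o == "ChessPlayer"}
    return sorted(u for u in players
                  if objects(u, "birthPlace") & objects(u, "deathPlace"))


print(born_and_died_in_same_place())
```

The set intersection plays the role of the join on `?place`; an entity with no `deathPlace` triple simply yields an empty intersection and is excluded.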

Refine Stage: Observations and Optimizations
• Entities: B. Obama, V. Putin, G. Kasparov, E. Lasker, M. Botvinnik, P. Morphy
• Persons born in Europe: V. Putin, G. Kasparov, E. Lasker, M. Botvinnik (Rank: 4)
• Chess players dying in their birth place: M. Botvinnik, P. Morphy (Rank: 1)
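The two ranks on this slide suggest how a reverse top-k check might work: a candidate query is kept only if every sample answer appears within its top k results. The sketch below assumes that the rank refers to the position of M. Botvinnik in each query's answer list and takes the list order from the slide as given, since the ranking function is not stated here. The function names are hypothetical.

```python
from typing import Dict, List, Optional

# Answer lists in the order shown on the slide (ranking function not given).
RANKED_ANSWERS: Dict[str, List[str]] = {
    "Persons born in Europe":
        ["V. Putin", "G. Kasparov", "E. Lasker", "M. Botvinnik"],
    "Chess players who died in their birth place":
        ["M. Botvinnik", "P. Morphy"],
}


def rank(query: str, entity: str) -> Optional[int]:
    """1-based position of entity in the query's answers, or None if absent."""
    answers = RANKED_ANSWERS[query]
    return answers.index(entity) + 1 if entity in answers else None


def reverse_top_k(samples: List[str], k: int) -> List[str]:
    """Queries that rank every sample answer within their top k."""
    kept = []
    for query in RANKED_ANSWERS:
        ranks = [rank(query, s) for s in samples]
        if all(r is not None and r <= k for r in ranks):
            kept.append(query)
    return kept


print(rank("Persons born in Europe", "M. Botvinnik"))
print(reverse_top_k(["M. Botvinnik"], k=1))
```

With k = 1 only the chess-player query survives, which matches the intuition that a sample answer is representative for a query when it ranks near the top of that query's results.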

Experiments
• DBpedia 3.9 knowledge base, 52 questions in the QALD-4-Task-1 dataset, allocated into 5 groups w.r.t. the shape of their ground truth query.
• Efficiency evaluation
  • Three optimizations: speedups of 1 to 2 orders of magnitude each.
• Effectiveness evaluation
  • Two examples are enough to narrow down the sets of returned queries.

Related Work
• Frequent subgraph mining
  • Graph-transaction setting [IWM 00, KK 04, YH 02]
  • Single-graph setting [KK 05, VGS 02, FB 07, BN 08]
• Within-network classification
  • Homophily-based [MP 03]
  • Neighborhood-structure-based [DK 09, NGK 13]
• Reverse queries
  • Reverse engineering SQL queries [TCP 09, ZEPS 13, SCC+14]
  • Reverse nearest neighbor queries [KM 00], reverse top-k queries [VDKN 10], reverse skyline queries [DS 07]

Conclusions
• We proposed a new class of node patterns in the single-graph setting: Frequent Neighborhood Patterns.
  • Algorithmic challenge: non-trivial building blocks
• We discussed three applications of frequent neighborhood patterns.
  • Knowledge discovery, within-network classification, and reverse top-k queries
• Future work: other node-centric applications in single-graph databases

Setting | Patterns | Designed for | Frequent Pattern Discovery | Classification | Reverse Queries | Indexing
Graph-transaction | Subgraph patterns | Transactions | [IWM 00, KK 04, YH 02] | [DKK 03] | - | [YYH 04]
Single-graph | Subgraph patterns | Subgraphs | [KK 05, VGS 02, FB 07, BN 08] | - | - | -
Single-graph | Neighborhood patterns | Nodes | √ | √ | √ | Future work

Thank you! Q&A

References
[1] Picture is from http://icep.wikispaces.com/2D+chemical+database+searching+systems
[2] Picture is from http://7.mshcdn.com/wp-content/uploads/2012/09/social-graph-640.jpeg
[3] Picture is from http://www.analiticaweb.es/wp-content/uploads/2009/09/google.page.rank.explained.jpg
[4] Picture is from http://pages.cs.wisc.edu/~lixiujun/samples/social/dblp
[5] Picture is from http://resources.mpi-inf.mpg.de/yago-naga/yago/img/yago-graph.png
[6] Picture is from http://upload.chinaz.com/upimg/allimg/091020/1718320.gif
[AIS 93] Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining association rules between sets of items in large databases. In SIGMOD Conference, pages 207–216, 1993.
[BN 08] Björn Bringmann and Siegfried Nijssen. What is frequent in a single graph? In PAKDD, pages 858–863, 2008.
[DK 09] Christian Desrosiers and George Karypis. Within-network classification using local structure similarity. In ECML/PKDD (1), pages 260–275, 2009.
[DKK 03] Mukund Deshpande, Michihiro Kuramochi, and George Karypis. Frequent sub-structure-based approaches for classifying chemical compounds. In ICDM, pages 35–42, 2003.

References (cont.)
[DS 07] Evangelos Dellis and Bernhard Seeger. Efficient computation of reverse skyline queries. In Proceedings of the 33rd International Conference on Very Large Data Bases, pages 291–302. VLDB Endowment, 2007.
[FB 07] Mathias Fiedler and Christian Borgelt. Subgraph support in a single large graph. In Data Mining Workshops, 2007. ICDM Workshops 2007, pages 399–404. IEEE, 2007.
[IWM 00] Akihiro Inokuchi, Takashi Washio, and Hiroshi Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD, pages 13–23, 2000.
[KK 04] Michihiro Kuramochi and George Karypis. An efficient algorithm for discovering frequent subgraphs. Knowledge and Data Engineering, 16(9):1038–1051, 2004.
[KK 05] Michihiro Kuramochi and George Karypis. Finding frequent patterns in a large sparse graph. Data Min. Knowl. Discov., 11(3):243–271, 2005.
[KM 00] Flip Korn and S. Muthukrishnan. Influence sets based on reverse nearest neighbor queries. In ACM SIGMOD Record, volume 29, pages 201–212. ACM, 2000.
[MP 03] Sofus A. Macskassy and Foster Provost. A simple relational classifier. In Proc. of the 2nd Workshop on Multi-Relational Data Mining (MRDM) at KDD, pages 64–76, 2003.
[MP 07] Sofus A. Macskassy and Foster J. Provost. Classification in networked data: A toolkit and a univariate case study. Journal of Machine Learning Research, 8:935–983, 2007.

References (cont.)
[NGK 13] Marion Neumann, Roman Garnett, and Kristian Kersting. Coinciding walk kernels: Parallel absorbing random walks for learning with graphs and few labels. In Asian Conference on Machine Learning, pages 357–372, 2013.
[SCC+14] Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, and Lev Novik. Discovering queries based on example tuples. In SIGMOD, 2014.
[TCP 09] Quoc Trung Tran, Chee-Yong Chan, and Srinivasan Parthasarathy. Query by output. In SIGMOD, 2009.
[VDKN 10] Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, and Kjetil Norvag. Reverse top-k queries. In ICDE, 2010.
[VGS 02] Natalia Vanetik, Ehud Gudes, and Solomon Eyal Shimony. Computing frequent graph patterns from semistructured data. In ICDM, pages 458–465, 2002.
[YH 02] Xifeng Yan and Jiawei Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721–724, 2002.
[YYH 04] Xifeng Yan, Philip S. Yu, and Jiawei Han. Graph indexing: A frequent structure-based approach. In SIGMOD Conference, pages 335–346, 2004.
[ZEPS 13] Meihui Zhang, Hazem Elmeleegy, Cecilia M. Procopiuc, and Divesh Srivastava. Reverse engineering complex join queries. In SIGMOD, 2013.