Neural networks for processing heterogeneous graphs
Federica Baccini
Department of Computer Science, University of Pisa
Institute for Informatics and Telematics of CNR, Pisa
HOMOGENEOUS GRAPH: models a single relation among objects of a single type.
HETEROGENEOUS GRAPH: allows multiple relation types and multiple vertex types.
Heterogeneous graph: formal definition
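The formal definition on this slide did not survive extraction. One widely used formulation in the heterogeneous information network literature is the following. A graph G = (V, E) comes with a node-type mapping V → A and an edge-type mapping E → R, and it is called heterogeneous when |A| > 1 or |R| > 1. The Python sketch below encodes that formulation. The class name `HeteroGraph` and its methods are hypothetical illustrations and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Hypothetical container for G = (V, E) with a node-type map and an edge-type map."""

    node_type: dict = field(default_factory=dict)  # node -> node type (map V -> A)
    edges: set = field(default_factory=set)  # (source, relation type, target) triples (map E -> R)

    def add_node(self, node: str, ntype: str) -> None:
        self.node_type[node] = ntype

    def add_edge(self, source: str, relation: str, target: str) -> None:
        # Both endpoints must already have a type assigned.
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("add both endpoints before the edge")
        self.edges.add((source, relation, target))

    def node_types(self) -> set:
        return set(self.node_type.values())

    def relation_types(self) -> set:
        return {relation for _, relation, _ in self.edges}

    def is_heterogeneous(self) -> bool:
        # Heterogeneous: more than one node type or more than one relation type.
        return len(self.node_types()) > 1 or len(self.relation_types()) > 1


# Toy bibliographic graph: authors write papers, papers are published in venues.
g = HeteroGraph()
for node, ntype in [("a1", "author"), ("p1", "paper"), ("v1", "venue")]:
    g.add_node(node, ntype)
g.add_edge("a1", "writes", "p1")
g.add_edge("p1", "published_in", "v1")
print(g.is_heterogeneous())  # True
```

Storing edges as typed triples lets two nodes be linked by several relations at once, which a plain (source, target) key would not allow.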
Homogeneous vs heterogeneous graphs
Homogeneous graphs:
• one type of information in the nodes;
• one type of relationship modeled by the edges;
• relatively easy to define a similarity between pairs of nodes;
• neural networks and deep learning are effective at graph embedding, link prediction, node classification…
Heterogeneous graphs:
• multiple types of information in the nodes;
• multiple types of relationships modeled by the edges;
• difficult to define a similarity between pairs of nodes;
• only a few graph neural network models have been designed to perform the same tasks on heterogeneous graphs.
Some perspectives in heterogeneous graph processing
• Discovering useful theoretical properties of heterogeneous graphs in order to design appropriate models;
• using the acquired knowledge to extend GNN models to the heterogeneous case;
• applying and testing newly devised architectures in a variety of application domains.
A central problem: similarity search
• It is the starting point for many spectral-based methods (e.g. modularity);
• many existing methods for similarity search are designed for homogeneous graphs;
• IDEA: two objects are similar if they are referenced by similar objects.
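The idea that "two objects are similar if they are referenced by similar objects" is the intuition behind SimRank (Jeh and Widom, 2002). The slide does not name this method. The sketch below shows the naive SimRank iteration on a small directed graph. The decay constant c = 0.8 and the iteration count are illustrative assumptions. Each iteration costs on the order of n² · d² operations for n nodes of in-degree d, so this form only suits small, homogeneous graphs, which matches the limitation noted in the bullet list.

```python
def simrank(in_neighbors, c=0.8, iters=10):
    """Naive SimRank iteration (c and iters are illustrative defaults, not from the slides).

    in_neighbors maps every node to the list of nodes that point to it.
    s(a, a) = 1; s(a, b) = c / (|I(a)| |I(b)|) * sum of s(i, j) over
    i in I(a), j in I(b); s(a, b) = 0 if either node has no in-neighbours.
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                ia, ib = in_neighbors[a], in_neighbors[b]
                if a == b:
                    new[(a, b)] = 1.0
                elif not ia or not ib:
                    new[(a, b)] = 0.0
                else:
                    total = sum(sim[(i, j)] for i in ia for j in ib)
                    new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph: p1 and p2 are both cited by the same survey; p3 is never cited.
cited_by = {"survey": [], "p1": ["survey"], "p2": ["survey"], "p3": []}
s = simrank(cited_by)
print(s[("p1", "p2")], s[("p1", "p3")])  # 0.8 0.0
```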
An example for heterogeneous graphs (HETGs): PathSim
• PathSim defines the similarity between objects of the same type on the basis of a symmetric meta-path.
Steps: definition of the network schema; definition of a similarity semantics; definition of PathSim.
ADVANTAGE: simple and intuitive definition of similarity based on multiple relation types.
DISADVANTAGE: the semantics is defined by the user, and the computation is expensive.
Yizhou Sun et al. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. In: Proceedings of the VLDB Endowment (2011).
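For a symmetric meta-path P, PathSim scores two objects x and y as follows: s(x, y) = 2 · (number of path instances from x to y following P) / ((number from x to x) + (number from y to y)). For the meta-path author-paper-author with author-paper incidence matrix W, these counts are the entries of the commuting matrix M = W Wᵀ. The Python sketch below computes this on a hypothetical toy matrix. The function names and data are illustrative assumptions.

```python
def commuting_matrix(w):
    """M = W W^T for the symmetric meta-path author-paper-author.

    w[x][p] = 1 if author x wrote paper p; M[x][y] counts the
    author-paper-author path instances between authors x and y.
    """
    n = len(w)
    return [[sum(a * b for a, b in zip(w[x], w[y])) for y in range(n)] for x in range(n)]


def pathsim(m, x, y):
    """PathSim along a symmetric meta-path: 2 * M[x][y] / (M[x][x] + M[y][y])."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


# Hypothetical toy incidence matrix (rows: ann, bob, cat; columns: papers p0..p3).
w = [
    [1, 1, 0, 0],  # ann wrote p0, p1
    [1, 1, 1, 1],  # bob wrote p0..p3
    [0, 1, 0, 0],  # cat wrote p1
]
m = commuting_matrix(w)
print(round(pathsim(m, 0, 2), 3), round(pathsim(m, 1, 2), 3))  # 0.667 0.4
```

Ann and bob each share exactly one paper with cat. However, bob scores lower because PathSim normalises by each author's own path count. Materialising M for every pair in a large network is what makes the method expensive.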
Multilayer networks
• Multilayer networks are a popular model for studying heterogeneous graph information;
• theoretical properties of multilayer networks have been investigated;
• some techniques (e.g. modularity) have been extended to multilayer networks.
Ginestra Bianconi. Multilayer networks: structure and function. Oxford University Press (2018).
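The slide does not fix a representation for multilayer networks. One common choice is the supra-adjacency matrix, sketched below for the multiplex case (all layers share the same node set). Intra-layer adjacency matrices sit on the diagonal blocks. Each node is coupled to its own replica in every other layer with a single weight ω, which is an assumed uniform interlayer coupling.

```python
def supra_adjacency(layers, omega):
    """Supra-adjacency matrix of a multiplex network.

    layers[a] is the N x N adjacency matrix of layer a (same node set in all layers).
    Diagonal blocks hold intra-layer edges; each node is coupled to its own replica
    in every other layer with weight omega (assumed uniform coupling).
    """
    n_layers, n = len(layers), len(layers[0])
    size = n_layers * n
    supra = [[0.0] * size for _ in range(size)]
    for a, adj in enumerate(layers):
        for i in range(n):
            for j in range(n):
                supra[a * n + i][a * n + j] = float(adj[i][j])  # edges inside layer a
            for b in range(n_layers):
                if b != a:
                    supra[a * n + i][b * n + i] = omega  # node i, layer a <-> layer b
    return supra


# Hypothetical example: 3 nodes, 2 layers; edge 0-1 in layer 0, edge 1-2 in layer 1.
layer0 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
layer1 = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
sm = supra_adjacency([layer0, layer1], omega=0.5)
print(len(sm), sm[0][1], sm[0][3])  # 6 1.0 0.5
```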
CENTRAL PROBLEM: many state-of-the-art methods for similarity search require an a priori definition of a similarity measure between entities.
A solution: learning similarity with graph neural networks
Zonghan Wu et al. A comprehensive survey on graph neural networks. In: IEEE Transactions on Neural Networks and Learning Systems (2020).
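The figure on this slide is not recoverable. The idea it names is to learn similarity instead of fixing it a priori: a GNN maps each node to an embedding by aggregating its neighbours' features, and similarity is then measured between embeddings. The sketch below shows one propagation step in the style of a graph convolutional network (Kipf and Welling's symmetric normalisation), followed by cosine similarity. The weight matrix is a fixed placeholder. A real model would learn it, for instance with a link-prediction loss, which this sketch omits.

```python
import math


def gcn_layer(adj, feats, weight):
    """One GCN-style propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Self-loops let each node keep its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Neighbourhood aggregation: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(len(feats[0]))]
           for i in range(n)]
    # Linear transform with W, then ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical graph: nodes 0 and 1 both link only to node 2; node 3 is isolated.
adj = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for trained weights
h = gcn_layer(adj, feats, weight)
print(round(cosine(h[0], h[1]), 3), round(cosine(h[0], h[3]), 3))  # 1.0 0.632
```

Nodes 0 and 1 play the same structural role and receive identical embeddings. Their similarity therefore comes from the aggregation rather than from a hand-written measure.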
GNNs and heterogeneous graphs
• Most GNN models are designed to process homogeneous graphs;
• recent works combine classical techniques (random walks, meta-paths…) with GNNs in order to integrate heterogeneous information;
• attention mechanisms are exploited to learn which feature types matter most in determining node representations.
Xiao Wang et al. Heterogeneous graph attention network. In: The World Wide Web Conference (2019).
Ziniu Hu et al. Heterogeneous graph transformer. In: Proceedings of The Web Conference (2020).
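To illustrate how attention can learn which types of information matter, the sketch below follows the semantic-level attention step of the HAN paper cited on the slide. Each meta-path p yields node embeddings Z_p. Its score is the mean over nodes of qᵀ tanh(W z + b). The scores are softmax-normalised into weights β, and the final embedding is the β-weighted sum. HAN's node-level attention within each meta-path is omitted here. All parameter values are placeholders for learned ones.

```python
import math


def semantic_attention(embeddings, w, b, q):
    """Semantic-level attention in the style of HAN (parameters are placeholders).

    embeddings[p][i] is the embedding of node i obtained along meta-path p.
    Score of meta-path p: mean over nodes of q . tanh(W z + b).
    Returns the softmax weights beta and the beta-weighted node embeddings.
    """
    scores = []
    for z_p in embeddings:
        total = 0.0
        for z in z_p:
            proj = [math.tanh(sum(w[r][c] * z[c] for c in range(len(z))) + b[r]) for r in range(len(w))]
            total += sum(qr * hr for qr, hr in zip(q, proj))
        scores.append(total / len(z_p))  # average importance over all nodes
    # Softmax over meta-paths, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z_sum = sum(exps)
    beta = [e / z_sum for e in exps]
    n_nodes, dim = len(embeddings[0]), len(embeddings[0][0])
    fused = [[sum(beta[p] * embeddings[p][i][d] for p in range(len(embeddings))) for d in range(dim)]
             for i in range(n_nodes)]
    return beta, fused


# Hypothetical input: two meta-paths, two nodes, 2-d embeddings.
emb = [
    [[1.0, 1.0], [1.0, 1.0]],  # meta-path 0
    [[0.0, 0.0], [0.0, 0.0]],  # meta-path 1
]
w = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned W
b = [0.0, 0.0]  # placeholder for learned bias
q = [1.0, 1.0]  # placeholder for learned attention vector
beta, fused = semantic_attention(emb, w, b, q)
print([round(x, 3) for x in beta])  # [0.821, 0.179]
```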
An example: HetGNN
Chuxu Zhang et al. Heterogeneous graph neural network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019).
Wrap-up
• Heterogeneous graphs are important because they capture relationships of different kinds between objects;
• studying the properties of both homogeneous and heterogeneous graphs is necessary for designing efficient models;
• combining classical techniques with neural networks can help in constructing suitable graph neural networks for heterogeneous graphs.
Multi-omics integration
• Only a portion of the genome sequenced;
• whole genome sequenced, interactions studied within single omics;
• whole genome sequenced, interactions studied between omics.
Proposed tasks
1. Identifying genes that are correlated in determining a certain phenotype:
build multi-omics-based graphs of genes → clustering using neural networks or other tools → identification of groups of genes correlated with a phenotype → identification of biomarkers.
2. Predicting relations among multi-omics profiles of cell types:
build multi-omics-based graphs of cell samples → link prediction using GNNs → prediction of relations for unseen samples → improvements in personalized medicine.
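The slides do not specify how the multi-omics graphs of genes are built. One simple, hypothetical construction works layer by layer. Within each omics layer, two genes are connected when the absolute Pearson correlation of their profiles across the same samples reaches a threshold, and the edge is labelled with the layer. The result is a heterogeneous (multiplex) gene graph that a clustering method or a GNN could then consume. The correlation rule, the 0.8 threshold, the gene names and the values are all illustrative assumptions, and the clustering step is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def multi_omics_gene_graph(omics, threshold=0.8):
    """Hypothetical builder: omics[layer][gene] holds one value per sample.

    Returns typed edges (gene_a, layer, gene_b, r): one edge per omics layer in
    which the two genes' profiles have |Pearson r| >= threshold (assumed rule).
    """
    edges = []
    for layer, profiles in omics.items():
        for g1, g2 in combinations(sorted(profiles), 2):
            r = pearson(profiles[g1], profiles[g2])
            if abs(r) >= threshold:
                edges.append((g1, layer, g2, round(r, 3)))
    return edges


# Hypothetical toy data: three genes, two omics layers, four samples.
omics = {
    "expression": {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]},
    "methylation": {"g1": [0.9, 0.7, 0.4, 0.1], "g2": [0.2, 0.3, 0.2, 0.3], "g3": [0.1, 0.3, 0.6, 0.9]},
}
for edge in multi_omics_gene_graph(omics):
    print(edge)
```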
Text mining
TEXT DOCUMENT, represented as a FOREST OF DEPENDENCY TREES: entities are organized hierarchically following syntactic rules.
ONTOLOGY OF A DOMAIN-SPECIFIC KNOWLEDGE BASE, represented as a KNOWLEDGE GRAPH: nodes are entities of the ontology, edges are relations among them.
CHALLENGING: relating the two representations.
Figures: dependency tree of a fragment; knowledge graph of a document.
Olivier Bodenreider. The Unified Medical Language System (UMLS): integrating biomedical terminology. In: Nucleic Acids Research (2004).
Proposed tasks
1. Learn how to turn the dependency tree into the knowledge graph of a document: train a recursive/graph neural network to learn relations in the knowledge graph of a document, with dependency trees as inputs.
2. Integrate the newly discovered information into a comprehensive knowledge graph of a set of documents: exploit graph neural networks to predict links in the comprehensive graph, starting from the single knowledge graphs.
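The slides propose learning the mapping from dependency trees to knowledge-graph relations with a recursive or graph neural network. For comparison, a common non-learned baseline reads (subject, predicate, object) triples directly off the tree with hand-written rules. The sketch below is such a rule-based baseline, not the proposed model. The token format follows Universal Dependencies v2 labels (`nsubj`, `obj`, `compound`; older schemes use `dobj`). The example parse is written by hand rather than produced by a parser. Merging per-document graphs is shown as a plain set union, with the GNN link-prediction step omitted.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    text: str
    head: int    # index of the syntactic head, 0 for the root
    deprel: str  # Universal Dependencies relation label


def phrase(tokens, idx):
    """Text of token idx preceded by its 'compound' modifiers."""
    mods = [t.text for t in tokens if t.head == idx and t.deprel == "compound"]
    return " ".join(mods + [tokens[idx - 1].text])


def extract_triples(tokens):
    """Rule-based baseline (not the proposed neural model): one triple per nsubj/obj pair."""
    triples = []
    for tok in tokens:
        subjects = [t.idx for t in tokens if t.head == tok.idx and t.deprel == "nsubj"]
        objects = [t.idx for t in tokens if t.head == tok.idx and t.deprel == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), tok.text, phrase(tokens, o)))
    return triples


# Hand-written parse of "Aspirin inhibits platelet aggregation" (hypothetical, not parser output).
sentence = [
    Token(1, "Aspirin", 2, "nsubj"),
    Token(2, "inhibits", 0, "root"),
    Token(3, "platelet", 4, "compound"),
    Token(4, "aggregation", 2, "obj"),
]
# A document-level knowledge graph as a set of triples; merging documents is a set union.
kg = set(extract_triples(sentence))
print(kg)  # {('Aspirin', 'inhibits', 'platelet aggregation')}
```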
Conclusions
• Heterogeneous graphs can model a broader variety of applications than homogeneous graphs;
• many existing models for heterogeneous graph processing combine classical techniques with neural networks;
• studying the theoretical properties of heterogeneous graphs is important for extending GNNs to the heterogeneous case.
Thank you!
Federica Baccini
contact: federica.baccini@phd.unipi.it