Representation Learning on Networks Yuxiao Dong Microsoft Research, Redmond Joint work with Kuansan Wang (MSR); Jie Tang, Jiezhong Qiu, Jie Zhang, Jian Li (Tsinghua University); Hao Ma (MSR & Facebook AI)

Microsoft Academic Graph: 664,195 fields of study; 48,728 journals; 4,391 conferences; 219 million papers/patents/books/preprints; 240 million authors; 25,512 institutions. https://academic.microsoft.com, as of May 25, 2019

Example 1: Inferring Entities’ Research Topics. [figure: inferring fields of study — CS, Math, Physics, Biology — for 240 million authors] Shen, Ma, Wang. A Web-scale system for scientific knowledge exploration. ACL 2018

Example 2: Inferring Scholars’ Future Impact. Dong, Johnson, Chawla. Will This Paper Increase Your h-index? Scientific Impact Prediction. WSDM 2015.

Example 3: Inferring Future Collaboration Dong, Johnson, Xu, Chawla. Structural Diversity and Homophily: A Study Across More Than One Hundred Big Networks. KDD 2017.

Example 3: Inferring Future Collaboration. [figure: comparing two link-formation probabilities P1(·|·) and P2(·|·) conditioned on different neighborhood structures] Dong, Johnson, Xu, Chawla. Structural Diversity and Homophily: A Study Across More Than One Hundred Big Networks. KDD 2017.

The network mining paradigm: hand-crafted feature matrix X → feature engineering → machine learning models → graph & network applications (node label inference, link prediction, user behavior, ...)

Representation learning for networks: latent feature matrix Z → feature learning (replacing hand-crafted feature engineering) → machine learning models → graph & network applications (node label inference, node clustering, link prediction, ...)

Network Embedding roadmap: Random Walk + Skip Gram → (dense) Matrix Factorization → (sparse) Matrix Factorization

Word embedding in NLP. Example sentences: "Computational lens on big social and information networks." "The connections between individuals form the structural ..." "In a network sense, individuals matter in the ways in which ..." "Accordingly, this thesis develops computational models to investigate the ways that ..." "We study two fundamental and interconnected directions: user demographics and network diversity ..." Word embedding models turn these sentences into a latent feature matrix X. • Harris’ distributional hypothesis: words in similar contexts have similar meanings. • Key idea: try to predict the words surrounding each one. 1. Harris, Z. (1954). Distributional structure. Word, 10(2-3): 146-162. 2. Bengio, et al. Representation learning: A review and new perspectives. IEEE TPAMI 2013. 3. Mikolov, et al. Efficient estimation of word representations in vector space. ICLR 2013.
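To make the "predict the surrounding words" idea concrete, here is the standard skip-gram objective as a reconstruction of Mikolov et al.'s formulation (not text recovered from the slide); $c$ is the window size and $W$ the vocabulary size:

```latex
% Skip-gram: maximize the average log-probability of context words
\max_{\theta}\; \frac{1}{|C|}\sum_{t=1}^{|C|}\;\sum_{\substack{-c \le j \le c\\ j \ne 0}} \log p\!\left(w_{t+j}\mid w_{t}\right),
\qquad
p\!\left(w_{O}\mid w_{I}\right)=\frac{\exp\!\left(\mathbf{v}'^{\top}_{w_{O}}\,\mathbf{v}_{w_{I}}\right)}{\sum_{w=1}^{W}\exp\!\left(\mathbf{v}'^{\top}_{w}\,\mathbf{v}_{w_{I}}\right)}
```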

Network embedding: apply the same computational lens to big social and information networks. [figure: a small graph with nodes a-h] Node sequences play the role of sentences and are fed to skip-gram; hand-crafted feature engineering is replaced by learning a latent feature matrix.

Network embedding: DeepWalk. Random walks over the graph (e.g., v1 v3 v2 v3 v5, v1 v2 v3 v5, v3 v1 v3 v5, v3 v4 v1 v2, v1 v3 v4) serve as "sentences" (node paths) for skip-gram, which learns the latent feature matrix. Perozzi et al. DeepWalk: Online learning of social representations. In KDD’14, pp. 701–710. Most cited paper of KDD’14.

Distributional Hypothesis of Harris • Word embeddings: words in similar contexts have similar meanings (e.g., skip-gram in word embedding) • Node embeddings: nodes in similar structural contexts are similar • DeepWalk: structural contexts are defined by co-occurrence over random-walk paths. Harris, Z. (1954). Distributional structure. Word, 10(2-3): 146-162.
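A minimal sketch of the DeepWalk pipeline, assuming networkx and gensim are available; parameter values are illustrative and this is not the authors' reference implementation:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(G, start, length):
    # uniform random walk: pick the next node among the current node's neighbors
    walk = [start]
    for _ in range(length - 1):
        nbrs = list(G.neighbors(walk[-1]))
        if not nbrs:
            break
        walk.append(random.choice(nbrs))
    return [str(v) for v in walk]  # Word2Vec expects string tokens

def deepwalk(G, num_walks=10, walk_length=40, dim=128, window=5):
    # treat node paths as "sentences"; train skip-gram with negative sampling
    walks = [random_walk(G, v, walk_length)
             for _ in range(num_walks) for v in G.nodes()]
    model = Word2Vec(walks, vector_size=dim, window=window,
                     sg=1, negative=5, min_count=0)
    return {v: model.wv[str(v)] for v in G.nodes()}

emb = deepwalk(nx.karate_club_graph())
```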

Network embedding: DeepWalk — the learned embeddings feed graph & network applications (node label inference, node clustering, link prediction, ...). Perozzi et al. DeepWalk: Online learning of social representations. In KDD’14, pp. 701–710. Most cited paper of KDD’14.

Random Walk Strategies • Random walk – DeepWalk (walk length > 1) – LINE (walk length = 1) • Biased random walk • 2nd-order random walk – node2vec • Meta-path-guided random walk – metapath2vec

node2vec. Picture snipped from Leskovec.
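A sketch of node2vec's second-order (biased) walk, with return parameter p and in-out parameter q; the function names are illustrative, not the paper's code:

```python
import random
import networkx as nx

def node2vec_step(G, prev, cur, p=1.0, q=1.0):
    # second-order walk: bias the next step by where we came from
    nbrs = list(G.neighbors(cur))
    weights = []
    for x in nbrs:
        if x == prev:                # return to the previous node
            weights.append(1.0 / p)
        elif G.has_edge(x, prev):    # stay close (BFS-like, distance 1 from prev)
            weights.append(1.0)
        else:                        # move outward (DFS-like, distance 2 from prev)
            weights.append(1.0 / q)
    return random.choices(nbrs, weights=weights)[0]

def node2vec_walk(G, start, length, p=1.0, q=1.0):
    walk = [start, random.choice(list(G.neighbors(start)))]
    while len(walk) < length:
        walk.append(node2vec_step(G, walk[-2], walk[-1], p, q))
    return walk
```

The resulting walks are fed to skip-gram exactly as in DeepWalk; p and q interpolate between BFS-like and DFS-like exploration.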

Heterogeneous graph embedding: metapath2vec — meta-path-based random walks + heterogeneous skip-gram. Dong, Chawla, Swami. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. KDD 2017
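A sketch of a meta-path-guided walk (e.g., an Author-Paper-Venue-Paper-Author scheme); `get_type` and the wrap-around loop are illustrative assumptions, not the paper's implementation:

```python
import random

def metapath_walk(G, get_type, start, metapath, length):
    # metapath like ["A", "P", "V", "P", "A"] with metapath[0] == metapath[-1];
    # each step is restricted to neighbors of the next required node type
    assert get_type(start) == metapath[0]
    walk, pos = [start], 0
    while len(walk) < length:
        pos = pos + 1 if pos + 1 < len(metapath) else 1  # wrap, reusing the end type
        candidates = [v for v in G.neighbors(walk[-1])
                      if get_type(v) == metapath[pos]]
        if not candidates:
            break
        walk.append(random.choice(candidates))
    return walk
```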

Application: Embedding the Heterogeneous Academic Graph — node types: field of study, journal, conference, paper/patent/book, affiliation, author. metapath2vec applied to the Microsoft Academic Graph & AMiner.

Application 1: Related Venues

Application 2: Similarity Search (Institution). [figure: nearest institutions in the embedding space — Microsoft, Facebook, AT&T Labs, Google; Johns Hopkins, UChicago, Harvard, Stanford, MIT, Yale, Columbia, CMU]

Network Embedding: Random Walk + Skip Gram — DeepWalk, LINE, node2vec, metapath2vec

What are the fundamentals underlying random-walk + skip-gram based network embedding models?

Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. Qiu et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In WSDM’18.

Unifying DeepWalk, LINE, PTE, & node2vec as Matrix Factorization • DeepWalk — the most cited paper of KDD’14 • LINE — the most cited paper of WWW’15 • PTE — the 5th most cited paper of KDD’15 • node2vec — the 2nd most cited paper of KDD’16. Notation: A — adjacency matrix; D — degree matrix; b — #negative samples; T — context window size. 1. Qiu et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In WSDM’18. The most cited paper of WSDM’18 as of May 2019.
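The closed-form matrices appear only as images in the deck; reconstructed here from the WSDM'18 paper's notation, with $\operatorname{vol}(G)=\sum_{ij}A_{ij}$:

```latex
% DeepWalk (window size T, b negative samples) implicitly factorizes
\log\!\left(\frac{\operatorname{vol}(G)}{bT}\left(\sum_{r=1}^{T}\left(D^{-1}A\right)^{r}\right)D^{-1}\right)
% LINE is the T = 1 special case:
\log\!\left(\frac{\operatorname{vol}(G)}{b}\,D^{-1}AD^{-1}\right)
```

PTE and node2vec admit analogous but more involved forms (a log of a block matrix over bipartite subgraphs, and an expression in the stationary distribution of the second-order walk, respectively).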

Understanding random walk + skip gram: skip-gram is implicit matrix factorization. Levy and Goldberg. Neural word embeddings as implicit matrix factorization. In NIPS 2014
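Levy and Goldberg's result, restated (a standard identity, not text recovered from the slide): skip-gram with $b$ negative samples implicitly factorizes the shifted pointwise mutual information matrix of word-context pairs,

```latex
\mathbf{w}^{\top}\mathbf{c}
= \log\!\left(\frac{\#(w,c)\,\lvert\mathcal{D}\rvert}{\#(w)\,\#(c)}\right) - \log b
= \mathrm{PMI}(w,c) - \log b
```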

Understanding random walk + skip gram • Distinguish direction and distance

Understanding random walk + skip gram. DeepWalk is asymptotically and implicitly factorizing the log-transformed random-walk matrix reconstructed above (A: adjacency matrix; D: degree matrix; b: #negative samples; T: context window size). 1. Qiu et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In WSDM’18. The most cited paper of WSDM’18 as of May 2019.

Unifying DeepWalk, LINE, PTE, & node2vec as Matrix Factorization • DeepWalk • LINE • PTE • node2vec [closed-form matrices shown as images on the slide] Qiu et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In WSDM’18. The most cited paper of WSDM’18 as of May 2019.

Can we directly factorize the derived matrices for learning embeddings?

NetMF: explicitly factorizing the DeepWalk matrix. DeepWalk is asymptotically and implicitly factorizing this matrix; NetMF performs the matrix factorization explicitly. 1. Qiu et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In WSDM’18. The most cited paper of WSDM’18 as of May 2019.

NetMF: how can we solve this issue? 1. Construction 2. Factorization. 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019

NetMF • DeepWalk is asymptotically and implicitly factorizing the matrix above • NetMF is explicitly factorizing it
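A dense, small-graph sketch of NetMF under these definitions, assuming numpy/scipy/networkx; the truncated log and SVD follow the paper's recipe, but this is a simplified reading, not the released implementation:

```python
import numpy as np
import networkx as nx
from scipy.sparse.linalg import svds

def netmf_embed(G, T=10, b=1, dim=128):
    # dense NetMF sketch: build the matrix DeepWalk implicitly factorizes,
    # take an element-wise truncated log, then factorize with truncated SVD
    A = nx.to_numpy_array(G)
    vol = A.sum()
    d_inv = np.diag(1.0 / A.sum(axis=1))
    P = d_inv @ A                       # random-walk transition matrix D^{-1}A
    S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1))
    M = (vol / (b * T)) * S @ d_inv
    M = np.log(np.maximum(M, 1.0))      # truncated logarithm: entries below 1 become 0
    U, s, _ = svds(M, k=dim)
    return U * np.sqrt(s)               # embedding = U_d * sqrt(Sigma_d)

emb = netmf_embed(nx.karate_club_graph(), dim=16)
```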

Experimental Setup

Experimental Results. Predictive performance when varying the amount of training data; the x-axis is the ratio of labeled data (%).

Network Embedding: Random Walk + Skip Gram → (dense) Matrix Factorization — NetMF

Challenges: NetMF is not practical for very large networks — the matrix it factorizes is dense.

NetMF: how can we solve this issue? 1. Construction 2. Factorization. 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019

NetSMF — Sparse: how can we solve this issue? 1. Sparse construction 2. Sparse factorization. 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019
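A simplified Monte-Carlo stand-in for NetSMF's sparse construction: it estimates the DeepWalk matrix from sampled walks rather than using the paper's path-sampling spectral sparsifier, so it illustrates the sparse construct-then-factorize idea without NetSMF's approximation guarantees. All names and parameters are illustrative:

```python
import random
import numpy as np
import networkx as nx
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def netsmf_sketch(G, T=10, b=1, walks_per_node=200, dim=32):
    # estimate sum_{r<=T} (D^{-1}A)^r D^{-1} from sampled walks
    # instead of densely powering the transition matrix
    nodes = list(G.nodes())
    idx = {v: i for i, v in enumerate(nodes)}
    deg = dict(G.degree())
    vol = 2.0 * G.number_of_edges()
    counts = {}
    for u in nodes:
        for _ in range(walks_per_node):
            cur = u
            for _ in range(T):  # record every prefix endpoint, i.e. all r = 1..T
                cur = random.choice(list(G.neighbors(cur)))
                key = (idx[u], idx[cur])
                counts[key] = counts.get(key, 0) + 1
    rows, cols, vals = [], [], []
    for (i, j), c in counts.items():
        m = vol / (b * T) * c / (walks_per_node * deg[nodes[j]])
        if m > 1.0:                     # truncated log keeps the matrix sparse
            rows.append(i); cols.append(j); vals.append(np.log(m))
    n = len(nodes)
    M = csr_matrix((vals, (rows, cols)), shape=(n, n))
    U, s, _ = svds(M, k=dim)            # sparse truncated SVD
    return U * np.sqrt(s)
```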

Fast & Large-Scale Network Representation Learning. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019.

For a random-walk matrix polynomial $L = D - \sum_{r=1}^{T} \alpha_r D \left(D^{-1}A\right)^{r}$, where $\sum_{r=1}^{T} \alpha_r = 1$ and the $\alpha_r$ are non-negative, a spectral sparsifier with $O(n \log n / \epsilon^{2})$ non-zeros can be constructed in $\tilde{O}(T^{2} m / \epsilon^{2})$ time for undirected graphs.
• Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification. COLT 2015.
• Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Spectral sparsification of random-walk matrix polynomials. arXiv:1502.03496.

NetSMF — Sparse: factorize the constructed sparse matrix.

NetSMF — bounded approximation error

Network Embedding: Random Walk + Skip Gram (DeepWalk, LINE, node2vec, metapath2vec) → (dense) Matrix Factorization (NetMF) → (sparse) Matrix Factorization (NetSMF)

Much Faster and More Scalable Network Representation Learning. Zhang et al. ProNE: Fast and Scalable Network Representation Learning. In IJCAI 2019

Network Embedding: Random Walk + Skip Gram (DeepWalk, LINE, node2vec, metapath2vec) → (dense) Matrix Factorization (NetMF) → (sparse) Matrix Factorization (NetSMF)

ProNE: faster & more scalable network embedding

Embedding enhancement via spectral propagation

• Chebyshev Expansion for Efficiency • To avoid explicit eigendecomposition and the graph Fourier transform, approximate the spectral filter with a truncated Chebyshev expansion.
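A minimal sketch of spectral propagation with a truncated Chebyshev expansion of a filter of the normalized Laplacian; the coefficients here are illustrative placeholders, not ProNE's band-pass kernel (whose coefficients come from Bessel functions):

```python
import numpy as np
import networkx as nx
from scipy.sparse import identity, diags

def chebyshev_propagate(G, Z, coeffs):
    # propagate embeddings Z with a spectral filter g(L) ~ sum_k c_k T_k(Lt),
    # where Lt = L - I rescales the normalized Laplacian's spectrum into [-1, 1]
    A = nx.to_scipy_sparse_array(G, format="csr", dtype=float)
    d = np.asarray(A.sum(axis=1)).ravel()
    D_inv_sqrt = diags(1.0 / np.sqrt(d))
    L = identity(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
    Lt = L - identity(A.shape[0])                           # eigenvalues in [-1, 1]
    T_prev, T_cur = Z, Lt @ Z            # T_0(Lt) Z and T_1(Lt) Z
    out = coeffs[0] * T_prev + coeffs[1] * T_cur
    for c in coeffs[2:]:
        T_prev, T_cur = T_cur, 2 * (Lt @ T_cur) - T_prev    # Chebyshev recurrence
        out = out + c * T_cur
    return out

# usage: smooth/enhance an existing embedding Z with illustrative coefficients
# Z_enh = chebyshev_propagate(G, Z, coeffs=[0.5, 0.3, 0.2])
```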

Efficiency (1.1M nodes): baselines take 19 hours / 98 mins with 20 threads; ProNE takes 10 mins on a single thread. ProNE offers 10-400X speedups (1 thread vs. 20 threads).

Scalability & Effectiveness: embed 100,000,000 nodes with one thread in 29 hours, with performance superiority.

Embedding enhancement

A general embedding enhancement framework

Network Embedding: Random Walk + Skip Gram (DeepWalk, LINE, node2vec, metapath2vec) → (dense) Matrix Factorization (NetMF) → (sparse) Matrix Factorization (NetSMF) → (sparse) Matrix Factorization (ProNE)

Tutorials & Open Data
• Representation learning on networks – Tutorial at WWW 2019 – ~500 slides at https://www.aminer.cn/nrl_www2019
• Computational models on social & information network analysis – Tutorial at KDD 2018 – ~400 slides at https://www.aminer.cn/kdd18-sna
• Bi-weekly Microsoft Academic Graph updates – Spark & Databricks on Azure – https://docs.microsoft.com/en-us/academic-services/graph/ – The data is FREE!
• Open Academic Graph – Joint work with AMiner@Tsinghua University – https://www.openacademic.ai/oag/ – See details in Zhang et al., KDD 2019.

Microsoft Academic Graph: 664,195 fields of study; 48,728 journals; 4,391 conferences; 219 million papers/patents/books/preprints; 240 million authors; 25,512 institutions. https://academic.microsoft.com, as of May 25, 2019

References
1. F. Zhang, X. Liu, J. Tang, Y. Dong, P. Yao, J. Zhang, X. Gu, Y. Wang, B. Shao, R. Li, K. Wang. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs. KDD 2019.
2. J. Qiu, Y. Dong, H. Ma, J. Li, C. Wang, K. Wang, J. Tang. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization. WWW 2019.
3. J. Zhang, Y. Dong, Y. Wang, J. Tang, M. Ding. ProNE: Fast and Scalable Network Representation Learning. IJCAI 2019.
4. J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, J. Tang. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. WSDM 2018.
5. Y. Dong, N. V. Chawla, A. Swami. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. KDD 2017.
6. Y. Dong, R. A. Johnson, J. Xu, N. V. Chawla. Structural Diversity and Homophily: A Study Across More Than One Hundred Big Networks. KDD 2017.
7. Y. Dong, R. A. Johnson, N. V. Chawla. Will This Paper Increase Your h-index? Scientific Impact Prediction. WSDM 2015.

Thank you!