GigaTensor: Scaling Tensor Analysis Up

GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries
U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos
School of Computer Science, Carnegie Mellon University
KDD 2012

Outline: Problem Definition, Algorithm, Discoveries, Conclusions

Background: Tensor
  • Tensors (= multi-dimensional arrays) are everywhere
      • Hyperlinks and anchor texts in Web graphs
[Figure: a 3-way tensor with axes URL 1, URL 2, and Anchor Text; nonzero cells hold 1, with example anchor texts C#, C++, and Java]

Background: Tensor
  • Tensors (= multi-dimensional arrays) are everywhere
      • Sensor stream (time, location, type)
      • Predicates (subject, verb, object) in knowledge base, e.g. “Eric Clapton plays guitar”, “Barack Obama is the president of U.S.”
[Figure: NELL (Never Ending Language Learner) data as a 26 M × 26 M × 48 M tensor; nonzeros = 144 M]
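As a rough illustration of how (subject, verb, object) predicates can be stored as a sparse 3-way tensor, the sketch below maps each distinct string to an integer index per mode and counts occurrences in coordinate (COO) form. The function name `build_coo_tensor` and the toy triples are hypothetical and are not taken from the slides or from NELL.

```python
from collections import defaultdict

def build_coo_tensor(triples):
    """Map (subject, verb, object) strings to integer coordinates.

    Returns the per-mode string-to-index dictionaries and a dict that
    maps each nonzero coordinate (i, j, k) to its count.
    """
    index = {"subject": {}, "verb": {}, "object": {}}
    counts = defaultdict(float)
    for s, v, o in triples:
        # setdefault assigns the next free index the first time a string is seen
        i = index["subject"].setdefault(s, len(index["subject"]))
        j = index["verb"].setdefault(v, len(index["verb"]))
        k = index["object"].setdefault(o, len(index["object"]))
        counts[(i, j, k)] += 1.0
    return index, dict(counts)

# Toy input (illustrative only); the first triple appears twice.
triples = [
    ("Eric Clapton", "plays", "guitar"),
    ("Barack Obama", "is the president of", "U.S."),
    ("Eric Clapton", "plays", "guitar"),
]
index, entries = build_coo_tensor(triples)
```

Only the nonzero cells are stored, which is what makes a tensor with tens of millions of indices per mode representable at all.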

Problem Definition
  • Q1: How to decompose a billion-scale tensor?
      • Corresponds to SVD in the 2D case
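The analogy with SVD can be written out as follows, assuming the decomposition in question is PARAFAC (also called CP), which is the model that ALS with Khatri-Rao products fits. The symbols R (number of components), σ_r and λ_r (weights), and the factor vectors are notation chosen here, not copied from the slides.

```latex
\begin{align*}
  % SVD (2D case): a matrix as a sum of R rank-1 matrices
  \mathbf{M} &\approx \sum_{r=1}^{R} \sigma_r \, \mathbf{u}_r \mathbf{v}_r^{\top} \\
  % PARAFAC (3-way case): a tensor as a sum of R rank-1 tensors
  \mathcal{X} &\approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r
\end{align*}
```

Here ∘ denotes the outer product, so each term is a rank-1 matrix in the first line and a rank-1 tensor in the second.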

Problem Definition
  • Q2: What are the important concepts and synonyms in a KB tensor?
      • Q2.1: What are the dominant concepts in the knowledge base tensor?
      • Q2.2: What are the synonyms to a given noun phrase?
[Figure: NELL (Never Ending Language Learner) data as a 26 M × 26 M × 48 M tensor; nonzeros = 144 M]

Outline: Problem Definition, Algorithm, Discoveries, Conclusions

Algorithm: Problem Definition
  • Q1: How to decompose a billion-scale tensor?
      • Corresponds to SVD in the 2D case

Challenge (Details)
  • Alternating Least Squares (ALS) Algorithm
[Figure: ALS update for the NELL tensor (I = 26 M, J = 26 M, K = 48 M). Legend: ⊙ Khatri-Rao product, ∗ Hadamard product, † pseudo-inverse]
How to design a fast MapReduce algorithm for the ALS?
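For readers unfamiliar with the three operators, here is a small dense NumPy sketch of one standard PARAFAC-ALS update for the first factor matrix, A ← X(1) (C ⊙ B) (CᵀC ∗ BᵀB)†. This is the textbook single-machine form, assumed here for illustration; it is not the MapReduce algorithm the slides ask for, and the helper names are hypothetical.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product; row k*J + j holds C[k] * B[j]."""
    K, R = C.shape
    J = B.shape[0]
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

def unfold_mode1(X):
    """Mode-1 unfolding with X1[i, k*J + j] = X[i, j, k]."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

def als_update_A(X, B, C):
    """One ALS update of A with B and C held fixed."""
    gram = (C.T @ C) * (B.T @ B)  # Hadamard product of the two Gram matrices
    return unfold_mode1(X) @ khatri_rao(C, B) @ np.linalg.pinv(gram)

# Build an exactly rank-2 toy tensor and check that the update recovers A0.
rng = np.random.default_rng(0)
A0, B0, C0 = rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A_hat = als_update_A(X, B0, C0)
```

A full ALS iteration repeats the same update for B and C in turn. At the NELL sizes on the slide, the explicit `khatri_rao(C, B)` matrix would have J·K rows, which is why this dense form does not scale.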

Main Idea (Details)
  • 1. Ordering of Computation
[Figure: FLOPS of the candidate computation orderings on NELL data, with “Our choice” marked]
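To see why the ordering matters, the sketch below gives rough FLOP counts for two ways of parenthesizing X(1) (C ⊙ B) P, where P stands for the R × R pseudo-inverse term. The cost model (2 FLOPs per multiply-add, cost of forming C ⊙ B ignored) and the rank R = 10 are assumptions made here; the slide's own FLOPS figures are not recoverable from the text.

```python
def flops_kr_first(I, J, K, R, nnz):
    """X1 @ ((C kr B) @ P): dense (J*K x R)(R x R) product, then sparse X1 times the result."""
    return 2 * J * K * R * R + 2 * nnz * R

def flops_x_first(I, J, K, R, nnz):
    """(X1 @ (C kr B)) @ P: sparse X1 times (C kr B), then dense (I x R)(R x R) product."""
    return 2 * nnz * R + 2 * I * R * R

# NELL sizes from the slides; R is an assumed rank.
I, J, K, nnz = 26_000_000, 26_000_000, 48_000_000, 144_000_000
R = 10
cost_kr_first = flops_kr_first(I, J, K, R, nnz)
cost_x_first = flops_x_first(I, J, K, R, nnz)
ratio = cost_kr_first / cost_x_first
```

Under these assumptions, multiplying the sparse tensor first is cheaper by several orders of magnitude, because the dense product then involves I rows rather than J·K rows.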

Main Idea (Details)
  • 2. Avoiding Intermediate Data Explosion
[Figure: NELL tensor, I = 26 M, J = 26 M, K = 48 M]
Size of intermediate data (NELL), naïve: 100 PB
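A back-of-the-envelope check of the 100 PB figure: if the naïve method materializes the dense Khatri-Rao product C ⊙ B, that matrix has J·K rows and R columns. The rank R = 10 and 8-byte values below are assumptions chosen here because they land on the slide's order of magnitude; the slide itself does not state them.

```python
J, K = 26_000_000, 48_000_000  # mode lengths from the slide
R = 10                         # assumed rank (not stated on the slide)
BYTES_PER_VALUE = 8            # assumed double precision

rows = J * K                             # rows of the Khatri-Rao product
size_bytes = rows * R * BYTES_PER_VALUE  # dense storage
size_pb = size_bytes / 1e15              # decimal petabytes
```

With these assumptions `size_pb` comes out just under 100, consistent with the naïve figure quoted above.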

Main Idea (Details)
  • 2. Avoiding Intermediate Data Explosion
[Figure: the computation before and after the proposed change]
Size of intermediate data (NELL):
  • Naïve (before): 100 PB
  • Proposed (after): 1.5 GB
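One common way to avoid materializing C ⊙ B is to compute X(1) (C ⊙ B) directly from the nonzeros of the tensor. The NumPy sketch below does this on a single machine over COO coordinates; it illustrates the general idea only and is not the MapReduce formulation from the slides. The function name `mttkrp_mode1` is hypothetical.

```python
import numpy as np

def mttkrp_mode1(i, j, k, vals, B, C, I):
    """Compute X1 @ (C kr B) from nonzeros only.

    Each nonzero X[i, j, k] = v adds v * B[j] * C[k] to row i of the result,
    so the intermediate data is proportional to nnz * R, not J * K * R.
    """
    out = np.zeros((I, B.shape[1]))
    # np.add.at accumulates correctly even when an index i repeats
    np.add.at(out, i, vals[:, None] * B[j] * C[k])
    return out

# Small random example.
rng = np.random.default_rng(1)
I, J, K, R, nnz = 6, 5, 4, 3, 12
i = rng.integers(0, I, nnz)
j = rng.integers(0, J, nnz)
k = rng.integers(0, K, nnz)
vals = rng.random(nnz)
B, C = rng.random((J, R)), rng.random((K, R))
M = mttkrp_mode1(i, j, k, vals, B, C, I)
```

The result matches the dense product with an explicit Khatri-Rao matrix, but the only temporary array has one row per nonzero.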

Experiments
  • GigaTensor solves a 100x larger problem
[Figure: GigaTensor vs. Tensor Toolbox on tensors with mode lengths I, J, K and number of nonzeros = I / 50; Tensor Toolbox runs out of memory, while GigaTensor handles sizes 100x larger]

Outline: Problem Definition, Algorithm, Discoveries, Conclusions

Discoveries: Problem Definition
  • Q2: What are the important concepts and synonyms in a KB tensor?
      • Q2.1: What are the dominant concepts in the knowledge base tensor?
      • Q2.2: What are the synonyms to a given noun phrase?
[Figure: NELL (Never Ending Language Learner) data as a 26 M × 26 M × 48 M tensor; nonzeros = 144 M]

A2.1: Concept Discovery
  • Concept Discovery in Knowledge Base
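The slide text does not spell out the procedure. One plausible reading, assumed here, is that each component (column) of a factor matrix is a concept, described by the entries with the largest coefficients in that column. The sketch below implements that reading; the function name, the factor values, and the names are made up for illustration and are not results from the slides.

```python
import numpy as np

def top_k_per_component(F, names, k):
    """For each column of factor matrix F, return the k names with the largest values."""
    order = np.argsort(-F, axis=0)[:k]  # shape (k, R): row indices, largest first
    return [[names[idx] for idx in order[:, r]] for r in range(F.shape[1])]

# Toy factor matrix with two components (illustrative values only).
names = ["guitar", "piano", "senator", "president"]
A = np.array([
    [0.90, 0.00],
    [0.80, 0.10],
    [0.00, 0.70],
    [0.10, 0.95],
])
concepts = top_k_per_component(A, names, k=2)
```

Applying the same function to the factor matrix of each mode would give the subject, verb, and object members of a concept.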

A2.2: Synonym Discovery
  • Synonym Discovery in Knowledge Base
[Figure: factor matrix with columns a1, a2, …, aR; the rows for the (given) noun phrase, (discovered) synonym 1, and (discovered) synonym 2 are highlighted]
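A minimal sketch of how synonyms might be read off the factor matrix, assuming that two noun phrases count as contextual synonyms when their rows have high cosine similarity. That similarity measure is an assumption made here; the slide shows only the highlighted rows. The function name and toy values are hypothetical.

```python
import numpy as np

def synonyms(A, names, query, k):
    """Return the k names whose rows of A are most cosine-similar to the query's row.

    Assumes every row of A is nonzero.
    """
    unit = A / np.linalg.norm(A, axis=1, keepdims=True)  # normalize each row
    q = names.index(query)
    sims = unit @ unit[q]  # cosine similarity to the query row
    sims[q] = -np.inf      # exclude the query itself
    best = np.argsort(-sims)[:k]
    return [names[idx] for idx in best]

# Toy factor matrix (illustrative values only).
names = ["car", "automobile", "guitar"]
A = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [0.0, 1.0],
])
result = synonyms(A, names, "car", k=1)
```

Because rows are compared in the R-dimensional component space, two noun phrases can match even if they never co-occur in the same predicate.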

Outline: Problem Definition, Algorithm, Discoveries, Conclusions

Conclusion
  • GigaTensor: scalable tensor decomposition algorithm for tensors with billion-length modes
      • Algorithm: avoid intermediate data explosion
      • Discoveries: concept discovery and contextual synonym detection on KB tensor

Thank you!
www.cs.cmu.edu/~pegasus
www.cs.cmu.edu/~ukang