Chapter 10 Link Analysis

Data Mining Techniques So Far…
• Chapter 5 – Statistics
• Chapter 6 – Decision Trees
• Chapter 7 – Neural Networks
• Chapter 8 – Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
• Chapter 9 – Market Basket Analysis and Association Rules

Introduction
• Airline route maps are useful
• Hyperlinks were revolutionary
  – Apple’s HyperCard (Bill Atkinson)
• It is claimed that no more than six degrees of separation lie between any two people on the planet
• Link analysis is the data mining technique that addresses relationships and connections
• Link analysis is based on graph theory

Introduction
• As you would expect, link analysis also has its limitations as a data mining technique
• However, it is quite effective in these and similar situations:
  – Identifying authoritative sources of information on the WWW by analyzing page links
  – Understanding physician referral patterns
  – Analyzing telephone call patterns

Basic Graph Theory
• Graphs are an abstraction used to represent relationships
• Graphs consist of
  – Nodes (vertices): the things in the graph that have relationships
  – Edges: pairs of nodes connected by a relationship
• Graphs lend themselves to visualization, which is one of their key characteristics
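A minimal sketch of the node and edge idea, assuming an undirected graph stored as an adjacency dictionary. The city names and the helper `build_adjacency` are hypothetical and chosen only for illustration.

```python
# Hypothetical example data: cities are nodes, direct flights are edges.
nodes = {"LA", "Denver", "Boston", "Chicago"}
edges = {("LA", "Denver"), ("Denver", "Boston"), ("Denver", "Chicago")}


def build_adjacency(nodes, edges):
    """Map each node to the set of nodes it shares an edge with (undirected)."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(nodes, edges)
print(sorted(adjacency["Denver"]))  # ['Boston', 'Chicago', 'LA']
```

An adjacency dictionary is only one possible representation; an edge list or a matrix would carry the same information.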

Basic Graph Theory
• A path is an ordered sequence of nodes connected by edges
  – Example: flight segments (legs) such as LA – Denver – Boston
• A weighted graph is one in which the edges have weights associated with them
  – Example: an edge weight can measure the strength of the association between two products being purchased together
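The path and weighted-graph definitions can be sketched together as follows. The weights are made-up illustrative numbers, not real flight distances, and `edge_weight` and `path_weight` are hypothetical helper names.

```python
# Hypothetical weights on undirected edges (illustrative values only).
weights = {
    ("LA", "Denver"): 860,
    ("Denver", "Boston"): 1750,
    ("Denver", "Chicago"): 890,
}


def edge_weight(weights, a, b):
    """Return the weight of the undirected edge a-b, or None if there is no edge."""
    if (a, b) in weights:
        return weights[(a, b)]
    return weights.get((b, a))


def path_weight(weights, path):
    """Sum the edge weights along a path; None if a step is not an edge."""
    total = 0
    for a, b in zip(path, path[1:]):
        w = edge_weight(weights, a, b)
        if w is None:
            return None
        total += w
    return total


print(path_weight(weights, ["LA", "Denver", "Boston"]))  # 2610
print(path_weight(weights, ["LA", "Boston"]))            # None (no direct edge)
```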

Graph Theory Classic Problems
1. Finding a path in the graph that visits every edge exactly one time (Seven Bridges of Königsberg: edges are bridges and nodes are land masses)
2. Finding the shortest path that visits every node in the graph exactly one time (Traveling Salesman)
   – A completely connected graph with n nodes has n! (n factorial) paths that contain all nodes, counting each ordering of the nodes separately (5! = 5 * 4 * 3 * 2 * 1 = 120)
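Both problems can be illustrated with a short sketch. For the first, Euler's classic result is that a connected graph has a path using every edge exactly once only if it has zero or two nodes of odd degree. The labels A to D for the four land masses and the function names are assumptions made for this example. For the second, the code simply enumerates the orderings of five nodes to confirm the n! count.

```python
from collections import Counter
from itertools import permutations
from math import factorial

# Seven Bridges: A is the island, B and C the river banks, D the eastern land mass
# (labels are hypothetical; each tuple is one bridge).
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_path(edges):
    """Degree test only; assumes the graph is connected."""
    return len(odd_degree_nodes(edges)) in (0, 2)


print(odd_degree_nodes(bridges))  # ['A', 'B', 'C', 'D']
print(has_euler_path(bridges))    # False

# Every ordering of 5 nodes is a path in a completely connected graph.
orderings = list(permutations(["A", "B", "C", "D", "E"]))
print(len(orderings), factorial(5))  # 120 120
```

The factorial growth is why brute-force enumeration stops being practical for the Traveling Salesman problem after a modest number of nodes.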

Directed vs Undirected Graphs
• Undirected graphs – edges between nodes go in both directions (A to B is the same as B to A)
• Directed graphs – edges between nodes go in only one direction (A to B is different from B to A)
  – Example: the WWW
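The difference can be shown with one small sketch that builds either kind of graph from the same edge list. The function `build_graph` and the node names are hypothetical.

```python
def build_graph(edges, directed):
    """Build an adjacency dict; undirected edges are stored in both directions."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
        if not directed:
            adjacency[target].add(source)
    return adjacency


links = [("A", "B"), ("B", "C")]
undirected_graph = build_graph(links, directed=False)
directed_graph = build_graph(links, directed=True)

print("A" in undirected_graph["B"])  # True: B reaches A in the undirected graph
print("A" in directed_graph["B"])    # False: the link only goes from A to B
```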

Google – Directed Graph Example
• Web pages = nodes
• Hyperlinks = edges
• Spiders and web crawlers keep the graph updated
• Kleinberg’s Algorithm
  – Hub: a page that links to many authorities
  – Authority: a page that is linked to by many hubs
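The hub and authority definitions are circular, so they are usually computed by iterating two mutually reinforcing updates until the scores settle. The slide does not spell out the update rule, so the following is a minimal sketch of the commonly described formulation, with a hypothetical function name `hits`, a fixed iteration count, and a made-up link graph.

```python
from math import sqrt


def hits(links, iterations=50):
    """links maps each page to the set of pages it links to.

    Returns (hub, authority) score dicts, each scaled to unit length.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    authority = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to it.
        authority = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                authority[target] += hub[source]
        # Hub score: sum of authority scores of the pages it links to.
        hub = {p: sum(authority[t] for t in links.get(p, ())) for p in pages}
        # Normalize so the scores do not grow without bound.
        a_norm = sqrt(sum(v * v for v in authority.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        authority = {p: v / a_norm for p, v in authority.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, authority


# Hypothetical pages: h1..h3 act as hubs, a1 and a2 as candidate authorities.
links = {"h1": {"a1", "a2"}, "h2": {"a1", "a2"}, "h3": {"a1"}}
hub, authority = hits(links)
print(max(authority, key=authority.get))  # a1
```

In this toy graph a1 comes out as the strongest authority because all three hubs point to it, while h3 is a weaker hub because it points to only one authority.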

Google – example continued
• Authority versus mere popularity
  – Ranking by the number of unrelated sites linking to a site yields popularity
  – Ranking by the number of subject-related hubs that point to a site yields authority
  – This helps overcome a situation that often arises with popularity ranking, where the real authority (e.g., a home page) is ranked lower because it lacks popular links pointing to it
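A rough sketch of the contrast, assuming the set of subject-related hubs is already known. All site names and both function names are hypothetical, and simple link counting stands in for a real ranking method.

```python
def popularity(links, page):
    """Count every site that links to the page, related or not."""
    return sum(1 for targets in links.values() if page in targets)


def authority_count(links, page, related_hubs):
    """Count only the subject-related hubs that link to the page."""
    return sum(
        1
        for source, targets in links.items()
        if page in targets and source in related_hubs
    )


# Hypothetical link data: who links to whom.
links = {
    "celebrity_blog": {"fan_page"},
    "news_site": {"fan_page"},
    "shopping_site": {"fan_page"},
    "topic_hub_1": {"vendor_home"},
    "topic_hub_2": {"vendor_home"},
}
related_hubs = {"topic_hub_1", "topic_hub_2"}  # assumed to be known in advance

for page in ("fan_page", "vendor_home"):
    print(page, popularity(links, page), authority_count(links, page, related_hubs))
# fan_page 3 0
# vendor_home 2 2
```

Here the fan page wins on popularity, but the vendor home page wins once only subject-related hubs are counted.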

Examples of Link Analysis
• Recent Int’l Data Mining Conference
  – http://www.siam.org/meetings/sdm04/
• Chapter10-Example1.pdf
• Chapter10-Example2.pdf
• Chapter10-Example3.pdf
• Megaputer (PolyAnalyst vendor) page:
  – http://www.megaputer.com/products/pa/algorithms/la.php3

End of Chapter 10