Seminar Series
Social Information Systems
Manos Papagelis
Department of Computer Science, University of Toronto
papaggel@cs.toronto.edu
Toronto, Spring 2007

Presentation Outline
§ Part I: Exploiting Social Networks for Internet Search
§ Part II: An Experimental Study of the Coloring Problem on Human Subject Networks

Part I: Exploiting Social Networks for Internet Search
Alan Mislove, Krishna Gummadi, and Peter Druschel, HotNets 2006

Introduction
§ Social Networking (SN): A new form of publishing and locating information
§ Objective: To understand whether these social links can be exploited by search engines to provide better results
§ Contributions
  • Comparison of the mechanisms in the Web and online SN for
      § Publishing: Mechanisms to make information available to users
      § Locating: Mechanisms to find information
  • Results from an experiment in social network-based Web search
  • Challenges and opportunities in using Social ...

Web vs. SN (1/2): Web
§ Publishing: By placing documents on a Web server (and then seeking incoming links)
§ Locating: Via search engines (exploiting the link graph)
Pros
§ Very effective (incoming links are good indicators of importance)
Limitations
§ No fresh data
§ No personalized results
§ Unlinked pages are not indexed

Web vs. SN (2/2): Social Networks
§ Publishing: No explicit links between content (photos, videos, blogs), but implicit links between content through explicit links between users
§ Locating:
  • Navigation through the social network and browsing users’ content
  • Keyword-based search for textual or tagged content
  • Through "Top-10" lists
Pros
§ Helps a user find timely, relevant information by browsing adjacent regions of the network of users with similar interests
§ Content is rated rapidly (by comments and feedback of a ...)

Integration of Web Search and SN
§ Web and SN information is disjoint
§ No unified search tool locates information across different systems

PeerSpective: SN-based Web Search
§ Technology:
  • Lucene text search engine and FreePastry P2P overlay
  • Lightweight HTTP proxy transparently indexes all URLs visited by the user

Searching Process
§ A user submits a query to Google
§ The proxy transparently forwards the query to both Google and the proxies of the other users in the network
§ Each proxy executes the query on its local index
§ Results are then collated and presented alongside the Google results
§ PeerSpective ranking: Lucene score + PageRank + scores from users who previously viewed the result
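The ranking rule on this slide can be sketched as follows. This is a minimal illustration, not the PeerSpective implementation: the slide only says the three components are added, so the unweighted sum, the `Hit` record, and the example URLs and score values are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One collated search result (hypothetical record, not from the paper)."""
    url: str
    lucene_score: float   # text-relevance score from a peer's local index
    pagerank: float       # link-based importance of the page
    viewer_scores: list = field(default_factory=list)  # one score per peer who viewed it


def combined_score(hit):
    # Assumption: plain unweighted sum of the three components named on the slide.
    return hit.lucene_score + hit.pagerank + sum(hit.viewer_scores)


def rank(hits):
    """Order collated results from highest to lowest combined score."""
    return sorted(hits, key=combined_score, reverse=True)


hits = [
    Hit("http://example.org/a", 0.5, 0.25, [0.5, 0.25]),  # viewed by two peers
    Hit("http://example.org/b", 0.75, 0.5),               # viewed by nobody
]
ranked = rank(hits)
```

In this toy input the page viewed by two peers outranks the page with the better text and link scores, which is the kind of bias toward community interests the slide describes.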

[Figure: Search Results Example]

Experiments
§ 10 graduate students share downloaded or viewed Web content
§ One-month-long experiment
§ 200,000 distinct URLs
§ 25% were of type text/html or application/pdf (so they can be indexed)
Reports on:
§ Limits of hyperlink-based search
§ Benefits of SN-based search

Limits of hyperlink-based search
§ Report on the fraction of visited URLs that are not indexed by Google
  • Pages that are too new (blogs)
  • Deep Web
  • Dark Web (no links)
Results
§ About 1/3 of requests cannot be retrieved by Google
§ PeerSpective's index covers 30% of the requested URLs
§ 13.3% of URLs were contained in PeerSpective's index but not in Google's

[Table: Random samples of URLs not in Google and potential reasons]

Benefits of SN-based Search
§ Experiments on clicks on first-page results, for 1730 queries (1079 resulted in clicks)
Results
§ 86.5% of the clicked results were returned only by Google
§ 5.7% of the clicked results were returned by both
§ 7.7% of the clicked results were returned only by PeerSpective
Conclusions
§ This 7.7% is considered to be the gold standard of web search engineering

Reasons for Clicks on PeerSpective
§ Disambiguation: Communities tend to share definitions or interpretations of popular terms (e.g. "bus")
§ Ranking: SN information can bias the ranking algorithms toward the interests of users (e.g. "CoolStreaming")
§ Serendipity: Ample opportunity to find interesting things without searching

[Table: Examples of URLs found in PeerSpective]

Opportunities and Challenges
§ Privacy
  • Willingness of users to disclose information
  • Need for mechanisms to control information flow and anonymity
§ Membership and clustering of SN
  • Users may participate in many networks
  • Need for searching with respect to the different clusters
§ Content rating and ranking
  • New approaches to ranking search results
  • System architecture: centralized or distributed?

Part II: An Experimental Study of the Coloring Problem on Human Subject Networks
Michael Kearns, Siddharth Suri, Nick Montfort, Science 313, Aug 2006

Experimental Study on Human Subject Networks
§ Theoretical work suggests that structural properties of naturally occurring networks are important in shaping behavior and dynamics
  • E.g. hubs in networks are important in routing information
§ Empirical structural properties established by many disciplines
  • Small diameter (the “six degrees of separation”)
  • Local clustering of connectivity
  • Heavy-tailed distribution of connectivity (power-law distributions)
§ Empirical studies of networks
  • Limitation: networks are fixed and given (no alternatives)

Experiment
§ Experimental Scenario
  • Distributed problem-solving from local information
§ Experimental Setting
  • 38 human subjects (network vertices)
  • Each subject controls the color of a vertex in a network
  • Networks: simple and more complex
  • Goal: select a color different from that of all neighbors
  • Problem: coloring problem
  • Information available: variable (Low, Medium, High)

Graph Coloring Problem
§ Graph coloring: An assignment of "colors" to certain objects in a graph such that no two adjacent objects are assigned the same color
§ Graph Coloring Problem: Find the minimum number of colors for an arbitrary graph (NP-hard)
§ Chromatic number: The least number of colors needed to color the graph
Example
§ Vertex coloring
§ A 3-coloring suits this graph, but fewer colors would result in adjacent vertices of the same color
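The definitions above can be made concrete with a short sketch. It checks whether a vertex coloring is proper and finds the chromatic number by exhaustive search, which is only feasible for tiny graphs because the problem is NP-hard. The function names and the cycle examples are illustrative choices, not taken from the slides.

```python
from itertools import product


def is_proper(edges, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(n, edges):
    """Smallest k such that vertices 0..n-1 admit a proper k-coloring.

    Brute force: tries every assignment of k colors, for k = 1, 2, ...
    Exponential in n, so only usable on very small graphs.
    """
    for k in range(1, n + 1):
        for assignment in product(range(k), repeat=n):
            if is_proper(edges, assignment):
                return k
    return 0  # empty graph


def cycle_edges(n):
    """Edges of a simple cycle on n vertices (n >= 3)."""
    return [(i, (i + 1) % n) for i in range(n)]


even = chromatic_number(6, cycle_edges(6))  # even cycles need 2 colors
odd = chromatic_number(5, cycle_edges(5))   # odd cycles need 3 colors
```

An odd cycle is a small instance of the slide's example: a 3-coloring works, but any 2-coloring leaves two adjacent vertices with the same color.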

Network Topologies
[Figure: six network topologies]
§ Simple Cycle
§ 5-Chord Cycle
§ 20-Chord Cycle
§ Leader Cycle
§ Pref. Att. v=2
§ Pref. Att. v=3
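Three of these topology families can be generated with a few lines of code. This is a sketch under assumptions: the slides do not give the exact construction used in the study, so the random chord placement, the initial clique, and the degree-proportional sampling below are one common way to build such graphs, and the Leader Cycle is omitted because its construction is not described here.

```python
import random


def simple_cycle(n):
    """Edge set of a cycle on vertices 0..n-1 (n >= 3)."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def chord_cycle(n, chords, rng):
    """Cycle plus `chords` extra edges between random non-adjacent pairs."""
    edges = simple_cycle(n)
    while len(edges) < n + chords:
        u, v = rng.sample(range(n), 2)
        edges.add(frozenset((u, v)))  # duplicates are ignored by the set
    return edges


def preferential_attachment(n, v, rng):
    """Each new vertex links to v distinct existing vertices, chosen with
    probability proportional to their current degree.

    Assumption: growth starts from a clique on v + 1 vertices.
    """
    edges = {frozenset((a, b)) for a in range(v + 1) for b in range(a)}
    # Each vertex appears in this list once per incident edge.
    endpoints = [x for e in edges for x in e]
    for new in range(v + 1, n):
        targets = set()
        while len(targets) < v:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.add(frozenset((new, t)))
            endpoints += [new, t]
    return edges


def degrees(edges):
    """Map each vertex to its number of links."""
    deg = {}
    for e in edges:
        for x in e:
            deg[x] = deg.get(x, 0) + 1
    return deg


rng = random.Random(0)
cycle = simple_cycle(38)
five_chord = chord_cycle(38, 5, rng)
pref_att = preferential_attachment(38, 2, rng)
```

The value 38 matches the number of subjects in the experiment; the seed is arbitrary.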

Information View
§ Low (color of each neighbor)
§ Medium (number of links of each neighbor)
§ All (entire network)
[Figure: example views with the subject's own vertex labeled YOU and an Overall Progress indicator]

Graph Properties and Experimental Results

                  |----------- Graph Properties ----------|  |------ Experimental Results ------|
                  Colors    Min    Max    Avg.               Avg. Exp.       # Exp.  No. of
                  Required  Links  Links  Distance           Duration (sec)  Solved  Changes
Simple Cycle      2         2      2      9.76               144.17          5/6     378
5-Chord Cycle     2         2      4      5.63               121.14          7/7     687
20-Chord Cycle    2         2      7      3.34               65.67           6/6     8265
Leader Cycle      2         3      19     2.31               40.86           7/7     8797
Pref. Att. v=2    3         2      13     2.63               219.67          2/6     1744
Pref. Att. v=3    4         3      22     2.08               154.83          4/6     4703
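The Avg. Distance column can be reproduced for the simplest topology. The sketch below computes the mean shortest-path length over all vertex pairs with breadth-first search, assuming the simple cycle has 38 vertices (one per subject); the function name is illustrative.

```python
from collections import deque


def average_distance(n, edges):
    """Mean shortest-path length over all ordered pairs of distinct,
    mutually reachable vertices, computed by BFS from every vertex."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0
    pairs = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


cycle_38 = [(i, (i + 1) % 38) for i in range(38)]
avg = average_distance(38, cycle_38)  # 361 / 37, about 9.76
```

From any vertex of the 38-cycle, distances 1 through 18 each occur twice and distance 19 occurs once, so the sum is 361 over 37 other vertices, which rounds to the 9.76 reported for the Simple Cycle.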

1: Collective Performance
§ Subjects could indeed solve the coloring problem across a wide range of networks
  • 31/38 experiments ended in a solution in less than 300 seconds
  • 82 sec mean completion time
§ Collective performance is affected by network structure
  • Preferential attachment networks are harder than cycle-based networks
§ Cycle-based networks:
  • Monotonic relationship between solution time and average network distance (smaller distance leads to shorter solution times)
§ Addition of random chords: systematically reduces solution time

2: Human Performance vs. Artificial Distributed Heuristics
Heuristic considered:
§ A vertex is randomly selected
  • If there are unused colors in the neighborhood of this vertex, a color is selected randomly from the available ones
  • If there are no unused colors, a color is selected randomly
Comparison measure
§ Number of vertex color changes
Findings:
§ Results exactly reversed: lower average distance increases the difficulty for the heuristic
§ Preferential attachment networks are easier for the heuristic
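The heuristic described on this slide can be sketched in a few lines. This is an illustration that follows the slide's wording literally, not the authors' code: the random initial coloring, the step limit, and the choice to pick among all vertices (rather than only conflicted ones) are assumptions.

```python
import random


def run_heuristic(n, edges, k, rng, max_steps=100_000):
    """Repeatedly recolor a random vertex until the coloring is proper.

    Returns (coloring, number_of_color_changes); coloring is None if the
    step limit is reached first.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Assumption: start from a uniformly random coloring.
    color = [rng.randrange(k) for _ in range(n)]
    changes = 0

    for _ in range(max_steps):
        if all(color[u] != color[v] for u, v in edges):
            return color, changes
        vertex = rng.randrange(n)
        used = {color[u] for u in adj[vertex]}
        unused = [c for c in range(k) if c not in used]
        # Pick an unused color if one exists, otherwise any color at random.
        new_color = rng.choice(unused) if unused else rng.randrange(k)
        if new_color != color[vertex]:
            color[vertex] = new_color
            changes += 1
    return None, changes


rng = random.Random(1)
ring = [(i, (i + 1) % 8) for i in range(8)]
coloring, changes = run_heuristic(8, ring, 3, rng)
```

The `changes` counter corresponds to the comparison measure on the slide, the number of vertex color changes. The example uses three colors on a small ring so that every selected vertex always has an unused color and the run finishes quickly.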

3: Effects of Varying the Locality of the Information View
§ Variable locality of information provided to subjects
  • Low: their own and neighboring colors are visible
  • Medium: their own and neighboring colors are visible, plus information on the connectivity of neighbors
  • High: global coloring state at all times
Findings:
§ Increased amount of information
  • Reduces solution times for cycle-based networks
  • Increases solution times for preferential attachment networks
  • Rapid convergence to one of the two solutions in cycle-based networks

[Figure: Information View Effect 1: Pref. Att. vs. Cycle-based Networks]

Information View Effect 2: Cycle-based Solution Convergence
§ Low information view: the population oscillates between approaches to the two solutions
§ High information view: rapid convergence to one of the two possible solutions

Individual Strategies
§ Choosing colors that result in the fewest local conflicts
§ Attempting to avoid conflicts with highly connected subjects
§ Signaling behavior of subjects
§ Introducing conflicts to avoid local minima
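The first strategy, choosing the color with the fewest local conflicts, is easy to state in code. The sketch below is a hypothetical formalization of that strategy only; the function name and the data layout (adjacency lists and a color map) are assumptions, and the other strategies on the slide are not modeled.

```python
def least_conflict_colors(vertex, adj, color, k):
    """Colors (out of 0..k-1) that clash with the fewest neighbors of `vertex`.

    adj maps a vertex to its list of neighbors; color maps a vertex to its
    current color. Ties are all returned so the caller can pick among them.
    """
    counts = [0] * k
    for neighbor in adj[vertex]:
        counts[color[neighbor]] += 1
    fewest = min(counts)
    return [c for c in range(k) if counts[c] == fewest]


# Vertex 0 has three neighbors: two colored 0 and one colored 1.
adj = {0: [1, 2, 3]}
color = {1: 0, 2: 0, 3: 1}
with_three = least_conflict_colors(0, adj, color, 3)  # color 2 is unused
with_two = least_conflict_colors(0, adj, color, 2)    # color 1 clashes least
```

With only two colors available, the best choice still leaves one conflict, which is the situation where subjects may deliberately introduce conflicts to escape a local minimum.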

Questions?

Thanks!