Tim Benke. Supervisors: Josiane Xavier Parreira, Sebastian Michel
Bachelor thesis: P2PDating for Building Semantic Overlay Networks
Why P2P networks?
- Decentralisation
- No single point of failure
- No content control
- Distribution of content, computing power and bandwidth
Querying in P2P networks
[Figure: the query "Dirk Nowitzki" is flooded through the network while its TTL is counted down hop by hop to 0]
Idea: Semantic Overlay Networks
- Querying in unstructured P2P networks
  - message flooding with Time-To-Live (TTL)
  - many redundant messages
- Group peers according to their content
- Querying in a Semantic Overlay Network (SON)
  - ask only the nodes of the specific content field
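
The contrast between flooding and SON querying can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the graph, the peer names, the `groups` mapping and both function names are assumptions made for the example.

```python
from collections import deque


def flood(graph, start, ttl):
    """Flood a query from `start`; return (peers reached, messages sent)."""
    reached = {start}
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL used up, the query is not forwarded any further
        for neighbour in graph[peer]:
            messages += 1  # every forward costs a message, even a redundant one
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return reached, messages


def ask_son(groups, topic):
    """In a SON, only the peers grouped under `topic` are asked."""
    return set(groups.get(topic, ()))


# Hypothetical four-peer network (adjacency lists).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
reached, messages = flood(graph, "A", ttl=2)

# Hypothetical topic groups of a SON.
groups = {"basketball": ["B", "D"], "flowers": ["C"]}
asked = ask_son(groups, "basketball")
```

In this toy network flooding needs 8 messages to reach 3 other peers, while the SON lookup contacts only the 2 peers of the requested topic.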
Querying in a SON
[Figure: peers grouped by topic (computer science, geology, basketball, flowers) with the query "Dirk Nowitzki"]
How to build a SON
Dating:
- Contact another peer P
- If isFriend(P):
  - add P to the list of friends
  - add P's friends to the list of candidates
isFriend(P) is judged by:
- How high is the similarity?
- How small is the overlap?
- How well did P cooperate?
Process of P2P-dating
- The peers to contact are chosen from 3 lists: friends, candidates, random
- Send a check-alive message to friends
- Send a contact message to candidates and random peers
- Receive synopses of the collections and compute scores
- Friend and candidate lists have fixed lengths: add peers until a list is full, then drop the worst peers
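
One dating round with a fixed-length friend list might look roughly like the sketch below. Everything in it is an assumption made for illustration: the class and function names, the convention that a higher score is better, the `threshold` used as a stand-in for the friend test, and the unbounded candidate set (the slides state that the candidate list is bounded as well).

```python
class BoundedPeerList:
    """Fixed-length peer list: add until full, then drop the worst-scored peer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.scores = {}  # peer id -> score, higher is better (assumption)

    def add(self, peer, score):
        self.scores[peer] = score
        if len(self.scores) > self.capacity:
            worst = min(self.scores, key=self.scores.get)
            del self.scores[worst]

    def peers(self):
        # Best-scored peers first.
        return sorted(self.scores, key=self.scores.get, reverse=True)


def dating_round(friends, candidates, replies, threshold):
    """Process the replies of one round.

    `replies` maps each contacted peer to (score, list of that peer's friends).
    """
    for peer, (score, their_friends) in replies.items():
        if score >= threshold:  # hypothetical stand-in for the friend test
            friends.add(peer, score)
            for other in their_friends:
                if other not in friends.scores:
                    candidates.add(other)
    return friends, candidates


friends = BoundedPeerList(capacity=2)
candidates = set()
replies = {
    "p2": (0.9, ["p5"]),
    "p3": (0.4, ["p6"]),
    "p4": (0.7, ["p7"]),
    "p8": (0.8, ["p2"]),
}
dating_round(friends, candidates, replies, threshold=0.5)
```

With a capacity of 2, p4 is accepted first and then dropped again once the better-scored p8 arrives.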
Search in a SON
- Peer P sends its queries to peers with a similar interest profile, i.e. all friends
- Each peer sends back only its top-k results
- When all answers have arrived, P merges the results, removes duplicates and delivers the top-k results
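
The merge step can be sketched as follows. The function name and the result format (lists of `(document id, score)` pairs) are assumptions, as is the rule that a duplicate keeps its highest score and that scores from different peers are comparable.

```python
def merge_top_k(answers, k):
    """Merge per-peer result lists, remove duplicates and keep the k best.

    `answers` is a list with one result list per peer; each result is a
    (document id, score) pair.
    """
    best = {}
    for results in answers:
        for doc, score in results:
            # A document returned by several peers is kept once,
            # with its highest score (assumption).
            if doc not in best or score > best[doc]:
                best[doc] = score
    # Sort by descending score, ties broken by document id.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


answers = [
    [("a", 0.9), ("b", 0.5)],
    [("b", 0.7), ("c", 0.6)],
    [("a", 0.8), ("d", 0.1)],
]
top = merge_top_k(answers, k=3)
```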
Strategies for scores
- Similarity only
- Overlap only
- Weighted sum
- Random: no score computed
Similarity(A, B): 0 = the same; > 0 up to ∞ = differs
Overlap Measure
- Min-wise Independent Permutations
- measure the overlap with a formula over the hashes of the documents
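
The usual textbook way to use min-wise permutations is sketched below; it is not necessarily the formula shown on the slide. Each collection is summarised by the minimum of its document hashes under several random linear permutations, and the fraction of positions where two summaries agree estimates the resemblance |A ∩ B| / |A ∪ B|. The function names, the modulus and the number of permutations are assumptions, and document hashes are assumed to be integers below `PRIME`.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, modulus of the linear permutations


def make_permutations(n, seed=0):
    """Return n random linear permutations x -> (a*x + b) mod PRIME as (a, b)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def mips_vector(doc_hashes, permutations):
    """Synopsis of a collection: the minimum hash under each permutation."""
    return [min((a * h + b) % PRIME for h in doc_hashes) for a, b in permutations]


def estimated_resemblance(v1, v2):
    """Fraction of permutations under which both collections share the minimum."""
    matches = sum(1 for x, y in zip(v1, v2) if x == y)
    return matches / len(v1)


perms = make_permutations(64)
collection_a = set(range(0, 100))     # hypothetical document hashes
collection_b = set(range(200, 300))   # disjoint from collection_a
same = estimated_resemblance(
    mips_vector(collection_a, perms), mips_vector(collection_a, perms)
)
disjoint = estimated_resemblance(
    mips_vector(collection_a, perms), mips_vector(collection_b, perms)
)
```

Only the short synopsis vectors have to be exchanged between peers, not the collections themselves.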
Similarity Measure
- Kullback-Leibler divergence / relative entropy
- Similarity(A, B): 0 = the same; > 0 up to ∞ = differs
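
For two term distributions P and Q the Kullback-Leibler divergence is KL(P || Q) = Σ_t P(t) · log(P(t) / Q(t)). A minimal sketch, assuming that each collection is represented as a dictionary from terms to probabilities (the representation and the function name are assumptions, and no smoothing is applied):

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two term distributions given as dicts term -> probability."""
    total = 0.0
    for term, p_t in p.items():
        if p_t == 0:
            continue  # terms with zero probability contribute nothing
        q_t = q.get(term, 0.0)
        if q_t == 0:
            return math.inf  # q gives no mass to a term that p uses
        total += p_t * math.log(p_t / q_t)
    return total


p = {"playoffs": 0.5, "thesis": 0.5}
q = {"playoffs": 0.9, "thesis": 0.1}
identical = kl_divergence(p, p)
different = kl_divergence(p, q)
unbounded = kl_divergence(p, {"playoffs": 1.0})
```

The three calls cover the range named on the slide: 0 for identical distributions, a positive value for differing ones, and ∞ when Q lacks a term that P uses.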
PASTRY: network infrastructure
- Distributed Hash Table: maps keys to the peers currently responsible for those keys
- MINERVA uses PASTRY
- O(log N) hops for any message to reach any destination
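
The responsibility rule of such a hash table can be illustrated with a simplified sketch: keys and peers share one circular id space, and a key belongs to the peer whose id is numerically closest. The small 16-bit id space, the SHA-1 hashing and all names are assumptions for the example; the sketch does not model Pastry's routing tables or the O(log N) hop count.

```python
import hashlib

ID_BITS = 16  # small id space for illustration only
ID_SPACE = 2 ** ID_BITS


def to_id(name):
    """Hash a key or peer name into the id space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def circular_distance(a, b):
    """Distance between two ids on the ring."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def closest_peer(key_id, peer_ids):
    """Id of the peer responsible for key_id (numerically closest id)."""
    return min(peer_ids, key=lambda pid: (circular_distance(pid, key_id), pid))


peer_ids = [5, 100, 60000]
owner_low = closest_peer(10, peer_ids)      # 5 is closest
owner_wrap = closest_peer(65535, peer_ids)  # wraps around the ring to 5
owner_of_term = closest_peer(to_id("playoffs"), peer_ids)
```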
Local Collections
- Index file saved on hard disk
- The LUCENE index is an inverted index for the terms occurring in websites obtained by the user
  - by surfing (e.g. via a plugin)
  - by a crawler on bookmarks
- Allows additions and deletions
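
What an inverted index with additions and deletions does can be shown with a small in-memory sketch. It is not the LUCENE API; the class, its methods, the whitespace tokenisation and the page names are assumptions made for illustration.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of pages that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of page ids
        self.terms_of = {}                # page id -> set of its terms

    def add(self, page, text):
        self.delete(page)  # re-adding a page replaces its old entry
        terms = set(text.lower().split())
        self.terms_of[page] = terms
        for term in terms:
            self.postings[term].add(page)

    def delete(self, page):
        for term in self.terms_of.pop(page, set()):
            self.postings[term].discard(page)
            if not self.postings[term]:
                del self.postings[term]  # drop empty posting lists

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))


index = InvertedIndex()
index.add("page-1", "Dirk Nowitzki playoffs")
index.add("page-2", "NBA playoffs")
before = index.search("playoffs")
index.delete("page-1")
after = index.search("playoffs")
```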
Experimental Setup
- NUTCH was used as crawler
- Seeds: 14-16 start URLs on a certain topic from del.icio.us and dmoz.org
- Depth: 2
- Each peer: ~400 pages
- Peers 1-4: Basketball
- Peers 5-7: Computer Science
- Peers 8-10: Flowers
- Peers 11-12: Geology
- Queries for peer 1: "playoffs", "Dirk Nowitzki"
- Queries for peer 7: "thesis"
- Queries for peer 12: "earth science"
Chart 1
Comparison over 75 iterations between
- 5 random peers
- and P2PDating for 5 friends with weighted sum strategy, alpha = 0.8
y-axis: recall; x-axis: iterations in steps of 5
Chart 2
Comparison over 50 iterations between
- random peers asked
- and P2PDating for x friends with weighted sum strategy, alpha = 0.8
y-axis: recall; x-axis: #peers asked
Conclusion
- Use of PASTRY as underlying routing/networking infrastructure
- Implementation of the details of the peer-to-peer network and the P2PDating algorithm
- Message handling
  - several message types
  - protocol for sending and receiving messages
- Adaptation of NUTCH to crawling
- Use of LUCENE to query indexes
- Experiments show the benefit of the P2PDating algorithm
Future Work
- Further experiments: real-world data from bookmark lists of active del.icio.us users
- Firefox or proxy plugin for on-the-fly indexing, querying and display of results
- Further applications: adaptation to MINERVA P2P Web Search
Thank you for your interest
Tim Benke
11/3/2020
FreePastry
- Free open-source version of PASTRY under a BSD license, called FreePastry
- Provides an application-level interface to the underlying P2P network
- API for Java 1.5
- Version used: 2.0 Beta
Overview
- Basics of P2P networks
- Querying in P2P networks
- Overlap and similarity computation
- Process of P2P-dating
- Application examples: Firefox plugin, del.icio.us
Chart 2
Comparison over 50 iterations between
- random peers asked
- and P2PDating for x-1 friends and 1 stranger with weighted sum strategy, alpha = 0.8
- only K-L divergence
y-axis: recall; x-axis: #peers asked
Chart 1
Comparison over 75 iterations between
- 5 random peers
- and P2PDating for 4 friends and 1 stranger with weighted sum strategy, alpha = 0.8
- only K-L divergence
y-axis: recall; x-axis: iterations in steps of 5
P2P-Dating Project
- Internet crawls performed with the Apache project NUTCH provide the collections
- Collections are indexed by NUTCH and a LUCENE index is produced
- 1 similarity measure and 1 overlap measure are used to determine whether a node is a friend
Process of P2P-dating
[Figure: friend list; label "Michael Jordan"]