Attacking the Kad Network
E. Chan-Tin, P. Wang, J. Tyra, T. Malchow, D. Foo Kune, N. Hopper, Y. Kim

Yongdae Kim

P2P Applications
• File sharing: Napster, Gnutella, BitTorrent, etc.
• Recent commercial applications
  - Skype
  - BitTorrent becomes legit
  - P2P TV by Yahoo Japan
• Research community
  - P2P file and archival systems: Ivy, Kosha, OceanStore, CFS
  - Web caching: Squirrel, Coral
  - Multicast systems: SCRIBE
  - P2P DNS: CoDNS and CoDoNS
  - Internet routing: RON
  - Next-generation Internet architecture: i3

P2P Systems
• How to find the desired information?
  - Centralized structured: Napster
  - Decentralized unstructured: Gnutella
  - Decentralized structured: Distributed Hash Table (DHT)
    - Content addressable!
• A DHT provides a hash table's simple put/get interface
  - Insert a data object, i.e., a key-value pair (k, v)
  - Retrieve the value v using key k
[Figure: lookup in Napster (Napster.com), Gnutella, and a DHT. P: a node looking for a file; O: offerer of the file. Labeled steps: Match, Query, QueryHit, Download; DHT nodes hold (K, V) pairs, with the call retrieve(K1).]

DHT: Terminologies
• Every node has a unique ID: nodeID
• Every object has a unique ID: key
• Keys and nodeIDs are logically arranged on a ring (ID space)
• A data object is stored at its root(key) and several replica roots
  - root(key): the closest nodeID to the key (or the successor of k)
• Range: the set of keys that a node is responsible for
• Routing table size: O(log(N))
• Routing delay: O(log(N)) hops
• Content addressable!
[Figure: ID ring with nodes A, B, C, D, Q, R, X, Y; the pair (k, v) is placed on the ring at key k.]
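
The put/get interface and the root(key) rule can be sketched as a toy, single-process ring. This is only an illustration under stated assumptions: the class name `ToyRingDHT`, the 16-bit ID space, the use of SHA-1 to derive keys, and the choice of the next nodes on the ring as replica roots are all hypothetical and not taken from the slides or from any real client.

```python
import bisect
import hashlib


class ToyRingDHT:
    """Toy in-memory DHT: every key is stored at its successor node and a few replicas."""

    def __init__(self, node_ids, id_bits=16, replicas=2):
        self.id_bits = id_bits            # assumed size of the ID space (2**id_bits)
        self.replicas = replicas          # assumed number of replica roots
        self.ring = sorted(node_ids)      # nodeIDs arranged on the ring
        self.store = {n: {} for n in self.ring}

    def key_of(self, name: str) -> int:
        # Hash the object name into the ID space (hash choice is an assumption).
        digest = hashlib.sha1(name.encode()).digest()
        return int.from_bytes(digest, "big") % (1 << self.id_bits)

    def roots(self, key: int):
        # root(key) is the successor of key on the ring; replica roots follow it.
        start = bisect.bisect_left(self.ring, key)
        count = min(1 + self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)] for i in range(count)]

    def put(self, name: str, value) -> None:
        key = self.key_of(name)
        for node in self.roots(key):
            self.store[node][key] = value

    def get(self, name: str):
        key = self.key_of(name)
        for node in self.roots(key):
            if key in self.store[node]:
                return self.store[node][key]
        return None


dht = ToyRingDHT([10, 200, 4000, 60000])
dht.put("song.mp3", "held by peer O")
print(dht.roots(150), dht.get("song.mp3"))
```

A real DHT resolves roots(key) by routing over O(log(N)) hops instead of reading a global sorted list; the sketch only shows where data lands once the root is known.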

Target P2P System
• Kad
  - A peer-to-peer DHT based on Kademlia
• Kad Network
  - Overnet: an overlay built on top of eDonkey clients
    - Used by P2P bots
  - Overlay built using eD2K series clients
    - eMule, aMule, MLDonkey
    - Over 1 million nodes, many more firewalled users
  - BT series clients
    - Overlay on Azureus
    - Overlay on Mainline and BitComet

Kademlia Protocol
[Figure: binary routing tree with k-buckets of (nodeID, IP) contacts, e.g. 01001011 at 123.24.3.1, 00100101 at 23.37.12.13, 01000001 at 129.5.3.1; labels include node 10101100, "K bucket", and "Find/store 11001010".]
• d(X, Y) = X XOR Y
• An entry in a k-bucket shares at least a k-bit prefix with the nodeID
  - k = 20 in Overnet
• Add new contact if
  - k-bucket is not full
• Parallel, iterative, prefix-matching routing
• Replica roots: k closest nodes
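
The XOR metric and the "k closest nodes" rule can be illustrated with a few lines of Python. This is a minimal sketch, assuming 8-bit IDs like those in the figure; the function names are made up for the example and do not come from any Kad client.

```python
def xor_distance(x: int, y: int) -> int:
    """Kademlia distance: d(X, Y) = X XOR Y, read as an unsigned integer."""
    return x ^ y


def shared_prefix_len(x: int, y: int, bits: int = 8) -> int:
    """Number of leading bits two IDs have in common (bits = assumed ID width)."""
    return bits - (x ^ y).bit_length()


def k_closest(target: int, node_ids, k: int):
    """Replica roots for a key: the k known nodes with the smallest XOR distance."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


me = 0b10101100
key = 0b11001010
known = [0b11000100, 0b11001011, 0b11011011, 0b10010100]

print(format(xor_distance(me, key), "08b"))
print(shared_prefix_len(me, 0b10000001))
print([format(n, "08b") for n in k_closest(key, known, 2)])
```

Because XOR distance is small exactly when two IDs share a long prefix, sorting by it is equivalent to prefix-matching routing toward the key.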

Kad Protocol
[Figure: routing table of node 10101100 drawn as a binary tree, with numeric labels counting down from 15.]
• No restriction on nodeID
• Wide routing table → short routing path
• Replica root: |r, k| <
• K-bucket in the i-th level covers 1/2^i of the ID space
• K-buckets with index [0, 4] can be split if a new contact is added to a full bucket
• A learns of a new node by asking, or through contact from other nodes
• Hello_req is used for liveness
  - A routing request can be used

Vulnerabilities of Kad
• No admission control, no verifiable binding
  - An attacker can launch a Sybil attack by generating an arbitrary number of IDs
• Eclipse attack
  - Stay long enough: Kad prefers long-lived contacts
  - (ID, IP) update: a Kad client will update the IP for a given ID without any verification
• Termination condition
  - A query terminates when A receives 300 matches.
• Timeout
  - When M returns many contacts close to K, A contacts only those nodes and times out.

Actual Attack
• Preparation phase
  - Backpointer hijacking: for every node A, attacker M
    - Learns A's routing table by sending appropriate queries
    - Then changes the routing table by sending the following message
[Figure: labels 0xD00D, IP_B, IP_M; M sends A the message "Hello, B, IP_M".]
• Execution phase
  - Provide many non-existing contacts
    - Fact: a query will time out after trying 25 contacts.
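
The effect of the 25-contact limit can be seen in a toy model of a single query. This is a simplified sketch, not the Kad lookup: a real query is parallel and iterative, whereas the hypothetical `run_query` below tries contacts one at a time in order of XOR distance and gives up after `max_tries` of them. All IDs are invented 8-bit values.

```python
def run_query(key, contacts, alive, max_tries=25):
    """Return the first live contact among the max_tries closest to key, else None."""
    ordered = sorted(contacts, key=lambda c: c ^ key)
    for tried, contact in enumerate(ordered, start=1):
        if tried > max_tries:
            break                      # budget exhausted: the query times out
        if contact in alive:
            return contact
    return None


key = 0b11001010
real = [0b11101010, 0b10010100]        # live nodes at XOR distance 32 and 94
fake = [key ^ d for d in range(1, 26)] # 25 non-existing IDs, all closer to the key

print(run_query(key, real, set(real)))          # a live node is found
print(run_query(key, real + fake, set(real)))   # None: every try lands on a fake ID
```

In this model the live nodes are never reached once 25 closer, non-existing contacts are present, which is the reason supplying such contacts makes the query fail.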

Screen Shots


Summary of Estimated Cost
• Assumptions
  - 1M nodes in total
  - 800 routing table entries
  - 100 Mbps network link
• Preparation cost
  - 41.2 GB of bandwidth to hijack 30% of the routing table
  - Takes 55 minutes with a 100 Mbps link
• Query prevention
  - A 100 Mbps link is sufficient to stop 65% of all query messages.
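
The 55-minute figure follows directly from the two numbers on the slide. The check below is a small worked calculation, assuming decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and a fully saturated link; the function name is invented for the example.

```python
def transfer_minutes(gigabytes: float, link_mbps: float) -> float:
    """Time to push `gigabytes` of traffic through a saturated `link_mbps` link."""
    bits = gigabytes * 1e9 * 8             # GB -> bits (decimal units assumed)
    seconds = bits / (link_mbps * 1e6)     # Mbps -> bits per second
    return seconds / 60


minutes = transfer_minutes(41.2, 100)
print(f"{minutes:.1f} minutes")            # about 55 minutes
```

41.2 GB is 329.6 gigabits, which takes 3,296 seconds at 100 Mbps, or just under 55 minutes.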

Large scale simulation
• 11,303 ~ 16,105 Kad nodes running on ~500 PlanetLab machines
• Comparison between expected and measured
  - Keyword query failures
  - Number of messages used to attack one node
  - Bandwidth usage

Self reflection attack
• Fill node A's routing table with A itself.
[Figure: A's routing table with entries (C, IP_C) … (G, IP_G); attack message "Hello, X, IP_A" sent to A for entries C … G.]
• ≈ 100% of queries failed after the attack
• Nodes can recover slowly
• Second round of attack

Mitigations
• Identity authentication

  Method                          Secure   Persistent ID   Incrementally deployable
  Verify the liveness of old IP   No       Yes
  Drop Hello with new IP          Yes      No              Yes
  ID = hash(IP)                   Yes      No              No
  ID = hash(Public Key)           Yes      No

• Routing correctness
  - Independent parallel routes
    - Incrementally deployable

  Backpointers   Current method   Independent parallel routes
  40%            98% fail         45% fail
  10%            59.5% fail       1.7% fail
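
The first three identity-authentication rows can be contrasted in a toy routing table. This is a sketch under assumptions, not code from any Kad client: the class `ToyRoutingTable`, its method names, the `is_alive` callback, the example IP addresses, and the use of SHA-1 for ID = hash(IP) are all hypothetical.

```python
import hashlib
from typing import Callable, Dict


class ToyRoutingTable:
    """Maps nodeID -> IP and applies one of several Hello-handling policies."""

    def __init__(self) -> None:
        self.contacts: Dict[str, str] = {}

    def hello_unverified(self, node_id: str, ip: str) -> bool:
        # Vulnerable baseline: update the IP for a given ID with no verification.
        self.contacts[node_id] = ip
        return True

    def hello_drop_new_ip(self, node_id: str, ip: str) -> bool:
        # "Drop Hello with new IP": a known ID may never change its address.
        known = self.contacts.get(node_id)
        if known is not None and known != ip:
            return False
        self.contacts[node_id] = ip
        return True

    def hello_verify_old_ip(self, node_id: str, ip: str,
                            is_alive: Callable[[str], bool]) -> bool:
        # "Verify the liveness of old IP": accept the change only if the
        # previously recorded address no longer answers.
        known = self.contacts.get(node_id)
        if known is not None and known != ip and is_alive(known):
            return False
        self.contacts[node_id] = ip
        return True


def id_from_ip(ip: str) -> str:
    # "ID = hash(IP)": the ID is bound to the address (hash choice is an assumption).
    return hashlib.sha1(ip.encode()).hexdigest()


table = ToyRoutingTable()
table.hello_unverified("B", "10.0.0.2")
print(table.hello_drop_new_ip("B", "10.0.0.66"), table.contacts["B"])
```

The sketch also makes the "Persistent ID" column visible: under the drop policy, or with ID = hash(IP), a node that legitimately changes address can no longer keep its old ID.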

Then