Concurrent Data Structures
Concurrent Algorithms 2017
Igor Zablotchi (based in part on slides by Tudor David and Vasileios Trigonakis)
Igor Zablotchi | 12. 2017

Data Structures (DSs)
• Constructs for efficiently storing and retrieving data
  – Different types: lists, hash tables, trees, queues, …
• Accessed through the DS interface
  – Depends on the DS type, but always includes:
    – Store an element
    – Retrieve an element
• Element
  – Set: just one value
  – Map: key/value pair

Concurrent Data Structures (CDSs)
• Concurrently accessed by multiple threads
  – Through the CDS interface: linearizable operations!
• Really important on multi-cores
• Used in most software systems

What do we care about in practice?
• Progress of individual operations (sometimes)
• More often: throughput (Mop/s)
  – Number of operations per second (throughput)
  – The evolution of throughput as we increase the number of threads (scalability)
[Figure: example throughput (Mop/s) vs. # threads, 1 to 40]

DS Example: Linked List
[Figure: a sorted linked list (1, 2, 3, 5, 6, 8) with the operations insert(4) and delete(6)]
• A sequence of elements (nodes)
• Interface
  – search (aka contains)
  – insert
  – remove (aka delete)

  struct node {
      value_t value;
      struct node* next;
  };

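As a point of reference before adding any concurrency, here is a minimal single-threaded sketch of the sorted linked list and its interface in C. It assumes `value_t` is `int`, uses a sentinel head node, and introduces illustrative names (`list_t`, `list_find`, `list_search`, `list_insert`, `list_remove`) that do not come from the slides.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef int value_t;  /* assumption: integer values */

typedef struct node {
    value_t value;
    struct node* next;
} node_t;

typedef struct {
    node_t head;  /* sentinel; head.next is the first real node */
} list_t;

/* Read-only traversal: first node with value >= v, and its predecessor. */
static node_t* list_find(list_t* l, value_t v, node_t** pred_out) {
    node_t* pred = &l->head;
    node_t* curr = pred->next;
    while (curr != NULL && curr->value < v) {
        pred = curr;
        curr = curr->next;
    }
    *pred_out = pred;
    return curr;
}

bool list_search(list_t* l, value_t v) {
    node_t* pred;
    node_t* curr = list_find(l, v, &pred);
    return curr != NULL && curr->value == v;
}

bool list_insert(list_t* l, value_t v) {
    node_t* pred;
    node_t* curr = list_find(l, v, &pred);
    if (curr != NULL && curr->value == v) return false;  /* set semantics: no duplicates */
    node_t* n = malloc(sizeof *n);
    if (n == NULL) return false;
    n->value = v;
    n->next = curr;
    pred->next = n;  /* a single pointer write links the node in */
    return true;
}

bool list_remove(list_t* l, value_t v) {
    node_t* pred;
    node_t* curr = list_find(l, v, &pred);
    if (curr == NULL || curr->value != v) return false;
    pred->next = curr->next;  /* unlink */
    free(curr);
    return true;
}
```

All three operations share the same read-only traversal; only insert and remove then write a pointer.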
Search Data Structures
• Interface
  1. search(k)
  2. insert(k)
  3. remove(k)
  – insert and remove are updates: update(k)
• Semantics
  – search(k) = parse(k): read-only
  – update(k) = parse(k), then modify(k): the parse is read-only, the modify is read-write

Concurrency Control
• How threads synchronize their writes to shared memory (e.g., nodes)
  – Locks
  – CAS
  – Transactional memory

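A small C11 sketch contrasting two of these mechanisms on a shared counter (a stand-in for a node field): a test-and-set spinlock built on `atomic_flag`, and a CAS retry loop using `atomic_compare_exchange_weak`. The counter types and function names are illustrative; transactional memory is left out because it needs compiler or hardware support beyond standard C.

```c
#include <stdatomic.h>

/* Lock-based: a minimal test-and-set spinlock protecting a plain counter. */
typedef struct {
    atomic_flag locked;
    long value;
} locked_counter_t;

void locked_increment(locked_counter_t* c) {
    while (atomic_flag_test_and_set_explicit(&c->locked, memory_order_acquire))
        ;  /* spin until the flag was clear, i.e. we now own the lock */
    c->value++;  /* critical section */
    atomic_flag_clear_explicit(&c->locked, memory_order_release);
}

/* CAS-based: read, compute, and publish only if the value did not change meanwhile. */
long cas_increment(_Atomic long* counter) {
    long old = atomic_load(counter);
    /* On failure, atomic_compare_exchange_weak writes the current value into old,
       so the loop retries with fresh data (the weak form may also fail spuriously). */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ;
    return old + 1;
}
```

The lock serializes every writer, while the CAS loop lets writers proceed optimistically and only retries when a conflicting write actually happened.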
Optimistic vs. Pessimistic Concurrency
[Figure: throughput (Mop/s) vs. # cores on a 20-core Xeon, 1024 elements: a pessimistic "bad" linked list (lock, then traverse) vs. an optimistic "good" linked list (traverse, then lock)]
(Lesson 1) Optimistic concurrency is the only way to get scalability

Tools for Optimistic Concurrency Control (OCC)
• RCU: slow in the presence of updates
  – (also a memory reclamation scheme)
• STM: slow in general
• HTM: not ubiquitous, not very fast (yet)
• Wait-free algorithms: slow in general
• (Optimistic) lock-free algorithms: ✓
• Optimistic lock-based algorithms: ✓
We need either a lock-free or an optimistic lock-based algorithm

Parenthesis: Target Platform
• 2-socket Intel Xeon E5-2680 v2 (Ivy Bridge)
  – 20 cores @ 2.8 GHz, 40 hyper-threads
  – 25 MB LLC (per socket)
  – 256 GB RAM

Concurrent Linked Lists – 5% Updates
[Figure: throughput (Mops/s) vs. number of threads, 1024 elements, 5% updates; blocking, lock-free, and wait-free lists]
The wait-free algorithm is slow

Optimistic Concurrency in Data Structures
• Pattern: optimistic prepare (non-synchronized) → validate (synchronized) → perform (synchronized)
  – Validation detects conflicting concurrent operations; if it fails, the operation goes back to the optimistic prepare step
• Example, linked list insert: find the insertion spot (optimistic prepare) → validate → perform the insertion
Validation plays a key role in concurrent data structures

Validation in Concurrent Data Structures
• Lock-free: atomic operations
  – optimistic prepare → validate & perform (atomic ops); on failure, go back to prepare
  – marking pointers, flags, helping, …
• Lock-based: locks
  – optimistic prepare → lock → validate → perform → unlock; on failure, unlock and go back to prepare
  – flags, pointer reversal, parsing twice, …
Validation is what differentiates algorithms

Let’s design two concurrent linked lists: a lock-free one and a lock-based one

Lock-free Sorted Linked List: Naïve
• Search: find spot → return
• Insert: find modification spot → CAS
• Delete: find modification spot → CAS
Is this a correct (linearizable) linked list?

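Lock-free Sorted Linked List: Naïve – Incorrect
• P0: Insert(x) and P1: Delete(y) both find their modification spots; x is to be inserted right after y
  – P1: CAS unlinks y from its predecessor
  – P0: CAS sets y's next pointer to x, and succeeds even though y is no longer in the list
  – Lost update! x is unreachable
• What is the problem?
  – Insert involves one existing node (the predecessor);
  – Delete involves two existing nodes (the predecessor and the deleted node), but its CAS only protects the predecessor
How can we fix the problem?
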
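To make the interleaving concrete, the sketch below replays it deterministically in a single thread on the list a → y → z, with C11 `atomic_compare_exchange_strong` standing in for CAS. The node values, the helper names, and the exact schedule are illustrative assumptions; the point is that both CASes succeed, yet x ends up reachable only through the already unlinked y.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
    int value;
    _Atomic(struct node*) next;
} node_t;

/* Naive CAS: swing *addr from expected to desired; true on success. */
static bool cas(_Atomic(node_t*)* addr, node_t* expected, node_t* desired) {
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

static bool reachable(node_t* head, node_t* target) {
    for (node_t* c = head; c != NULL; c = atomic_load(&c->next))
        if (c == target) return true;
    return false;
}

/* Replays P0: Insert(x) and P1: Delete(y) on a -> y -> z.
   Returns true if both CASes succeed yet x is not reachable from a. */
bool replay_lost_update(void) {
    node_t z = { 30, NULL }, y = { 20, &z }, a = { 10, &y }, x = { 25, NULL };

    /* P0: Insert(x) finds its spot: x belongs between y and z. */
    node_t* p0_succ = atomic_load(&y.next);
    atomic_store(&x.next, p0_succ);
    /* P1: Delete(y) finds its spot: predecessor a, y's successor z. */
    node_t* p1_succ = atomic_load(&y.next);

    bool p1_ok = cas(&a.next, &y, p1_succ);  /* P1: unlinks y, list is now a -> z */
    bool p0_ok = cas(&y.next, p0_succ, &x);  /* P0: links x after y, which is already gone */

    return p0_ok && p1_ok && !reachable(&a, &x);
}
```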
Lock-free Sorted Linked List: Fix
• Idea! To delete a node, first make it unusable…
  – Mark it for deletion so that
    1. Marking fails if someone changes the node's next pointer;
    2. An insertion fails if the predecessor node is marked.
• In other words: delete in two steps
  1. Mark for deletion: CAS(mark); and then
  2. Physical deletion: CAS(remove)
[Figure: Delete(y): find modification spot, then CAS(mark) on y, then CAS(remove) unlinks y]

1. Failing Deletion (Marking)
[Figure: P0: Insert(x) and P1: Delete(y) both find their modification spots; P0's CAS links x after y, so P1's CAS(mark) on y fails]
• Upon failure, restart the operation
  – Restarting is part of “all” state-of-the-art data structures

2. Failing Insertion due to Marked Node
[Figure: P1: Delete(y) finds its spot, marks y with CAS(mark), and unlinks it with CAS(remove); P0: Insert(x) had found y as its predecessor, so its CAS fails (returns false) because y is marked]
• Upon failure, restart the operation
  – Restarting is part of “all” state-of-the-art data structures
How can we implement marking?

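Implementing Marking (C Style)
• Pointers in 64-bit architectures
  – Word aligned: 8-byte aligned!
  – The three lowest bits of a node pointer are always 0, so the lowest bit can store the mark
[Figure: a next pointer whose three lowest bits are 0 0 0]

  bool mark(node_t* n) {
      /* read n->next once; the CAS fails if it is already marked or has changed */
      uintptr_t unmarked = (uintptr_t) n->next & ~(uintptr_t) 0x1;
      uintptr_t marked = unmarked | 0x1;
      return CAS(&n->next, unmarked, marked) == unmarked;  /* CAS returns the old value */
  }
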
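A self-contained C11 variant of the same idea, assuming `next` is stored as an `_Atomic uintptr_t` so the mark bit can be set with a real atomic CAS. The helpers `get_unmark_ref` and `is_marked_node` reuse names that appear in these slides, but their signatures here are assumptions; the sketch only relies on bit 0 being free, which holds for any node alignment of 2 bytes or more.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct node {
    long key;
    _Atomic uintptr_t next;  /* successor address; bit 0 = "marked for deletion" */
} node_t;

/* Nodes are at least 2-byte aligned, so bit 0 of a real node pointer is always 0. */
_Static_assert(_Alignof(node_t) >= 2, "bit 0 of node pointers must be free");

static inline node_t* get_unmark_ref(uintptr_t p) { return (node_t*)(p & ~(uintptr_t)1); }
static inline bool is_marked_ref(uintptr_t p) { return (p & 1) != 0; }

/* A node is logically deleted once its own next pointer carries the mark. */
static inline bool is_marked_node(node_t* n) { return is_marked_ref(atomic_load(&n->next)); }

/* Mark n for deletion. Fails if n is already marked or if n->next changed
   between the load and the CAS (e.g. a concurrent insert right after n). */
bool mark(node_t* n) {
    uintptr_t expected = atomic_load(&n->next) & ~(uintptr_t)1;
    uintptr_t marked = expected | 1;
    return atomic_compare_exchange_strong(&n->next, &expected, marked);
}
```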
Lock-free List: Putting Everything Together
• Traversal: traverse (requires unmarking pointers)
• Search: traverse
• Insert: traverse → CAS to insert
• Delete: traverse → CAS to mark → CAS to remove
  – What happens if this last CAS fails?
• Garbage (marked) nodes
  – Cleaned up while traversing (helping, in this course’s terms)
A pragmatic implementation of lock-free linked lists

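A compact C11 sketch of the list described above, assuming integer keys strictly between `LONG_MIN` and `LONG_MAX` (used as head/tail sentinels), low-bit marking of next pointers, and illustrative function names. It deliberately ignores memory reclamation: unlinked nodes are leaked, which a real implementation must address. A failed remove CAS is tolerated because every traversal unlinks the marked nodes it meets.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct node {
    long key;
    _Atomic uintptr_t next;  /* bit 0 set: this node is logically deleted */
} node_t;
typedef struct { node_t* head; } list_t;  /* sentinels: head LONG_MIN, tail LONG_MAX */

static node_t* ref(uintptr_t p) { return (node_t*)(p & ~(uintptr_t)1); }
static bool marked(uintptr_t p) { return (p & 1) != 0; }

static node_t* new_node(long key, node_t* next) {
    node_t* n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->key = key;
    atomic_init(&n->next, (uintptr_t)next);
    return n;
}

list_t* list_new(void) {
    list_t* l = malloc(sizeof *l);
    if (l == NULL) abort();
    l->head = new_node(LONG_MIN, new_node(LONG_MAX, NULL));
    return l;
}

/* Traverse to the first node with key >= k and its predecessor, stripping mark
   bits and unlinking (helping remove) marked nodes met on the way. */
static node_t* find(list_t* l, long k, node_t** pred_out) {
retry:;
    node_t* pred = l->head;
    node_t* curr = ref(atomic_load(&pred->next));
    for (;;) {
        uintptr_t succ = atomic_load(&curr->next);
        while (marked(succ)) {  /* curr is garbage: try to unlink it */
            uintptr_t exp = (uintptr_t)curr;
            if (!atomic_compare_exchange_strong(&pred->next, &exp, (uintptr_t)ref(succ)))
                goto retry;  /* pred changed or was marked: restart from head */
            curr = ref(succ);
            succ = atomic_load(&curr->next);
        }
        if (curr->key >= k) { *pred_out = pred; return curr; }
        pred = curr;
        curr = ref(succ);
    }
}

bool list_search(list_t* l, long k) {
    node_t* pred;
    return find(l, k, &pred)->key == k;
}

bool list_insert(list_t* l, long k) {
    node_t* n = new_node(k, NULL);
    for (;;) {
        node_t* pred;
        node_t* curr = find(l, k, &pred);
        if (curr->key == k) { free(n); return false; }
        atomic_store(&n->next, (uintptr_t)curr);
        uintptr_t exp = (uintptr_t)curr;  /* fails if pred is marked or pred->next moved */
        if (atomic_compare_exchange_strong(&pred->next, &exp, (uintptr_t)n)) return true;
    }
}

bool list_delete(list_t* l, long k) {
    for (;;) {
        node_t* pred;
        node_t* curr = find(l, k, &pred);
        if (curr->key != k) return false;
        uintptr_t succ = atomic_load(&curr->next);
        if (marked(succ)) continue;  /* another delete got there first: re-check */
        /* 1. CAS(mark): fails if curr->next changed since we read it */
        if (!atomic_compare_exchange_strong(&curr->next, &succ, succ | 1)) continue;
        /* 2. CAS(remove): if it fails, a later find() unlinks curr */
        uintptr_t exp = (uintptr_t)curr;
        atomic_compare_exchange_strong(&pred->next, &exp, succ);
        return true;
    }
}
```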
What is not Perfect with the Lock-free List?
1. Garbage nodes
   – Increase path length; and
   – Increase complexity: if (is_marked_node(n)) …
2. Unmarking every single pointer
   – Increases complexity: curr = get_unmark_ref(curr->next)
Can we simplify the design with locks?

Lock-based Sorted Linked List: Naïve
• Search: find spot → return
• Insert: find modification spot → lock(predecessor)
• Delete: find modification spot → lock(predecessor) → lock(target)
Is this a correct (linearizable) linked list?

Lock-based List: Validate After Locking
• Search: find spot → return
• Insert: find modification spot → lock(predecessor) → validate: !pred->marked && pred->next did not change
• Delete: find modification spot → lock(predecessor) → lock(curr) → validate: !pred->marked && !curr->marked && pred->next did not change → mark(curr) → remove
• If validation fails: unlock and restart

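A C11 sketch of this validate-after-locking design, assuming integer keys with `LONG_MIN`/`LONG_MAX` sentinels, per-node spinlocks built from `atomic_bool`, and a separate `marked` flag instead of a pointer bit; all names are illustrative and memory reclamation is ignored. Note that the traversal never strips mark bits.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    long key;
    _Atomic(struct node*) next;
    atomic_bool marked;  /* logically deleted; set only while holding this node's lock */
    atomic_bool locked;  /* per-node test-and-set spinlock */
} node_t;
typedef struct { node_t* head; } list_t;  /* sentinels: head LONG_MIN, tail LONG_MAX */

static void lock(node_t* n) {
    while (atomic_exchange_explicit(&n->locked, true, memory_order_acquire))
        ;  /* spin until we are the one flipping it from false to true */
}
static void unlock(node_t* n) { atomic_store_explicit(&n->locked, false, memory_order_release); }

static node_t* new_node(long key, node_t* next) {
    node_t* n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->key = key;
    atomic_init(&n->next, next);
    atomic_init(&n->marked, false);
    atomic_init(&n->locked, false);
    return n;
}

list_t* list_new(void) {
    list_t* l = malloc(sizeof *l);
    if (l == NULL) abort();
    l->head = new_node(LONG_MIN, new_node(LONG_MAX, NULL));
    return l;
}

/* Optimistic traversal: no locks taken and no mark bits to strip. */
static node_t* find(list_t* l, long k, node_t** pred_out) {
    node_t* pred = l->head;
    node_t* curr = atomic_load(&pred->next);
    while (curr->key < k) { pred = curr; curr = atomic_load(&curr->next); }
    *pred_out = pred;
    return curr;
}

bool list_search(list_t* l, long k) {
    node_t* pred;
    node_t* curr = find(l, k, &pred);
    return curr->key == k && !atomic_load(&curr->marked);
}

bool list_insert(list_t* l, long k) {
    for (;;) {
        node_t* pred;
        node_t* curr = find(l, k, &pred);
        lock(pred);
        /* validate: !pred->marked && pred->next did not change */
        bool valid = !atomic_load(&pred->marked) && atomic_load(&pred->next) == curr;
        bool ok = valid && curr->key != k;
        if (ok) atomic_store(&pred->next, new_node(k, curr));
        unlock(pred);
        if (valid) return ok;  /* otherwise restart */
    }
}

bool list_delete(list_t* l, long k) {
    for (;;) {
        node_t* pred;
        node_t* curr = find(l, k, &pred);
        lock(pred);
        lock(curr);  /* ascending key order (pred before curr) avoids deadlock */
        bool valid = !atomic_load(&pred->marked) && !atomic_load(&curr->marked) &&
                     atomic_load(&pred->next) == curr;
        bool ok = valid && curr->key == k;
        if (ok) {
            atomic_store(&curr->marked, true);                    /* mark(curr) */
            atomic_store(&pred->next, atomic_load(&curr->next));  /* remove */
        }
        unlock(curr);
        unlock(pred);
        if (valid) return ok;  /* otherwise restart */
    }
}
```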
Concurrent Linked Lists – 0% Updates
[Figure: throughput (Mop/s) vs. # cores, 1024 elements, 0% updates; lock-free vs. lock-based list]
The gap comes just from the lock-based list not having to unmark pointers!
(Lesson 2) Sequential complexity matters → Simplicity

Another DS Example: the Skiplist
• The linked list is:
  – Easy to understand/design
  – But slow: O(n) for search, insert & remove
• A good alternative: the binary search tree (BST)
  – O(log n) search, insert & remove if balanced (else O(n))
  – Needs rebalancing: slow
• An even better alternative: the skiplist
  – O(log n) expected search, insert & remove
  – Builds on the simplicity of the linked list

Skiplist Overview
• Linked list:
  – One next pointer per node
• Skiplist:
  – Multiple levels of pointers per node
[Figure: a skiplist node with next pointers on level 0, level 1, …]

Skiplist Overview
[Figure: a skiplist holding 1, 3, 5, 7, 10, 12]
• Each node has a random number of levels
• Higher levels are shortcuts for lower levels

Searching in a Skiplist
[Figure: searching for 7 in the skiplist 1, 3, 5, 7, 10, 12]
We’re searching for 7!

Inserting in a Skiplist (single-threaded)
[Figure: inserting 7 into the skiplist 1, 3, 5, 10, 12]
We want to insert 7

Deleting from a Skiplist (single-threaded)
[Figure: deleting 7 from the skiplist 1, 3, 5, 7, 10, 12]
We want to delete 7

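A minimal single-threaded skiplist in C that follows the search, insert, and delete walkthroughs above. The fixed height `MAX_LEVEL = 16`, the coin-flip (p = 1/2) level choice via `rand()`, and all names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_LEVEL 16

typedef struct node {
    long key;
    int levels;                    /* number of levels this node is linked into */
    struct node* next[MAX_LEVEL];  /* next[0] is the full sorted linked list */
} node_t;

typedef struct { node_t head; } skiplist_t;  /* head sentinel spans all levels */

static int random_levels(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1)) lvl++;  /* each extra level with probability 1/2 */
    return lvl;
}

/* Walk from the top level down; preds[i] = last node on level i with key < k. */
static node_t* find(skiplist_t* s, long k, node_t* preds[MAX_LEVEL]) {
    node_t* pred = &s->head;
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        while (pred->next[i] != NULL && pred->next[i]->key < k) pred = pred->next[i];
        preds[i] = pred;
    }
    return pred->next[0];  /* candidate on level 0 */
}

bool sl_search(skiplist_t* s, long k) {
    node_t* preds[MAX_LEVEL];
    node_t* n = find(s, k, preds);
    return n != NULL && n->key == k;
}

bool sl_insert(skiplist_t* s, long k) {
    node_t* preds[MAX_LEVEL];
    node_t* n = find(s, k, preds);
    if (n != NULL && n->key == k) return false;
    n = malloc(sizeof *n);
    if (n == NULL) return false;
    n->key = k;
    n->levels = random_levels();
    for (int i = 0; i < n->levels; i++) {  /* splice in on each of its levels */
        n->next[i] = preds[i]->next[i];
        preds[i]->next[i] = n;
    }
    return true;
}

bool sl_delete(skiplist_t* s, long k) {
    node_t* preds[MAX_LEVEL];
    node_t* n = find(s, k, preds);
    if (n == NULL || n->key != k) return false;
    for (int i = 0; i < n->levels; i++) preds[i]->next[i] = n->next[i];  /* unlink everywhere */
    free(n);
    return true;
}
```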
Let’s design a lock-free skiplist!

Lock-free Skiplist – Searches
• Similar to the single-threaded case
• Search for the element on every level, starting with the topmost level
• The element is in the skiplist if it is present (and not marked) on level 0
[Figure: a skiplist holding 1, 3, 5, 7, 10, 12]

Lock-free Skiplist – Insert
• Randomly choose the number of levels of the new node
• Find predecessors and successors for the new element
• Set the element’s next pointers to the successors
• Atomically link the element into level 0 (linearization point)
• Link the element into the higher levels, one by one
[Figure: inserting 7 into the skiplist 1, 3, 5, 10, 12]

Lock-free Skiplist – Delete
• Find predecessors and successors for the element
• Atomically mark the element’s next pointers one by one, starting from the top
• Atomically mark the bottom-level next pointer (linearization point)
• Unlink the marked node from all levels
[Figure: deleting 7 from the skiplist 1, 3, 5, 7, 10, 12]

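A C11 sketch of these three operations, in which each level's next pointer carries its own mark bit. Assumptions: integer keys strictly between the `LONG_MIN`/`LONG_MAX` sentinels, a fixed `MAX_LEVEL`, a caller-chosen random `top` level (passed in to keep the sketch short and testable), illustrative names, and no memory reclamation. Search reuses `find`, which also unlinks marked nodes it passes; before linking a higher level, insert re-points the new node’s own next pointer at the fresh successor so that it never skips over a concurrently inserted node.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LEVEL 16
typedef struct node {
    long key;
    int top;                            /* index of this node's highest level */
    _Atomic uintptr_t next[MAX_LEVEL];  /* bit 0 of next[i] set: marked on level i */
} node_t;
typedef struct { node_t* head; } skiplist_t;
static node_t* ref(uintptr_t p) { return (node_t*)(p & ~(uintptr_t)1); }
static bool marked(uintptr_t p) { return (p & 1) != 0; }
static node_t* new_node(long key, int top, node_t* succ) {
    node_t* n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->key = key; n->top = top;
    for (int i = 0; i < MAX_LEVEL; i++) atomic_init(&n->next[i], (uintptr_t)succ);
    return n;
}
skiplist_t* sl_new(void) {  /* head (LONG_MIN) -> tail (LONG_MAX) on every level */
    skiplist_t* s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->head = new_node(LONG_MIN, MAX_LEVEL - 1, new_node(LONG_MAX, MAX_LEVEL - 1, NULL));
    return s;
}

/* Top-down: fill preds/succs per level, unlinking marked nodes; true iff k is on level 0. */
static bool find(skiplist_t* s, long k, node_t* preds[], node_t* succs[]) {
retry:;
    node_t* pred = s->head;
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        node_t* curr = ref(atomic_load(&pred->next[i]));
        for (;;) {
            uintptr_t succ = atomic_load(&curr->next[i]);
            while (marked(succ)) {  /* curr is being deleted: unlink it from level i */
                uintptr_t exp = (uintptr_t)curr;
                if (!atomic_compare_exchange_strong(&pred->next[i], &exp, (uintptr_t)ref(succ)))
                    goto retry;
                curr = ref(succ);
                succ = atomic_load(&curr->next[i]);
            }
            if (curr->key >= k) break;
            pred = curr; curr = ref(succ);
        }
        preds[i] = pred; succs[i] = curr;
    }
    return succs[0]->key == k;
}

bool sl_contains(skiplist_t* s, long k) {
    node_t *preds[MAX_LEVEL], *succs[MAX_LEVEL];
    return find(s, k, preds, succs);
}
/* top must be in 0 .. MAX_LEVEL-1 and is assumed to be chosen randomly by the caller. */
bool sl_insert(skiplist_t* s, long k, int top) {
    node_t *preds[MAX_LEVEL], *succs[MAX_LEVEL];
    node_t* n = new_node(k, top, NULL);
    for (;;) {
        if (find(s, k, preds, succs)) { free(n); return false; }  /* n never published */
        for (int i = 0; i <= top; i++) atomic_store(&n->next[i], (uintptr_t)succs[i]);
        uintptr_t exp = (uintptr_t)succs[0];
        if (atomic_compare_exchange_strong(&preds[0]->next[0], &exp, (uintptr_t)n)) break;
    }
    for (int i = 1; i <= top; i++) {  /* level 0 done (lin. point); link upper levels */
        for (;;) {
            uintptr_t nx = atomic_load(&n->next[i]);
            if (marked(nx)) return true;  /* concurrent delete started: stop linking */
            if (ref(nx) != succs[i] &&
                !atomic_compare_exchange_strong(&n->next[i], &nx, (uintptr_t)succs[i]))
                continue;                 /* n->next[i] changed: re-check */
            uintptr_t exp = (uintptr_t)succs[i];
            if (atomic_compare_exchange_strong(&preds[i]->next[i], &exp, (uintptr_t)n)) break;
            find(s, k, preds, succs);     /* stale preds/succs: recompute and retry */
        }
    }
    return true;
}

bool sl_delete(skiplist_t* s, long k) {
    node_t *preds[MAX_LEVEL], *succs[MAX_LEVEL];
    if (!find(s, k, preds, succs)) return false;
    node_t* victim = succs[0];
    for (int i = victim->top; i >= 1; i--) {  /* mark upper levels, top-down */
        uintptr_t nx = atomic_load(&victim->next[i]);
        while (!marked(nx)) atomic_compare_exchange_weak(&victim->next[i], &nx, nx | 1);
    }
    uintptr_t nx0 = atomic_load(&victim->next[0]);
    while (!marked(nx0)) {  /* level-0 mark = linearization point: exactly one winner */
        if (atomic_compare_exchange_weak(&victim->next[0], &nx0, nx0 | 1)) {
            find(s, k, preds, succs);  /* physically unlink on all levels */
            return true;
        }
    }
    return false;  /* another thread marked level 0 first */
}
```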
Optimistic Concurrency Control: Summary
• Lock-free: atomic operations
  – optimistic prepare → validate & perform (atomic ops); on failure, go back to prepare
  – marking pointers, flags, helping, …
• Lock-based: locks
  – optimistic prepare → lock → validate → perform → unlock; on failure, unlock and go back to prepare
  – flags, pointer reversal, parsing twice, …

Summary
• Concurrent data structures are very important
• Optimistic concurrency is necessary for scalability
  – Only recently has there been a lot of active work on this for CDSs