Concurrent Data Structures Concurrent Algorithms 2016 Tudor David
- Slides: 40
Concurrent Data Structures Concurrent Algorithms 2016 Tudor David (based on slides by Vasileios Trigonakis) Tudor David | 11. 2016 1
Data Structures (DSs) • Constructs for efficiently storing and retrieving data – Different types: lists, hash tables, trees, queues, … • Accessed through the DS interface – Depends on the DS type, but always includes – Store an element – Retrieve an element • Element – Set: just one value – Map: key/value pair CA Tudor David | 11. 2016 2
Concurrent Data Structures (CDSs) • Concurrently accessed by multiple threads – Through the CDS interface linearizable operations! • Really important on multi-cores • Used in most software systems ASCY Tudor David | 11. 2016 3
What do we care about in practice? • Progress of individual operations sometimes • More often: Throughput (Mop/s) – Number of operations per second (throughput) – The evolution of throughput as we increase 12 the number of threads (scalability) 10 8 6 4 2 0 1 10 20 30 40 # Threads CA Tudor David | 11. 2016 4
DS Example: Linked List delete(6) 1 2 3 insert(4) 5 CA 8 4 • A sequence of elements (nodes) • Interface – search (aka contains) – insert – remove (aka delete) 6 struct node { value_t value; struct node* next; }; Tudor David | 11. 2016 5
Search Data Structures • Interface search(k) 1. search update(k) 2. insert updates 3. remove k parse(k) modify(k) • Semantics 1. 2. 3. 4. read-only read-write CA Tudor David | 11. 2016 6
Optimistic vs. Pessimistic Concurrency 10 traverse c i t is m ti p o 8 6 traverse k loc k 4 2 loc Throughput (Mop/s) 12 20 -core Xeon 1024 elements 0 1 10 20 pessimistic 30 40 # Cores "bad" linked list "good" linked list (Lesson 1) Optimistic concurrency is the only way to get OPTIK scalability Tudor David | 11. 2016 7
The Two Problems in Optimistic Concurrency • Concurrency Control How threads synchronize their writes to the shared memory (e. g. , nodes) – Locks – CAS – Transactional memory OPTIK • Memory Reclamation How and when threads free and reuse the shared memory (e. g. , nodes) – Garbage collectors – Hazard pointers – RCU – Quiescent states Tudor David | 11. 2016 8
Tools for Optimistic Concurrency Control (OCC) • RCU: slow in the presence of updates – (also a memory reclamation scheme) • STM: slow in general • HTM: not ubiquitous, not very fast (yet) • Wait-free algorithms: slow in general • (Optimistic) Lock-free algorithms: • Optimistic lock-based algorithms: We either need a lock-free or an optimistic lock-based OPTIK algorithm Tudor David | 11. 2016 9
Parenthesis: Target platform 2 -socket Intel Xeon E 5 -2680 v 2 Ivy Bridge – 20 cores @ 2. 8 GHz, 40 hyper-threads – 25 MB LLC (per socket) – 256 GB RAM CA c c c c c Tudor David | 11. 2016 10
Concurrent Linked Lists – 5% Updates Blocking Lock-free 1024 elements 5% updates Wait-free Throughput (Mops/s) 12 10 8 6 4 2 0 1 5 9 13 17 21 25 Number of threads 29 33 37 Wait-free algorithm is slow OPTIK Tudor David | 11. 2016 11
Optimistic Concurrency in Data Structures operation Pattern optimistic prepare perform (non-synchronized) (synchronized) validate optimistic (synchroniz perform prepare ed) detect conflicting failed concurrent operations find insertion spot validate Example linked list insert Validation plays a key role in concurrent data structures OPTIK Tudor David | 11. 2016 12
Validation in Concurrent Data Structures • Lock-free: atomic operations optimistic prepare validate & perform (atomic ops) failed – marking pointers, flags, helping, … • Lock-based: lock validate optimistic prepare loc validat perfor e m k unlock failed unlock – flags, pointer reversal, parsing twice, … Validation is what differentiates algorithms OPTIK Tudor David | 11. 2016 13
Let’s design two concurrent linked lists: A lock-free and a lock-based Tudor David | 11. 2016 14
Lock-free Sorted Linked List: Naïve Search find spot Insert find modification spot CAS Delete find modification spot CAS return Is this a correct (linearizable) linked list? OPTIK Tudor David | 11. 2016 15
Lock-free Sorted Linked List: Naïve – Incorrect P 1: CAS P 1: find modification spot P 0: Insert(x) P 1: Delete(y) P 0: find modification spot P 0: CAS y x Lost update! • What is the problem? – Insert involves one existing node; – Delete involves two existing nodes How can we fix the problem? OPTIK Tudor David | 11. 2016 16
Lock-free Sorted Linked List: Fix • Idea! To delete a node, make it unusable first… – Mark it for deletion so that 1. You fail marking if someone changes next pointer; 2. An insertion fails if the predecessor node is marked. In other words: delete in two steps 2. CAS(remove) 1. CAS(mark) 1. Mark for deletion; and then find modification spot Delete(y) 2. Physical deletion y OPTIK Tudor David | 11. 2016 17
1. Failing Deletion (Marking) P 1: CAS(mark) fa P 1: find modification spot P 0: Insert(x) P 1: Delete(y) P 0: find modification spot P 0: CAS y x • Upon failure restart the operation – Restarting is part of “all” state-of-the-art-data structures OPTIK Tudor David | 11. 2016 18
1. Failing Insertion due to Marked Node P 1: CAS(remove) P 1: CAS(mark) P 1: find modification spot P 0: Insert(x) P 1: Delete(y) P 0: find modification spot P 0: CAS false y • Upon failure restart the operation – Restarting is part of “all” state-of-the-art-data structures How can we implement marking? OPTIK Tudor David | 11. 2016 19
Implementing Marking (C Style) • Pointers in 64 bit architectures – Word aligned - 8 bit aligned! next pointer 0 0 0 e d o n boolean mark(node_t* n) uintptr_t unmarked = n->next & ~0 x 1 L; uintptr_t marked = n->next | 0 x 1 L; return CAS(&n->next, unmarked, marked) == unmarked; OPTIK Tudor David | 11. 2016 20
Lock-free List: Putting Everything Together • Traversal: traverse (requires unmarking nodes) • Search: traverse • Insert: traverse CAS to insert • Delete: traverse CAS to mark CAS to remove What • Garbage (marked) nodes happers if this CAS fails? ? – Cleanup while traversing A pragmatic implementation of lock-free linked lists (helping in this course’s terms) OPTIK Tudor David | 11. 2016 21
What is not Perfect with the Lock-free List? 1. Garbage nodes – Increase path length; and – Increase complexity if (is_marked_node(n)) … 2. Unmarking every single pointer – Increase complexity curr = get_unmark_ref(curr->next) Can we simplify the design with locks? OPTIK Tudor David | 11. 2016 22
Lock-based Sorted Linked List: Naïve Search find spot Insert find modification spot lock Delete lock(target) find modification spot lock(predecessor) return Is this a correct (linearizable) linked list? OPTIK Tudor David | 11. 2016 23
Lock-based List: Validate After Locking Search find spot return validate !pred->marked && pred->next did not change find modification spot lock Insert mark(curr) lock(curr) find modification spot lock(predecessor) Delete !pred->marked && !curr->marked && pred->next did not change OPTIK Tudor David | 11. 2016 24
Throughput (Mop/s) Concurrent Linked Lists – 0% updates 50 45 40 35 30 25 20 15 10 5 0 1024 elements 0% updates Just because the lock-based is not unmarking! 1 10 20 30 40 # Cores lock-free lock-based (Lesson 2) Sequential complexity matters Simplicity OPTIK Tudor David | 11. 2016 25
Optimistic Concurrency Control: Summary • Lock-free: atomic operations optimistic prepare validate & perform (atomic ops) failed – marking pointers, flags, helping, … • Lock-based: lock validate optimistic prepare loc validat perfor e m k unlock failed unlock – flags, pointer reversal, parsing twice, … OPTIK Tudor David | 11. 2016 26
Word of caution: lock-based algorithms • Search data structures • Queues, stacks, counters, . . . Queue, 40 threads Throughput (Mop/s) 3 2. 5 2 1. 5 1 0. 5 0 Lock-based CA Non-blocking Tudor David | 11. 2016 27
Memory Reclamation: OCC’s Side Effect • Delete a node free and reuse this memory • Subset of the garbage collection problem • Who is accessing that memory? • Can we just directly do free(node)? P 0: pointer P 0: search on x x P 1: delete(x) P 1: free(x) We cannot directly free the memory! Need memory OPTIK reclamation Tudor David | 11. 2016 28
Memory Reclamation Schemes 1. Reference counting – Count how many references exist on a node 2. Hazard pointers – Tell to others what exactly you are reading 3. Quiescent states – Wait until it is certain than no one holds references 4. Read-Copy Update (RCU) – Quiescent states – The extreme approach OPTIK Tudor David | 11. 2016 29
1. Reference Counting rc_pointer • Pointer + Counter counter pointer • Dereference: rc_dereference(rc_pointer* rcp) atomic_increment(&rcp->counter); return *pointer; • “Release”: rc_release(rc_pointer* rcp) atomic_decrement(&rcp->counter); • Free: iff counter = 0 (Lesson 3) Readers cannot write on the shared nodes Bad bad idea: Readers write on shared nodes! OPTIK Tudor David | 11. 2016 30
2. Hazard pointers (1/2) • Reference counter property of the node • Hazard pointer property of the thread hazard_pointe – A Multi-Reader Single-Writer (MRSW) register addres s • Protect: • hp_protect(node* n) hazard_pointer* hp = hp_get(n); hp->address = n; Depends on the data Release: structure hp_release(hazard_pointer* hp) type hp->address = NULL; OPTIK Tudor David | 11. 2016 31
2. Hazard pointers (2/2) • Free memory x 1. Collect all hazard pointers hazard_pointe 2. Check if x is accessed by any thread addres 1. If yes, buffer the free for later s 2. If not, free the memory • Buffering the free is implementation specific • + lock-free structure size) hazard pointers hp_protect • O(data - not scalable OPTIK Tudor David | 11. 2016 32
3. Quiescent States • Keep the memory until it is certain it is not accessed • Can be implemented in various ways • Example implementation search / insert / delete qs_unsafe(); I’m accessing shared data … qs_safe(); I’m not accessing shared data return … The data written in qs_[un]safe must be local-mostly OPTIK Tudor David | 11. 2016 33
3. Quiescent States: qs_[un]safe Implementation • List of “thread-local” (mostly) counters (id = 0) qs_state (id = x) qs_state (id = y) qs_state • qs_state (initialized to 0) – even : in safe mode (not accessing shared data) – odd : in unsafe mode • qs_safe / qs_unsafe qs_state++; How do we free memory? OPTIK Tudor David | 11. 2016 34
3. Quiescent States: Freeing memory • List of “thread-local” (mostly) counters (id = 0) qs_state (id = x) qs_state (id = y) qs_state • Upon qs_free: Timestamp memory (vector_ts) for t in thread_ids – Can do this for batches of frees if (vts_mem[t] is odd && vts_now[t] = vts_mem[t]) return false; return true; • Safe to reuse the memory vector_tsnow >> vector_tsmem How do the schemes we have seen perform? OPTIK Tudor David | 11. 2016 35
Hazard Pointers vs. Quiescent States 1024 elements 0% updates Throughput (Mop/s) 12 10 8 6 4 2 0 0 10 20 #Threads None QSBR 30 40 HP Quiescent-state reclamation is as fast as it gets ASCY Tudor David | 11. 2016 36
4. Read-Copy Update (RCU) • Quiescent states at their extreme – Deletions wait all readers to reach a safe state • Introduced in the Linux kernel in ~2002 – More than 10000 uses in the kernel! • (Example) Interface – rcu_read_lock (= qs_unsafe) – rcu_read_unlock (= qs_safe) – synchronize_rcu wait all readers OPTIK Tudor David | 11. 2016 37
4. Using RCU • Search / Traverse rcu_read_lock() … rcu_read_unlock() • Delete … physical deletion of x synchronize_rcu() free(x) • + simple • + read-only workloads • - bad for writes OPTIK Tudor David | 11. 2016 38
Memory Reclamation: Summary • How and when to reuse freed memory • Many techniques, no silver bullet 1. 2. 3. 4. Reference counting Hazard pointers Quiescent states Read-Copy Update (RCU) OPTIK Tudor David | 11. 2016 39
Summary • Concurrent data structures are very important • Optimistic concurrency necessary for scalability – Only recently a lot of active work for CDSs • Memory reclamation is – Inherent to optimistic concurrency; – A difficult problem; – A potential performance/scalability bottleneck OPTIK Tudor David | 11. 2016 40
- Ajit diwan iit bombay
- Cos423
- Data structures and algorithms tutorial
- Information retrieval data structures and algorithms
- Data structures and algorithms bits pilani
- Ajit diwan iit bombay
- Algorithms + data structures = programs
- Data structures and algorithms
- Data structures and algorithms
- Ian munro waterloo
- Information retrieval data structures and algorithms
- Data structures and algorithms
- Synchronization algorithms and concurrent programming
- Homologous structures
- Muthukrishnan data stream algorithms
- Master data management sql server 2016
- Magic quadrant for data masking technology 2016
- One variable data table excel 2016
- Btechsmartclass data structures
- Types of data structures in r
- Oblivious data structures
- Kernel data structures
- Introduction to data structures
- Introduction to data structures
- Esoteric data structures
- Geometric data structures
- Hadoop i/o compression
- Define macro processor
- Advanced data structures in java
- Assembler algorithm and data structures
- Debasis samanta data structure
- Persistent vs ephemeral data structures
- Php data structures
- Spatial data structures in gis
- Linked list java
- Recurrence data structures
- Data structures in c ppt
- Data structures for parallel computing
- Data structures and abstractions with java
- Data structures for language processing
- Cos 423 princeton