
Shared memory and message passing revisited in the many-core era
Aram Santogidis, CERN
Inverted CERN School of Computing (iCSC 2016), 29 February – 2 March 2016

The pioneers of concurrent programming
Edsger Dijkstra
• Mutual exclusion
• Cooperating Sequential Processes
• Semaphores
Per Brinch Hansen
• Concurrent Pascal
• Shared Classes
• The Solo OS
• Distributed Processes
C. A. R. Hoare
• Communicating Sequential Processes (CSP)
• Monitors

Communication is important
[Figure: Process 1, Process 2 and Process 3 shown over time, with communication/synchronization between them]
Shared memory vs. message passing

Agenda of the talk
§ Concurrency and communication
§ Two basic examples of the two models
§ Conventional wisdom for the two models
§ Cache coherence and manycore processors
§ Emerging paradigm shift in OS architectures
§ The future perspective

The shared memory model
§ Threads communicate implicitly with each other via shared data structures
§ Synchronization primitives (locks, semaphores, etc.)
[Figure: threads accessing shared data structures in shared memory]
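
The two bullets above can be illustrated with a minimal Python sketch. It is not taken from the talk: the function name `count_with_threads` and the dictionary used as the shared data structure are assumptions chosen for illustration. The threads never talk to each other directly; they communicate only by updating the same counter, and a lock serializes the update.

```python
import threading


def count_with_threads(num_threads: int, increments: int) -> int:
    """Let several threads update one shared counter under a lock."""
    shared = {"counter": 0}      # the shared data structure (hypothetical)
    lock = threading.Lock()      # the synchronization primitive

    def worker() -> None:
        for _ in range(increments):
            with lock:           # critical section: one thread at a time
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait until every thread has finished
    return shared["counter"]


if __name__ == "__main__":
    print(count_with_threads(4, 1000))
```

Without the lock, the read-modify-write on the counter could interleave between threads, which is the race-condition hazard of this model.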

The message passing model
§ Threads communicate explicitly with each other by exchanging messages
§ It is the more fundamental of the two models
§ Synchronous or asynchronous communication
[Figure: threads exchanging messages]
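
As a rough illustration of explicit communication, the sketch below lets one thread send messages to another through a mailbox. The names `run_producer_consumer`, `mailbox` and `STOP` are hypothetical; Python's standard `queue.Queue` stands in for a message channel.

```python
import queue
import threading


def run_producer_consumer(items):
    """Send each item as a message from a producer to a consumer thread."""
    mailbox = queue.Queue()      # unbounded queue: sends do not block
    received = []                # touched only by the consumer thread
    STOP = object()              # sentinel message that ends the exchange

    def producer() -> None:
        for item in items:
            mailbox.put(item)    # explicit, asynchronous send
        mailbox.put(STOP)

    def consumer() -> None:
        while True:
            msg = mailbox.get()  # explicit, blocking receive
            if msg is STOP:
                break
            received.append(msg)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_producer_consumer(["a", "b", "c"]))
```

The blocking `get` is where synchronization happens as a side effect of communication. A bounded queue would additionally make the sender wait when the mailbox is full, which moves the exchange towards synchronous behaviour.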

Let's see an example for each model
1. Image processing (shared memory)
2. Simple GUI (message passing)

A shared memory-based example: Convert from colour to grayscale
Example pixel: R: 204, G: 46, B: 10
Grey level: (R + G + B) / 3 = (204 + 46 + 10) / 3 ≈ 87

A shared memory-based example: Convert from colour to grayscale
We parallelize the computation by assigning tiles (pieces) of the image to threads, which execute the conversion in parallel.
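
A minimal sketch of this tiling scheme follows. The image representation (a list of rows of `(r, g, b)` tuples), the function name `to_grayscale` and the choice of horizontal bands as tiles are all assumptions made for illustration; the talk does not specify them.

```python
import threading


def to_grayscale(image, num_threads=4):
    """Convert rows of (r, g, b) tuples to grey levels, one tile per thread."""
    height = len(image)
    gray = [[0] * len(row) for row in image]   # shared output buffer

    def convert_rows(start: int, stop: int) -> None:
        # Each thread writes only to its own band of rows, so the tiles
        # are disjoint and no lock is needed.
        for y in range(start, stop):
            for x, (r, g, b) in enumerate(image[y]):
                gray[y][x] = (r + g + b) // 3  # integer mean of the channels

    rows_per_tile = max(1, -(-height // num_threads))  # ceiling division
    threads = []
    for start in range(0, height, rows_per_tile):
        stop = min(start + rows_per_tile, height)
        t = threading.Thread(target=convert_rows, args=(start, stop))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return gray


if __name__ == "__main__":
    picture = [[(204, 46, 10), (0, 0, 0)],
               [(255, 255, 255), (30, 60, 90)]]
    print(to_grayscale(picture, num_threads=2))
```

The threads communicate only through the shared input and output buffers. In standard CPython builds the global interpreter lock means this sketch shows the structure of the solution rather than a real speed-up.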

A message passing-based example
§ A GUI with 3 widgets
  § Text area
  § Up scroll button
  § Down scroll button
§ Must be interactive (immediate feedback)
[Figure: GUI whose text area shows "Many operating system designs can be placed into one of two very rough categories, depending upon how they…"]

GUI example implementation: Message passing solution
[Figure: structure of the solution, with one thread per widget: Thread UpButton, Thread DownButton and Thread TextArea; the text area shows "Many operating system designs can be placed into one of two very rough categories, depending upon how they…"]
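
One way to read this structure is sketched below, under the assumption that the two button threads send scroll messages to the text-area thread. The message strings `"up"` and `"down"`, the function name `run_gui_simulation` and the simulated clicks are hypothetical; no real GUI toolkit is involved.

```python
import queue
import threading


def run_gui_simulation(up_clicks: int, down_clicks: int, start_line: int = 10) -> int:
    """Simulate two button threads messaging a text-area thread."""
    inbox = queue.Queue()            # mailbox of the TextArea thread
    position = [start_line]          # private state of the TextArea thread

    def button(message: str, clicks: int) -> None:
        for _ in range(clicks):
            inbox.put(message)       # a click becomes a message

    def text_area(expected: int) -> None:
        for _ in range(expected):
            msg = inbox.get()        # wait for the next scroll request
            if msg == "up":
                position[0] -= 1
            elif msg == "down":
                position[0] += 1

    threads = [
        threading.Thread(target=button, args=("up", up_clicks)),
        threading.Thread(target=button, args=("down", down_clicks)),
        threading.Thread(target=text_area, args=(up_clicks + down_clicks,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return position[0]


if __name__ == "__main__":
    print(run_gui_simulation(up_clicks=3, down_clicks=5))
```

Only the text-area thread touches the scroll position, so no lock is needed; the buttons stay responsive because sending a message does not wait for the scroll to be processed.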

Conventional wisdom about the characteristics of the two models
1. Performance
2. Programmability

Performance comparison
§ Hardware support
  § Shared memory: extensive (all popular architectures)
  § Message passing: limited (only special-purpose architectures)
§ Data transfer overhead
  § Shared memory: low (cache block management in hardware)
  § Message passing: high (data replication)
§ Access/synchronization overhead
  § Shared memory: sometimes high (critical-section contention, NUMA effects)
  § Message passing: low (local private memory access)

Programmability comparison
§ Communication
  § Shared memory: implicit
  § Message passing: explicit
§ Synchronization
  § Shared memory: explicit (locks etc.)
  § Message passing: implicit (side effect)
§ Interface (API)
  § Shared memory: read/write shared data structures, mutex primitives
  § Message passing: send/receive messages, multicast
§ Hazards
  § Shared memory: race conditions, deadlocks, starvation
  § Message passing: deadlocks, starvation

Towards the manycore architectures
[Image: http://www.wired.com/images_blogs/gadgetlab/2009/10/tilera-wafer-1.jpg]

The manycore era
[Image: http://image.slideserve.com/277797/manycore-systems-design-space-n.jpg]
§ Power limits further increases in processor frequency.
§ Moore's law: the number of transistors keeps doubling roughly every two years.
§ Replication: the extra transistors go into an increasing number of cores.
The graph is from the presentation: Joshi, Ajay, et al. "Building manycore processor-to-DRAM networks using monolithic silicon photonics." High Performance Embedded Computing (HPEC) Workshop. 2008.

On the duality of operating system structures
§ Operating systems are generally classified as:
  § Message-oriented (message passing)
  § Procedure-oriented (shared memory)
§ Each system from one category has a counterpart (dual) in the other category.
§ Neither model is inherently better than the other; the choice depends on the machine architecture.
From: Lauer, Hugh C., and Roger M. Needham. "On the duality of operating system structures." ACM SIGOPS Operating Systems Review 13.2 (1979): 3-19.

Non-Uniform Memory Access (NUMA)
[Figure: two sockets (Socket 0 and Socket 1) connected by QPI. Each socket contains several CPU cores with local caches and a last-level cache, and is attached to its own RAM domain (Domain 0 and Domain 1). Processes P0 to P5 run on the cores.]

Cache coherence
[Figure: Core 0 and Core 1, each with an L1 cache and a cache controller, are connected to the LLC by a bus. Both L1 caches and the LLC hold i: 0, and the L1 copies are in the Shared state. One core is about to execute i++; the other j=i;]

Cache coherence
[Figure: after i++; executes, the writing core's L1 copy holds i: 1 in the Modified state. A Bus Invalidate message marks the other core's copy (i: 0) as Invalid. The LLC still holds i: 0.]

Cache coherence
[Figure: j=i; triggers a Bus Read. The modified copy is written back, so the LLC and both L1 caches now hold i: 1, and both L1 copies are in the Shared state.]
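
The three cache coherence slides can be replayed with a small simulation. This is a deliberately simplified MSI-style sketch for a single variable, not a model of any real processor: the class names `Cache` and `SnoopingBus`, the function `replay_slides` and the assignment of `i++` to core 1 are assumptions made for illustration.

```python
class Cache:
    """One core's private L1 copy of a single variable."""

    def __init__(self):
        self.state = "I"    # "M" (Modified), "S" (Shared) or "I" (Invalid)
        self.value = None


class SnoopingBus:
    """Hypothetical bus linking the L1 caches and the LLC for one variable."""

    def __init__(self, initial, num_cores=2):
        self.llc = initial
        self.caches = [Cache() for _ in range(num_cores)]

    def read(self, core):
        cache = self.caches[core]
        if cache.state == "I":                  # miss: issue a Bus Read
            for other in self.caches:
                if other is not cache and other.state == "M":
                    self.llc = other.value      # the owner writes back
                    other.state = "S"
            cache.value, cache.state = self.llc, "S"
        return cache.value

    def write(self, core, value):
        cache = self.caches[core]
        if cache.state != "M":                  # must gain exclusive ownership
            for other in self.caches:
                if other is not cache:
                    if other.state == "M":
                        self.llc = other.value  # write back before invalidating
                    other.state = "I"           # Bus Invalidate
            cache.state = "M"
        cache.value = value


def replay_slides():
    """Replay the sequence on the slides: shared i, then i++, then j=i."""
    bus = SnoopingBus(initial=0)
    bus.read(0)
    bus.read(1)                                 # both copies Shared, i == 0
    bus.write(1, bus.read(1) + 1)               # i++ (core 1 is an assumption)
    after_write = [c.state for c in bus.caches]
    j = bus.read(0)                             # j=i on the other core
    after_read = [c.state for c in bus.caches]
    return after_write, j, after_read, bus.llc


if __name__ == "__main__":
    print(replay_slides())
```

Every state change in the sketch corresponds to a message on the bus, which is the point of these slides: cache-coherent shared memory is itself implemented with message exchanges between cache controllers.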

A key question
When updating shared state, which approach is more expensive in terms of latency: shared memory or message passing?

An experiment of shared memory vs message passing performance
[Figure: threads running on cores, each core with its own cache, access the shared state over the bus; the CPUs in their sockets are linked by HyperTransport]
Updating shared state of 1 to 8 cache lines, relying on cache-coherent shared memory, on a 4 x 4 AMD system

An experiment of shared memory vs message passing performance
[Figure: the same system, with a server (S) that updates the shared state on behalf of the threads; the CPUs in their sockets are linked by HyperTransport]
Updating shared state of 1 to 8 cache lines, relying on synchronous Lightweight Remote Procedure Calls (message passing)
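
The structure of the message passing variant can be sketched as follows. This is not the benchmark from the experiment and measures nothing; it only shows the pattern of a server that owns the state and clients that issue synchronous requests. All names (`run_server_experiment`, `requests`, `reply`) are hypothetical.

```python
import queue
import threading


def run_server_experiment(num_clients: int, updates_per_client: int) -> int:
    """Clients ask one server thread to update state it alone owns."""
    requests = queue.Queue()         # request channel to the server
    state = {"value": 0}             # touched only by the server thread

    def server() -> None:
        while True:
            msg = requests.get()
            if msg is None:          # shutdown message
                break
            delta, reply = msg
            state["value"] += delta
            reply.put(state["value"])    # the reply completes the call

    def client() -> None:
        reply = queue.Queue(maxsize=1)   # private reply channel
        for _ in range(updates_per_client):
            requests.put((1, reply))     # send the request
            reply.get()                  # synchronous: block until answered

    srv = threading.Thread(target=server)
    srv.start()
    clients = [threading.Thread(target=client) for _ in range(num_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    requests.put(None)
    srv.join()
    return state["value"]


if __name__ == "__main__":
    print(run_server_experiment(4, 100))
```

Because only the server touches the state, the data can stay in one core's cache instead of migrating between cores on every update.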

Messages scale better than shared memory
Message passing scales better than shared memory when increasing the core count and the size of the shared state.
The plot is adapted from: Baumann, Andrew, et al. "The multikernel: a new OS architecture for scalable multicore systems." Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 2009.

…some other hints that may lead to further fragmentation of coherency domains
[Image: http://www.racktopsystems.com/wp-content/uploads/2013/01/sql-server-fragmentation.jpg]

Heterogeneity
Increasing heterogeneity of computing platforms
[Figure: timeline from single cores (concurrent OS, coroutines), through dual cores (pthreads) and multi-socket systems, to manycores, GPUs, coprocessors and FPGAs]
§ Message passing: fundamental for communication in a heterogeneous environment
§ Shared memory: hard to implement in a heterogeneous environment

Message passing OS vs shared memory OS
[Figure: Barrelfish OS (message passing): each core, for example x86 or ARM, runs an OS node holding a state replica, with apps on top; the OS nodes exchange asynchronous messages. Linux OS (shared memory): apps run on a single Linux kernel built for a single architecture (x86, etc.); a GPU is reached through a driver, which is architecture-dependent code.]
Adapted from: Baumann, Andrew, et al. "The multikernel: a new OS architecture for scalable multicore systems." Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 2009.
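
The idea of OS nodes that keep state replicas in step through asynchronous messages can be sketched as below. This is not Barrelfish code: the function `replicate`, the per-node inboxes and the single sender are assumptions. Having one sender keeps message ordering trivial; a system in which several nodes issue updates needs an agreement protocol, which this sketch leaves out.

```python
import queue
import threading


def replicate(num_nodes: int, updates):
    """Broadcast (key, value) updates to nodes that each keep a replica."""
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    replicas = [dict() for _ in range(num_nodes)]   # one private replica per node

    def os_node(node_id: int) -> None:
        while True:
            msg = inboxes[node_id].get()
            if msg is None:                  # shutdown message
                return
            key, value = msg
            replicas[node_id][key] = value   # apply the update locally

    nodes = [threading.Thread(target=os_node, args=(i,)) for i in range(num_nodes)]
    for n in nodes:
        n.start()
    for key, value in updates:
        for inbox in inboxes:
            inbox.put((key, value))          # asynchronous message to each node
    for inbox in inboxes:
        inbox.put(None)
    for n in nodes:
        n.join()
    return replicas


if __name__ == "__main__":
    print(replicate(3, [("a", 1), ("b", 2), ("a", 3)]))
```

No node ever reads another node's memory, so the same structure would also work across cores that do not share a coherent address space.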

What to expect?
[Image: http://tech.co/wp-content/uploads/2014/12/future-marketing.jpg]

Emerging concurrency paradigms
New high-level paradigms are being developed, based on shared memory and/or message passing constructs.
§ Asynchronous tasks (Futures/Promises)
§ Partitioned Global Address Space (PGAS) languages/libraries
§ Actor Model
§ Functional Concurrency
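
As one example from this list, asynchronous tasks with futures can be sketched with Python's standard `concurrent.futures` module. The function `sum_of_squares` is a hypothetical example; the talk does not name a particular library.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(values, workers=4):
    """Submit one asynchronous task per value and combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() returns immediately with a future for the pending result
        futures = [pool.submit(pow, v, 2) for v in values]
        # result() blocks until the corresponding task has finished
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3, 4]))
```

The caller never touches locks or queues directly: the future hides how the result travels back from the worker, whether through shared memory or messages.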

The future perspective
§ Communication is the key
  § For energy efficiency
  § For runtime performance
  § To manage software complexity
  § To manage hardware heterogeneity
§ Innovation in the hardware sector puts pressure on systems software engineers to develop appropriate support
  § At the operating system level
  § At the level of concurrent programming frameworks
§ Communication-oriented tools and techniques to design, implement and analyse concurrent programs

[Image: http://globe-views.com/dcim/dreams/surprise-05.jpg]

References
§ Baumann, Andrew, et al. "The multikernel: a new OS architecture for scalable multicore systems." Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 2009.
§ Gerber, et al. "Not Your Parents' Physical Address Space." Proceedings of the 15th Workshop on Hot Topics in Operating Systems (HotOS 15).
§ Sorin, Daniel J., Mark D. Hill, and David A. Wood. A Primer on Memory Consistency and Cache Coherence.
§ Martin, Milo M. K., Mark D. Hill, and Daniel J. Sorin. "Why on-chip cache coherence is here to stay." Communications of the ACM 55.7 (2012): 78-89.
§ Hansen, Per Brinch. "The invention of concurrent programming." The Origin of Concurrent Programming. Springer New York, 2002. 3-61.
§ Butcher, Paul. Seven Concurrency Models in Seven Weeks: When Threads Unravel. Pragmatic Bookshelf, 2014.
§ Hoare, Charles Antony Richard. "Communicating sequential processes." Communications of the ACM 21.8 (1978): 666-677.

Thank you for your attention
Many thanks to my supporters and mentors for this presentation:
§ Sebastian Lopienski, Sebastian.Lopienski@cern.ch, CERN
§ Andreas Joachim Peters, Andreas.Joachim.Peters@cern.ch, CERN
§ Andreas Hirstius, andreas.hirstius@intel.com, Intel GmbH
§ Spyros Lalis, lalis@inf.uth.gr, University of Thessaly
This work is supported by the Marie Curie Early European Industrial Doctorates Fellowship of the European Community's Seventh Framework Programme under contract number (PITN-GA-2012-316596-ICE-DIP).