Parallel Processing

http://www.eas.asu.edu/~calypso

Parallel Processing with Windows NT Networks

Collaborators: Zvi M. Kedem, Donald McLaughlin, Shantanu Sardesai, Rahul Thombre
Partha Dasgupta, Arizona State University

The MILAN Project: New York University, Arizona State University
Funding Sources: DARPA/Rome Laboratory, NSF, Intel, and Microsoft

Joint Research of Arizona State University and New York University
ECLIPSE, Calypso Linux 1.0, Malaxis + Chime

The Platforms
• Calypso
  · Language-independent parallel processing
  · Shared memory and fault tolerance
• Chime
  · CC++ based parallel processing
  · Shared memory, fault tolerance
• Malaxis
  · DSM package for Windows NT
  · Read/write locking, barriers
• Milan
  · A metacomputing platform
  · Coalesces features from the above systems into a general-purpose computing platform

Unix to Windows NT
• Port a system program or middleware from Unix to Windows NT. How?
  · Just change the system calls?
  · Does not work.
• Change programming and design styles to NT-centric:
  · no signals in NT
  · use structured event handling (no such thing in Unix)
  · use threads (useful)
  · integrate with Windows messages or MFC
  · remote execution support is weak
• Learn NT-centrism, and NT lingo

NT Terminology
• MSDN is not a network
• Developer’s library contains books
• Resource Kit is not about resources
• Huh?
  · SDK, DDK, checked build
  · Service Pack
  · OSR 2
• Remote access does not let you execute anything remotely
• Use a Share?
  · You mean remote mount? No, I mean map network drive
• Memory can be reserved or committed or both
• Synchronization primitives - never mind...

What is Calypso?
• Yet another parallel processing system, which runs on a distributed network of microcomputers:
  · Shared memory
  · Novel execution and memory management strategy
• Fault tolerant:
  · Machines may stop and start dynamically without affecting the execution
• Automatic load balancing:
  · Manages slow and fast machines
  · Provides near-optimal thread assignments (measured)
• Execution strategy hidden from programmer:
  · No message passing, process management, data partitioning
• Low-overhead mechanisms

Key Techniques in Calypso
• Eager Scheduling
  · Manager-worker architecture
  · Provides fault-tolerant and load-shared executions with minimal overhead
• Two-phase Idempotent Execution Strategy
  · Distributed memory management strategy
  · Stops side effects due to failures
  · Ensures idempotence of results, in spite of duplicate executions
• These techniques were developed in previous joint theoretical research
[Figure: one manager connected to several workers]
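
The idempotence idea above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the slides do not describe the real mechanism, and the names `run_thread`, `Manager`, `report`, and `end_step` are hypothetical. The sketch assumes that workers compute against a private snapshot and buffer their writes, and that the manager keeps only the first result reported for each thread and applies all buffered writes together at the end of the parallel step.

```python
def run_thread(thread_id, shared):
    """Phase 1 (hypothetical): compute on a private snapshot, return buffered writes."""
    snapshot = dict(shared)   # private view; shared memory is never written in place
    return {f"out{thread_id}": snapshot["x"] * thread_id}


class Manager:
    """Toy manager that makes duplicate executions harmless (illustrative only)."""

    def __init__(self, shared):
        self.shared = shared
        self.completed = set()   # thread ids whose result was already accepted
        self.pending = {}        # buffered writes, not yet visible in shared memory

    def report(self, thread_id, updates):
        """Accept the first result for a thread; ignore later duplicates."""
        if thread_id in self.completed:
            return False
        self.completed.add(thread_id)
        self.pending.update(updates)
        return True

    def end_step(self):
        """Phase 2 (hypothetical): apply all buffered writes at once."""
        self.shared.update(self.pending)
        self.pending.clear()
```

Because a worker that fails midway has written nothing to shared memory, and a second execution of the same thread is simply dropped, the final state does not depend on how many times a thread was run.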

Eager Scheduling
• Workers contact the manager for work after finishing their previous assignment, if any
  · When there is unfinished work, the manager has the option of assigning an unfinished thread to a “willing” worker, regardless of who is already working on that thread
• An example of Round Robin Eager Scheduling:
  · 3 machines: fast, slow and transient
  · 12 threads of equal length (50 secs)
[Figure: timeline of thread assignments on workers A, B and C, annotated “Worker interrupted” and “Worker crashed”; time axis from 50 to 400]
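
A small Python simulation may help make the scheduling rule concrete. It is a sketch under stated assumptions, not the Calypso scheduler: the function name `eager_schedule` and its parameters are hypothetical, a crashed worker is modeled as one that never reports back after a given time, and a worker that is interrupted and later returns is not modeled. The sketch does not attempt to reproduce the timeline in the slide's figure.

```python
import heapq


def eager_schedule(num_threads, cost, speeds, crash_at=None):
    """Simulate round-robin eager scheduling (illustrative model only).

    speeds:   worker name -> relative speed (1.0 runs one thread in `cost` seconds)
    crash_at: worker name -> time after which that worker never reports back
    Returns (finish_time, executions); executions[t] counts how many times
    thread t was handed out.
    """
    crash_at = crash_at or {}
    done = set()
    executions = [0] * num_threads
    cursor = 0      # round-robin position over thread ids
    events = []     # min-heap of (completion_time, worker, thread)

    def assign(worker, now):
        nonlocal cursor
        # Hand out the next unfinished thread, even if another worker has it.
        for offset in range(num_threads):
            t = (cursor + offset) % num_threads
            if t in done:
                continue
            cursor = (t + 1) % num_threads
            executions[t] += 1
            finish = now + cost / speeds[worker]
            if finish <= crash_at.get(worker, float("inf")):
                heapq.heappush(events, (finish, worker, t))
            return

    for worker in sorted(speeds):
        assign(worker, 0.0)
    now = 0.0
    while len(done) < num_threads:
        if not events:
            raise RuntimeError("no live workers left")
        now, worker, t = heapq.heappop(events)
        done.add(t)           # a duplicate completion is simply absorbed
        assign(worker, now)
    return now, executions


if __name__ == "__main__":
    # Fast worker A, slow worker B, and worker C that crashes at t = 120.
    print(eager_schedule(12, 50, {"A": 1.0, "B": 0.5, "C": 1.0}, crash_at={"C": 120}))
```

The thread that was lost with the crashed worker is picked up again once the round-robin cursor wraps around, which is the source of the fault tolerance: no failure detection is needed, only reassignment of unfinished work.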

Chime
• Chime is a programming system and runtime environment for parallel processing
• The first system to incorporate standard parallel language support on a network of workstations:
  · Nested parallelism, parallel statements
  · Language-defined scoping of variables
  · Synchronization support
  · Transparent shared memory
• Chime supports the “shared memory” constructs of CC++
  · Adds fault tolerance...
  · Adds load balancing...
  · ...with low overhead
• A “distributed” cactus stack

Chime Software Architecture

Chime Execution Trace

Malaxis
• A DSM package
• Uses NT threads and memory mapping and protection features
• Uses barrier synchronization, memory XOR-ing and intelligent monitoring of page/lock requests to prevent page shuttling
• Programmer support:
  · Spawning processes on remote machines
  · Mapping shared segments
  · Barrier synchronization
  · Read and write locks (abstract, advisory)

Milan
• A metacomputing platform
• Creates a system image of a large computer on a set of workstations
• Smart scheduling
  · bunching
  · job recall
  · pre-emption
• Shared memory
• Fault tolerant

Using Windows NT
• The needs of our implementations:
  · User-level page fault handling
  · Getting and setting thread contexts
  · Getting and setting stack contents
  · Asynchronous notification and exception handling
  · Networking support
  · Process/thread control
• Windows NT provides all of the above

Memory Handling
• Windows NT memory handling is elegant and powerful (after you understand the terminology)
• States of memory:
  · committed
  · reserved
  · guarded
• Protection and allocation are done by:
  · VirtualAlloc
  · VirtualProtect
• Access violations generate exceptions
• Needed reprogramming of Calypso (for the better)
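
The pattern these bullets point to, catching an access violation and supplying the missing page at user level, can be modeled in Python. This is a conceptual sketch only: it does not call the Win32 API, and `PagedRegion`, `AccessViolation`, and `read_with_fault_handler` are hypothetical names. It assumes that a page starts out reserved, that touching it raises an exception, and that the handler commits the page with fetched contents before retrying the access.

```python
class AccessViolation(Exception):
    """Raised when a page that is not committed is accessed (model only)."""

    def __init__(self, page):
        super().__init__(f"access violation on page {page}")
        self.page = page


class PagedRegion:
    """Toy address range whose pages are either 'reserved' or 'committed'."""

    def __init__(self, num_pages):
        self.state = ["reserved"] * num_pages
        self.data = [None] * num_pages

    def commit(self, page, contents):
        self.state[page] = "committed"
        self.data[page] = contents

    def read(self, page):
        if self.state[page] != "committed":
            raise AccessViolation(page)
        return self.data[page]


def read_with_fault_handler(region, page, fetch):
    """Read a page; on a fault, fetch its contents, commit it, and retry."""
    try:
        return region.read(page)
    except AccessViolation as fault:
        # In a DSM system the contents would come from another machine;
        # here `fetch` is a caller-supplied stand-in for that step.
        region.commit(fault.page, fetch(fault.page))
        return region.read(page)
```

Only the first access to a page pays for the fetch; later reads hit the committed page directly.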

Exception Handling
• All exceptions are delivered to an exception handler, defined in the current scope of execution.
• Great for programmers: nice and structured
• Not good for middleware solutions...
  · How can I execute another person’s code, with my exception handlers?
  · I cannot change the exception handler from within my exception handler.
• In our case, we found reasonable workarounds, but we don’t have general solutions to the above problems.

Threads
• Good, consistent, kernel threads.
  · Easy to use
  · Works great
  · Plethora of synchronization constructs (too many, in fact)
• Threads are useful for:
  · Threads inside middleware - wow!
  · Handling distributed shared memory (callbacks, caching, memory service)
  · Process migration - a thread can set up the main process
  · Segregating functionality (assign a thread per job)

Process and Stack Migration
• Migration is used by our system for several purposes:
  · Cactus stacks
  · Checkpointing
  · Pre-emptive scheduling (produces better turnaround times in dynamic environments)
• When a thread has to be migrated:
  · Another thread suspends it and gets its context
  · The context is a checkpoint
  · The context is sent to the target machine
  · A thread sets the context of a suspended thread with the new context and resumes it. Stack has to be reset too.
  · IT WORKS
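
The migration steps above can be illustrated with a toy model in Python. It is a sketch under stated assumptions rather than the NT mechanism: a real thread context holds registers and a native stack, whereas here the "thread" is a hypothetical `CountingTask` whose entire context is an explicit program counter and a small stack, and the transfer to the target machine is reduced to passing a serialized blob.

```python
import pickle


class CountingTask:
    """Toy 'thread' that sums 1..limit; its whole context is pc plus stack."""

    def __init__(self, limit):
        self.pc = 0
        self.stack = [0]      # the running total lives on the 'stack'
        self.limit = limit

    def step(self):
        """Run one unit of work; return False once the task has finished."""
        if self.pc >= self.limit:
            return False
        self.pc += 1
        self.stack[-1] += self.pc
        return True

    def get_context(self):
        """Capture the context of the (suspended) task; this is the checkpoint."""
        return pickle.dumps({"pc": self.pc, "stack": self.stack, "limit": self.limit})

    @classmethod
    def from_context(cls, blob):
        """On the target: create a suspended task and set its context and stack."""
        ctx = pickle.loads(blob)
        task = cls(ctx["limit"])
        task.pc = ctx["pc"]
        task.stack = list(ctx["stack"])
        return task


def migrate(task):
    """Suspend, capture, 'send', and restore; the caller then resumes the copy."""
    blob = task.get_context()          # would be sent over the network
    return CountingTask.from_context(blob)
```

The same blob doubles as a checkpoint: restoring it later on the original machine would resume the task from the point of capture.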

Other Features
• Networking
  · winsock is like sockets, no surprises
• Remote execution
  · our approach: use a daemon process
  · NT approach: use a starter service
• Execution Monitor (GUI)
  · External process that controls and displays the state of the distributed computation

Performance
• Program: Ray Trace, generates a nice picture
• Equipment:
  · Pentium-90, running Windows NT (Calypso tests)
  · Pentium Pro 200, running Windows NT (Chime tests)
• Tests conducted
  · Speedup in case of mixed speed machines
  · Speedup in case of crashing and recovering machines
  · Micro-tests (migration, stack creation)
  · Not all tests will be shown now.

Calypso Performance is comparable to Unix systems

Chime Performance
• Chime has higher network overhead than Calypso

In Retrospect
• NT has some strong points, things that are better than Unix
  · Threads
  · Exception handling
  · Memory management
  · Program development tools (very good, especially the debugger)
  · Documentation
• A few shortcomings
  · no signals
  · no remote execution facility
  · terrible terminology

Status
• Operational prototype systems
  · Calypso on Windows NT / Windows 95 released
  · A prototype of Chime implementing most of the “parallel part” of Compositional C++ on an unreliable network of workstations
• Ongoing research
  · Distributed scheduling and resource management (for MILAN)
  · Quality of service
  · Better integration with NT (MFC support, remote services, global scheduling...)

Acknowledgements
• Co-PI
  · Zvi M. Kedem
• Calypso
  · Arash Baratloo, Mehmet Karaul
• Calypso NT
  · Donald McLaughlin and Shantanu Sardesai
• Chime
  · Shantanu Sardesai
• Calypso Linux
  · Arash Baratloo

Review request for SP&E
• Done?