Recent Advances in Parallel Computing Technologies at PSI RAS

  • Slides: 48

Program Systems Institute, Russian Academy of Sciences
RCMS, Program Systems Institute RAS
Alexander Moskovsky, Sergei Abramov
06/09/05, Pereslavl-Zalessky

Supercomputing Project “SKIF”

Open TS: an advanced tool for parallel and distributed computing.

SKIF Supercomputing Project
  • Joint project of the Russian Federation and the Republic of Belarus
  • 2000-2004
  • 10 + 10 organizations
  • PSI RAS is the lead organization from the Russian Federation
  • Hardware and software

Open TS Overview

T-System History
  • Mid-1980s: basic ideas of T-System
  • 1990s: first implementation of T-System
  • 2001-2002, “SKIF”: GRACE (Graph Reduction Applied to Cluster Environment)
  • 2003-present, “SKIF”: Open TS (Open T-System)

Comparison: T-System and MPI

              Sequential    Parallel
High-level    C/Fortran     T-System (a few keywords)
Low-level     Assembler     MPI (hundred(s) of primitives)

Related work
  • “Parallel Programming Using C++” (Scientific and Engineering Computation), edited by Gregory V. Wilson and Paul Lu
  • ABC++, Amelia, CC++, CHAOS++, COOL, C++//, ICC++, Mentat, MPC++, MPI++, pC++, POOMA, TAU, UC++

T-System in Comparison

Related work               Open TS differentiator
Charm++                    FP-based approach
UPC, mpC++                 Implicit parallelism
Glasgow Parallel Haskell   Allows C/C++ based low-level optimization
OMPC++                     Provides both a language and a C++ template library
Cilk                       Supports SMP, MPI, PVM, and GRID platforms

Open TS: an Outline
  • High-performance computing
  • “Automatic dynamic parallelization”
  • Combining functional and imperative approaches, high-level parallel programming
  • T++ language: a “parallel dialect” of C++, an approach popular in the 1990s

T-Approach
  • “Pure” function (tfunction) invocations produce grains of parallelism
  • A T-program is
      • functional at the higher level
      • imperative at the low level (optimization)
  • C-compatible execution model
  • Non-ready variables, multiple assignment
  • “Seamless” C extension (or Fortran extension)

T++ Keywords

tfun     T-function
tval     T-variable
tptr     T-pointer
tout     Output parameter (like &)
tdrop    Make ready
twait    Wait for readiness
tct      T-context

Sample Program

    #include <stdio.h>
    #include <stdlib.h>   /* atoi */

    /* tfun marks a T-function: each invocation can become a grain of parallelism */
    tfun int fib (int n) {
      return n < 2 ? n : fib(n-1) + fib(n-2);
    }

    tfun int main (int argc, char **argv) {
      if (argc != 2) {
        printf("Usage: fib <n>\n");
        return 1;
      }
      int n = atoi(argv[1]);
      /* the (int) cast turns the T-function result into an ordinary int */
      printf("fib(%d) = %d\n", n, (int)fib(n));
      return 0;
    }
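For readers without a T++ compiler, the idea that each fib call is a grain whose result is awaited only when it is used can be approximated with ordinary futures. The Python sketch below is an analogy only, not a description of how Open TS works internally: the names fib_seq, fib_grains and fib_parallel, the depth cutoff, and the thread pool are all assumptions introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fib_seq(n):
    """Plain sequential Fibonacci, used below the cutoff depth."""
    return n if n < 2 else fib_seq(n - 1) + fib_seq(n - 2)


def fib_grains(n, depth, pool):
    """Spawn one branch as a future (a "non-ready" value) per level until depth runs out."""
    if depth == 0 or n < 2:
        return fib_seq(n)
    # The submitted call plays the role of a T-function grain (assumption).
    left = pool.submit(fib_grains, n - 1, depth - 1, pool)
    right = fib_grains(n - 2, depth - 1, pool)
    # result() blocks until the value is ready, loosely like reading a T-variable.
    return left.result() + right


def fib_parallel(n, depth=3):
    # At most 2**depth - 1 tasks are submitted, so 2**depth workers
    # guarantee that no task waits on a child stuck in the queue.
    with ThreadPoolExecutor(max_workers=2 ** depth) as pool:
        return fib_grains(n, depth, pool)


if __name__ == "__main__":
    print(fib_parallel(20))
```

In CPython the global interpreter lock prevents CPU-bound threads from running in parallel, so this sketch shows only the structure of the computation, not a speedup.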

Open TS: Environment
  • Supports 1000 threads per CPU

NPB, Test EP
  • EP: Embarrassingly Parallel test
  • From NASA's NAS Parallel Benchmarks (NPB) suite
  • Rewritten for OpenTS
  • Speedup = 96% of theoretical maximum (on 10 nodes)
  • [Chart: efficiency, % of theoretical; time, % of sequential]

Open TS vs MPI case study

Applications
  • Popular and widely used
  • Developed by independent teams (MPI experts)
  • POV-Ray (Persistence of Vision Ray-tracer), enabled for parallel runs by a patch
  • ALCMD/MP_Lite: molecular dynamics package (Ames Lab)

T-POV-Ray vs MPI POV-Ray: code complexity

Program                                          Source code volume
MPI modules for POV-Ray 3.10g                    1,500 lines
MPI patch for POV-Ray 3.50c                      3,000 lines
T++ modules (for both versions 3.10g & 3.50c)    200 lines

T-POV-Ray vs MPI POV-Ray: performance
[Chart] 16 dual Athlon 1800 nodes (AMD Athlon MP 1800+), RAM 1 GB, Fast Ethernet, LAM 7.0.6

T-POV-Ray vs MPI POV-Ray: performance
[Chart] 2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1

ALCMD/MPI vs ALCMD/OpenTS
  • MP_Lite component of ALCMD rewritten in T++
  • Fortran code is left intact

ALCMD/MPI vs ALCMD/OpenTS: code complexity

Program                          Source code volume
MP_Lite total/MPI                ~20,000 lines
MP_Lite, ALCMD-related/MPI       ~3,500 lines
MP_Lite, ALCMD-related/OpenTS    500 lines

ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart] 16 dual Athlon 1800 nodes (AMD Athlon MP 1800+), RAM 1 GB, Fast Ethernet, LAM 7.0.6, Lennard-Jones MD, 512000 atoms

ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart] 2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, GigE, LAM 7.1.1, Lennard-Jones MD, 512000 atoms

ALCMD/MPI vs ALCMD/OpenTS: performance
[Chart] 2 CPUs AMD Opteron 248 2.2 GHz, RAM 4 GB, InfiniBand, MVAPICH 0.9.4, Lennard-Jones MD, 512000 atoms

Open TS applications

T-Applications
  • MultiGen: biological activity estimation
  • Remote sensing applications
  • Plasma modeling
  • Protein simulation
  • Aeromechanics
  • Query engine for XML
  • AI applications
  • etc.

MultiGen
Chelyabinsk State University
[Diagram: multi-conformation model; Levels 0 to 2 with nodes labeled K0, K12, K21, K22]

MultiGen: Speedup
  • NCI-609067: National Cancer Institute (USA) Reg. No. (AIDS drug lead)
  • NCI-641295: National Cancer Institute (USA) Reg. No. (AIDS drug lead)
  • TOSLAB A2-0261: TOSLAB company (Russia-Belgium) Reg. No. (antiphlogistic drug lead)

                                                     Execution time (min:s)
Substance         Atoms   Rotations   Conformers     1 node    4 nodes   16 nodes
NCI-609067        28      4           13             9:33      3:21      1:22
TOSLAB A2-0261    82      18          49             115:27    39:23     16:09
NCI-641295        126     25          74             266:19    95:57     34:48
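The slide reports raw execution times only. A short Python sketch can turn them into speedup (time on 1 node divided by time on n nodes) and parallel efficiency (speedup divided by n). The times are copied from the slide; the helper names to_seconds and speedup_table are introduced here for illustration.

```python
def to_seconds(mmss):
    """Convert a 'min:s' string such as '115:27' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Execution times from the MultiGen slide, keyed by node count.
TIMES = {
    "NCI-609067": {1: "9:33", 4: "3:21", 16: "1:22"},
    "TOSLAB A2-0261": {1: "115:27", 4: "39:23", 16: "16:09"},
    "NCI-641295": {1: "266:19", 4: "95:57", 16: "34:48"},
}


def speedup_table(times):
    """Return {substance: {nodes: (speedup, efficiency)}}."""
    rows = {}
    for substance, by_nodes in times.items():
        t1 = to_seconds(by_nodes[1])
        rows[substance] = {}
        for nodes, t in by_nodes.items():
            speedup = t1 / to_seconds(t)
            rows[substance][nodes] = (speedup, speedup / nodes)
    return rows


if __name__ == "__main__":
    for substance, row in speedup_table(TIMES).items():
        for nodes, (s, e) in sorted(row.items()):
            print(f"{substance:15s} {nodes:2d} nodes: speedup {s:5.2f}, efficiency {e:6.1%}")
```

With these figures the speedup on 16 nodes comes out at roughly 7.0 to 7.7, with the largest molecule scaling best.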

Aeromechanics
Institute of Mechanics, MSU

Aeromechanics (continued)
Institute of Mechanics, MSU

Creating a space-borne radar image from a hologram

Simulating broadband radar signal
  • Graphical user interface
  • Non-PSI RAS development team (Space Research Institute of Khrunichev Corp.)

Landsat Image Classification
  • Computational “web-service”

Future Work
  • Multi-core CPU support
  • Distributed computing
      • Schedulers
      • Transport
      • Interface to web services
  • Fault tolerance
  • Optimizing for modern CPUs
  • Algorithmic skeletons, patterns and high-level parallel libraries

Out of Presentation Scope
  • Other T-languages: T-Refal, T-Fortran
  • Memoization
  • Automatically choosing between call-style and fork-style function invocation
  • Checkpointing
  • Heartbeat mechanism
  • Flavours of data references: “normal”, “glue” and “magnetic”, corresponding to lazy, eager and ultra-eager (speculative) data transfer

Other Software Efforts

Roshydromet: Losev’s weather forecast model

GRID Testbed
  • Network of virtual machines (classes, users, etc.)
  • Total peak performance: 79 GFlops
  • “Crippled” Linux distribution, auto-update, monitoring

Chemical Docking with T-GRID
  • “Customer”: Faculty of Bioinformatics, MSU
  • Looking for a drug candidate among a large set of substances
  • [Diagram: candidate substances 1), 2), 3), ... and a target]

Acknowledgements
  • “SKIF” supercomputing project
  • Russian Academy of Sciences grants
      • Program “High-performance computing systems on new principles of computational process organization”
      • Program of the Presidium of the Russian Academy of Sciences “Development of basics for implementation of distributed scientific informational-computational environment on GRID technologies”
  • Russian Foundation for Basic Research, grant “05-07-08005 ofi_a”
  • Microsoft: contract for the “Open TS vs MPI” case study

Thanks! Any questions?

Open TS benchmarks

Tests: NASA CG, NASA EP, FIB

EP @ OpenTS benchmark
  • Embarrassingly parallel
  • Recursive implementation
  • Two parameters
      • size: number of operations in the task ~ 2^size
      • depth: number of grains (T-function calls) = 2^depth
      • Number of operations per grain ~ 2^(size-depth)
  • Allows stressing the runtime
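The relation between size, depth and grain count can be checked with a small recursive sketch. This is not the benchmark's source code: leaf_work is a placeholder for the real EP kernel, and the names ep, run_ep and stats are assumptions made for illustration. Each recursion level halves the work and doubles the number of branches, so depth levels yield 2^depth leaf grains of 2^(size-depth) operations each.

```python
def leaf_work(n_ops):
    """Placeholder kernel (assumption): performs n_ops trivial operations."""
    acc = 0
    for i in range(n_ops):
        acc += i & 1
    return acc


def ep(size, depth, stats):
    """Split a task of ~2**size operations into 2**depth grains recursively."""
    if depth == 0:
        stats["grains"] += 1
        stats["ops"] += 2 ** size
        return leaf_work(2 ** size)
    # In T++ each of these two calls would be a T-function invocation.
    return ep(size - 1, depth - 1, stats) + ep(size - 1, depth - 1, stats)


def run_ep(size, depth):
    if depth > size:
        raise ValueError("depth must not exceed size")
    stats = {"grains": 0, "ops": 0}
    result = ep(size, depth, stats)
    return result, stats


if __name__ == "__main__":
    # size=10, depth=3: 8 grains of 128 operations each, 1024 in total.
    print(run_ep(10, 3))
```

Raising depth while keeping size fixed produces more, smaller grains, which is what lets the benchmark stress the runtime rather than the arithmetic.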

NPB, Test EP
  • EP: Embarrassingly Parallel test
  • From NASA's NAS Parallel Benchmarks (NPB) suite
  • Rewritten for OpenTS
  • Speedup = 96% of theoretical maximum (on 10 nodes)
  • [Chart: efficiency, % of theoretical; time, % of sequential]

Additional EPs
The same T++ source code linked with different RTL extensions:
  • EP: standard, with dynamic load balancing
  • EP_ASYNC: “asynchronous”, data exchange interrupts calculation
  • EP_GS: “grid scheduler”, minimizes load deviation when assigning a task
  • EP_GS_ASYNC: “grid scheduler” with “asynchronous” data exchange
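The slide does not say how the “grid scheduler” minimizes load deviation. One plausible reading, sketched here in Python purely as an assumption, is a greedy rule: place each new task on the node that leaves the standard deviation of node loads smallest. The functions pick_node and assign, and the notion of a numeric task cost, are hypothetical and are not taken from the Open TS runtime.

```python
from statistics import pstdev


def pick_node(loads, task_cost):
    """Index of the node whose choice leaves the smallest load deviation."""
    def deviation_if_assigned(i):
        trial = list(loads)
        trial[i] += task_cost
        return pstdev(trial)
    # Ties are resolved in favour of the lowest node index.
    return min(range(len(loads)), key=deviation_if_assigned)


def assign(loads, task_costs):
    """Greedily place tasks one by one; returns (placement, final loads)."""
    loads = list(loads)
    placement = []
    for cost in task_costs:
        i = pick_node(loads, cost)
        loads[i] += cost
        placement.append(i)
    return placement, loads


if __name__ == "__main__":
    print(assign([0, 0], [2, 1, 1]))
```

A real scheduler would also have to account for communication cost and stale load information, which this sketch ignores.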

EP metrics
  • M is calculated as
      1. 2^size / time / number of CPUs
      2. Take % of the best over all experiments
  • A good metric: approximately the same on a single CPU for depths between 6 and 12
  • Cluster: 16 dual Athlon MP 1800+, Fast Ethernet
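The two-step definition of M can be written out directly. In the Python sketch below the function names raw_rate and metric_m are introduced for illustration, and the timings in RUNS are hypothetical placeholders, not measurements from the presentation.

```python
def raw_rate(size, seconds, n_cpus):
    """Step 1: operations per second per CPU, with ~2**size operations in the task."""
    return 2 ** size / seconds / n_cpus


def metric_m(runs):
    """Step 2: express each run's rate as a percentage of the best rate."""
    rates = [raw_rate(r["size"], r["seconds"], r["n_cpus"]) for r in runs]
    best = max(rates)
    return [100.0 * rate / best for rate in rates]


# Hypothetical timings for illustration only (NOT from the slides).
RUNS = [
    {"size": 28, "seconds": 100.0, "n_cpus": 1},
    {"size": 28, "seconds": 55.0, "n_cpus": 2},
    {"size": 30, "seconds": 125.0, "n_cpus": 4},
]

if __name__ == "__main__":
    for run, m in zip(RUNS, metric_m(RUNS)):
        print(run, f"M = {m:.1f}%")
```

Because the rate is normalized per CPU and per operation, M compares runs of different sizes and CPU counts on one scale, and a perfectly scaling run keeps M near 100%.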

EP results
  • For all size in [28, 32] and depth in [6, 12], M = 99.9% if N_CPU = 1
  • M drops below 90% if N_CPU > 8 for size=6
  • On 32 CPUs, EP_GS_ASYNC is the best with M = 88.2%, at depth=12, size=32