Interfacing Java to the Virtual Interface Architecture
Chi-Chao Chang
Dept. of Computer Science, Cornell University
(joint work with Thorsten von Eicken)

Preliminaries
High-performance cluster computing with Java
  • on homogeneous clusters of workstations
User-level network interfaces
  • direct, protected access to network devices
  • Virtual Interface Architecture: industry standard
      • Giganet's GNN-1000 adapter
Improving Java technology
  • Marmot: Java system with static bytecode-to-x86 compiler
Javia: A Java interface to VIA
  • bottom-up approach
  • minimizes unverified code
  • focus on data-transfer inefficiencies
[Figure: software stack. Labels: Apps; RMI, RPC; Sockets; Active Messages, MPI, FM; Java; VIA; C; Networking Devices]

VIA and Java
VIA Endpoint Structures
  • buffers, descriptors, send/recv Qs
  • pinned to physical memory
Key Points
  • direct DMA access: zero-copy
  • buffer mgmt (alloc, free, pin, unpin) performed by application
      • buffer re-use amortizes pin/unpin cost (~5K cycles on PII-450, W2K)
Memory management in Java is automatic...
  • no control over object location and lifetime
      • copying collector can move objects around
  • clear separation between Java heap (GC) and native heap (no GC)
      • crossing heap boundaries requires copying data...
[Figure: VIA endpoint. Labels: Application Memory, Library, buffers, send Q, recv Q, descr, DMA, Doorbells, Adapter]

Javia-I
Basic Architecture
  • respects heap separation
      • buffer mgmt in native code
  • Marmot as an "off-the-shelf" system
      • copying GC disabled in native code
      • primitive array transfers only
Send/Recv API
  • non-blocking
  • bypass ring accesses
  • copying eliminated during send by pinning array on-the-fly
  • recv allocates new array on-the-fly
  • cannot eliminate copying during recv
[Figure: Javia-I architecture. Labels: GC heap, byte array ref, send/recv ticket ring, Vi, Java, C, descriptor, send/recv queue, buffer, VIA]

Javia-I: Performance
Basic Costs (PII-450, Windows 2000 b3):
  • VIA: pin + unpin = (10 + 10) us
  • Marmot: native call = 0.28 us, locks = 0.25 us, array alloc = 0.75 us
Latency (N = transfer size in bytes):
  • 16.5 us + (25 ns) * N
  • 38.0 us + (38 ns) * N
  • 21.5 us + (42 ns) * N
  • 18.0 us + (55 ns) * N
  [Plot legend: raw, pin(s), copy(s)+alloc(r)]
BW: 75% to 85% of raw, 6 KByte switch-over between copy and pin

jbufs
Motivation
  • hard separation between Java heap (GC) and native heap (no GC) leads to inefficiencies
Goal
  • provide buffer management capabilities to Java without violating its safety properties
jbuf: exposes communication buffers to Java programmers
  1. lifetime control: explicit allocation and de-allocation
  2. efficient access: direct access as primitive-typed arrays
  3. location control: safe de-allocation and re-use by controlling whether or not a jbuf is part of the GC heap
  • heap separation becomes soft and user-controlled

jbufs: Lifetime Control
    public class jbuf {
        public static jbuf alloc(int bytes);            /* allocates jbuf outside of GC heap */
        public void free() throws CannotFreeException;  /* frees jbuf if it can */
    }
[Figure labels: handle, jbuf, GC heap]
  1. jbuf allocation does not result in a Java reference to it
      • cannot access the jbuf from the wrapper object
  2. jbuf is not automatically freed if there are no Java references to it
      • free has to be explicitly called

jbufs: Efficient Access
    public class jbuf {
        /* alloc and free omitted */
        public byte[] toByteArray() throws TypedException;  /* hands out byte[] ref */
        public int[] toIntArray() throws TypedException;    /* hands out int[] ref */
        ...
    }
[Figure labels: jbuf, Java byte[] ref, GC heap]
  3. (Storage Safety) jbuf remains allocated as long as there are array references to it
      • when can we ever free it?
  4. (Type Safety) jbuf cannot have two differently typed references to it at any given time
      • when can we ever re-use it (e.g. change its reference type)?

jbufs: Location Control
    public class jbuf {
        /* alloc, free, toArrays omitted */
        public void unRef(CallBack cb);  /* app intends to free/re-use jbuf */
    }
Idea: Use GC to track references
unRef: application claims it has no references into the jbuf
  • jbuf is added to the GC heap
  • GC verifies the claim and notifies application through callback
  • application can now free or re-use the jbuf
Required GC support: change scope of GC heap dynamically
[Figure: jbuf and a Java byte[] ref outside the GC heap; after unRef the jbuf lies inside the GC heap; the callback follows]

jbufs: Runtime Checks
[Figure: state diagram. States: unref, ref<p>, to-be-unref<p>. Transition labels: alloc, free, to<p>Array, unRef, GC, GC*]
Type safety: ref and to-be-unref states parameterized by primitive type
GC* transition depends on the type of garbage collector
  • non-copying: transition only if all refs to array are dropped before GC
  • copying: transition occurs after every GC
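
The runtime checks above can be read as a small state machine. The Java sketch below is one hypothetical reading of the diagram, not the Javia implementation. The class JbufStateModel, its FREED state, the method gcConfirmedNoRefs (standing in for the GC* transition), and the use of IllegalStateException in place of the slides' CannotFreeException and TypedException are all assumptions made for illustration. It also assumes that unRef on a jbuf with no outstanding array references is an error.

```java
/** Hypothetical model of the jbuf runtime checks; not the Javia implementation. */
final class JbufStateModel {
    enum State { UNREF, REF, TO_BE_UNREF, FREED }   // FREED is an assumed terminal state
    enum Prim { BYTE, INT }                          // stands for the <p> type parameter

    private State state = State.UNREF;  // state assumed right after alloc
    private Prim type;                  // type of the outstanding array references, if any

    State state() { return state; }

    /** to<p>Array: hand out an array reference of primitive type p. */
    void toArray(Prim p) {
        if (state == State.FREED) {
            throw new IllegalStateException("jbuf has been freed");
        }
        if (state == State.UNREF) {          // first reference fixes the type
            state = State.REF;
            type = p;
        } else if (type != p) {              // type safety: one reference type at a time
            throw new IllegalStateException("jbuf is already referenced as " + type);
        }
        // In REF and TO_BE_UNREF a same-typed request leaves the state unchanged.
    }

    /** unRef: the application claims it holds no references into the jbuf. */
    void unRef() {
        if (state == State.REF) {
            state = State.TO_BE_UNREF;
        } else if (state != State.TO_BE_UNREF) {
            throw new IllegalStateException("no array references to drop");
        }
    }

    /** GC*: the collector has verified the claim (the callback would fire here). */
    void gcConfirmedNoRefs() {
        if (state == State.TO_BE_UNREF) {
            state = State.UNREF;
            type = null;
        }
    }

    /** free: storage safety allows this only when no array references exist. */
    void free() {
        if (state != State.UNREF) {
            throw new IllegalStateException("cannot free in state " + state);
        }
        state = State.FREED;
    }

    public static void main(String[] args) {
        JbufStateModel buf = new JbufStateModel();
        buf.toArray(Prim.INT);       // UNREF -> REF<int>
        buf.unRef();                 // REF<int> -> TO_BE_UNREF<int>
        buf.gcConfirmedNoRefs();     // TO_BE_UNREF<int> -> UNREF
        buf.toArray(Prim.BYTE);      // re-use with a different type is now allowed
        System.out.println(buf.state());
    }
}
```

In this reading, a jbuf can change its reference type only by passing through the unref state, which is what ties type safety to the GC* transition.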

Javia-II
Exploiting jbufs
  • explicit pinning/unpinning of jbufs
  • only non-blocking send/recvs
  • additional checks to ensure correct semantics
[Figure: Javia-II architecture. Labels: GC heap, jbuf, array refs, state, send/recv ticket ring, Vi, Java, C, descriptor, send/recv queue, VIA]

Javia-II: Performance
Basic Costs
  • allocation = 1.2 us, to*Array = 0.8 us, unRefs = 2.5 us
Latency (n = xfer size)
  • raw:     16.5 us + (0.025 us) * n
  • jbufs:   20.5 us + (0.025 us) * n
  • pin(s):  38.0 us + (0.038 us) * n
  • copy(s): 21.5 us + (0.042 us) * n
BW: within margin of error (< 1%)
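
To compare the four latency fits at concrete message sizes, they can be evaluated directly. The sketch below is only an arithmetic aid: the class name JaviaLatencyModel and the sample sizes are made up for illustration, and it assumes the labels (raw, jbufs, pin(s), copy(s)) pair with the formulas in the order they are listed on the slide.

```java
/** Evaluates the linear latency fits quoted on the slide (microseconds, n in bytes). */
final class JaviaLatencyModel {
    // Assumed pairing: labels matched to formulas in slide order.
    static double raw(int n)   { return 16.5 + 0.025 * n; }
    static double jbufs(int n) { return 20.5 + 0.025 * n; }
    static double pin(int n)   { return 38.0 + 0.038 * n; }
    static double copy(int n)  { return 21.5 + 0.042 * n; }

    public static void main(String[] args) {
        int[] sizes = {8, 1024, 8192, 32768};   // arbitrary sample sizes, not from the slides
        System.out.printf("%8s %10s %10s %10s %10s%n",
                "bytes", "raw", "jbufs", "pin(s)", "copy(s)");
        for (int n : sizes) {
            System.out.printf("%8d %10.1f %10.1f %10.1f %10.1f%n",
                    n, raw(n), jbufs(n), pin(n), copy(n));
        }
    }
}
```

Under these fits the jbufs path carries a constant 4 us over raw VIA at every size, because the per-byte term is identical; only the copy and pin variants grow faster than raw as n increases.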

Exercising Jbufs
Active Messages II
  • maintains a pool of free recv jbufs
      • jbuf passed to handler
      • unRef is invoked after handler invocation
      • if pool is empty, alloc more jbufs or reclaim existing ones
  • copying deferred to GC-time only if needed
    class First extends AMHandler {
        private int first;
        void handler(AMJbuf buf, …) {
            int[] tmp = buf.toIntArray();
            first = tmp[0];   /* reads one word; keeps no reference into the jbuf */
        }
    }
    class Enqueue extends AMHandler {
        private Queue q;
        void handler(AMJbuf buf, …) {
            int[] tmp = buf.toIntArray();
            q.enq(tmp);       /* keeps the array reference alive after the handler returns */
        }
    }
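
The pool discipline described on this slide can be sketched with plain Java objects. Everything in the block below is hypothetical: MockJbuf is an ordinary int[] wrapper rather than a real jbuf, RecvPool and its method names are invented, and the GC callback is simulated by an explicit gcCallback call. The sketch assumes handlers keep no references (like First above) and does not model the "reclaim existing ones" path or GC-time copying.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical stand-in for a receive jbuf: a plain int[] with no real unRef support. */
final class MockJbuf {
    private final int[] words;
    MockJbuf(int size) { words = new int[size]; }
    int[] toIntArray() { return words; }
}

/** Sketch of a receive-buffer pool in the style the slide describes. */
final class RecvPool {
    private final Deque<MockJbuf> free = new ArrayDeque<>();
    private final Deque<MockJbuf> awaitingGc = new ArrayDeque<>();
    private final int bufWords;
    private int allocated;

    RecvPool(int initial, int bufWords) {
        this.bufWords = bufWords;
        for (int i = 0; i < initial; i++) { free.push(alloc()); }
    }

    private MockJbuf alloc() { allocated++; return new MockJbuf(bufWords); }

    /** Take a buffer for an incoming message; allocate more if the pool is empty. */
    MockJbuf take() { return free.isEmpty() ? alloc() : free.pop(); }

    /** Run the handler, then issue the (simulated) unRef on the buffer. */
    void dispatch(MockJbuf buf, Consumer<MockJbuf> handler) {
        handler.accept(buf);
        awaitingGc.add(buf);   // real jbufs: unRef(callback); the GC later verifies the claim
    }

    /** Simulated GC callback: buffers whose unRef was confirmed return to the pool. */
    void gcCallback() {
        while (!awaitingGc.isEmpty()) { free.push(awaitingGc.poll()); }
    }

    int freeCount() { return free.size(); }
    int allocatedCount() { return allocated; }

    public static void main(String[] args) {
        RecvPool pool = new RecvPool(1, 4);
        int[] first = new int[1];
        MockJbuf buf = pool.take();
        buf.toIntArray()[0] = 42;   // pretend the network wrote this word
        pool.dispatch(buf, b -> first[0] = b.toIntArray()[0]);   // like the First handler
        pool.gcCallback();
        System.out.println(first[0] + " " + pool.freeCount());
    }
}
```

A buffer returns to the free list only after the simulated callback, which mirrors the point that the application may re-use a jbuf only once the GC has confirmed the unRef claim.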

AM-II: Preliminary Numbers
Latency about 15 us higher than Javia
  • synch access to buffer pool, endpoint header, flow control checks, handler id lookup
  • room for improvement
BW within 3% of peak for 16 KByte messages

Exercising Jbufs again
"in-place" object unmarshaling
  • assumption: homogeneous cluster and JVMs
  • defer copying and allocation to GC-time if needed
  • jstreams = jbuf + object stream API
[Figure: writeObject on the sender's GC heap, NETWORK, then two receiver variants: "typical" readObject and "in-place" readObject, each shown against the GC heap]

jstreams: Performance
readObject cost constant w.r.t. object size
  • about 1.5 us per object if written in C
  • pointer swizzling, type-checking, array-bounds checking

Summary
Research goal: Efficient, safe, and flexible interaction with network devices using a safe language
Javia: Java Interface to VIA
  • native buffers as baseline implementation
      • can be implemented on off-the-shelf JVMs
  • jbufs: safe, explicit control over buffer placement and lifetime
      • ability to allocate primitive arrays on memory segments
      • ability to change scope of GC heap dynamically
  • building blocks for Java apps and communication software
      • parallel matrix multiplication
      • active messages
      • remote method invocation