CkDirect: Charm++ RDMA Put
Presented by Eric Bohm
CkDirect Team: Eric Bohm, Sayantan Chakravorty, Pritish Jetley, Abhinav Bhatele
ppl@cs.uiuc.edu
5/4/2008, Charm Workshop 2008
What is CkDirect?
- One-sided communication
- One-way (put only, so far)
- Memory-to-memory interface
- Uses RDMA for zero copy
- No protocol synchronization
- User notification via callback
- Pair-wise persistent channels
Motivating Example
Messaging Approach
[Figure: Proc 2 sends the data in A to Dest Proc as a message (labels: Msg, SR, Send Message); the data ends up in B]
CkDirect Approach
[Figure: Proc 2 puts the data in A directly into B on Dest Proc]
RDMA Challenges
- Remote Direct Memory Access: minimal overhead, therefore fast
- Put is more intuitive than get for the message-driven model
  - Get: must know the remote location and that the remote data is ready
  - Put: must know only the remote location
- Interfaces for RDMA vary by interconnect
- Put completion notification is lacking: either there is no notification, or the put performance is hardly better than two-sided communication
- Through trickery, we can do better than that.
Where is it useful?
- When same-size data is transferred between the same partners each iteration, into buffers that are reused
- When the application already enforces iteration boundaries
- Especially when you need to aggregate data from disparate sources into a contiguous buffer before processing
How does it work?
- A user callback is triggered on put completion
- The application must:
  - register the send and receive processor and memory pairs in a handle
  - register a put completion callback for the handle
  - register an out-of-band pattern for the handle
  - call ready when done using the received put data
  - allow only 1 transaction per handle at a time
  - trigger a message from the callback for the real computation
Ping Pong Results
[Figure: Ping Pong, CkDirect relative improvement, by message size in 1000s of bytes]
Matrix Multiply Results
[Figure: Matrix Multiply 2048x2048, average time in milliseconds]
Jacobi 3D Results
[Figure: Jacobi 3D 1024*512, iteration time improvement from CkDirect; panels: Infiniband (Abe), Blue Gene/P (Surveyor)]
OpenAtom Results
[Figure: OpenAtom Water 256M benchmark, minimization, time per step in seconds]
Reducing Polling Overhead
Polling overhead is proportional to the number of ready handles. To minimize the number of ready handles, we use a split scheme:
- CkDirect_readyMark: done with the data, but don't start polling yet
- CkDirect_readyPollQ: the data was already marked, so start checking
CkDirect_readyPollQ can still detect puts that completed since CkDirect_readyMark.
[Figure: timeline of the split scheme across iterations, with labels in order: CkDirect_readyPollQ, CkDirect_put, invoke callback function, CkDirect_readyMark, message sends entry methods, CkDirect_readyPollQ, CkDirect_put, CkDirect_readyMark, CkDirect_readyPollQ]
The CkDirect API
/* Receiver side: create handle */
struct infiDirectUserHandle CkDirect_createHandle(int senderNode, void *recvBuf, int recvBufSize, void (*callbackFnPtr)(void *), void *callbackData, double initialValue);
/* Sender side: register memory to handle */
void CkDirect_assocLocalBuffer(struct infiDirectUserHandle *userHandle, void *sendBuf, int sendBufSize);
/* Sender side: actual data transfer */
void CkDirect_put(struct infiDirectUserHandle *userHandle);
/* Receiver side: done with buffer */
void CkDirect_readyMark(struct infiDirectUserHandle *userHandle);
/* Receiver side: start checking for put */
void CkDirect_readyPollQ(struct infiDirectUserHandle *userHandle);
/* Receiver side: done with buffer, start checking for put */
void CkDirect_ready(struct infiDirectUserHandle *userHandle);
Conclusions
- Availability: CVS version of Charm++, on net-linux-amd64-ibverbs and bluegenep
- Future work: CkDirect multicasts; ports to other architectures
Questions? Feedback?