Implementing Remote Procedure Call Landon Cox February 3, 2017
Modularity so far • Procedures as modules • What is private and what is shared between procedures? • Local variables are private • Stack, heap, global variables are shared [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Modularity so far • Procedures as modules • How is control transferred between procedures? • Caller adds arguments and RA to stack, jumps into callee code • Callee sets up local variables, runs code, jumps to RA [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Modularity so far • Procedures as modules • Is isolation between procedures enforced? • No, either module can corrupt the other • No guarantee that callee will return to caller either [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Modularity so far • MULTICS processes as modules • What is private, shared btw MULTICS processes? • Address spaces are private • Segments can be shared [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Modularity so far • MULTICS processes as modules • How is control transferred between MULTICS processes? • Use synchronization primitives from supervisor • Lock/unlock, wait/notify [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Modularity so far • MULTICS processes as modules • Is isolation btw MULTICS processes enforced? • Yes, modules cannot corrupt private state of the other • Isolate shared state inside common segments [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Modularity so far • UNIX processes as modules • What is private and what is shared btw UNIX processes? • Address spaces are private • File system and pipes are shared [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Modularity so far • UNIX processes as modules • How is control transferred between UNIX processes? • Use synchronization primitives from supervisor • Block by reading from pipe, notify by writing to pipe [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Modularity so far • UNIX processes as modules • Is isolation between UNIX processes enforced? • Yes, modules cannot corrupt private state of the other • Protect shared state using pipe buffer and FS access control [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
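The "block by reading from pipe, notify by writing to pipe" idea can be sketched in a few lines. This is a minimal illustration, not the course's own code: it uses two threads inside one Python process rather than two UNIX processes (an assumption made to keep the example self-contained), and the function name `run_pipe_demo` is hypothetical.

```python
import os
import threading

def run_pipe_demo():
    """Return the messages the consumer read from the pipe, in order."""
    read_fd, write_fd = os.pipe()
    received = []

    def consumer():
        # Reading blocks until the producer writes data or closes its end.
        with os.fdopen(read_fd, "rb") as reader:
            for line in reader:
                received.append(line.decode().rstrip("\n"))

    worker = threading.Thread(target=consumer)
    worker.start()

    # Each write "notifies" the blocked reader that data is available.
    with os.fdopen(write_fd, "wb") as writer:
        for message in ("first", "second"):
            writer.write((message + "\n").encode())

    # Leaving the with-block closed the write end, so the reader sees
    # end-of-file and its loop terminates.
    worker.join()
    return received

if __name__ == "__main__":
    print(run_pipe_demo())
```

The only shared state is the pipe buffer: neither side touches the other's variables, which mirrors the isolation argument on the slide.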
Network programming • Now say two modules are on different machines • What is the standard abstraction for communication? • Sockets • Each end of socket is bound to an <address, port> pair [Diagram: Module 1 and Module 2; labels: code, private state (one per module), shared state]
Network programming • Now say two modules are on different machines • Sockets should look familiar • Very similar to pipes • Use read/write primitives for synchronized access to buffer • Downsides of socket programming? • Adds complexity to a program • Blocking conditions depend on data received • Data structures copied into and out of messages or streams • All of this work can be tedious and error-prone • Idea: programmers are used to local procedures • Try to make network programming as easy as procedure calls
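To make the "tedious and error-prone" point concrete, here is a rough sketch of what adding two integers on another module looks like with raw sockets. Everything here is an assumption for illustration: the 4-byte length prefix, the helper names (`recv_exact`, `send_msg`, `recv_msg`, `add_server`, `remote_add`), and the use of `socket.socketpair()` plus a thread to stand in for a second machine.

```python
import socket
import struct
import threading

def recv_exact(sock, n):
    """Read exactly n bytes; a single recv may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def send_msg(sock, payload):
    # Length-prefixed framing: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def add_server(sock):
    """Hypothetical server: unpack two ints, reply with their sum."""
    a, b = struct.unpack("!ii", recv_msg(sock))
    send_msg(sock, struct.pack("!i", a + b))

def remote_add(a, b):
    client, server = socket.socketpair()
    worker = threading.Thread(target=add_server, args=(server,))
    worker.start()
    try:
        send_msg(client, struct.pack("!ii", a, b))   # copy arguments into a message
        (result,) = struct.unpack("!i", recv_msg(client))  # copy the result back out
    finally:
        client.close()
        worker.join()
        server.close()
    return result

if __name__ == "__main__":
    print(remote_add(2, 3))
```

Framing, partial reads, and packing are all written by hand for a single two-argument function, which is the work RPC tries to hide.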
Remote procedure call (RPC) • RPC makes request/response look local • Provides the illusion of a function call • RPC isn’t really a function call • In a normal call, the PC jumps to the function • Function then jumps back to caller • This is similar to request/response though • Stream of control goes from client to server • And then returns back to the client
The RPC illusion • How to make send/recv look like a function call? • Client wants • Send to server to look like calling a function • Reply from server to look like function returning • Server wants • Receive from client to look like a function being called • Wants to send response like returning from function
Implementing RPC • Primary challenges • How to name, locate the remote code to invoke? • How to handle arguments containing pointers? • How to handle failures?
RPC architecture [Diagram: client code, client stub, and RPC runtime on the client; server code, server stub, and RPC runtime on the server; the network between them; Import/Export labels on the interface] • Who imports and who exports the interface?
RPC architecture [Diagram: client code, client stub, and RPC runtime on the client; server code, server stub, and RPC runtime on the server; the network between them; Import/Export labels on the interface] • Who defines the interface? • The programmer
RPC architecture [Diagram: client code, client stub, and RPC runtime on the client; server code, server stub, and RPC runtime on the server; the network between them; Import/Export labels on the interface] • Who writes the client and server code? • The programmer
RPC architecture [Diagram: client code, client stub, and RPC runtime on the client; server code, server stub, and RPC runtime on the server; the network between them; Import/Export labels on the interface] • Who writes the stub code? • An automated stub generator (rmic in Java)
RPC architecture [Diagram: client code, client stub, and RPC runtime on the client; server code, server stub, and RPC runtime on the server; the network between them; Import/Export labels on the interface] • Why can stub code be generated automatically? • Interface precisely defines behavior • What data comes in, what is returned
RPC architecture [Diagram: client code, client stub, and RPC runtime on the client; server code, server stub, and RPC runtime on the server; the network between them; Import/Export labels on the interface] • Where else have we seen automated control transfer? • Compilers + procedure calls
RPC stub functions [Diagram: client stub and server stub linked by send/recv, with call and return arrows]
RPC stub functions • Client stub: 1) Builds request message with server function name and parameters 2) Sends request message to server stub • Control transfers to the server stub; client-side code is paused 8) Receives response message from server stub 9) Returns response value to client • Server stub: 3) Receives request message 4) Calls the right server function with the specified parameters 5) Waits for the server function to return 6) Builds a response message with the return value 7) Sends response message to client stub
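The nine steps can be sketched as a pair of stub functions. This is a minimal sketch under stated assumptions, not generated stub code: messages are JSON, the "network" is a direct function call passed in as `transport`, and the names `add`, `SERVER_FUNCTIONS`, `server_stub`, and `make_client_stub` are hypothetical.

```python
import json

# --- Server side ---------------------------------------------------------
def add(a, b):
    """Hypothetical server function exposed over RPC."""
    return a + b

SERVER_FUNCTIONS = {"add": add}

def server_stub(request_bytes):
    request = json.loads(request_bytes)              # 3) receive request message
    func = SERVER_FUNCTIONS[request["name"]]         # 4) find the right function
    result = func(*request["args"])                  # 4-5) call it and wait for return
    return json.dumps({"result": result}).encode()   # 6-7) build and send response

# --- Client side ---------------------------------------------------------
def make_client_stub(name, transport):
    """Build a local function that forwards calls to the named server function."""
    def stub(*args):
        # 1) build request message with function name and parameters
        request = json.dumps({"name": name, "args": args}).encode()
        # 2) send request; the caller is paused until 8) the response arrives
        response_bytes = transport(request)
        # 9) return the response value to the client code
        return json.loads(response_bytes)["result"]
    return stub

# Here the transport is a plain function call standing in for the network.
remote_add = make_client_stub("add", transport=server_stub)

if __name__ == "__main__":
    print(remote_add(2, 3))
```

Client code calls `remote_add(2, 3)` exactly as it would call a local function. Swapping `transport` for a real socket exchange would not change the call site.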
Binding • What is binding? • Establishing a map from symbolic name to object • In an RPC system what needs to be bound? • Client code uses interface as a symbolic name • RPC system must bind those names to real code instances • In Cedar what managed this mapping? • The Grapevine distributed database • Types are listed as symbolic names • Instances are listed as machine addresses
Binding • Is anyone allowed to export any interface? • No, this is regulated through Grapevine access controls • Users allowed to export an interface are explicitly listed in a group • Only group owner can allow someone to export • Is anyone allowed to import an interface? • Yes, clients authorized at higher level • What other distributed database is Grapevine like? • Domain name service (DNS) • Contains mapping from names to IP addrs • Grapevine group map: interfaces to user ids • Grapevine individual map: user id to network address
Binding • Is anyone allowed to export any interface? • No, this is regulated through Grapevine access controls • Users allowed to export an interface are explicitly listed in a group • Only group owner can allow someone to export • Is anyone allowed to import an interface? • Yes, clients authorized at higher level • Are permissions same or different than DNS (reads and writes)? • Basically the same • DNS updates are controlled • DNS retrievals are not • Grapevine group map: interfaces to user ids • Grapevine individual map: user id to network address
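The two maps and the export/import rules can be illustrated with a toy in-memory registry. This is only a sketch of the access-control idea from the slides; it is not Grapevine's actual interface, and the class `Registry`, its method names, and the sample user ids and addresses are all hypothetical.

```python
class Registry:
    """Toy stand-in for a Grapevine-like name service (hypothetical API)."""

    def __init__(self):
        self.group_owner = {}    # interface -> user id of the group owner
        self.group_members = {}  # group map: interface -> user ids allowed to export
        self.addresses = {}      # individual map: user id -> network address

    def create_interface(self, interface, owner):
        self.group_owner[interface] = owner
        self.group_members[interface] = set()

    def allow_export(self, interface, requester, user):
        # Updates are controlled: only the group owner may add an exporter.
        if self.group_owner.get(interface) != requester:
            raise PermissionError("only the group owner may add exporters")
        self.group_members[interface].add(user)

    def export(self, interface, user, address):
        if user not in self.group_members.get(interface, set()):
            raise PermissionError(f"{user} may not export {interface}")
        self.addresses[user] = address

    def import_interface(self, interface):
        # Lookups are unrestricted: any client may resolve an interface.
        members = sorted(self.group_members.get(interface, set()))
        return [self.addresses[u] for u in members if u in self.addresses]

if __name__ == "__main__":
    reg = Registry()
    reg.create_interface("FileAccess", owner="alice")
    reg.allow_export("FileAccess", requester="alice", user="bob")
    reg.export("FileAccess", "bob", "10.0.0.7:4000")
    print(reg.import_interface("FileAccess"))
```

As with DNS, writes go through an authorization check while reads do not.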
Shared state • What is the shared state of the RPC abstraction? • Arguments passed through function call • What is the actual shared state in RPC? • The underlying messages between client and server [Diagram: Client and Server; labels: code, private state (one per side), shared state]
Shared state • Why is translating arguments into messages tricky? • Data structures have pointers • Client and server run in different address spaces • Need to ensure that pointer on client = pointer on server [Diagram: Client and Server; labels: code, private state (one per side), shared state]
Shared state • How do we ensure that a data structure is safely transferred? • Must know the semantics of data structure (typed object references) • Must then replace pointers on client with valid pointers on server • Requires explicit help of programmer to get right • Cannot just pass arbitrary C-style structs and hope to work correctly [Diagram: Client and Server; labels: code, private state (one per side), shared state]
Shared state • What about after server code completes? • Must synchronize updates to arguments • Changes by server must be reflected in client before returning [Diagram: Client and Server; labels: code, private state (one per side), shared state]
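One common way to handle the pointer problem is to replace each pointer with a position-independent index before sending, then rebuild real references on the other side. The sketch below does this for a singly linked list; the `Node` type and the `marshal`/`unmarshal` names are assumptions for illustration, and a real RPC system would derive this code from the interface's type information.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    next: Optional["Node"] = None

def marshal(head):
    """Flatten a linked list into [value, next_index] records; -1 means null."""
    records = []
    index_of = {}  # id(node) -> position in records
    node = head
    # First pass: assign each reachable node an index (stops on a cycle).
    while node is not None and id(node) not in index_of:
        index_of[id(node)] = len(records)
        records.append([node.value, -1])
        node = node.next
    # Second pass: rewrite each pointer as the index of its target.
    node = head
    for record in records:
        if node.next is not None:
            record[1] = index_of[id(node.next)]
        node = node.next
    return records

def unmarshal(records):
    """Rebuild the list with references that are valid in this address space."""
    nodes = [Node(value) for value, _ in records]
    for node, (_, next_index) in zip(nodes, records):
        node.next = nodes[next_index] if next_index != -1 else None
    return nodes[0] if nodes else None

if __name__ == "__main__":
    wire = marshal(Node(1, Node(2, Node(3))))
    print(wire)
    copy = unmarshal(wire)
    print(copy.value, copy.next.value, copy.next.next.value)
```

The receiver ends up with a structurally identical copy, not the same object. That is why any changes the server makes have to be shipped back and applied on the client before the call returns.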
Faults • With procedures, what happens if a module faults? • No isolation, program crashes • Result of sharing the same address space • With pipes, what happens if a module faults? • Faulting module (process) crashes • OS makes pipe unreadable and unwritable • Cannot just return an error code through client stub • Bad idea to overload errors • Want to distinguish network failures from incorrectness
Faults • How are RPC faults handled in practice? • Usually through a software exception • Often supported by language • So how “pure” is the RPC abstraction? • Not totally pure • Programmer still knows which calls are local vs remote • Have to write code for handling failures • So is RPC a good abstraction? • In some cases yes, hides a lot of the complexity • However, it often comes at a steep performance penalty • What part of RPC is slowest? • Argument packing and unpacking • Java class introspection for shipping data structures is particularly painful
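A sketch of the exception-based approach: the client stub raises one exception type for network failures and another for errors reported by the server function, so neither is overloaded onto the return value. The exception classes, `call_remote`, and the two stand-in transports are all hypothetical names chosen for this example.

```python
class RPCNetworkError(Exception):
    """The network or server failed; the call may or may not have run."""

class RPCRemoteError(Exception):
    """The server ran the function and reported an application error."""

def call_remote(transport, name, *args):
    """Client-stub helper that separates network faults from remote errors."""
    try:
        reply = transport(name, args)
    except (ConnectionError, TimeoutError) as exc:
        raise RPCNetworkError(str(exc)) from exc
    if "error" in reply:
        raise RPCRemoteError(reply["error"])
    return reply["result"]

# --- Stand-in transports (assumptions, no real network involved) ---------
def unreachable_transport(name, args):
    raise ConnectionError("server unreachable")

def divide_transport(name, args):
    a, b = args
    if b == 0:
        return {"error": "division by zero"}
    return {"result": a / b}

if __name__ == "__main__":
    print(call_remote(divide_transport, "divide", 6, 3))
    for transport, call_args in ((divide_transport, (1, 0)),
                                 (unreachable_transport, (1, 2))):
        try:
            call_remote(transport, "divide", *call_args)
        except RPCRemoteError as exc:
            print("remote error:", exc)
        except RPCNetworkError as exc:
            print("network error:", exc)
```

The caller still has to write both `except` branches, which is the sense in which the abstraction is "not totally pure".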
Structuring a concurrent system • Talked about two ways to build a system
Alternative structure • Can also give cooperating threads own address spaces • Each thread is basically a separate process • Use messages instead of shared data to communicate • Why would you want to do this? • Protection • Each module runs in its own address space • Reasoning behind micro-kernels • Each service runs as a separate process • Mach from CMU (influenced parts of Mac OS X) • Vista/Win 7’s handling of device drivers
Augmenting the mobile experience through code offload Eduardo Cuervo - Duke Aruna Balasubramanian - U Washington Dae-ki Cho - UCLA Alec Wolman, Stefan Saroiu, Ranveer Chandra, Paramvir Bahl – Microsoft Research
Battery is a scarce resource [Figure: Li-Ion energy density (Wh/kg, axis 0 to 250) by year, 1991 to 2005; annotation: just 2x in 15 years] • CPU performance during same period: 246x • A solution to the battery problem seems unlikely
Mobile apps can’t reach their full potential • Applications: interactive games, speech recognition and synthesis, augmented reality • Limitations noted: slow, limited, or inaccurate; power intensive; too CPU intensive; not on par with desktop counterparts
One Solution: Remote Execution • Remote execution can reduce energy consumption • Challenges: • What should be offloaded? • How to dynamically decide when to offload? • How to minimize the required programmer effort?
MAUI: Mobile Assistance Using Infrastructure • MAUI Contributions: • Combine extensive profiling with an ILP solver • Makes dynamic offload decisions • Optimize for energy reduction • Profile: device, network, application • Leverage modern language runtime (.NET CLR) • To simplify program partitioning • Reflection, serialization, strong typing
Roadmap • Motivation • MAUI system design • MAUI proxy • MAUI profiler • MAUI solver • Evaluation • Summary • Beyond MAUI
MAUI Architecture [Diagram: smartphone running the application on the MAUI runtime (client proxy, profiler, solver); MAUI server running the MAUI runtime (server proxy, profiler, solver) plus the MAUI controller; the two sides connected by RPC]
How Does a Programmer Use MAUI? • Goal: make it dead-simple to MAUI-ify apps • Build app as a standalone phone app • Add .NET attributes to indicate “remoteable” • Follow a simple set of rules
Language Run-Time Support For Partitioning • Portability: Mobile (ARM) vs Server (x86) • .NET Framework Common Intermediate Language • Type-Safety and Serialization: • Automate state extraction • Reflection: • Identifies methods with [Remoteable] tag • Automates generation of RPC stubs
MAUI Proxy [Diagram: MAUI architecture (smartphone and MAUI server) with annotations: intercepts application calls; synchronizes state; chooses local or remote; provides runtime information; handles errors]
MAUI Profiler [Diagram: the profiler combines the callgraph, state size, CPU cycles, execution time, device profile, network latency, and network bandwidth into an annotated callgraph with computational power cost, computational delay, network power cost, and network delay]
MAUI Solver [Diagram: a sample callgraph with nodes A, B, C, D; annotations give energy and delay for state transfer, and computation energy and delay for execution; values shown: 10000 mJ, 5000 mJ, 3000 ms, 900 mJ, 15 ms, 1000 mJ, 25000 mJ, 15000 mJ, 12000 ms]
Is Global Program Analysis Needed? • Yes! This simple example from the Face Recognition app shows why local analysis fails. [Diagram: callgraph of User Interface, InitializeFaceRecognizer, FindMatch, DetectAndExtractFaces; values shown: 10000 mJ, 1000 mJ, 5000 mJ, 900 mJ, 25000 mJ, 15000 mJ; annotation: cheaper to do local]
Is Global Program Analysis Needed? • Yes! This simple example from the Face Recognition app shows why local analysis fails. [Diagram: callgraph of User Interface, InitializeFaceRecognizer, FindMatch, DetectAndExtractFaces; values shown: 10000 mJ, 1000 mJ, 5000 mJ, 900 mJ, 25000 mJ, 15000 mJ; annotations: cheaper to do local (shown twice)]
Is Global Program Analysis Needed? [Diagram: User Interface, InitializeFaceRecognizer, FindMatch, DetectAndExtractFaces; values shown: 1000 mJ, 25900 mJ; annotation: cheaper to offload]
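The point about global analysis can be illustrated with a tiny partitioning search. This is a sketch under loud assumptions: it uses brute-force enumeration instead of MAUI's ILP solver, it minimizes energy only (no latency constraint), and the callgraph, method names, and mJ figures below are made up for the example rather than taken from the slides.

```python
from itertools import product

def best_partition(local_cost, transfer_cost, pinned_local):
    """Try every local/remote assignment and return (energy, remote_set).

    local_cost:    method -> energy (mJ) to run it on the phone
    transfer_cost: (caller, callee) -> energy (mJ) to ship state when the
                   call crosses the phone/server boundary
    pinned_local:  methods that must stay on the phone (e.g. the UI)
    """
    free = [m for m in sorted(local_cost) if m not in pinned_local]
    best = None
    for choice in product([False, True], repeat=len(free)):
        remote = {m for m, is_remote in zip(free, choice) if is_remote}
        # Methods left on the phone cost their local execution energy.
        energy = sum(c for m, c in local_cost.items() if m not in remote)
        # Each call edge that crosses the boundary costs a state transfer.
        energy += sum(c for (u, v), c in transfer_cost.items()
                      if (u in remote) != (v in remote))
        if best is None or energy < best[0]:
            best = (energy, remote)
    return best

# Hypothetical numbers (mJ), chosen so that no single method is worth
# offloading on its own, yet offloading two together is.
LOCAL_COST = {"ui": 1000, "init": 5000, "find": 900, "detect": 15000}
TRANSFER_COST = {("ui", "init"): 6000, ("ui", "find"): 1000,
                 ("find", "detect"): 25000}

if __name__ == "__main__":
    print(best_partition(LOCAL_COST, TRANSFER_COST, pinned_local={"ui"}))
```

With these numbers, offloading `find` or `detect` alone pays the 25000 mJ transfer on the edge between them, so a per-method check keeps everything local at 21900 mJ. Considering the whole graph finds that moving both together costs only 7000 mJ.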
Can MAUI Adapt to Changing Conditions? • Adapt to: • Network bandwidth/latency changes • Variability in methods’ computational requirements • Experiment: • Modified off-the-shelf arcade game application • Physics modeling (homing missiles) • Evaluated under different latency settings
Can MAUI Adapt to Changing Conditions? [Diagram: callgraph of DoFrame, DoLevel, HandleEnemies, HandleBonuses, HandleMissiles; edge labels: 11 KB + missiles, missiles] • HandleMissiles: required state is smaller • Complexity increases with # of missiles • *Missiles take around 60 bytes each
Case 1 • Zero missiles • Low latency (RTT < 10 ms) [Diagram: callgraph of DoFrame, DoLevel, HandleEnemies, HandleBonuses, HandleMissiles] • Offload starting at DoLevel • HandleMissiles: computation cost is close to zero • *Missiles take around 60 bytes each
Case 2 • 5 missiles • Some latency (RTT = 50 ms) [Diagram: callgraph of DoFrame, DoLevel, HandleEnemies, HandleBonuses, HandleMissiles] • Very expensive to offload everything • Little state to offload • Only offload HandleMissiles • HandleMissiles: most of the computation cost • *Missiles take around 60 bytes each
Roadmap • Motivation • MAUI system design • MAUI proxy • MAUI profiler • MAUI solver • Evaluation • Summary • Beyond MAUI
MAUI Implementation • Platform • Windows Mobile 6.5 • .NET Framework 3.5 • HTC Fuze Smartphone • Monsoon power monitor • Applications • Chess • Face Recognition • Arcade Game • Voice-based translator
Questions • How much can MAUI reduce energy consumption? • How much can MAUI improve performance? • Can MAUI run resource-intensive applications?
How much can MAUI reduce energy consumption? [Figure: Face Recognizer energy (Joules, axis 0 to 35) for smartphone only, MAUI over Wi-Fi (10, 25, 50, 100 ms RTT), and MAUI* over 3G (220 ms RTT)] • An order of magnitude improvement on Wi-Fi • Big savings even on 3G
How much can MAUI improve performance? [Figure: Face Recognizer execution duration (ms, axis 0 to 21,000) for smartphone only, MAUI over Wi-Fi (10, 25, 50, 100 ms RTT), and MAUI* over 3G (220 ms RTT)] • Improvement of around an order of magnitude
Latency to server impacts the opportunities for fine-grained offload [Figure: Arcade Game energy (Joules, axis 0 to 60) for smartphone only, MAUI over Wi-Fi (10, 25, 50, 100 ms RTT), and MAUI* over 3G (220 ms RTT); annotation: solver would decide not to offload] • Up to 40% energy savings on Wi-Fi • Opportunities for MAUI nodes collocated with APs or cell towers
Roadmap • Motivation • MAUI system design • MAUI proxy • MAUI profiler • MAUI solver • Evaluation • Summary • Beyond MAUI
Summary • MAUI enables developers to: • Bypass the resource limitations of handheld devices • Low barrier to entry: simple program annotations • For a resource-intensive application • MAUI reduced energy consumed by an order of magnitude • MAUI improved application performance similarly • MAUI adapts to: • Changing network conditions • Changing applications’ CPU demands
Roadmap • Motivation • MAUI system design • MAUI proxy • MAUI profiler • MAUI solver • Evaluation • Summary • Beyond MAUI
Beyond MAUI • For a method to be offloaded • The cost of the network transfer is small • The computation cost is high • What to do when both costs are high? • Video processing • Computer graphics • Games
MAUI for Games • Cloud gaming is already happening • OnLive, Gaikai, etc. • Thin client model • Steep bandwidth requirements (6 Mbps) • How can MAUI help? • Let the phone do as much as possible • Reduce bandwidth consumption • Allow disconnected gaming
Preliminary results • Two promising game offload mechanisms • Require 30-70% less bandwidth • Still providing the same level of quality as OnLive • Enable high-end gaming with as little as 400 kbps • With a small reduction in video quality
Fidelity Aware MAUI • Current failure model • On disconnection do local • Simple and effective but slow • Redefined model • Give the best results when connected • Give results fast at the expense of accuracy • How? • Allow applications to adapt • Make it as simple as possible for application developers
Questions? • http://research.microsoft.com/en-us/projects/maui/ • ecuervo@cs.duke.edu