Static Memory Management for Efficient Mobile Sensing Applications

Farley Lai, Daniel Schmidt, Octav Chipara
Department of Computer Science, University of Iowa | Mobile Sensing Laboratory
EMSOFT 2015

Introduction: Emerging Mobile Sensing Applications
• A class of applications that process continuous input data streams and may produce continuous output streams
  – real-time processing
  – efficient resource management
[Figure: Speaker Identifier pipeline with stages labeled Speech Recording, VAD, Feature Extraction, Speaker Models, and HTTP Upload, grouped under Sensing and Stream Processing]

Introduction: The Memory Management Challenge
• Workload: stream operations on frames of samples
  – e.g., windowing, splitting, or appending
  – stream operations tend to be memory intensive
• Goal: implement stream operations efficiently
  – reduce memory footprint
  – reduce number of memory accesses
• Challenges:
  – handle complex interactions between components
  – avoid unnecessary memory copies
  – enable data sharing between components

Introduction: Approaches to Memory Management
• Dynamic memory management
  – specialized data structures to implement memory management
    • e.g., SigSeg [Girod, et al. 2008]: linked list of buffered samples
  – a level of indirection in accessing streaming data
• Static memory management
  – no runtime overhead
  – requires precise knowledge of the variable live ranges
    • difficult to achieve in complex applications
    • must be time-efficient to be included in compilers
[Girod 2008] L. Girod, Y. Mei, R. Newton, S. Rost, A. Thiagarajan, H. Balakrishnan, and S. Madden, "XStream: a Signal-Oriented Data Stream Management System," in ICDE, 2008.

Outline
• Application model
• Static analysis
• Memory layout
• Evaluation
• Conclusions

A Model for Stream Applications
• StreamIt
  – synchronous data flow (SDF) language
  – application = graph of filters connected with FIFO channels
    • limited memory operations: pop(), peek(), and push()
    • known consumption and production rates
[Figure: a filter's work() function reads its INPUT channel with pop and peek and writes its OUTPUT channel with push]
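The filter model above can be sketched in a few lines of Python. This is only an illustration, not StreamIt code: the `Channel` class and the `moving_average_work` filter are hypothetical names, and the rates (peek 3, pop 1, push 1) are assumed for the example.

```python
from collections import deque


class Channel:
    """FIFO channel between two filters (hypothetical helper, not StreamIt's API)."""

    def __init__(self):
        self._buf = deque()

    def push(self, value):
        # Producer side: append one element to the tail of the channel.
        self._buf.append(value)

    def pop(self):
        # Consumer side: remove and return the element at the head.
        return self._buf.popleft()

    def peek(self, offset):
        # Consumer side: read the element `offset` positions from the head
        # without removing it.
        return self._buf[offset]

    def __len__(self):
        return len(self._buf)


def moving_average_work(inp, out, window=3):
    """One firing with assumed rates: peek `window`, push 1, pop 1."""
    total = sum(inp.peek(i) for i in range(window))
    out.push(total / window)
    inp.pop()


inp, out = Channel(), Channel()
for sample in [3.0, 6.0, 9.0, 12.0]:
    inp.push(sample)

# Fire the filter while enough samples are buffered to satisfy its peek rate.
while len(inp) >= 3:
    moving_average_work(inp, out)

result = [out.pop() for _ in range(len(out))]
```

Because every firing consumes and produces a fixed number of elements, the buffer occupancy of each channel can be computed without running the filter bodies, which is what makes static analysis of memory use feasible.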

A Model for Stream Applications
• StreamIt
  – synchronous data flow language
  – applications are constructed hierarchically
    • pipelines of streams
    • splits and joins (splitter and joiner)
  – pass-by-value semantics
• a naïve implementation would incur a significant number of copies
[Figure: example stream graph with nodes Source, Duplicate, LPF1, LPF2, Round-Robin, Subtract, and Sink]

Insight
• SDFs may be executed in a cyclo-static schedule
  – the complete memory behavior of the program may be observed within one execution of the schedule
• Our solution: static analysis + memory layout
INIT PHASE: Source, 3; DUP, 3; LPF1, 1; LPF2, 1; RR, 1; Sub, 1; Sink
STEADY PHASE: Source, 1; DUP, 1; LPF1, 1; LPF2, 1; RR, 1; Sub, 1; Sink
[Figure: stream graph with nodes Source, Duplicate, LPF1, LPF2, Round-Robin, Subtract, and Sink]
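To illustrate why one execution of the schedule is enough, the following sketch simulates channel occupancy for the example graph. The firing counts come from the slide; everything else is an assumption made for the example: the channel names, the per-filter rates (in particular that each LPF peeks 3 and pops 1), and a firing count of 1 for Sink.

```python
# Per-filter rates: inputs map channel -> (peek, pop); outputs map channel -> push.
# All rates are assumptions for illustration; the slides give only firing counts.
FILTERS = {
    "Source": ({}, {"src_dup": 1}),
    "DUP": ({"src_dup": (1, 1)}, {"dup_lpf1": 1, "dup_lpf2": 1}),
    "LPF1": ({"dup_lpf1": (3, 1)}, {"lpf1_rr": 1}),
    "LPF2": ({"dup_lpf2": (3, 1)}, {"lpf2_rr": 1}),
    "RR": ({"lpf1_rr": (1, 1), "lpf2_rr": (1, 1)}, {"rr_sub": 2}),
    "Sub": ({"rr_sub": (2, 2)}, {"sub_sink": 1}),
    "Sink": ({"sub_sink": (1, 1)}, {}),
}

ORDER = ["Source", "DUP", "LPF1", "LPF2", "RR", "Sub", "Sink"]
INIT = list(zip(ORDER, [3, 3, 1, 1, 1, 1, 1]))  # Sink count of 1 is assumed
STEADY = [(name, 1) for name in ORDER]


def run_phase(schedule, occupancy, peak):
    """Fire each filter `count` times, tracking current and peak occupancy."""
    for name, count in schedule:
        inputs, outputs = FILTERS[name]
        for _ in range(count):
            for channel, (peek, pop) in inputs.items():
                if occupancy[channel] < peek:
                    raise ValueError(f"{name} cannot fire: {channel} underflows")
                occupancy[channel] -= pop
            for channel, push in outputs.items():
                occupancy[channel] += push
                peak[channel] = max(peak[channel], occupancy[channel])


channels = sorted(
    {ch for ins, outs in FILTERS.values() for ch in list(ins) + list(outs)}
)
occupancy = {ch: 0 for ch in channels}
peak = dict(occupancy)

run_phase(INIT, occupancy, peak)
after_init = dict(occupancy)
run_phase(STEADY, occupancy, peak)
after_steady = dict(occupancy)
```

Under these assumed rates the occupancy after the steady phase equals the occupancy after the init phase, so every later steady iteration repeats the same memory behavior and `peak` bounds the buffer size each channel needs.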

Component Analysis
• Location Sharing
  – an output element is pushed from an unmodified input element
  – each I/O element is associated with a pop/push index
• Temporal Sharing
  – an output element reuses the input element storage
  – each I/O element is associated with a live range [i, j]
• Builds on abstract interpretation
  – build a Control-Flow Graph (CFG) for each filter
  – abstract interpretation of memory operations

Component Analysis
• Abstract interpretation of memory operations
  – memory counter (MC): relative order of operations
  – indexes of current push (out) and pop (in)
  – live range for each input (LIN) and output (LOUT) element
• Indexes and live ranges are represented as intervals
• Subset of rules for determining live ranges:
  – push: LOUT[out] = LOUT[out] ⊔ MC, out++, MC++
  – pop: LIN[in] = LIN[in] ⊔ MC, in++, MC++
  – join: (MC1, in1, out1), (MC2, in2, out2) → (MC = max(MC1, MC2), in = in1 ⊔ in2, out = out1 ⊔ out2)
Example of Component Analysis: pop
RULE: LIN[in] = LIN[in] ⊔ MC, in++, MC++
STATE before: MC = 0, in = [0, 0], out = [0, 0], LIN = ∅, LOUT = ∅
STATE after: MC = 1, in = [1, 1], out = [0, 0], LIN[0] = [0, 0], LOUT = ∅
Update: LIN[0] = LIN[0] ⊔ [0, 0]
[Figure: CFG of the example filter]
Example of Component Analysis: push
RULE: LOUT[out] = LOUT[out] ⊔ MC, out++, MC++
STATE before: MC = 1, in = [1, 1], out = [0, 0], LIN[0] = [0, 0], LOUT = ∅
STATE after: MC = 2, in = [1, 1], out = [1, 1], LIN[0] = [0, 0], LOUT[0] = [1, 1]
Update: LOUT[0] = LOUT[0] ⊔ [1, 1]
[Figure: CFG of the example filter]

Example of Component Analysis: join
RULE: (MC1, in1, out1), (MC2, in2, out2) → (MC = max(MC1, MC2), in = in1 ⊔ in2, out = out1 ⊔ out2)
STATE on incoming edges: (MC = 1, out = [0, 0]) and (MC = 2, out = [1, 1]), with LIN[0] = [0, 0] and LOUT[0] = [1, 1]
STATE after: MC = 2, in = [1, 1], out = [0, 1]
[Figure: CFG of the example filter]
Example of Component Analysis: push after join
RULE: LOUT[out] = LOUT[out] ⊔ MC, out++, MC++
STATE before: MC = 2, in = [1, 1], out = [0, 1], LIN[0] = [0, 0]
STATE after: MC = 3, in = [1, 1], out = [1, 2]
Update: LOUT[0, 1] = LOUT[0, 1] ⊔ [2, 2]
[Figure: CFG of the example filter]
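The pop, push, and join rules and the four example steps can be replayed with a small Python sketch. It is a reading of the slides rather than the authors' implementation: the `State` class and the helper names are hypothetical, intervals are modeled as `(lo, hi)` tuples with `None` standing for ∅, and merging LIN and LOUT pointwise at a join is an assumption, since the slides state the join rule only for MC, in, and out.

```python
from dataclasses import dataclass, field


def hull(a, b):
    """Interval join (⊔): smallest interval covering both; None stands for ∅."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


@dataclass
class State:
    mc: int = 0                                # memory counter
    inp: tuple = (0, 0)                        # interval of possible pop indexes
    out: tuple = (0, 0)                        # interval of possible push indexes
    lin: dict = field(default_factory=dict)    # input index -> live range
    lout: dict = field(default_factory=dict)   # output index -> live range


def _touch(ranges, index, mc):
    """Extend the live range of every element the index interval may refer to."""
    updated = dict(ranges)
    for i in range(index[0], index[1] + 1):
        updated[i] = hull(updated.get(i), (mc, mc))
    return updated


def pop(s):
    # LIN[in] = LIN[in] ⊔ MC, in++, MC++
    return State(s.mc + 1, (s.inp[0] + 1, s.inp[1] + 1), s.out,
                 _touch(s.lin, s.inp, s.mc), s.lout)


def push(s):
    # LOUT[out] = LOUT[out] ⊔ MC, out++, MC++
    return State(s.mc + 1, s.inp, (s.out[0] + 1, s.out[1] + 1),
                 s.lin, _touch(s.lout, s.out, s.mc))


def join(a, b):
    # MC = max(MC1, MC2), in = in1 ⊔ in2, out = out1 ⊔ out2
    def merge(x, y):  # pointwise merge of live ranges (assumption)
        return {i: hull(x.get(i), y.get(i)) for i in x.keys() | y.keys()}

    return State(max(a.mc, b.mc), hull(a.inp, b.inp), hull(a.out, b.out),
                 merge(a.lin, b.lin), merge(a.lout, b.lout))


s1 = pop(State())   # pop
s2 = push(s1)       # push on one branch
s3 = join(s1, s2)   # the other branch skips the push
s4 = push(s3)       # push after the join
```

After the join the push index is the interval [0, 1] because the analysis no longer knows whether the first push happened, so the final push conservatively extends the live ranges of both candidate output elements.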

Whole Program Analysis
• Component analysis constructs a memory fragment
  – captures live ranges for temporal reuse
  – captures location sharing edges
• Whole program analysis constructs a memory graph
  – stitches together memory fragments
  – simulates the schedule to
    • connect location sharing edges into paths and
    • extend live ranges with the phase number and invocation index
• Our approach:
  – analysis is precise when there is no input dependency
  – otherwise, it is a sound approximation
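One way to picture the stitching step is to treat location sharing edges and channel connections as edges of a graph and group the elements they connect; every element in a group can then refer to the same storage. The sketch below uses union-find as a stand-in for the path construction, which the slides do not spell out. The element names and the edge list are invented for illustration.

```python
def find(parent, x):
    """Return the representative of x's group, compressing the path as we go."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the groups containing a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def shared_locations(edges):
    """Group elements connected by sharing or channel edges."""
    parent = {}
    for a, b in edges:
        union(parent, a, b)
    groups = {}
    for element in list(parent):
        groups.setdefault(find(parent, element), set()).add(element)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical edges for one element flowing through the example graph.
edges = [
    ("Source.out[0]", "DUP.in[0]"),   # channel connection
    ("DUP.in[0]", "DUP.out1[0]"),     # location sharing inside Duplicate
    ("DUP.in[0]", "DUP.out2[0]"),     # location sharing inside Duplicate
    ("DUP.out1[0]", "LPF1.in[0]"),    # channel connection
    ("DUP.out2[0]", "LPF2.in[0]"),    # channel connection
    ("LPF1.out[0]", "RR.in1[0]"),     # channel connection
    ("RR.in1[0]", "RR.out[0]"),       # location sharing inside Round-Robin
    ("RR.out[0]", "Sub.in[0]"),       # channel connection
]

groups = shared_locations(edges)
```

In this example the LPF output starts a new group because a low-pass filter computes a new value rather than forwarding its input, while the splitter and joiner only forward elements and so need no storage of their own.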

Memory Layout
• Empirical insights
  – split-joins can be eliminated for manipulating location shared elements
  – a filter usually can reuse its input memory
• Heuristic approaches to resolving temporal reuse conflicts
[Figure: memory layouts of buffers A and B for three cases: No conflict, Append on Conflict (AoC), and Insert-in-Place (IP)]
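The slides name the AoC and IP heuristics without giving their algorithms, so the following is only a generic live-range sketch of the underlying idea: reuse a slot when live ranges do not overlap, and append a new slot when they conflict. The function name, the slot model, and the example ranges are all assumptions.

```python
def assign_slots(live_ranges):
    """Greedy slot assignment over closed live ranges (start, end).

    An element reuses the first slot whose previous occupant is dead before
    the element becomes live; on a conflict, a new slot is appended.
    """
    slot_busy_until = []  # end of the last live range placed in each slot
    assignment = {}
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1]):
        for slot, busy_until in enumerate(slot_busy_until):
            if busy_until < start:        # no temporal conflict: reuse
                slot_busy_until[slot] = end
                assignment[name] = slot
                break
        else:                             # conflict with every slot: append
            slot_busy_until.append(end)
            assignment[name] = len(slot_busy_until) - 1
    return assignment, len(slot_busy_until)


# Hypothetical live ranges, measured in memory-counter steps.
live_ranges = {"a": (0, 1), "b": (2, 3), "c": (1, 2), "d": (4, 5)}
assignment, slot_count = assign_slots(live_ranges)
```

Here `c` becomes live while `a` is still live, so it is appended to a second slot, while `b` and `d` reuse the first slot; four elements fit in two slots.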

Evaluation: Experimental Setup
• Intel x86_64 on Mac OS X 10.3
  – 3 GHz Intel Xeon CPU E5-1680 v2
  – 32 KB L1 instruction + 32 KB L1 data caches
  – 256 KB L2 + 25 MB L3 caches
• StreamIt Compiler
  – baseline: default settings without optimizations
  – enabled cache optimizations with –cacheopt
  – gcc -O3 to compile generated C/C++ code
• 11 micro benchmarks from StreamIt
• 3 macro benchmarks from real MSAs
  – Beep [Peng, C., et al. 2007]
  – MFCC and Crowd [Xu, C., et al. 2013]

Evaluation: Memory Usage on Intel x86_64
• 73% reduction on average
• reductions range from 45% to 96%
  – ESMS reduces both channel buffer sizes and the number of memory operations from splitters, joiners, and reordering filters

Evaluation: Speedup on Intel x86_64
– Compared with baseline StreamIt
– The average speedups of AA, AoC, and IP are 3, 3.1, and 3, while the average speedup of CacheOpt is merely 1.07.
– ESMS improves performance by eliminating unnecessary memory operations and reducing cache/memory references.

Conclusions
• Static memory management is effective for stream languages
  – whole program memory behaviors may be characterized
  – both location and temporal sharing opportunities are exploited
  – performance improvement due to fewer memory operations and references
• ESMS provides significant performance improvements
  – 45% to 96% data size reduction
  – 73% code size reduction
  – 3x speedup

CSense Toolkit
Acknowledgements
• National Science Foundation (NeTS grant #1144664)
• Carver Foundation (grant #14-43555)

Thank You
Questions?