
System Software for Embedded Systems
Krithi Ramamritham, Kavi Arya
IIT Bombay
Embedded Systems Workshop 2007
© Krithi Ramamritham / Kavi Arya

Embedded Systems?

Embedded Systems
• Single-function, e.g. pager, mobile phone
• Tightly constrained
  – cost, size, performance, power, etc.
• Reactive & real-time
  – e.g. car’s cruise controller
  – delay in computation => failure of system

Hardware is not the whole System!
A Micro-Electronic System is the result of a projection of …
  – Architecture
  – Hardware
  – Software
… distinguished by its gross Functional Behaviour!
• Software is an important part of the Product and must be part of the Design Process … or we are only designing a Component of the system.

Why Is Embedded Software Not Just Software On Small Computers?
• Embedded = Dedicated
• Interaction with physical processes
  – sensors, actuators, processes
• Critical properties are not all functional
  – real-time, fault recovery, power, security, robustness
• Heterogeneity
  – hardware/software tradeoffs, mixed architectures
• Concurrency
  – interaction with multiple processes
• Reactivity
  – operating at the speed of the environment
+ These features look more like hardware!
Source: Edward A. Lee, UC Berkeley, SRC/ETAB Summer Study 2001

What is Embedded SW?
One definition: “Software that is directly in contact with, or significantly affected by, the hardware that it executes on, or can directly influence the behavior of that hardware.”

What is Embedded SW?
• What is it not?
• Application software can be recompiled and executed on any number of hardware platforms so long as the basic services/libraries are provided.
  – It is divided by vertical market segments (application domains)
  – Well-established methodologies, architectures, …
  – HW platform independent, highly portable
• Any SW that has no direct relationship with HW.

Embedded System Challenges for HW Folks
• PARADIGM CHANGE!
  – Designers’ main tasks shift from processor integration to performance analysis
  – Concentration on functional requirements instead of integration work
  – Concentration on architectural exploration (including performance analysis)
• Re-use and platform-based design become key!
  – Early validation of system/solution correctness
  – Parallel hardware and software development
  – More effective use of previous work
  – Faster ways to build new elements of a solution
  – Ways to test more effectively, efficiently, and quickly

Software Guys can Learn from Hardware Experts!
• Concurrency
  – the synchrony abstraction
  – event-driven modeling
• Reusability
  – cell libraries
  – interface definition
• Reliability
  – leveraging limited abstractions
  – leveraging verification
• Heterogeneity
  – mixing synchronous and asynchronous designs
  – resource management
Source: Edward A. Lee, UC Berkeley, SRC/ETAB Summer Study 2001

Trade-offs. Methodology
ESW Architectural specifics
• Portability
  – ESW itself is intended to provide portability for higher SW layers
  – (At least parts of) ESW is by definition not portable
• Real-time
  – Restricted use of standardized inter-process communication (IPC) mechanisms (CORBA, …) for performance reasons
  – Typically hard real-time requirements
• RTOS dependency
  – Implementation of OS-like services
  – Sometimes shielding of the RTOS from higher-level SW layers
  – Direct dependency on RTOS implementation

Functional Design & Mapping
[Figure: the functions F1–F5 of the Functional Design are mapped onto the Architectural Design: F2, F3, F4 and F5 onto hardware blocks HW1–HW4, and F1 onto a thread running above the RTOS/Drivers layer and the Hardware Interface.]
Source: Ian Phillips, ARM, VSIA 2001

The Embedded Market: Disruptive Change
Traditional Embedded World
  – Never small enough
  – Never fast enough
  – Headless/Character-based
  – Standalone
  – Boot & Run from ROM
  – More Hardware than Software
  – Low-Level Programming Model
  – Application tied to hardware
Today’s Embedded World
  – Never functional enough
  – Always connected
  – High Integration Chips (ASIC/SOC)
  – Architectural diversity
  – COTS & custom hardware
  – EPROM/Flash/Rotating Media
  – Software Intensive
  – Web interfaces
  – OOP Programming Model
  – Standard applications
• Time to Market Pressures
• Shortage of Embedded SW Engineers
Source: Jim Ready, President/CEO, MontaVista Software

Plan
• Embedded Systems
• New Approaches to building ESW
  – New paradigms: Lava, Handel-C
  – Examples + “Engineering Returns to Software”
  – Build a RISC processor in 48 hrs
  – Advantages of reconfigurable hardware
• Real-time support for ESW

Motorola Software Survey Findings
• Hardware design is a software task: IC designers write code (VHDL, Verilog, scripting)!
• We must become a software-intensive embedded system solutions company, focused on integrating our platforms into users’ products; in the future we’ll be neither a hardware nor a software company
  – Focus on developing systems capability, not just a software counterpart to our current hardware capability (though that’s needed too)
  – We should have software content from drivers to applications
• The fundamental goal isn’t 70% margin on software products, it’s helping someone choose your total solution
  – Embedded systems platforms and solutions will be the key to market differentiation and profitable growth
Source: Bob Altizer, BASYS, VSIA 2001

Common Design Metrics
• NRE (non-recurring engineering) cost
• Unit cost
• Size (bytes, gates)
• Performance (execution time)
• Power (more power => more heat & less battery time)
• Flexibility (ability to change functionality)
• Time to prototype
• Time to market
• Maintainability
• Correctness
• Safety (probability that system won’t cause harm)

Time to Market Design Metric
[Figure: revenues ($) against time. The market rises to a peak at time W and falls to zero at 2W. On-time entry and delayed entry (delay D) each define a triangle; the delayed-entry triangle has a lower peak revenue.]
• Simplified revenue model
  – Product life = 2W, peak at W
  – Time of market entry defines a triangle, representing market penetration
  – Triangle area equals revenue
• Loss
  – The difference between the on-time and delayed triangle areas
• Avg. time to market today = 8 mth
• 1 day delay may amount to $Ms – see Sony Playstation vs XBox
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)

NRE and unit cost metrics
• Compare technologies by costs: the best choice depends on quantity
  – Technology A: NRE = $2,000, unit = $100
  – Technology B: NRE = $30,000, unit = $30
  – Technology C: NRE = $100,000, unit = $2
• But, must also consider time-to-market
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)
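The quantity dependence can be made concrete with a small calculation. The sketch below assumes the simplest cost model, total cost = NRE + unit cost × volume, and uses the three technologies from the slide; the function names are illustrative only.

```python
# Total cost = NRE + unit cost * volume (assumed linear model).
# (NRE in $, unit cost in $) for the three technologies on the slide.
TECHNOLOGIES = {"A": (2_000, 100), "B": (30_000, 30), "C": (100_000, 2)}


def total_cost(tech, units):
    """Total cost in dollars of building `units` items with technology `tech`."""
    nre, unit_cost = TECHNOLOGIES[tech]
    return nre + unit_cost * units


def cheapest(units):
    """Technology with the lowest total cost at the given volume."""
    return min(TECHNOLOGIES, key=lambda tech: total_cost(tech, units))


def break_even(tech1, tech2):
    """Volume at which the two technologies cost exactly the same."""
    (nre1, unit1), (nre2, unit2) = TECHNOLOGIES[tech1], TECHNOLOGIES[tech2]
    return (nre2 - nre1) / (unit1 - unit2)


if __name__ == "__main__":
    for volume in (100, 1_000, 10_000):
        best = cheapest(volume)
        print(volume, best, total_cost(best, volume))
    print("A/B break-even:", break_even("A", "B"))
    print("B/C break-even:", break_even("B", "C"))
```

Under this model A is cheapest below 400 units, B between 400 and 2,500 units, and C above that.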

Losses due to delayed market entry
[Figure: revenues ($) against time, with the on-time and delayed-entry triangles. The market rises to a peak at W and falls to zero at 2W; delayed entry starts at D.]
• Area = 1/2 * base * height
  – On-time = 1/2 * 2W * W
  – Delayed = 1/2 * (W-D+W) * (W-D)
• Percentage revenue loss = (D(3W-D) / (2W^2)) * 100%
• Try some examples
  – Lifetime 2W = 52 wks, delay D = 4 wks
  – (4*(3*26 – 4) / (2*26^2)) = 22%
  – Lifetime 2W = 52 wks, delay D = 10 wks
  – (10*(3*26 – 10) / (2*26^2)) = 50%
  – Delays are costly!
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)
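The closed-form loss follows directly from the two triangle areas. As a quick check, the sketch below computes the loss both from the areas and from the closed form; the function names are illustrative, and W and D are in the same time unit (weeks here).

```python
def revenue_loss_from_areas(W, D):
    """Percentage revenue loss computed from the two triangle areas."""
    on_time = 0.5 * (2 * W) * W            # base 2W, height W
    delayed = 0.5 * (W - D + W) * (W - D)  # base 2W - D, height W - D
    return (on_time - delayed) / on_time * 100


def revenue_loss_closed_form(W, D):
    """Percentage revenue loss from the simplified formula D(3W - D) / (2W^2)."""
    return D * (3 * W - D) / (2 * W ** 2) * 100


if __name__ == "__main__":
    for delay in (4, 10):
        print(delay, round(revenue_loss_closed_form(26, delay)))
```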

Trends
• Moore’s Law
  – IC transistor capacity doubles every 18 mths
  – 1981: leading edge chip had 10 k transistors
  – 2002: leading edge chip had 150 M transistors
  – 2007: leading edge chip has 1000 M+ transistors (90 nm)
• Designer productivity has improved due to better tools:
  – Compilation/synthesis tools
  – Libraries/IP
  – Test/verification tools
  – Standards
  – Languages and frameworks (Handel-C, Lava, Esterel, …)
  – 1981: designer produced 100 transistors per month
  – 2002: designer produces 5000 transistors per month
  – 2007: ???

Our New Understanding
• We have simultaneous optimisation of competing design metrics: speed, size, power, complexity, etc.
• We need a “Renaissance Engineer”
  – with a holistic view of the design process, comfortable with technologies ranging from hardware and software to formal methods
• Maturation of behavioral synthesis tools and other tools has enabled this kind of unified view of hardware/software codesign.
• Design efforts now focus on higher levels of abstraction => abstract specifications are refined into programs and then into gates and logic.
• There is no fundamental difference between what hardware and software can implement.

Designer Productivity
• “The Mythical Man-Month” by Frederick Brooks (1975)
• More designers on team => lower productivity because of increasing communication costs between groups
• Consider a 1 M transistor project:
  – Say, a designer has productivity of 5000 transistors/mth
  – Each extra designer => decrease of 100 transistors/mth in every designer’s productivity due to comm. costs
  – 1 designer: 1 M/5000 = 200 mth
  – 10 designers: 1 M/(10*4100) = 24.3 mth
  – 25 designers: 1 M/(25*2600) = 15.3 mth
  – 27 designers: 1 M/(27*2400) = 15.4 mth
• Need new design technology to shrink the design gap
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc. 2002)
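The team-size arithmetic can be reproduced with a few lines. This is a sketch of the slide's model only (5000 transistors/month for a lone designer, minus 100 per designer for each additional team member); the function name and defaults are illustrative.

```python
def months_to_finish(designers, transistors=1_000_000, base=5000, penalty=100):
    """Project duration in months under the slide's communication-cost model."""
    per_designer = base - penalty * (designers - 1)  # transistors/month each
    return transistors / (designers * per_designer)


if __name__ == "__main__":
    for team in (1, 10, 25, 27):
        print(team, round(months_to_finish(team), 2))
    # Team size with the shortest schedule (teams of 1..49 keep productivity positive).
    print("best team size:", min(range(1, 50), key=months_to_finish))
```

In this model the schedule stops improving at about 25 designers; adding more people makes the project take longer.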


Design Productivity Gap
• Designer productivity has grown over the last decade
• Rate of improvement has not kept pace with the chip-capacity growth
• 1981: leading edge chip:
  – 100 designer mth * 100 trans/mth => 10 k trans complexity
• 2002: leading edge chip:
  – 30 k designer mth * 5 k trans/mth => 150 M trans complexity
• Designers at avg. of $10 k per month => cost of building leading edge chips has gone from $1 M in 1981 to $300 M in 2002
• Need paradigm shift to cope with the complexities of system design

Lava
• Not so much a hardware description language
• More a style of circuit description
• Emphasises connection patterns
• Think of Lego

Lava
• Mary Sheeran, Koen Claessen & Satnam Singh, Chalmers University (Sweden)
• Based on earlier work on muFP to describe circuit functionality and layout in a single language
• Built using the functional programming paradigm

Behaviour and Structure
[Figure: circuit f connected in series to circuit g]
    f ->- g

Lava Properties
• Higher-order functions
  – Circuits are functions
  – May be passed as arguments to other functions.
  – => Easier to produce parameterized circuits than with VHDL.
• Functions can return circuits as results
  – Circuit combinators take circuits as arguments, return circuits as results.
  – => Powerful glue for composing circuits to form larger systems.
• Circuit combinators combine behavior + layout
  – Combinators lay out circuits in rows, columns, triangles, trees etc.
• Performance of circuit
  – Improved by exploring the layout design space by experimenting with alternative layout combinators.
• Examples of circuits produced:
  – High speed constant coefficient multipliers, finite impulse response filters (1D and 2D), adder tree networks and sorting butterfly networks.

Parallel Connection Patterns
[Figure: circuits f and g placed side by side]
    f -|- g

map
[Figure: one copy of circuit f applied to each element of the input]
    map f

Four Sided Tiles

Column

Full Adder
[Figure: full-adder tile fa with inputs cin (bottom) and a, b (left), outputs sum (right) and cout (top)]
    fa (cin, (a, b)) = (sum, cout)
      where
        part_sum = xor (a, b)
        sum      = xorcy (part_sum, cin)
        cout     = muxcy (part_sum, (a, cin))

Generic Adder
[Figure: a column of fa tiles, carry passing from each tile to the one above]
    adder = col fa
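To see why `adder = col fa` is enough to describe a ripple-carry adder of any width, here is a behavioural model in Python. It is only a sketch of the idea, not Lava itself: `fa` and `col` mirror the names on the slides, bits are plain integers 0/1, the `muxcy` gate is assumed to select `cin` when `part_sum` is 1 and `a` otherwise, and the helpers `to_bits`, `from_bits` and `add` are hypothetical additions for testing. Layout, which the real combinators also describe, is not modelled.

```python
def fa(cin, ab):
    """Full adder in the slide's shape: (cin, (a, b)) -> (sum, cout)."""
    a, b = ab
    part_sum = a ^ b
    s = part_sum ^ cin              # xorcy
    cout = cin if part_sum else a   # muxcy (assumed: select cin when part_sum is 1)
    return s, cout


def col(cell):
    """Column combinator: stack one cell per input, threading the carry upward."""
    def circuit(cin, inputs):
        outputs = []
        carry = cin
        for item in inputs:
            out, carry = cell(carry, item)
            outputs.append(out)
        return outputs, carry
    return circuit


adder = col(fa)  # same shape as the slide's "adder = col fa"


def to_bits(value, width):
    """Least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def add(x, y, width=16):
    """Add two unsigned numbers with the modelled ripple-carry column."""
    sums, carry = adder(0, list(zip(to_bits(x, width), to_bits(y, width))))
    return from_bits(sums) + (carry << width)


if __name__ == "__main__":
    print(add(12345, 54321))
```

Because `col` takes the cell as an argument, the same combinator builds a 4-bit or a 16-bit adder; only the length of the input list changes.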

Top Level
    adder16Circuit
      = do a <- inputVec "a" (bit_vector 15 downto 0)
           b <- inputVec "b" (bit_vector 15 downto 0)
           (s, carry) <- adder4 (a, b)
           sum <- outputVec "sum" s (bit_vector 16 downto 0)

    ? circuit2VHDL "add16" adder16Circuit
    ? circuit2EDIF "add16" adder16Circuit
    ? circuit2Verilog "add16" adder16Circuit

Xilinx FPGA Implementation
• 16-bit implementation on an XCV300 FPGA
• Vertical layout required to exploit fast carry chain
• No need to specify coordinates in HDL code

16-bit Adder Layout
Source: Mary Sheeran, Nov. 2002

Four adder trees
Source: Mary Sheeran, Nov. 2002

No Layout Information
Source: Mary Sheeran, Nov. 2002


Handel-C
• Programming language that enables compilation of programs into synchronous hardware
• NOT a Hardware Description Language: it is a programming language aimed at compiling high-level algorithms into gate-level hardware
• Syntax (loosely) based on “C”
• Handel-C is to hardware (gates) what “C” is to micro-assembly code

Handel-C (cont.)
• Inventor: Ian Page, Programming Research Group (Oxford University, UK)
• Semantics based on Hoare’s Communicating Sequential Processes (CSP) model & Occam, the transputer programming language
• Industry heavyweights using tools: Marconi, Ericsson, BAe, Creative Labs, etc.

What This Means
• Hardware design produced is exactly the hardware specified in the source program
• No intermediate “interpreting” layer as in assembly language targeting a general-purpose microprocessor
• Logic gates are the assembly instructions of the Handel-C system
• Design/re-design/optimise at software level!

What This Means
• True parallelism: not the time-shared (interpreted) parallelism of general-purpose computers
• par {a; b}: instructions executed in parallel at the same instant of time by 2 separate pieces of hardware
• Timing: branches that complete early are forced to wait for the slowest branch before continuing

Comparison with “C”
• Similar:
  – Programs inherently sequential
  – Similar control-flow constructs: if-then-else, switch, while, for, etc.
• Dissimilar:
  – No malloc / dynamic store allocation
  – No recursion (limited recursion in macros)
  – No nested procedures
  – No stdin/stdout
  – “void main()”
  – Variable width words
  – par, etc.

Handel-C is based on
• ANSI-standard C without external library functions:
  – I/O functions: printf(), putc(), scanf(), ...
  – File functions: fopen(), fclose(), fprintf(), ...
  – String functions: strlen(), strcpy(), strcmp(), …
  – Math functions: sin(), cos(), sqrt(), …
  – ...

Supported declarations, statements & instructions:
• Main program structure
• Variables
• Arrays
• Switch statement
• For loop
• Comments
• Constants
• Scope & variable sharing
• Arithmetic, relational & logic ops
• Conditional execution
• While loop
• Do … while loop

Channel Communication
• link ! v … link ? v
  – channel input is a form of assignment
• Provides a link between parallel branches
  – One parallel branch outputs data onto the channel
  – The other parallel branch reads data from the channel
• => Synchronisation
  – data transfers only when both processes are ready

Additional Features & Statements
• Channel
    unsigned int 8 a;
    chan unsigned int 8 c;
    c ! 5;    // write 5 to the channel
    c ? a;    // read from the channel into a

Additional Features & Statements
• Prialt
    prialt
    {
        case CommsStatement:
            Statement
            break;
        ...
        default:
            Statement
            break;
    }

Example 1 (sum)
    void main()
    {
        unsigned int 16 sum;    // variable width word
        unsigned int 8 data;
        chanin input;           // input/output
        chanout output;

        sum = 0;
        do
        {
            input ? data;
            sum = sum + (0 @ data);    // IMPORTANT – width!! data is zero-padded to the width of sum
        } while (data != 0);
        output ! sum;
    }


Example 2 (divider)
    #define DATA_WIDTH 16

    void main(void)
    {
        unsigned int DATA_WIDTH a, mult, result;
        unsigned int (DATA_WIDTH*2-1) b;
        chanin input;
        chanout output;

        // result = integer(a / b)
        while (1)
        {
            input ? a;
            input ? result;
            b = result @ 0;
            mult = 1 << (DATA_WIDTH-1);
            result = 0;
            // <<<<< MAIN LOOP >>>>>
            output ! result;
        }
    }

Example 2 (cont.)
    // MAIN LOOP: shift-and-subtract division
    while (mult != 0)
    {
        if ((0 @ a) >= b)
            par
            {
                a -= b <- width(a);
                result |= mult;
            }
        par
        {
            b = b >> 1;
            mult = mult >> 1;
        }
    }
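The divider's main loop is a standard shift-and-subtract division. The Python model below is a sketch of that algorithm under two assumptions: the slide's update of `result` is meant as a bitwise OR with `mult`, and `b = result @ 0` places the divisor in the top bits of a (2*DATA_WIDTH-1)-bit word. It models values only, not clock cycles or the two `par` blocks; the function name `divide` is illustrative.

```python
def divide(a, divisor, width=16):
    """Shift-and-subtract division of two unsigned `width`-bit numbers.

    Returns (quotient, remainder). Mirrors the structure of the Handel-C
    divider: b starts as the divisor shifted left by width-1 bits, mult
    starts as the top quotient bit, and both are shifted right each pass.
    """
    assert 0 <= a < (1 << width) and 0 < divisor < (1 << width)
    b = divisor << (width - 1)   # "b = result @ 0" (divisor, then width-1 zero bits)
    mult = 1 << (width - 1)
    result = 0
    while mult != 0:
        if a >= b:               # "(0 @ a) >= b"
            a -= b
            result |= mult
        b >>= 1
        mult >>= 1
    return result, a


if __name__ == "__main__":
    print(divide(100, 7))
```

Each pass decides one quotient bit, so a 16-bit division takes 16 passes regardless of the operand values.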

Example 3
• Parallel tasks
• Communication between tasks
• Array of variables
• Array of channels
• Parameterised on width
[Figure: input -> link[0] -> state[1] -> link[1] -> state[2] -> output]
    void main(void)
    {
        chan unsigned int undefined link[2];
        chanin unsigned int 8 input;
        chanout unsigned int 8 output;
        unsigned int undefined state[3];

        par
        {
            while (1)    // first queue location
            {
                input ? state[0];
                link[0] ! state[0];
            }
            while (1)    // second queue location
            {
                link[0] ? state[1];
                link[1] ! state[1];
            }
            while (1)    // third queue location
            {
                link[1] ? state[2];
                output ! state[2];
            }
        }
    }

Additional Features & Statements
• Timing
  An assignment statement takes exactly one clock cycle to execute. Everything else is free.
    void main(void)
    {
        unsigned 8 x, y;
        …
        x = x + y;
    }

Timing/efficiency issues
• One clock source for entire program
  – Assignment & delay take one clock cycle
  – Expressions are “for free”
• Handel-C is designed such that an experienced programmer can immediately tell which instructions execute on which clock cycles
• Example
    x = y;
    x = (((y*z) + (w*v)) << 2) <- 7;
  Both statements take one clock cycle
• The clock period is set by the longest logic depth => reduce the depth of logic to speed up the program => pipelining

Porting “C” to Handel-C
• Decide how software maps to hardware platform
• Partition algorithm between multiple FPGAs
• Port C to Handel-C & use simulator to check correctness
• Modify code to take advantage of extra operators in Handel-C; simulate to ensure correctness
• Add fine-grain parallelism through par & parallel assignments, or parallelise the algorithm; simulate
• Add hardware interfaces for target architecture & map simulator channel communications onto these interfaces; simulate
• Use FPGA place & route tools to generate FPGA images

Design Flow Overview
• Port algorithm to Handel-C
• Compile program to .net file for simulator
• Use simulator to evaluate and debug design
• Modify/debug program and repeat as needed
• Add interfaces to external hardware
• Use Handel-C compiler to target h/w netlist
• Use FPGA tools to place & route netlist
• Program FPGA with result of place & route

Essence
• Software approach allows us to rapidly prototype applications for a given domain
• Handel-C provides a seamless approach to derive expressive and fast implementations from the software level
• Cost of silicon is falling & shortage of trained engineers & high cost of programmer time
  => Software based, high-level approaches to solving problems become increasingly attractive.

Handel-C Concepts (Recap)
• Describes hardware: h/w design produced = h/w in source program
• Logic gates are the assembly instructions of the Handel-C system
• Real parallelism, not interpreted
• Assignment and delay take 1 clock cycle; expression evaluation is free
• No side-effects, i.e. a++ is a statement (not an expression as in “C”)
• Variable width words => great performance improvement over software
• Minimal datapath widths => minimal h/w usage

Additional Features & Statements
• Concurrency
    ...
    par
    {
        { … }
        …
        { … }
    }

Concurrency (example)
    void main(void)
    {
        unsigned 8 x, y;
        unsigned 5 temp1;
        unsigned 4 temp2;
        ...
        temp1 = (0@(x <- 4)) + (0@(y <- 4));
        temp2 = (x \ 4) + (y \ 4);
        x = (temp2 + (0@temp1[4])) @ temp1[3:0];
    }


Additional Features & Statements
• Concurrency
    ...
    par
    {
        temp1 = (0@(x <- 4)) + (0@(y <- 4));
        temp2 = (x \ 4) + (y \ 4);
    }
    x = (temp2 + (0@temp1[4])) @ temp1[3:0];
    ...

Features & Statements (contd.)
• Delay
    ...
    par
    {
        x = 1;
        { delay; x = 2; }
    }
    while (x == 0) delay;

Additional Features & Statements
• Channel
    unsigned int 8 a;
    chan unsigned int 8 c;
    c ! 5;
    c ? a;
  A single variable must not be accessed by more than one parallel branch =>
    par { out ! 3; out ! 4; }    // illegal

Features & Statements (contd.)
• Macros (examples, contd.)
  – Combinatorial
    macro expr abs(a) = ((a)[width(a)-1] == 0 ? (a) : (-a));
    shared expr incwrap(e, m) = ((e == m) ? 0 : (e)+1);
  – Recursive
    macro expr copy(e, n) = select(n == 1, (e), copy(e, n/2) @ copy(e, n-(n/2)));

Features & Statements (contd.)
• Operators for Bit Manipulation
    z = x <- 2;      // Take least significant bits
    z = y \ 2;       // Drop least significant bits
    z = x @ y;       // Concatenation
    z = x[3];        // Bit selection
    z = y[3:2];      // Bus selection
    z = width(x);    // Width of expression
  Note: in the form y[m:n] the order is MSB:LSB
    unsigned int 3 y = 4;    // y[0] is 0; y[2] is 1
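For readers more familiar with masks and shifts, the bit-manipulation operators can be modelled on plain integers. This is a sketch: Python integers carry no width, so the width of the low-order operand has to be passed explicitly to the concatenation helper, and all function names are illustrative rather than part of Handel-C.

```python
def take(x, n):
    """x <- n : keep the n least significant bits."""
    return x & ((1 << n) - 1)


def drop(x, n):
    """x \\ n : drop the n least significant bits."""
    return x >> n


def concat(high, low, low_width):
    """high @ low : place `high` above the `low_width` bits of `low`."""
    return (high << low_width) | low


def bit(x, i):
    """x[i] : single bit selection, bit 0 being the least significant."""
    return (x >> i) & 1


def bits(x, msb, lsb):
    """x[msb:lsb] : bus selection, MSB first as on the slide."""
    return (x >> lsb) & ((1 << (msb - lsb + 1)) - 1)


if __name__ == "__main__":
    y = 4  # binary 100, as in the slide's 3-bit example
    print(bit(y, 0), bit(y, 2))
```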

Additional Features & Statements
• External RAM / ROM
    ram unsigned int 4 ExtRAM[8] with {
        offchip = 1,
        data = {"P01", "P02", "P03", "P04"},
        addr = {"P05", "P06", "P07"},
        we = {"P08"},
        oe = {"P09"},
        cs = {"P10"}
    };

    rom unsigned int 4 ExtROM[8] with {
        offchip = 1,
        data = {"P01", "P02", "P03", "P04"},
        addr = {"P05", "P06", "P07"},
        we = {},
        oe = {"P09"},
        cs = {"P10"}
    };

Additional Features & Statements
• Internal RAM / ROM
    ram unsigned int 8 speicher[256];
    rom unsigned int 8 program[] = {1, 2, 3, 4};
    unsigned char i;

    i = 3;
    speicher[i] = 25;
    for (i = 0; i < 4; i++)
        stdout ! program[i];

Recursive Macro Expressions – Example
• Illustrates the generation of large quantities of hardware from simple macros.
• Multiplier whose width depends on the parameters of the macro.
• Starting point for generating large regular hardware structures using macros.
• Single-cycle long multiplication from a single macro:
    macro expr multiply(x, y) =
        select(width(x) == 0, 0,
               multiply(x \ 1, y << 1) + (x[0] == 1 ? y : 0));

    a = multiply(b, c);
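The recursion in the `multiply` macro unrolls at compile time into one adder stage per bit of `x`. The Python model below is a sketch of the arithmetic it performs, assuming that shifts and additions keep the width of `y` (so the product is truncated to that width); widths are passed explicitly because Python integers have none, and the function signature is illustrative.

```python
def multiply(x, y, x_width, y_width):
    """Long multiplication by recursion on the bits of x.

    Each level adds y (if the lowest bit of x is set) to the product of
    the remaining bits of x with y shifted left by one. All arithmetic is
    truncated to y_width bits.
    """
    if x_width == 0:                       # select(width(x) == 0, 0, ...)
        return 0
    mask = (1 << y_width) - 1
    partial = y if (x & 1) else 0          # x[0] == 1 ? y : 0
    rest = multiply(x >> 1, (y << 1) & mask, x_width - 1, y_width)  # x \ 1, y << 1
    return (rest + partial) & mask


if __name__ == "__main__":
    print(multiply(5, 3, 8, 8))
```

In hardware all of these levels exist side by side as combinational logic, which is why the multiplication completes in a single (long) clock cycle.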

Timing

Additional Features & Statements
• Off-Chip Interface
  – Input, registered input, latched input
  – Output
  – Tristate bus
• Off-Chip Interface (examples)
    interface bus_in (int 4) InBus() with
        {data = {"P1", "P2", "P3", "P4"}};
    int 4 x;
    x = InBus.in;

    interface bus_out () OutBus(x+y) with
        {data = {"P11", "P12", "P13", "P14"}};

Parallel Access to Variables
• Rule of parallelism: the same variable must not be accessed from two separate parallel branches (to avoid resource conflicts on the variables)
• Actually, the same variable must not be assigned to more than once in the same clock cycle, but may be read as often as required (see wires!)
• Allows some useful and powerful programming techniques, e.g.:
    par
    {
        a = b;
        b = a;
    }    // swaps values of a and b in a single clock cycle

Parallel Access to Variables
• Four place queue:
    int x[3];
    while (1)
    {
        par
        {
            x[0] = in;
            x[1] = x[0];
            x[2] = x[1];
            out = x[2];    // values at “out” delayed by 4 clock cycles
        }
    }
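Both the swap and the queue rely on the same property: within one clock cycle every right-hand side is read before any register is updated. The Python sketch below models that behaviour under the assumption that all registers start at 0 (real hardware registers are uninitialised); the function names are illustrative.

```python
def swap_in_one_cycle(a, b):
    """par { a = b; b = a; } : both reads happen before either write."""
    return b, a


def clock_tick(x, out, sample):
    """One clock cycle of the four-place queue.

    All right-hand sides use the register values from before the clock
    edge, which is what the par block expresses.
    """
    new_x = [sample, x[0], x[1]]
    new_out = x[2]
    return new_x, new_out


def simulate_queue(samples):
    """Feed one sample per cycle; return the value of `out` after each cycle."""
    x, out = [0, 0, 0], 0  # assumed reset values
    trace = []
    for sample in samples:
        x, out = clock_tick(x, out, sample)
        trace.append(out)
    return trace


if __name__ == "__main__":
    print(simulate_queue([1, 2, 3, 4, 5, 6]))
```

A sample presented in one cycle reaches `out` on the fourth clock edge, because it has to pass through the four registers x[0], x[1], x[2] and out.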

Time Efficiency of Handel-C Hardware
• Requirement: the clock period for the program must be longer than the longest path through combinatorial logic in the whole program.
• => once FPGA place and route is done, max. clock rate = 1/longest-path-delay
• Example: FPGA place and route tools calculate that the longest path delay between flip-flops in a design is 70 ns.
• The max. clock rate is then 1/70 ns = 14.3 MHz. Speed allowed by system: 400 kHz to 100 MHz
• BUT WHAT IF THIS IS NOT FAST ENOUGH?

Improving Time Efficiency
• Reducing logic depth
  Avoid multiplication, avoid wide adders, reduce complex expressions into stages, etc.
    unsigned 8 x;
    unsigned 8 y;
    unsigned 5 temp1;
    unsigned 4 temp2;

    par
    {
        temp1 = (0@(x <- 4)) + (0@(y <- 4));
        temp2 = (x \ 4) + (y \ 4);
    }
    x = (temp2 + (0@temp1[4])) @ temp1[3:0];
• Pipelining => increased latency for higher throughput
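The staged version trades one 8-bit adder for two narrower adders that run in parallel, followed by a short fix-up of the carry. A quick way to convince oneself that the result is unchanged is to model both forms and compare them exhaustively. The Python below is a sketch of that check; the widths follow the declarations on the slide and the function names are illustrative.

```python
def add8_direct(x, y):
    """Single-stage 8-bit addition: x = x + y, truncated to 8 bits."""
    return (x + y) & 0xFF


def add8_staged(x, y):
    """Two-stage 8-bit addition following the slide's temp1/temp2 split."""
    temp1 = (x & 0xF) + (y & 0xF)          # 5 bits: low nibbles plus carry out
    temp2 = ((x >> 4) + (y >> 4)) & 0xF    # 4 bits: high nibbles, truncated
    carry = (temp1 >> 4) & 1               # temp1[4]
    high = (temp2 + carry) & 0xF
    return (high << 4) | (temp1 & 0xF)     # high @ temp1[3:0]


if __name__ == "__main__":
    mismatches = [(x, y) for x in range(256) for y in range(256)
                  if add8_direct(x, y) != add8_staged(x, y)]
    print("mismatches:", len(mismatches))
```

The staged form takes two clock cycles instead of one, but each cycle has a shallower logic path, so the clock can run faster.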


RISC-Processor
• Features:
  – 16 instructions
  – 4 bit I/O ports
  – One accumulator
  – Program memory (16 x 8 ROM)
  – Data memory (16 x 4 RAM)
• Problem: execute a program stored in ROM to calculate the first few members of the Fibonacci number sequence: 1, 2, 3, 5, 8, 13, 21, 34, …
    fib(n) = 1                      if n = 0 or n = 1
    fib(n) = fib(n-1) + fib(n-2)    if n >= 2

RISC-Processor
• Instruction Set

RISC-Processor (cont.)
• Program:
    chanin input;
    chanout output;

    // Parameterisation
    #define dw     32    /* Data width */
    #define opcw   4     /* Op-code width */
    #define oprw   4     /* Operand width */
    #define rom_aw 4     /* Width of ROM address bus */
    #define ram_aw 4     /* Width of RAM address bus */

    // The opcodes
    #define HALT   0
    #define LOAD   1
    #define LOADI  2
    #define STORE  3
    #define ADD    4
    #define SUB    5
    #define JUMP   6
    #define JUMPNZ 7
    #define INPUT  8
    #define OUTPUT 9

    // The assembler macro
    #define _asm_(opc, opr) (opc + (opr << opcw))

RISC-Processor (cont.)
• Program (cont.):
    // ROM program data
    rom unsigned int undefined program[] =
    {
        _asm_(LOADI, 1),     /* 0 */  /* Get a one */
        _asm_(STORE, 3),     /* 1 */  /* Store this */
        _asm_(STORE, 1),     /* 2 */
        _asm_(INPUT, 0),     /* 3 */  /* Read value from user */
        _asm_(STORE, 2),     /* 4 */  /* Store this */
        _asm_(LOAD, 1),      /* 5 */  /* Loop entry point */
        _asm_(ADD, 0),       /* 6 */  /* Make a fib number */
        _asm_(STORE, 0),     /* 7 */  /* Store it */
        _asm_(OUTPUT, 0),    /* 8 */  /* Output it */
        _asm_(ADD, 1),       /* 9 */  /* Make a fib number */
        _asm_(STORE, 1),     /* a */  /* Store it */
        _asm_(OUTPUT, 0),    /* b */  /* Output it */
        _asm_(LOAD, 2),      /* c */  /* Decrement counter */
        _asm_(SUB, 3),       /* d */
        _asm_(JUMPNZ, 4),    /* e */  /* Repeat if not zero */
        _asm_(HALT, 0)       /* f */
    };

RISC-Processor (cont.)
• Program (cont.):
    /* RAM for processor */
    ram unsigned int dw data[1 << ram_aw];

    /* Processor registers */
    unsigned int rom_aw pc;           /* Program counter */
    unsigned int (opcw+oprw) ir;      /* Instruction register */
    unsigned int dw x;                /* Accumulator */

    /* Macros to extract opcode and operand fields */
    #define opcode  (ir <- opcw)
    #define operand (ir \ opcw)

RISC-Processor (cont.)
• Program (cont.):
    /* Main program */
    void main(void)
    {
        pc = 0;
        // Processor loop
        do
        {
            // fetch
            par
            {
                ir = program[pc];
                pc = pc + 1;
            }
            /* === MAIN DECODE/EXECUTE === */
        } while (opcode != HALT);
    }    /* main program */

RISC-Processor (cont.)
• Program (cont.):
    // decode and execute
    switch (opcode)
    {
        case HALT   : break;    // added here: without it HALT falls into default and spins forever
        case LOAD   : x = data[operand <- ram_aw]; break;
        case LOADI  : x = 0 @ operand; break;
        case STORE  : data[operand <- ram_aw] = x; break;
        case ADD    : x = x + data[operand <- ram_aw]; break;
        case SUB    : x = x - data[operand <- ram_aw]; break;
        case JUMP   : pc = operand <- rom_aw; break;
        case JUMPNZ : if (x != 0) pc = operand <- rom_aw; break;
        case INPUT  : input ? x; break;
        case OUTPUT : output ! x; break;
        default     : while (1) delay;    // unknown opcode
    }
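To check what the ROM program actually computes, the fetch/decode/execute loop can be modelled in software. The Python below is a sketch, not a cycle-accurate model: it assumes data RAM starts at zero (hardware RAM is uninitialised), that HALT simply ends the run, and that the accumulator wraps at `dw` bits. The opcode numbering, instruction encoding and ROM contents are taken from the slides; the `run` function and its parameters are illustrative.

```python
HALT, LOAD, LOADI, STORE, ADD, SUB, JUMP, JUMPNZ, INPUT, OUTPUT = range(10)
OPCW = 4  # op-code width in bits


def asm(opc, opr):
    """Same encoding as the _asm_ macro: opcode in the low bits, operand above."""
    return opc + (opr << OPCW)


PROGRAM = [
    asm(LOADI, 1), asm(STORE, 3), asm(STORE, 1), asm(INPUT, 0),
    asm(STORE, 2), asm(LOAD, 1), asm(ADD, 0), asm(STORE, 0),
    asm(OUTPUT, 0), asm(ADD, 1), asm(STORE, 1), asm(OUTPUT, 0),
    asm(LOAD, 2), asm(SUB, 3), asm(JUMPNZ, 4), asm(HALT, 0),
]


def run(program, inputs, dw=32):
    """Execute the program; return the list of values written to the output channel."""
    mask = (1 << dw) - 1
    data = [0] * 16            # assumed zero-initialised data RAM
    pc, x = 0, 0
    pending = iter(inputs)
    outputs = []
    while True:
        ir = program[pc]                   # fetch
        pc = (pc + 1) & 0xF                # 4-bit program counter wraps
        opcode, operand = ir & 0xF, ir >> OPCW
        if opcode == HALT:
            break
        elif opcode == LOAD:
            x = data[operand]
        elif opcode == LOADI:
            x = operand
        elif opcode == STORE:
            data[operand] = x
        elif opcode == ADD:
            x = (x + data[operand]) & mask
        elif opcode == SUB:
            x = (x - data[operand]) & mask
        elif opcode == JUMP:
            pc = operand
        elif opcode == JUMPNZ:
            if x != 0:
                pc = operand
        elif opcode == INPUT:
            x = next(pending)
        elif opcode == OUTPUT:
            outputs.append(x)
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return outputs


if __name__ == "__main__":
    print(run(PROGRAM, [3]))
```

The input value is a loop count; each pass through the loop emits two Fibonacci numbers, so an input of 3 yields the first six members of the sequence.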

RISC-Processor (cont.)
• The Final Program!

Simulation & debugging
• The simulator is integrated into the compiler.
• It executes a cycle-based simulation.
• Variables are traceable at any clock cycle.
• Port interfaces are replaced by standard I/O.
• The Handel-C simulator supports debugging at any clock cycle.
• Highlighting of characteristic values, e.g. area of any program line.

Some Recent Work
• “Customising Graphics Applications: Techniques & Programming Interface”, Henry Styles & Wayne Luk, Proceedings of IEEE Symposium on Field Programmable Custom Computing Machines, IEEE Computer Society Press, 2000.
• Exploits custom data formats and datapath widths to optimise graphics operations such as texture mapping & hidden-surface removal.
• Discusses techniques for balancing the graphics pipeline
• Customised architectures captured in Handel-C, compiled for Xilinx Virtex FPGAs
• Handel-C API based on the OpenGL standard for automatic speedup of graphics applications, including the Quake-2 action game.

The Graphics Pipeline

Performance Case Studies
• Geometric Visualisation
    Implementation Medium    Clock rate (MHz)    Frame rate (FPS)    Cost
    Software on PC           400                 24                  $1,000
    Xilinx XCV1000           40                  41                  $4,000
    Nvidia TNT2 Ultra        170                 55                  $200
  Nvidia TNT2 is a 3-D graphics chipset, i.e. a specialised graphics ASIC
  Chart => FPGA platform is fast approaching the performance of a dedicated graphics ASIC for general-purpose graphics applications

Performance Case Studies
• Infrared Simulation requires a custom pixel format not supported by graphics ASICs
    Implementation Medium    Clock rate (MHz)    Frame rate (FPS)    Cost
    Software on PC           400                 96                  $1,000
    Xilinx XCV1000           40                  330                 $4,000
    SGI Onyx2 Reality        180                 2750                $180,000
  Onyx contains two 180 MHz MIPS processors, two Geometry Engine processors and two rasteriser ASICs, with a memory bandwidth of 6.4 GB/sec (i.e. 10X cost & mem. b/w of FPGA)

Performance Case Studies
• Quake-2 benchmark requires a custom pixel format not supported by graphics ASICs
    Demonstration     Software (fps)    XCV1000 (fps)    ASIC (fps)
    Demo.dm1          0.2               14.4             71.6
    Jail5059.dm2      0.2               15.0             72.6
    jail3A020.dm2     0.3               15.6             71.5
  Bottleneck is the PCI-bus speed limitation. Improve performance by moving the FPGA to an AGP slot, allowing 1 GB/sec transfers between graphics h/w and memory

Some Observations
• FPGA renderer is a low-cost platform for custom graphics applications
• Development time of a customised FPGA renderer is comparable to that of optimised software => effective to use a reconfigurable platform
• Good for reconfigurable designs where an ASIC is not available or too expensive
• Useful in exploring desirable algorithms and architectures for ASICs
• Hardware renderer may be customised to maximise performance for each application

Some Features of the Rapid Prototyping Board
• Full length 32 bit PCI card
• Virtex XCV1000: 1,000,000 system gates, 131 kbit Block RAM, 393 kbit SelectRAM
• Programmable clock 400 kHz to 100 MHz
• 4 banks of fast asynchronous 32 bit wide SRAM, each 2 Mbytes
• PCI interface: 32 bit, 33 MHz, 132 Mbytes/sec burst
• 2 x PMC sites for VME grade I/O & processing modules
• 50 pin Aux I/O, 8 LEDs

Summary
• Cost of silicon is falling & products are getting more complex & time-to-market is shrinking rapidly & shortage of trained engineers & cost of programmer time is a major constraint
  => Software based, high-level approaches to solving problems become increasingly attractive.
• New generation of languages let us build systems at a high level of abstraction.
• High-density FPGAs and SoCs allow complex designs to be rapidly prototyped => reduce the development cycle of new technology – perhaps even to deploy the final product as “soft cores”.
• Broader understanding demanded from the system designer – need a “Renaissance Engineer” with equal understanding of hardware and software.

Plan • Embedded Systems • New Approaches to building ESW • Real-Time Support – Special Characteristics of Real-Time Systems – Real-Time Constraints – Canonical Real-Time Applications – Scheduling in Real-time systems – Operating System Approaches

What is “real” about real-time?
“Computer time” (computer world, e.g., PC): • average response for user, interactive • occasionally longer reaction: user annoyed • computer controls speed of user
“Real-time” (real world, e.g., industrial system, airplane): • events occur in environment at own speed • reaction too slow: deadline miss • reaction: damage, potential loss of human life • computer must follow speed of environment

Why real-time, why not simply fast? “Fast enough” depends on the system and its environment: • turtle: fast enough to eat salad • mouse: fast enough to steal cheese • fly: fast enough to escape

What if the environment changes? The “systems” are then not fast enough: • mouse trap • fly-swatter • The time scale depends on, or is dictated by, the environment • We cannot slow down the environment: it is the real world

Real-Time Systems [Figure: a real-time computing system with inputs (events, I/O data, time) and outputs (actions, I/O data)] A real-time system is a system that reacts to events in the environment by performing predefined actions within specified time intervals.

Flight Avionics [Figure: CLIENT and SERVER] • Constraints on responses to pilot inputs and on aircraft state updates

Constraints: – Keep plastic at proper temperature (liquid, but not boiling) – Control injector solenoid (make sure that the motion of the piston reaches the end of its travel)

Real-Time Systems: Properties of Interest • Safety: Nothing bad will happen. • Liveness: Something good will happen. • Timeliness: Things will happen on time: by their deadlines, periodically, ...

Performance Metrics in Real-Time Systems • Beyond minimizing response times and increasing the throughput: – achieve timeliness. • More precisely, how well can we predict that deadlines will be met?

Types of RT Systems Dimensions along which real-time activities can be categorized: • How tight are the deadlines? Deadlines are tight when the laxity (deadline minus computation time) is small. • How strict are the deadlines? What is the value of executing an activity after its deadline? • What are the characteristics of the environment? How static or dynamic must the system be? Designers want their real-time system to be fast, predictable, reliable, flexible.

Hard, soft, firm • Hard: result useless or dangerous if deadline exceeded • Soft: result of some (lower) value if deadline exceeded • Firm: value drops to zero at the deadline • Deadline intervals: result required not later and not before [Figure: value of the result versus completion time around the deadline (dl), for hard and soft deadlines]
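The three deadline types can be read as value functions of completion time. The sketch below is only an illustration: the function name `result_value`, the linear decay for soft deadlines, and the use of negative infinity for a late hard result are assumptions, not something the slides prescribe.

```python
import math


def result_value(kind, completion_time, deadline, full_value=1.0, soft_decay=0.1):
    """Value of a result as a function of when it completes.

    kind is "hard", "firm" or "soft". soft_decay (value lost per time
    unit of lateness) is an illustrative assumption.
    """
    if completion_time <= deadline:
        return full_value              # on time: full value for every kind
    lateness = completion_time - deadline
    if kind == "hard":
        return -math.inf               # late result is useless or dangerous
    if kind == "firm":
        return 0.0                     # value drops to zero at the deadline
    if kind == "soft":
        # late result still has some, lower, value
        return max(0.0, full_value - soft_decay * lateness)
    raise ValueError(f"unknown deadline kind: {kind}")
```

For example, `result_value("firm", 12, 10)` is zero, while the same lateness under a soft deadline still yields a reduced but positive value.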

Examples • Hard real-time systems – Aircraft – Airport landing services – Nuclear Power Stations – Chemical Plants – Life support systems • Soft real-time systems – Multimedia – Interactive video games

Real-Time: Items and Terms • Task – program, perform service, functionality – requires resources, e.g., execution time • Deadline – specified time for completion of, e.g., a task – time interval or absolute point in time – value of result may depend on completion time

Plan • Special Characteristics of Real-Time Systems • Real-Time Constraints • Canonical Real-Time Applications • Scheduling in Real-time systems • Operating System Approaches

Timing Constraints Real-time means to be in time. How do we know something is “in time”? How do we express that? • Timing constraints are used to specify temporal correctness, e.g., “finish assignment by 2 pm”, “be at station before train departs”. • A system is said to be (temporally) feasible if it meets all specified timing constraints. • Timing constraints do not come out of thin air: the design process identifies events, then derives, models, and finally specifies timing constraints.

• Periodic – activity occurs repeatedly – e.g., to monitor environment values, temperature, etc. [Figure: timeline of periodic arrivals, one per period]

• Aperiodic – can occur any time – no arrival pattern given [Figure: timeline of aperiodic arrivals]

• Sporadic – can occur any time, but – minimum time between arrivals (mint) [Figure: timeline of sporadic arrivals, separated by at least mint]
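The difference between the arrival patterns can be checked on a list of arrival times. This is a minimal sketch with hypothetical helper names; an aperiodic stream is simply one for which neither check is required to hold.

```python
def is_periodic(arrivals, period):
    """True if consecutive arrivals are exactly one period apart."""
    return all(b - a == period for a, b in zip(arrivals, arrivals[1:]))


def is_sporadic(arrivals, min_interarrival):
    """True if consecutive arrivals are at least min_interarrival apart."""
    return all(b - a >= min_interarrival for a, b in zip(arrivals, arrivals[1:]))


periodic = [0, 10, 20, 30]      # one arrival per period of 10
sporadic = [0, 12, 40, 51]      # irregular, but never closer than 10
aperiodic = [0, 1, 25, 26]      # no guaranteed separation
```

Note that every periodic stream with period p also satisfies the sporadic check with a minimum separation of p; the reverse does not hold.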

Who initiates (triggers) actions? Example: Chemical process – controlled so that temperature stays below danger level – warning is triggered before danger point, so that cooling can still occur. Two possibilities: – Event triggered: action whenever temp rises above warn – Time triggered: look every int time units; action if the measured temp is above warn

[Figure: invocations over time t under time-triggered (TT) and event-triggered (ET) activation]


ET vs TT • Time triggered – Stable number of invocations • Event triggered – Only invoked when needed – High number of invocations and computation demands if value changes frequently
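The trade-off can be made concrete by counting invocations on a recorded temperature trace. This is a rough sketch under stated assumptions: one sample per time tick, a hypothetical function name `count_invocations`, and an event defined as the reading rising above the warning level.

```python
def count_invocations(samples, warn, check_interval):
    """Return (time_triggered, event_triggered) invocation counts.

    samples[t] is the temperature at tick t (illustrative assumption).
    """
    # Time triggered: look at the sensor every check_interval ticks,
    # regardless of what the temperature does.
    tt = len(range(0, len(samples), check_interval))
    # Event triggered: one invocation each time the reading rises above warn.
    et = sum(1 for prev, cur in zip(samples, samples[1:]) if prev <= warn < cur)
    return tt, et


quiet = [20] * 10               # nothing happens
noisy = [20, 30] * 5            # value crosses the warning level repeatedly
```

With a quiet trace the event-triggered scheme does no work at all, while the time-triggered count is fixed by the interval; with a noisy trace the event-triggered count grows with the number of crossings.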

Other Issues to worry about • Meet requirements: some activities may run only – after others have completed: precedence constraints – while others are not running: mutual exclusion – within certain times: temporal constraints • Scheduling – planning of activities, such that required timing is kept • Allocation – where should a task execute?

Plan • Special Characteristics of Real-Time Systems • Real-Time Constraints • Canonical Real-Time Applications • Scheduling in Real-time systems • Operating System Approaches

A Typical Real-time System [Figure: temperature sensor, input port, CPU and memory, output port, heater]

Code for example
while true do {
    read temperature sensor
    if temperature too high then turn off heater
    else if temperature too low then turn on heater
    else do nothing
}
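The polling loop above could be written along the following lines. This is a sketch, not the authors' code: `read_sensor` and `set_heater` are hypothetical callables standing in for the input and output ports, the thresholds are parameters, and the loop is bounded by `iterations` only so that the example terminates (the slide's loop runs forever).

```python
def control_step(temperature, low, high):
    """Decide what to do for one temperature reading."""
    if temperature > high:
        return "heater_off"
    if temperature < low:
        return "heater_on"
    return "no_change"


def polling_loop(read_sensor, set_heater, low, high, iterations):
    """Poll the sensor and drive the heater; no other task can run meanwhile."""
    for _ in range(iterations):
        action = control_step(read_sensor(), low, high)
        if action == "heater_off":
            set_heater(False)
        elif action == "heater_on":
            set_heater(True)
        # "no_change": leave the heater as it is
```

Separating the decision (`control_step`) from the I/O makes the control logic testable without hardware.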

Comment on code • Code works by polling the device (temperature sensor) • Code is in the form of an infinite loop • No other tasks can be executed • Suitable for a dedicated system or sub-system only

Extended polling example [Figure: one computer running Task 1 to Task 4; each task i has a conceptual link between Temperature Sensor i and Heater i]

Polling • Problems – Arranging task priorities – Round robin is usual within a priority level – Urgent tasks are delayed

Interrupt driven systems • Advantages – Fast – Little delay for high priority tasks • Disadvantages – Programming – Code difficult to debug – Code difficult to maintain

How can we monitor a sensor every 100 ms? Initiate a task T1 to handle the sensor:
T1: loop { do sensor task T2; schedule T2 for +100 ms }
Note that the time could be relative (as here) or could be an actual time. There would be slight differences between the methods, due to the additional time needed to execute the code.

An alternative… Initiate a task to handle the sensor:
T1: do sensor task T2; repeat { schedule T2 for n * 100 ms; n := n + 1 }
There are some subtleties here...
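One of the subtleties is drift. The simulation below is a sketch under an assumed fixed `overhead` (time spent executing the task and the scheduling call before the next release is requested); the function names are hypothetical. Relative scheduling accumulates that overhead on every iteration, while scheduling for the absolute time n * 100 ms does not.

```python
def relative_release_times(n, period, overhead):
    """Next release = time the scheduling call is made + period."""
    times, t = [], 0
    for _ in range(n):
        times.append(t)
        t = t + overhead + period   # overhead is added on every round
    return times


def absolute_release_times(n, period):
    """Release k is scheduled for the absolute time k * period."""
    return [k * period for k in range(n)]


print(relative_release_times(5, 100, 2))  # [0, 102, 204, 306, 408]
print(absolute_release_times(5, 100))     # [0, 100, 200, 300, 400]
```

After only five releases the relative scheme is already 8 ms late with respect to the intended 100 ms grid, and the error keeps growing.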

Clock, interrupts, tasks [Figure: clock and interrupts feed the processor, which examines the job/task queue holding Task 1 to Task 4] Tasks schedule events using the clock...

Flight Simulator [Figure: CLIENT and SERVER]

Time Periods to meet Timing Requirements [Figure: CLIENT and SERVER] • Requirement: continuous pilot inputs should be polled at least once every 16 ms • Choice made: the time period of the writer thread on the Client should be less than 16 ms • Rationale: the writer thread on the Client polls for the pilot inputs from the joystick

Time Periods to meet Timing Requirements… [Figure: CLIENT and SERVER] • Requirement: the state of the aircraft is to be advanced in 12.5 ms time steps • Choice made: the time period of the Flight Dynamics thread on the Server is 12.5 ms • Rationale: the Flight Dynamics thread on the Server advances the state of the system

Time Periods to meet Timing Requirements… • Requirement: response time for pilots should be less than 150 ms for commercial aircraft and 100 ms for fighter aircraft • Choice made: Reader and Writer threads on the Server, and the Reader thread on the Client, should be as fast as the system permits (time period of 4 ms in our case) • Rationale: – Delay in data transfer at these threads increases the response time – These threads should be interrupt driven in order to minimize the response time

Controlling a reaction • we know: – if temperature too high, it explodes – maximum rate of temperature increase – rate of cooling • events: – temperature change – temperature > safe threshold • we can derive: – how often we have to check temperature – when we have to finish cooling

Example – Injection Molding (cont.) – Timing constraints

Example – Injection Molding (cont.) – Concurrent control tasks

Plan • Special Characteristics of Real-Time Systems • Real-Time Constraints • Canonical Real-Time Applications • Scheduling in Real-time systems • Operating System Approaches

Why is scheduling important? Definition: A real-time system is a system that reacts to events in the environment by performing predefined actions within specified time intervals.

Schedulability analysis, a.k.a. feasibility checking: check whether tasks will meet their timing constraints.

Scheduling Paradigms Four scheduling paradigms emerge, depending on • whether a system performs schedulability analysis • if it does, – whether it is done statically or dynamically – whether the result of the analysis itself produces a schedule or plan according to which tasks are dispatched at run-time.

1. Static Table-Driven Approaches • Perform static schedulability analysis by checking if a schedule is derivable. • The resulting schedule (table) identifies the start times of each task. • Applicable to tasks that are periodic (or have been transformed into periodic tasks by well-known techniques). • Highly predictable, but highly inflexible. • Any change to the tasks and their characteristics may require a complete overhaul of the table.

2. Static Priority Driven Preemptive Approaches • Tasks have systematically assigned static priorities. • Priorities take timing constraints into account: – e.g., rate-monotonic: the shorter the period, the higher the priority. • Perform static schedulability analysis, but no explicit schedule is constructed – RMA: sum of task utilizations <= $n(2^{1/n} - 1)$ for n tasks; this bound decreases toward ln 2 (about 0.69) as n grows. – Task utilization = computation time / period • At run-time, tasks are executed highest-priority-first, with a preemptive-resume policy. • When resources are used, need to compute worst-case blocking times.

Static Priorities: Rate Monotonic Analysis, presented by Liu and Layland in 1973. Assumptions: • Tasks are periodic with deadline equal to period. Release time of tasks is the period start time. • Tasks do not suspend themselves • Tasks have bounded execution time • Tasks are independent • Scheduling overhead is negligible

RMA: Design Time vs. Run Time • At design time: task priorities are assigned according to their periods; shorter period means higher priority • Schedulability test: a taskset of n tasks is schedulable if $\sum_{i=1}^{n} C_i / T_i \le n\,(2^{1/n} - 1)$, where $C_i$ is the computation time and $T_i$ the period of task i. Very simple test, easy to implement. • At run time: the ready task with the highest priority is executed.

RMA: Example • Taskset: t1, t2, t3, t4, each given as (period, computation time): t1 = (3, 1), t2 = (6, 1), t3 = (5, 1), t4 = (10, 2) • The schedulability test: $1/3 + 1/6 + 1/5 + 2/10 \le 4\,(2^{1/4} - 1)$ ? • $0.9 \le 0.757$ ? No: the taskset fails the test, so this test cannot show that it is schedulable
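The utilization test for this example can be reproduced in a few lines. This is a small sketch; the function names are hypothetical, and tasks are written as (period, computation time) pairs as in the example.

```python
def rm_bound(n):
    """Liu and Layland utilization bound for n tasks: n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def total_utilization(tasks):
    """Sum of computation time / period over all tasks."""
    return sum(c / period for period, c in tasks)


def passes_rm_test(tasks):
    """Sufficient test only: False means 'inconclusive', not 'unschedulable'."""
    return total_utilization(tasks) <= rm_bound(len(tasks))


tasks = [(3, 1), (6, 1), (5, 1), (10, 2)]   # t1, t2, t3, t4
print(round(total_utilization(tasks), 3))   # 0.9
print(round(rm_bound(len(tasks)), 3))       # 0.757
print(passes_rm_test(tasks))                # False
```

The bound for a single task is 1.0 and it shrinks as tasks are added, which is why a utilization of 0.9 already fails with four tasks.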

RMA… A schedulability test is • Sufficient: tasksets that pass are schedulable, but there may exist tasksets that fail the test and are still schedulable • Necessary: tasksets that fail are (definitely) not schedulable. The RMA schedulability test is sufficient, but not necessary. E.g., when periods are harmonic, i.e., multiples of each other, utilization can be 1.
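The harmonic case can be checked by brute force. The following is a minimal discrete-time simulation of preemptive rate-monotonic scheduling, written as an illustration only: it assumes integer periods and computation times, deadline equal to period, all tasks released together at time 0, and a hypothetical function name `rm_simulate`.

```python
def rm_simulate(tasks, horizon):
    """Simulate preemptive RM scheduling in unit time steps.

    tasks: list of (period, computation time). Returns True if no job
    misses its deadline (= end of its period) up to time horizon.
    """
    ordered = sorted(tasks)              # shorter period = higher priority
    remaining = [0] * len(ordered)       # unfinished work of the current job
    for t in range(horizon):
        for i, (period, c) in enumerate(ordered):
            if t % period == 0:          # new release, old deadline
                if remaining[i] > 0:
                    return False         # previous job did not finish in time
                remaining[i] = c
        for i in range(len(ordered)):    # run the highest-priority ready task
            if remaining[i] > 0:
                remaining[i] -= 1
                break
    # Jobs whose deadline coincides with the horizon must also be finished.
    return all(remaining[i] == 0
               for i, (period, _) in enumerate(ordered) if horizon % period == 0)


harmonic = [(2, 1), (4, 2)]              # utilization 1/2 + 2/4 = 1.0
print(rm_simulate(harmonic, 4))          # True
```

The harmonic taskset is fully loaded (utilization 1) and still meets every deadline, although it is well above the utilization bound for two tasks.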

Exact RMA by Joseph and Pandya, based on critical instant analysis (longest response time of a task, when it is released at the same time as all higher priority tasks). What is happening at the critical instant? • Let T1 be the highest priority task. Its response time R1 = C1, since it cannot be preempted • What about T2? R2 = C2 + delays due to interruptions by T1. Since T1 has higher priority, it has a shorter period. That means it will interrupt T2 at least once, probably more often. Assume T1 has half the period of T2: then T1 is released twice within T2's period, so R2 can be as large as C2 + 2 x C1

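Exact RMA… In general: $R_i^{n+1} = C_i + \sum_{j \in hp(i)} \lceil R_i^{n} / T_j \rceil \, C_j$, iterated until the value no longer changes • $R_i^{n}$ denotes the nth iteration of the response time of task i • $C_j$ and $T_j$ are the computation time and period of task j • hp(i) is the set of tasks with higher priority than task i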

Example - Exact Analysis Let us look at our example, which failed the pure rate monotonic test although we can schedule it. Exact analysis says so. • R1 = 1; easy • R3, second highest priority task: hp(t3) = {T1}, R3 = 2

• R2, third highest priority task: hp(t2) = {T1, T3}

• R4, lowest priority task: hp(t4) = {T1, T3, T2}, R4 = 9 • Response times of the first instances of all tasks are less than their periods => taskset feasible under RM scheduling
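The exact analysis of the example can be reproduced with a short fixed-point iteration. This is a sketch of the response-time recurrence (start from the task's own computation time and add the interference of higher-priority tasks until the value stabilizes); the function names and the dictionary layout are illustrative assumptions.

```python
def response_time(c, higher, limit):
    """Iterate R = c + sum(ceil(R / T_j) * C_j) over higher-priority tasks.

    higher: list of (period, computation time). Returns the converged
    response time, or None if it grows beyond limit (deadline miss).
    """
    r = c
    while r <= limit:
        # -(-r // t_j) is integer ceiling division
        nxt = c + sum(-(-r // t_j) * c_j for t_j, c_j in higher)
        if nxt == r:
            return r
        r = nxt
    return None


def exact_rm_analysis(tasks):
    """tasks: name -> (period, computation time); deadline = period."""
    ordered = sorted(tasks.items(), key=lambda item: item[1][0])  # RM order
    result = {}
    for idx, (name, (period, c)) in enumerate(ordered):
        higher = [pc for _, pc in ordered[:idx]]
        result[name] = response_time(c, higher, period)
    return result


tasks = {"t1": (3, 1), "t2": (6, 1), "t3": (5, 1), "t4": (10, 2)}
print(exact_rm_analysis(tasks))  # {'t1': 1, 't3': 2, 't2': 3, 't4': 9}
```

For t4 the iteration visits 2, 5, 6, 7, 9 and then stays at 9, which is within its period of 10.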

3. Dynamic Planning based Approaches • Feasibility is checked at run-time: a dynamically arriving task is accepted only if it is feasible to meet its deadline. – Such a task is said to be guaranteed to meet its time constraints • One of the results of the feasibility analysis can be a schedule or plan that determines start times • Has the flexibility of dynamic approaches with some of the predictability of static approaches • If the feasibility check is done sufficiently ahead of the deadline, time is available to take alternative actions.
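A run-time guarantee check can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of any particular system: it assumes a single processor, independent tasks that are all ready at time `now`, and it plans them in earliest-deadline order. The name `try_admit` and the tuple layout are hypothetical.

```python
def try_admit(accepted, new_task, now):
    """Accept new_task only if every task still meets its deadline.

    Tasks are (remaining computation time, absolute deadline).
    Returns (admitted, plan); the plan lists tasks in execution order.
    """
    candidate = sorted(accepted + [new_task], key=lambda task: task[1])
    finish = now
    for computation, deadline in candidate:
        finish += computation
        if finish > deadline:
            return False, accepted      # reject: keep the old guarantees
    return True, candidate              # guarantee: the plan is the schedule


accepted = [(2, 5), (3, 10)]
print(try_admit(accepted, (1, 4), now=0))   # (True, [(1, 4), (2, 5), (3, 10)])
print(try_admit(accepted, (5, 6), now=0))   # (False, [(2, 5), (3, 10)])
```

A rejected task is known to be infeasible at arrival time, which leaves room for alternative actions before its deadline.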

4. Dynamic Best-effort Approaches • The system tries to do its best to meet deadlines. • But since no guarantees are provided, a task may be aborted during its execution. • Until the deadline arrives, or until the task finishes, whichever comes first, one does not know whether a timing constraint will be met. • Permits any reasonable scheduling approach, EDF, Highest-priority, …

Cyclic scheduling • Ubiquitous in large-scale dynamic real-time systems • Combination of both table-driven scheduling and priority scheduling. • Tasks are assigned one of a set of harmonic periods. • Within each period, tasks are dispatched according to a table that just lists the order in which the tasks execute. • Slightly more flexible than the table-driven approach: no start times are specified • In many actual applications, rather than making worst-case assumptions, confidence in a cyclic schedule is obtained by very elaborate and extensive simulations of typical scenarios.

Plan • Special Characteristics of Real-Time Systems • Real-Time Constraints • Canonical Real-Time Applications • Scheduling in Real-time systems • Operating System Approaches

Real-Time Operating Systems • Support process management and synchronization, memory management, interprocess communication, and I/O. • Three categories of real-time operating systems: – small, proprietary kernels, e.g. VRTX32, pSOS, VxWorks – real-time extensions to commercial timesharing operating systems, e.g. RT-Linux, RT-NT – research kernels, e.g. MARS, ARTS, Spring, Polis

Real-Time Applications Spectrum [Figure: spectrum from hard real-time (real-time operating systems) to soft real-time (general-purpose operating systems), in this order: VxWorks, Lynx, QNX, ... • Intime, HyperKernel, RTX • Windows CE • Windows NT]


Embedded (Commercial) Kernels Stripped down and optimized versions of timesharing operating systems. • Intended to be fast – a fast context switch – external interrupts recognized quickly – the ability to lock code and data in memory – special sequential files that can accumulate data at a fast rate • To deal with timing requirements – a real-time clock with special alarms and timeouts – bounded execution time for most primitives – real-time queuing disciplines such as earliest deadline first – primitives to delay/suspend/resume execution – priority-driven best-effort scheduling mechanism or a table-driven mechanism • Communication and synchronization via mailboxes, events, signals, and semaphores.

Real-Time Extensions to General Purpose Operating Systems E. g. , extending LINUX to RT-LINUX, NT to RT-NT • Advantage: – based on a set of familiar interfaces (standards) that speed development and facilitate portability. • Disadvantages – Too many basic and inappropriate underlying assumptions still exist.

Using General Purpose Operating Systems • GPOS offer some capabilities useful for real-time system builders • RT applications can obtain leverage from existing development tools and applications • Some GPOSs accepted as de-facto standards for industrial applications

Real Time Linux approaches 1. Modify the current Linux kernel to handle RT constraints – Used by KURT 2. Make the standard Linux kernel run as a task of the real-time kernel – Used by RT-Linux, RTAI

Modifying Linux kernel • Advantages – Most problems, such as interrupt handling, already solved – Less initial labor • Disadvantages – No guaranteed performance – RT tasks don’t always have precedence over non-RT tasks.

Running Linux as a process of a second RT kernel • Advantages – Can make hard real-time guarantees – Easy to implement a new scheduler • Disadvantages – Initial port difficult, must know a lot about underlying hardware – Running a small real-time executive is not a substitute for a full-fledged RTOS

Windows NT -- for RT applications? • Scheduling and priorities – Preemptive, priority-based scheduling – Non-degradable priorities – Priority adjustment – No priority inheritance – No priority tracking – Limited number of priorities – No explicit support for guaranteeing timing constraints

NT Thread Priority = Process class + level
Thread level      Real-time class   High class   Normal class   Idle class
Time-critical     31                15           15             15
Highest           26                15           11             6
Above Normal      25                14           10             5
Normal            24                13           9              4
Below Normal      23                12           8              3
Lowest            22                11           7              2
Idle              16                1            1              1
High, Normal, and Idle are the dynamic classes.


NT Scheduling • Threads scheduled by executive. • Priority-based preemptive scheduling. [Figure: execution levels, from highest to lowest: interrupts, Deferred Procedure Calls (DPC), system and user-level threads]

Windows NT -- for RT applications? (contd.) • Quick recognition of external events – Priority inversion due to Deferred Procedure Calls (DPC) • I/O management • Timer granularity and accuracy – High resolution counter with resolution of 0.8 µsec. – Periodic and one-shot timers with resolution of 1 msec. • Rich set of synchronization objects and communication mechanisms. – Object queues are FIFO

Research Operating Systems • MARS – static scheduling • ARTS – static priority scheduling • Spring – dynamic guarantees

MARS -- TU, Vienna (Kopetz) Offers support for controlling a distributed application based entirely on time events (rather than asynchronous events) from the environment. • A priori static analysis to demonstrate that all the timing requirements are met. • Uses flow control on the maximum number of events that the system handles. • Based on the time-driven model: assume everything is periodic. • Static table-driven scheduling approach • A hardware-based clock synchronization algorithm • A TDMA-like protocol to guarantee timely message delivery

ARTS -- CMU (Tokuda et al.) • The ARTS kernel provides a distributed real-time computing environment. • Works in conjunction with the static priority driven preemptive scheduling paradigm. • Kernel is tied to various tools that analyze schedulability a priori. • The kernel supports the notion of real-time objects and real-time threads. • Each real-time object is time encapsulated by a time fence mechanism: the time fence provides a run-time check that ensures that the slack time is greater than the worst-case execution time for an object invocation

SPRING – UMass (Ramamritham & Stankovic) • Real-time support for multiprocessors and distributed systems • Strives for a more flexible combination of off-line and on-line techniques – Safety-critical tasks are dealt with via static table-driven scheduling. – Dynamic planning based scheduling of tasks that arrive dynamically. • Takes tasks' time and resource constraints into account and avoids the need to compute worst-case blocking times a priori • Reflective kernel retains a significant amount of application semantics at run time – provides flexibility and graceful degradation.

Polis: Synthesizing OSs • Given an FSM description of an RT application • Each FSM becomes a task • Signals, interrupts, and polling • Tasks with waiting inputs are handled in first-in, first-served (FIFS) order (priority order: to be done) • Some interrupts can be made to directly execute the corresponding task • The needed OS executive is synthesized based on just what is needed

Middleware Software • Rapid rate of innovation of hardware and software – Standard: hardware, software API and protocols • Need to reuse commercial-off-the-shelf (COTS) components • Standard-based COTS middleware: – Decouple applications from underlying technology – Facilitate rapid incorporation of hardware and software advances – Capable of evolving to new environments and requirements – Reduce time and effort to develop applications • CORBA (Common Object Request Broker Architecture)

References
1. Lava material based on personal communication with Mary Sheeran, with illustrations, Nov. 2002. Also Koen Claessen and Mary Sheeran, “A Tutorial on Lava: A Hardware Description Language and Verification System”, Aug. 2000.
2. Handel-C material based on “Handel-C v3.0 Language Reference Manual”, Celoxica Ltd., 2001.
3. Frank Vahid and Tony Givargis, “Embedded System Design: A Unified Hardware/Software Introduction”, John Wiley & Sons Inc., 2002.
4. Henry Styles and Wayne Luk, “Customising Graphics Applications: Techniques & Programming Interface”, Proceedings of IEEE Symposium on Field Programmable Custom Computing Machines, IEEE Computer Society Press, 2000.

References…
7. K. Ramamritham and J. A. Stankovic, “Scheduling Algorithms and Operating Systems Support for Real-Time Systems”, Proceedings of the IEEE, Jan. 1994, pp. 55-67.
8. K. Ramamritham et al., “Using Windows NT for Real-Time Applications: Experimental Observations and Recommendations”, IEEE Real-Time Technology and Applications Conference, June 1998.
9. RT-Linux: http://www.rtlinux.org

Summary • What are Embedded Systems? • What is Embedded software? • New Approaches to building ESW • Real-time support for ESW