Advanced Embedded Systems Lecture 10 Embedded operating systems

Lecture 10: Embedded operating systems (2)

Intertask communication mechanisms

- The mechanism used to implement intertask communication can affect performance and energy consumption;
- General-purpose operating systems use intertask communication to transfer large amounts of data; real-time systems may also have to transfer large amounts of data, but they must in addition be optimized to meet real-time requirements;
- Intertask communication is carried out through semaphores, buffers, queues and mailboxes;
- A semaphore is a flag; setting a semaphore can activate a task; semaphores are accessed by the tasks;
- A buffer is a memory area; a task may request a buffer from the RTOS, put data in it and tell the RTOS to pass the data to another task; the output task typically receives pointers telling it where the buffer is in memory and how many bytes it holds;

- A queue is a string of buffers; a task can place a message in a buffer, but the output task may be busy; the sending task then asks the RTOS to put the message in a queue, behind the other messages, until the output task is ready;
  - A circular queue, or ring buffer, is used in the same way as a first-in/first-out list:
    - Ring buffers are easier to manage;
    - In a ring buffer, simultaneous input to and output from the list are achieved through head and tail pointers; data are loaded at the tail and read from the head;
    - Additional code is necessary to test for the overflow condition; an overflow occurs when an attempt is made to write data to a full queue;
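The head/tail bookkeeping and the overflow test described for the ring buffer can be sketched as follows. This is a minimal illustrative model in Python, not code from the lecture: the class name `RingBuffer`, the `count` field and the choice to report overflow by returning `False` are all assumptions made for the example.

```python
class RingBuffer:
    """Fixed-capacity FIFO: data are loaded at the tail and read from the head."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0    # index of the next item to read
        self.tail = 0    # index of the next free slot to write
        self.count = 0   # stored items; distinguishes "full" from "empty"

    def put(self, item):
        """Store item at the tail; return False on overflow (queue full)."""
        if self.count == self.capacity:
            return False                      # overflow: refuse the write
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % self.capacity   # wrap around
        self.count += 1
        return True

    def get(self):
        """Remove and return the item at the head, or None if the queue is empty."""
        if self.count == 0:
            return None
        item = self.slots[self.head]
        self.head = (self.head + 1) % self.capacity   # wrap around
        self.count -= 1
        return item
```

Keeping an explicit item count is one of several ways to tell a full buffer from an empty one; leaving one slot permanently unused is a common alternative.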

- An RTOS uses mailboxes to receive, on behalf of a task, messages from other tasks; the messages are stored until the task is ready;
- The next figure shows the solutions for intertask communication:

- Problems with semaphores:
  - Problems may arise if the operation of testing and setting a semaphore is not atomic, that is, uninterruptible;
  - Example:

        procedure P(var S: boolean);
        begin
          while S = TRUE do;
          S := TRUE
        end

    and the corresponding assembly code:

        LOAD  R1, S
        TEST  R1, 1
        JEQ   @1        ; S = TRUE ?
        STORE S, 1      ; S := TRUE

  - Suppose the process using this semaphore primitive is interrupted between the TEST and STORE instructions;
  - The interrupt routine, which might use the same resource, finds S to be available and begins to use it;

  - If this task then suspends, for example because its time slice has ended, and the interrupted task resumes, the resumed task will still see the device as free because the old contents of S are still in R1;
  - Thus two tasks attempt to use the same resource and a collision occurs;
  - Worse, this problem may occur only infrequently, which makes it difficult to test for and detect;
  - A solution: the test-and-set instruction:
    - The instruction fetches a word from memory and tests one of its bits, for example the most significant one;
    - If the bit is 0, it is set to 1 and stored back, and a condition code of 0 is returned;
    - If the bit is 1, a condition code of 1 is returned and no store is performed;
    - The fetch, test and store operations are indivisible;
    - Some processors have test-and-set instructions in their instruction sets (the Freescale family), others do not; in the latter case the feature must be implemented by other means, for example with the LOCK prefix (in the x86 family) or by disabling and re-enabling the interrupt system;
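The semantics of test-and-set (condition code 0 and a store if the bit was 0, condition code 1 and no store otherwise) can be modelled in a few lines. The sketch below is only a software model: the names `AtomicFlagWord`, `acquire` and `run_demo` are hypothetical, and a `threading.Lock` stands in for the indivisibility that real hardware provides for the fetch-test-store sequence.

```python
import threading
import time


class AtomicFlagWord:
    """Software model of a memory word accessed by a test-and-set instruction."""

    def __init__(self):
        self._bit = 0
        self._guard = threading.Lock()   # assumption: models hardware indivisibility

    def test_and_set(self):
        """Return condition code 0 if the bit was 0 (and set it to 1), else 1."""
        with self._guard:                # fetch, test and store as one step
            if self._bit == 0:
                self._bit = 1
                return 0
            return 1                     # bit already set: no store performed

    def clear(self):
        with self._guard:
            self._bit = 0


def acquire(word):
    """Busy-wait until test-and-set reports that the resource was free."""
    while word.test_and_set() == 1:
        time.sleep(0)                    # yield to other threads while spinning


def run_demo(workers, increments):
    """Several threads update a shared counter guarded by the flag word."""
    word = AtomicFlagWord()
    counter = [0]

    def worker():
        for _ in range(increments):
            acquire(word)
            value = counter[0]           # read ...
            counter[0] = value + 1       # ... and write, inside the critical section
            word.clear()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Because the test and the store cannot be separated, the interleaving described above (an interruption between TEST and STORE) cannot occur in this model, and no update to the shared counter is lost.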

Power management

- Hardware may provide different mechanisms to manage power, such as sleep modes and clock rate control;
- Methods that reconfigure the system state to optimize power consumption are known as dynamic power management; they are managed by the operating system, which provides a software interface to the system tasks;
- The operating system sees its own power states as a resource to be managed along with other resources;
- Centralizing the control of the power management mechanisms in the operating system allows the OS to ensure that all necessary components are notified of any change to the power management state;
- In PCs, power modes are managed through the Advanced Configuration and Power Interface (ACPI);
- Hardware solutions are more appropriate in embedded systems (ESs);

File systems in embedded operating systems

- Limitations on power consumption and code size cause embedded file systems to be designed differently from workstation-oriented file systems;
- The most important difference comes from the fact that flash memory is often used as the storage component for embedded file systems;
- Embedded file systems may vary according to different criteria:
  - Compatibility: some ESs do not directly expose their file systems to other computers, so the file system may use any internal structure; other devices, especially those with removable media, need compatibility with other systems; compatibility is particularly important in some types of consumer devices, such as CD/MP3 players, which must play MP3 files and audio CDs written by other devices (PCs);
  - Writeability: some devices, such as CD players, only need to read files; others, such as digital cameras, must be able to write files as well;

- Flash-based memory introduces new challenges;
- The first major difference is that flash cannot be written at word level like RAM; flash memory must first be erased, at block level, and then written; flash memory blocks may be as large as 64 KB, which is considerable; erasing a block takes more time than reading it;
- The second major difference is that a program/erase cycle wears the device; the voltages applied during the cycle stress the circuit and may cause a memory cell to fail; today's flash memories can withstand a million program/erase cycles, but a careful design of the file system can decrease the number of these operations and so increase the lifetime of the memory device;
- There are two technologies for flash memory: NAND and NOR; NOR flash memories can be read in much the same way as RAM, whereas NAND flash memories are more prone to transient read errors and must be accessed as block devices; as a consequence, different file system implementations are developed for NAND-based and NOR-based flash memories;

- Because flash memory wears out much more quickly with writes than other types of permanent storage, wear-leveling techniques are used to maximize the lifetime of the flash memory;
- Such a technique distributes writes around the memory, avoiding the excessive use of any one block;
- However, the file allocation table is a problem: whenever a file is created, destroyed or changed in size, the file allocation table must be updated, so it can wear out much more quickly than the remainder of the flash memory;
- This is why formatting or bulk-erasing flash memories is recommended rather than deleting individual files (bulk erasure performs far fewer program/erase cycles than individual file deletion);
- The figure shows the organization of a flash memory system based on virtual mapping:
  - It handles wear leveling and the operations particular to a flash-based file system;
  - The file system sees the memory as a linear array of bytes addressed with virtual addresses;

  - The virtual mapping system uses a virtual memory mapping table to translate the virtual addresses into physical addresses in the flash memory;
  - The virtual mapping table may be stored entirely in the flash, or it may be cached in RAM in the host processor;
  - The virtual mapping system can handle several tasks:
    - Manage the scheduling of block program/erase operations;
    - Consolidate data, moving some data in a block to a new location to empty an entire block so that the block can be erased and reused;
    - Identify bad blocks of memory, much as a magnetic disk controller substitutes good sectors for bad sectors;
    - Occasionally move infrequently modified data to a new location to equalize wear levels across the memory;
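The interplay between a virtual mapping table and wear leveling can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the organization used by any particular product: the class name `FlashMapper`, the one-payload-per-block granularity and the "least-worn free block" policy are all chosen for illustration only.

```python
class FlashMapper:
    """Toy virtual-to-physical block mapper with a simple wear-leveling policy."""

    def __init__(self, physical_blocks):
        self.erase_count = [0] * physical_blocks   # wear per physical block
        self.data = [None] * physical_blocks       # block contents
        self.free = set(range(physical_blocks))    # erased, writable blocks
        self.table = {}                            # virtual block -> physical block

    def write(self, vblock, payload):
        """Write payload to a fresh physical block and remap the virtual block."""
        if not self.free:
            raise RuntimeError("no free physical block")
        # Wear leveling: choose the least-worn free block (lowest index on ties).
        target = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.free.remove(target)
        self.data[target] = payload
        old = self.table.get(vblock)
        self.table[vblock] = target
        if old is not None:
            # The block holding the stale copy is erased and returned to the pool.
            self.data[old] = None
            self.erase_count[old] += 1
            self.free.add(old)

    def read(self, vblock):
        """Translate the virtual block number and return the stored payload."""
        return self.data[self.table[vblock]]
```

Rewriting the same virtual block repeatedly then spreads the erase operations over all physical blocks instead of wearing out a single one, which is the effect the file system relies on for frequently updated structures.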

Memory management in embedded operating systems

- Memory management is typically done by general-purpose operating systems;
- Reasons for an RTOS to provide memory management:
  - Memory mapping hardware can protect the memory spaces of the processes when outside programs are run on the ES;
  - Memory management can allow a program to use a large virtual address space;
- Memory management covers: task memory management, memory allocation, memory loading calculation and time loading calculation;
- Task memory management:
  - Task switching requires the context of each task to be saved to and restored from memory; for that, either the task-control block model or run-time stacks are used;
  - Each task has a task-control block, and the blocks are kept in a list; this list can be either fixed or dynamic;

  - In the fixed case, n task-control blocks are allocated for n tasks (all in the dormant state) at system generation; when a task is created, its task-control block enters the ready state and is used when the task is executed; when a task is deleted, its task-control block is placed back in the dormant state; no real-time memory management is necessary;
  - In the dynamic case, task-control blocks are added to a dynamic data structure (e.g. a linked list) as tasks are created; when a task is deleted, its task-control block is removed from the data structure and its memory becomes available again; real-time memory management is necessary to supply the task-control blocks;
  - A run-time stack requires several conditions to be met:
    - Two routines, "save" and "restore", are necessary; the "save" routine saves the current context onto a stack and must be called immediately after interrupts have been disabled; the "restore" routine should be called just before interrupts are enabled and before returning to the main program;
    - The maximum stack size needs to be known in advance; if it is not known, a catastrophic memory allocation can occur and event determinism will be lost; ideally, room for at least 1-2 more tasks than anticipated should be allocated on the stack to cope with spurious interrupts and time overloading;
    - Often a single run-time stack is not enough in a multitasking ES; the advantages of multiple stacks are:
      - They permit tasks to interrupt themselves (e.g. in the case of spurious interrupts);
      - Languages which support reentrancy and recursion (such as C) can be used; a single-stack model is suited only to non-reentrant languages, such as assembly language;

- The figure shows the possible effect of a "save" and a "restore" routine:

- Memory allocation:
  - Dangerous allocation must be avoided; it precludes system determinism: it can destroy event determinism (e.g. by overflowing the stack) or temporal determinism (e.g. by entering a deadlock situation);
  - There are two types of memory allocation, static and dynamic; only the dynamic one is discussed here;
  - Different types of memory may coexist in ESs; they must be known to the RTOS and to the tasks; if a task that issues a memory request does not know the characteristics of the memory, performance may be affected (e.g. internal versus external memory in microcontrollers);
  - Memory is allocated in contiguous blocks called segments or pages; the notions of segment and page are specific to 16-bit and 32-bit processors, which manage them and check accesses to them through the OS and the hardware;
  - However, they exist on 8-bit processors too, where no distinction is made between them; there the physical memory space is extended using I/O ports or I/O bits (in microcontrollers) without the processor being aware of it;

  - Techniques for memory allocation: swapping, MFT, MVT and demand paging;
  - Swapping: the simplest scheme for allocating memory to two processes;
    - The OS and one process co-reside in memory; the process occupies the memory space not required by the OS, called the user space; when a second process needs to run, the first process is suspended and swapped, along with its context, to a secondary storage device (disk); the second process, along with its context, is loaded into the user space and started by the dispatcher;
    - The solution can be used along with round-robin or preemptive techniques, but the execution time of each process becomes long because of the swap time; the principal component of the swap time is the access time of the secondary storage device;
    - Overlaying is a special case of swapping; it allows a single program to be larger than the available user space; the program is broken up into code and data sections called overlays, which fit into the available memory and are swapped in and out;

  - MFT (Multiprogramming with a Fixed number of Tasks):
    - The user space is divided into a number of fixed-size partitions, allowing more than one process to be memory-resident at any one time; it is useful when the number of tasks to be executed is known and fixed, as in many ESs;
    - MFT uses memory inefficiently because of the fixed partition size; external fragmentation occurs when a memory request cannot be satisfied because a contiguous block of the needed size does not exist, even if the total amount of available memory is sufficient; internal fragmentation occurs if a process needs less memory than the partition size; it can be reduced by creating fixed partitions of different sizes and then allocating the smallest partition that is at least as large as the required amount; real-time performance is degraded because of the associated overhead;
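The rule "allocate the smallest partition that is large enough" and the resulting internal fragmentation can be sketched as below. The function name `allocate_partition`, the list-based bookkeeping and the example partition sizes are illustrative assumptions, not part of any specific RTOS.

```python
def allocate_partition(partitions, free, request):
    """Pick the smallest free fixed partition that can hold the request.

    partitions: list of partition sizes; free: parallel list of booleans.
    Returns (index, internal_fragmentation), or None if nothing fits.
    """
    candidates = [i for i, size in enumerate(partitions)
                  if free[i] and size >= request]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: partitions[i])
    free[best] = False                        # the partition is now occupied
    return best, partitions[best] - request   # unused space inside the partition


# Example with three partitions of different (hypothetical) sizes, in KB.
sizes = [8, 16, 32]
is_free = [True, True, True]
first = allocate_partition(sizes, is_free, 10)    # fits best in the 16 KB partition
second = allocate_partition(sizes, is_free, 10)   # only the 32 KB partition is left
third = allocate_partition(sizes, is_free, 40)    # larger than any partition
```

The second request shows how internal fragmentation grows once the well-matched partition is taken: a 10 KB process ends up occupying a 32 KB partition.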

  - MVT (Multiprogramming with a Variable number of Tasks):
    - Memory is allocated in variable amounts, determined by the requirements of the process to be loaded into memory; this solution is more appropriate when the number of tasks is unknown or varies; little or no internal fragmentation occurs, so memory utilization is better than for MFT;
    - External fragmentation can still occur because of the dynamic nature of memory allocation and deallocation and because memory must be allocated to a process contiguously; it can be mitigated by compaction, the process of compressing fragmented memory; compaction is CPU-intensive and is not encouraged in ESs; if it must be performed, it should be done in the background with the interrupt system disabled;
    - MVT is not appropriate for ESs because its context switching overhead is much higher than in MFT;
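External fragmentation under variable-size allocation, and the effect of compaction, can be shown with a small free-list model. This is a hedged sketch: `first_fit` and `compact` are hypothetical helper names, first-fit is only one possible placement policy, and `compact` merely computes the free list that results from compaction without modelling the copying of allocated blocks.

```python
def first_fit(holes, request):
    """Allocate from the first hole that is large enough.

    holes: list of (start, size) free regions, modified in place.
    Returns the start address of the allocation, or None if no hole fits.
    """
    for i, (start, size) in enumerate(holes):
        if size >= request:
            if size == request:
                del holes[i]                           # hole consumed entirely
            else:
                holes[i] = (start + request, size - request)
            return start
    return None


def compact(holes, memory_size):
    """Model compaction: all free space becomes one hole at the top of memory."""
    total = sum(size for _start, size in holes)
    return [(memory_size - total, total)] if total else []


# 30 units are free in total, but split into holes of 10 and 20 units.
holes = [(0, 10), (40, 20)]
before = first_fit(holes, 25)     # fails: no contiguous block of 25 units
holes = compact(holes, 100)       # one hole of 30 units at address 70
after = first_fit(holes, 25)      # now succeeds
```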

  - Demand paging:
    - Program segments can be loaded into noncontiguous memory, as they are requested, in fixed-size chunks called pages; external fragmentation is minimized;
    - Program code that is not held in main memory is swapped to secondary storage, usually a disk; if a memory request is made to a location within a page that is not loaded in main memory, a page fault exception is generated;
    - The interrupt handler looks up the requested page in secondary storage and loads it into main memory if there is free space, or swaps it with an already loaded page; a replacement algorithm must be implemented, the most widespread being LRU (Least Recently Used);
    - Paging is advantageous because it allows nonconsecutive references to pages via a page table;
    - Paging can be used in conjunction with switching hardware to extend the virtual address space;
    - Pointers are used to access the desired page; they may represent memory-mapped locations that map into the desired hard-wired memory bank;
    - Pointers may be implemented through associative memory or may be simple offsets into memory;

    - In the latter case (pointers implemented as simple offsets), the actual address in main memory needs to be calculated with each memory reference;
    - The technique uses memory efficiently, but it is not appropriate in ESs because of the large overhead caused by page swapping (the disk access time) and because the associated hardware support (a disk) is usually not available;
    - Another disadvantage: the lack of predictable execution times, precisely because of the swapping overhead; the solution consists of locking certain code and data segments or pages into main memory, along with their run-time stack; they will not be swapped out, so the execution times of the locked processes decrease and, more importantly, can be guaranteed; however, fewer pages remain available for the rest of the application;

    - Another disadvantage: thrashing, which is a very high level of paging activity;
    - Example: consider the execution of an instruction that reads a source operand and performs a memory write; the code and the source operand are read, but a page fault is generated at the write because the destination location is not resident; the interrupt handler looks for a page in main memory to be moved to disk in order to free space for the incoming page; many algorithms choose a page that has not been modified, thus saving the disk write time; a good candidate may even be the page which contains the source operand; it is replaced with the page containing the destination and the instruction is restarted; a new page fault is then generated because the source operand is now missing, and so on;
    - Thrashing can be eliminated if the instruction continues its execution instead of being restarted, but this is difficult because a large amount of context must be saved at swapping time;
    - Memory locking: in a real-time system it is often desirable to lock all or certain pages of a process into memory in order to reduce the overhead involved in paging and to make execution times more predictable;
    - Any process with one or more locked pages is prevented from being swapped out to disk;
    - Advantage: it decreases the execution times of the locked modules and guarantees execution times;

    - Disadvantage: fewer pages are available for the application;
  - Paging in brief:
    - Paging is more efficient when supported by the appropriate hardware;
    - Paging allows multitasking and extension of the address space;
    - When a page is referenced that is not in main memory, a page fault occurs, causing an interrupt;
    - The hardware registers that are used for page frame address translation are part of a task's context and add overhead to a context switch;
    - If hardware page mapping is not used, additional overhead is incurred in the physical address calculation;
    - The least-recently-used rule is the best nonpredictive page swapping algorithm;
    - The main disadvantages for real-time systems are thrashing and the lack of predictable execution times;
- All the dynamic memory allocation techniques (swapping, MFT, MVT and demand paging) must be avoided in time-critical applications because of the overhead they introduce and the additional hardware they need;
- Time-critical applications must be handled by single-task ESs;
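The least-recently-used replacement rule and the thrashing effect mentioned in the paging discussion can be illustrated by counting page faults for a reference string. The sketch below is a simplified model: the function name `count_page_faults` and the reference strings are illustrative assumptions, and no disk traffic or page modification is modelled.

```python
from collections import OrderedDict


def count_page_faults(references, frames):
    """Simulate LRU page replacement and return the number of page faults.

    references: sequence of page numbers in the order they are accessed.
    frames: number of page frames available in main memory.
    """
    resident = OrderedDict()               # least recently used page comes first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)     # mark as most recently used
        else:
            faults += 1                    # page fault: page is not resident
            if len(resident) == frames:
                resident.popitem(last=False)   # evict the least recently used page
            resident[page] = True
    return faults


mixed = count_page_faults([1, 2, 3, 1, 4, 2], frames=3)
# Two pages alternating in a single frame: every reference faults (thrashing).
thrash = count_page_faults([1, 2, 1, 2, 1, 2], frames=1)
# With one more frame, only the two initial loads fault.
relaxed = count_page_faults([1, 2, 1, 2, 1, 2], frames=2)
```

The alternating reference string mirrors the source/destination example: as long as both pages cannot be resident together, each access evicts the page needed by the next one.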

- Memory load calculation:
  - Memory load is important in an ES because it shows how efficiently the memory is used; keeping it under control may lead to savings in space, power consumption and cost, which is desirable in all ESs;
  - One can consider that in an ES the memory is divided into a stack (or system) area, a program area and a RAM area; the total memory loading is typically the weighted sum of the three individual memory loadings, that is:

    $M_T = M_P \cdot P_P + M_R \cdot P_R + M_S \cdot P_S$

    where $M_T$ is the total memory loading, $M_P$, $M_R$ and $M_S$ are the memory loadings for the program, RAM and stack areas, and $P_P$, $P_R$ and $P_S$ are the percentages of the total memory allocated to the program, RAM and stack areas, respectively;
  - Memory-mapped I/O and DMA memory were not included since they are fixed in hardware and generally need few locations;
  - The program area contains the executable code of the real-time program, including the application software and the RTOS; in addition, fixed constants can be stored in this area;

  - The program memory loading is calculated as $M_P = U_P / T_P$, where $M_P$ is the memory loading for the program area, $U_P$ is the number of locations used in the program area and $T_P$ is the total number of locations available in the program area; the linker provides these numbers;
  - The RAM area stores global variables, data and, sometimes, instructions, for increased fetching speed and modifiability; although the size of this area is determined at system design time, the loading factor for this area is not known until the application program has been completed;
  - The RAM memory loading can be computed as $M_R = U_R / T_R$, where $M_R$ is the memory loading for the RAM area, $U_R$ is the number of locations used in the RAM area and $T_R$ is the total number of locations available in the RAM area; again, the linker provides these numbers;

  - The stack area is used for saving contexts and for automatic variables; one or more stacks may be kept in this area;
  - The maximum stack size is $U_S = c_S \times t_{max}$, where $U_S$ is the stack size, $c_S$ is the maximum number of locations needed for the context of a task (locations for registers, program counter, automatic variables etc.) and $t_{max}$ is the maximum number of tasks that can be in the system at any time;
  - Hence the memory loading factor is $M_S = U_S / T_S$, where $M_S$ is the memory loading for the stack area, $U_S$ is the number of locations used in the stack area and $T_S$ is the total number of locations available in the stack area;
  - If $M_T \geq 100\%$ the memory is overloaded and the system cannot operate; the same is true even if $M_T < 100\%$ whenever $M_P$, $M_R$ or $M_S \geq 100\%$;
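As a worked example, the memory loading formulas can be evaluated for a made-up system. All figures below (area sizes, used locations, context size, task count) are invented for illustration, and the function names are hypothetical; only the formulas come from the text: loading = used/total per area, stack use = context size times maximum task count, and total loading = sum of each area's loading weighted by its share of the total memory.

```python
def stack_usage(context_locations, max_tasks):
    """U_S = c_S * t_max: worst-case number of stack locations in use."""
    return context_locations * max_tasks


def loading(used, total):
    """M = U / T for one memory area."""
    return used / total


def total_loading(areas):
    """M_T: sum of each area's loading weighted by its share of total memory."""
    whole = sum(total for _used, total in areas.values())
    return sum(loading(used, total) * (total / whole)
               for used, total in areas.values())


def overloaded(areas):
    """Overload if the total or any single area reaches 100 %."""
    return (total_loading(areas) >= 1.0
            or any(loading(u, t) >= 1.0 for u, t in areas.values()))


# Hypothetical system: (used locations, total locations) per area.
areas = {
    "program": (800, 1000),                    # M_P = 0.8, P_P = 0.50
    "ram": (300, 500),                         # M_R = 0.6, P_R = 0.25
    "stack": (stack_usage(20, 10), 500),       # U_S = 200, M_S = 0.4, P_S = 0.25
}
m_t = total_loading(areas)                     # 0.8*0.5 + 0.6*0.25 + 0.4*0.25
```

With these numbers the total loading is about 65 %. Raising the task count to 25 would give a stack use of 500 locations, so the stack area alone would be fully loaded and the system overloaded even though the total stays below 100 %.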

  - There are a few solutions for reducing memory loading: variable selection, memory fragmentation, variable reuse and self-modifying code;
  - Variable selection: memory loading in one area can be reduced at the expense of another; for example, all automatic variables (variables that are local to procedures) increase the loading in the stack area, whereas global variables appear in the RAM area; by forcing variables to be either local or global, the memory load can be balanced between the two areas; in addition, intermediate results that are computed explicitly require a variable either in the stack or in the RAM area, depending on whether it is local or global;
  - Memory fragmentation: fragmentation worsens the effective memory loading, because memory that is available but not contiguous cannot be used; removing fragmentation therefore reduces the loading (for the program area, by increasing the usable $T_P$ and decreasing $U_P$);
  - Variable reuse: global variables that are used only once, for example during initialization, can be reused later for other purposes; the variable names must be generic since they will play a dual role; care must be taken that one process does not destroy the contents of a variable that is still needed by another;

  - Self-modifying code: this is a dangerous method;
    - It is based on the fact that the opcodes of certain instructions differ by only one bit; for example, by modifying one bit in the opcode of a JUMP instruction, an ADD instruction is created;
    - The method relies on coincidence, and its main disadvantage is that it destroys the program's determinism;
    - In addition, many processors include on-chip caches; the cached copy of the code may not be updated, so the unmodified code is executed; modifying code held in the cache causes performance degradation;
- Time load calculation:
  - It is necessary to know the execution time of the various modules and the overall system time loading in order to choose adequate design solutions, including the hardware solutions and the testing and debugging operations;
  - Several methods can be used to predict or measure module execution time and system time loading:
    - Using specific measurement instruments;
    - Instruction counting;

  - Using specific measurement instruments:
    - The logic analyzer and the oscilloscope; the best results are given by the logic analyzer;
    - The logic analyzer takes into account hardware latencies and other delays; its drawbacks are that the software must be coded (at least partially) and the target hardware must be available; thus it is usually employed in late stages: the testing phase and the system integration phase;
    - The oscilloscope is cheaper but offers less information;
  - Instruction counting: when it is too early for the logic analyzer, or if one is not available, this is the best method for determining the time loading due to code execution time; it requires that the code has already been written; the approach involves tracing the longest path through the code, counting the instruction types along the way and adding up their execution times;
    - The time loading can be calculated as follows:

      $T = A_1/T_1 + A_2/T_2 + \dots + A_n/T_n$

      where $T$ is the time loading, $n$ is the number of tasks, $A_i$ is the execution time of task $i$ and $T_i$ is the execution period (cycle time) of task $i$;
    - The instruction execution times are required beforehand; they can be obtained from data sheets, from simulators or by direct measurement; the number of wait states, if any, must be known in advance;
    - The method assumes that there is no overlap between instruction execution times;
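A short worked example of the time loading sum follows. It assumes that each T_i is read as the period (cycle time) with which task i is executed, so that A_i/T_i is the fraction of CPU time the task consumes; the task figures are invented and the function name `time_loading` is hypothetical.

```python
def time_loading(tasks):
    """Sum A_i / T_i over all tasks.

    tasks: list of (execution_time, period) pairs in the same time unit.
    Assumption: T_i is the period (cycle time) of task i.
    """
    return sum(execution_time / period for execution_time, period in tasks)


# Hypothetical task set, times in milliseconds.
tasks = [
    (1.0, 10.0),    # 1 ms of work every 10 ms
    (2.0, 20.0),    # 2 ms of work every 20 ms
    (5.0, 50.0),    # 5 ms of work every 50 ms
]
load = time_loading(tasks)   # each task contributes one tenth of the CPU time
```

Here the three tasks together occupy roughly 30 % of the processor, leaving headroom for interrupt handling and context switching, which this simple sum does not account for.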

- Memory management in Windows CE:
  - Windows CE is a real-time, full-featured operating system for lightweight consumer devices; Windows desktop applications cannot run on Windows CE directly, but the operating system is designed to simplify porting Windows applications to Windows CE;
  - Windows CE supports virtual memory; the paging memory can be supplied by a flash memory or by a disk;
  - The operating system supports a flat 32-bit virtual address space; the bottom 2 GB of the address space is for user processes, while the top 2 GB is for the kernel; the kernel address space is statically mapped into the address space;
  - The user address space is dynamically mapped; it is divided into 64 slots of 32 MB each:
    - Slot 0 holds the currently running process;
    - Slots 1 – 32 are the processes, with slot 1 containing the Dynamically Linked Libraries (DLLs); only 32 processes can run at any one time;
    - Slots 33 – 62 are for memory-mapped files, operating system objects etc.;
    - Slot 63 holds resource mapping;

  - Each process slot is divided into several sections; the bottom 64 KB is used as a guard section; the user code grows from the bottom up; the memory required for DLLs that are called by the processes grows from the top of the memory space down;
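The slot layout described for the user address space (64 slots of 32 MB in the bottom 2 GB) implies a simple address-to-slot calculation, sketched below. This is arithmetic on the figures quoted in the text only; the function name `slot_of` and the constants are illustrative and do not correspond to any Windows CE API.

```python
SLOT_SIZE = 32 * 1024 * 1024        # 32 MB per slot
USER_SPACE = 2 * 1024 ** 3          # bottom 2 GB belongs to user processes


def slot_of(address):
    """Return the slot number (0..63) containing a user-space virtual address.

    Returns None for addresses outside the 2 GB user space
    (for example, addresses in the kernel half).
    """
    if not 0 <= address < USER_SPACE:
        return None
    return address // SLOT_SIZE     # equivalent to address >> 25


first_process_slot = slot_of(0x02000000)   # first byte of slot 1
last_user_slot = slot_of(0x7FFFFFFF)       # last byte of the user space
kernel_address = slot_of(0x80000000)       # first byte of the kernel half
```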

Scheduling and interrupts in RTOSs

- The two key elements of an RTOS which determine the real-time behavior of the OS are:
  - The scheduler;
  - The interrupt handling mechanism;
- The interrupt system provides its own priorities for the interrupt handlers; the interrupt handlers can be seen as a distinct set of processes that are separate from the operating system's regular processes;
- The interrupt system priorities are determined internally, by the hardware;
- All interrupt handlers have priority over the operating system processes, since an interrupt is fielded automatically unless interrupts are masked;
- The interrupts must not subvert the operating system's scheduler; ensuring this is the task of the operating system designer;

- Interrupt handlers that are dispatched by the hardware interrupt system must be fast;
- In many real-time systems, interrupt handling is split into two parts:
  - The Interrupt Service Routine (ISR): it is dispatched by the hardware interrupt system;
  - The Interrupt Service Thread (IST): it is a user-mode process; the ISR performs the minimum work necessary to field the interrupt and then passes data to the IST, which finishes the task;
- Interrupts in Windows CE:
  - Windows CE divides interrupt handling into an ISR and an IST;
  - It provides two types of ISRs:
    - Static ISRs, which are built into the kernel; they provide one-way communication to their IST;
    - Installable ISRs, which can be dynamically loaded into the kernel; they are processed in the order in which they were installed; they use shared memory to communicate;
  - An interrupt is processed in several stages;

  - The main components of ISR latency are:
    - The time required to turn off interrupts;
    - The time needed for vectoring the interrupt, saving registers and accessing the ISR's starting address;
    - Both of these factors depend on the CPU platform;
  - The main components of IST latency are:
    - The ISR latency;
    - The time spent in the kernel;
    - The thread scheduling time;
  - The Windows CE scheduler provides two types of preemptive multitasking;
    - In a more general-purpose style of scheduling, a thread runs until the end of its allocated time;
    - In a more real-time style, a thread runs until a higher-priority thread is ready to run;
  - Windows CE provides 256 priorities; within a priority level, threads run in round-robin mode;
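The combination of priority levels with round-robin scheduling inside each level can be modelled with one ready queue per priority. The sketch below is a generic illustration and not the Windows CE implementation: the class name `PriorityRoundRobin`, its method names and the convention that a lower number means a higher priority are all assumptions made for the example.

```python
from collections import deque


class PriorityRoundRobin:
    """One FIFO ready queue per priority level; level 0 is assumed highest."""

    def __init__(self, levels=256):
        self.ready = [deque() for _ in range(levels)]

    def make_ready(self, thread, priority):
        """A thread becomes ready (newly created, unblocked, or quantum expired)."""
        self.ready[priority].append(thread)    # joins the back of its level

    def pick_next(self):
        """Return the thread to run: the first one in the highest non-empty level."""
        for queue in self.ready:
            if queue:
                return queue.popleft()
        return None                            # nothing is ready: idle


sched = PriorityRoundRobin()
sched.make_ready("A", 5)
sched.make_ready("B", 5)
sched.make_ready("C", 9)

order = []
running = sched.pick_next()        # A runs first within level 5
order.append(running)
sched.make_ready(running, 5)       # A's time slice ends: back of level 5
running = sched.pick_next()        # B gets its turn (round robin)
order.append(running)
sched.make_ready("H", 1)           # a higher-priority thread becomes ready
sched.make_ready(running, 5)       # B is preempted and requeued
running = sched.pick_next()        # H runs ahead of every level-5 thread
order.append(running)
```

Thread C at level 9 does not run at all in this trace, which reflects the strict-priority behaviour: lower-priority threads only get the processor when no higher level has a ready thread.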

Operating system overhead

- Context switching time can be neglected if it is small compared to the execution time of the process;
- It becomes important for very short processes or at very high utilizations;
- The effects of context switching overhead can be studied using simulators; they model the CPU time for interrupts, operating system calls and so forth; they can be fed with traces and provide timing diagrams that show not only the total execution time but also the actions performed by the CPU over time;
- Simulators can be very helpful in debugging real-time systems;
- The figure shows the effects of context switching overhead:
  - The results refer to a single-processor system;
  - 100 random task graphs were generated, together with a schedule for each;
  - Two design parameters were varied:

    - The interrupt service time;
    - The context switching time;
  - The deadlines of the tasks were adjusted to provide different amounts of slack: no slack, 10 %, 20 % and 40 % slack;
  - The results show that, in each plot, the system is highly schedulable when the interrupt service and context switching times are small and hardly schedulable when these times are large;
