Code Tuning Techniques

Code Tuning Techniques
• Relatively small coding changes to improve efficiency
• Often a tradeoff with code size, memory, and readability
• Often depends on compiler optimizations
• These rules don’t always work
  – May need to test them
Reference: Code Complete, Second Edition, Steve McConnell, 2004

Stop When You Know the Answer
    negInputFound = FALSE;
    for (i = 0; i < count; i++) {
        if (input[i] < 0) {
            negInputFound = TRUE;
            break;
        }
    }
• Break from the loop when the item is found
• 14% time savings in C++

Loops, Unswitching
    for (i = 0; i < count; i++) {
        if (sumType == SUMTYPE_NET)
            netSum += amount[i];
        else
            grossSum += amount[i];
    }
• Same comparison is performed each iteration

Loops, Unswitching
    if (sumType == SUMTYPE_NET)
        for (i = 0; i < count; i++)
            netSum += amount[i];
    else
        for (i = 0; i < count; i++)
            grossSum += amount[i];
• Pull the comparison out of the loop
• 19% time savings in C++

Loops, Unrolling
    i = 0;
    while (i < cnt) {
        a[i] = i;
        i++;
    }
• Perform two iterations in one
• Reduce loop overhead
• 34% improvement in C++
    i = 0;
    while (i < cnt - 1) {
        a[i] = i;
        a[i+1] = i + 1;
        i += 2;
    }
    if (i == cnt - 1)
        a[cnt-1] = cnt - 1;

Minimize Work in Loops
    for (i = 0; i < max; i++)
        a[i] = b[i] + str1->str2->str3->str4;

    x = str1->str2->str3->str4;
    for (i = 0; i < max; i++)
        a[i] = b[i] + x;
• Dereferencing takes time
• 19% time savings in C++

Sentinel Values
• Add a marker to an array to indicate the end (a complete sketch follows below)
• Reduces loop overhead
    while ((!found) && (i < count))
        if (item[i] == testVal)
            found = TRUE;
        else
            i++;
• 3 comparisons needed per iteration
    item[count] = testVal;
    while (item[i] != testVal)
        i++;
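
The slide shows only fragments, so here is a minimal complete sketch of the sentinel search, assuming an int array with one spare slot past the real data (find_with_sentinel and the test values are illustrative, not from the slide):

    #include <stdio.h>

    /* Sketch of the sentinel idea: item[count] always holds testVal, so the
       loop needs only one comparison per element instead of three. */
    int find_with_sentinel(int item[], int count, int testVal)
    {
        int i = 0;
        item[count] = testVal;              /* sentinel: needs one spare slot past the data */
        while (item[i] != testVal)
            i++;
        return (i < count) ? i : -1;        /* -1 means the value is not in the real data */
    }

    int main(void)
    {
        int item[6] = { 4, 8, 15, 16, 23 }; /* 5 values plus 1 spare slot for the sentinel */
        printf("index of 15: %d\n", find_with_sentinel(item, 5, 15));
        printf("index of 99: %d\n", find_with_sentinel(item, 5, 99));
        return 0;
    }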

Busy Loop Inside
• Reduce the number of loop condition checks by putting the long loop inside the small loop
    for (col = 0; col < 100; col++)
        for (row = 0; row < 5; row++)
            sum += table[row][col];
• 500 inner + 100 outer = 600 loop checks
    for (row = 0; row < 5; row++)
        for (col = 0; col < 100; col++)
            sum += table[row][col];
• 500 inner + 5 outer = 505 loop checks
• 33% time savings in C++

Strength Reduction
• Replacing complex instructions with simple instructions
  – Multiplication with addition
    for (i = 0; i < saleCount; i++)
        comm[i] = (i+1) * rev * base * disc;

    incComm = rev * base * disc;
    cumComm = incComm;
    for (i = 0; i < saleCount; i++) {
        comm[i] = cumComm;
        cumComm += incComm;
    }
• 12% performance improvement in C++

Int vs. Float
• Use integer operations rather than float
  – A form of strength reduction
• Loss of accuracy
    float i;
    for (i = 0; i < 99; i++)
        a[i] = 0;
• Change i to int
• 71% performance improvement in C++

Explicit Caching
• Remember results which will be reused in the future
  – Dynamic Programming
    double hypotenuse(double A, double B) {
        return (sqrt(A*A + B*B));
    }

    double hypotenuse(double A, double B) {
        if ((A == oldA) && (B == oldB))
            return (oldHyp);
        else {
            oldA = A;
            oldB = B;
            oldHyp = sqrt(A*A + B*B);
            return (oldHyp);
        }
    }
• 74% performance improvement in C++

Initialize at Compile Time
• Don’t call library functions when you can use a constant
    unsigned int log2(unsigned int x) {
        return (unsigned int) (log(x) / log(2));
    }
• log(2) never changes
    const double LOG2 = 0.69314718;
    unsigned int log2(unsigned int x) {
        return (unsigned int) (log(x) / LOG2);
    }
• 38% performance improvement in C++
• May need tables in order to generalize

Avoid Slow Library Functions
• Rewrite the function as a table lookup (a table-based sketch follows below)
• Loss of accuracy
    unsigned int log2(unsigned int x) {
        if (x < 2) return 0;
        if (x < 4) return 1;
        if (x < 8) return 2;
        …
        if (x < 2147483648) return 30;
        return 31;
    }
• 93% performance improvement in C++
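
The chain of comparisons above approximates a lookup; an actual table trades a little memory for a single array read per call. A minimal sketch under the assumption of 8-bit inputs (log2_table, init_log2_table, and log2_8bit are illustrative names, not from the slide):

    #include <stdio.h>

    /* The table is filled once at startup; after that each call is one array read. */
    static unsigned char log2_table[256];

    static void init_log2_table(void)
    {
        unsigned int i;
        log2_table[0] = 0;                      /* log2(0) is undefined; 0 is a placeholder */
        log2_table[1] = 0;
        for (i = 2; i < 256; i++)
            log2_table[i] = log2_table[i / 2] + 1;
    }

    unsigned int log2_8bit(unsigned char x)
    {
        return log2_table[x];                   /* one memory access, no floating point */
    }

    int main(void)
    {
        init_log2_table();
        printf("%u %u %u\n", log2_8bit(1), log2_8bit(64), log2_8bit(255));
        return 0;
    }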

Precompute Results
• Store results in a table, look them up at runtime
• Need to index the table
    return (loanAmt /
            ((1.0 - pow((1.0 + (interestRate / 12.0)), -months)) /
             (interestRate / 12.0)));
• Need to compute an integer index for the table
    interestInd = (interestRate - MINRATE) * 100;
    return (loanAmt / loanTable[interestInd][months]);

Operating Systems
Allow the processor to perform several tasks at virtually the same time
Ex. Web-controlled car with a camera
• Car is controlled via the internet
• Car has its own webserver (http://mycar/)
• Web interface allows the user to control the car and see camera images
• Car also has an “auto brake” feature to avoid collisions
(Web interface view: Fwd / Left / Right / Back buttons)

Multiple Tasks
Assume that one microcontroller is being used
At least four different tasks must be performed
1. Send video data - continuous while a user is connected
2. Service motion buttons - whenever a button is pressed, may last seconds
3. Detect obstacles - continuous at all times
4. Auto brake - whenever an obstacle is detected, may last seconds
Detect and Auto brake cannot occur together
3 tasks may need to occur concurrently

Prioritized Task Scheduling
Sending Video Data and Detecting Obstacles must happen concurrently
• Both tasks never complete
Servicing Motion Buttons must be concurrent with Sending Video Data
• Video should not stop when the car moves
CPU must switch between tasks quickly
Some tasks must take priority
• Auto Brake must have the highest priority

Sharing Global Resources
Global resources may be required by multiple tasks
• ADC, comparators, timers, I/O pins
Shared access must be controlled to avoid interference
Ex. Task 1 and Task 2 need to use the ADC
• They cannot use the ADC at the same time
• One task must wait for the other
The operating system guarantees that resource conflicts are resolved

Layered OS Architecture
Layers: Application / Library Functions / Microcontroller System Calls / Microcontroller
The OS provides an abstraction to hide the details of the hardware
Ex. a delay(int) library function might set up a timer-based interrupt
Using library functions incurs overhead

Processes vs. Threads
The context of a task is its register values, program counter, and stack
All tasks have their own context
A context switch is when one task stops and the next starts
  - Must save the old context and load the new one
  - This is time consuming
The OS typically gives tasks access to memory (i.e., malloc)
Processes each have their own private memory
  - Requires memory protection
Threads share memory
An RTOS usually implements tasks as threads

Memory Management
Programs can request memory dynamically with malloc()
    int valarr[10];

    int *valarr;
    valarr = (int *) malloc(10 * sizeof(int));
Dynamically allocated memory must be explicitly released
  - Stack-allocated memory (like the array above) is released on function return
    free(valarr);
Dynamic memory allocation is flexible but harder to deal with
  - Must free the memory manually
  - Cannot access freed memory

OS Memory Management
A program cannot know the dynamic memory allocation
  - Which memory locations are used and which are available?
The operating system keeps tables describing which memory locations are available
The program must request memory from the OS
  - The OS may deny the request if there is no memory available (see the sketch below)
The OS also protects memory
  - Enforces memory access permissions
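
A minimal sketch of what this means in C: the request goes through malloc(), and a denied request shows up as a NULL return that the program must handle.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *valarr = (int *) malloc(10 * sizeof(int));
        if (valarr == NULL) {               /* request denied: no memory available */
            fprintf(stderr, "out of memory\n");
            return 1;
        }
        valarr[0] = 42;
        printf("%d\n", valarr[0]);
        free(valarr);                       /* return the memory to the allocator */
        return 0;
    }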

Scheduler
The OS manages the execution state of each task
3 main states:
1. Running – the task is currently running
2. Ready – the task is not running but it is ready to run
3. Blocked – the task is not ready because it is waiting for an event
Only one task can be running at a time
A task can only run if it is first ready (not blocked)
The scheduler must keep track of the state of each task (see the sketch below)
The scheduler must decide which ready task should run
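
A minimal sketch of the bookkeeping, with illustrative names (TaskState, Task, pick_ready_task); a real scheduler keeps more per-task information, but the state field is the core of it.

    #include <stdio.h>

    typedef enum { RUNNING, READY, BLOCKED } TaskState;

    typedef struct {
        const char *name;
        TaskState   state;
    } Task;

    /* Return the index of the first READY task, or -1 if none can run. */
    int pick_ready_task(Task tasks[], int n)
    {
        int i;
        for (i = 0; i < n; i++)
            if (tasks[i].state == READY)
                return i;
        return -1;
    }

    int main(void)
    {
        Task tasks[] = { {"video", BLOCKED}, {"buttons", READY}, {"detect", READY} };
        int next = pick_ready_task(tasks, 3);
        if (next >= 0)
            printf("run task: %s\n", tasks[next].name);
        return 0;
    }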

Preemption
A non-preemptive scheduler allows a task to run until it gives up control of the CPU
  - The task may call a library function (sleep) to quit
  - It needs to be awakened by an event, like an interrupt
  - Not much flexibility for the OS to meet deadlines
A preemptive scheduler allows the OS to stop a running task and start another task
  - The OS has the power to influence the completion of tasks
  - The OS must be awakened periodically to make scheduling decisions
  - The OS kernel may be implemented as a high-priority timer-based interrupt

Scheduling Algorithms
Round-Robin: the scheduler keeps an ordered list of ready tasks
The first task is assigned a fixed-size time slice to execute
After the time slice is done, the task is placed at the end of the list and the next task executes for its time slice
Very simple, no priorities
(Timeline: Task 1 and Task 2 alternate, with context switch time between each time slice; see the sketch below)
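
A minimal sketch of the round-robin ordering only (no real context switching), using hypothetical task names:

    #include <stdio.h>

    /* After its time slice, the running task moves to the back of the list
       and the next task runs. No priorities are involved. */
    int main(void)
    {
        const char *tasks[] = { "Task1", "Task2", "Task3" };
        int n = 3, head = 0, slice;

        for (slice = 0; slice < 6; slice++) {
            printf("time slice %d: %s runs\n", slice, tasks[head]);
            head = (head + 1) % n;          /* running task goes to the end of the list */
        }
        return 0;
    }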

Prioritized Scheduling
Fixed Priority Preemptive: the scheduler keeps an ordered list of ready tasks, ordered by priority
The first task is assigned a fixed-size time slice to execute
After the time slice is done, the scheduler chooses the highest priority ready task for the next time slice
The next task might be the same as the previous task, if it is high priority
High-priority tasks always run ahead of low-priority tasks, so starvation may occur (see the sketch below)
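
A minimal sketch of the selection step, with illustrative names (Task, pick_highest_priority) and tasks borrowed from the car example:

    #include <stdio.h>

    /* After every time slice the scheduler picks the highest-priority READY
       task, so a low-priority task can starve if higher ones stay ready. */
    typedef struct { const char *name; int priority; int ready; } Task;

    int pick_highest_priority(Task t[], int n)
    {
        int best = -1, i;
        for (i = 0; i < n; i++)
            if (t[i].ready && (best < 0 || t[i].priority > t[best].priority))
                best = i;
        return best;
    }

    int main(void)
    {
        Task tasks[] = { {"video", 2, 1}, {"buttons", 1, 1}, {"autobrake", 3, 0} };
        int next = pick_highest_priority(tasks, 3);
        printf("next: %s\n", next >= 0 ? tasks[next].name : "idle");
        return 0;
    }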

Atomic Updates
Tasks may need to share global data and resources
For some data, updates must be performed together to make sense
Ex. Our system samples the level of water in a tank
• tank_level is the level of water
• time_updated is the last update time
    tank_level =    // Result of computation
    time_updated =  // Current time
These updates must occur together for the data to be consistent
An interrupt could see the new tank_level with the old time_updated (see the sketch below)
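
Before the semaphore approach of the following slides, one common microcontroller fix is to disable interrupts around the pair of writes. A minimal sketch, assuming hypothetical disable_interrupts()/enable_interrupts() macros that stand in for the platform intrinsics (they are empty here so the sketch compiles on a PC):

    #include <stdio.h>

    #define disable_interrupts()    /* e.g. a platform-specific "disable IRQ" intrinsic */
    #define enable_interrupts()     /* e.g. a platform-specific "enable IRQ" intrinsic */

    volatile int tank_level;
    volatile int time_updated;

    void update_tank(int level, int now)
    {
        disable_interrupts();       /* no interrupt can run between the two writes */
        tank_level   = level;
        time_updated = now;
        enable_interrupts();
    }

    int main(void)
    {
        update_tank(75, 1200);
        printf("%d %d\n", tank_level, time_updated);
        return 0;
    }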

Mutual Exclusion
While one task updates the shared variables, another task cannot read them
Task 1:
    tank_level = ?;
    time_updated = ?;
Task 2:
    printf("%i %i", tank_level, time_updated);
The two code segments should be mutually exclusive
If Task 2 is an interrupt, it must be disabled

Semaphores
A semaphore is a flag which indicates that execution is safe
May be implemented as a binary variable: 1 = continue, 0 = wait
TakeSemaphore():
• If the semaphore is available (1), take it (set it to 0) and continue
• If the semaphore is not available (0), block until it is available
ReleaseSemaphore():
• Set the semaphore to 1 so that another task can take it
Only one task can have a semaphore at one time (see the sketch below)
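
A minimal sketch of TakeSemaphore/ReleaseSemaphore built on a C11 atomic flag; this version spins instead of blocking, whereas a real RTOS would mark the waiting task Blocked and wake it when the semaphore is released.

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_flag sem_taken = ATOMIC_FLAG_INIT;    /* clear == available */

    void TakeSemaphore(void)
    {
        while (atomic_flag_test_and_set(&sem_taken))
            ;                               /* spin until the semaphore is released */
    }

    void ReleaseSemaphore(void)
    {
        atomic_flag_clear(&sem_taken);      /* available again, another task can take it */
    }

    int main(void)
    {
        TakeSemaphore();
        printf("inside the critical region\n");
        ReleaseSemaphore();
        return 0;
    }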

Critical Regions
Task 1:
    TakeSemaphore();
    tank_level = ?;
    time_updated = ?;
    ReleaseSemaphore();
Task 2:
    TakeSemaphore();
    printf("%i %i", tank_level, time_updated);
    ReleaseSemaphore();
Semaphores are used to protect critical regions
Two critical regions sharing a semaphore are mutually exclusive
Each critical region is atomic; it cannot be separated

POSIX Threads (Pthreads)
• IEEE POSIX 1003.1c: standard for a C language API for thread control
• All pthreads in a process share:
  – Process ID
  – Heap
  – File descriptors
  – Shared libraries
• Each pthread maintains its own:
  – Stack pointer
  – Registers
  – Scheduling properties (such as policy or priority)
  – Set of pending and blocked signals

Thread-safeness
• The ability to execute multiple threads concurrently without making shared data inconsistent
• Don’t use library functions that aren’t thread-safe (see the sketch below)
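
A small illustration, assuming a POSIX system: strtok() keeps hidden static state shared by every thread, so concurrent use corrupts results, while strtok_r() takes a caller-supplied state pointer, so each thread can tokenize its own string safely.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[] = "fwd,left,right,back";
        char *save = NULL;
        char *tok = strtok_r(line, ",", &save);     /* thread-safe variant of strtok */
        while (tok != NULL) {
            printf("%s\n", tok);
            tok = strtok_r(NULL, ",", &save);       /* state lives in 'save', not in a static */
        }
        return 0;
    }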

Pthreads API
• Four types of functions in the API:
  1. Thread management: routines that work directly on threads (creating, detaching, joining, etc.)
  2. Mutexes: routines that deal with synchronization (a mutex sketch follows below)
  3. Condition variables: routines that address communications between threads that share a mutex
  4. Synchronization: routines that manage read/write locks and barriers
• The pthread.h header file needs to be included in the source file
• Compile with gcc -pthread
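
The later slides show only thread creation and joining, so here is a minimal sketch of the mutex routines: two threads increment a shared counter under pthread_mutex_lock/unlock (worker and counter are illustrative names). Compile with gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    long counter = 0;

    void *worker(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);      /* enter the critical region */
            counter++;
            pthread_mutex_unlock(&lock);    /* leave the critical region */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter); /* always 200000 with the mutex held */
        return 0;
    }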

Thread Management
• pthread_create
  Creates a new thread and makes it executable
  Arguments:
  – thread: pthread_t pointer to return the result
  – attr: initial attributes of the thread
  – start_routine: code for the thread to run
  – arg: argument for the code (void *)
• pthread_exit
  Terminates a thread
  Does not close files on exit

Thread Management
    int main(int argc, char *argv[])
    {
        pthread_t threads[NUM_THREADS];
        int rc;
        long t;
        for (t = 0; t < NUM_THREADS; t++) {
            printf("In main: creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
            if (rc) {
                printf("ERROR; return code is %d\n", rc);
                exit(-1);
            }
        }
        pthread_exit(NULL);
    }
• Creates a set of threads, all running PrintHello
• Takes an argument, the thread number

Thread Management
    void *PrintHello(void *threadid)
    {
        long tid;
        tid = (long)threadid;
        printf("Hello World! It's me, thread #%ld!\n", tid);
        pthread_exit(NULL);
    }
• Code run by each thread
• Prints its own ID number

Joining Threads
• Joining threads is a way of performing synchronization
• The master blocks on pthread_join until the worker exits
• The worker must be made joinable via its attributes

Joining Example
    int main(int argc, char *argv[])
    {
        pthread_t aThread;
        pthread_attr_t attr;
        int rc;
        long t = 0;
        void *status;

        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        rc = pthread_create(&aThread, &attr, BusyWork, (void *)t);
        pthread_attr_destroy(&attr);
        …   // Do something
        rc = pthread_join(aThread, &status);
    }
• pthread_attr_* functions define attributes of the thread (make it joinable)
• pthread_attr_destroy frees the attribute structure
• A sketch of the BusyWork routine follows below
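
BusyWork itself is not shown on the slide; the following is a plausible sketch, not the original: it does some arbitrary computation, prints its ID, and returns its thread number to the joiner through pthread_exit. Compile it together with the main() above using gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    void *BusyWork(void *t)
    {
        long tid = (long) t;
        long i;
        double result = 0.0;
        for (i = 0; i < 1000000; i++)
            result += (double) i * 0.5;     /* arbitrary busy computation */
        printf("Thread %ld done, result = %e\n", tid, result);
        pthread_exit((void *) tid);         /* value retrieved via pthread_join's status */
    }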