Daddy Where do instructions come from Program Sequencer

  • Slides: 26
Daddy! -- Where do instructions come from?

Program Sequencer controls program flow and provides the next instruction to be executed.
Straight line code, jumps and loops.

Program sequencer, Copyright M. Smith, ECE, University of Calgary, Canada

Tackled today
  • Program sequencer
  • Linear flow of instructions
      • Why not discuss the idle instruction here?
  • Jumps
  • Software loops: normal and more efficient “down-counting” loops
  • Special Motorola MC68XXX software loop instructions
  • Loops: hardware loops
  • Next lecture: subroutines, interrupts and exceptions, idle

2/15/2022

Example code
  • Look at moving elements from array fooHere[ ] to farAway[ ] using various instruction modes
      • Straight line coding
      • In a loop (please make sure that you understand the terminology: exam question)
          • Software loop
          • Hardware loop
      • In a subroutine
      • Via an interrupt

Linear program flow
  • Program flow on the chip is mainly linear
  • The processor fetches and executes program instructions sequentially
  • Non-sequential structures (instructions and supporting registers) direct the processor to execute an instruction that is not at the next sequential address

Array movement

C++:
    extern long fooHere[5], farAway[5];
    farAway[0] = fooHere[0];
    farAway[1] = fooHere[1];
    farAway[2] = fooHere[2];
    farAway[3] = fooHere[3];
    farAway[4] = fooHere[4];

Blackfin assembly:
    .extern _fooHere, _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R0 = [P0];       [P1] = R0;         // farAway[0] = fooHere[0];
    R0 = [P0 + ?];   [P1 + ?] = R0;     // farAway[1] = fooHere[1];
    R0 = [P0 + ??];  [P1 + ??] = R0;    // farAway[2] = fooHere[2];

Question: What goes in the place of the ? and ?? when doing a loop, or when doing
    [P1 + ?] = R0;   W[P1 + ?] = R1;   B[P1 + ?] = R2;

ANSWER: Find out the correct answer, and make sure you do it correctly all the time
ANSWER: Why worry? DO THE CODE a different way and don’t worry
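One way to reason about the ? and ?? question is to compute the offsets rather than memorize them. The sketch below assumes that the indexed forms take byte offsets, that a Blackfin `long` is 32 bits, and that the `W[...]` and `B[...]` forms move 16-bit and 8-bit values. The helper name `byte_offset` and the `k...Bytes` constants are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of element `index` in an array whose elements are
// `element_size` bytes wide: offset = index * element size.
constexpr std::size_t byte_offset(std::size_t index, std::size_t element_size) {
    return index * element_size;
}

// Assumed access widths for the three store forms on the slide.
constexpr std::size_t kLongBytes = sizeof(std::int32_t);  // [P1 + k] = R0
constexpr std::size_t kWordBytes = sizeof(std::int16_t);  // W[P1 + k] = R1
constexpr std::size_t kByteBytes = sizeof(std::int8_t);   // B[P1 + k] = R2
```

Under these assumptions `byte_offset(1, kLongBytes)` and `byte_offset(2, kLongBytes)` give the values for ? and ?? in the 32-bit case, and the same call with `kWordBytes` or `kByteBytes` shows why the answer changes with the access width.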

Better solution: let the processor worry about getting the indexing correct!

C++:
    extern long fooHere[5], farAway[5];
    farAway[0] = fooHere[0];
    farAway[1] = fooHere[1];
    farAway[2] = fooHere[2];
    farAway[3] = fooHere[3];
    farAway[4] = fooHere[4];

Blackfin assembly, with post-increment addressing replacing the explicit offsets:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R0 = [P0++]; [P1++] = R0;    // replaces R0 = [P0];      [P1] = R0;
    R0 = [P0++]; [P1++] = R0;    // replaces R0 = [P0 + ?];  [P1 + ?] = R0;
    R0 = [P0++]; [P1++] = R0;    // replaces R0 = [P0 + ??]; [P1 + ??] = R0;

Remember -- P0 will end up pointing PAST the end of the array

The C++ code we actually developed

Original C++:
    extern long fooHere[5], farAway[5];
    farAway[0] = fooHere[0];
    farAway[1] = fooHere[1];
    farAway[2] = fooHere[2];
    farAway[3] = fooHere[3];

Blackfin assembly                        C++ with pointers
    .extern _fooHere;                    extern long fooHere[5];
    .extern _farAway;                    extern long farAway[5];
    P0.H = _fooHere; P0.L = _fooHere;    long *pt0; pt0 = fooHere;   (Actually pt0 = &fooHere[0];)
    P1.H = _farAway; P1.L = _farAway;    long *pt1; pt1 = farAway;   (Actually pt1 = &farAway[0];)
    R0 = [P0++]; [P1++] = R0;            *pt1++ = *pt0++;
    R0 = [P0++]; [P1++] = R0;            *pt1++ = *pt0++;
    R0 = [P0++]; [P1++] = R0;            *pt1++ = *pt0++;

Remember -- P0 will end up pointing PAST the end of the array
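The "pointing PAST the end" remark can be checked with a small sketch. It is not the course code: the function name `copy_post_increment` is hypothetical, and `std::int32_t` stands in for the 32-bit Blackfin `long`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies `count` 32-bit elements with post-incremented pointers and returns
// the final source pointer, which is one element past the last one read.
inline const std::int32_t* copy_post_increment(const std::int32_t* src,
                                               std::int32_t* dst,
                                               std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        *dst++ = *src++;  // same shape as R0 = [P0++]; [P1++] = R0;
    }
    return src;
}
```

Called with a five-element source array, the returned pointer equals the array base plus 5. That address may be compared against but must not be dereferenced, which is the C++ counterpart of P0 ending up past the end of fooHere.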

IDLE: seems the next simplest!
  • The IDLE instruction is part of a sequence of instructions that places the processor in a quiescent state so that something can happen
      • An external system can change clock frequencies for power saving; a high clock frequency can mean high power consumption
  • An SSYNC instruction MUST immediately follow the IDLE instruction
  • Getting out of the idle instruction sequence needs an understanding of interrupts
  • Will discuss more about idle later

More info in instruction ref. manual p 11.3

Jump instruction
  • Both JUMP and CALL instructions transfer program flow to another memory location
  • The difference between JUMP and CALL is that the CALL automatically loads the return address into the RETS register. The return address is the next sequential address after the CALL instruction.
  • JUMPs can be conditional (depending on the CC bit in the ASTAT register).
  • Conditional JUMP instructions use static branch prediction to reduce branch latency caused by the length of the Blackfin instruction pipeline.
      • What does “static” branch prediction mean?
      • What is “dynamic” branch prediction?
  • When possible the assembler will use the short relative jump. The target instruction must be within -4096 to +4094 bytes of the current instruction.
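The short-jump range quoted above can be written as a simple predicate. This is a sketch under stated assumptions: the function name `fits_short_jump` is invented, and the even-offset requirement is an assumption based on instructions starting at even byte addresses; the assembler's real selection logic may consider more than this.

```cpp
#include <cassert>
#include <cstdint>

// Range quoted on the slide for the short relative jump, in bytes.
constexpr std::int32_t kShortJumpMin = -4096;
constexpr std::int32_t kShortJumpMax = 4094;

// True if a byte offset from the current instruction lies in the short-jump
// range. The even-offset check is an assumption, not taken from the slide.
constexpr bool fits_short_jump(std::int32_t offset_bytes) {
    return offset_bytes >= kShortJumpMin && offset_bytes <= kShortJumpMax &&
           offset_bytes % 2 == 0;
}
```

For example, `fits_short_jump(4094)` holds while `fits_short_jump(4096)` does not, so a target just beyond the range would need a longer jump encoding.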

Array movement

C++:
    extern long fooHere[5], farAway[5];
    for (int num = 0; num < 5; num++) {
        farAway[num] = fooHere[num];
    }

Blackfin assembly:
    .extern _fooHere, _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R0 = [P0];       [P1] = R0;
    R0 = [P0 + ?];   [P1 + ?] = R0;
    R0 = [P0 + ??];  [P1 + ??] = R0;
    …… and so on ….

Linear code: straight line coding is STILL a viable solution for solving a loop.
  • You don’t waste any time incrementing a loop counter
  • You don’t waste time checking a loop counter
  • You don’t waste time upsetting the processor instruction pipeline by jumping back and throwing away all prefetched instructions

Standard software loop

The C++ code we actually developed:
    extern long fooHere[5], farAway[5];
    for (int num = 0; num < 5; num++) {
        farAway[num] = fooHere[num];
    }

Blackfin assembly:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R1 = 0;
    R2 = 5;
    LOOP:
        CC = R2 <= R1;
        IF CC JUMP LOOP_END;       // PREDICTED NOT TAKEN
        R0 = [P0++];
        [P1++] = R0;
        R1 += 1;
        JUMP LOOP;
    LOOP_END: outside loop

C++ with pointers:
    extern long fooHere[5]; extern long farAway[5];
    long *pt0; pt0 = fooHere;
    long *pt1; pt1 = farAway;
    int num = 0;
    for ( /* empty */; num < 5; num++) {
        *pt1++ = *pt0++;
    }

Program Loops
  • Most programs have 1 or 2 loops embedded inside each other, occasionally 3 or more
      • For all images in a list
          • For each row in each image
              • For each column (pixel) in each row
                  • For each colour in each pixel
  • Important to get the maximum efficiency out of the instructions that are executed the most often!

Efficiency of standard software loop

Suppose we go round the loop N times:
  • 2 loop control instructions outside of loop + 4 * N loop control instructions inside the loop
  • 2 * N “useful instructions” inside loop + 4 useful set up instructions

    Loop efficiency = (4 + 2*N) / (4 + 2*N + 2 + 4*N) * 100%
    If N is large:    2*N / (6*N) * 100% = 33%

Blackfin assembly:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R1 = 0;
    R2 = 5;
    LOOP:
        CC = R2 <= R1;
        IF CC JUMP LOOP_END;
        R0 = [P0++];
        [P1++] = R0;
        R1 += 1;
        JUMP LOOP;
    LOOP_END: outside loop

C++ with pointers:
    extern long fooHere[5]; extern long farAway[5];
    long *pt0; pt0 = fooHere;
    long *pt1; pt1 = farAway;
    int num = 0;
    for ( /* empty */; num < 5; num++) {
        *pt1++ = *pt0++;
    }

Down-counting software loop

Original C++:
    extern long fooHere[5], farAway[5];
    for (int num = 0; num < 5; num++) {
        farAway[num] = fooHere[num];
    }

Blackfin assembly:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R1 = 5;
    CC = R1 <= 0;
    IF CC JUMP DO_WHILE_END;
    DO_WHILE:
        R0 = [P0++];
        [P1++] = R0;
        R1 += -1;
        CC = R1 <= 0;
        IF !CC JUMP DO_WHILE (BP);
    DO_WHILE_END: outside loop

C++ with pointers, counting down:
    extern long fooHere[5]; extern long farAway[5];
    long *pt0; pt0 = fooHere;
    long *pt1; pt1 = farAway;
    int num = 5;
    if (num > 0)                 // Test needed if exact value of num not known
        do {
            *pt1++ = *pt0++;
        } while ((--num) > 0);

Efficiency of down-counting software loop

Suppose we go round the loop N times:
  • 3 loop control instructions outside of loop + 3 * N loop control instructions inside the loop
  • 2 * N “useful instructions” inside loop + 4 useful set up instructions

    Loop efficiency = (4 + 2*N) / (4 + 2*N + 3 + 3*N) * 100%
    If N is large:    2*N / (5*N) * 100% = 40%

Blackfin assembly:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R1 = 5;
    CC = R1 <= 0;
    IF CC JUMP DO_WHILE_END;
    DO_WHILE:
        R0 = [P0++];
        [P1++] = R0;
        R1 += -1;
        CC = R1 <= 0;
        IF !CC JUMP DO_WHILE (BP);
    DO_WHILE_END: outside loop

C++ with pointers, counting down:
    extern long fooHere[5]; extern long farAway[5];
    long *pt0; pt0 = fooHere;
    long *pt1; pt1 = farAway;
    int num = 5;
    if (num > 0)                 // Test needed if exact value of num not known
        do {
            *pt1++ = *pt0++;
        } while ((--num) > 0);
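The efficiency figures for the standard and down-counting software loops follow one formula, useful instructions divided by all instructions, so they can be compared with a short sketch. The struct and function names are invented for illustration; the four counts per loop style are the ones quoted on the efficiency slides.

```cpp
#include <cassert>

// Instruction counts for one loop style.
struct LoopCost {
    int useful_setup;      // useful set-up instructions outside the loop
    int useful_per_pass;   // useful instructions per pass through the loop
    int control_setup;     // loop control instructions outside the loop
    int control_per_pass;  // loop control instructions per pass
};

// Efficiency as a fraction (0..1): useful / (useful + control) for n passes.
inline double loop_efficiency(const LoopCost& c, long n) {
    const double useful = c.useful_setup + static_cast<double>(c.useful_per_pass) * n;
    const double control = c.control_setup + static_cast<double>(c.control_per_pass) * n;
    return useful / (useful + control);
}

// Counts taken from the two software-loop efficiency slides.
const LoopCost kStandardLoop = {4, 2, 2, 4};
const LoopCost kDownCountingLoop = {4, 2, 3, 3};
```

For the five-element copy, `loop_efficiency(kStandardLoop, 5)` is 14/36 and `loop_efficiency(kDownCountingLoop, 5)` is 14/32; as n grows the two values approach the 33% and 40% limits, because the set-up terms stop mattering.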

Efficient loops
  • Motorola MC68XXX has a specialized loop instruction; essentially:
      • Decrement the counter (data register) and start the jump occurring
      • While the decrement is occurring, test if OLD COUNTER WAS LESS THAN ZERO
      • If old counter less than zero then stop the jump
  • Motorola has specialized memory operations WHICH TAKE MANY PROCESSOR CYCLES
  • Motorola has the instruction [P1++] = [P0++], which has all the following steps, each taking 4 clock cycles (TOTAL OF 24 cycles at 8 MHz):
      • Fetch instruction
      • internReg.L = W[P0];
      • internReg.H = W[P0+2];
      • W[P1] = internReg.L;
      • W[P1+2] = internReg.H;
      • P0 += 4;
      • P1 += 4;
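A small model can help check how many passes a "decrement and branch" instruction gives. The sketch below assumes one particular rule: decrement the counter, then branch back unless the counter has just become -1. That rule is an assumption about the Motorola instruction, not something stated in these slides, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many times a loop body runs when the loop is closed by a
// "decrement and branch" instruction.
// Assumed rule: decrement the counter, then branch back unless the counter
// has just become -1.
inline int passes_with_decrement_and_branch(std::int16_t counter) {
    int passes = 0;
    if (counter < 0) {
        return passes;  // guard before the loop: skip it entirely
    }
    while (true) {
        ++passes;       // the loop body, e.g. [P1++] = [P0++];
        --counter;      // the decrement that is part of the branch
        if (counter == -1) {
            break;      // counter expired: fall through, no jump
        }
    }
    return passes;
}
```

Under this rule a counter loaded with N - 1 gives N passes, so copying five elements would start from a counter of 4.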

Efficiency of “Motorola-style” down-counting software loop with specialized branch instructions

Suppose we go round the loop N times:
  • 3 loop control instructions outside of loop + 1 * N loop control instructions inside the loop
  • 1 * N “useful instructions” inside loop + 2 useful set up instructions

    Loop efficiency = (6 + 5*N) / (6 + 5*N + 4 + 1*N) * 100%
    If N is large:    5*N / (6*N) * 100% = 83%

Pseudo-assembly (NOTE: NOT AVAILABLE ON BLACKFIN):
    .extern _fooHere;
    .extern _farAway;
    P0 = _fooHere;
    P1 = _farAway;
    R1 = (5 - 1);
    CC = R1 < 0;
    IF CC JUMP DO_WHILE_END;
    DO_WHILE:
        [P1++] = [P0++];
        IF (R1 < 0) THEN CONTINUE OTHERWISE (R1 += -1) AND JUMP DO_WHILE (BP);
    DO_WHILE_END: outside loop

C++ with pointers, counting down:
    extern long fooHere[5]; extern long farAway[5];
    long *pt0; pt0 = fooHere;
    long *pt1; pt1 = farAway;
    int num = 5;
    if (num > 0)                 // Test needed if exact value of num not known
        do {
            *pt1++ = *pt0++;
        } while ((--num) > 0);

Blackfin Hardware Loops
  • Blackfin supports a mechanism for zero-overhead looping
  • Common design decision: the two inner-most loops are the most often executed, so make those the most efficient
  • The program sequencer contains TWO loop units, each containing three registers
      • Loop Top registers: LT0, LT1
      • Loop Bottom registers: LB0, LB1
      • Loop Count registers: LC0, LC1

Blackfin Hardware Loops
  • The program sequencer contains TWO loop units, each containing three registers
      • Loop Top registers: LT0, LT1
      • Loop Bottom registers: LB0, LB1
      • Loop Count registers: LC0, LC1
  • When an instruction at address X is executed (meaning PC == X)
      • and if the address X matches the contents of LBn (meaning PC == LBn)
      • and the counter register is greater than or equal to 2 (LCn >= 2)
      • THEN the next instruction will be taken from address LTn
  • Note that if two loops end on the same instruction then loop 1 has the highest priority
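The rule above (PC == LBn and LCn >= 2 sends execution back to LTn) can be modelled in a few lines. This is a simplified sketch, not a description of the real sequencer: it assumes the counter is decremented on each jump back and cleared on the final pass, it ignores the pipeline, and it uses a fixed instruction size `step` so that addresses advance uniformly. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One loop unit: the three registers named on the slide.
struct LoopUnit {
    std::uint32_t top;     // LTn: address of the first instruction in the loop
    std::uint32_t bottom;  // LBn: address of the last instruction in the loop
    std::uint32_t count;   // LCn: remaining passes
};

// Address of the instruction that follows the one at `pc`.
// `step` is an assumed fixed instruction size in bytes.
inline std::uint32_t next_pc(LoopUnit& unit, std::uint32_t pc, std::uint32_t step) {
    if (pc == unit.bottom) {
        if (unit.count >= 2) {
            --unit.count;     // assumed: one pass used up per jump back
            return unit.top;  // next instruction comes from LTn
        }
        unit.count = 0;       // assumed: counter expires on the final pass
    }
    return pc + step;         // ordinary linear flow
}

// Starts at `top` and counts how often the bottom instruction executes.
// Assumes `bottom` is reachable from `top` in whole steps.
inline int count_passes(LoopUnit unit, std::uint32_t step) {
    int passes = 0;
    std::uint32_t pc = unit.top;
    while (pc <= unit.bottom) {
        if (pc == unit.bottom) {
            ++passes;
        }
        pc = next_pc(unit, pc, step);
    }
    return passes;
}
```

With a two-instruction loop and a count of 5, `count_passes(LoopUnit{0x100, 0x102, 5}, 2)` reports five passes, and the instruction at the bottom address runs on every one of them.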

Pseudo code example

    Set LT0 = first instruction in loop;    -- LOOP_START
    Set LB0 = last instruction in loop;     -- LOOP_END
    Set LC0 = 5;

    LOOP_START: R0 = [P0++];
    LOOP_END:   [P1++] = R0;

  • Manual (P4-16) says: Each loop register can be loaded individually with a register transfer, but this incurs a significant overhead if the loop count is non-zero (the loop is active) at the time of the transfer.
  • That sounds unpleasant, so let’s find an easier way
  • Manual (P4-16) says: The LSETUP instruction can be used to load all three registers of a loop unit at the same time

Efficiency of standard software loop

Suppose we go round the loop N times:
  • 2 loop control instructions outside of loop + 4 * N loop control instructions inside the loop
  • 2 * N “useful instructions” inside loop + 4 useful set up instructions

    Loop efficiency = (4 + 2*N) / (4 + 2*N + 2 + 4*N) * 100%
    If N is large:    2*N / (6*N) * 100% = 33%

Blackfin assembly:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R1 = 0;
    R2 = 5;
    LOOP:
        CC = R2 <= R1;
        IF CC JUMP LOOP_END;
        R0 = [P0++];
        [P1++] = R0;
        R1 += 1;
        JUMP LOOP;
    LOOP_END: outside loop

C++ with pointers:
    extern long fooHere[5]; extern long farAway[5];
    long *pt0; pt0 = fooHere;
    long *pt1; pt1 = farAway;
    int num = 0;
    for ( /* empty */; num < 5; num++) {
        *pt1++ = *pt0++;
    }

WARNING: LOOP_END is an instruction that IS NOT EXECUTED INSIDE THE SOFTWARE LOOP

Efficiency of hardware loop

Suppose we go round the loop N times:
  • 2 loop control instructions outside of loop + 0 loop control instructions inside the loop
      • There are some pipeline overhead issues on leaving the loop
  • 2 * N “useful instructions” inside loop + 4 useful set up instructions

    Loop efficiency = (4 + 2*N) / (4 + 2*N + 2) * 100%
    If N is large:    2*N / (2*N) * 100% = 100%

Blackfin assembly:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    P2 = 5;
    LSETUP(LOOP_START, LOOP_END) LC1 = P2;
    LOOP_START: R0 = [P0++];
    LOOP_END:   [P1++] = R0;
    OUTSIDE_LOOP:

C++ with pointers:
    extern long fooHere[5]; extern long farAway[5];
    long *pt0; pt0 = fooHere;
    long *pt1; pt1 = farAway;
    int num = 0;
    for ( /* empty */; num < 5; num++) {
        *pt1++ = *pt0++;
    }

WARNING: LOOP_END is an instruction that IS EXECUTED INSIDE THE HARDWARE LOOP

Big warning

SOFTWARE LOOP
    R1 = 0;
    R2 = 5;
    LOOP:
        CC = R2 <= R1;
        IF CC JUMP LOOP_END;
        R0 = [P0++];
        [P1++] = R0;
        R1 += 1;
        JUMP LOOP;
    LOOP_END: outside loop

HARDWARE LOOP
    LOOP_START: R0 = [P0++];
    LOOP_END:   [P1++] = R0;
    OUTSIDE_LOOP:

LOOP_END: Always executed in hardware loop

Warning and speed issues
  • The distance between the LSETUP instruction and the LOOP_START instruction MUST NOT BE MORE THAN 30 bytes (otherwise the offset description will not fit into the instruction)
  • There is a 4 clock cycle advantage if LSETUP is the instruction immediately before the LOOP_START instruction
  • The distance between the LSETUP instruction and the LOOP_END instruction MUST NOT BE MORE THAN 2046 bytes (otherwise the offset description will not fit into the instruction)
  • The processor supports a four-location instruction loop buffer. If the loop code contains four or fewer instructions, then no fetches from instruction memory are necessary for any number of loop iterations, because the instructions are stored locally.
      • This eliminates instruction fetch time (especially important when accessing external memory)
      • Really efficient loops are no more than 4 instructions long
      • Have requested information on whether this means 4 instructions, or 4 instructions which can be highly parallel (like 16 instructions in a non-parallel mode)
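The two distance limits can be captured in a small check, which may be handy when laying out code around a hardware loop. This is only a sketch: the function name is hypothetical, all addresses are byte addresses, and the requirement that both labels lie at or after LSETUP is an assumption that the slide does not state.

```cpp
#include <cassert>
#include <cstdint>

// Limits quoted on the slide, in bytes, measured from the LSETUP instruction.
constexpr std::uint32_t kMaxStartOffset = 30;
constexpr std::uint32_t kMaxEndOffset = 2046;

// True if LOOP_START and LOOP_END are close enough to LSETUP for their
// offsets to fit. Assumes both labels are at or after the LSETUP address.
constexpr bool lsetup_offsets_fit(std::uint32_t lsetup_addr,
                                  std::uint32_t loop_start_addr,
                                  std::uint32_t loop_end_addr) {
    return loop_start_addr >= lsetup_addr && loop_end_addr >= loop_start_addr &&
           loop_start_addr - lsetup_addr <= kMaxStartOffset &&
           loop_end_addr - lsetup_addr <= kMaxEndOffset;
}
```

For instance, a loop that starts 4 bytes after LSETUP and ends 6 bytes after it passes the check, while a loop whose last instruction sits 2048 bytes away does not.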

Tackled today
  • Program sequencer
  • Linear flow of instructions
      • Why not discuss the idle instruction here?
  • Jumps
  • Software loops: normal and more efficient “down-counting” loops
  • Special Motorola MC68XXX software loop instructions
  • Loops: hardware loops
  • Next lecture: subroutines, interrupts and exceptions, idle

  • Information taken from Analog Devices On-line Manuals with permission: http://www.analog.com/processors/resources/technicalLibrary/manuals/
  • Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices. Copyright Analog Devices, Inc. All rights reserved.