Moving Arrays -- 1
Completion of ideas needed for a general and complete program
Final concepts needed for Final
Review for Final: Loop efficiency

Tackled today
• Declaring and initializing arrays off the stack: review and a little bit of new
  - Useful for background DMA tasks
  - Useful for minimizing total memory used in non-general program
• Declaring arrays and variables on the stack: review and a little bit of new
  - Re-entrant code and thread safe
• Demonstrating memory to memory DMA
DMA, Copyright M. Smith, ECE, University of Calgary, Canada 9/23/2021

Declaring fixed arrays in memory, not on the stack

    short foo_startarray[40];
    short far_finalarray[40];

    void HalfWaveRectifyASM( ) {
        // Take the signal from foo_startarray[ ] and rectify the signal
        // Half wave rectify: if > 0 keep the same; if < 0 make zero
        // Full wave rectify: if > 0 keep the same; if < 0 then abs value
        // Rectify startarray[ ] and place result in finalarray[ ]
        for (int count = 0; count < 40; count++) {
            if (foo_startarray[count] < 0)
                far_finalarray[count] = 0;
            else
                far_finalarray[count] = foo_startarray[count];
        }
    }

The program code is the same, but the data part is not
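
The comments in the routine above mention full-wave rectification, but the loop only implements the half-wave case. A minimal C sketch of both variants follows, assuming the same 40-element global arrays. The names `half_wave_rectify`, `full_wave_rectify` and `ARRAY_SIZE` are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE 40  /* same length as the arrays on the slide */

/* Globals live at fixed addresses for the whole run, not on the stack. */
short foo_startarray[ARRAY_SIZE];
short far_finalarray[ARRAY_SIZE];

/* Half wave: keep samples >= 0, replace negative samples with zero. */
void half_wave_rectify(const short *in, short *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (in[i] < 0) ? 0 : in[i];
}

/* Full wave: keep samples >= 0, replace negative samples with their
   absolute value. Assumes no sample equals SHRT_MIN, whose negation
   does not fit in a short. */
void full_wave_rectify(const short *in, short *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (in[i] < 0) ? (short)-in[i] : in[i];
}
```

Calling `half_wave_rectify(foo_startarray, far_finalarray, ARRAY_SIZE)` reproduces the behaviour of the slide's loop. Passing the arrays as parameters also lets the same routine work on stack arrays.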

First attempt to get correct answer
.section data1
    Tells linker to place this stuff in memory map location data1
.align 4
    Adjust address to end in 0, 4, 8 or C
    We know processor works best when we start things on a boundary between groups of 4 bytes
[N * 2]
    We need N short ints
    We know the processor works with addresses in bytes
    Therefore needing N * 2 bytes sounds sensible

“Wrong approach”: does not match what C / C++ does with memory
20 bytes (16 bits) for N short values in C++ = N * 2 bytes

“Correct approach was NOT what I expected”
ASM array with space for N long ints:    .var arrayASM[N];        or      .byte4 arrayASM[N];
ASM array with space for N short ints:   .var arrayASM[N / 2];    better  .byte2 arrayASM[N];
ASM array with space for N chars:        .var arrayASM[N / 4];    better  .byte arrayASM[N];
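
The N / 2 and N / 4 sizes above follow from packing smaller elements into 32-bit words. The C sketch below works through that arithmetic, assuming (as the slide implies) that `.var` reserves one 32-bit word per element. `words_needed` is a hypothetical helper; it rounds up so that an N that is not a multiple of 2 or 4 still gets enough space.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES 4  /* assumed size of one .var element, in bytes */

/* Number of 32-bit words that hold n_elems elements of elem_bytes each. */
size_t words_needed(size_t n_elems, size_t elem_bytes)
{
    assert(elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4);
    size_t total_bytes = n_elems * elem_bytes;
    return (total_bytes + WORD_BYTES - 1) / WORD_BYTES;  /* round up */
}
```

For N = 40 this gives 40 words for long ints, 20 for short ints and 10 for chars, matching N, N / 2 and N / 4.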

Better answer is “Look at the assembler manual”

Improving what we did before
Big warning: external array initialization occurs on “reload” of your program code and NOT on “restart” of your program code (WHY?)
Understanding why this is true and why it is a problem will solve many issues when programming.
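
One way to see the reload / restart difference is to model it on a host machine. The sketch below is an illustration, not Blackfin code. It assumes the initial values of a global array are written into memory only when the program image is loaded, which is one plausible reading of the warning above. `gain_table`, `program_main` and `restart_demo` are hypothetical names; calling `program_main` twice without touching the array imitates a restart with no reload.

```c
#include <assert.h>

/* Initial values come from the program image, written at load time. */
short gain_table[4] = {1, 2, 3, 4};

/* Stand-in for the program's entry point: reads the table, then
   modifies it in place. Returns the sum it saw on entry. */
int program_main(void)
{
    int sum_on_entry = 0;
    for (int i = 0; i < 4; i++) {
        sum_on_entry += gain_table[i];
        gain_table[i] = (short)(gain_table[i] * 2);
    }
    return sum_on_entry;
}

/* Two runs after a single load: the second run starts from the values
   the first run left behind, not from the initializer. */
void restart_demo(void)
{
    int first_run = program_main();
    int second_run = program_main();
    assert(first_run == 10);
    assert(second_run == 20);
}
```

A program that must survive a restart can set its arrays explicitly at the start of the run instead of relying on the initializer.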

When DMA might be useful -- Video manipulation
• Program
  - Wait for picture 1 to come in (video in)
  - Process picture 1 (lots of mathematics perhaps)
  - Wait for picture 1 to be transmitted (video out)
• Spending a lot of time waiting rather than doing

When DMA might be useful -- Double Buffering
• Program
  1. Wait for picture 2 memory to fill (video in)
  2. Picture 3 comes into memory: background DMA task from input
     Process picture 2: place result into picture 0 location
  3. Picture 4 comes into memory: background DMA task from input
     Process picture 3: place result into picture 1 location
     Transmit picture 0: background DMA task to output
  4. Picture 0 comes into memory: background DMA task from input
     Process picture 4: place result into picture 2 location
     Transmit picture 1: background DMA task to output
  5. Picture 1 comes into memory: background DMA task from input
     Process picture 0: place result into picture 3 location
     Transmit picture 2: background DMA task to output
  6. Picture 2 comes into memory: background DMA task from input
     Process picture 1: place result into picture 4 location
     Transmit picture 3: background DMA task to output
  7. REPEAT STEPS FOR EVER
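
The schedule above follows a fixed rotation over five picture buffers. The C sketch below captures the index arithmetic only; it does not start any DMA transfer. The struct and function names are invented for illustration.

```c
#include <assert.h>

#define NUM_PICTURES 5

typedef struct {
    int incoming;    /* buffer being filled by the input DMA task   */
    int processing;  /* buffer the processor is currently reading   */
    int result;      /* buffer receiving the processed picture      */
    int outgoing;    /* buffer being sent by the output DMA task    */
} BufferRoles;

/* Roles of the buffers while the processor works on `processing`.
   On the very first pass there is nothing to transmit yet, so
   `outgoing` is only meaningful once a result has been produced. */
BufferRoles roles_for(int processing)
{
    assert(processing >= 0 && processing < NUM_PICTURES);
    BufferRoles r;
    r.processing = processing;
    r.incoming   = (processing + 1) % NUM_PICTURES;
    r.outgoing   = (processing + 2) % NUM_PICTURES;
    r.result     = (processing + 3) % NUM_PICTURES;
    return r;
}
```

For example, while picture 3 is processed, picture 4 arrives, the result goes to location 1 and picture 0 is transmitted, as in step 3 of the list.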

We are only going to look at a simple DMA task
• Normal code when trying to move data from one location to another
• Number of simple examples in Lab 3 using SPI interface
1) P0 = address of start_array[0];
2) P1 = address of final_array[0];
3) R0 = number of data items to be transferred
4) R1 = how many values already transferred

        R1 = 0;
    LOOP:
        CC = R0 <= R1;
        IF CC JUMP DONE;
        R2 = [P0++];
        [P1++] = R2;
        R1 += 1;
        JUMP LOOP;
    DONE:
        // Do something else

• INSTRUCTION BUS STALLS EVERY TIME THE CODE JUMPS -- LOSE 4 CYCLES
• Must wait to do something else
• VERY BIG PIPELINE LATENCY ISSUES
• MANY INTERNAL PROCESSOR STALLS ON DATA BUS WHILE WAITING FOR R2 TO BE READ, STORED and then TRANSMITTED
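
For comparison, the software copy loop above corresponds roughly to the following C routine. This is a sketch: the name `copy_words` is illustrative, and 32-bit elements are assumed because the loop moves one register-sized value per pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Processor-driven copy: every element passes through a register, and
   the processor can do nothing else until the loop finishes. */
void copy_words(const int32_t *src, int32_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    size_t done = 0;          /* values already transferred       */
    while (done < count) {    /* leave the loop once done >= count */
        *dst++ = *src++;      /* read one value, then store it     */
        done++;               /* count the value just moved        */
    }
}
```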

We are only going to look at a simple DMA task
• DMA: special hardware that works without the processor
1) DMA_source_address_register = address of start_array[0];
2) DMA_destination_address_register = address of final_array[0];
3) DMA_max_count_register = max-value needed to transfer
4) DMA_count_register = how many values already transferred
5) DMA_enable = true
Compare with the software loop:

        R1 = 0;
    LOOP:
        CC = R0 <= R1;
        IF CC JUMP DONE;
        R2 = [P0++];
        [P1++] = R2;
        R1 += 1;
        JUMP LOOP;
    DONE:
        // Do something else

• DMA transfer happens in background
• Minimized pipeline issues
• Processor can do something else immediately while DMA hardware handles all the memory transfers WITHOUT PROCESSOR HELP.
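
The five setup steps above can be mimicked in plain C to make the hand-off explicit. This is a software model only. The struct fields mirror the slide's descriptive register names, not the actual Blackfin memory DMA registers, and `dma_model_step` is a hypothetical stand-in for the hardware moving one value while the processor does other work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const int32_t *source_address;   /* 1) start of source array      */
    int32_t *destination_address;    /* 2) start of destination array */
    size_t max_count;                /* 3) values to transfer         */
    size_t count;                    /* 4) values already transferred */
    bool enable;                     /* 5) transfer in progress       */
} DmaModel;

/* Processor side: fill in the "registers" and enable the transfer. */
void dma_model_start(DmaModel *dma, const int32_t *src, int32_t *dst, size_t n)
{
    assert(dma != NULL && src != NULL && dst != NULL);
    dma->source_address = src;
    dma->destination_address = dst;
    dma->max_count = n;
    dma->count = 0;
    dma->enable = true;
}

/* "Hardware" side: moves at most one value per call and clears enable
   when the count is reached. Returns true while work remains. */
bool dma_model_step(DmaModel *dma)
{
    assert(dma != NULL);
    if (!dma->enable)
        return false;
    if (dma->count < dma->max_count) {
        dma->destination_address[dma->count] = dma->source_address[dma->count];
        dma->count++;
    }
    if (dma->count >= dma->max_count)
        dma->enable = false;
    return dma->enable;
}
```

In this model the caller is free to do unrelated work between calls to `dma_model_step`, which is the point the slide makes about the processor not waiting on the copy.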

Write some tests so we know how to proceed -- Test 1
Is DMA useful when the arrays being moved are in the processor’s internal memory and placed on the stack, as with this code?

Write some tests so we know how to proceed -- Test 2
Is DMA useful when both the arrays are placed in external memory?
• SDRAM is needed for large video images
• SDRAM -- MANY MEGS AVAILABLE
• SDRAM addresses hardcoded in this example

Write some tests so we know how to proceed -- Test 3
Most probable way to use DMA:
• Store video arrays in SLOW external memory
• Move to FAST internal memory for processing, put result back into external memory
• SDRAM addresses hardcoded in this example
• WAIL -- Can use compiler section (“SDRAM”) syntax
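
The pattern in Test 3 (stage in, process, stage out) can be sketched in portable C. Everything here is a stand-in. `fast_internal` is an ordinary array rather than L1 memory, `memcpy` takes the place of a DMA transfer, halving each sample is a placeholder for real processing, and `process_via_fast_buffer` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_LEN 300  /* transfer size used in the timing tests */

static short fast_internal[BLOCK_LEN];  /* stand-in for an L1 working buffer */

/* Copy a block from slow memory into the fast buffer, process it
   there, then copy the result back to where it came from. */
void process_via_fast_buffer(short *external, size_t n)
{
    assert(external != NULL && n <= BLOCK_LEN);
    memcpy(fast_internal, external, n * sizeof(short));   /* stage in  */
    for (size_t i = 0; i < n; i++)
        fast_internal[i] = (short)(fast_internal[i] / 2); /* process   */
    memcpy(external, fast_internal, n * sizeof(short));   /* stage out */
}
```

On the real target the two `memcpy` calls are the candidates for replacement by background DMA transfers.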

Some results (cycles). Code details later

                                Compiler Debug Mode   Compiler Release Mode
L1 internal memory              8748                  625
L1 DMA                          6579                  6477 (DMA slower)
SDRAM external                  39132                 28200
SDRAM DMA                       12175                 12090
SDRAM L1 DMA                    5265                  4836
SDRAM L1 DMA, L1 SDRAM DMA      9792                  9276

Memory to memory move -- Debug Code

Review for final
A) What happened here?
B) What happened here?
C) What happened here?
D) Why did this happen?
E) What happened here?
F) Determine loop efficiency
   - in terms of instructions
   - in terms of cycles / read_write op

Answer questions
• A
• B
• C
• D
• E

Review for final -- Internal memory to internal memory
F) Determine loop efficiency in terms of cycles / read_write op
internal memory -> internal memory, size was 300
• Useful reads: 300
• Useful writes: 300
• Cycles: 8748 as measured
• 8748 / 600 = 14.58. Why not an exact number?
• Instructions in loop? 19
• Total # of reads / writes: 9 / loop
• 2700 reads / writes: around 3 cycles each

Review for final -- SDRAM to SDRAM
F) Determine loop efficiency in terms of cycles / read_write op
SDRAM external -> SDRAM memory
• Useful reads / writes: 300 each
• Cycles: 39132 as measured
• 39132 / 600 = 65.22. Why not an exact number?
• Instructions in loop? 19
• Total # of reads / writes: 9 / loop
  - 7 * 300 reads / writes internal
  - 2 * 300 reads / writes external
• Time r/w external = 39132 - 2100 * 3, about 33000
• 33000 / 600 = 55 cycles per external read / write, roughly a factor of 18 slower than the 3 cycles of an internal access
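
The efficiency figures on the two debug-mode slides come from simple ratios. The C sketch below reproduces that arithmetic from the measured cycle counts. The helper names are illustrative, and the 3 cycles per internal access is the slides' own rough estimate rather than a measured value.

```c
#include <assert.h>

/* Measured cycles divided by the number of useful reads and writes. */
double cycles_per_useful_op(long cycles, long reads, long writes)
{
    assert(reads + writes > 0);
    return (double)cycles / (double)(reads + writes);
}

/* Estimated cycles per external access, after subtracting the assumed
   cost of the internal accesses from the measured total. */
double external_access_cost(long total_cycles, long internal_ops,
                            double cycles_per_internal_op, long external_ops)
{
    assert(external_ops > 0);
    double internal_cycles = (double)internal_ops * cycles_per_internal_op;
    return ((double)total_cycles - internal_cycles) / (double)external_ops;
}
```

`cycles_per_useful_op(8748, 300, 300)` gives 14.58 and `cycles_per_useful_op(39132, 300, 300)` gives 65.22. `external_access_cost(39132, 2100, 3.0, 600)` gives about 54.7 cycles per SDRAM access.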

Memory to memory move -- Release Mode

Review for final
A) What happened here?
B) What happened here?
C) What happened here?
D) Why did this happen inside loop?
E) What happened here?
F) Determine loop efficiency
   - in terms of instructions
   - in terms of cycles / read_write op

Answer questions
• A
• B
• C
• D
• E

Release mode internal to internal
F) Determine loop efficiency in terms of cycles / read_write op
internal memory -> internal memory, size was 300
• Useful reads: 300
• Useful writes: 300
• Cycles: 625 as measured
• 625 / 600 = 1.04. Why not an exact number?
• Instructions in loop? 4
• 300 * 4 = 1200
• WE WOULD EXPECT 1200 cycles!!!! Where did the difference go?

Release mode external to external
F) Determine loop efficiency in terms of cycles / read_write op
SDRAM -> SDRAM, size was 300
• Useful reads: 300
• Useful writes: 300
• Cycles: 28200 as measured
• 28200 / 600 = 47
• SDRAM access: 47 cycles
• L1 memory: 1 cycle
• Would make sense to process in L1 memory, so move SDRAM to L1 to process

External to internal
F) Determine loop efficiency in terms of cycles / read_write op
SDRAM -> internal memory, size was 300
• Useful reads: 300
• Useful writes: 300
• Cycles: 4836 as measured
• 300 of those are L1 writes, leaving about 4500 / 300 = 15 cycles per SDRAM read
• SDRAM read before: 47 cycles
• SDRAM read now: 15 cycles
• L1 -> L1: 1 cycle
• Loads of overhead in SDRAM to SDRAM
• Would make sense to process in L1 memory, so move SDRAM to L1 to process

Tackled today
• Review of handling external arrays (global arrays) from assembly code
  - Arrays declared in another file
  - Arrays declared in this file -- NEW
  - Needed for arrays used by ISRs
• Arrays declared on the stack
  - Pointers passed as parameters to a subroutine
  - Can’t use arrays on the stack when used by ISR

• Information taken from Analog Devices On-line Manuals with permission: http://www.analog.com/processors/resources/technicalLibrary/manuals/
• Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices.
Copyright Analog Devices, Inc. All rights reserved.