Andrea Michelotti Atmel WP 2 Leader Andrea Michelotti

  • Slides: 20
Download presentation
Andrea Michelotti, Atmel WP 2 Leader Andrea Michelotti - Atmel Toolchain Overview & Demo

Andrea Michelotti, Atmel WP 2 Leader Andrea Michelotti - Atmel Toolchain Overview & Demo h. Artes 2010 Final Review Toolchain Overview & Demo 1/20

 • Key Achievements • Developing an Application for a Heterogeneous Platform – Initialization

• Key Achievements • Developing an Application for a Heterogeneous Platform – Initialization – Sharing resources – Calling a DSP function – Targeting Execution – Expressing parallelism • Conclusions Andrea Michelotti - Atmel Toolchain Overview & Demo Agenda 2/20

The h. Artes toolchain… • dramatically minimizes learning curve • hides heterogeneous complexity: Øno

The h. Artes toolchain… • dramatically minimizes learning curve • hides heterogeneous complexity: Øno expert knowledge of the platform, tools or new programming languages Øuse ‘C’ or tools that produce ‘C’ (Scilab/Nutech) • automatically speeds-up application (in respect to GPP-only execution) by exploiting Processor Element capabilities Andrea Michelotti - Atmel Toolchain Overview & Demo Key Achievements (1) 3/20

Easily retarget new platforms: Øbasic OMAP porting took ~1 month Øother popular platforms like

Easily retarget new platforms: Øbasic OMAP porting took ~1 month Øother popular platforms like GPUs can be targeted as well Ømust conform to Molen machine Quickly evaluate how an application behaves on different architectures Øimportant for time to market Andrea Michelotti - Atmel Toolchain Overview & Demo Key Achievements (2) 4/20

Heterogeneous Platforms: ØAtmel DEB (Diopsis Evaluation Board): ARM 9 GPP + m. Agic. V

Heterogeneous Platforms: ØAtmel DEB (Diopsis Evaluation Board): ARM 9 GPP + m. Agic. V DSP ØUni. FE’s h. Artes HW Platform: ARM 9 GPP + m. Agic. V DSP + Xilinx FPGA ØScaleo’s h. Artes Emulation Platform: ARM 9 GPP + m. Agic. V DSP + Altera FPGA ØTI Experimenter OMAPL 138: ARM 9 GPP + C 67 TI DSP Andrea Michelotti - Atmel Toolchain Overview & Demo Targeted Platforms/Architectures 5/20

Before h. Artes, toolchains for heterogeneous hardware (Diopsis and OMAP) consist of separate subchains:

Before h. Artes, toolchains for heterogeneous hardware (Diopsis and OMAP) consist of separate subchains: one for GPP and one for DSP. MAIN PROBLEMS: • High learning curve: target tools, architecture, APIs…. • Without knowledge of the underlying software/hardware it’s difficult to use and to benefit from the existence of PEs; • Code maintainability: two distinct projects must be kept aligned; • Code portability: usually GPP code contains specific APIs to load, execute and access PEs resources; An identical C-code cannot be produce correct result across all the platforms; • Debugging: not an unified image, not an unified debugger. Andrea Michelotti - Atmel Toolchain Overview & Demo Developing an Application for a Heterogeneous Platform in a nutshell 6/20

Suppose we want to port our legacy code on a powerful but heterogeneous architecture

Suppose we want to port our legacy code on a powerful but heterogeneous architecture like OMAP or DIOPSIS… C code running on my host PC void main(){ … unsigned *shared_array=malloc(SIZE); … my_fft(int param, shared_array…); another_kernel(…); … } Seems easy, but, after a while, becomes a nightmare! … Andrea Michelotti - Atmel Toolchain Overview & Demo Writing an application for a heterogeneous platform in a nutshell 7/20

Initialization int main(int argc, char**argv){ … if (DSP_SUCCEEDED (status)) { status = PROC_setup (NULL)

Initialization int main(int argc, char**argv){ … if (DSP_SUCCEEDED (status)) { status = PROC_setup (NULL) ; } if (DSP_SUCCEEDED (status)) { status = PROC_attach (processor. Id, NULL) ; if (DSP_FAILED (status)) { RDWR_1 Print ("PROC_attach failed. Status: [0 x%x]n", status) ; } } else { RDWR_1 Print ("PROC_setup failed. Status: [0 x%x]n", status) ; } if (DSP_SUCCEEDED (status)) { args [0] = str. Num. Iterations ; { status = PROC_load (processor. Id, dsp. Executable_myfft, NUM_ARGS, args) ; } if (DSP_FAILED (status)) { RDWR_1 Print ("PROC_load failed. Status: [0 x%x]n", status) ; } } if (DSP_SUCCEEDED (status)) { status = PROC_start (processor. Id) ; if (DSP_FAILED (status)) { RDWR_1 Print ("PROC_start failed. Status: [0 x%x]n", status) ; } } Use of specific API to access DSP. The DSP image is loaded as a file. The DSP has its own main DSP code Andrea Michelotti - Atmel Toolchain Overview & Demo GPP code OMAP toolchain (through dsplink) 8/20 void myfft(); int main(int argc, char**argv){ myfft(); }

Initialization GPP code int main (int argc, char *argv[]){. . ret = m. Agic.

Initialization GPP code int main (int argc, char *argv[]){. . ret = m. Agic. V_load_PM("myfft. bin", _m_fd_extm); if(ret!=0) return ret; ret = m. Agic. V_load_DM(“myfft_datamem. bin"); if(ret!=0) return ret; ret = m. Agic. V_load_XM(“myfft_extmem. bin", (0 x 365890)); if(ret!=0) return ret; ret = m. Agic. V_init_PMU(); if(ret!=0) return ret; magic. V_start(); . . magic. V_wait(); . . } Very low API to access DSP. The DSP image is binary loaded as a file. The DSP has its own main DSP code void myfft(); int main(int argc, char**argv){ … myfft(); … } Andrea Michelotti - Atmel Toolchain Overview & Demo Diopsis toolchain case 9/20

Initialization GPP/Application code int main (int argc, char *argv[]){. . } The h. Artes

Initialization GPP/Application code int main (int argc, char *argv[]){. . } The h. Artes Toolchain and h. Artes Runtime take care of hiding all the initialization details. JUST one code. Andrea Michelotti - Atmel Toolchain Overview & Demo h. Artes toolchain case 10/20

GPP code OMAP toolchain case int main(int argc, char**argv){ SMAPOOL_Attrs pool. Attrs. buf. Sizes

GPP code OMAP toolchain case int main(int argc, char**argv){ SMAPOOL_Attrs pool. Attrs. buf. Sizes = (Uint 32 *) &size ; pool. Attrs. num. Buffers = (Uint 32 *) &num. Bufs ; pool. Attrs. num. Buf. Pools = NUM_BUF_SIZES ; pool. Attrs. exact. Match. Req = TRUE ; volatile unsigned* my_shared_array; status = POOL_open (POOL_make. Pool. Id(processor. Id, SAMPLE_POOL_ID), &pool. Attrs) ; if (DSP_FAILED (status)) { MPCSXFER_1 Print ("POOL_open () failed. Status = [0 x%x]n", status) ; } } if (DSP_SUCCEEDED (status)) { status = POOL_alloc (POOL_make. Pool. Id(processor. Id, SAMPLE_POOL_ID), &my_shared_array, SIZE, DSPLINK_BUF_ALIGN)) ; /* Get the translated DSP address to be sent to the DSP. */ if (DSP_SUCCEEDED (status)) { status = POOL_translate. Addr ( POOL_make. Pool. Id(processor. Id, SAMPLE_POOL_ID), &dsp. Ctrl. Buf, Addr. Type_Dsp, (Void *) &my_shared_array_from_dsp, Addr. Type_Usr) ; Use of specific API to create a shared area, translate the address for DSP and then pass the translated address to DSP via messages. Memory Layout must be configured by recompiling drivers!! Andrea Michelotti - Atmel Toolchain Overview & Demo Sharing resources 11/20 DSP code int main(int argc, char**argv){ DSPlink_init(); …. }

GPP code Diopsis toolchain case #define MY_SHARED_ARRAY_ADDR 2 int main(int argc, char**argv){. . unsigned

GPP code Diopsis toolchain case #define MY_SHARED_ARRAY_ADDR 2 int main(int argc, char**argv){. . unsigned local_my_shared_array[]; m. Agic. V_read_buff(local_my_shared_array, MY_SHARED_ARRAY_ADDR, si zeof(my_shared_array)); . . // modify local copy, write back m. Agic. V_write_buff(local_my_shared_array, MY_SHARED_ARRAY_ADDR, s izeof(my_shared_array)); … DSP code Very raw access, sharing directly addresses that are manually mapped, using local copy and write back. Use of specific APIs and Specific compiler directives. NO LINKER used, many problems of source alignment, debug #define MY_SHARED_ARRAY_ADDR 2 volatile long chess_storage(DATA: MY_SHARED_ARRAY_ADDR ) my_shared_array; volatile long chess_storage(DATA: MY_SHARED_ARRAY_ADDR+SIZEOF(my_shared_array ) my_other_variable; int main(int argc, char**argv){ // access the variable } Andrea Michelotti - Atmel Toolchain Overview & Demo Sharing resources 12/20

h. Artes toolchain case GPP code Intuitive and portable. int main(int argc, char**argv){. .

h. Artes toolchain case GPP code Intuitive and portable. int main(int argc, char**argv){. . Unsigned* my_shared_array; my_shared_array = malloc(MYSIZE); #pragma map call_hw dsp 0 dsp_func(my_shared_array); DSE turns automatically malloc into hmalloc (h. Artes API), that allocates and traces memory in a shared physical space of the target platform. Andrea Michelotti - Atmel Toolchain Overview & Demo Sharing resources 13/20 Very natural access

GPP code OMAP/Diopsis toolchain cases GPP code int main(int argc, char**argv){ … // initialization,

GPP code OMAP/Diopsis toolchain cases GPP code int main(int argc, char**argv){ … // initialization, see main if (DSP_SUCCEEDED (status)) { status = PROC_start (processor. Id) ; if (DSP_FAILED (status)) { RDWR_1 Print ("PROC_start failed. Status: [0 x%x]n", status) ; } } … m. Agic. V_start() … DSP code void my_fft(int pp, float*…); int main(int argc, char**argv){ … // initialization, see main my_fft(); … } There is not concept of DSP call from GPP, the GPP can start a DSP process that executes the desired function. The call can be inefficient or maybe cannot be executed correctly on the DSP. The programmer must know the underlying architecture! For example the DSP in the DIOPSIS architecture the type int is 16 bit wide. Typically is 32 bit. Andrea Michelotti - Atmel Toolchain Overview & Demo Calling a DSP routine 14/20

Calling a DSP routine GPP code void my_fft(…){ … } int main(int argc, char**argv){.

Calling a DSP routine GPP code void my_fft(…){ … } int main(int argc, char**argv){. . #pragma call_hw dsp 0 my_fft(); . . } Intuitive and portable. DSE checks if the my_fft function can be executed on the target DSP (checks parameters, used stack memory). It also estimates the cost of the call to decide if it’s convenient to move the execution on the DSP. To call the DSP function, the DSE adds a pragma to the function call and generate a C-source that can be compiled by the DSP toolchain. Andrea Michelotti - Atmel Toolchain Overview & Demo h. Artes toolchain case 15/20

Expressing Parallelism NOT KNOWN/NOT IMPLEMENTED Andrea Michelotti - Atmel Toolchain Overview & Demo OMAP/Diopsis

Expressing Parallelism NOT KNOWN/NOT IMPLEMENTED Andrea Michelotti - Atmel Toolchain Overview & Demo OMAP/Diopsis toolchain 16/20

Expressing Parallelism Void main(){ Intuitive and portable. … h. Artes supports some open. MP

Expressing Parallelism Void main(){ Intuitive and portable. … h. Artes supports some open. MP construct to express parallelism. #pragma omp parallel sections { #pragma omp section { #pragma call_hw dsp 0 my_fft(); } #pragma omp section { another_kernel(…); } } } DSE in some case automatically detects kernels that can go in parallel and adds open. MP annotations to the c-source. The parallelism can be also explicited via POSIX threads #pragma call_hw dsp 0 void my_fft(…); int main(int argc, char**argv){ … // initialization, see hthread_create(my_fft()…); Another_kernel(); hthread_join(); } Andrea Michelotti - Atmel Toolchain Overview & Demo h. Artes toolchain case 17/20

Target Execution Two separate binaries, not common symbols, not unified debugger, I/O messages (printf)

Target Execution Two separate binaries, not common symbols, not unified debugger, I/O messages (printf) often relies on jtag connection. RUN and HOPE! Andrea Michelotti - Atmel Toolchain Overview & Demo (under Linux) OMAP/Diopsis toolchain 18/20 bash$. /my_fft_arm. elf <my_fft_dsp. bin>

Target Execution Single ELF binary, common symbols, unified debugger, I/O messages (printf) on the

Target Execution Single ELF binary, common symbols, unified debugger, I/O messages (printf) on the target. RUN! Andrea Michelotti - Atmel Toolchain Overview & Demo (under Linux) h. Artes toolchain 19/20 $bash. /my_fft. elf

 • Although the original “Brain to Bit” (B 2 B) objective was very

• Although the original “Brain to Bit” (B 2 B) objective was very ambitious, the h. Artes toolchain fulfilled its original promise: to support software development of heterogeneous hardware without expert knowledge of the target platform, and therefore allowing developers to achieve high-performance applications through complete automated solutions and by abstracting low-level hardware details. • Areas of improvement regards mainly data flow analysis, automatic parallelization, AET integration in Eclipse, debugging capabilities. Andrea Michelotti - Atmel Toolchain Overview & Demo Conclusions 20/20