Selected MaxCompiler Examples
Sasa Stojanovic, stojsasa@etf.rs

  • Slides: 96

Introduction: How-to? What-to?
• One has to know how to program Maxeler machines in order to get the best possible speedup out of them!
• For some applications (G), there is a large difference between what an experienced programmer achieves and what an inexperienced one can achieve!
• For some other applications (B), no matter how experienced the programmer is, the speedup will not be revolutionary (it may even be below 1).

Introduction: Lemmas
• Lemmas:
◦ 1. The how-to and how-not-to is important to know!
◦ 2. The what-to and what-not-to is important to know!
• N.B.
◦ The how-to is taught through most of the examples to follow (all except the introductory ones).
◦ The what-to/what-not-to is taught using a figure.

Introduction: The Essential Figure
$$t_{CPU} = N \cdot N_{OPS} \cdot C_{CPU} \cdot T_{clk,CPU} / N_{cores,CPU}$$
$$t_{GPU} = N \cdot N_{OPS} \cdot C_{GPU} \cdot T_{clk,GPU} / N_{cores,GPU}$$
$$t_{DF} = N_{OPS} \cdot C_{DF} \cdot T_{clk,DF} + (N - 1) \cdot T_{clk,DF} / N_{DF}$$
Assumptions:
1. The software includes enough parallelism to keep all cores busy.
2. The only limiting factor is the number of cores.
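To get a feeling for the three formulas, they can be evaluated directly. The following is a minimal plain-Java sketch, not MaxCompiler code. The slide does not define its symbols, so the reading used here is an assumption (N as the number of data items, NOPS as operations per item, C as clock cycles per operation, Tclk as the clock period, NDF as the number of parallel dataflow pipes), and the workload numbers in `main` are hypothetical.

```java
/** Execution-time model for the three formulas; all times are in seconds. */
final class EssentialFigure {
    // Assumed meaning of the symbols (the slide does not define them):
    // n = data items, nOps = operations per item, c* = clock cycles per operation,
    // tClk* = clock period, cores* / nDf = number of parallel cores or dataflow pipes.

    static double tCpu(long n, long nOps, double cCpu, double tClkCpu, int coresCpu) {
        return n * nOps * cCpu * tClkCpu / coresCpu;
    }

    static double tGpu(long n, long nOps, double cGpu, double tClkGpu, int coresGpu) {
        return n * nOps * cGpu * tClkGpu / coresGpu;
    }

    static double tDf(long n, long nOps, double cDf, double tClkDf, int nDf) {
        // First term: time until the first result leaves the pipeline.
        // Second term: one further result per clock period and per pipe.
        return nOps * cDf * tClkDf + (n - 1) * tClkDf / nDf;
    }

    public static void main(String[] args) {
        long n = 1_000_000, nOps = 200; // hypothetical workload
        System.out.printf("CPU: %.6f s%n", tCpu(n, nOps, 1.0, 0.25e-9, 4));
        System.out.printf("DF : %.6f s%n", tDf(n, nOps, 1.0, 10e-9, 1));
    }
}
```

Note that NOPS multiplies N in the CPU and GPU formulas but appears only in the constant term of the dataflow formula, which is why the number of operations per iteration decides which side wins.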

Introduction: Bottomline
• When is Maxeler better?
◦ If the number of operations in a single loop iteration is above some critical value [ADDITIVE SPEEDUP ENABLER],
◦ then more data items means more advantage for Maxeler [ADDITIVE SPEEDUP MAKER].
• In other words:
◦ More data does not mean better performance if the #operations/iteration is below a critical value.
• Conclusion:
◦ If we see an application with a small #operations/iteration, it is possibly (not always) a "what-not-to" application, and we had better execute it on the host; otherwise, we will (or may) have a slowdown.

Introduction: To have it more concrete
• Maxeler: one new result in each cycle
  e.g. Clock = 100 MHz, Period = 10 ns, one result every 10 ns
  [no matter how many operations are in each loop iteration]
  Consequently: more operations does not mean proportionally more time; however, more operations means higher latency until the first result.
• CPU: one new result after each iteration
  e.g. Clock = 10 GHz (!?), Period = 100 ps, one result every 100 ps times #ops
  [If #ops > 100 => Maxeler is better, although it uses a slower clock]
  Also: the CPU example will feature an additional slowdown, due to memory hierarchy access and pipeline related hazards => the critical #ops (bringing the same performance) is significantly below 100!
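The break-even point quoted on this slide follows from dividing the two clock periods. A small plain-Java sketch of that arithmetic, using the slide's idealized model (one operation per CPU clock period, no memory or pipeline stalls); the class and method names are illustrative only:

```java
/** Break-even arithmetic for the clock periods quoted on the slide (in picoseconds). */
final class BreakEven {
    /** CPU time per result when each operation costs one clock period. */
    static long cpuPsPerResult(long ops, long cpuPeriodPs) {
        return ops * cpuPeriodPs;
    }

    /** Smallest #ops at which the CPU time per result reaches the dataflow period. */
    static long criticalOps(long dataflowPeriodPs, long cpuPeriodPs) {
        return (dataflowPeriodPs + cpuPeriodPs - 1) / cpuPeriodPs; // ceiling division
    }

    public static void main(String[] args) {
        long dataflowPeriodPs = 10_000; // 100 MHz  -> 10 ns
        long cpuPeriodPs = 100;         // 10 GHz   -> 100 ps
        System.out.println("critical #ops = " + criticalOps(dataflowPeriodPs, cpuPeriodPs));
    }
}
```

Above the critical value, every additional operation per iteration costs the CPU another clock period per result, while the dataflow engine still delivers one result per period.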

Introduction: Don't misunderstand!
• Maxeler has no cache, but does have a memory hierarchy.
• However, memory hierarchy access with Maxeler is carefully planned by the programmer at program write time,
• as opposed to memory hierarchy access with a multicore CPU/GPU, which calculates the access address at program run time.

Introduction: Teaching by Questioning
• Now we are ready for examples which show the how-to.
• My questions, from time to time, will ask you about the time consequences of how-not-to alternatives.

Introduction: N.B.
• We have chosen many simple examples [small steps] which together build a realistic application [mountain top].
[Figure: father and three sons; 1-stick bunches vs. a 3-stick bunch]

Introduction: N.B.
• Java to configure Maxeler! C to program the host!
• One or more kernels! Only one manager!
• In theory, the simulator builder is not needed if a card is used. In practice, you need it until the testing is over, since the compilation process is slow for hardware and fast for software (simulator).

Content 1/2
• E#1: Hello world
• E#2: Vector addition
• E#3: Type mixing
• E#4: Addition of a constant and a vector
• E#5: Input/output control
• E#6: Conditional execution
• E#7: Moving average 1D
• E#8: Moving average 2D
• E#9: Array summation
• E#10: Optimization of E#9

Content 2/2
• E#11 to E#20: TBD

Example No. 1: Hello World!
• Write a program that sends the "Hello World!" string to the MAX2 card, for the MAX2 card kernel to return it back to the host.
• To be learned through this example:
◦ How to make the configuration of the accelerator (MAX2 card) using Java:
   - How to make a simple kernel (ops description) using Java (the only language),
   - How to write the standard manager (config description based on kernel(s)) using Java,
◦ How to test the kernel using a test (code+data) written in Java,
◦ How to compile the Java code for MAX2,
◦ How to write a simple C code that runs on the host and triggers the kernel:
   - How to write the C code that streams data to the kernel,
   - How to write the C code that accepts data from the kernel,
◦ How to simulate and execute an application program in C that runs on the host and periodically calls the accelerator.

Example No. 1: Standard Files in a MAX Project
• One or more kernel files, to define operations of the application:
◦ <app_name>Kernel[<additional_name>].java
• One (or more) Java file, for simulation of the kernel(s):
◦ <app_name>SimRunner.java
• One manager file for transforming the kernel(s) into the configuration of the MAX card (instantiation and connection of kernels):
◦ <app_name>Manager.java
• Simulator builder:
◦ <app_name>HostSimBuilder.java
• Hardware builder:
◦ <app_name>HWBuilder.java
• Application code that uses the MAX card accelerator:
◦ <app_name>HostCode.c
• Makefile:
◦ A script file that defines the compilation related commands

Example No. 1: example1Kernel.java

    package ind.z1;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
    import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

    public class helloKernel extends Kernel {
        public helloKernel(KernelParameters parameters) {
            super(parameters);

            // Input: one 8-bit integer per cycle
            HWVar x = io.input("x", hwInt(8));

            // Pass the input through unchanged
            HWVar result = x;

            // Output:
            io.output("z", result, hwInt(8));
        }
    }

Note: it is possible to substitute the last three lines with a single statement.

Example No. 1: example1SimRunner.java

    package ind.z1;

    import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

    public class helloSimRunner {
        public static void main(String[] args) {
            SimulationManager m = new SimulationManager("helloSim");
            helloKernel k = new helloKernel( m.makeKernelParameters() );
            m.setKernel(k);

            m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8);
            m.setKernelCycles(8);
            m.runTest();

            m.dumpOutput();
            double expectedOutput[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
            m.checkOutputData("z", expectedOutput);
            m.logMsg("Test passed OK!");
        }
    }

Example No. 1: example1HostSimBuilder.java

    package ind.z1;

    import static config.BoardModel.BOARDMODEL;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.managers.standard.Manager;
    import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

    public class helloHostSimBuilder {
        public static void main(String[] args) {
            // "true" selects a build for the host simulator
            Manager m = new Manager(true, "helloHostSim", BOARDMODEL);
            Kernel k = new helloKernel(m.makeKernelParameters("helloKernel"));
            m.setKernel(k);
            m.setIO(IOType.ALL_PCIE);
            m.build();
        }
    }

Example No. 1: example1HWBuilder.java

    package ind.z1;

    import static config.BoardModel.BOARDMODEL;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.managers.standard.Manager;
    import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

    public class helloHWBuilder {
        public static void main(String[] args) {
            Manager m = new Manager("hello", BOARDMODEL);
            Kernel k = new helloKernel( m.makeKernelParameters() );
            m.setKernel(k);
            m.setIO(IOType.ALL_PCIE);
            m.build();
        }
    }

Example No. 1: example1HostCode.c 1/2

    #include <stdio.h>
    #include <MaxCompilerRT.h>

    int main(int argc, char* argv[])
    {
        char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
        max_maxfile_t* maxfile;
        max_device_handle_t* device;
        char data_in1[16] = "Hello world!";
        char data_out[16];

        printf("Opening and configuring FPGA.\n");
        maxfile = max_maxfile_init_hello();
        device = max_open_device(maxfile, device_name);
        max_set_terminate_on_error(device);

Example No. 1: example1HostCode.c 2/2

        printf("Streaming data to/from FPGA...\n");
        max_run(device,
                max_input("x", data_in1, 16 * sizeof(char)),
                max_output("z", data_out, 16 * sizeof(char)),
                max_runfor("helloKernel", 16),
                max_end());

        printf("Checking data read from FPGA.\n");

        max_close_device(device);
        max_destroy(maxfile);
        return 0;
    }

Example No. 1: Makefile

    # Root of the project directory tree
    BASEDIR=../..
    # Java package name
    PACKAGE=ind/z1
    # Application name
    APP=example1
    # Names of your maxfiles
    HWMAXFILE=$(APP).max
    HOSTSIMMAXFILE=$(APP)HostSim.max
    # Java application builders
    HWBUILDER=$(APP)HWBuilder.java
    HOSTSIMBUILDER=$(APP)HostSimBuilder.java
    SIMRUNNER=$(APP)SimRunner.java
    # C host code
    HOSTCODE=$(APP)HostCode.c
    # Target board
    BOARD_MODEL=23312
    # Include the master makefile.include
    nullstring :=
    space := $(nullstring) # comment
    MAXCOMPILERDIR_QUOTE:=$(subst $(space),\ ,$(MAXCOMPILERDIR))
    include $(MAXCOMPILERDIR_QUOTE)/examples/common/Makefile.include

Example No. 1: BoardModel.java

    package config;

    import com.maxeler.maxcompiler.v1.managers.MAX2BoardModel;

    public class BoardModel {
        public static final MAX2BoardModel BOARDMODEL = MAX2BoardModel.MAX2336B;
    }

Types: Hardware Types

Types: Hardware Primitive Types
• Floating point numbers - HWFloat:
◦ hwFloat(exponent_bits, mantissa_bits);
◦ float ~ hwFloat(8, 24)
◦ double ~ hwFloat(11, 53)
• Fixed point numbers - HWFix:
◦ hwFix(integer_bits, fractional_bits, sign_mode)
   - SignMode.UNSIGNED
   - SignMode.TWOSCOMPLEMENT
• Integers - HWFix:
◦ hwInt(bits) ~ hwFix(bits, 0, SignMode.TWOSCOMPLEMENT)
• Unsigned integers - HWFix:
◦ hwUInt(bits) ~ hwFix(bits, 0, SignMode.UNSIGNED)
• Boolean - HWFix:
◦ hwBool() ~ hwFix(1, 0, SignMode.UNSIGNED)
◦ 1 ~ true
◦ 0 ~ false
• Raw bits - HWRawBits:
◦ hwRawBits(width)
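To illustrate what a two's complement fixed point type with a given number of integer and fractional bits can represent, here is a plain-Java software sketch. It is not MaxCompiler code: the class and method names are hypothetical, and the rounding (to nearest) and overflow (saturation) behaviour are assumptions made for this sketch, not statements about how the hardware behaves. The total width is taken as integer_bits + fractional_bits, in line with hwInt(bits) ~ hwFix(bits, 0, SignMode.TWOSCOMPLEMENT).

```java
/** Software model of a signed (two's complement) fixed point encoding. */
final class FixedPointSketch {
    /** Encodes value as a raw integer with the given number of fractional bits. */
    static long encode(double value, int integerBits, int fractionalBits) {
        int totalBits = integerBits + fractionalBits;
        long max = (1L << (totalBits - 1)) - 1;  // largest raw value
        long min = -(1L << (totalBits - 1));     // smallest raw value
        long scaled = Math.round(value * (1L << fractionalBits));
        return Math.max(min, Math.min(max, scaled)); // saturate on overflow (assumption)
    }

    /** Converts a raw fixed point integer back to a double. */
    static double decode(long raw, int fractionalBits) {
        return raw / (double) (1L << fractionalBits);
    }

    public static void main(String[] args) {
        long raw = encode(1.5, 8, 8);
        System.out.println(raw + " -> " + decode(raw, 8));
    }
}
```

With 8 fractional bits the resolution is 1/256, so any value that is not a multiple of 1/256 is stored with a small quantization error.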

Example No. 2: Vector Addition
• Write a program that adds two arrays of floating point numbers.
• The program reads the size of the arrays, makes two arrays with arbitrary content (test inputs), and adds them using a MAX card.

Example No. 2: example2Kernel.java

    package ind.z2;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
    import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

    public class example2Kernel extends Kernel {
        public example2Kernel(KernelParameters parameters) {
            super(parameters);

            // Input
            HWVar x = io.input("x", hwFloat(8,24));
            HWVar y = io.input("y", hwFloat(8,24));

            // Element-wise addition of the two streams
            HWVar result = x + y;

            // Output
            io.output("z", result, hwFloat(8,24));
        }
    }

Example No. 2: example2SimRunner.java

    package ind.z2;

    import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

    public class example2SimRunner {
        public static void main(String[] args) {
            SimulationManager m = new SimulationManager("example2Sim");
            example2Kernel k = new example2Kernel( m.makeKernelParameters() );
            m.setKernel(k);

            m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8);
            m.setInputData("y", 2, 3, 4, 5, 6, 7, 8, 9);
            m.setKernelCycles(8);
            m.runTest();

            m.dumpOutput();
            double expectedOutput[] = { 3, 5, 7, 9, 11, 13, 15, 17 };
            m.checkOutputData("z", expectedOutput);
            m.logMsg("Test passed OK!");
        }
    }

Example No. 2: example2HostSimBuilder.java

    package ind.z2;

    import static config.BoardModel.BOARDMODEL;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.managers.standard.Manager;
    import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

    public class example2HostSimBuilder {
        public static void main(String[] args) {
            Manager m = new Manager(true, "example2HostSim", BOARDMODEL);
            Kernel k = new example2Kernel( m.makeKernelParameters("example2Kernel") );
            m.setKernel(k);
            m.setIO(IOType.ALL_PCIE);
            m.build();
        }
    }

Example No. 2: example2HWBuilder.java

    package ind.z2;

    import static config.BoardModel.BOARDMODEL;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.managers.standard.Manager;
    import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

    public class example2HWBuilder {
        public static void main(String[] args) {
            Manager m = new Manager("example2", BOARDMODEL);
            Kernel k = new example2Kernel( m.makeKernelParameters() );
            m.setKernel(k);
            m.setIO(IOType.ALL_PCIE);
            m.build();
        }
    }

Example No. 2: example2HostCode.c 1/2

    #include <stdio.h>
    #include <stdlib.h>
    #include <MaxCompilerRT.h>

    int main(int argc, char* argv[])
    {
        char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
        max_maxfile_t* maxfile;
        max_device_handle_t* device;
        float *data_in1, *data_in2, *data_out;
        unsigned long N, i;

        printf("Enter size of array: ");
        scanf("%lu", &N);

        data_in1 = malloc(N * sizeof(float));
        data_in2 = malloc(N * sizeof(float));
        data_out = malloc(N * sizeof(float));

        /* Arbitrary test inputs */
        for(i = 0; i < N; i++){
            data_in1[i] = i%10;
            data_in2[i] = i%3;
        }

        printf("Opening and configuring FPGA.\n");

Example No. 2: example2HostCode.c 2/2

        maxfile = max_maxfile_init_example2();
        device = max_open_device(maxfile, device_name);
        max_set_terminate_on_error(device);

        printf("Streaming data to/from FPGA...\n");
        max_run(device,
                max_input("x", data_in1, N * sizeof(float)),
                max_input("y", data_in2, N * sizeof(float)),
                max_output("z", data_out, N * sizeof(float)),
                max_runfor("example2Kernel", N),
                max_end());

        printf("Checking data read from FPGA.\n");
        for(i = 0; i < N; i++){
            if (data_out[i] != i%10 + i%3){
                /* i is unsigned long, hence %lu */
                printf("Error on element %lu. Expected %f, but found %f.",
                       i, (float)(i%10+i%3), data_out[i]);
                break;
            }
        }

        max_close_device(device);
        max_destroy(maxfile);
        return 0;
    }

Example No. 3: Type Mixing
• Do the same as in example no. 2, with the following modification: one input array contains floating point numbers, and the other one contains integers.

Example No. 3: Type Conversion
• Casting here means moving data from one form to another, without changing their essence.
• Type is:
◦ specified for inputs and outputs,
◦ propagated from inputs, down the dataflow graph to outputs,
◦ used to check that the output stream has the correct type.
• If conversion is needed, an explicit conversion (cast) is required.
• How to do it?
◦ Use the method cast in class HWVar.
• Additional hardware is required (especially for conversion to or from floating point numbers),
◦ which introduces additional latency.
• A cast between a floating point number and an integer is done by rounding to the nearest integer!
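The rounding rule in the last bullet differs from what a C or Java programmer usually expects from a cast. The following plain-Java sketch (not MaxCompiler code; names are illustrative) contrasts round-to-nearest with the truncating cast of the host languages. How the hardware breaks exact ties such as 2.5 is not stated on the slide; `Math.round` is used here only as a stand-in.

```java
/** Contrast between round-to-nearest and the truncating cast of Java and C. */
final class CastRounding {
    /** Round-to-nearest conversion, as the slide describes for hardware casts. */
    static int castToNearest(float value) {
        return Math.round(value);
    }

    /** Plain narrowing conversion, which truncates toward zero. */
    static int castTruncating(float value) {
        return (int) value;
    }

    public static void main(String[] args) {
        float[] samples = { 2.7f, -2.7f, 0.4f };
        for (float v : samples) {
            System.out.println(v + ": nearest=" + castToNearest(v)
                    + ", truncated=" + castTruncating(v));
        }
    }
}
```

A host-side check that recomputes the expected results with `(int)` casts can therefore disagree with the kernel output even when the kernel is correct.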

Example No. 3: example3Kernel.java

    package ind.z3;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
    import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

    public class example3Kernel extends Kernel {
        public example3Kernel(KernelParameters parameters) {
            super(parameters);

            // Input
            HWVar x = io.input("x", hwFloat(8,24));
            HWVar y = io.input("y", hwInt(32));

            // The integer stream must be cast explicitly before the addition
            HWVar result = x + y.cast(hwFloat(8,24));

            // Output
            io.output("z", result, hwFloat(8,24));
        }
    }

Example No. 3: example3SimRunner.java

    package ind.z3;

    import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

    public class example3SimRunner {
        public static void main(String[] args) {
            SimulationManager m = new SimulationManager("example3Sim");
            example3Kernel k = new example3Kernel( m.makeKernelParameters() );
            m.setKernel(k);

            m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8);
            m.setInputData("y", 2, 3, 4, 5, 6, 7, 8, 9);
            m.setKernelCycles(8);
            m.runTest();

            m.dumpOutput();
            double expectedOutput[] = { 3, 5, 7, 9, 11, 13, 15, 17 };
            m.checkOutputData("z", expectedOutput);
            m.logMsg("Test passed OK!");
        }
    }

Example No. 3: example3HostCode.c 1/2

    #include <stdio.h>
    #include <stdlib.h>
    #include <MaxCompilerRT.h>

    int main(int argc, char* argv[])
    {
        char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
        max_maxfile_t* maxfile;
        max_device_handle_t* device;
        float *data_in1, *data_out;
        int *data_in2;
        unsigned long N, i;

        printf("Enter size of array: ");
        scanf("%lu", &N);

        data_in1 = malloc(N * sizeof(float));
        data_in2 = malloc(N * sizeof(int));
        data_out = malloc(N * sizeof(float));

        /* Arbitrary test inputs: floats in data_in1, integers in data_in2 */
        for(i = 0; i < N; i++){
            data_in1[i] = i%10;
            data_in2[i] = i%3;
        }

        printf("Opening and configuring FPGA.\n");

Example No. 3: example3HostCode.c 2/2

        maxfile = max_maxfile_init_example3();
        device = max_open_device(maxfile, device_name);
        max_set_terminate_on_error(device);

        printf("Streaming data to/from FPGA...\n");
        max_run(device,
                max_input("x", data_in1, N * sizeof(float)),
                max_input("y", data_in2, N * sizeof(int)),
                max_output("z", data_out, N * sizeof(float)),
                max_runfor("example3Kernel", N),
                max_end());

        printf("Checking data read from FPGA.\n");
        for(i = 0; i < N; i++){
            if (data_out[i] != i%10 + i%3){
                /* i is unsigned long, hence %lu */
                printf("Error on element %lu. Expected %f, but found %f.",
                       i, (float)(i%10+i%3), data_out[i]);
                break;
            }
        }

        max_close_device(device);
        max_destroy(maxfile);
        return 0;
    }

Generating Graph
• Command:
◦ maxRenderGraphs <build_dir>
◦ <build_dir> - directory where the design is compiled
• In the virtual machine, the directory "Desktop/MaxCompiler-Builds" contains the build directories.
• Example for application "example2":
◦ maxRenderGraphs example2HostSim
◦ Renders graphs for the resulting max file

Generating Graph: Final Kernel Graph for Example No. 2

Generating Graph: Final Kernel Graph for Example No. 3

Example No. 4: Addition of a Constant and a Vector
• Write a program that adds a constant to an array that contains floating point numbers.
• Program:
◦ reads the size of the array and the constant that will be added to the elements of the array,
◦ makes one array in an arbitrary way, and
◦ adds the constant to the array using the MAX card.

Example No. 4: example4Kernel.java

    package ind.z4;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
    import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

    public class example4Kernel extends Kernel {
        public example4Kernel(KernelParameters parameters) {
            super(parameters);

            // Input: x is a stream, y is a scalar set once by the host
            HWVar x = io.input("x", hwFloat(8,24));
            HWVar y = io.scalarInput("y", hwFloat(8,24));

            HWVar result = x + y;

            // Output
            io.output("z", result, hwFloat(8,24));
        }
    }

Example No. 4: Other Modifications in Example 4
• example4SimRunner.java:
◦ Before the kernel run, invoke:

    setScalarInput("y", 2);

• example4HostCode.c:
◦ Read const from standard input,
◦ After the device is opened, but before the run, set the scalar inputs:

    max_set_scalar_input_f(device, "example4Kernel.y", const_add, FPGA_A);
    max_upload_runtime_params(device, FPGA_A);

Example No. 5: Input/Output Control
• Do the same as in example no. 4, with the following modification: use controlled inputs and counters.

Example No. 5: example5Kernel.java

    package ind.z5;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
    import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

    public class example5Kernel extends Kernel {
        public example5Kernel(KernelParameters parameters) {
            super(parameters);

            // Cycle counter, used as the input enable
            HWVar ie = control.count.simpleCounter(32);

            // Input: y is read only in the cycle where the counter equals 0
            HWVar x = io.input("x", hwFloat(8,24));
            HWVar y = io.input("y", hwFloat(8,24), ie.eq(0));

            HWVar result = x + y;

            // Output
            io.output("z", result, hwFloat(8,24));
        }
    }

Example No. 5: example5SimRunner.java

    package ind.z5;

    import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

    public class example5SimRunner {
        public static void main(String[] args) {
            SimulationManager m = new SimulationManager("example5Sim");
            example5Kernel k = new example5Kernel( m.makeKernelParameters() );
            m.setKernel(k);

            m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8);
            m.setInputData("y", 2);
            m.setKernelCycles(8);
            m.runTest();

            m.dumpOutput();
            double expectedOutput[] = { 3, 4, 5, 6, 7, 8, 9, 10 };
            m.checkOutputData("z", expectedOutput);
            m.logMsg("Test passed OK!");
        }
    }
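The test data above (eight values on x, a single value on y, eight outputs) shows the effect of the controlled input: y is consumed only in cycle 0 and its value is reused in every later cycle. A plain-Java software model of that behaviour, as a sketch only (it is not MaxCompiler code, the names are illustrative, and the wrap-around of the 32-bit counter is ignored):

```java
import java.util.Arrays;

/** Software model of a stream input that is enabled only in cycle 0. */
final class ControlledInputModel {
    /** x is read every cycle; y is read only when the enable is true. */
    static float[] run(float[] x, float[] y) {
        float[] z = new float[x.length];
        float held = 0f; // last value read from stream y
        int yPos = 0;    // read position in stream y
        for (int cycle = 0; cycle < x.length; cycle++) {
            boolean enable = (cycle == 0); // models the counter comparison ie.eq(0)
            if (enable) {
                held = y[yPos++];
            }
            z[cycle] = x[cycle] + held;
        }
        return z;
    }

    public static void main(String[] args) {
        float[] x = { 1, 2, 3, 4, 5, 6, 7, 8 };
        float[] y = { 2 };
        System.out.println(Arrays.toString(run(x, y)));
    }
}
```

Compared with the scalar input of example no. 4, the constant here travels over a stream, so the host has to supply it as stream data rather than as a runtime parameter.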

Example No. 5: example5HostCode.c 1/2

    #include <stdio.h>
    #include <stdlib.h>
    #include <MaxCompilerRT.h>

    int main(int argc, char* argv[])
    {
        char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
        max_maxfile_t* maxfile;
        max_device_handle_t* device;
        float *data_in1, data_in2[2], *data_out;
        unsigned long N, i;

        printf("Enter size of array: ");
        scanf("%lu%f", &N, data_in2);

        data_in1 = malloc(N * sizeof(float));
        data_out = malloc(N * sizeof(float));

        for(i = 0; i < N; i++)
            data_in1[i] = i%10;

        printf("Opening and configuring FPGA.\n");
        maxfile = max_maxfile_init_example5();
        device = max_open_device(maxfile, device_name);
        max_set_terminate_on_error(device);

Example No. 5: example5HostCode.c 2/2

        printf("Streaming data to/from FPGA...\n");
        max_run(device,
                max_input("x", data_in1, N * sizeof(float)),
                max_input("y", data_in2, 2 * sizeof(float)),
                max_output("z", data_out, N * sizeof(float)),
                max_runfor("example5Kernel", N),
                max_end());

        printf("Checking data read from FPGA.\n");
        for(i = 0; i < N; i++){
            if (data_out[i] != i%10 + data_in2[0]){
                /* i is unsigned long, hence %lu */
                printf("Error on element %lu. Expected %f, but found %f.",
                       i, (float)(i%10+data_in2[0]), data_out[i]);
                break;
            }
        }

        max_close_device(device);
        max_destroy(maxfile);
        return 0;
    }

Example No. 6: Conditional Execution
• Translate the following part of code for the Maxeler MAX2 card:

    for(int i=0; i<N; i++)
        if(a[i] != b[i]){
            c[i] = b[i]-a[i];
            d[i] = a[i]*b[i]/c[i];
        }else {
            c[i] = a[i];
            d[i] = a[i]+b[i];
        }

Example No. 6: example6Kernel.java

    package ind.z6;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
    import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

    public class example6Kernel extends Kernel {
        public example6Kernel(KernelParameters parameters) {
            super(parameters);

            // Input
            HWVar a = io.input("a", hwFloat(8,24));
            HWVar b = io.input("b", hwFloat(8,24));

            // Each ?: selects between two computed values
            HWVar c = ~a.eq(b) ? b-a : a;
            HWVar d = ~a.eq(b) ? a*b/c : a+b;

            // Output
            io.output("c", c, hwFloat(8,24));
            io.output("d", d, hwFloat(8,24));
        }
    }

Example No. 6: example6SimRunner.java

    package ind.z6;

    import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

    public class example6SimRunner {
        public static void main(String[] args) {
            SimulationManager m = new SimulationManager("example6Sim");
            example6Kernel k = new example6Kernel( m.makeKernelParameters() );
            m.setKernel(k);

            m.setInputData("a", 1, 3);
            m.setInputData("b", 2, 3);
            m.setKernelCycles(2);
            m.runTest();

            m.dumpOutput();
            double expectedOutputc[] = { 1, 3 };
            double expectedOutputd[] = { 2, 6 };
            m.checkOutputData("c", expectedOutputc);
            m.checkOutputData("d", expectedOutputd);
            m.logMsg("Test passed OK!");
        }
    }
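In a dataflow kernel there is no branching: both alternatives are computed for every element and a selector picks one of them. The following plain-Java sketch models the example 6 kernel in that style. It is a software model with illustrative names, not MaxCompiler code.

```java
import java.util.Arrays;

/** Software model of example 6: compute both alternatives, then select. */
final class ConditionalModel {
    /** Returns { c, d } for the input arrays a and b. */
    static float[][] run(float[] a, float[] b) {
        float[] c = new float[a.length];
        float[] d = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            boolean differ = a[i] != b[i];

            float cIfDiffer = b[i] - a[i];
            float cIfEqual = a[i];
            c[i] = differ ? cIfDiffer : cIfEqual;

            // Both candidates are always evaluated; float division never throws,
            // so an unused quotient (even 0/0) is simply discarded.
            float dIfDiffer = a[i] * b[i] / c[i];
            float dIfEqual = a[i] + b[i];
            d[i] = differ ? dIfDiffer : dIfEqual;
        }
        return new float[][] { c, d };
    }

    public static void main(String[] args) {
        float[][] out = run(new float[] { 1, 3 }, new float[] { 2, 3 });
        System.out.println("c = " + Arrays.toString(out[0]));
        System.out.println("d = " + Arrays.toString(out[1]));
    }
}
```

The cost of this style is that the hardware for both alternatives is always present, regardless of how often each alternative is selected.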

Example No. 7: Moving Average 1D
• Write a program that calculates the moving average over an array, calculating the average value for each three successive elements of the input array:

$$
avg[i] =
\begin{cases}
(a[0] + a[1]) / 2, & \text{for } i = 0; \\
(a[i-1] + a[i] + a[i+1]) / 3, & \text{for } 0 < i < n-1; \\
(a[n-2] + a[n-1]) / 2, & \text{for } i = n-1.
\end{cases}
$$

Example No. 7: example7Kernel.java

    package ind.z7;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
    import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

    public class example7Kernel extends Kernel {
        public example7Kernel(KernelParameters parameters) {
            super(parameters);

            HWVar N = io.scalarInput("N", hwUInt(64));
            HWVar count = control.count.simpleCounter(64);

            // Input
            HWVar x = io.input("x", hwFloat(8,24));

            // Neighbours outside the array contribute 0; the divider is 2 at the edges
            HWVar result = ( (count>0 ? stream.offset(x,-1) : 0)
                           + x
                           + (count<N-1 ? stream.offset(x,1) : 0) )
                           / (count>0 & count<N-1 ? constant.var(hwFloat(8,24),3) : 2);

            // Output
            io.output("z", result, hwFloat(8,24));
        }
    }
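A host-side reference implementation is useful for checking the kernel output. The sketch below is plain Java with illustrative names; it mirrors the structure of the kernel expression (missing neighbours contribute 0, the divider is 2 at the two ends and 3 elsewhere) rather than calling any MaxCompiler API.

```java
import java.util.Arrays;

/** Reference model of the 1D moving average computed by the kernel. */
final class MovingAverage1D {
    static float[] average(float[] a) {
        int n = a.length;
        float[] avg = new float[n];
        for (int i = 0; i < n; i++) {
            float prev = (i > 0) ? a[i - 1] : 0f;     // like stream.offset(x, -1)
            float next = (i < n - 1) ? a[i + 1] : 0f; // like stream.offset(x, 1)
            float divider = (i > 0 && i < n - 1) ? 3f : 2f;
            avg[i] = (prev + a[i] + next) / divider;
        }
        return avg;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(average(new float[] { 1, 2, 3, 4 })));
    }
}
```

The loop index `i` plays the role of the kernel's cycle counter, and the array length plays the role of the scalar input N.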

Example No. 8: Moving Average 2D
• Write a program that calculates the moving average along a 2D matrix of size MxN.
• Transfer the matrix to the MAX2 card through one stream, row by row.

Example No. 8: example8Kernel.java 1/2

    package ind.z8;

    import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
    import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
    import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain;
    import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.Stream.OffsetExpr;
    import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

    public class example8Kernel extends Kernel {
        public example8Kernel(KernelParameters parameters) {
            super(parameters);

            HWVar M = io.scalarInput("M", hwUInt(32));
            OffsetExpr Nof = stream.makeOffsetParam("Nof", 3, 128);
            HWVar N = io.scalarInput("N", hwUInt(32));

            // j counts rows (outer counter), i counts columns (inner counter)
            CounterChain cc = control.count.makeCounterChain();
            HWVar j = cc.addCounter(M,1);
            HWVar i = cc.addCounter(N,1);

Example No. 8: example8Kernel.java 2/2

            // Input
            HWVar mat = io.input("mat", hwFloat(8,24));

            // Extract 8 point window around current point
            HWVar window[] = new HWVar[9];
            int ii = 0;
            for ( int x=-1; x<=1; x++)
                for ( int y=-1; y<=1; y++)
                    window[ii++] = (i.cast(hwInt(33))+x>=0
                                  & i.cast(hwInt(33))+x<=N.cast(hwInt(33))-1
                                  & j.cast(hwInt(33))+y>=0
                                  & j.cast(hwInt(33))+y<=M.cast(hwInt(33))-1)
                                  ? stream.offset(mat, y*Nof+x) : 0;

            // Sum points in window and divide by 9 (6 on an edge, 4 in a corner) to average
            HWVar sum = constant.var(hwFloat(8, 24), 0);
            for ( HWVar hwVar : window) {
                sum = sum + hwVar;
            }

            HWVar divider = i.eq(0)|i.eq(N-1)|j.eq(0)|j.eq(M-1)
                          ? ((i.eq(0)|i.eq(N-1))&(j.eq(0)|j.eq(M-1))
                              ? constant.var(hwFloat(8,24),4) : 6)
                          : 9;
            HWVar result = sum / divider;

            // Output
            io.output("z", result, hwFloat(8,24));
        }
    }

Example No. 8: example8SimRunner.java

    package ind.z8;

    import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

    public class example8SimRunner {
        public static void main(String[] args) {
            SimulationManager m = new SimulationManager("example8Sim");
            example8Kernel k = new example8Kernel( m.makeKernelParameters() );
            m.setKernel(k);

            m.setInputData("mat", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
            m.setScalarInput("M", 4);
            m.setScalarInput("N", 4);
            m.setStreamOffsetParam("Nof", 4);
            m.setKernelCycles(16);
            m.runTest();

            m.dumpOutput();
            // One expected value per matrix element (4x4 = 16 values)
            double expectedOutput[] = { 3.5, 4, 5, 5.5,
                                        5.5, 6, 7, 7.5,
                                        9.5, 10, 11, 11.5,
                                        11.5, 12, 13, 13.5 };
            m.checkOutputData("z", expectedOutput);
            m.logMsg("Test passed OK!");
        }
    }
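The expected values for the 4x4 test matrix 1..16 used in the simulation runner can be derived with a short reference model. The sketch below is plain Java with illustrative names, not MaxCompiler code. It divides by the number of window cells that fall inside the matrix, which gives the same dividers as the kernel (4 in a corner, 6 on an edge, 9 in the interior) as long as both dimensions are at least 2.

```java
import java.util.Arrays;

/** Reference model of the 3x3 moving average over a row-major matrix. */
final class MovingAverage2D {
    static double[] average(double[] mat, int rows, int cols) {
        double[] out = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double sum = 0;
                int count = 0; // number of window cells inside the matrix
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        int rr = r + dr, cc = c + dc;
                        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) {
                            sum += mat[rr * cols + cc];
                            count++;
                        }
                    }
                }
                out[r * cols + c] = sum / count;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] mat = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
        System.out.println(Arrays.toString(average(mat, 4, 4)));
    }
}
```

The index `rr * cols + cc` corresponds to the stream offset `y*Nof+x` in the kernel: moving one row up or down means moving one full row length along the stream.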

Example No. 8: example8HostCode.c 1/2

    #include <stdio.h>
    #include <stdlib.h>
    #include <MaxCompilerRT.h>

    int main(int argc, char* argv[])
    {
        char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
        max_maxfile_t* maxfile;
        max_device_handle_t* device;
        float *data_in1, *data_in2, *data_out;
        unsigned long M, N, i;

        printf("Enter size of matrix (MxN, max 1024): ");
        scanf("%lu%lu", &M, &N);

        data_in1 = malloc(M*N * sizeof(float));
        data_out = malloc(M*N * sizeof(float));

        for(i = 0; i < M*N; i++){
            data_in1[i] = i%10;
        }

        printf("Opening and configuring FPGA.\n");
        maxfile = max_maxfile_init_example8();
        device = max_open_device(maxfile, device_name);
        max_set_terminate_on_error(device);

Example No. 8: example8HostCode.c (2/2)

    max_set_scalar_input_f(device, "example8Kernel.M", M, FPGA_A);
    max_set_scalar_input_f(device, "example8Kernel.N", N, FPGA_A);
    max_set_runtime_param(device, "example8Kernel.Nof", N);
    max_upload_runtime_params(device, FPGA_A);

    printf("Streaming data to/from FPGA...\n");
    max_run(device,
        max_input("mat", data_in1, M*N * sizeof(float)),
        max_output("z", data_out, M*N * sizeof(float)),
        max_runfor("example8Kernel", M*N),
        max_end());

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < M*N; i++) {
        // Signed row/column indices, so that the -1 neighbours compare correctly
        long row = i / N, col = i % N;
        float expected = 0, divider = 9;
        for (int ii = -1; ii < 2; ii++)
            for (int jj = -1; jj < 2; jj++)
                expected += row+ii >= 0 && row+ii < (long)M && col+jj >= 0 && col+jj < (long)N
                    ? data_in1[(row+ii)*N + (col+jj)] : 0;
        // 6 points on an edge, 4 in a corner, 9 elsewhere
        if (row == 0 || row == (long)M-1) divider = 6;
        if (col == 0 || col == (long)N-1) divider = (divider == 6) ? 4 : 6;
        expected /= divider;
        if (data_out[i] != expected) {
            printf("Error on element %lu. Expected %f, but found %f.\n", i, expected, data_out[i]);
            break;
        }
    }

    max_close_device(device);
    max_destroy(maxfile);
    return 0;
}
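
The averaging rule of Example No. 8 can also be checked without any Maxeler hardware or simulator. The following is a minimal sketch in plain C, not code from the slides: the function name average3x3 is hypothetical, the matrix is assumed to be stored row by row, and it is assumed to be at least 2x2 so that counting the in-range points gives the same dividers (4, 6, 9) as the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Software model of the Example No. 8 kernel (hypothetical helper, not part
 * of MaxCompilerRT): each output is the mean of the in-range points of the
 * 3x3 window centred on it, for a row-major matrix of M rows and N columns. */
static void average3x3(const float *in, float *out, size_t M, size_t N)
{
    assert(in != NULL && out != NULL);
    for (size_t r = 0; r < M; r++) {
        for (size_t c = 0; c < N; c++) {
            float sum = 0.0f;
            int count = 0;  /* number of window points inside the matrix */
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    long rr = (long)r + dr;
                    long cc = (long)c + dc;
                    if (rr >= 0 && rr < (long)M && cc >= 0 && cc < (long)N) {
                        sum += in[(size_t)rr * N + (size_t)cc];
                        count++;
                    }
                }
            }
            out[r * N + c] = sum / (float)count;
        }
    }
}
```

Calling average3x3 on the 4x4 matrix 1..16 used by the simulation runner gives one averaged value per input point, which is a convenient reference when comparing against the kernel output.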

Example No. 9: Array summation

Write a program that calculates the sum of n floating-point numbers.

Example No. 9: example9Kernel.java, try #1

package ind.z9;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;

public class example9Kernel extends Kernel {
    public example9Kernel(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8, 24);
        HWVar cnt = control.count.simpleCounter(64);

        // Input
        HWVar x = io.input("x", hwFloat(8, 24));

        HWVar sum = scalarType.newInstance(this);
        HWVar result = x + (cnt>0 ? sum : 0.0);
        // Problem? An offset of -1 is shorter than the depth of the pipelined loop
        sum <== stream.offset(result, -1);

        // Output
        io.output("z", result, hwFloat(8, 24));
    }
}

Example No. 9: Graph of dataflow for summation

[Figure: dataflow graph for the summation kernel]

Example No. 9: example9Kernel.java, try #2

package ind.z9;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;

public class example9Kernel extends Kernel {
    public example9Kernel(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8, 24);
        HWVar cnt = control.count.simpleCounter(64);

        // Input
        HWVar x = io.input("x", hwFloat(8, 24));

        HWVar sum = scalarType.newInstance(this);
        HWVar result = x + (cnt>12 ? sum : 0.0);
        // Solution: new offset = depth of the pipelined loop (13)
        sum <== stream.offset(result, -13);

        // Output
        io.output("z", result, hwFloat(8, 24));
    }
}
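
To see what an offset of -13 does to the data, the feedback loop of try #2 can be modelled in plain C. This is only a sketch under stated assumptions, not code from the slides: the helper name offset_accumulate and the constant LOOP_DEPTH are hypothetical, and the model ignores all hardware timing except the 13-cycle feedback distance.

```c
#include <assert.h>
#include <stddef.h>

#define LOOP_DEPTH 13  /* feedback offset used in try #2 */

/* Software model of try #2 (hypothetical helper): on every cycle k the
 * kernel emits result[k] = x[k] + result[k - 13]; for the first 13 cycles
 * (cnt <= 12) the feedback term is replaced by 0. */
static void offset_accumulate(const float *x, float *result, size_t cycles)
{
    assert(x != NULL && result != NULL);
    for (size_t k = 0; k < cycles; k++) {
        float feedback = (k >= LOOP_DEPTH) ? result[k - LOOP_DEPTH] : 0.0f;
        result[k] = x[k] + feedback;
    }
}
```

The model makes the cost visible: the loop carries 13 independent running sums, one per pipeline slot, so a single running sum needs its inputs spaced 13 cycles apart (for example 1, twelve zeros, 2, twelve zeros, 3 yields 1, 3, 6 at cycles 0, 13 and 26).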

Example No. 9: example9SimRunner.java, try #2

package ind.z9;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example9SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example9Sim");
        example9Kernel k = new example9Kernel(m.makeKernelParameters());
        m.setKernel(k);

        // Still, we need to send 13 times more data than needed:
        // each useful value is followed by 12 unnecessary data items
        m.setInputData("x", 1, 0, 0, 0, 3, 0, 0, 0, 9, 0, 0, 2, 0, 0, 0, 3);
        m.setKernelCycles(27);

        m.runTest();
        m.dumpOutput();

        double expectedOutput[] = { 1, 3, 6 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}

Example No. 9: example9HostCode.c (1/2), try #2

#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_out, expected = 0;
    unsigned long N, i;

    printf("Enter size of array: ");
    scanf("%lu", &N);

    // Every element is sent 13 times, once per pipeline slot
    data_in1 = malloc(N * 13 * sizeof(float));
    data_out = malloc(N * 13 * sizeof(float));
    for (i = 0; i < N; i++)
        for (int j = 0; j < 13; j++)
            data_in1[13*i+j] = i%10;

    printf("Opening and configuring FPGA.\n");

Example No. 9: example9HostCode.c (2/2), try #2

    maxfile = max_maxfile_init_example9();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    printf("Streaming data to/from FPGA...\n");
    max_run(device,
        max_input("x", data_in1, N * 13 * sizeof(float)),
        max_output("z", data_out, N * 13 * sizeof(float)),
        max_runfor("example9Kernel", N * 13),
        max_end());

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < N; i++) {
        // A new input element (value (i/13)%10) enters the sum every 13th output
        expected += !(i%13) ? (i/13)%10 : 0;
        if (data_out[i] != expected) {
            printf("Error on element %lu. Expected %f, but found %f.\n", i, expected, data_out[i]);
            break;
        }
    }

    max_close_device(device);
    max_destroy(maxfile);
    return 0;
}

Example No. 9: example9Kernel.java, try #3

package ind.z9;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;
import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain;

public class example9Kernel extends Kernel {
    public example9Kernel(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8, 24);

        // cnt advances once per 13 cycles; depth counts 0..12 within each step
        CounterChain cc = control.count.makeCounterChain();
        HWVar cnt = cc.addCounter(1000000, 1);
        HWVar depth = cc.addCounter(13, 1);

        // Input: read a new element only when depth == 0
        HWVar x = io.input("x", hwFloat(8, 24), depth.eq(0));

        HWVar sum = scalarType.newInstance(this);
        HWVar result = x + (cnt>0 ? sum : 0.0);
        sum <== stream.offset(result, -13);

        // Output: write a result only when depth == 0
        io.output("z", result, hwFloat(8, 24), depth.eq(0));
    }
}

Example No. 9: example9SimRunner.java, try #3

package ind.z9;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example9SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example9Sim");
        example9Kernel k = new example9Kernel(m.makeKernelParameters());
        m.setKernel(k);

        m.setInputData("x", 1, 2, 3);
        // We still need at least 27 cycles
        m.setKernelCycles(27);

        m.runTest();
        m.dumpOutput();

        double expectedOutput[] = { 1, 3, 6 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}

Example No. 9: example9HostCode.c (1/2), try #3

#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_out, expected = 0;
    unsigned long N, i;

    printf("Enter size of array: ");
    scanf("%lu", &N);

    data_in1 = malloc(N * sizeof(float));
    data_out = malloc(N * sizeof(float));
    for (i = 0; i < N; i++)
        data_in1[i] = i%10;

    printf("Opening and configuring FPGA.\n");

Example No. 9: example9HostCode.c (2/2), try #3

    maxfile = max_maxfile_init_example9();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    printf("Streaming data to/from FPGA...\n");
    max_run(device,
        max_input("x", data_in1, N * sizeof(float)),
        max_output("z", data_out, N * sizeof(float)),
        // One element every 13 cycles; the last one needs only a single cycle
        max_runfor("example9Kernel", N * 13 - 12),
        max_end());

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < N; i++) {
        expected += i%10;
        if (data_out[i] != expected) {
            printf("Error on element %lu. Expected %f, but found %f.\n", i, expected, data_out[i]);
            break;
        }
    }

    max_close_device(device);
    max_destroy(maxfile);
    return 0;
}
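
The cycle count N * 13 - 12 used for try #3 follows from the counter chain: one element is consumed whenever depth is 0, and the run can stop right after the last element has been added. The sketch below models that in plain C. It is an illustration only, not code from the slides; the helper name gated_accumulate and the constant LOOP_DEPTH are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LOOP_DEPTH 13  /* length of the inner counter in try #3 */

/* Software model of try #3 (hypothetical helper). The two-level counter
 * chain is mimicked by cnt (outer) and depth (inner, 0..12); an input is
 * read and an output is written only when depth == 0. Returns the number
 * of kernel cycles needed to produce all n outputs. */
static size_t gated_accumulate(const float *x, float *z, size_t n)
{
    assert(x != NULL && z != NULL);
    float sum = 0.0f;
    size_t cycles = 0;
    size_t produced = 0;
    for (size_t cnt = 0; produced < n; cnt++) {
        for (size_t depth = 0; depth < LOOP_DEPTH && produced < n; depth++) {
            cycles++;
            if (depth == 0) {
                sum += x[cnt];       /* input enabled */
                z[produced++] = sum; /* output enabled */
            }
        }
    }
    return cycles;
}
```

For the three inputs 1, 2, 3 from the simulation runner, the model produces 1, 3, 6 and reports 27 cycles, which matches both the setKernelCycles(27) call and N * 13 - 12 for N = 3.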

Example No. 10: Optimized Array Summation

Write an optimized program that calculates the sum of the numbers in an input array.
First, calculate several parallel (partial) sums; then, add them together at the end.
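
The two-stage idea can be sketched in plain C before looking at the kernels. This is an illustrative model under stated assumptions, not code from the slides: the helper name two_stage_sum and the constant LANES are hypothetical, 13 is taken from the feedback offset used in Example No. 9, and the input length is assumed to be a multiple of 13.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 13  /* one partial sum per slot of the 13-deep feedback loop */

/* Two-stage reduction (hypothetical helper): stage 1 keeps 13 interleaved
 * partial sums, stage 2 adds the 13 partial sums into the final result. */
static float two_stage_sum(const float *x, size_t n)
{
    assert(x != NULL && n % LANES == 0);

    /* Stage 1: element k is accumulated into partial sum k mod 13 */
    float partial[LANES] = {0.0f};
    for (size_t k = 0; k < n; k++)
        partial[k % LANES] += x[k];

    /* Stage 2: add the partial sums */
    float total = 0.0f;
    for (size_t lane = 0; lane < LANES; lane++)
        total += partial[lane];
    return total;
}
```

In this model stage 1 consumes one element per step with no idle slots, and only the short stage 2 pays the one-element-per-13-cycles price seen in Example No. 9.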

Example No. 10: example10Kernel1.java

package ind.z10;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;

public class example10Kernel1 extends Kernel {
    public example10Kernel1(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8, 24);
        HWVar cnt = control.count.simpleCounter(64);

        // Input
        HWVar N = io.scalarInput("N", hwUInt(64));
        HWVar x = io.input("x", hwFloat(8, 24));

        // 13 interleaved partial sums, one per pipeline slot
        HWVar sum = scalarType.newInstance(this);
        HWVar result = x + (cnt>0 ? sum : 0.0);
        sum <== stream.offset(result, -13);

        // Output: only the last 13 results (the final partial sums)
        io.output("z", result, hwFloat(8, 24), cnt > N-14);
    }
}

Example No. 10: example10Kernel2.java

package ind.z10;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;
import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain;

public class example10Kernel2 extends Kernel {
    public example10Kernel2(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8, 24);

        CounterChain cc = control.count.makeCounterChain();
        HWVar cnt = cc.addCounter(14, 1);
        HWVar depth = cc.addCounter(13, 1);

        // Input: one partial sum every 13 cycles
        HWVar x = io.input("x", hwFloat(8, 24), depth.eq(0));

        HWVar sum = scalarType.newInstance(this);
        HWVar result = x + (cnt>0 ? sum : 0.0);
        sum <== stream.offset(result, -13);

        // Output: enabled once the 13th partial sum (cnt == 12) has been added
        io.output("z", result, hwFloat(8, 24), cnt.eq(12));
    }
}

Example No. 10: example10SimRunner.java

package ind.z10;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example10SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example10Sim");
        example10Kernel1 k = new example10Kernel1(m.makeKernelParameters());
        m.setKernel(k);

        m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                            14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26);
        m.setKernelCycles(26);

        m.runTest();
        m.dumpOutput();

        double exOutput[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                              15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 };
        m.checkOutputData("z", exOutput);
        m.logMsg("Test passed OK!");
    }
}

Example No. 10: example10Manager.java

package ind.z10;

import com.maxeler.maxcompiler.v1.managers.custom.blocks.KernelBlock;
import com.maxeler.maxcompiler.v1.managers.custom.CustomManager;
import com.maxeler.maxcompiler.v1.managers.MAXBoardModel;

class example10Manager extends CustomManager {
    public example10Manager(boolean is_simulation, String name, MAXBoardModel board_model) {
        super(is_simulation, board_model, name);

        KernelBlock kb1 = addKernel(new example10Kernel1(makeKernelParameters("example10Kernel1")));
        KernelBlock kb2 = addKernel(new example10Kernel2(makeKernelParameters("example10Kernel2")));

        // host -> Kernel1 (partial sums) -> Kernel2 (final sum) -> host
        kb1.getInput("x") <== addStreamFromHost("x");
        kb2.getInput("x") <== kb1.getOutput("z");
        addStreamToHost("z") <== kb2.getOutput("z");
    }
}

Example No. 10: example10HostSimBuilder.java

package ind.z10;

import static config.BoardModel.BOARDMODEL;

import com.maxeler.maxcompiler.v1.managers.BuildConfig;
import com.maxeler.maxcompiler.v1.managers.BuildConfig.Level;

public class example10HostSimBuilder {
    public static void main(String[] args) {
        example10Manager m = new example10Manager(true, "example10HostSim", BOARDMODEL);
        m.setBuildConfig(new BuildConfig(Level.FULL_BUILD));
        m.build();
    }
}

Example No. 10: example10HWBuilder.java

package ind.z10;

import static config.BoardModel.BOARDMODEL;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.managers.BuildConfig;
import com.maxeler.maxcompiler.v1.managers.BuildConfig.Level;
import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

public class example10HWBuilder {
    public static void main(String[] args) {
        example10Manager m = new example10Manager(false, "example10HostSim", BOARDMODEL);
        m.setBuildConfig(new BuildConfig(Level.FULL_BUILD));
        m.build();
    }
}

Example No. 10: example10HostCode.c (1/2)

#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_out, expected = 0;
    unsigned long N, i;

    printf("Enter size of array (it will be rounded down to the nearest multiple of 13): ");
    scanf("%lu", &N);
    N /= 13;
    N *= 13;

    data_in1 = malloc(N * sizeof(float));
    // The output stream below transfers 2 floats, so reserve room for both
    data_out = malloc(2 * sizeof(float));
    for (i = 0; i < N; i++) {
        data_in1[i] = i%10;
        expected += data_in1[i];
    }

Example No. 10: example10HostCode.c (2/2)

    printf("Opening and configuring FPGA.\n");
    maxfile = max_maxfile_init_example10();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    max_set_scalar_input_f(device, "example10Kernel1.N", N, FPGA_A);
    max_upload_runtime_params(device, FPGA_A);

    printf("Streaming data to/from FPGA...\n");
    max_run(device,
        max_input("x", data_in1, N * sizeof(float)),
        max_output("z", data_out, 2 * sizeof(float)),
        max_runfor("example10Kernel1", N),
        max_runfor("example10Kernel2", 13*12+2),
        max_end());

    printf("Checking data read from FPGA.\n");
    printf("Expected: %f, returned: %f\n", expected, *data_out);

    max_close_device(device);
    max_destroy(maxfile);
    return 0;
}

Example No. 11: Making a custom manager for a simple example

Example No. 12: Asynchronous starting

Example No. 12: example12HostCode.c

    printf("Streaming data to/from FPGA...\n");
    toCard = max_get_pcie_stream(device, "x");
    fromCard = max_get_pcie_stream(device, "z");
    max_kernel_set_cycles(device, "example11Kernel", 16, FPGA_A);
    max_reset_device(device);

    // Queue both transfers; the calls return without waiting for completion
    max_queue_pcie_stream(device, toCard, data_in1, 16, 0);
    max_queue_pcie_stream(device, fromCard, data_out, 16, 1);

    // do other calculations: ...

    // wait for data from card
    max_sync_pcie_stream(device, fromCard);

    printf("Checking data read from FPGA.\n");

E: Access to DRAM memory from a host (can data be loaded into the memory and sent to the kernel in parallel?)

Example No. 14: Storing data to memory and processing data directly from the memory

Example No. 14: example14Kernel.java

package ind.z14;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class example14Kernel extends Kernel {
    public example14Kernel(KernelParameters parameters) {
        super(parameters);

        // Input
        HWVar x = io.input("x", hwFloat(8, 24));
        HWVar y = io.input("y", hwFloat(8, 24));

        HWVar result = x + y;

        // Output
        io.output("z", result, hwFloat(8, 24));
    }
}

Example No. 14: example14Manager.java (1/2)

package ind.z14;

import com.maxeler.maxcompiler.v1.managers.custom.blocks.KernelBlock;
import com.maxeler.maxcompiler.v1.managers.custom.CustomManager;
import com.maxeler.maxcompiler.v1.managers.MAXBoardModel;
import com.maxeler.maxcompiler.v1.managers.custom.Stream;
import com.maxeler.maxcompiler.v1.managers.custom.stdlib.MemoryControlGroup;

class example14Manager extends CustomManager {
    public example14Manager(boolean is_simulation, String name, MAXBoardModel board_model) {
        super(is_simulation, board_model, name);

        KernelBlock kb = addKernel(new example14Kernel(makeKernelParameters("kernel14")));

        // Host <-> on-card memory streams, each with its own control group
        MemoryControlGroup mgToMem = addMemoryControlGroup("mgToMem",
            MemoryControlGroup.MemoryAccessPattern.LINEAR_1D);
        MemoryControlGroup mgFromMem = addMemoryControlGroup("mgFromMem",
            MemoryControlGroup.MemoryAccessPattern.LINEAR_1D);
        Stream fromMem = addStreamFromOnCardMemory("fromMem", mgFromMem);
        Stream toMem = addStreamToOnCardMemory("toMem", mgToMem);

Example No. 14: example14Manager.java (2/2)

        toMem <== addStreamFromHost("x");
        addStreamToHost("z") <== fromMem;

        // Kernel <-> on-card memory streams share one control group
        MemoryControlGroup mg = addMemoryControlGroup("mg",
            MemoryControlGroup.MemoryAccessPattern.LINEAR_1D);
        Stream toKer1 = addStreamFromOnCardMemory("toKer1", mg);
        Stream toKer2 = addStreamFromOnCardMemory("toKer2", mg);
        Stream fromKer = addStreamToOnCardMemory("fromKer", mg);

        kb.getInput("x") <== toKer1;
        kb.getInput("y") <== toKer2;
        fromKer <== kb.getOutput("z");
    }
}

Example No. 14: example14HostCode.c (1/5)

void setup_memory_stream_host(
    max_device_handle_t* md,
    max_maxfile_t* mf,
    long bs,           // burst size
    char* strcmdn,     // memory control group name
    char* strn,        // stream name
    unsigned int addr,
    int len,
    int intr)
{
    struct max_memory_setting *mms = new_max_memory_setting(mf);

    max_memory_stream_set_start_address(mms, strn, addr/bs);
    max_memory_stream_set_access_pattern_linear1d(mms, strcmdn, len/bs, 0, len/bs);
    max_memory_stream_set_enable(mms, strn, 1);
    if (intr) max_memory_stream_interrupt_on(mms, strn, NULL);

    max_memory_commit_setting(md, mms, FPGA_A);
    delete_max_memory_setting(mms);
}

Example No. 14: example14HostCode.c (2/5)

void setup_memory_stream_kernel(
    max_device_handle_t* md,
    max_maxfile_t* mf,
    long bs,           // burst size
    char* strcmdn,     // memory control group name
    char* strn1,       // stream name
    char* strn2,       // stream name
    char* strn3,       // stream name
    unsigned int addr,
    int len)
{
    struct max_memory_setting *mms = new_max_memory_setting(mf);

    // Three consecutive regions of len bytes each: operand 1, operand 2, result
    max_memory_stream_set_start_address(mms, strn1, addr/bs);
    max_memory_stream_set_start_address(mms, strn2, (addr+len)/bs);
    max_memory_stream_set_start_address(mms, strn3, (addr+2*len)/bs);
    max_memory_stream_set_access_pattern_linear1d(mms, strcmdn, len/bs, 0, len/bs);
    max_memory_stream_set_enable(mms, strn1, 1);
    max_memory_stream_set_enable(mms, strn2, 1);
    max_memory_stream_set_enable(mms, strn3, 1);
    max_memory_stream_interrupt_on(mms, strn3, NULL);

    max_memory_commit_setting(md, mms, FPGA_A);
    delete_max_memory_setting(mms);
}

Example No. 14: example14HostCode.c (3/5)

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    max_stream_handle_t *toCard, *fromCard;
    float *data_in, *data_out, expected = 0;
    unsigned long N, i;
    long bs;

    printf("Enter size of array: ");
    scanf("%lu", &N);

    printf("Opening and configuring FPGA.\n");
    maxfile = max_maxfile_init_example14();

    // All three memory control groups must use the same burst size
    bs = max_group_burst_length(maxfile, "mgFromMem");
    if (bs != max_group_burst_length(maxfile, "mgToMem")) {
        printf("Example is intended for mg controllers with same burst sizes\n");
        exit(1);
    }
    if (bs != max_group_burst_length(maxfile, "mg")) {
        printf("Example is intended for mg controllers with same burst sizes\n");
        exit(1);
    }

Example No. 14: example14HostCode.c (4/5)

    // Round N down to a multiple of 2*bs
    N = (N / (2*bs)) * 2 * bs;

    data_in = malloc(N * sizeof(float));
    data_out = malloc(N * sizeof(float));
    for (i = 0; i < N; i++)
        data_in[i] = i%10;

    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    printf("Streaming data to the on card memory...\n");
    toCard = max_get_pcie_stream(device, "x");
    fromCard = max_get_pcie_stream(device, "z");
    setup_memory_stream_host(device, maxfile, bs, "mgToMem", "toMem", 0, N*sizeof(float), 1);
    max_reset_device(device);
    max_queue_pcie_stream(device, toCard, data_in, N * sizeof(float), 1);
    max_wait_for_interrupt(device, FPGA_A);

    printf("Streaming data from the on card memory...\n");
    setup_memory_stream_kernel(device, maxfile, bs, "mg", "toKer1", "toKer2", "fromKer",
        0, N/2*sizeof(float));
    max_kernel_set_cycles(device, "kernel14", N/2, FPGA_A);
    max_reset_device(device);

Example No. 14: example14HostCode.c (5/5)

    printf("Processing...\n");
    // here we can do something useful
    max_wait_for_interrupt(device, FPGA_A);

    // Read the N/2 results back from the third region of the on-card memory
    setup_memory_stream_host(device, maxfile, bs, "mgFromMem", "fromMem",
        N*sizeof(float), N/2*sizeof(float), 1);
    max_reset_device(device);
    max_queue_pcie_stream(device, fromCard, data_out, N/2 * sizeof(float), 1);
    max_sync_pcie_stream(device, fromCard);

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < N/2; i++) {
        expected = i%10 + (i+N/2)%10;
        if (data_out[i] != expected) {
            printf("Error on element %lu. Expected %f, but found %f.\n", i, expected, data_out[i]);
            break;
        }
    }

    max_close_device(device);
    max_destroy(maxfile);
    return 0;
}
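
The address arithmetic spread over the Example No. 14 host code can be collected in one place. The sketch below is plain C and only mirrors the expressions used on the slides (the rounding of N, and the three regions at addr, addr+len and addr+2*len, expressed in bursts). The struct layout14 and the function plan_layout14 are hypothetical names, and nothing here calls the Maxeler runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the address arithmetic of Example No. 14. */
struct layout14 {
    size_t n;          /* element count after rounding down to a multiple of 2*bs */
    size_t len;        /* bytes in each half of the input (N/2 elements) */
    size_t in1_burst;  /* start of the first operand, in bursts */
    size_t in2_burst;  /* start of the second operand, in bursts */
    size_t out_burst;  /* start of the result region, in bursts */
};

static struct layout14 plan_layout14(size_t requested_n, size_t bs, size_t elem_size)
{
    assert(bs > 0 && elem_size > 0);
    struct layout14 l;
    l.n = (requested_n / (2 * bs)) * 2 * bs;  /* same rounding as the host code */
    l.len = l.n / 2 * elem_size;
    l.in1_burst = 0;                          /* addr / bs with addr = 0 */
    l.in2_burst = l.len / bs;                 /* (addr + len) / bs */
    l.out_burst = 2 * l.len / bs;             /* (addr + 2*len) / bs */
    return l;
}
```

With an assumed burst size of 64 (an illustrative value, not one stated on the slides), 4-byte floats and a requested size of 1000, the array is cut to 896 elements, each half occupies 1792 bytes, and the three regions start at bursts 0, 28 and 56.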

E: Strides: Block access

E: Block RAM

E: Multipipe and an example

Instead of the Conclusion

Who? Why? What? Where? When? Whom? Whaw?