Java Flowpaths: Efficiently Generating Circuits for Embedded Systems
Java Flowpaths: Efficiently Generating Circuits for Embedded Systems from Java
WorldComp ESA 2006, Las Vegas, Nevada
EXCERPT
Darrin Hanna, Michael DuChene, Girma Tewolde, Jay Sattler
Oakland University, Rochester, Michigan
Kettering University, Flint, Michigan
June 27, 2006
Overview
• Motivation
• Background
• Some examples
• Class Instantiation in Flowpaths
• Implementing Parallel Flowpaths
• Results
Background
• Flowpaths – a type of SPP generated using a particular method for translating stack-based programs directly to FPGAs
  – Java Byte Codes (stack-based IR)
  – Forth
• Words as Flowpaths: Ops, Connections, and State Machines
• Converting Flowpaths to VHDL
  – Euclid’s Greatest Common Divisor Algorithm
• Sieve of Eratosthenes: A performance comparison
Flowpaths – a type of SPP generated using a particular method for translating stack-based programs directly to FPGAs
• The algorithm executes as an SPP, without a microprocessor
Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
• Each thread has a JVM stack that stores frames
• A frame is created each time a method is invoked
[Figure: a FRAME containing the Local Variable Array, the Operand Stack, and the Constant Pool from the class invoking the method]
Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
[Figure: a FRAME (Local Variable Array, Operand Stack) under load-execute-store stack manipulation, shown beside a Flowpath made of a Controller and a Datapath (OPs and a MUX) with Local Variables and Operand Stack]
Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
TRADITIONAL (load-execute-store stack manipulation)
• … iload_1 iload_2 iadd istore_1 …
• 4 clock cycles
FLOWPATHS
• Operations – Sequential
  – isub, iadd, etc.
• Data Manipulation – Connections
  – iload, istore
  – ZERO clock cycles
• Only 1 clock cycle
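The cycle counts on this slide can be illustrated with a small software model. The sketch below is only an assumption-laden analogy, not the authors' tool: `FrameSketch`, `runInterpreted`, and `runAsFlowpath` are hypothetical names, and "steps" stand in for clock cycles. It models a frame as a local variable array plus an operand stack, charges one step per interpreted bytecode, and then shows the Flowpath view in which loads and stores are treated as free connections.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of one JVM frame: a local variable array plus an operand stack.
class FrameSketch {
    final int[] locals;
    final Deque<Integer> operandStack = new ArrayDeque<>();
    int steps = 0; // one step per interpreted bytecode (stand-in for a clock cycle)

    FrameSketch(int... initialLocals) {
        locals = initialLocals.clone();
    }

    // iload_n: push local variable n onto the operand stack.
    void iload(int index) { operandStack.push(locals[index]); steps++; }

    // istore_n: pop the top of the operand stack into local variable n.
    void istore(int index) { locals[index] = operandStack.pop(); steps++; }

    // iadd: pop two ints, push their sum.
    void iadd() {
        int b = operandStack.pop();
        int a = operandStack.pop();
        operandStack.push(a + b);
        steps++;
    }

    // Traditional view: iload_1, iload_2, iadd, istore_1 each cost one step.
    static FrameSketch runInterpreted(int x, int y) {
        FrameSketch f = new FrameSketch(0, x, y);
        f.iload(1);
        f.iload(2);
        f.iadd();
        f.istore(1);
        return f;
    }

    // Flowpath view (assumed): loads and stores become connections,
    // so only the add itself is counted. Returns {result, steps}.
    static int[] runAsFlowpath(int x, int y) {
        int result = x + y;
        int steps = 1;
        return new int[] { result, steps };
    }

    public static void main(String[] args) {
        FrameSketch f = runInterpreted(3, 4);
        System.out.println("interpreted: local1=" + f.locals[1] + " steps=" + f.steps);
        int[] fp = runAsFlowpath(3, 4);
        System.out.println("flowpath:    local1=" + fp[0] + " steps=" + fp[1]);
    }
}
```

Both paths leave the same value in local variable 1; only the step count differs, which is the point the slide makes about data manipulation costing zero clock cycles.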
Converting Flowpaths to VHDL Euclid’s GCD Algorithm:
Converting Flowpaths to VHDL Euclid’s GCD Algorithm: A method that implements each variable as a register results in overcrowded routing
Converting Flowpaths to VHDL Euclid’s GCD Algorithm
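The Java source for the GCD example is not reproduced in this excerpt. As a reference point, a minimal sketch of Euclid's algorithm might look like the following; the subtraction-based variant and the names `GcdSketch` and `gcd` are assumptions, chosen because the loop compiles to simple bytecodes such as `iload`, `isub`, and `istore`.

```java
// Hypothetical reference version of Euclid's GCD; not taken from the slides.
class GcdSketch {
    // Subtraction-based Euclid. Assumes a > 0 and b > 0.
    static int gcd(int a, int b) {
        while (a != b) {
            if (a > b) {
                a = a - b; // shrink the larger operand
            } else {
                b = b - a;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcd(48, 18));
    }
}
```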
Sieve of Eratosthenes
Sieve of Eratosthenes
• A circuit and state machine developed “by hand” by observing the behavior of the algorithm
• Serves as an optimal implementation
Sieve of Eratosthenes
Experiments using a Xilinx Spartan IIE FPGA
• VHDL (hand implementation) took 233 slices
• Flowpath took 295 slices
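For readers unfamiliar with the benchmark, the Sieve of Eratosthenes can be sketched in a few lines of Java. This is a generic textbook version under assumed names (`SieveSketch`, `countPrimes`); the slides do not show the exact source that was compiled or the limit used in the experiment.

```java
// Hypothetical textbook sieve; not the source used in the experiments.
class SieveSketch {
    // Counts the primes in [2, limit]. Assumes limit >= 0.
    static int countPrimes(int limit) {
        boolean[] composite = new boolean[limit + 1];
        int count = 0;
        for (int i = 2; i <= limit; i++) {
            if (!composite[i]) {
                count++;
                // Mark multiples of i, starting at i*i; long avoids int overflow.
                for (long j = (long) i * i; j <= limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100));
    }
}
```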
Experimental Results Quick Sort algorithm sorting 4096 random numbers
Experimental Results Genetic Algorithm: population size of 50, mutation probability of 10%, and crossover probability of 20%, run for 10 generations
Multi-threaded Experimental Results (Parallel)
Pentium 4 PC
Module | Clock Cycles (rounded)
Prod/Cons Test | 314,400,000
Producer 1 | 145,000
Producer 2 | 1,926,000
Consumer | 9,600,000
JStamp (microcontroller that executes Java bytecode directly): 121,200 clock cycles
~20,000 gates
The Producer/Consumer Test took 40 clock cycles in the Flowpath!
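The test program itself is not included in this excerpt. To give a sense of the kind of multi-threaded Java involved, here is a minimal sketch with two producers and one consumer sharing a single-slot buffer. Everything in it is an assumption for illustration: the names `ProdConsSketch`, `Slot`, and `run`, the single-slot buffer, and the use of ordinary modern Java rather than whatever subset the Flowpath tool accepts.

```java
// Hypothetical producer/consumer program; not the benchmark from the slides.
class ProdConsSketch {
    // Single-slot shared buffer guarded by its own monitor.
    static class Slot {
        private int value;
        private boolean full = false;

        synchronized void put(int v) throws InterruptedException {
            while (full) wait();      // block until the slot is empty
            value = v;
            full = true;
            notifyAll();
        }

        synchronized int take() throws InterruptedException {
            while (!full) wait();     // block until the slot is filled
            full = false;
            notifyAll();
            return value;
        }
    }

    // Two producers each put 1..itemsPerProducer; one consumer sums everything.
    static int run(int itemsPerProducer) {
        Slot slot = new Slot();
        int[] sum = new int[1];       // written only by the consumer thread

        Runnable producer = () -> {
            try {
                for (int i = 1; i <= itemsPerProducer; i++) slot.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread p1 = new Thread(producer);
        Thread p2 = new Thread(producer);
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 2 * itemsPerProducer; i++) sum[0] += slot.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        p1.start(); p2.start(); consumer.start();
        try {
            p1.join(); p2.join(); consumer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sum[0];                // safe to read after join()
    }

    public static void main(String[] args) {
        System.out.println(run(100));
    }
}
```

On a conventional JVM the three threads contend for one monitor and are scheduled by the operating system; the slides' claim is that in a Flowpath the corresponding circuits run concurrently in hardware.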
Conclusion
• Hardware can be generated directly from Java programs using Flowpaths
• There are enormous performance benefits to using Flowpaths instead of a JVM on a microprocessor
• Parallel algorithms with or without shared resources can easily be developed. These will truly execute in parallel, in the hardware sense
Thank You! Darrin Hanna, Michael DuChene, Girma Tewolde, Jay Sattler. Oakland University, Rochester, Michigan. June 27, 2006