An Implementation of the Discrete Fourier Transform on a Reconfigurable Processor
By Michael J. White 1,2* and Clay Gloster, Jr., Ph.D., P.E. 1
1 Department of Electrical & Computer Engineering, Howard University, 2300 Sixth Street, NW, Washington, DC 20059
2 NASA/Goddard Space Flight Center, Code 564, Greenbelt, MD 20771
Michael.J.White@nasa.gov, cgloster@howard.edu
*Member, AIAA
MAPLD Conference, Washington, DC, September 9-11, 2003
White and Gloster, P74

Outline of the Presentation
• Introduction
• The Discrete Fourier Transform (DFT)
• A Sample Reconfigurable Processor
• A Floating Point DFT Core
• Experimental Results
• Conclusions and Future Work

Introduction
• A reconfigurable computing (RC) system is a hardware/software data processing system that combines the flexibility of a general-purpose processor with the speed of an application-specific processor.
• Several applications have been mapped onto RC systems, demonstrating an order-of-magnitude speedup over existing solutions running on a general-purpose processor.
• In the past, RC systems contained very limited hardware resources. As a result, few complex applications, e.g. those using floating point arithmetic, could benefit from the potential speedup offered by RC systems.
• To the knowledge of the authors, few papers have been published on implementing the DFT on a Field Programmable Gate Array (FPGA) using floating point arithmetic.

Motivation
• At Goddard, there is an interest in control algorithms that, in part, use the DFT.
• These algorithms should not be constrained to input data of size 2^n.
• The goal is to be able to process a 512 x 512 floating point array in 0.01 seconds.

Problem Statement
• Given: A software implementation of the DFT
• Find: An RC system implementation of the DFT that uses floating point arithmetic such that it:
  1) fits on a single FPGA
  2) can handle on the order of 1000 points
  3) executes the DFT significantly faster than the software implementation
  4) can compute a 2D DFT more efficiently, i.e. compute the 2D DFT of a 512 x 512 array in 0.01 seconds
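
To give a sense of scale for the 512 x 512 goal, the sketch below counts complex multiply-accumulate (MAC) operations for a direct (non-FFT) 2D DFT and the sustained rate needed to finish in 0.01 seconds. The row-column decomposition, the one-MAC-per-term cost model, and all function names are assumptions made for illustration; none of these figures come from the presentation.

```python
def direct_dft_macs(n_points: int) -> int:
    """Complex MACs for one direct n-point DFT: n outputs, n terms each."""
    return n_points * n_points


def row_column_2d_macs(rows: int, cols: int) -> int:
    """Complex MACs for a 2D DFT computed as 1D DFTs over rows, then columns.

    Assumption: the 2D transform is separated into row and column passes,
    each pass using the direct O(N^2) DFT rather than an FFT.
    """
    row_pass = rows * direct_dft_macs(cols)  # one cols-point DFT per row
    col_pass = cols * direct_dft_macs(rows)  # one rows-point DFT per column
    return row_pass + col_pass


def required_mac_rate(rows: int, cols: int, seconds: float) -> float:
    """Sustained complex MACs per second needed to meet the time budget."""
    return row_column_2d_macs(rows, cols) / seconds


if __name__ == "__main__":
    macs = row_column_2d_macs(512, 512)
    rate = required_mac_rate(512, 512, 0.01)
    print(f"complex MACs: {macs}")
    print(f"required rate: {rate:.3e} complex MACs per second")
```

Under this cost model the direct approach needs 2 * 512^3 = 268,435,456 complex MACs, which is why the time budget is demanding for a design that does not use an FFT.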

The Discrete Fourier Transform (DFT)
The Discrete Fourier Transform (DFT) is defined as:
X(k) = Σ_{n=0}^{N-1} c(n)*exp(-j*2*π*n*k/N), for k = 0, 1, ..., N-1
where:
» c is the complex input sequence
» N is the total number of input samples
» c(n) is the nth input sample
» X(k) is the kth output sample
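
The definition can be sketched directly in software. This is a minimal reference version written for illustration, not the authors' software implementation; the function name `dft` is an assumption. It evaluates the sum term by term, so it works for any N, including sizes that are not powers of two.

```python
import cmath


def dft(c):
    """Direct DFT: X(k) = sum over n of c(n) * exp(-j*2*pi*n*k/N).

    `c` is a sequence of real or complex samples of any length N.
    Cost is N*N complex multiply-adds; no FFT factorization is used.
    """
    n_samples = len(c)
    output = []
    for k in range(n_samples):
        acc = 0j
        for n, sample in enumerate(c):
            acc += sample * cmath.exp(-2j * cmath.pi * n * k / n_samples)
        output.append(acc)
    return output


if __name__ == "__main__":
    # A 10-point complex tone at frequency index 3: the energy should
    # appear in X(3), with the other outputs close to zero.
    tone = [cmath.exp(2j * cmath.pi * 3 * n / 10) for n in range(10)]
    for k, value in enumerate(dft(tone)):
        print(k, round(abs(value), 6))
```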

A Sample Reconfigurable Processor
[Block diagram: PECORE (FPGA) containing a Control Unit, a Data Unit, and the DFT Function Core, with connections to input memory and to output memory.]

Function Core
- Has one or more 32-bit inputs
- Has simple control
- Performs floating point vector operations
- Can be built using other Function Cores

DATA and CONTROL UNIT
• DATA UNIT
  • Contains a register file (eight 32-bit registers) and counters for determining when vector instructions are complete.
  • Contains several memory address registers/counters for indexing through input/output vectors.
  • Contains up to 7 Function Cores.
• CONTROL UNIT
  • Manages memory read/write transactions.
  • Initiates instruction fetch/decode/execution.
  • Determines when instruction processing is complete and turns control back over to the Host/Memory Interface.
  • One controller handles processing for all hardware modules/instructions.

DFT Floating Point Core
[Interface diagram of the DFT block. Inputs: XREALIN, XIMAGIN, K, DFT/IDFT, ENABLE, EMPTY. Outputs: XREALOUT, XIMAGOUT, READYTOEMPTY, DONE. Buses are marked as 32 or 10 bits wide; XREALOUT and XIMAGOUT are 32 bits.]
– Xrealin/Ximagin are the real and imaginary inputs
– K is the output index
– The DFT/IDFT flag is –1 for the DFT or 1 for the inverse DFT
– Enable tells the FPGA to begin processing
– Empty tells the FPGA that the input buffer is empty
– Xrealout/Ximagout are the real and imaginary outputs
– Readytoempty indicates that FPGA processing has completed
– Done indicates that the pipeline has been "flushed" and all outputs are in the buffer

The DFT Core Block Diagram
[Block diagram. Blocks: THETA UNIT, SIN/COS TABLE, COMPLEX MULTIPLY, COMPLEX ACCUMULATOR. Signals: XREALIN, XIMAGIN, N, K, ENABLE, EMPTY, DFT select, ADDRESS, SINθ, COSθ, Xr, Xi, Yr, Yi, DONE, REALOUT, IMAGOUT. Buses are marked as 32 bits or 10 bits wide.]

Complex Multiply
[Diagram: four multipliers form the products Xr*COSθ, Xi*SINθ, Xr*SINθ, and Xi*COSθ from the inputs Xr, Xi, COSθ, and SINθ. The DFT select signal, passed through a delay, controls how the products are combined into the outputs SIGOUT0 and SIGOUT1.]
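
The four products on this slide are what is needed to multiply a complex sample by exp(-jθ) for the DFT or exp(+jθ) for the inverse. The sketch below shows one way the select could combine them. The mapping of the –1/1 flag to add or subtract follows from the DFT definition, but the function name, argument order, and the pairing of results with SIGOUT0/SIGOUT1 are assumptions for illustration.

```python
def complex_multiply(xr, xi, cos_theta, sin_theta, dft_flag=-1):
    """Multiply (xr + j*xi) by exp(dft_flag * j * theta).

    dft_flag is -1 for the forward DFT and 1 for the inverse DFT.
    Returns (yr, yi), the real and imaginary parts of the product.
    """
    if dft_flag not in (-1, 1):
        raise ValueError("dft_flag must be -1 (DFT) or 1 (inverse DFT)")

    # The four real products shown on the slide.
    xr_cos = xr * cos_theta
    xi_sin = xi * sin_theta
    xr_sin = xr * sin_theta
    xi_cos = xi * cos_theta

    if dft_flag == -1:
        # (xr + j*xi) * (cos - j*sin)
        yr = xr_cos + xi_sin
        yi = xi_cos - xr_sin
    else:
        # (xr + j*xi) * (cos + j*sin)
        yr = xr_cos - xi_sin
        yi = xi_cos + xr_sin
    return yr, yi
```

Only the signs of the two cross terms change between the forward and inverse cases, so the same four multipliers can serve both directions.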

Theta and Sin/Cos Units
A counter is used to generate n. In executing the DFT, K (the output index) is given; that is to say, we know which frequency component we want to examine.
[Diagram: a Counter supplies the 10-bit n, which together with the 10-bit K drives the THETA UNIT; its 10-bit ADDRESS indexes the SIN/COS TABLE, which produces 32-bit SINθ and COSθ.]
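
One plausible way to turn n and K into a table address is sketched below. The slide does not say how the theta unit computes the address; reducing n*K modulo N and storing one table entry per multiple of 2π/N are assumptions, as are all names here. The 10-bit address width is taken from the diagram and limits the table to 1024 entries.

```python
import math

ADDRESS_BITS = 10                 # address width shown on the slide
MAX_POINTS = 1 << ADDRESS_BITS    # at most 1024 table entries


def build_sincos_table(n_points):
    """Table of (sin, cos) for the angles 2*pi*a/N, a = 0 .. N-1."""
    if not 1 <= n_points <= MAX_POINTS:
        raise ValueError(f"n_points must be between 1 and {MAX_POINTS}")
    return [
        (math.sin(2 * math.pi * a / n_points),
         math.cos(2 * math.pi * a / n_points))
        for a in range(n_points)
    ]


def theta_address(n, k, n_points):
    """Assumed theta unit: the angle 2*pi*n*k/N repeats every N steps,
    so only (n*k) mod N is needed to index the table."""
    return (n * k) % n_points


def lookup(table, n, k):
    """Return (sin(theta), cos(theta)) for theta = 2*pi*n*k/N."""
    return table[theta_address(n, k, len(table))]
```

Because the address wraps modulo N, the table size depends only on N and not on the product n*K, which is consistent with N not having to be a power of two.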

Complex Accumulator
[Diagram: the COMPLEX ACCUMULATOR consists of a REAL ACCUMULATOR and an IMAGINARY ACCUMULATOR; the 32-bit inputs Yr and Yi produce the 32-bit outputs REALOUT and IMAGOUT.]
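
A software model of the accumulator can help when comparing hardware output against a double-precision reference. The sketch below keeps separate real and imaginary running sums and rounds each to 32-bit floating point after every addition. The class and function names are hypothetical, and the use of IEEE 754 single-precision rounding is an assumption; the presentation states only that the buses are 32 bits wide.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


class ComplexAccumulator:
    """Running sum of (Yr, Yi) pairs held in 32-bit precision."""

    def __init__(self):
        self.real = 0.0
        self.imag = 0.0

    def add(self, yr, yi):
        # Each partial sum is rounded to 32 bits, as a 32-bit
        # hardware adder would do (assumed rounding behaviour).
        self.real = to_float32(self.real + to_float32(yr))
        self.imag = to_float32(self.imag + to_float32(yi))

    def result(self):
        return self.real, self.imag
```

Feeding the accumulator the N products for a fixed output index and reading `result()` after the last one yields one output sample in this model.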

Experimental Setup
• VHDL Modeling and Simulation
• Logic Synthesis
• Place and Route
• Execute on FPGA

FPGA Runtime Environment
[Diagram: the RC System comprises a General Purpose Processor and an FPGA Board; other labels are Interpreter, Session File, and Definition File.]

Output of DFT FPGA and Simulation
The graph shows the output of a 10-point floating point DFT run on the FPGA and the output of a 10-point DFT run on a commercial simulation tool.

Conclusion
• VHDL modeling and synthesis are completed.
• The place and route tool gives a maximum clock frequency of 13.4 MHz, and 53% of the FPGA is utilized.
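
The reported clock frequency allows a rough upper bound on throughput. The sketch below assumes the core accepts one input sample per clock cycle, so that each output takes N cycles and a full N-point DFT takes N*N cycles, and it ignores pipeline fill, flush, and memory transfer time. These are assumptions for illustration only; the presentation does not report a measured execution speed.

```python
CLOCK_HZ = 13.4e6  # maximum clock frequency reported by place and route


def cycles_for_dft(n_points):
    """Assumed cycle count: n outputs, each consuming n input samples
    at one sample per clock (pipeline latency ignored)."""
    return n_points * n_points


def seconds_for_dft(n_points, clock_hz=CLOCK_HZ):
    """Estimated time for one n-point DFT under the assumptions above."""
    return cycles_for_dft(n_points) / clock_hz


if __name__ == "__main__":
    for n_points in (10, 512, 1000):
        print(n_points, f"{seconds_for_dft(n_points):.6f} s")
```

Under these assumptions a 1000-point DFT would take about 0.075 seconds at 13.4 MHz.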

Future Work
• The results of the FPGA implementation demonstrated excellent correlation with a standard simulation tool.
• The next step is to perform more checks of the DFT with larger sample blocks and to find the execution speed.
• Start work on a floating point Fast Fourier Transform.

Acknowledgement
• The authors would like to thank NASA/Goddard Space Flight Center for its support of this project. In particular, we give thanks to:
• Mr. Thomas Flatley and Mr. Semion Kizhner for initiating the project.
• Mr. Robert Kasa and Mr. Wesley Powell for their management support.
• Dr. John Day for providing the spark that put everything together.