A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing
Presenter: Dr. Alejandro Castillo Atoche
2011/07/25, IGARSS'11
School of Engineering, Autonomous University of Yucatan, Merida, Mexico

Outline
- Introduction
- Previous Work
- MPSoC via the HW/SW Co-design
- Case Study: RBR Algorithms
- Algorithm Analysis
- Network on Chip (NoC)-based Accelerator
- New Perspective: Network of FPGA-VLSI architectures
- Hardware Implementation Results
- Integration in a Co-design scheme
- Performance Analysis
- Conclusions

Introduction: Radar Imagery, Facts
- The initial problem addressed by this proposal for geospatial RS imagery is to solve the ill-conditioned inverse spatial spectrum pattern (SSP) estimation problem with model uncertainties via the Bayesian minimum risk (BMR) estimation strategy.
- In previous works, alternative MPSoC proposals have been developed, but without systolic array techniques or Network on a Chip structures.

Introduction: HW implementation, Facts
- Why Multiprocessor System on a Chip? Because MPSoCs are single-chip multiprocessors designed for real-time signal processing applications.
- Why Network on a Chip Accelerators? Networks-on-chip (NoCs) are multiprocessor interconnection networks designed to achieve real-time SP. They avoid bottlenecks in HW/SW co-designs.

MOTIVATION
- To efficiently conceptualize and implement an architecture that combines parallel computing and systolic array mapping techniques in a novel network on a chip (NoC) accelerator scheme via the HW/SW co-design paradigm.

CONTRIBUTIONS:
- First, a high-speed robust Bayesian regularization hardware accelerator is designed for the real-time enhancement of large-scale geospatial imagery.
- Second, High Performance Computing techniques are applied in an efficient architecture based on Network on a Chip (NoC).

Algorithmic ref. Implementation

Algorithmic ref. Implementation

Algorithmic ref. Implementation

Method        RSF     RSF     RSF     RBR     RBR     RBR
SNR [dB]      15      20      25      15      20      25
IOSNR [dB]    10.15   15.32   20.25   6.15    10.62   13.04
PIOSNR (%)    81.37   86.62   85.24   95.18   90.29   98.24
MSE           0.16    0.46    0.57    0.03    0.29    0.34

Partitioning Stage

NoC-oriented structure of the proposed coprocessors: (a) Robust SS vector

NoC-oriented structure of the proposed coprocessors: (b) RBR estimator

Aggregation of parallel computing techniques

Application:

    for (tile = 0; tile < L; tile++) {
        for (i = 0; i < m; i++) {
            for (j = 0; j < n; j++) {
                for (k = 0; k < r; k++) {
                    a(i, j, k) = a(i, j-1, k);
                    b(i, j, k) = b(i-1, j, k);
                    c(i, j, k) = c(i, j, k-1) + a(i, j, k) * b(i, j, k);
                }
            }
        }
    }

3-D Dependence Graph (DG), node labels: a[i, j-1], b[i-1, j], a[i, j], b[i, j], c[i, j]
Linear schedule: a set of parallel and uniformly spaced hyperplanes.
SFG projection
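The inner three loops of the recurrence on this slide can be read as a single-assignment form of a matrix product, C = A·B, with a propagated along j, b along i, and c accumulated along k. The following Python sketch evaluates the recurrence for one tile in hyperplane order. It is an illustration only: the function name, the boundary conditions (a takes its value from A[i][k] at j = 0, b from B[k][j] at i = 0, c starts at 0), and the schedule vector (1, 1, 1) are assumptions and are not stated on the slide.

```python
from itertools import product


def recurrence_matmul(A, B, schedule=(1, 1, 1)):
    """Evaluate the slide's a/b/c recurrence for one tile.

    A is m x r, B is r x n. Index points (i, j, k) are visited in order of
    the time step t = schedule . (i, j, k); the schedule vector is an
    assumption chosen so every dependence points to an earlier time step.
    """
    m, r, n = len(A), len(B), len(B[0])
    a, b, c = {}, {}, {}
    # Points on the same hyperplane (same t) do not depend on each other,
    # so any order inside one time step is valid.
    points = sorted(
        product(range(m), range(n), range(r)),
        key=lambda p: sum(s * x for s, x in zip(schedule, p)),
    )
    for i, j, k in points:
        # a is passed along j; assumed to enter the array at j == 0.
        a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
        # b is passed along i; assumed to enter the array at i == 0.
        b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
        # c accumulates along k; assumed to start from zero.
        prev = 0 if k == 0 else c[i, j, k - 1]
        c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
    return [[c[i, j, r - 1] for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    print(recurrence_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

With the assumed schedule, each of the three dependence directions advances the time step by exactly one, which is what makes the hyperplanes uniformly spaced. The outer tile loop on the slide would repeat this evaluation once per tile.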

Tiling technique
Large-Scale Real-World Image; Fixed-Size Systolic Array

Tiling technique
Large-Scale Real-World Image; Fixed-Size Systolic Array; tiles (1, 2), (2, 1)
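The tiling idea on these two slides, mapping a large image onto an array of fixed size, can be sketched as follows. This is a minimal illustration under stated assumptions: the tile size p, the 1-based (row, column) tile labels, the choice of a matrix-vector product as the per-tile operation, and all function names are hypothetical and are not taken from the presentation.

```python
def tile_indices(rows, cols, p):
    """Yield ((tile_row, tile_col), (r0, r1), (c0, c1)) for p x p tiles.

    Tile labels are 1-based, in the style of the (1, 2), (2, 1) labels on
    the slide; edge tiles are smaller when p does not divide the image size.
    """
    for tr, r0 in enumerate(range(0, rows, p), start=1):
        for tc, c0 in enumerate(range(0, cols, p), start=1):
            yield (tr, tc), (r0, min(r0 + p, rows)), (c0, min(c0 + p, cols))


def fixed_array_matvec(tile, x):
    """Stand-in for the fixed-size array: y = tile . x on at most p x p data."""
    return [sum(v * xv for v, xv in zip(row, x)) for row in tile]


def tiled_matvec(M, x, p):
    """Compute M . x by streaming p x p tiles through the fixed-size routine."""
    rows, cols = len(M), len(M[0])
    y = [0] * rows
    for _, (r0, r1), (c0, c1) in tile_indices(rows, cols, p):
        tile = [row[c0:c1] for row in M[r0:r1]]
        partial = fixed_array_matvec(tile, x[c0:c1])
        # Partial results of tiles in the same tile row are accumulated.
        for offset, value in enumerate(partial):
            y[r0 + offset] += value
    return y


if __name__ == "__main__":
    M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(tiled_matvec(M, [1, 1, 1], 2))
```

The point of the sketch is that the per-tile routine never sees more than p x p data, so the same hardware array can be reused for an image of any size at the cost of one pass per tile.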

Fixed-Sized NoC-PAs-based Robust SS vector co-processor, Stage 1

Fixed-Sized NoC-PAs-based Robust SS vector co-processor, Stage 2

Fixed-Sized NoC-PAs-based Robust SS vector co-processor, Stage 3

Fixed-Sized NoC-PAs-based RBR estimator co-processor

New Perspective: VLSI-FPGA Platforms
- A novel VLSI-FPGA platform represents a new perspective for real-time processing of newer RS applications.

VLSI-FPGA Platform

Performance Analysis: FPGA Synthesis Metrics

HW co-processor    Robust SS vector    RBR estimator
Slices             8158                3289
DSP48              144                 32
LUTs               7539                2278
Flip-Flops         6304                2788

Performance Analysis: FPGA Processing time (seconds)

Implementation                             RBR
Evaluated PC-oriented implementation       19.7
Proposed efficient RBR architecture        1.26

Conclusions
- The implementation results show that the proposed NoC-PA-oriented architecture drastically reduces the overall processing time of the RBR algorithm. The presented architecture is efficiently implemented in MPSoC mode rather than on systems based on traditional DSP or PC-cluster platforms.
- The implementation of the RBR algorithm using the proposed architecture takes only 1.26 seconds for the large-scale RS image reconstruction, in contrast to the 19.7 seconds required by the C++ implementation. The achieved processing time is thus approximately 16 times shorter than that of the conventional C++ PC-based implementation.
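As a quick check of the reported factor, using only the two processing times quoted in the conclusions:

```latex
\text{speedup} = \frac{t_{\text{PC}}}{t_{\text{proposed}}} = \frac{19.7\ \text{s}}{1.26\ \text{s}} \approx 15.6
```

This rounds to the factor of approximately 16 stated on the slide.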

Recent Selected Journal Papers
- A. Castillo Atoche, D. Torres, Yuriy V. Shkvarko, "Towards Real Time Implementation of Reconstructive Signal Processing Algorithms Using Systolic Arrays Coprocessors", Journal of Systems Architecture (JSA), Elsevier, Volume 56, Issue 8, August 2010, Pages 327-339, ISSN: 1383-7621, doi: 10.1016/j.sysarc.2010.05.004. JCR.
- A. Castillo Atoche, D. Torres, Yuriy V. Shkvarko, "Descriptive Regularization-Based Hardware/Software Co-Design for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment", EURASIP Journal on Advances in Signal Processing (JASP), Hindawi, Volume 2010, 31 pages, 2010. ISSN: 1687-6172, e-ISSN: 1687-6180, doi: 10.1155/ASP. JCR.
- Yuriy V. Shkvarko, A. Castillo Atoche, D. Torres, "Near Real Time Enhancement of Geospatial Imagery via Systolic Implementation of Neural Network-Adapted Convex Regularization Techniques", Journal of Pattern Recognition Letters, Elsevier, 2011. JCR. In press.

Thanks for your attention.
Dr. Alejandro Castillo Atoche
Email: acastill@uady.mx