Wavelet Spectral Dimension Reduction of Hyperspectral Imagery on a Reconfigurable Computer
Tarek El-Ghazawi 1, Esam El-Araby 1, Abhishek Agarwal 1, Jacqueline Le Moigne 2, and Kris Gaj 3
1 The George Washington University, 2 NASA/Goddard Space Flight Center, 3 George Mason University
{tarek, esam, agarwala}@gwu.edu, lemoigne@backserv.gsfc.nasa.gov, kgaj@gmu.edu

Objectives and Introduction
Investigate the use of reconfigurable computing for on-board automatic processing of remote sensing data:
• Remote sensing image classification
• Applications: land classification, mining, geology, forestry, agriculture, environmental management, global atmospheric profiling (e.g., water vapor and temperature profiles), and planetary space missions
• Types of carriers: spaceborne, airborne
El-Ghazawi, E 229 / MAPLD 2004

Types of Sensing
• Mono-spectral imagery: 1 band (SPOT ≡ panchromatic)
• Multi-spectral imagery: tens of bands (MODIS ≡ 36 bands, SeaWiFS ≡ 8 bands, IKONOS ≡ 5 bands)
• Hyperspectral imagery: hundreds to thousands of bands (AVIRIS ≡ 224 bands, AIRS ≡ 2378 bands)
Multispectral / Hyperspectral Imagery Comparison

Different Airborne Hyperspectral Systems: AISA, AURORA, AVIRIS, GER

Why On-Board Processing?
Problems:
• Complex pre-processing steps:
  – Image registration / fusion
  – Dimension reduction*
• Large data volumes:
  – Large cost and complexity of the on-the-ground / Earth processing systems
  – Large critical-decision latency
  – Large data downlink bandwidth requirements
Solutions:
• Automatic on-board processing:
  – Reduces the cost and complexity of the on-the-ground / Earth processing system; larger utilization by a broader community, including educational institutions
  – Enables autonomous decisions to be taken on-board; faster critical decisions
  – Reduction of communication bandwidth; simpler and faster subsequent computations
• Applications:
  – Future reconfigurable web sensor missions
  – Future Mars and planetary exploration missions
* Investigated pre-processing step

Why Reconfigurable Computers?
On-Board Processing Problems:
• High computational complexity; low performance of traditional processing platforms
• High form / wrap factors (size and weight) of parallel computing systems
• Low flexibility of traditional ASIC-based solutions
• High costs and long design cycles of traditional ASIC-based solutions
Solutions: Reconfigurable Computers (RCs)
• Higher performance (throughput and processing power) compared to conventional processors
• Lower form / wrap factors compared to parallel computers
• Higher flexibility (reconfigurability) compared to ASICs
• Lower cost and shorter time-to-solution compared to ASICs

Introduction

Data Arrangement
[Figure: the hyperspectral cube, 512 × 512 pixels by 224 bands, shown alongside its matrix form: Pixels ≡ (Rows × Columns) rows by Bands columns. The first reconfigurable computing scope runs along the bands of each pixel; the parallel computing scope and the second reconfigurable computing scope run across pixels.]

Data Arrangement (cont'd)
[Figure: each pixel (row, col), from (0, 0) through (rows−1, cols−1), is stored as a vector of band samples 0, 1, 2, …, Bands−1; the hyper image thus becomes a (Pixels × Bands) array of 8-bit samples, with Pixels = Rows × Columns.]
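The cube-to-array rearrangement above can be sketched in a few lines of NumPy. The deck itself contains no code, so the dimensions and variable names here are purely illustrative:

```python
import numpy as np

# Toy dimensions; a real AVIRIS scene would be 512 x 512 pixels x 224 bands.
rows, cols, bands = 4, 3, 8
cube = np.arange(rows * cols * bands, dtype=np.uint8).reshape(rows, cols, bands)

# Array form: Pixels = Rows x Columns rows, each holding one pixel's
# full spectral signature of `bands` 8-bit samples.
array_form = cube.reshape(rows * cols, bands)

assert array_form.shape == (rows * cols, bands)
# Pixel (r, c) lands at flat row index r * cols + c.
assert np.array_equal(array_form[1 * cols + 2], cube[1, 2, :])
```

Because each row of the array is one pixel's spectral signature, the per-pixel wavelet processing that follows can stream the data row by row.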

Examples of Hyperspectral Datasets
• AVIRIS: Indian Pines '92 (400 × 400 pixels by 192 bands)
• AVIRIS: Salinas '98 (217 × 512 pixels by 192 bands)

Dimension Reduction Techniques
• Principal Component Analysis (PCA):
  – Most common dimension reduction method
  – Does not preserve spectral signatures
  – Complex and global computations: difficult for parallel processing and hardware implementation
• Wavelet-based dimension reduction:
  – Preserves spectral signatures
  – High-performance implementation
  – Simple and local operations
[Figure: multi-resolution wavelet decomposition of each pixel's 1-D spectral signature (preservation of spectral locality)]
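As a rough software model of the per-pixel decomposition: each level low-pass filters the spectral signature and downsamples by 2, so keeping only the approximation halves the spectral dimension while coefficients stay aligned with the spectrum. The deck does not name the wavelet used on the hardware; a Haar filter stands in here, and all names are illustrative:

```python
import numpy as np

def haar_approx(x):
    # One level of the 1-D Haar DWT, keeping only the approximation
    # (low-pass filter + downsample by 2).
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2.0)

def reduce_signature(spectrum, levels):
    # Multi-resolution decomposition of one pixel's spectral signature:
    # each level halves the number of spectral components, and the
    # surviving coefficients keep their spectral locality (unlike PCA scores).
    a = spectrum
    for _ in range(levels):
        a = haar_approx(a)
    return a

signature = np.linspace(0.0, 1.0, 192)       # toy 192-band pixel
reduced = reduce_signature(signature, 4)     # 192 -> 12 coefficients
assert reduced.shape == (12,)
```

The operations are local (each output depends on two neighboring bands), which is what makes the method amenable to the FPGA pipeline described later.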

2-D DWT (1-Level Decomposition)
[Figure: the image is filtered along one dimension with the low-pass (L) and high-pass (H) filters and downsampled by 2, then filtered and downsampled along the other dimension, yielding the four subbands LL, HL, LH, and HH. The 1-D DWT is the single L/H filter-and-downsample stage.]

2-D DWT (2-Level Decomposition)
[Figure: the first-level LL subband is decomposed again by the same L/H filter-and-downsample structure, producing second-level LL, HL, LH, and HH subbands within the first level's LL quadrant.]

Wavelet-Based vs. PCA (Execution Time, 500 MHz P3)
Complexity: wavelet-based = O(MN); PCA = O(MN² + N³)

Wavelet-Based vs. PCA (cont'd) (Execution Time, 500 MHz P3)
Complexity: wavelet-based = O(MN); PCA = O(MN² + N³)
[Chart: execution times, WAVELET vs. PCA]

Wavelet-Based vs. PCA (cont'd) (Classification Accuracy)
• Implemented on the HIVE (8 Pentium Xeon / Beowulf-type system); 6.5 times faster than the sequential implementation
• Classification accuracy similar to or better than PCA
• Faster than PCA

The Algorithm
OVERALL:
• Read the data and the threshold (Th)
• Compute the decomposition level for each individual pixel (PIXEL LEVEL) and add the pixel's contribution to a global histogram
• Remove outlier pixels
• Get the lowest level (L) from the global histogram
• Decompose each pixel to level L and write the data out
PIXEL LEVEL:
• Decompose the spectral pixel one level; save the current level [a] of wavelet coefficients (the approximation)
• Reconstruct the individual pixel back to the original stage from the approximation
• Compute the correlation (Corr) between the original and the reconstructed approximation
• If Corr < Th, take the current level [a] of wavelet coefficients; otherwise, continue decomposing
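The two loops above can be modeled in a short sketch. The slides name neither the wavelet nor the correlation measure, so this assumes a Haar filter and Pearson correlation, and omits the outlier-removal step; it is a software model, not the SRC-6E implementation:

```python
import numpy as np

def haar_step(x):
    # One low-pass Haar filter-and-downsample stage (approximation only).
    return (x[0::2] + x[1::2]) / np.sqrt(2.0)

def reconstruct(a, levels):
    # Inverse of `levels` approximation-only stages: details are dropped,
    # so each coefficient spreads back evenly over its block.
    for _ in range(levels):
        a = np.repeat(a, 2) / np.sqrt(2.0)
    return a

def pixel_level(spectrum, th, max_level=3):
    """PIXEL LEVEL: decompose the signature one level at a time; stop when
    the correlation between the original and the reconstruction from the
    approximation alone falls below the threshold Th."""
    x = np.asarray(spectrum, dtype=float)
    a, level = x, 0
    for lvl in range(1, max_level + 1):
        a = haar_step(a)
        corr = np.corrcoef(x, reconstruct(a, lvl))[0, 1]
        if corr < th:
            break
        level = lvl
    return level

def global_level(pixels, th):
    """OVERALL (sketch): histogram the per-pixel levels and take the lowest
    populated level; outlier removal is omitted for brevity."""
    return min(pixel_level(p, th) for p in pixels)

ramp = np.arange(32.0)                       # toy 32-band signature
assert pixel_level(ramp, 0.99) == 2          # level 3 is too lossy at Th = 0.99
assert global_level([ramp, 2 * ramp + 5], 0.99) == 2
```

Taking the lowest acceptable level across all pixels is the conservative choice: every pixel's signature is then reconstructible to at least the requested fidelity.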

Prototyping Wavelet-Based Dimension Reduction of Hyperspectral Imagery on a Reconfigurable Computer, the SRC-6E

Hardware Architecture of the SRC-6E

SRC Compilation Process

Top Hierarchy Module
[Block diagram: the input X and threshold TH feed the DWT_IDWT block, which produces approximation levels L1:L5 and reconstructions Y1:Y5. The Correlator compares X with each Yi against TH and asserts GTE_1:GTE_5; the Histogram block accumulates these over N pixels and outputs the selected Level through a MUX.]

Decomposition and Reconstruction Levels of Dimension Reduction (DWT_IDWT)
[Figure: the input X passes through a cascade of five decomposition stages, Level_1 … Level_5, each a low-pass filter L followed by downsampling by 2, producing approximations L0 … L5. Each level also has a mirrored reconstruction path of upsampling by 2, filter L', and delay D, producing the reconstructed signals Y1 … Y5.]

FIR Filters (L, L') Implementation
[Figure: the input image D(i) drives a chain of registers (a tapped delay line); each tap is multiplied by a coefficient C(1) … C(n), and the products are summed to form the output image F(i).]
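The register-chain structure can be modeled directly: shift the delay line, multiply every tap by its coefficient in parallel, and sum. The coefficients and data below are toy values, not the wavelet filters used on the SRC-6E:

```python
import numpy as np

def fir_filter(d, c):
    """Software model of the slide's FIR structure: a tapped delay line of
    registers, one multiplier per coefficient C(1)..C(n), and an adder tree.
    Computes F(i) = sum_k c[k] * d[i - k], with zeros shifted in initially."""
    regs = np.zeros(len(c))          # the register chain (delay line)
    out = np.zeros(len(d))
    for i, sample in enumerate(d):
        regs[1:] = regs[:-1]         # shift the delay line by one
        regs[0] = sample             # new input sample enters
        out[i] = np.dot(c, regs)     # parallel multiplies + adder tree
    return out

coeffs = np.array([0.5, 0.5])        # toy 2-tap averaging filter
data = np.array([1.0, 3.0, 5.0, 7.0])
f = fir_filter(data, coeffs)
assert np.allclose(f, [0.5, 2.0, 4.0, 6.0])
```

In hardware, the loop body collapses into one pipelined clock cycle: all taps multiply simultaneously, which is where the FPGA's fine-grain parallelism comes from.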

Correlator Module
[Figure: for each reconstruction Yi, the module accumulates the sum terms term_xx, term_xy, and term_yy. term_xy is squared (with a 32-bit left shift for fixed-point scaling), term_xx is multiplied by term_yy, and TH is multiplied by itself to form TH². A comparator then checks the scaled term_xy² against term_xx · term_yy · TH² and asserts GTE_i, incrementing Histogram_i.]
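The point of the multiplier-and-compare structure is to test the correlation threshold without a divider or square root: corr(X, Yi) ≥ TH is equivalent to term_xy² ≥ TH² · term_xx · term_yy (for non-negative term_xy). A floating-point model of that cross-multiplication trick, assuming the term accumulators are the usual Pearson sums (the slide does not define them, and the 32-bit shift is a fixed-point scaling detail omitted here):

```python
import numpy as np

def gte_threshold(x, y, th):
    """Division-free correlation test, as in the Correlator block: assert
    GTE when corr(x, y) >= th, using cross-multiplication so no divider
    or square root is needed in hardware."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    term_xy = n * np.dot(x, y) - x.sum() * y.sum()
    term_xx = n * np.dot(x, x) - x.sum() ** 2
    term_yy = n * np.dot(y, y) - y.sum() ** 2
    # corr >= th  <=>  term_xy^2 >= th^2 * term_xx * term_yy  (term_xy >= 0)
    return term_xy >= 0 and term_xy ** 2 >= (th ** 2) * term_xx * term_yy

x = np.array([1.0, 2.0, 3.0, 4.0])
assert gte_threshold(x, 2 * x + 1, 0.99)                      # perfectly correlated
assert not gte_threshold(x, np.array([4.0, 1.0, 3.0, 2.0]), 0.99)
```

Replacing division and square root with multiplies and a compare is a standard hardware trade: multipliers map cheaply onto the FPGA fabric, while dividers cost area and latency.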

Histogram Module
[Figure: GTE_1 … GTE_5 update the histogram counters cnt_1 … cnt_5; a level selector derives the output Level from the counters.]

Resource Utilization and Operating Frequency

Measurement Scenarios
[Diagram: µP functions wrap the MAP function: MAP allocation, read data, CM-to-OBM transfer-in, OBM computations, transfer-out to CM, write data, MAP free; repeated nstreams times. End-to-end time (HW) = configuration + allocation + end-to-end time with I/O + release, compared against end-to-end time (SW).]

SRC Experiment Setup and Results
• Salinas '98: 217 × 512 pixels, 192 bands = 162.75 MB; number of streams = 41; stream size = 2730 voxels ≈ 4 MB
• Non-overlapped streams (DMA-in, compute, DMA-out in sequence):
  – T_DMA-IN = 13.040 ms, T_COMP = 0.62428 ms, T_DMA-OUT = 22.712 ms
  – T_Total = 1.49 s; throughput = 109.23 MB/s
• Overlapped streams (DMA transfers overlapped with compute):
  – T_DMA = 35.752 ms, T_COMP = 0.62428 ms, Xc = 0.0175
  – Throughput = 111.14 MB/s
• Speedup over non-overlapped = (1 + Xc) = 1.0175 (insignificant)
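The overlapped-streams speedup can be checked from the slide's own numbers: when DMA and compute run concurrently, the longer of the two (here, DMA) hides the other, so the gain over the sequential case is (1 + Xc) with Xc = T_COMP / T_DMA:

```python
# Worked check of the overlapped-streams speedup using the slide's figures.
t_dma = 35.752      # ms per stream, DMA-in + DMA-out combined
t_comp = 0.62428    # ms per stream

# Non-overlapped time ~ t_dma + t_comp; overlapped time ~ max(t_dma, t_comp)
# = t_dma, so the ratio is 1 + t_comp / t_dma = 1 + Xc.
xc = t_comp / t_dma
speedup = 1 + xc

assert abs(xc - 0.0175) < 1e-3
assert abs(speedup - 1.0175) < 1e-3
# Consistent with the measured throughputs: 111.14 / 109.23 ~ 1.0175.
```

Because compute is under 2% of DMA time, overlapping buys almost nothing, which is the slide's point: the design is I/O-bound.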

Execution Time

Distribution of Execution Times

Speedup Results

Concluding Remarks
• We prototyped the automatic wavelet-based dimension reduction algorithm on a reconfigurable architecture
• Both coarse-grain and fine-grain parallelism are exploited
• We observed a 10× speedup using the P3 version of the SRC-6E; from our previous experience, we expect this speedup to double on the P4 version of the SRC machine
• These speedup figures were obtained while I/O still dominates; the speedup can be increased by improving the I/O bandwidth of reconfigurable platforms