Adaptive Image Filtering Using Run-Time Reconfiguration
Nitin Srivastava, Jerry L. Trahan, Ramachandran Vaidyanathan, Suresh Rai
Department of Electrical and Computer Engineering, Louisiana State University

The Problem: Adaptive Image Filtering • A filtering window moves over the image pixel by pixel. Window size is usually 3 × 3, 5 × 5, or 7 × 7. • The filter multiplies the intensity value of each pixel under the window by the corresponding coefficient and sums the products to produce the new value of the pixel at the center of the window.

Working of a 3 × 3 filter
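The window computation described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware design from the slides; the function name `filter3x3` and the choice to copy border pixels unchanged are assumptions made for the example.

```python
def filter3x3(image, coeffs):
    """Apply a 3 x 3 window to every interior pixel of a 2-D list of intensities."""
    rows, cols = len(image), len(image[0])
    # Assumption: border pixels, where the window does not fit, are copied as-is.
    out = [row[:] for row in image]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            acc = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Product of one coefficient and the pixel it overlaps.
                    acc += coeffs[di + 1][dj + 1] * image[i + di][j + dj]
            out[i][j] = acc  # new value of the pixel at the window center
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = filter3x3(image, box)
```

With the all-ones window, each interior output is simply the sum of the nine pixels the window covers.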

• Spatially invariant filter — does not change values of its coefficients with the position of the filtering window over the image. • Adaptive filter — adjusts values of its coefficients according to the nature of the image. For instance, handles uniform regions differently than edges.
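As a rough illustration of the difference, the sketch below picks its coefficients per window. The slides do not state the adaptation rule, so the rule used here (smooth when the local intensity range is small, keep the center pixel otherwise), the threshold, and the names `choose_coeffs` and `adaptive_pixel` are all hypothetical.

```python
SMOOTH = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]    # box sum, normalised by 9 below
IDENTITY = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]  # keeps the center pixel after // 9


def choose_coeffs(window, threshold=20):
    """Hypothetical rule: a small intensity range means a uniform region."""
    flat = [p for row in window for p in row]
    return SMOOTH if max(flat) - min(flat) < threshold else IDENTITY


def adaptive_pixel(window, threshold=20):
    """New center value for one 3 x 3 window, using window-dependent coefficients."""
    coeffs = choose_coeffs(window, threshold)
    total = sum(coeffs[a][b] * window[a][b] for a in range(3) for b in range(3))
    return total // 9


flat_region = [[100, 102, 101], [99, 100, 103], [101, 100, 98]]
edge_region = [[10, 10, 200], [10, 10, 200], [10, 10, 200]]
```

A spatially invariant filter would correspond to `choose_coeffs` returning the same coefficients for every window.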

Use of Run-Time Reconfiguration • For fixed filter coefficients, could use constant coefficient multipliers (KCMs) configured for the coefficients. • For adaptive coefficients, we use KCMs configured for pixel values. Data flow is regular, but more involved. • Inspired by the 1-D adaptive filtering technique of Wojko and ElGindy (RAW '99).
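The slides do not describe the internals of a KCM. One common construction is a lookup table of the constant times every 4-bit value, combined with shifts and adds; the sketch below models that construction in software, assuming unsigned 8-bit inputs. The names `build_kcm` and `kcm_multiply` are hypothetical. Building the table plays the role of configuring the multiplier, here for a pixel value, with the coefficient as the variable input.

```python
def build_kcm(constant, nibble_bits=4):
    """Precompute constant * n for every nibble value n (the 'configuration')."""
    return [constant * n for n in range(1 << nibble_bits)]


def kcm_multiply(table, x, width=8, nibble_bits=4):
    """Multiply an unsigned width-bit x by the configured constant.

    Only table lookups, shifts, and additions are used, one lookup per nibble.
    """
    mask = (1 << nibble_bits) - 1
    result = 0
    for shift in range(0, width, nibble_bits):
        result += table[(x >> shift) & mask] << shift
    return result


pixel = 173
kcm = build_kcm(pixel)        # multiplier configured for this pixel value
coeff = 5                     # adaptive coefficient arrives as the input
product = kcm_multiply(kcm, coeff)
```

Reconfiguring for a new pixel amounts to rebuilding the 16-entry table, which is the software analogue of loading new LUT contents.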

Use of FPGAs • Circuit design tailored to the problem — Filtering exhibits regular, repeated operations, taking an inner product among the same number of elements at each pixel position. • Problem size-specific components and datapaths. • Advantages for this problem even without reconfiguration.

Image Filtering Details

Solution Approach • Gray-scale image of size 256 × 256, using a filtering window of size 3 × 3. • Can tailor to a different image size: changes some register sizes and memory requirements. • Can tailor to a different window size: changes memory requirements. • Can extend to video: window is 3-D across frames.

Solution Approach, cont. • Basic component is a module. • Sixteen pipelined modules act on 16 contiguous pixels at a time from the same row. • Three sets of three steps each, corresponding to the three rows in a 3 × 3 window and the three positions in each row. • For each of these nine steps, a module contributes to one of the nine window computations in which its pixel participates.
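To make the "nine steps" concrete, the sketch below lists the nine 3 × 3 windows that a single pixel takes part in, grouped as three sets (window rows) of three steps (positions in the row). The function name `contributions` and the particular ordering of the steps are assumptions for illustration, not the schedule used by the hardware.

```python
def contributions(i, j):
    """Return (window center, offset) pairs for the nine windows containing pixel (i, j).

    offset = (di, dj) is the position of pixel (i, j) relative to the window center.
    """
    out = []
    for di in (-1, 0, 1):        # one set per row of the window
        for dj in (-1, 0, 1):    # one step per position within that row
            center = (i - di, j - dj)
            out.append((center, (di, dj)))
    return out


steps = contributions(7, 24)
centers = {center for center, _ in steps}
```

Each of the nine entries pairs a distinct window center with the coefficient position that multiplies this pixel in that window.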

Module Algorithm
Procedure THREEPIX(r, in, out)
Step 0: Adder(r) ← KCM(r) + in
Step 1: Adder(r) ← KCM(r) + Adder(r − 1)
Step 2: out ← Adder(r)
• Contributes to three pixel values on the same row.

Overall Algorithm
for i ← 0 to 255
  for k ← 0 to 255 in steps of 16
    for all j, where k ≤ j ≤ k+15
      r = j mod 16 /* module r has v(i, j) */
      THREEPIX(r, 0, memory)
      THREEPIX(r, memory, I/O pins)
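The loop structure above can be enumerated in software to see how pixel columns map onto the 16 modules. The sketch below only models the iteration order and the assignment r = j mod 16; it does not model THREEPIX itself, and the generator name `schedule` is hypothetical.

```python
def schedule(size=256, modules=16):
    """Yield (i, k, assignment) for each block of 16 contiguous pixels in row i.

    assignment maps module index r to the column j it holds, with r = j mod 16.
    """
    for i in range(size):
        for k in range(0, size, modules):
            yield i, k, {j % modules: j for j in range(k, k + modules)}


first = next(schedule())
blocks = sum(1 for _ in schedule())
```

For a 256 × 256 image this gives 16 blocks per row, each handled by the full set of modules in parallel.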

[Figure: Module block diagram. Labels: pixel value, filtering window coefficient, KCM, KCM output mux, zero register, module mux, module adder, pipeline register, step counter, output mux, memory read register (from block memory), memory write register (to block memory), previous module, next module.]

First vantage point: one module

[Figure sequence: pixel grid covering rows 5 to 9 and columns 22 to 29, with block memories. Successive frames show the modules holding v[7, 23] through v[7, 27] while pd[...] entries accumulate, first for row 8, then row 7, then row 6. Block memory holds rs(-1)[8, 24], then rs(-1)[7, 24] + rs(0)[7, 24] and rs(-1)[6, 26] + rs(0)[6, 26]. The final frame outputs newv[6, 24].]

Second vantage point: one pixel

[Figure sequence: frames for image rows 6 and 8. With v[6, 23] through v[6, 27] in the modules, pd[7, ...] entries accumulate into rs(-1)[7, 25]. With v[8, 23] through v[8, 27], pd[9, ...], pd[8, ...] and pd[7, ...] entries accumulate, rs(-1)[7, 25] + rs(0)[7, 25] is formed, and the final frame outputs newv[7, 25].]

Evaluation • Xilinx Virtex-E FPGA XCV200E • Number of CLB slices required = 492 • Clock frequency = 101.9 MHz • Time spent in filtering a 256 × 256 image = 642 µs

Comparison
System description            Running time   Speedup
866 MHz Pentium III system    20 ms          31
400 MHz Sun Ultra 5 system    53 ms          84
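As a quick check on the comparison, the speedup is the ratio of each system's running time to the FPGA filtering time. The sketch below takes the FPGA time as 642 µs, the unit consistent with the reported speedups, and uses the rounded running times from the comparison; the variable names are illustrative.

```python
fpga_time_s = 642e-6  # assumed unit: microseconds, consistent with the speedups

reference_times_s = {
    "866 MHz Pentium III": 20e-3,
    "400 MHz Sun Ultra 5": 53e-3,
}

# Speedup = software running time / FPGA running time.
speedups = {name: t / fpga_time_s for name, t in reference_times_s.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
```

The ratios come out close to the reported 31 and 84; the small gap for the Sun system presumably reflects rounding in the listed running times.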