- Slides: 36
Image processing language Halide
Dmitry Kurtaev
Internet of Things Group

Agenda
- Halide's basics
- Samples
  - RGB → Gray
  - Box filter (blurring)
  - Histogram computation
  - K-means (color reduction)
- Scheduling
- Gamma correction on GPU in Halide
- Ahead-of-time compilation

Halide
- Programming language embedded in C++ (C++11 or higher)
- Designed for image processing tasks
- LLVM compiler backend
- Generates C code, OpenCL/CUDA kernels, OpenGL shaders
- Cross-platform compilation with AVX/SSE/FMA/F16 features
- High-level scheduling
- It isn't Turing complete

[Diagram: Halide compilation flow. Blocks: Halide (algorithm, scheduling); Halide intermediate representation (IR); LLVM; machine code; OpenCL compiler; CUDA]

Halide's basic entities
- Variables
- Reduction domains
- Expressions
- Functions
- Scheduling directives
- Buffers

    // Variables
    Halide::Var x("x"), xo("xo"), xi("xi");
    // Reduction domain
    Halide::RDom r(-1, 3);  // r in [-1, 1]
    // Expression
    Halide::Expr e = sin((x + r) * 0.1f);
    // Function
    Halide::Func f("f");
    f(x) = sum(e);
    // Scheduling directives
    f.split(x, xo, xi, 16).parallel(xo).vectorize(xi);
    f.compile_jit(Halide::get_host_target());
    // Buffer
    Halide::Buffer<float> output(1000);
    f.realize(output);

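As a reading aid, the following plain C++ sketch computes the same values as the pipeline above without any scheduling: `RDom r(-1, 3)` means a minimum of -1 and an extent of 3, so `sum(e)` adds three sine terms per output element. The function name `sum_sin_ref` is hypothetical and not part of the slides; no Halide dependency is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical reference (not from the slides): what f(x) = sum(e) evaluates to
// for x in [0, n), written as ordinary serial loops.
std::vector<float> sum_sin_ref(int n) {
    assert(n >= 0);
    std::vector<float> out(static_cast<std::size_t>(n));
    const int r_min = -1;    // RDom minimum
    const int r_extent = 3;  // RDom extent, so r takes the values -1, 0, 1
    for (int x = 0; x < n; ++x) {
        float acc = 0.0f;
        for (int r = r_min; r < r_min + r_extent; ++r) {
            acc += std::sin((x + r) * 0.1f);
        }
        out[static_cast<std::size_t>(x)] = acc;
    }
    return out;
}
```

Calling `sum_sin_ref(1000)` corresponds to realizing `f` into the 1000-element buffer; the Halide schedule only changes how these loops are executed, not the values.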
Halide pipeline
- Write algorithm
- Schedule
- Choose OS, architecture, features
- Compile
- Realize

    // Write algorithm
    Halide::Var x("x"), xo("xo"), xi("xi");
    Halide::RDom r(-1, 3);  // r in [-1, 1]
    Halide::Expr e = sin((x + r) * 0.1f);
    Halide::Func f("f");
    f(x) = sum(e);
    // Schedule
    f.split(x, xo, xi, 16).parallel(xo).vectorize(xi);
    // Choose OS, architecture, features; compile
    f.compile_jit(Halide::get_host_target());
    // Realize
    Halide::Buffer<float> output(1000);
    f.realize(output);

RGB2Gray
    void rgb2gray(const uint8_t* src, uint8_t* dst, int height, int width) {
        for (int i = 0; i < width * height; ++i) {
            dst[i] = (uint8_t)(0.299f * src[i * 3 + 0] +   // red
                               0.587f * src[i * 3 + 1] +   // green
                               0.114f * src[i * 3 + 2]);   // blue
        }
    }

RGB2Gray using TBB
    tbb::parallel_for(
        tbb::blocked_range<int>(0, height * width),
        [&](tbb::blocked_range<int> r) {
            uint16_t red, green, blue, R2GRAY = 77, G2GRAY = 150, B2GRAY = 29;
            int begin = r.begin();
            int end = r.end();
            const uint8_t* __restrict__ srcData = src;
            uint8_t* __restrict__ dstData = dst;
            for (int i = begin; i < end; ++i) {
                red = srcData[i * 3];
                green = srcData[i * 3 + 1];
                blue = srcData[i * 3 + 2];
                dstData[i] = (R2GRAY * red + G2GRAY * green + B2GRAY * blue) >> 8;
            }
        });

RGB2Gray in Halide
    uint16_t R2GRAY = 77, G2GRAY = 150, B2GRAY = 29;
    Func f("rgb2gray");
    auto input = Buffer<uint8_t>::make_interleaved(src, width, height, 3);
    Var x("x"), y("y");
    Expr r = cast<uint16_t>(input(x, y, 0));
    Expr g = cast<uint16_t>(input(x, y, 1));
    Expr b = cast<uint16_t>(input(x, y, 2));
    f(x, y) = cast<uint8_t>((R2GRAY * r + G2GRAY * g + B2GRAY * b) >> 8);
    Buffer<uint8_t> output(dst, {width, height});
    f.realize(output);

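The integer weights 77, 150 and 29 are the float coefficients 0.299, 0.587 and 0.114 scaled by 256 and rounded. They sum to 256, so the weighted sum never exceeds 256 * 255 = 65280 and fits in a `uint16_t`, which is why 16-bit arithmetic is enough. A minimal plain C++ sketch of the per-pixel computation follows; the helper name `gray_fixed` is hypothetical and not part of the slides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the slides): fixed-point luma with weights
// scaled by 256, followed by a shift right by 8 to divide by 256.
inline uint8_t gray_fixed(uint8_t r, uint8_t g, uint8_t b) {
    static_assert(77 + 150 + 29 == 256, "weights must sum to 256");
    const int sum = 77 * r + 150 * g + 29 * b;
    assert(sum <= 65280);  // 256 * 255, so the value fits in 16 bits
    return static_cast<uint8_t>(sum >> 8);
}
```

The result is truncated rather than rounded, so it can differ slightly from the float version on the earlier slide.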
RGB2Gray efficiency comparison (1920x1280)
Intel® Core™ i5-4460 CPU @ 3.20GHz x 4

    Implementation               Time
    OpenCV (GNU 5.4.0)           0.76 ms
    TBB (Intel® C++ Compiler)    0.674 ms
    Halide (LLVM 5.0.1)          1.315 ms

Scheduling
    Var x("x"), y("y");
    Func f("f");
    f(x, y) = 0;
    f.print_loop_nest();
    f.trace_stores();
    f.realize(10, 10);

Loop nest:
    produce f:
      for y:
        for x:
          f(...) = ...

Scheduling
    Var x("x"), y("y"), yo("yo"), yi("yi");
    Func f("f");
    f(x, y) = 0;
    f.bound(x, 0, 10).bound(y, 0, 10)
     .split(y, yo, yi, 5)
     .parallel(yo);
    f.print_loop_nest();
    f.trace_stores();
    f.realize(10, 10);

Loop nest:
    produce f:
      parallel y.yo:
        for y.yi in [0, 4]:
          for x:
            f(...) = ...

Scheduling
    Var x("x"), y("y"), yo("yo"), yi("yi"), xo("xo"), xi("xi"), tile("tile");
    Func f("f");
    f(x, y) = 0;
    f.bound(x, 0, 10).bound(y, 0, 10)
     .split(y, yo, yi, 5)
     .split(x, xo, xi, 5)
     .reorder(xi, yi, xo, yo)
     .fuse(xo, yo, tile)
     .parallel(tile);
    f.print_loop_nest();
    f.trace_stores();
    f.realize(10, 10);

Loop nest:
    produce f:
      parallel x.xo.tile:
        for y.yi in [0, 4]:
          for x.xi in [0, 4]:
            f(...) = ...

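The tiled loop nest printed above can be read as ordinary nested loops. The sketch below is plain C++ with no Halide dependency and records the order in which a serial run of that nest would visit the pixels. It assumes that the image size is a multiple of the tile size and that the fused index runs over `xo` fastest (the first argument of `fuse` is the inner variable); the function name `tiled_order` is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical illustration (not from the slides): visit order of
// split(y, yo, yi, T).split(x, xo, xi, T).reorder(xi, yi, xo, yo).fuse(xo, yo, tile)
// when the outer "tile" loop is executed serially.
std::vector<std::pair<int, int>> tiled_order(int width, int height, int tile_size) {
    assert(tile_size > 0 && width % tile_size == 0 && height % tile_size == 0);
    const int tiles_x = width / tile_size;
    const int tiles_y = height / tile_size;
    std::vector<std::pair<int, int>> order;
    order.reserve(static_cast<std::size_t>(width) * height);
    for (int tile = 0; tile < tiles_x * tiles_y; ++tile) {  // parallel in the slide
        const int xo = tile % tiles_x;  // inner fused variable
        const int yo = tile / tiles_x;  // outer fused variable
        for (int yi = 0; yi < tile_size; ++yi) {
            for (int xi = 0; xi < tile_size; ++xi) {
                order.emplace_back(xo * tile_size + xi, yo * tile_size + yi);
            }
        }
    }
    return order;
}
```

For the 10x10 example with 5x5 tiles, `tiled_order(10, 10, 5)` walks four tiles of 25 pixels each; in the scheduled version each tile can go to a different thread.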
Scheduling
    Var x("x"), y("y"), yo("yo"), yi("yi");
    Func f("f");
    f(x, y) = 0;
    f.bound(x, 0, 10).bound(y, 0, 10)
     .split(y, yo, yi, 5)
     .parallel(yo)
     .vectorize(x, 4);
    f.print_loop_nest();
    f.trace_stores();
    f.realize(10, 10);

Loop nest:
    produce f:
      parallel y.yo:
        for y.yi in [0, 4]:
          for x.x:
            vectorized x.v8 in [0, 3]:
              f(...) = ...

Scheduling
    Var x("x"), y("y"), yo("yo"), yi("yi");
    Func f("f");
    f(x, y) = 0;
    f.estimate(x, 0, 10).estimate(y, 0, 10);
    Pipeline(f).auto_schedule(get_host_target());
    f.print_loop_nest();
    f.trace_stores();
    f.realize(10, 10);

Loop nest:
    produce f:
      parallel y:
        parallel x.x_vo:
          vectorized x.x_vi in [0, 7]:
            f(...) = ...

RGB2Gray efficiency comparison (1920x1280)
Intel® Core™ i5-4460 CPU @ 3.20GHz x 4

    OpenCV (GNU 5.4.0)           0.76 ms
    TBB (Intel® C++ Compiler)    0.674 ms
    Halide (LLVM 5.0.1)          1.315 ms (serial)

Halide with different schedules:
    f.split(y, yo, yi, 64).parallel(yo).vectorize(x, 8);
        0.796 ms
    f.split(y, yo, yi, 64).split(x, xo, xi, 64).reorder(xi, yi, xo, yo)
     .fuse(xo, yo, tile).parallel(tile).vectorize(x, 8);
        1.221 ms
    f.parallel(y).vectorize(x, 32);  (auto scheduling)
        0.869 ms

Box filter in Halide
    Func f("box_filter");
    auto input = Buffer<uint8_t>::make_interleaved(src, width, height, 3);
    Func padded = BoundaryConditions::constant_exterior(input, 0);
    Var x("x"), y("y"), c("c");
    Func input_uint16("input_uint16");
    input_uint16(x, y, c) = cast<uint16_t>(padded(x, y, c));
    RDom r(-1, 3, -1, 3);
    Expr s = sum(input_uint16(x + r.x, y + r.y, c));
    float ratio = 1.0f / 9;
    f(x, y, c) = cast<uint8_t>(s * ratio);
    f.output_buffer().dim(0).set_stride(3).set_bounds(0, width);
    f.output_buffer().dim(1).set_stride(3 * width).set_bounds(0, height);
    f.output_buffer().dim(2).set_stride(1).set_bounds(0, 3);

Box filter efficiency comparison (1920x1280)
Intel® Core™ i5-4460 CPU @ 3.20GHz x 4

    Implementation               Time
    OpenCV (GNU 5.4.0)           3.603 ms
    TBB (Intel® C++ Compiler)    4.779 ms (is not well auto-vectorized)
    Halide (LLVM 5.0.1)          3.784 ms

Scheduling: producer-consumer
81.5 ms @ 1920x1280
    Var x("x"), y("y");
    Func producer("producer"), consumer("consumer");
    producer(x, y) = sin(x + y);
    consumer(x, y) = producer(x, y - 1) + producer(x - 1, y) +
                     producer(x + 1, y) + producer(x, y + 1);
    consumer.realize(5, 5);

By default the producer is inlined into the consumer ⇒ number of sin calls: 125! The code above is equivalent to:
    Var x("x"), y("y");
    Func consumer("consumer");
    consumer(x, y) = sin(x + y - 1) + sin(x - 1 + y) +
                     sin(x + 1 + y) + sin(x + y + 1);
    consumer.realize(5, 5);

Scheduling: producer-consumer
33.4 ms @ 1920x1280
    Var x("x"), y("y");
    Func producer("producer"), consumer("consumer");
    producer(x, y) = sin(x + y);
    consumer(x, y) = producer(x, y - 1) + producer(x - 1, y) +
                     producer(x + 1, y) + producer(x, y + 1);
    producer.compute_root();
    producer.trace_loads();
    producer.trace_stores();
    consumer.realize(5, 5);

[Figure panels: producer, consumer]

Scheduling: producer-consumer
82.5 ms @ 1920x1280
    Var x("x"), y("y");
    Func producer("producer"), consumer("consumer");
    producer(x, y) = sin(x + y);
    consumer(x, y) = producer(x, y - 1) + producer(x - 1, y) +
                     producer(x + 1, y) + producer(x, y + 1);
    producer.compute_at(consumer, y);
    producer.trace_loads();
    producer.trace_stores();
    consumer.realize(5, 5);

[Figure panels: producer, consumer]

Scheduling: producer-consumer
28.1 ms @ 1920x1280
    Var x("x"), y("y");
    Func producer("producer"), consumer("consumer");
    producer(x, y) = sin(x + y);
    consumer(x, y) = producer(x, y - 1) + producer(x - 1, y) +
                     producer(x + 1, y) + producer(x, y + 1);
    producer.store_root();
    producer.compute_at(consumer, y);
    producer.trace_loads();
    producer.trace_stores();
    consumer.realize(5, 5);

[Figure panels: producer, consumer]

Scheduling: producer-consumer
50.5 ms @ 1920x1280 (needs to be parallelized)
    Var x("x"), y("y");
    Func producer("producer"), consumer("consumer");
    producer(x, y) = sin(x + y);
    consumer(x, y) = producer(x, y - 1) + producer(x - 1, y) +
                     producer(x + 1, y) + producer(x, y + 1);
    producer.store_root();
    producer.compute_at(consumer, x);
    producer.trace_loads();
    producer.trace_stores();
    consumer.realize(5, 5);

[Figure panels: producer, consumer]

Histogram computation in Halide
    void histogram(uint8_t* src, int* dst, int height, int width) {
        static Func f("hist");
        static Buffer<int> output(dst, {256, 3});
        if (!f.defined()) {
            auto input = Buffer<uint8_t>::make_interleaved(src, width, height, 3);
            Var c("c"), i("i");
            RDom r(0, width, 0, height);
            f(i, c) = 0;
            Expr lum = clamp(input(r.x, r.y, c), 0, 255);
            f(lum, c) += 1;
            f.estimate(i, 0, 256).estimate(c, 0, 3);
            Pipeline(f).auto_schedule(get_host_target());
        }
        f.realize(output);
    }

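The update definition `f(lum, c) += 1` over the reduction domain is a scatter: every pixel increments one bin per channel. A plain C++ reference for the same result might look like the sketch below (no Halide dependency). The name `histogram_ref` and the use of `std::vector` and `std::array` are illustrative choices, not from the slides; `hist[c][v]` has the same memory layout as the `{256, 3}` output buffer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference (not from the slides): per-channel histogram of an
// interleaved 3-channel 8-bit image, hist[c][v] = number of pixels whose
// channel c has value v.
std::array<std::array<int, 256>, 3> histogram_ref(const std::vector<uint8_t>& src,
                                                  int width, int height) {
    assert(width >= 0 && height >= 0);
    assert(src.size() == static_cast<std::size_t>(width) * height * 3);
    std::array<std::array<int, 256>, 3> hist{};  // zero-initialized, like f(i, c) = 0
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < 3; ++c) {
                const uint8_t v = src[(static_cast<std::size_t>(y) * width + x) * 3 + c];
                hist[c][v] += 1;  // same role as f(lum, c) += 1
            }
        }
    }
    return hist;
}
```

A `uint8_t` value is always in [0, 255], so the reference needs no clamp; in the Halide version the `clamp` gives the compiler provable bounds on the bin index.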
K-means in Halide
[Figure panels: 256 colors, 4 colors]

K-means in Halide
    Func clustersFunc("clustersFunc"), kmeans("kmeans");
    Buffer<uint8_t> clusters({k});
    Buffer<uint8_t> input((uint8_t*)src, width * height);
    Func clustersMap("clustersMap");
    Var x("x");
    RDom r(0, k);
    clustersMap(x) = argmin(abs(cast<int16_t>(input(x)) -
                                cast<int16_t>(clusters(r))))[0];

K-means in Halide
    Var i("i");
    RDom im(input);
    Func count("count");
    count(i) = 0;
    Expr clusterId = clamp(clustersMap(im), 0, k - 1);
    count(clusterId) += 1;
    Func s("s");
    s(i) = 0;
    s(clusterId) += cast<uint32_t>(input(im));

K-means in Halide
    clustersFunc(i) = cast<uint8_t>(s(i) / max(count(i), 1));
    clustersFunc.estimate(i, 0, k);
    kmeans(x) = clustersFunc(clamp(clustersMap(x), 0, k - 1));
    kmeans.estimate(x, 0, width * height);
    Pipeline({clustersFunc, kmeans}).auto_schedule(get_host_target());
    for (int i = 0; i < k; ++i)
        clusters(i) = rand() % 256;
    for (int i = 0; i < 14; ++i)
        clustersFunc.realize(clusters);
    Buffer<uint8_t> output((uint8_t*)dst, width * height);
    kmeans.realize(output);

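Each `clustersFunc.realize(clusters)` call performs one k-means iteration on single-channel 8-bit values: assign every pixel to the nearest cluster center, then replace each center with the mean of its pixels. The plain C++ sketch below shows one such iteration without Halide. The name `kmeans_step` is hypothetical, and breaking ties toward the lowest cluster index is an assumption of this sketch. As in the slides, an empty cluster divides by `max(count, 1)` and therefore becomes 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical reference (not from the slides): one k-means iteration.
std::vector<uint8_t> kmeans_step(const std::vector<uint8_t>& input,
                                 const std::vector<uint8_t>& clusters) {
    assert(!clusters.empty());
    const std::size_t k = clusters.size();
    std::vector<uint32_t> sum(k, 0), count(k, 0);
    for (uint8_t v : input) {
        // Assignment step: index of the nearest center (role of clustersMap).
        std::size_t best = 0;
        int best_dist = std::abs(int(v) - int(clusters[0]));
        for (std::size_t j = 1; j < k; ++j) {
            const int d = std::abs(int(v) - int(clusters[j]));
            if (d < best_dist) {
                best_dist = d;
                best = j;
            }
        }
        sum[best] += v;    // role of s(clusterId) += input(im)
        count[best] += 1;  // role of count(clusterId) += 1
    }
    // Update step: mean of the assigned values, guarded against empty clusters.
    std::vector<uint8_t> updated(k);
    for (std::size_t j = 0; j < k; ++j) {
        updated[j] = static_cast<uint8_t>(sum[j] / std::max<uint32_t>(count[j], 1));
    }
    return updated;
}
```

Repeating `clusters = kmeans_step(input, clusters)` a fixed number of times mirrors the 14-iteration loop in the slide.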
Run Halide code on GPU

    void gamma(const uint8_t* src, uint8_t* dst, int height, int width,
               float gamma, bool gpu) {
        static Func f("correction");
        static Buffer<uint8_t> inp((uint8_t*)src, 3, width, height),
                               out(dst, 3, width, height);
        if (!f.defined()) {
            Var x("x"), y("y"), c("c"), xo("xo"), xi("xi"), yo("yo"), yi("yi");
            f(c, x, y) = Halide::cast<uint8_t>(
                255 * pow(inp(c, x, y) * 1.0f / 255, gamma));
            Halide::Target t = Halide::get_host_target();
            if (gpu) {
                t.set_feature(Halide::Target::OpenCL);
                f.bound(x, 0, width).bound(y, 0, height).bound(c, 0, 3)
                 .split(x, xo, xi, 16).split(y, yo, yi, 16)
                 .reorder(xi, yi, c, xo, yo)
                 .gpu_blocks(c, xo, yo).gpu_threads(xi, yi);
            } else {
                f.estimate(x, 0, width).estimate(y, 0, height).estimate(c, 0, 3);
                Pipeline(f).auto_schedule(t);
            }
            f.compile_jit(t);
        }
        inp.set_host_dirty();
        f.realize(out);
        out.copy_to_host();
    }

Ahead-of-time compilation
[Diagram: ahead-of-time compilation flow. Blocks: Func f("bgr2gray") (Halide); intermediate representation (IR); LLVM; outputs bgr2gray.a (*.lib), bgr2gray.o (*.obj), bgr2gray.h]

Ahead-of-time compilation
    uint16_t R2GRAY = 77, G2GRAY = 150, B2GRAY = 29;
    Func f("bgr2gray");
    ImageParam input(Halide::UInt(8), 3, "input");
    input.dim(0).set_bounds_estimate(0, 3).set_stride(1);
    input.dim(1).set_bounds_estimate(0, 640).set_stride(3);
    input.dim(2).set_bounds_estimate(0, 480).set_stride(3 * 640);
    Var x("x"), y("y");
    Expr b = cast<uint16_t>(input(0, x, y));
    Expr g = cast<uint16_t>(input(1, x, y));
    Expr r = cast<uint16_t>(input(2, x, y));
    f(x, y) = cast<uint8_t>((R2GRAY * r + G2GRAY * g + B2GRAY * b) >> 8);
    f.estimate(x, 0, 640).estimate(y, 0, 480);
    Pipeline(f).auto_schedule(get_host_target());
    f.compile_to_static_library("bgr2gray", {input}, "bgr2gray");

    #include <opencv2/opencv.hpp>
    #include "bgr2gray.h"

    int main(int argc, char** argv) {
        cv::Mat frame(480, 640, CV_8UC3), res(480, 640, CV_8UC1);

        halide_buffer_t inpBuffer;
        inpBuffer.type = halide_type_t(halide_type_uint, 8);
        inpBuffer.host = frame.data;
        halide_dimension_t inpDims[] = {halide_dimension_t(0, 3, 1),
                                        halide_dimension_t(0, 640, 3),
                                        halide_dimension_t(0, 480, 3 * 640)};
        inpBuffer.dim = &inpDims[0];
        inpBuffer.dimensions = 3;

        halide_buffer_t outBuffer;
        outBuffer.type = halide_type_t(halide_type_uint, 8);
        outBuffer.host = res.data;
        halide_dimension_t outDims[] = {halide_dimension_t(0, 640, 1),
                                        halide_dimension_t(0, 480, 640)};
        outBuffer.dim = &outDims[0];
        outBuffer.dimensions = 2;

        cv::VideoCapture cap(0);
        while (cv::waitKey(1) < 0) {
            cap >> frame;
            if (frame.empty()) break;
            bgr2gray(&inpBuffer, &outBuffer);
            cv::imshow("Output", res);
        }
        return 0;
    }

Summary
- One code, many devices
- Split algorithm and optimization
- High-level computations management
- Domain-specific compilation with no Halide dependency
- Write algorithms “by definition”

Q&A