1 Building a Visual Hierarchy Andrew Smith University

2 Outline
• Building a Visual Hierarchy
• Learning layer-by-layer
• Inference – filling in a missing segment of an image
• Examples
• Applications/Products & Future work
© 2009 Robert Hecht-Nielsen. All rights reserved.

3 Choosing an appropriate problem
We want to:
• Model human visual processes.
• Build practical applications.
• Understand vision in terms of Confabulation Theory.
• Begin a basis for much deeper research.
Answer: Build an image modeling system.
• Represent images in terms of textural components (low statistical order).
• Represent images as symbolic (discrete) tuples.

4 Machine Vision vs. Biological Vision
Machine Vision
• Pixels --- local representation.
• Orthogonal
Biological Vision
• Filter/Feature responses
• Massively overcomplete/non-orthogonal

5 Confabulation & vision (Pixels → Modules & Symbols)
• Features (symbols) develop in a layer of the hierarchy as commonly seen combinations of their inputs.
• Knowledge links are simple conditional probabilities between symbols: $p(\alpha \mid \beta)$, where $\alpha$ and $\beta$ are symbols in connected modules.
• All knowledge can therefore be learned by simple co-occurrence counting: $p(\alpha \mid \beta) = C(\alpha, \beta) / C(\beta)$
• Confabulation operation: given evidence $\alpha$, find the answer $\beta$ that maximizes $p(\alpha \mid \beta)$.
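
The counting rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lab's implementation: the function names `learn_links` and `confabulate`, the symbol names `alpha` and `beta`, and the toy string symbols are all assumptions made for the example.

```python
from collections import Counter

def learn_links(pairs):
    """Estimate p(alpha | beta) from observed (alpha, beta) co-occurrences."""
    pair_counts = Counter(pairs)                  # C(alpha, beta)
    beta_counts = Counter(b for _, b in pairs)    # C(beta)
    # p(alpha | beta) = C(alpha, beta) / C(beta)
    return {(a, b): c / beta_counts[b] for (a, b), c in pair_counts.items()}

def confabulate(links, evidence, candidates):
    """Pick the candidate answer beta that maximizes p(evidence | beta)."""
    return max(candidates, key=lambda b: links.get((evidence, b), 0.0))

# Hypothetical toy data: each pair is (symbol in module A, symbol in module B).
pairs = [("edge", "bar"), ("edge", "bar"), ("blob", "bar"),
         ("edge", "dot"), ("blob", "dot"), ("blob", "dot")]
links = learn_links(pairs)
answer = confabulate(links, "edge", ["bar", "dot"])
```

Here `links[("edge", "bar")]` is 2/3 and `links[("edge", "dot")]` is 1/3, so the evidence symbol "edge" selects "bar".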

6 Building a vision hierarchy
• Can no longer use SSE to evaluate model [ SSE maximizes p( | , , ) ]
• Instead, make use of generative model:
  – Always be able to generate a plausible image.

7 Data set
• 4,300 1.5-Mpix natural images (BW)

8 Vision Hierarchy – level “0”
• We know the first transformation from neuroscience research: simple cells approximate Gabor filters.
• 5 scales, 16 orientations (odd + even)
• Parameters picked to closely resemble feline simple cells.
• Same approach is used elsewhere in lab. [Minnett, et al.]
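
A Gabor filter bank of this shape can be sketched as below. Everything numeric here is an assumption for illustration: the slide does not give wavelengths, envelope widths, or kernel sizes, and the code reads "16 orientations (odd + even)" as 8 orientations times 2 phases, which gives 16 responses per scale and 80 in total. The names `gabor_kernel` and `gabor_bank` are hypothetical.

```python
import math

def gabor_kernel(size, wavelength, theta, phase, sigma):
    """Return a size x size Gabor kernel (size must be odd) as a list of rows.

    phase = 0 gives the even (cosine) filter, phase = pi/2 the odd (sine) filter.
    """
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoidal carrier runs along theta.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength - phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def gabor_bank(n_scales=5, n_orient=8, base_wavelength=4.0):
    """Build (scale, orientation, phase, kernel) tuples for every filter."""
    bank = []
    for s in range(n_scales):
        wavelength = base_wavelength * (2 ** s)    # assumed octave spacing
        sigma = 0.5 * wavelength                   # assumed envelope width
        size = 2 * int(math.ceil(3 * sigma)) + 1   # cover +/- 3 sigma
        for k in range(n_orient):
            theta = math.pi * k / n_orient
            for phase in (0.0, math.pi / 2):       # even, then odd
                kernel = gabor_kernel(size, wavelength, theta, phase, sigma)
                bank.append((s, k, phase, kernel))
    return bank

bank = gabor_bank()
```

Convolving an image with every kernel in `bank` would give the overcomplete simple-cell-like representation; the convolution itself is omitted to keep the sketch short.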

9 Vision Hierarchy – level “0”
• Does the full convolution preserve information?

10 Vision Hierarchy – level “0”
• We can do even better by

11 Vision Hierarchy – level “0”
• Supersampling RMSE: 1x: 2x:

12 Inverting Gabor Representations
Studied by Daugman
• Simple cells (found in the 1950s) re-represent “pixel” data; they were first characterized by Daugman as Gabor logons in the 1980s.
• Attempted to answer “How much information is lost?” “Not much!” -- Able to completely reconstruct images (i.e. what we've just seen in the previous few slides).
Frame Analysis can show:
• Can mathematically prove when complete inversion is possible.
• Optimal linear inverse.
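
The frame-analysis claim can be illustrated on a much smaller case than a Gabor bank. The sketch below assumes a toy frame of three unit vectors 120 degrees apart in the plane: it is overcomplete and non-orthogonal, yet it is a tight frame with bound 3/2, so the coefficients invert exactly through a single linear map. The names `FRAME`, `analyze`, and `synthesize` are hypothetical, and nothing here is taken from the Gabor representation used in the slides.

```python
import math

# Three unit vectors at 90, 210 and 330 degrees: a tight frame for the plane.
ANGLES = (math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3)
FRAME = [(math.cos(a), math.sin(a)) for a in ANGLES]
FRAME_BOUND = 1.5  # sum_i <x, f_i>^2 equals 1.5 * |x|^2 for every x

def analyze(x):
    """Overcomplete representation: one inner product per frame vector."""
    return [x[0] * f[0] + x[1] * f[1] for f in FRAME]

def synthesize(coeffs):
    """Linear inverse for a tight frame: x = (1 / bound) * sum_i c_i f_i."""
    sx = sum(c * f[0] for c, f in zip(coeffs, FRAME)) / FRAME_BOUND
    sy = sum(c * f[1] for c, f in zip(coeffs, FRAME)) / FRAME_BOUND
    return (sx, sy)

x = (0.3, -1.2)
coeffs = analyze(x)            # 3 numbers describing a 2-D point
x_rec = synthesize(coeffs)     # recovers x up to rounding error
```

For a frame that is not tight, the same role is played by the pseudo-inverse of the analysis operator, which exists whenever the lower frame bound is positive.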

13 Vision Hierarchy – level 1
• We now have a simple-cell-like representation.
• How to create a symbolic representation (“Complex Cells”)?
• Apply principle of Confabulation Theory: collect common sets of inputs from simple cells, similar to a Vector Quantizer.
• Keep the 5 scales separate (quantize 16 dimensions, not 80).

14 Vision Hierarchy – level 1
• To create actual symbols, we use a vector quantizer.
  – Trade-offs (threshold of quantizer): number of symbols, preservation of information, probability accuracy.
• Solution:
  – Use angular distance metric (dot-product).
  – Keep only symbols that occurred in training set more than 200 times, to get accurate probability estimates.
  – After training, ~95% of samples should be within threshold of at least one symbol.
  – Pick a threshold so images can be plausibly generated.
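
One way to realize a thresholded angular quantizer is simple leader clustering, sketched below. This is an assumption, not the method from the slides: the first sample that falls outside every existing symbol's angular threshold becomes a new symbol, symbols are never moved afterwards, and rarely used symbols are pruned at the end. The names `train_quantizer`, `quantize`, and `coverage` and the toy 2-D samples are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit length; return None for the zero vector."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v] if n > 0 else None

def nearest(u, codebook):
    """Index and cosine of the codeword with the largest dot product."""
    best, best_cos = None, -2.0
    for i, c in enumerate(codebook):
        d = sum(a * b for a, b in zip(u, c))
        if d > best_cos:
            best, best_cos = i, d
    return best, best_cos

def train_quantizer(samples, max_angle_deg, min_count):
    """Leader clustering, then keep symbols seen more than min_count times."""
    cos_thresh = math.cos(math.radians(max_angle_deg))
    codebook, counts = [], []
    for v in samples:
        u = normalize(v)
        if u is None:
            continue
        i, c = nearest(u, codebook)
        if i is not None and c >= cos_thresh:
            counts[i] += 1
        else:
            codebook.append(u)
            counts.append(1)
    return [c for c, n in zip(codebook, counts) if n > min_count]

def quantize(v, codebook, max_angle_deg):
    """Return the symbol index for v, or None if no symbol is within threshold."""
    u = normalize(v)
    if u is None:
        return None
    i, c = nearest(u, codebook)
    return i if i is not None and c >= math.cos(math.radians(max_angle_deg)) else None

def coverage(samples, codebook, max_angle_deg):
    """Fraction of samples that fall within threshold of at least one symbol."""
    hits = sum(quantize(v, codebook, max_angle_deg) is not None for v in samples)
    return hits / len(samples)

# Hypothetical toy data: two tight clusters of directions plus one outlier.
samples = [(1, 0), (1, 0.05), (0.98, -0.03), (0, 1), (0.02, 1), (-0.04, 0.9), (-1, -1)]
codebook = train_quantizer(samples, max_angle_deg=10, min_count=2)
covered = coverage(samples, codebook, max_angle_deg=10)
```

In the setting of the slides, `min_count` would be 200 and the angle threshold would be tuned until `coverage` is around 0.95 and generated images look plausible.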

16 Vision Hierarchy – level 1
• Symbolic representation can generate plausible images:
• A theory of animal vision that actually demonstrates that animals can see!

17 Vision Hierarchy – level 1
• ~8,000 symbols are learned for each of the 5 scales.
• Complex local features develop (unlike PCA and ICA re-representations).

18 Vision Hierarchy – level 1
• The image is now re-represented as 5 “planes” of symbols, one per scale.

19 Knowledge links:
• Learn which symbols may be next to which symbols (conditional probabilities).
• Learn which symbols may be over/under which symbols.
• Go out to ‘radius’ 7.
• Consistent with cortical representation of knowledge.
• Very large (10s of GB) set of knowledge.
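
Learning these spatial knowledge links amounts to keeping one co-occurrence table per relative offset. The sketch below assumes a square neighbourhood (both |dx| and |dy| at most the radius, which gives 224 offsets at radius 7) and a single plane of symbols stored as a list of rows; the names `learn_offset_links` and `link_strength` and the toy grid are hypothetical.

```python
from collections import Counter, defaultdict

def learn_offset_links(grid, radius):
    """Count symbol co-occurrences separately for every (dy, dx) offset."""
    pair_counts = defaultdict(Counter)    # offset -> Counter of (source, target)
    target_counts = defaultdict(Counter)  # offset -> Counter of target
    h, w = len(grid), len(grid[0])
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ty, tx = y + dy, x + dx
                    if (dy, dx) == (0, 0) or not (0 <= ty < h and 0 <= tx < w):
                        continue
                    src, tgt = grid[y][x], grid[ty][tx]
                    pair_counts[(dy, dx)][(src, tgt)] += 1
                    target_counts[(dy, dx)][tgt] += 1
    return pair_counts, target_counts

def link_strength(pair_counts, target_counts, offset, src, tgt):
    """p(src | tgt) for a target located at the given offset from the source."""
    n = target_counts[offset][tgt]
    return pair_counts[offset][(src, tgt)] / n if n else 0.0

# Hypothetical toy plane of symbols: a checkerboard of 'a' and 'b'.
grid = ["abab", "baba", "abab"]
pair_counts, target_counts = learn_offset_links(grid, radius=1)
n_offsets = len(pair_counts)
right = link_strength(pair_counts, target_counts, (0, 1), "a", "b")
diag = link_strength(pair_counts, target_counts, (1, 1), "a", "b")
```

On the checkerboard, a 'b' to the right is always paired with an 'a' source (`right` is 1.0), while a diagonal 'b' never is (`diag` is 0.0).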

20 Texture modeling – (inference)
• What if a portion of our image symbol representation is damaged?
  – Blind spot
  – CCD defect
  – Brain lesion
• We can use confabulation (generation) to infer a plausible replacement.
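
Filling in a damaged symbol can be sketched in one dimension. The example below makes several assumptions that are not stated in the slides: the evidence from all intact neighbours is combined as a product of link strengths (summed in log space), unseen combinations get a small floor probability, and symbols live in a 1-D row rather than an image plane. The names `learn_row_links` and `fill_in` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def learn_row_links(rows, radius):
    """Count (neighbour, centre) co-occurrences for each offset within radius."""
    pair = defaultdict(Counter)    # offset -> Counter of (neighbour, centre)
    centre = defaultdict(Counter)  # offset -> Counter of centre
    for row in rows:
        for i, c in enumerate(row):
            for d in range(-radius, radius + 1):
                j = i + d
                if d != 0 and 0 <= j < len(row):
                    pair[d][(row[j], c)] += 1
                    centre[d][c] += 1
    return pair, centre

def fill_in(row, pos, symbols, pair, centre, radius, floor=1e-6):
    """Choose the symbol at pos that best explains its intact neighbours."""
    best, best_score = None, -math.inf
    for cand in symbols:
        score = 0.0
        for d in range(-radius, radius + 1):
            j = pos + d
            if d == 0 or not (0 <= j < len(row)) or row[j] is None:
                continue
            n = centre[d][cand]
            p = pair[d][(row[j], cand)] / n if n else 0.0
            score += math.log(max(p, floor))  # product of links, in log space
        if score > best_score:
            best, best_score = cand, score
    return best

# Hypothetical training rows with a repeating texture, and a damaged row.
rows = ["abcabcabc", "bcabcabca"]
pair, centre = learn_row_links(rows, radius=2)
damaged = ["a", "b", "c", None, "b", "c", "a", "b", "c"]
replacement = fill_in(damaged, 3, ["a", "b", "c"], pair, centre, radius=2)
```

The damaged position is surrounded by "b c _ b c", and only "a" is consistent with all four neighbours in the training texture, so `replacement` is "a".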

21 Texture modeling – Inference 1 • Fill in missing region by confabulating from

22 Texture modeling

24 Texture modeling

25 More Examples 1/7 (find the replacements)

26 More Examples 1/7 (replacement locations)

27 More Examples 2/7 (find the replacements)

28 More Examples 2/7 (replacement locations)

29 More Examples 3/7 (find the replacements)

30 More Examples 3/7 (replacement locations)

31 More Examples 4/7 (find the replacements)

32 More Examples 4/7 (replacement locations)

33 More Examples 5/7 (find the replacements)

34 More Examples 5/7 (replacement locations)

35 More Examples 6/7 (find the replacements)

36 More Examples 6/7 (replacement locations)

37 Texture modeling Conclusions
• This visual hierarchy does an excellent job at capturing an image up to a certain order of complexity.
• Given this visual hierarchy and its learned knowledge links, missing regions could plausibly be filled in.
• This could be a reasonable explanation for what animals do.
• Preparing for publication (IEEE Transactions on Image Processing), with help from Professor Serge Belongie (CSE).
• Last hurdle to graduation!

44 The next level… Level 2 symbol hierarchy
• Collect commonly recurring regions of level 1 symbols.
• Symbols at Level 2 will fit together like puzzle pieces.
Thank you!