Computer Vision, Part I: Background (Introduction, Vision, History)

Computer Vision
• Part I – Background: Introduction, Vision, History, Applications, Development, etc.
• Part II – Low-Level Vision: Image Representation, Acquisition, Image Feature Extraction, Stereo Vision, etc.
• Part III – High-Level Vision: Pattern Analysis, Image Segmentation, Content-Based Image Retrieval (CBIR), Object Recognition, Video Analysis, etc.

Computer Vision, Lecture One: Introduction
• What is Computer Vision – examples that illustrate and explain “Computer Vision”.
• “Vision” – the history of our understanding of vision, and David Marr’s theory of vision.
• Computer Vision System – a typical computer vision system, with an example of how the system works.

Computer Vision – Part I: What is Computer Vision
• Definition (description): the science and technology concerned with the computational understanding and use of the information present in visual images.
• Examples of computer vision

Computer Vision
• Purpose of computer vision: extract useful information from digital media data.
• Input → computer vision algorithms → Output
• Input: digital camera images, video sequences, medical images, etc.
• Output: depends on the application.

Computer Vision • Examples: Face Recognition

Computer Vision • Examples: Video Surveillance

Computer Vision • Examples: Medical Image Analysis

Computer Vision • Limitations: an example from Google Image Search

Special effects: shape and motion capture Source: S. Seitz

3D urban modeling: Bing Maps, Google Street View Source: S. Seitz

3D urban modeling: Microsoft Photosynth http://labs.live.com/photosynth/ Source: S. Seitz

Face detection • Many new digital cameras now detect faces – Canon, Sony, Fuji, … Source: S. Seitz

Smile detection: Sony Cyber-shot® T70 Digital Still Camera Source: S. Seitz

Face recognition: Apple iPhoto software http://www.apple.com/ilife/iphoto/

Biometrics How the Afghan Girl was Identified by Her Iris Patterns Source: S. Seitz

Biometrics
• Fingerprint scanners on many new laptops and other devices
• Face recognition systems now beginning to appear more widely http://www.sensiblevision.com/
Source: S. Seitz

Optical character recognition (OCR)
• Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
• Digit recognition, AT&T labs
• License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Source: S. Seitz

Mobile visual search: Google Goggles

Mobile visual search: iPhone Apps

Automotive safety
• Mobileye: Vision systems in high-end BMW, GM, Volvo models
– “In mid 2010 Mobileye will launch a world's first application of full emergency braking for collision …”
Source: A. Shashua, S. Seitz

Vision in supermarkets: LaneHawk by Evolution Robotics
“A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it…”
Source: S. Seitz

Vision-based interaction (and games)
• Sony EyeToy
• Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks for using it to create a multi-touch display!
• Assistive technologies
Source: S. Seitz

Vision for robotics, space exploration
NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
Vision systems (JPL) used for several tasks:
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
For more, read “Computer Vision on Mars” by Matthies et al.
Source: S. Seitz

Computer Vision
• Conclusion:
– The computer vision community tries to give machines human-like visual ability and intelligence.
– Many useful algorithms, methods and applications have been proposed. However, this is not enough; the field still has a long way to go.
– Computer vision is a fast-developing research area.
– Computer vision already plays an important role in our society and will become more important in the future.
• Related areas:
– Mathematics, Pattern Recognition, Machine Learning, Graphics, Artificial Intelligence, Signal Processing, Neurobiology, Physics, Optics.

The goal of computer vision
• To bridge the gap between pixels and “meaning”
(Figure panels: what we see; what a computer sees)
Source: S. Narasimhan

Vision as a measurement device
• Real-time stereo (NASA Mars Rover)
• Structure from motion (Pollefeys et al.)
• Reconstruction from Internet photo collections (Goesele et al.)

Vision as a source of semantic information slide credit: Fei-Fei, Fergus & Torralba

Object categorization
(Figure labels: sky, building, flag, banner, bus, face, street lamp, cars, wall)
slide credit: Fei-Fei, Fergus & Torralba

Scene and context categorization • outdoor • city • traffic • … slide credit: Fei-Fei, Fergus & Torralba

Why study computer vision?
• Vision is useful: images and video are everywhere!
– Personal photo albums
– Surveillance and security
– Movies, news, sports
– Medical and scientific images

Computer Vision
• Typical computer vision tasks:
– Recognition: Object Recognition, Identification, Detection
– Applications: Face Recognition, Fingerprint Recognition, Content-Based Image Retrieval, Character Recognition.
– Motion Analysis: Tracking, Optical Flow, Egomotion, 3D Reconstruction.
– Scene Analysis: Image Scene Analysis, Video Scene Analysis
– Entertainment: Movie Special Effects, Image Restoration.
– Other areas: Medical Imaging, Robotics, Aeronautics, Astronautics, mobile devices, and even everyday life.
• History of computer vision
– A relatively new science, about 30 years old.
– Some famous groups: Berkeley, Caltech, UIUC, Oxford, Cambridge, INRIA, Microsoft, Mitsubishi…
– Many excellent, diligent scientists and students work in the field.

Ridiculously brief history of computer vision
• 1966: Minsky assigns computer vision as an undergrad summer project
• 1960’s: interpretation of synthetic worlds
• 1970’s: some progress on interpreting selected images
• 1980’s: ANNs come and go; shift toward geometry and increased mathematical rigor
• 1990’s: face recognition; statistical analysis in vogue
• 2000’s: broader recognition; large annotated datasets available; video processing starts
(Figure credits: Guzman ’68; Ohta and Kanade ’78; Turk and Pentland ’91)

Computer Vision
• Since it is a research area:
– Journals:
IEEE Transactions on: Pattern Analysis and Machine Intelligence; Image Processing; Multimedia; Systems, Man, and Cybernetics…
Springer: International Journal of Computer Vision; Neurocomputation…
Elsevier: Pattern Recognition; Computer Vision and Image Understanding; Image and Vision Computing; Neurocomputing…
– Conferences: Computer Vision and Pattern Recognition; International Conference on Computer Vision; International Conference on Image Processing; International Conference on Pattern Recognition; European Conference on Computer Vision; British Machine Vision Conference…
– Most computer vision techniques come from papers published in the venues listed above.
– In our course, we will have some reading assignments.

Computer Vision – Part II: Vision
• Human Visual System
– We try to give machines human-like visual ability, so it is important for us to understand the human visual system.
– Our understanding of the human visual system is still developing.
– In this part, we briefly introduce how humans perceive the environment through their visual system.
(Figure labels: optic nerve, iris, lens, cornea, retina)

Computer Vision
• Visual Perception
– Visual perception is the ability to interpret information and surroundings from visible light reaching the eye. The resulting perception is also known as eyesight, sight or vision.
– The components involved in vision are referred to collectively as the visual system.

Computer Vision
• Visual Perception
– The act of seeing starts when the lens of the eye focuses an image of its surroundings onto a light-sensitive membrane in the back of the eye, called the retina. The retina is actually part of the brain that is isolated to serve as a transducer for the conversion of patterns of light into neuronal signals.
– The lens of the eye focuses light on the photoreceptive cells of the retina, which detect the photons of light and respond by producing neural impulses.
– These signals are processed in a hierarchical fashion by different parts of the brain, from the retina to the lateral geniculate nucleus, to the primary and secondary visual cortex of the brain.

Computer Vision
• Study of Visual Perception
• Early studies of visual perception – ancient Greece (about 2,500 years ago)
– “Emission theory”: vision occurs when rays emanate from the eye and are intercepted by visual objects.
– “Intromission theory”: vision comes from something representative of the object entering the eyes.
• Ibn al-Haytham’s theory, the foundation of modern research (1021)
– Vision is due to light from objects entering the eyes.
– Vision occurs in the brain rather than the eyes.
– Personal experience has an effect on what people see and how they see; vision and perception are subjective.
– He supported his arguments with experiments.
• Hermann von Helmholtz (1850)
– Vision could only be the result of some form of unconscious inference: a matter of making assumptions and drawing conclusions from incomplete data, based on previous experience.

Computer Vision
• Gestalt Principles (20th century)
– “Gestalt” in German means “essence or shape of an entity’s complete form”.
– Properties: Emergence, Reification, Multistability, Invariance
– Emergence is the process of complex pattern formation from simpler rules. It is demonstrated by the perception of the Dog Picture, which depicts a Dalmatian dog sniffing the ground in the shade of overhanging trees. The dog is not recognized by first identifying its parts (feet, ears, nose, tail, etc.) and then inferring the dog from those component parts. Instead, the dog is perceived as a whole, all at once.

Computer Vision
• Gestalt Principles (20th century)
– Reification is the constructive or generative aspect of perception, by which the experienced percept contains more explicit spatial information than the sensory stimulus on which it is based. For instance, a triangle will be perceived in picture A, although no triangle has actually been drawn. In pictures B and D the eye will recognize disparate shapes as "belonging" to a single shape; in C a complete three-dimensional shape is seen, where in actuality no such thing is drawn.

Computer Vision
• Gestalt Principles (20th century)
– Multistability (or multistable perception) is the tendency of ambiguous perceptual experiences to pop back and forth unstably between two or more alternative interpretations. This is seen, for example, in the Necker cube and in Rubin's Figure/Vase illusion shown here.

Computer Vision
• Gestalt Principles (20th century)
– Invariance is the property of perception whereby simple geometrical objects are recognized independent of rotation, translation, and scale, as well as several other variations such as elastic deformations, different lighting, and different component features. For example, the objects in A in the figure are all immediately recognized as the same basic shape, and are immediately distinguishable from the forms in B. They are even recognized despite perspective and elastic deformations as in C, and when depicted using different graphic elements as in D.

Computer Vision
• Gestalt Principles (20th century)
– Emergence, reification, multistability, and invariance are not necessarily separable modules to be modeled individually; they could be different aspects of a single unified dynamic mechanism.
– The fundamental principle of gestalt perception is the law of Prägnanz (German for pithiness), which says that we tend to order our experience in a manner that is regular, orderly, symmetric, and simple.
– Gestalt psychology also has applications in computer vision, in trying to make computers "see" the same things as humans do.
– Figure examples: Law of Closure, Law of Proximity

Computer Vision
• David Marr’s theory
– The major problem with the Gestalt principles is that they are descriptive, not explanatory. For example, you can’t explain how humans see continuous contours by simply stating that the brain “prefers good continuity”.
– In the 1970s David Marr (U.K.) developed a multi-level theory of vision, which analyzed the process of vision at different levels of abstraction.
– Marr suggested that it is possible to investigate vision at any of these levels independently.
– Marr described vision as proceeding from a two-dimensional visual array (on the retina) as input to a three-dimensional description of the world as output.

Computer Vision
• David Marr’s theory
– A 2D or primal sketch of the scene, based on feature extraction of fundamental components of the scene, including edges, regions, etc. Note the similarity in concept to a pencil sketch drawn quickly by an artist as an impression.
– A 2.5D sketch of the scene, where textures are acknowledged, etc. Note the similarity in concept to the stage in drawing where an artist highlights or shades areas of a scene, to provide depth.
– A 3D model, where the scene is visualized in a continuous, 3-dimensional map.
– The 2.5D sketch is related to stereopsis, optic flow, and motion parallax. It reflects the fact that in reality we do not see all of our surroundings but construct a viewer-centered three-dimensional view of our environment.
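
The edge-extraction idea behind the primal sketch can be illustrated with a small sketch. This is only an illustration, not Marr's own algorithm: it assumes simple central-difference gradients as the edge operator, and the function names and the test image are hypothetical.

```python
# Minimal edge-map sketch in the spirit of the primal sketch.
# Assumption: central-difference gradients stand in for the edge operator.

def gradient_magnitude(image):
    """Return |dI/dx| + |dI/dy| at each interior pixel (border pixels stay 0)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal change
            gy = image[r + 1][c] - image[r - 1][c]  # vertical change
            out[r][c] = abs(gx) + abs(gy)
    return out

def edge_map(image, threshold):
    """Mark pixels whose gradient magnitude reaches the threshold."""
    return [[1 if g >= threshold else 0 for g in row]
            for row in gradient_magnitude(image)]

# Hypothetical 5x6 image: dark left half (0), bright right half (9).
step_image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
edges = edge_map(step_image, threshold=5)
for row in edges:
    print(row)
```

The marked pixels line up along the vertical boundary between the dark and bright halves, which is the kind of information a primal sketch records.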

Computer Vision – Part III: Computer Vision System
• Image acquisition – Collect images from digital devices.
• Pre-processing – Process the data so that it satisfies the assumptions implied by the method, e.g. noise reduction, contrast enhancement, re-sampling.
• Feature extraction – Extract invariant features for further processing, e.g. lines, edges, corners.
• Mid-level processing – Detection, segmentation; for example, selection of a specific set of interest points.
• High-level processing – The input to this step is normally a small set of data. Typical tasks are verification, classification, and parameter estimation.
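
The stages above can be sketched end to end on a toy image. This is a minimal sketch under stated assumptions: contrast stretching stands in for pre-processing, a fixed threshold for mid-level segmentation, and area and centroid estimation for high-level processing. The feature-extraction stage is omitted for brevity, and all function names and data are hypothetical.

```python
# Toy pipeline: acquisition -> pre-processing -> segmentation -> estimation.

def stretch_contrast(image, lo=0, hi=255):
    """Pre-processing: linearly rescale intensities to the range [lo, hi]."""
    flat = [p for row in image for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # flat image: nothing to stretch
        return [[lo for _ in row] for row in image]
    scale = (hi - lo) / (mx - mn)
    return [[round(lo + (p - mn) * scale) for p in row] for row in image]

def segment(image, threshold):
    """Mid-level processing: binary mask of pixels at or above the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]

def describe_region(mask):
    """High-level processing: estimate area and centroid of the mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pts:
        return {"area": 0, "centroid": None}
    area = len(pts)
    return {"area": area,
            "centroid": (sum(r for r, _ in pts) / area,
                         sum(c for _, c in pts) / area)}

# Stand-in for image acquisition: a hypothetical low-contrast 4x4 image.
raw = [
    [100, 100, 100, 100],
    [100, 120, 120, 100],
    [100, 120, 120, 100],
    [100, 100, 100, 100],
]
enhanced = stretch_contrast(raw)
mask = segment(enhanced, threshold=128)
result = describe_region(mask)
print(result)
```

Each stage reduces the amount of data: a full image becomes a binary mask, and the mask becomes a handful of numbers, which matches the remark that high-level processing normally works on a small set of data.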

Feature Extraction: After a series of processing steps, we use histogram features for iris recognition. This reduces the dimensionality of the data.
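
As a rough illustration of how a histogram reduces dimensionality, the sketch below turns a patch of pixel intensities into a short normalized histogram. It assumes 8-bit grayscale values and four equal-width bins; the function name, the bin count, and the sample patch are hypothetical and not taken from the iris system on the slide.

```python
# Histogram feature sketch: many pixels in, a few bin frequencies out.

def histogram_feature(image, bins=4, max_value=256):
    """Return the normalized intensity histogram of a grayscale image."""
    counts = [0] * bins
    width = max_value / bins
    total = 0
    for row in image:
        for p in row:
            idx = min(int(p / width), bins - 1)  # clamp the top edge
            counts[idx] += 1
            total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [c / total for c in counts]

# Hypothetical 2x4 patch of 8-bit intensities.
patch = [
    [10, 20, 70, 80],
    [130, 140, 200, 250],
]
feature = histogram_feature(patch)
print(feature)
```

Whatever the size of the input patch, the feature vector has only `bins` entries, so later comparison steps work on far less data than the raw pixels.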

Identification using a pattern recognition method.
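
The slide does not name the pattern recognition method, so the sketch below assumes the simplest one: nearest-neighbor matching of a probe feature vector against stored templates, using the L1 distance. The gallery contents, identity labels, and function names are hypothetical.

```python
# Nearest-neighbor identification sketch over histogram feature vectors.
# Assumption: L1 distance; the slide does not specify the actual method.

def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def identify(probe, gallery):
    """Return (identity, distance) of the closest stored template."""
    best_id, best_dist = None, float("inf")
    for identity, template in gallery.items():
        d = l1_distance(probe, template)
        if d < best_dist:
            best_id, best_dist = identity, d
    return best_id, best_dist

# Hypothetical enrolled templates (normalized 4-bin histograms).
gallery = {
    "subject_a": [0.50, 0.25, 0.25, 0.00],
    "subject_b": [0.00, 0.25, 0.25, 0.50],
}
probe = [0.25, 0.50, 0.25, 0.00]
match, distance = identify(probe, gallery)
print(match, distance)
```

A practical system would also reject probes whose best distance is above some threshold, so that unknown subjects are not forced onto the nearest enrolled identity.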

Conclusion
• We have briefly introduced computer vision.
• Read the first chapter of the book and some introductory materials.
• Computer vision tries to understand digital images.
• Computer vision already has many important applications.
• Computer vision is still a fast-developing area.
• This course alone is certainly not enough if you are interested in computer vision.
• We need to try our best to study and work.