
3D Computer Vision and Video Computing
CSC I6716, Spring 2004
Topic 8 of Part 2: Visual Motion (I)
Zhigang Zhu, NAC 8/203A
http://www-cs.engr.ccny.cuny.edu/~zhu/VisionCourse-2004.html
Cover image/video credits: Rick Szeliski, MSR

Outline of Motion
• Problems and Applications (Topic 8 Motion I)
  - The importance of visual motion
  - Problem statement
• The Motion Field of Rigid Motion (Topic 8 Motion I)
  - Basics: notations and equations
  - Three important special cases: translation, rotation and moving plane
  - Motion parallax
• Optical Flow (Topic 8 Motion II)
  - Optical flow equation and the aperture problem
  - Estimating optical flow
  - 3D motion & structure from optical flow
• Feature-based Approach (Topic 8 Motion II)
  - Two-frame algorithm
  - Multi-frame algorithm
  - Structure from motion: factorization method
• Advanced Topics (Topic 8 Motion II; Part 3)
  - Spatio-temporal image and epipolar plane image
  - Video mosaicing and panorama generation
  - Motion-based segmentation and layered representation

The Importance of Visual Motion
• Structure from motion
  - Apparent motion is a strong visual cue for 3D reconstruction
  - More than a multi-camera stereo system
• Recognition by motion (only)
  - Biological visual systems use visual motion to infer properties of the 3D world with little a priori knowledge of it
  - Blurred image sequence
• Visual Motion = Video! [Go to CVPR 2004 for Workshops]
  - Video coding and compression: MPEG 1, 2, 4, 7…
  - Video mosaicing and layered representation for IBR
  - Surveillance (human tracking and traffic monitoring)
  - HCI using human gesture (video camera)
  - Automated production of Video Instruction Program (VIP)
  - Video texture for image-based rendering
  - …

Human Tracking
• Moving subjects from video of a stationary camera…
• W4 - Visual Surveillance of Human Activity
  From: Prof. Larry Davis, University of Maryland
  http://www.umiacs.umd.edu/users/lsd/vsam.html

Blurred Sequence
• Recognition by actions: recognize an object from its motion even if we cannot distinguish it in any single image…
• An up-sampling from images of resolution 15 x 20 pixels
  From: James W. Davis, MIT Media Lab
  http://vismod.www.media.mit.edu/~jdavis/MotionTemplates/motiontemplates.html

Video Mosaicing
• Video of a moving camera = multi-frame stereo with multiple cameras…
• Stereo mosaics from a single video sequence
  From: Z. Zhu, E. M. Riseman, A. R. Hanson, Parallel-perspective stereo mosaics, The Eighth IEEE International Conference on Computer Vision, Vancouver, Canada, July 2001, vol. I, 345-352.
  http://www-cs.engr.ccny.cuny.edu/~zhu/StereoMosaic.html

Video in Classroom/Auditorium
• An application in e-learning: analyzing the motion of people as well as controlling the motion of the camera…
• Demo: Bellcore Autoauditorium
  - A Fully Automatic, Multi-Camera System that Produces Videos Without a Crew
  - http://www.autoauditorium.com/

Vision Based Interaction
• Motion and gesture as advanced human-computer interaction (HCI)…
• Demo: Microsoft Research vision-based interface by Matthew Turk

Video Texture
• Image (video)-based rendering: realistic synthesis without “vision”…
• Video textures are derived from video by using the finite-duration input clip to generate a smoothly playing infinite video.
  From: Arno Schödl, Richard Szeliski, David H. Salesin, and Irfan Essa. Video textures. Proceedings of SIGGRAPH 2000, pages 489-498, July 2000.
  http://www.gvu.gatech.edu/perception/projects/videotexture/

Problem Statement
• Two subproblems
  - Correspondence: which elements of a frame correspond to which elements in the next frame?
  - Reconstruction: given a number of correspondences, and possibly knowledge of the camera's intrinsic parameters, how do we recover the 3D motion and structure of the observed world?
• Main difference between motion and stereo
  - Correspondence: the disparities between consecutive frames are much smaller due to dense temporal sampling
  - Reconstruction: the visual motion could be caused by multiple motions (instead of a single 3D rigid transformation)
• The third subproblem, and fourth…
  - Motion segmentation: which regions of the image plane correspond to different moving objects?
  - Motion understanding: lip reading, gesture, expression, event…

Approaches
• Two subproblems
  - Correspondence:
    - Differential methods -> dense measure (optical flow)
    - Matching methods -> sparse measure
  - Reconstruction: more difficult than stereo since
    - Structure as well as motion (3D transformation between frames) need to be recovered
    - Small baseline causes large errors
• The third subproblem
  - Motion segmentation: chicken-and-egg problem
    - Which should be solved first, matching or segmentation?
    - Segmentation for matching elements
    - Matching for segmentation

The Motion Field of Rigid Objects
• Motion
  - 3D motion (R, T):
    - camera motion (static scene), or
    - single object motion
    - only one rigid, relative motion between the camera and the scene (object)
  - Image motion field:
    - 2D vector field of velocities of the image points induced by the relative motion
• Data: image sequence
  - Many frames
    - captured at time t = 0, 1, 2, …
  - Basics: only consider two consecutive frames
    - we consider a reference frame and its consecutive frame
  - Image motion field
    - can be viewed as a disparity map of the two frames captured at two consecutive camera locations (assuming we have a moving camera)
[Figure: Motion field of a video sequence (translation)]

The Motion Field of Rigid Objects
• Notations
  - P = (X, Y, Z)^T: 3D point in the camera reference frame
  - p = (x, y, f)^T: the projection of the scene point in the pinhole camera
• Relative motion between P and the camera
  - T = (Tx, Ty, Tz)^T: translation component of the motion
  - ω = (ωx, ωy, ωz)^T: the angular velocity
• Note:
  - How to connect this with stereo geometry (with R, T)?
  - Image velocity v = ?
[Figure: pinhole camera with optical center O, axes X, Y, Z and focal length f; scene point P with velocity V, image point p with velocity v]


Basic Equations of Motion Field
• Notes:
  - Take the time derivative of both sides of the projection equation
  - The motion field is the sum of two components
    - Translational part
    - Rotational part
  - Assume known intrinsic parameters
• Rotation part: no depth information
• Translation part: depth Z
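The slide's own equations did not survive extraction. As a hedged reconstruction, assume that T and ω describe the velocity of the scene point relative to the camera, so that V = T + ω × P. This sign convention is an assumption: texts that define T and ω as the camera's own motion write V = -T - ω × P, which flips every sign below. Differentiating the projection equation p = f P / Z then gives:

```latex
\begin{align}
\mathbf{p} &= f\,\frac{\mathbf{P}}{Z}, \qquad
\mathbf{V} = \mathbf{T} + \boldsymbol{\omega}\times\mathbf{P} \\
\mathbf{v} &= \frac{d\mathbf{p}}{dt}
            = f\,\frac{Z\,\mathbf{V} - V_z\,\mathbf{P}}{Z^{2}} \\
v_x &= \frac{T_x f - T_z x}{Z}
       + \omega_y f - \omega_z y - \frac{\omega_x x y}{f} + \frac{\omega_y x^{2}}{f} \\
v_y &= \frac{T_y f - T_z y}{Z}
       - \omega_x f + \omega_z x + \frac{\omega_y x y}{f} - \frac{\omega_x y^{2}}{f}
\end{align}
```

In each component the first term is the translational part, and it is the only place where the depth Z appears. The remaining terms form the rotational part, which depends on the image position and f alone.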

Motion Field vs. Disparity
• Correspondence and point displacements

  Stereo                   | Motion
  Disparity                | Motion field
  Displacement: (dx, dy)   | Differential concept: velocity (vx, vy), i.e. time derivative (dx/dt, dy/dt)
  No such constraint       | Consecutive frames close, to guarantee a good discrete approximation

Special Case 1: Pure Translation
• Pure translation (ω = 0)
• Radial motion field (Tz ≠ 0)
  - Vanishing point p0 = (x0, y0)^T: motion direction
  - FOE (focus of expansion): vectors away from p0 if Tz < 0
  - FOC (focus of contraction): vectors towards p0 if Tz > 0
  - Depth estimation: depth inversely proportional to magnitude of motion vector v, and also proportional to distance from p to p0
• Parallel motion field (Tz = 0)
  - Depth estimation: depth inversely proportional to magnitude of motion vector v
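The radial field and its depth relation can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes T is the velocity of the scene point relative to the camera, so that v = -Tz (p - p0) / Z with p0 = f (Tx/Tz, Ty/Tz). Under the opposite sign convention for T the flow direction reverses, but the depth relation is unchanged.

```python
import math

def translational_flow(x, y, Z, T, f=1.0):
    """Image velocity at (x, y) for a point at depth Z under pure translation.

    Assumed convention: T is the point's velocity relative to the camera.
    """
    Tx, Ty, Tz = T
    return ((Tx * f - Tz * x) / Z, (Ty * f - Tz * y) / Z)

def vanishing_point(T, f=1.0):
    """p0 = f * (Tx/Tz, Ty/Tz); only defined for a radial field (Tz != 0)."""
    Tx, Ty, Tz = T
    if Tz == 0:
        raise ValueError("Tz = 0 gives a parallel field with no finite p0")
    return (f * Tx / Tz, f * Ty / Tz)

def depth_from_flow(x, y, v, T, f=1.0):
    """Recover Z from one flow vector, given T: Z = |Tz| * |p - p0| / |v|."""
    x0, y0 = vanishing_point(T, f)
    dist = math.hypot(x - x0, y - y0)
    speed = math.hypot(v[0], v[1])
    if speed == 0:
        raise ValueError("zero flow: the point is at p0 or infinitely far away")
    return abs(T[2]) * dist / speed

if __name__ == "__main__":
    T = (0.2, -0.1, -0.5)          # arbitrary example translation
    v = translational_flow(0.3, 0.4, 4.0, T)
    print(vanishing_point(T), v, depth_from_flow(0.3, 0.4, v, T))
```

Recovering depth this way needs the magnitude of Tz. If only the direction of translation is known, depth is determined up to a common scale factor.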

Special Case 2: Pure Rotation
• Pure rotation (T = 0)
  - Does not carry 3D information
• Motion field (approximation)
  - Small motion
  - A quadratic polynomial in image coordinates (x, y, f)^T
• Image transformation between two frames (accurate)
  - Motion can be large
  - Homography (3 x 3 matrix) for all points
• Image mosaicing from a rotating camera
  - 360-degree panorama
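A small numerical check of why pure rotation carries no 3D information: with image points written as (x, y, f)^T, the rotation matrix itself acts as the homography, and two scene points on the same viewing ray land on the same image point after any rotation. This is a sketch with hypothetical helper names. It assumes the rotation R is applied to the scene points in the camera frame; a rotating camera corresponds to the inverse rotation.

```python
import math

def rot_y(angle):
    """Rotation by `angle` radians about the Y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(M, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def project(P, f=1.0):
    """Pinhole projection of P = (X, Y, Z) to image coordinates (x, y)."""
    X, Y, Z = P
    return (f * X / Z, f * Y / Z)

def warp_by_rotation(p, R, f=1.0):
    """Apply the homography R to image point p = (x, y), using (x, y, f)^T."""
    q = matvec(R, [p[0], p[1], f])
    return (f * q[0] / q[2], f * q[1] / q[2])

if __name__ == "__main__":
    R = rot_y(0.3)                      # a large rotation, not a small-motion case
    near = [1.0, 0.5, 4.0]
    far = [2.5, 1.25, 10.0]             # same viewing ray, 2.5 times farther
    print(project(matvec(R, near)))     # image of the rotated near point
    print(project(matvec(R, far)))      # same image position
    print(warp_by_rotation(project(near), R))  # predicted without knowing depth
```

Because the warp never uses Z, it is exact for any rotation angle. This is what makes mosaicing from a rotating camera possible.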

Special Case 3: Moving Plane
• Planes are common in the man-made world
• Motion field (approximation)
  - Given small motion
  - A quadratic polynomial in image coordinates
  - Only has 8 independent parameters (write it out!)
• Image transformation between two frames (accurate)
  - Any amount of motion (arbitrary)
  - Homography (3 x 3 matrix) for all points
  - See Topic 5, Camera Models
• Image mosaicing for a planar scene
  - Aerial image sequence
  - Video of blackboard
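One way to "write it out" is sketched below. It assumes the plane is n · P = d in the camera frame, so that 1/Z = (nx x + ny y + nz f) / (f d), and that T and ω give the velocity of the scene point relative to the camera (V = T + ω × P). Both conventions, the function names and the coefficient labels a1..a8 are assumptions, not taken from the slide. With the opposite sign convention every coefficient changes sign.

```python
def motion_field(x, y, Z, T, w, f=1.0):
    """General rigid motion field under the assumed convention V = T + w x P."""
    Tx, Ty, Tz = T
    wx, wy, wz = w
    vx = (Tx * f - Tz * x) / Z + wy * f - wz * y - wx * x * y / f + wy * x * x / f
    vy = (Ty * f - Tz * y) / Z - wx * f + wz * x + wy * x * y / f - wx * y * y / f
    return vx, vy

def plane_coefficients(T, w, n, d):
    """Eight coefficients of the quadratic flow of the plane n . P = d."""
    Tx, Ty, Tz = T
    wx, wy, wz = w
    nx, ny, nz = n
    return (
        d * wy - Tz * nx,    # a1: x^2 in vx, x*y in vy
        -d * wx - Tz * ny,   # a2: x*y in vx, y^2 in vy
        Tx * nx - Tz * nz,   # a3: f*x in vx
        Tx * ny - d * wz,    # a4: f*y in vx
        Tx * nz + d * wy,    # a5: f^2 in vx
        Ty * ny - Tz * nz,   # a6: f*y in vy
        Ty * nx + d * wz,    # a7: f*x in vy
        Ty * nz - d * wx,    # a8: f^2 in vy
    )

def plane_flow(x, y, a, d, f=1.0):
    """Quadratic motion field of a moving plane; no per-pixel depth needed."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    vx = (a1 * x * x + a2 * x * y + a3 * f * x + a4 * f * y + a5 * f * f) / (f * d)
    vy = (a1 * x * y + a2 * y * y + a6 * f * y + a7 * f * x + a8 * f * f) / (f * d)
    return vx, vy

if __name__ == "__main__":
    T, w = (0.2, -0.1, -0.5), (0.05, -0.02, 0.1)   # arbitrary example motion
    n, d, f = (0.1, -0.2, 0.97), 5.0, 1.0          # arbitrary example plane
    a = plane_coefficients(T, w, n, d)
    x, y = 0.3, 0.4
    Z = f * d / (n[0] * x + n[1] * y + n[2] * f)   # depth of the plane along this ray
    print(plane_flow(x, y, a, d, f), motion_field(x, y, Z, T, w, f))
```

The two printed vectors should agree, since the quadratic form is the general motion field with the plane's depth substituted in.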

Special Cases: A Summary
• Pure translation
  - Vanishing point and FOE (focus of expansion)
  - Only translation contributes to depth estimation
• Pure rotation
  - Does not carry 3D information
  - Motion field: a quadratic polynomial in image, or
  - Transform: homography (3 x 3 matrix R) for all points
  - Image mosaicing from a rotating camera
• Moving plane
  - Motion field is a quadratic polynomial in image, or
  - Transform: homography (3 x 3 matrix A) for all points
  - Image mosaicing for a planar scene

Motion Parallax
• [Observation 1] The relative motion field of two instantaneously coincident points
  - does not depend on the rotational component of motion
  - points towards (away from) the vanishing point of the translation direction
• [Observation 2] The motion field of two frames after rotation compensation
  - only includes the translation component
  - points towards (away from) the vanishing point p0 (the instantaneous epipole)
  - the length of each motion vector is inversely proportional to the depth, and also proportional to the distance from point p to the vanishing point p0 of the translation direction
  - Question: how to remove rotation?
    - Active vision: rotation known approximately?
[Figure: Motion field of a video sequence (translation)]
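Observation 1 can be illustrated with a short sketch: two scene points that project to the same image position but lie at different depths share the same rotational flow, so their flow difference is (T f - Tz p)(1/Z1 - 1/Z2), which is radial about p0. The names are hypothetical, and the convention V = T + ω × P (T and ω as the point's motion relative to the camera, with ω written `w` in the code) is an assumption.

```python
def motion_field(x, y, Z, T, w, f=1.0):
    """General rigid motion field under the assumed convention V = T + w x P."""
    Tx, Ty, Tz = T
    wx, wy, wz = w
    vx = (Tx * f - Tz * x) / Z + wy * f - wz * y - wx * x * y / f + wy * x * x / f
    vy = (Ty * f - Tz * y) / Z - wx * f + wz * x + wy * x * y / f - wx * y * y / f
    return vx, vy

def relative_flow(x, y, Z1, Z2, T, w, f=1.0):
    """Flow difference of two points that coincide at (x, y) with depths Z1, Z2."""
    v1 = motion_field(x, y, Z1, T, w, f)
    v2 = motion_field(x, y, Z2, T, w, f)
    return (v1[0] - v2[0], v1[1] - v2[1])

if __name__ == "__main__":
    T = (0.2, -0.1, -0.5)                      # arbitrary example translation
    x, y = 0.3, 0.4
    no_rot = relative_flow(x, y, 2.0, 6.0, T, (0.0, 0.0, 0.0))
    with_rot = relative_flow(x, y, 2.0, 6.0, T, (0.05, -0.02, 0.1))
    print(no_rot, with_rot)                    # equal up to rounding
```

Comparing the result against the direction from p0 = f (Tx/Tz, Ty/Tz) to (x, y) confirms that the relative flow lies along the line through the vanishing point.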

Summary
• Importance of visual motion (apparent motion)
  - Many applications…
  - Problems: correspondence, reconstruction, segmentation, understanding in x-y-t space
• Image motion field of rigid objects
  - Time derivative of both sides of the projection equation
• Three important special cases
  - Pure translation: FOE
  - Pure rotation: no 3D information, but leads to mosaicing
  - Moving plane: homography with arbitrary motion
• Motion parallax
  - Only depends on translational component of motion

Next
• Visual Motion (II): Optical Flow, and Estimating and Using the Motion Fields