
Camera Calibration & Stereo Reconstruction Jinxiang Chai

3D Computer Vision: The main goal here is to reconstruct the geometry of the 3D world.

How can we estimate the camera parameters? - Where is the camera located? - In which direction is the camera looking? - What are the focal length, projection center, and aspect ratio?

Stereo reconstruction: Given two or more images of the same scene or object, taken from known camera viewpoints, compute a representation of its shape. How can we estimate the camera parameters?

Camera calibration: Augmented pin-hole camera - focal point, orientation - focal length, aspect ratio, center, lens distortion. Classical calibration - known 3D points - 3D-to-2D correspondence. Camera calibration online resources.

Camera and calibration target

Classical camera calibration: known 3D coordinates and 2D coordinates - known 3D points on calibration targets - find the corresponding 2D points in the image using a feature detection algorithm

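Camera parameters: known 3D coords and 2D coords

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \simeq \underbrace{\begin{bmatrix} s_x & \alpha & u_0 \\ 0 & -s_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{viewport proj.}} \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{\text{perspective proj.}} \underbrace{\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix}}_{\text{view trans.}} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Intrinsic camera parameters (5 parameters) and extrinsic camera parameters (6 parameters).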

Camera matrix: Fold the intrinsic calibration matrix K and the extrinsic pose parameters (R, t) together into a camera matrix $M = K\,[R \mid \mathbf{t}]$ (fix the lower right-hand entry to 1, leaving 11 d.o.f.).
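A minimal pure-Python sketch of this composition: it builds M = K [R | t] and projects a 3D point with a perspective divide. The helper names (`matmul`, `camera_matrix`, `project`), the example intrinsics (zero skew, s_x = s_y = 500, principal point (320, 240)) and the pose are illustrative assumptions, not values from the lecture.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def camera_matrix(K, R, t):
    """Fold intrinsics K (3x3) and pose R (3x3), t (3,) into M = K [R | t] (3x4)."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]  # the 3x4 block [R | t]
    return matmul(K, Rt)


def project(M, point):
    """Project a 3D point (X, Y, Z) to pixel coordinates (u, v)."""
    X, Y, Z = point
    x = matmul(M, [[X], [Y], [Z], [1.0]])  # homogeneous image point
    w = x[2][0]
    return x[0][0] / w, x[1][0] / w        # perspective divide


# Hypothetical example camera: zero skew, s_x = s_y = 500, (u_0, v_0) = (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, -500.0, 240.0],   # -s_y flips the image v axis, as in the viewport matrix
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]
M = camera_matrix(K, R, t)
print(project(M, (1.0, 1.0, 5.0)))  # (370.0, 190.0)
```

The overall scale of M does not change the projected pixel, which is why one entry can be fixed to 1.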

Camera matrix calibration: Directly estimate the 11 unknowns in the M matrix using known 3D points $(X_i, Y_i, Z_i)$ and measured feature positions $(u_i, v_i)$.
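Written out with the lower right-hand entry of M fixed to 1, each correspondence gives two rational equations; multiplying through by the common denominator makes them linear in the 11 remaining entries. The index convention m_{rc} (row r, column c, counting from 0) is our labeling, not the lecture's.

```latex
\begin{aligned}
u_i &= \frac{m_{00}X_i + m_{01}Y_i + m_{02}Z_i + m_{03}}{m_{20}X_i + m_{21}Y_i + m_{22}Z_i + 1},
\qquad
v_i = \frac{m_{10}X_i + m_{11}Y_i + m_{12}Z_i + m_{13}}{m_{20}X_i + m_{21}Y_i + m_{22}Z_i + 1} \\[4pt]
u_i &= m_{00}X_i + m_{01}Y_i + m_{02}Z_i + m_{03} - u_i\,(m_{20}X_i + m_{21}Y_i + m_{22}Z_i) \\
v_i &= m_{10}X_i + m_{11}Y_i + m_{12}Z_i + m_{13} - v_i\,(m_{20}X_i + m_{21}Y_i + m_{22}Z_i)
\end{aligned}
```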

Camera matrix calibration. Linear regression: • Bring the denominator over and solve the set of (over-determined) linear equations. How? • Least squares (pseudo-inverse) - 11 unknowns (up to scale) - 2 equations per point (homogeneous coordinates) - 6 points are sufficient
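The least-squares step can be sketched as follows, assuming the lower right-hand entry of M is fixed to 1, so each point contributes the rows [X, Y, Z, 1, 0, 0, 0, 0, -uX, -uY, -uZ] = u and [0, 0, 0, 0, X, Y, Z, 1, -vX, -vY, -vZ] = v. The function names are hypothetical. For brevity the sketch solves the normal equations (A^T A) m = A^T b, which gives the pseudo-inverse solution; an SVD- or QR-based solver is numerically safer in practice. The 3D points must not all lie on one plane, otherwise the system is degenerate.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def calibrate_dlt(points3d, points2d):
    """Estimate the 3x4 camera matrix (lower right entry = 1) by linear least squares."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(points3d, points2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.append(v)
    # Normal equations (A^T A) m = A^T b: the pseudo-inverse solution.
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(11)] for i in range(11)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(11)]
    m = solve(AtA, Atb) + [1.0]
    return [m[0:4], m[4:8], m[8:12]]
```

With noise-free correspondences from at least six non-coplanar points, this recovers M exactly up to rounding; with noisy measurements it returns the least-squares fit.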

Nonlinear camera calibration. Perspective projection:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \simeq K\,R\,T\,P$$

2D coordinates are just a nonlinear function of the 3D point P and the camera parameters (intrinsics K, rotation R, translation T):

$$(u, v) = f(K, R, T, P)$$

Multiple calibration images. Find camera parameters that satisfy the constraints from M images and N points, where f denotes the nonlinear projection function:

$$(u_{ij}, v_{ij}) = f(K, R_j, T_j, P_i), \qquad j = 1, \dots, M, \quad i = 1, \dots, N$$

This can be formulated as a nonlinear optimization problem:

$$\min_{K,\,\{R_j, T_j\}} \sum_{j=1}^{M} \sum_{i=1}^{N} \left\| (u_{ij}, v_{ij}) - f(K, R_j, T_j, P_i) \right\|^2$$

Solve the optimization using nonlinear optimization techniques: - Gauss-Newton - Levenberg-Marquardt
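To illustrate the Levenberg-Marquardt update, here is a deliberately simplified sketch: a single image, rotation fixed to the identity, principal point at the origin, and only four unknowns (focal length f and translation t_x, t_y, t_z). This hypothetical camera model is an assumption made for brevity; a full calibration would add the remaining intrinsics and one rotation and translation per image, but the damped update is the same. The Jacobian is taken by forward differences, and all function names are illustrative.

```python
def solve(A, b):
    """Solve the square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            k = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= k * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def residuals(params, pts3d, pts2d):
    """Reprojection errors for a hypothetical camera with R = I, unknown f and t."""
    f, tx, ty, tz = params
    r = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        zc = Z + tz                       # depth in the camera frame
        r.append(f * (X + tx) / zc - u)
        r.append(f * (Y + ty) / zc - v)
    return r


def jacobian(params, pts3d, pts2d, eps=1e-6):
    """Forward-difference Jacobian: one row per residual, one column per parameter."""
    base = residuals(params, pts3d, pts2d)
    cols = []
    for k in range(len(params)):
        p = list(params)
        p[k] += eps
        cols.append([(a - b) / eps for a, b in zip(residuals(p, pts3d, pts2d), base)])
    return [list(row) for row in zip(*cols)]


def levenberg_marquardt(params, pts3d, pts2d, iters=100, lam=1e-3):
    """Minimize the sum of squared reprojection errors with damped Gauss-Newton steps."""
    params = list(params)
    n = len(params)
    cost = sum(e * e for e in residuals(params, pts3d, pts2d))
    for _ in range(iters):
        r = residuals(params, pts3d, pts2d)
        J = jacobian(params, pts3d, pts2d)
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(n)] for i in range(n)]
        Jtr = [sum(row[i] * e for row, e in zip(J, r)) for i in range(n)]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = -J^T r
        A = [[JtJ[i][j] + (lam * JtJ[i][i] if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        delta = solve(A, [-g for g in Jtr])
        trial = [p + d for p, d in zip(params, delta)]
        trial_cost = sum(e * e for e in residuals(trial, pts3d, pts2d))
        if trial_cost < cost:     # accept: behave more like Gauss-Newton
            params, cost, lam = trial, trial_cost, lam * 0.1
        else:                     # reject: behave more like gradient descent
            lam *= 10.0
    return params
```

The damping factor `lam` blends the two listed techniques: small values give Gauss-Newton steps, large values give short, gradient-descent-like steps. The starting guess must be reasonably close, which is why initializing from the linear estimate of M is recommended.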

Nonlinear approach
Advantages:
• can solve for more than one camera pose at a time
• fewer degrees of freedom than the linear approach
• standard technique in photogrammetry, computer vision, and computer graphics
 - [Tsai 87] also estimates lens distortions (freeware @ CMU) http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-source.html
Disadvantages:
• more complex update rules
• needs a good initialization (recover K [R | t] from M)

How can we estimate the camera parameters?

Application: camera calibration for sports video images [Farin et al.] (figure: court model)

Stereo matching: Given two or more images of the same scene or object, as well as their camera parameters, how can we compute a representation of its shape? What are some possible shape representations? • depth maps • volumetric models • 3D surface models • planar (or offset) layers

Outline Stereo matching - Traditional stereo - Active stereo Volumetric stereo - Visual hull - Voxel coloring - Space carving

Readings. Stereo matching • Sections 11.1, 11.2, 11.3, 11.5 in Szeliski's book • D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1/2/3): 7-42, April-June 2002.

Stereo (figure: scene point, image plane, optical center)

Stereo. Basic principle: triangulation • Gives the reconstruction as the intersection of two rays • Requires camera calibration and point correspondence
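Because two measured rays rarely intersect exactly once noise is present, one common way to realize triangulation is to take the midpoint of the shortest segment between them. This midpoint method is one choice among several (the slides do not specify one); the sketch assumes the camera centers c1, c2 and ray directions d1, d2 are already expressed in a common world frame, and the function name is illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b                  # zero when the rays are parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undetermined")
    s = (b * e - c * d) / denom            # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom            # parameter of the closest point on ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]


# Two hypothetical cameras one unit apart, both looking at the point (1, 2, 10).
print(triangulate_midpoint([0, 0, 0], [1, 2, 10], [1, 0, 0], [0, 2, 10]))
```

The parallel-ray check reflects the geometry directly: without a baseline between viewpoints, the two rays give no depth information.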

Stereo correspondence. Determine pixel correspondence • Pairs of points that correspond to the same scene point (figure: epipolar plane and the two epipolar lines). Epipolar constraint • Reduces the correspondence problem to a 1D search along conjugate epipolar lines • Java demo: http://www.ai.sri.com/~luong/research/Meta3DViewer/Epipolar.Geo.html
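Algebraically, the epipolar constraint is usually written x'^T F x = 0 with a 3x3 fundamental matrix F, and the epipolar line in the second image is l' = F x. The sketch below assumes homogeneous image coordinates; the example F is a hypothetical rectified configuration (normalized coordinates, no rotation, baseline along x), for which every epipolar line is horizontal. Function names are illustrative.

```python
def mat_vec(F, x):
    return [sum(F[i][j] * x[j] for j in range(3)) for i in range(3)]


def epipolar_line(F, x):
    """Line l' = F x in the second image, as (a, b, c) with a*u' + b*v' + c = 0."""
    return mat_vec(F, x)


def epipolar_residual(F, x, x_prime):
    """x'^T F x: zero (up to noise) when x and x' can be images of one scene point."""
    return sum(p * q for p, q in zip(x_prime, mat_vec(F, x)))


# Hypothetical F for a rectified pair in normalized coordinates (K = I, R = I,
# baseline along the x axis): F is proportional to [t]_x with t = (1, 0, 0).
F_rect = [[0.0, 0.0, 0.0],
          [0.0, 0.0, -1.0],
          [0.0, 1.0, 0.0]]
x = [0.3, 0.2, 1.0]
print(epipolar_line(F_rect, x))  # [0.0, -1.0, 0.2], i.e. the horizontal line v' = 0.2
```

Searching only along l' is exactly the 1D reduction described above; in the rectified case that search runs along a single image row.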

Stereo image rectification

Stereo image rectification
• Reproject image planes onto a common plane parallel to the line between the optical centers
• Pixel motion is horizontal after this transformation
• Two homographies (3x3 transforms), one for each input image reprojection
- C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.
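Once the two rectifying homographies are known, each image is resampled through its homography. The sketch below shows only that resampling step (inverse mapping with nearest-neighbor lookup); it does not compute the homographies, which is what the Loop and Zhang paper addresses. Images are assumed to be lists of rows, and the function names are illustrative.

```python
def apply_homography(H, u, v):
    """Map pixel (u, v) through a 3x3 homography H (with perspective divide)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (assumes det != 0)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def warp_image(img, H, out_w, out_h, fill=0):
    """Resample img under homography H by inverse mapping with nearest-neighbor lookup."""
    H_inv = invert_3x3(H)
    h, w = len(img), len(img[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            su, sv = apply_homography(H_inv, u, v)  # source location of this output pixel
            iu, iv = round(su), round(sv)
            if 0 <= iu < w and 0 <= iv < h:
                out[v][u] = img[iv][iu]
    return out


# A pure horizontal shift by one pixel, the simplest homography.
print(warp_image([[1, 2, 3], [4, 5, 6]], [[1, 0, 1], [0, 1, 0], [0, 0, 1]], 3, 2))
# [[0, 1, 2], [0, 4, 5]]
```

Mapping each output pixel back into the source, rather than pushing source pixels forward, guarantees that every output pixel is filled exactly once.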

Rectification (figure: original image pairs; rectified image pairs)

Stereo matching algorithms. Match pixels in conjugate epipolar lines • Assume brightness constancy • This is a tough problem • Numerous approaches > A good survey and evaluation: http://www.middlebury.edu/stereo/

Your basic stereo algorithm: For each epipolar line, for each pixel in the left image • compare with every pixel on the same epipolar line in the right image • pick the pixel with minimum matching cost. Improvement: match windows • This should look familiar... (cross-correlation or SSD) • Can use Lucas-Kanade or discrete search (the latter is more common)
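The basic window-matching algorithm can be sketched for a rectified pair, where the epipolar line of a left pixel (x, y) is row y of the right image. The sketch assumes left pixel x matches right pixel x - d with 0 <= d <= max_disp, uses SSD as the matching cost with a discrete search, and leaves border pixels at disparity 0; the names and defaults are illustrative.

```python
def ssd(left, right, y, xl, xr, half):
    """Sum of squared differences between windows centered at (xl, y) and (xr, y)."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[y + dy][xl + dx] - right[y + dy][xr + dx]
            total += diff * diff
    return total


def block_match(left, right, max_disp, half=1):
    """Winner-take-all disparity map for a rectified pair (images as lists of rows)."""
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_cost, best_d = float("inf"), 0
            # Search along the same row (the epipolar line) of the right image.
            for d in range(0, min(max_disp, x - half) + 1):
                cost = ssd(left, right, y, x, x - d, half)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

The window size is set by `half` (window width 2 * half + 1); replacing `ssd` with normalized cross-correlation would make the cost robust to brightness offsets between the two images.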

Effect of window size (figure: results for W = 3 and W = 20) • Smaller window: + more detail, - more noise • Larger window: + smoother, less noisy disparities, - less detail

More constraints? We can enforce more constraints to reduce matching ambiguity: - smoothness constraint: the disparity computed at a pixel should be consistent with its neighbors in a surrounding window. - uniqueness constraint: the matching should be one-to-one (bijective). - ordering constraint: e.g., the disparity computed at a pixel should not exceed the disparity of its right neighbor by more than one pixel.
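One common way to enforce the uniqueness constraint approximately is a left-right consistency check: compute disparity maps with each image as the reference and keep only the matches on which both agree. This check is a swapped-in technique, not something the slides prescribe; the sketch assumes the disparity conventions stated in its docstring, and the function name is illustrative.

```python
def left_right_check(disp_left, disp_right, tol=0):
    """Keep a left-image disparity only if the right image's disparity maps back to it.

    disp_left[y][x] = d means left pixel x matches right pixel x - d;
    disp_right[y][x] = d means right pixel x matches left pixel x + d.
    """
    valid = []
    for row_l, row_r in zip(disp_left, disp_right):
        w = len(row_l)
        row_valid = []
        for x, d in enumerate(row_l):
            xr = x - d
            row_valid.append(0 <= xr < w and abs(row_r[xr] - d) <= tol)
        valid.append(row_valid)
    return valid


print(left_right_check([[0, 1, 1, 1, 3]], [[1, 1, 1, 0, 0]]))
# [[False, True, True, True, False]]
```

Pixels that fail the check are typically occluded or ambiguous, so they are left unmatched or filled in later from their neighbors.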

Stereo results • Data from University of Tsukuba • Similar results on other images without ground truth (figure: scene; ground truth)

Results with window search (figure: window-based matching with the best window size; ground truth)

Better methods exist... (figure: a better method; ground truth) Boykov et al., Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, September 1999.

More recent development: High-Quality Single-Shot Capture of Facial Geometry [SIGGRAPH 2010, project website] - capture high-fidelity facial geometry from multiple cameras - pairwise stereo reconstruction between neighboring cameras - hallucinate facial details

More recent development: High Resolution Passive Facial Performance Capture [SIGGRAPH 2010, project website] - capture dynamic facial geometry from multiple video cameras - spatial stereo reconstruction for every frame - build temporal correspondences across the entire sequence

Stereo reconstruction pipeline
Steps:
• Calibrate cameras
• Rectify images
• Compute disparity
• Estimate depth
What will cause errors?
• Camera calibration errors
• Poor image resolution
• Occlusions
• Violations of brightness constancy (specular reflections)
• Large motions
• Low-contrast image regions
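The last pipeline step follows from similar triangles: for a rectified pair with focal length f (in pixels) and baseline B, depth is Z = f B / d. The sketch assumes those conventions, and the parameter names are hypothetical. Differentiating also shows why small disparity errors matter: |dZ| is approximately Z^2 / (f B) |dd|, so depth errors grow quadratically with distance.

```python
def depth_from_disparity(disp, focal_px, baseline):
    """Z = f * B / d for a rectified pair; zero disparity maps to infinite depth."""
    return [[focal_px * baseline / d if d > 0 else float("inf") for d in row]
            for row in disp]


def depth_error(depth, focal_px, baseline, disp_error=1.0):
    """First-order depth uncertainty |dZ| = Z^2 / (f B) * |dd| for a disparity error dd."""
    return depth * depth / (focal_px * baseline) * disp_error


# Hypothetical rig: f = 700 px, B = 0.1 m. A 35 px disparity gives Z = 2 m,
# and a one-pixel disparity error at that depth moves Z by about 5.7 cm.
print(depth_from_disparity([[35, 0]], 700.0, 0.1))
print(depth_error(2.0, 700.0, 0.1))
```

Doubling the depth quadruples the error for the same disparity error, which is one reason poor image resolution appears in the list of error sources.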


Active stereo with structured light (figure: Li Zhang’s one-shot stereo: camera 1, projector, camera 2) • Project “structured” light patterns onto the object • Simplifies the correspondence problem
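To see how projected patterns remove correspondence ambiguity, consider binary Gray-code stripes, a common multi-shot scheme. It is shown only as an illustration and is not the pattern used by the one-shot system named above. Each projector column gets a Gray code and one image is captured per bit, so the on/off sequence observed at a camera pixel identifies the projector column directly. Function names are hypothetical.

```python
def gray_encode(n):
    """Binary-reflected Gray code of n: neighboring codes differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Invert gray_encode by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def stripe_patterns(num_columns, num_bits):
    """patterns[k][c] = 1 if projector column c is lit in image k (most significant bit first)."""
    return [[(gray_encode(c) >> (num_bits - 1 - k)) & 1 for c in range(num_columns)]
            for k in range(num_bits)]


def decode_pixel(bits):
    """Projector column seen by a camera pixel, from its thresholded on/off sequence."""
    g = 0
    for b in bits:
        g = (g << 1) | b
    return gray_decode(g)


patterns = stripe_patterns(8, 3)
print(decode_pixel([patterns[k][5] for k in range(3)]))  # 5
```

Gray codes are preferred over plain binary here because a pixel on a stripe boundary can only be misread by one column.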

Active stereo with structured light

Laser scanning: Digital Michelangelo Project http://graphics.stanford.edu/projects/mich/ Optical triangulation • Project a single stripe of laser light • Scan it across the surface of the object • This is a very precise version of structured light scanning
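With a single laser stripe, optical triangulation reduces to intersecting each lit pixel's viewing ray with the known plane of laser light. The sketch assumes the plane dot(n, X) = offset has been calibrated and that the ray origin and direction are expressed in the same frame; the names and example numbers are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def intersect_ray_plane(origin, direction, normal, offset):
    """Point origin + s * direction lying on the plane dot(normal, X) = offset."""
    denom = dot(normal, direction)
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the laser plane")
    s = (offset - dot(normal, origin)) / denom
    return [o + s * d for o, d in zip(origin, direction)]


# Hypothetical setup: camera at the origin, laser plane x = 0.5 (normal (1, 0, 0)),
# and a lit pixel whose viewing ray has direction (0.1, 0, 1).
point = intersect_ray_plane([0.0, 0.0, 0.0], [0.1, 0.0, 1.0], [1.0, 0.0, 0.0], 0.5)
print(point)
```

Sweeping the stripe across the object and repeating this for every lit pixel in every frame yields the full surface.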

Laser scanned models The Digital Michelangelo Project, Levoy et al.

Recent development Capturing dynamic facial movement using active stereo [project website] - use synchronized video cameras and structured light projectors to capture dynamic facial geometry - use a generic 3 D model to build temporal correspondences across the entire sequence