
Place Recognition and Lifelong Mapping
Kurt Konolige, James Bowman, JD Chen, Patrick Mihelich (Willow Garage)
Michael Calonder, Vincent Lepetit, Pascal Fua (École Polytechnique Fédérale de Lausanne)
- Konolige et al., View-Based Maps, RSS, 2009
- Konolige and Bowman, Lifelong Visual Maps, IROS, 2009
- Konolige et al., Mapping, Navigation and Learning for Off-road Traversal, JFR, 2008
- Konolige and Agrawal, FrameSLAM: From Bundle Adjustment to Realtime Visual Mapping, TRO, 2008

Willow Garage • PR2 mobile manipulation platform • Open-source robotics software • ROS • OpenCV • Robotics and vision algorithms

From 2D laser maps to VIEW MAPS: locally metric, global manifold [figure: views p1, p2, p3]

VSLAM by VIEW MAPS
- View maps: a set of stereo views connected by nonlinear Gaussian constraints
- Continuous recognition and continuous detection
- Locally metric, global manifold
- Graph optimization with TORO [Grisetti et al.]
[figure: views p1, p2, p3]
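A view map behaves like a pose graph: views are nodes, and the nonlinear Gaussian constraints between them are edges. The sketch below is a planar simplification under stated assumptions: poses are (x, y, theta) rather than 6 DOF, each constraint's information matrix is reduced to hypothetical diagonal weights, and the names `Pose2`, `Constraint` and `weighted_error` are illustrative only. It evaluates the cost that a graph optimizer such as TORO would minimize; it does not perform the optimization.

```python
import math
from dataclasses import dataclass


def wrap(angle: float) -> float:
    """Normalize an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


@dataclass(frozen=True)
class Pose2:
    """Planar pose (x, y, theta): a 2D stand-in for a 6-DOF stereo view pose."""
    x: float
    y: float
    theta: float


def compose(a: Pose2, b: Pose2) -> Pose2:
    """Apply the relative motion b in the frame of a (a (+) b)."""
    c, s = math.cos(a.theta), math.sin(a.theta)
    return Pose2(a.x + c * b.x - s * b.y, a.y + s * b.x + c * b.y, wrap(a.theta + b.theta))


def between(a: Pose2, b: Pose2) -> Pose2:
    """Pose of b expressed in the frame of a (a^-1 (+) b)."""
    c, s = math.cos(a.theta), math.sin(a.theta)
    dx, dy = b.x - a.x, b.y - a.y
    return Pose2(c * dx + s * dy, -s * dx + c * dy, wrap(b.theta - a.theta))


@dataclass
class Constraint:
    """Edge between views i and j: measured relative pose plus diagonal weights
    (a simplification of the full Gaussian information matrix)."""
    i: int
    j: int
    measured: Pose2
    weights: tuple = (1.0, 1.0, 1.0)


def weighted_error(poses, constraints) -> float:
    """Sum of weighted squared residuals that a graph optimizer would minimize."""
    total = 0.0
    for k in constraints:
        pred = between(poses[k.i], poses[k.j])
        r = (pred.x - k.measured.x, pred.y - k.measured.y, wrap(pred.theta - k.measured.theta))
        total += sum(w * e * e for w, e in zip(k.weights, r))
    return total
```

Chaining consistent VO constraints gives zero cost; a loop-closure link from place recognition that disagrees with the chained poses adds a positive term, which the optimizer then distributes over the whole graph.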

Crusher Visual Odometry
- Stereo: [Matthies, Lacroix, Agrawal, Comport, …]
- Monocular: [Nister 05, …]
- Multi-frame: [Engels 06, Mouragnon 06, …]
CRUSHER: Carnegie Mellon NREC vehicle
- 5 km autonomous traverse
- Rough terrain
- Log file data

Place Recognition: Vocabulary Trees [Nister and Stewenius CVPR 06]
- “Bag of words” retrieval
- Vocab tree created offline
- For recognition:
  - Image keypoints extracted
  - Tree encodes approximate NN search
  - Inverted index of images at leaves
[Cummins and Newman ICRA 07; Callmer et al. ACRA 08; Fraundorfer et al. IROS 07; Eade and Drummond BMVC 08; Williams et al. ICCV 07]
[Image from Nister and Stewenius CVPR 06]
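The retrieval steps above (greedy descent through an offline tree, inverted index at the leaves, tf-idf scoring) can be sketched as follows. This is a toy illustration, not the system in the slides: the tree is hand-built rather than produced by hierarchical k-means, descriptors are short tuples, and the class and method names (`VocabTreeDB`, `add`, `query`) are hypothetical. Scoring uses tf-idf weights with idf = ln(N / N_i) and an L1 distance between L1-normalized vectors, in the spirit of Nister and Stewenius.

```python
import math
from collections import defaultdict


class Node:
    """Vocabulary-tree node; leaves carry a visual-word id."""
    def __init__(self, centroid, children=(), leaf_id=None):
        self.centroid = centroid
        self.children = list(children)
        self.leaf_id = leaf_id


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(root, descriptor):
    """Greedy descent to the nearest child at each level: an approximate NN search."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: sqdist(c.centroid, descriptor))
    return node.leaf_id


class VocabTreeDB:
    def __init__(self, root):
        self.root = root
        self.inverted = defaultdict(dict)  # word id -> {image id: count}
        self.image_words = {}              # image id -> {word id: count}

    def _words(self, descriptors):
        counts = defaultdict(int)
        for d in descriptors:
            counts[quantize(self.root, d)] += 1
        return dict(counts)

    def _vector(self, counts):
        """tf-idf weights with idf = ln(N / N_i), normalized to unit L1 norm."""
        n = len(self.image_words)
        vec = {w: c * math.log(n / len(self.inverted[w]))
               for w, c in counts.items() if self.inverted.get(w)}
        norm = sum(vec.values()) or 1.0
        return {w: v / norm for w, v in vec.items()}

    def add(self, image_id, descriptors):
        counts = self._words(descriptors)
        self.image_words[image_id] = counts
        for w, c in counts.items():
            self.inverted[w][image_id] = c

    def query(self, descriptors, top_k=5):
        """Score only images sharing a word with the query (inverted index), by L1 distance."""
        q = self._vector(self._words(descriptors))
        candidates = {img for w in q for img in self.inverted[w]}
        scored = []
        for img in candidates:
            d = self._vector(self.image_words[img])
            dist = sum(abs(q.get(k, 0.0) - d.get(k, 0.0)) for k in q.keys() | d.keys())
            scored.append((img, dist))
        return sorted(scored, key=lambda t: t[1])[:top_k]
```

Because only images listed under the query's leaves are scored, the cost of a query depends on how many images share words with it rather than on the size of the whole database.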

Place Recognition: Vocabulary Trees [Nister and Stewenius]
- “Bag of words”
- Vocab tree created offline
- New images queried and added online
[figure: performance on indoor dataset]

Geometric Check How good a rejection filter is the geometric check?
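To make the question concrete, here is a minimal sketch of a geometric check used as a rejection filter. It assumes 2D keypoint correspondences and a rigid 2D transform estimated from 2-point RANSAC samples; the system in the slides works with 3D stereo points, and the thresholds `tol` and `min_inliers` are hypothetical. A candidate from place recognition is accepted only if enough matches agree on one rigid motion. `math.dist` requires Python 3.8 or later.

```python
import math
import random


def rigid_from_two(p1, p2, q1, q2):
    """2D rotation + translation mapping p1 onto q1 and the direction p1->p2 onto q1->q2."""
    th = math.atan2(q2[1] - q1[1], q2[0] - q1[0]) - math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    c, s = math.cos(th), math.sin(th)
    return c, s, q1[0] - (c * p1[0] - s * p1[1]), q1[1] - (s * p1[0] + c * p1[1])


def apply(T, p):
    c, s, tx, ty = T
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def geometric_check(matches, iters=200, tol=0.1, min_inliers=10, seed=0):
    """matches: list of ((x, y), (x', y')). Returns (accepted, best inlier count)."""
    if len(matches) < 2:
        return False, 0
    rng = random.Random(seed)
    best = 0
    for _ in range(iters):
        (p1, q1), (p2, q2) = rng.sample(matches, 2)
        if math.dist(p1, p2) < 1e-9:
            continue  # degenerate sample: direction undefined
        T = rigid_from_two(p1, p2, q1, q2)
        inliers = sum(1 for p, q in matches if math.dist(apply(T, p), q) < tol)
        best = max(best, inliers)
    return best >= min_inliers, best
```

Random, geometrically inconsistent matches rarely reach `min_inliers`, which is what makes the check an effective filter on false positives; measuring how often true matches fall below the threshold answers the other half of the question.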

Kidnapped Robot / Relocalization

Trajectory synthesis

Indoor VSLAM with View Maps

Place Recognition after 1 week

Challenges
• Robust place recognition
  - Use more stable features, e.g., lines [Jana Kosecka]
  - Learn discriminating features with their geometry
  - Relax the geometry: sub-parts such as chairs and tables can move
  - No geometry, e.g., FAB-MAP [Cummins and Newman]
• Map repair: how to integrate new information
  - Update local metric maps with changes
  - What happens when place recognition fails?

Visual environment change
Challenges for lifelong maps: • Map stitching • Map repair • View deletion • Robust recognition

View deletion strategy
1. View clusters
   • Distance measure between views: c(v, v′) = k/m - 1, for k inliers in m matches
   • A cluster of a set S is a maximally connected subset of S
   • The neighborhood of v is the set S reachable from v within a distance n_d and angle n_a
2. LRU algorithm
   • Max size Q for any neighborhood
   • Preferentially thin clusters
   • Delete oldest clusters if necessary
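A minimal sketch of the two steps, under several assumptions that go beyond the slide: views are treated as linked when their inlier ratio k/m reaches a hypothetical threshold `min_ratio`, "reachable within n_d and n_a" is simplified to direct pose distance and heading difference from v, and "least recently used" is read from a hypothetical `last_used` timestamp. The function names are illustrative, not the authors' code.

```python
import math
from collections import defaultdict


def components(nodes, edges):
    """Maximally connected subsets of `nodes`, using only edges inside `nodes`."""
    adj = defaultdict(set)
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(comp)
    return comps


def neighborhood(views, v, n_d, n_a):
    """Views whose pose lies within distance n_d and heading difference n_a of v."""
    x, y, th = views[v]["pose"]
    hood = set()
    for u, info in views.items():
        ux, uy, uth = info["pose"]
        dang = abs(math.atan2(math.sin(uth - th), math.cos(uth - th)))
        if math.hypot(ux - x, uy - y) <= n_d and dang <= n_a:
            hood.add(u)
    return hood


def thin_neighborhood(views, links, v, Q, n_d=1.0, n_a=math.radians(30), min_ratio=0.5):
    """LRU-style deletion until v's neighborhood holds at most Q views.

    views: id -> {"pose": (x, y, theta), "last_used": t}
    links: {(a, b): (k_inliers, m_matches)}; views are linked if k/m >= min_ratio.
    Mutates views and links in place; returns the deleted ids in order.
    """
    similar = [e for e, (k, m) in links.items() if m and k / m >= min_ratio]
    hood = neighborhood(views, v, n_d, n_a)
    deleted = []
    while len(hood) > Q:
        biggest = max(components(hood, similar), key=len)
        # Prefer thinning a redundant cluster; if every cluster is a single
        # view, fall back to deleting the oldest view in the neighborhood.
        pool = biggest if len(biggest) > 1 else hood
        victim = min(pool, key=lambda u: views[u]["last_used"])
        del views[victim]
        for e in [e for e in links if victim in e]:
            del links[e]
        hood.discard(victim)
        deleted.append(victim)
    return deleted
```

Thinning the largest cluster first removes views that are near-duplicates of their neighbors, so distinct appearances of a place (for example, different lighting) tend to survive while storage per neighborhood stays bounded by Q.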

Visual Odometry
- Stereo: [Matthies, Lacroix, Agrawal, Comport, …]
- Monocular: [Nister 05, …]
- Multi-frame: [Engels 06, Mouragnon 06, …]
- No registration
- High precision
Indoor: Willow Garage PR2, 1 km indoor trajectories, online

Urban Scenes [images courtesy Andrew Comport, INRIA]
o Outdoor sequence in Versailles
o 1 m stereo baseline, narrow FOV
o ~400 m sequence
o Average frame distance: 0.6 m
o Max frame distance: 1.1 m
o 30-88 Hz implementation

Autonomous Off-Road Terrain Traversal: LAGR [Learning Applied to Ground Robotics]
- 200 m autonomous traverse
- Off-road terrain
- 15 Hz implementation

Visual SLAM
Optimal solution: Bundle Adjustment
• ~1000 camera poses
• ~1M 3D points
• Several days to solve
• NxN image matching
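A back-of-the-envelope count shows why full bundle adjustment is slow at this scale, assuming 6 parameters per camera pose and 3 per 3D point: exhaustive NxN matching for N ≈ 1000 already means about half a million image pairs, and the joint problem has roughly three million unknowns.

$$
\frac{N(N-1)}{2}\bigg|_{N=1000} = \frac{1000 \cdot 999}{2} = 499{,}500 \ \text{image pairs},
\qquad
6 \cdot 10^{3} + 3 \cdot 10^{6} = 3{,}006{,}000 \ \text{unknowns}
$$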

Visual SLAM
Landmarks:
- EKF Visual SLAM [Davison 02, Sim 03, Solá 05, …]: small-scale (O(n²)), robustness?
- FastSLAM [Se 03, Eade 07, Howard 07]: large-scale (O(log n))
- Hybrid (PTAM, submaps, SWF) [Klein 07, Eade 07, Sibley 07]: small-scale
Frames:
- Frame-based SLAM [Lu+Milios 97, Gutmann 99, Grisetti 07, Konolige 07/08]: large-scale (O(n)), robustness

Vision Tasks
- Realtime local maps [Andrew Comport ICRA 2007]
- Long-range motion estimation
- Global maps: place recognition and local map re-use

Visual Odometry for Motion Estimation
- Local maps: no registration
- Long-range motion estimation: GPS-less estimation
- Stereo: [Matthies, Lacroix, Agrawal, Comport, …]
- Monocular: [Nister 05, …]
- Multi-frame: [Engels 06, Mouragnon 06, …]
- No registration
- Precision?
[figure: poses p1, p2, p3 and q1, q2, q3]

Visual Odometry Principle (SfM): 6 DOF

Visual Odometry
• Extract features: Harris, FAST, SIFT, CenSurE
• Match features: DETECTION, not TRACKING
  - Across successive left images
  - Stereo: across left/right stereo images
• Find the largest consistent subset of matches
  - Stereo: 3 non-collinear matches yield a motion estimate
  - Monocular: 5 matches yield a motion estimate*
  - RANSAC method
• Bundle adjust the last N frames and their feature tracks
[figure: left/right stereo pairs at times T and T+1]
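"Detection, not tracking" means features are re-detected independently in every image and then associated by comparing descriptors, rather than followed from frame to frame. The slides do not say how pairs are accepted; the sketch below assumes a mutual-nearest-neighbor rule with an optional distance cap `max_dist` (both hypothetical choices). The resulting matches would then go to the RANSAC step.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mutual_matches(desc_a, desc_b, max_dist=None):
    """Match descriptors detected independently in two images (no tracking).

    Keeps (i, j) only if j is the nearest neighbor of i in B and i is the
    nearest neighbor of j in A; optionally also requires distance <= max_dist.
    """
    if not desc_a or not desc_b:
        return []
    nn_ab = [min(range(len(desc_b)), key=lambda j: sqdist(a, desc_b[j])) for a in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: sqdist(desc_a[i], b)) for b in desc_b]
    matches = []
    for i, j in enumerate(nn_ab):
        close_enough = max_dist is None or sqdist(desc_a[i], desc_b[j]) <= max_dist ** 2
        if nn_ba[j] == i and close_enough:
            matches.append((i, j))
    return matches
```

The mutual check discards one-sided matches cheaply, which lowers the outlier fraction RANSAC must cope with; it does not replace the geometric consistency test.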

Challenge of Outdoor Environments
5 datasets
- 3 km to 6 km trajectories (autonomous)
- 10 Hz stereo, 1 m baseline
- Max movement typically 0.8 m
- RTK GPS for ground truth

Solutions
Goal: 5 m error in 5 km (0.1%)
1. Minimize local drift
   - Center-surround features for detection stability
   - Incremental BA
   - Calibration (remove bias)
2. Minimize global angular drift
   - Lever-arm problem [figure: 5 m error at 5 km; 1 mrad ≈ 0.06 deg]
   - IMU accelerometers give global tilt/roll
   - Low-drift IMU for yaw drift
   - Visual SLAM for loop closure
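The 0.1% goal and the lever-arm problem can be checked with the small-angle approximation, assuming a constant heading error δθ over a straight path of length d: a 1 mrad heading error on its own already uses up the entire 5 m budget at 5 km, which is why angular drift gets its own set of remedies.

$$
\frac{5\,\text{m}}{5\,\text{km}} = \frac{5}{5000} = 0.1\%,
\qquad
e_{\perp} \approx d\,\delta\theta = 5000\,\text{m} \times 10^{-3}\,\text{rad} = 5\,\text{m},
\qquad
1\,\text{mrad} = \frac{180^\circ}{1000\,\pi} \approx 0.057^\circ
$$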

Stable Feature Detection: Corners vs. Center-surround
- Harris, FAST: ~8 ms
- Scaled SIFT, SURF: ~300 ms, ~150 ms
- CenSurE: ~15 ms
Agrawal, Blas, Konolige, CenSurE: Center-surround extrema for realtime feature detection and matching, ECCV 2008
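The center-surround idea can be illustrated with box filters on an integral image, which makes each filter response cost a constant number of lookups regardless of size. This is a simplified single-scale sketch: CenSurE itself uses bilevel filters at multiple scales and additionally filters out responses along lines, both omitted here, and the parameters `n` and `thresh` are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 in constant time."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def center_surround(ii, y, x, n):
    """Inner (2n+1)^2 box minus the surrounding ring of a (4n+1)^2 box,
    each averaged over its area so a flat patch gives zero response."""
    inner = box_sum(ii, y - n, x - n, y + n + 1, x + n + 1)
    outer = box_sum(ii, y - 2 * n, x - 2 * n, y + 2 * n + 1, x + 2 * n + 1)
    a_in = (2 * n + 1) ** 2
    a_ring = (4 * n + 1) ** 2 - a_in
    return inner / a_in - (outer - inner) / a_ring


def detect(img, n=1, thresh=0.5):
    """Single-scale keypoints: strict 3x3 extrema of the response with |response| >= thresh."""
    ii = integral_image(img)
    h, w, r = len(img), len(img[0]), 2 * n
    resp = {(y, x): center_surround(ii, y, x, n)
            for y in range(r, h - r) for x in range(r, w - r)}
    keypoints = []
    for (y, x), v in resp.items():
        if abs(v) < thresh:
            continue
        nbrs = [resp[(y + dy, x + dx)] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0) and (y + dy, x + dx) in resp]
        if all(v > u for u in nbrs) or all(v < u for u in nbrs):
            keypoints.append((y, x, v))
    return keypoints
```

For a bright 3x3 blob centered at (4, 4) in a 9x9 black image, `detect` returns a single keypoint at the blob center with response 1.0.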

Error and Calibration
- Camera-to-vehicle transform T: misalignment
- Stereo system miscalibration => bias
[figures: vehicle and camera trajectories, axes in m]

Results, VO 5 km runs
[figure: Run 1 and Run 2 against RTK GPS ground truth]

IMU vs. VO
1) IMU:
- High XYZ drift from accelerometers (error grows as t²)
- Global gravity normal (noisy): corrects tilt/roll
- Low-drift yaw angle (~1 deg/hr, tactical-grade IMU)

VO + IMU
EKF: VO drives the filter predict step, IMU drives the filter update step

| Dataset | Length | RMS error | Max error |
| --- | --- | --- | --- |
| course 1 DTED 4 run 2 | 3129 m | 5.70 m (0.18%) | 10.06 m (0.32%) |
| course 2 BDTED 4 run 4 | 6440 m | 5.10 m (0.08%) | 8.19 m (0.13%) |
| course 2 BDTED 5 run 1 | 4712 m | 6.09 m (0.13%) | 10.70 m (0.23%) |
| course 3 DTED 5 run 1 | 5293 m | 4.85 m (0.09%) | 8.58 m (0.16%) |
| course 3 DTED 4 run 1 | 4920 m | 9.16 m (0.19%) | 15.30 m (0.31%) |
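The predict/update split can be illustrated on heading alone. In this deliberately reduced sketch (an assumption, not the filter used for the table above), the state is a single yaw angle: each VO increment moves it forward and inflates its variance, and an absolute IMU yaw reading corrects it. With a scalar linear model the EKF reduces to an ordinary Kalman filter; the noise values `q_vo` and `r_imu` are hypothetical inputs.

```python
import math


def wrap(a):
    """Normalize an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


class YawFilter:
    """1-D heading filter: VO increments drive predict, IMU yaw drives update."""

    def __init__(self, yaw0=0.0, var0=1e-4):
        self.yaw, self.var = yaw0, var0

    def predict(self, d_yaw_vo, q_vo):
        """Integrate a VO heading increment; uncertainty grows by q_vo."""
        self.yaw = wrap(self.yaw + d_yaw_vo)
        self.var += q_vo

    def update(self, yaw_imu, r_imu):
        """Fuse an absolute IMU yaw measurement with variance r_imu; returns the gain."""
        k = self.var / (self.var + r_imu)
        self.yaw = wrap(self.yaw + k * wrap(yaw_imu - self.yaw))
        self.var *= 1.0 - k
        return k
```

Because VO variance accumulates with every predict step while the IMU yaw error stays bounded, the gain drifts toward trusting the IMU over long runs, which matches the slide's point that a low-drift IMU is needed for yaw at large scale.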

VO Conclusion
1. Visual odometry can provide precise trajectories in GPS-less environments
   - Good features have high frame match rates
   - Incremental bundle adjustment improves accuracy: ~5 cm/√m, ~0.15 deg/√m
2. Integration with an IMU is necessary for large-scale precision
   - Noisy gravity normal corrects tilt/roll
   - High-quality IMU for yaw correction
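If the per-√m accuracy figures are read as random-walk coefficients (an interpretation; the slide gives only the units), error grows with the square root of distance traveled. The heading term is the one that matters at large scale: about 10° after 5 km, far beyond the 1 mrad lever-arm budget, which is consistent with the second conclusion above.

$$
\sigma_{\text{pos}}(d) \approx 5\,\text{cm}\,\sqrt{d / 1\,\text{m}},
\qquad
\sigma_{\theta}(d) \approx 0.15^\circ\,\sqrt{d / 1\,\text{m}}
$$

$$
d = 100\,\text{m}:\ \sigma_{\text{pos}} \approx 50\,\text{cm},\ \sigma_{\theta} \approx 1.5^\circ;
\qquad
d = 5\,\text{km}:\ \sqrt{5000} \approx 70.7,\ \sigma_{\theta} \approx 10.6^\circ
$$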

Visual SLAM using Skeletons
- Local registration is a small optimization problem (VO)
- Loop closure is a larger but reducible optimization problem

Marginalization [figure: camera frames c, points q, measurements z]
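In information form, marginalizing a feature out of the Gaussian over frames and features is a Schur complement, and it is what turns frame-feature measurements into frame-frame links. The toy below is hypothetical and linear, chosen only to show the mechanics: scalar frames c1, c2 and one scalar feature q, with unit-weight residuals c1 (a prior at 0), q - c1 - 2 and q - c2 - 1. Stacking their Jacobians J and right-hand sides z gives H = JᵀJ and b = Jᵀz. In the nonlinear setting of the slides this reduction is only an approximation.

```python
def marginalize_last(H, b):
    """Schur complement: eliminate the last (scalar) variable from the
    information form H x = b, returning the reduced system over the rest."""
    n = len(H) - 1
    hqq = H[n][n]
    Hm = [[H[i][j] - H[i][n] * H[n][j] / hqq for j in range(n)] for i in range(n)]
    bm = [b[i] - H[i][n] * b[n] / hqq for i in range(n)]
    return Hm, bm


def solve2(A, y):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((y[0] * A[1][1] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det)


# Information matrix and vector over (c1, c2, q) for the toy residuals above.
H = [[2.0, 0.0, -1.0],
     [0.0, 1.0, -1.0],
     [-1.0, -1.0, 2.0]]
b = [-2.0, -1.0, 3.0]

Hm, bm = marginalize_last(H, b)
print(Hm, bm)          # [[1.5, -0.5], [-0.5, 0.5]] [-0.5, 0.5]
print(solve2(Hm, bm))  # (0.0, 1.0)
```

The off-diagonal -0.5 is a new direct constraint between c1 and c2 that did not exist before elimination, and solving the reduced system gives the same frame estimates as solving the full one (c1 = 0, c2 = 1, with q = 2).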

Long-Baseline Matching
• Match using CenSurE features
• Good matches up to 10 m baseline
  - High sensitivity
  - High selectivity
  - High accuracy
• Not invariant to Z-axis rotation
[figure: frame 9 vs. frame 463, 6.42 m apart: 866 features, 315 matched, 101 inliers]

FrameSLAM Results, Versailles Rond: 133 frames, 29 links, 35 ms PCG
[figures: VO result, FrameSLAM result]
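PCG (preconditioned conjugate gradient) solves the symmetric positive-definite linear system that arises at each step of the graph optimization. The slides do not say which preconditioner is used; this dense, standard-library sketch assumes a simple Jacobi (diagonal) preconditioner, whereas a real system would exploit sparsity.

```python
import math


def pcg(A, b, tol=1e-10, max_iter=None):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive-definite
    matrix A given as dense nested lists; returns x with A x ~= b."""
    n = len(b)
    max_iter = max_iter or 10 * n

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        r = [ri - alpha * ak for ri, ak in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pk for zi, pk in zip(z, p)]
        rz = rz_new
    return x
```

In exact arithmetic CG converges in at most n iterations; its appeal for SLAM is that each iteration needs only matrix-vector products, which stay cheap when the reduced frame-frame system is sparse.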

FrameSLAM Results, Indoor Lab [courtesy Robert Sim]
o Indoor lab sequence
o 12 cm stereo baseline, wide FOV
o ~100 m sequence, ~8200 key frames
o 17 tack points in the VSLAM graph

FrameSLAM Results, Indoor Lab [courtesy Robert Sim]
o Indoor lab sequence
o 12 cm stereo baseline, wide FOV
o ~100 m sequence, ~8200 key frames
o Green crosses are uncorrected VO; cyan environment points
o Red segments are VSLAM-corrected poses; blue environment points


FrameSLAM Results, Crusher 5 km x 2
- 42K key frames
- 2.2K link frames
- 286 links
- 3.3 s PCG
[figure: VO run 1, VO run 2, RTK GPS run 1]

Small-area 3D Reconstruction: Leaving Flatland [Morisset, Subramanian (SRI); Rusu (TUM)]

3D Reconstruction Pipeline
[figure components: stereo images; IMU, odometry; VSLAM 3D pose estimation; place recognition; registered point clouds; Hokuyo point cloud; octree voxels; planes; meshes; maps]
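Once VSLAM poses register the point clouds into one frame, they are typically merged at a fixed resolution. The pipeline names octree voxels; the sketch below substitutes a flat hash grid (a different, simpler structure, named plainly here) that replaces all points in the same cubic cell with their centroid. The function name and the `voxel` size parameter are illustrative.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel):
    """Map each (x, y, z) point to its cubic cell of side `voxel` and return
    {cell index: centroid of the points in that cell}."""
    cells = defaultdict(lambda: [0.0, 0.0, 0.0, 0])  # running sums and count
    for x, y, z in points:
        key = (math.floor(x / voxel), math.floor(y / voxel), math.floor(z / voxel))
        c = cells[key]
        c[0] += x
        c[1] += y
        c[2] += z
        c[3] += 1
    return {k: (c[0] / c[3], c[1] / c[3], c[2] / c[3]) for k, c in cells.items()}
```

An octree stores the same occupancy hierarchically, so empty space costs little and resolution can vary by region; the flat grid trades that flexibility for simplicity.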

FrameSLAM Conclusion
1. VO provides accurate local registration
2. Reduction to frame-frame constraints eliminates all feature variables => approximation
3. Further reduction of frames to skeletons gives a compact system => large systems can be solved quickly
4. Some method of place recognition is required for closing loops
   - Many … [Ishiguro 01, Ulrich 00, Barbosa 02, …]
   - Recent: [Cummins 07, Pronobis 06, …]
5. In small areas, realtime 3D reconstruction