































Place Recognition and Lifelong Mapping
Kurt Konolige, James Bowman, JD Chen, Patrick Mihelich (Willow Garage)
Michael Calonder, Vincent Lepetit, Pascal Fua (Ecole Polytechnique Fédérale de Lausanne)
- Konolige et al., View-Based Maps, RSS 2009
- Konolige and Bowman, Lifelong Visual Maps, IROS 2009
- Konolige et al., Mapping, Navigation and Learning for Off-road Traversal, JFR 2008
- Konolige and Agrawal, FrameSLAM: from Bundle Adjustment to Realtime Visual Mapping, TRO 2008
Willow Garage
• PR2 Mobile Manipulation Platform
• Open-source robotics software
• ROS
• OpenCV
• Robotics and vision algorithms
From 2D laser maps to VIEW MAPS
- Locally metric
- Global manifold
[figure: views p1, p2, p3]
VSLAM by VIEW MAPS
- View maps: a set of stereo views connected by nonlinear Gaussian constraints
- Continuous recognition
- Continuous detection
- Locally metric, global manifold
- TORO [Grisetti et al.]
[figure: views p1, p2, p3]
Crusher Visual Odometry
- Stereo: [Matthies, Lacroix, Agrawal, Comport, …]
- Monocular: [Nister 05, …]
- Multi-frame: [Engels 06, Mouragnon 06, …]
CRUSHER: Carnegie Mellon NREC vehicle
- 5 km autonomous traverse
- Rough terrain
- Log file data
Place Recognition: Vocabulary Trees [Nister and Stewenius CVPR 06]
- “Bag of words” retrieval
- Vocab tree created offline
- For recognition:
  - Image keypoints extracted
  - Tree encodes approximate NN search
  - Inverted index of images at leaves
Related work: [Cummins and Newman ICRA 07; Callmer et al. ACRA 08; Fraundorfer et al. IROS 07; Eade and Drummond BMVC 08; Williams et al. ICCV 07]
[Image from Nister and Stewenius CVPR 06]
Place Recognition: Vocabulary Trees [Nister and Stewenius]
- “Bag of words”
- Vocab tree created offline
- New images queried and added online
[plot: performance on indoor dataset]
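The retrieval steps on the two vocabulary-tree slides (quantize keypoint descriptors by descending a tree, keep an inverted index at the leaves, add and query images online) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny hand-built tree, the 2-D "descriptors", the names `Node`, `quantize`, and `PlaceDatabase`, and the simplified TF-IDF-style score are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

class Node:
    """Tree node; leaves carry a visual-word id (hypothetical structure)."""
    def __init__(self, center, children=(), word_id=None):
        self.center, self.children, self.word_id = center, list(children), word_id

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(root, desc):
    """Descend the tree, following the nearest child at each level (approximate NN)."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: dist2(c.center, desc))
    return node.word_id

class PlaceDatabase:
    def __init__(self, root):
        self.root = root
        self.inverted = defaultdict(Counter)  # word id -> {image id: count}
        self.num_images = 0

    def add(self, image_id, descriptors):
        """Index a new image online."""
        for d in descriptors:
            self.inverted[quantize(self.root, d)][image_id] += 1
        self.num_images += 1

    def query(self, descriptors):
        """Rank stored images by a simplified TF-IDF score (an assumption, not the paper's scoring)."""
        words = Counter(quantize(self.root, d) for d in descriptors)
        scores = Counter()
        for w, qcount in words.items():
            postings = self.inverted.get(w)
            if not postings:
                continue
            idf = math.log((1 + self.num_images) / len(postings))
            for image_id, count in postings.items():
                scores[image_id] += qcount * count * idf * idf
        return scores.most_common()

# Toy tree: branching factor 2, depth 2, four visual words.
leaf = lambda c, w: Node(c, word_id=w)
root = Node(None, [
    Node((0, 0), [leaf((0, 0), 0), leaf((0, 2), 1)]),
    Node((10, 10), [leaf((10, 10), 2), leaf((10, 12), 3)]),
])
db = PlaceDatabase(root)
db.add("hall", [(0.1, 0.1), (0.2, 1.9), (0.0, 0.2)])
db.add("lab", [(10.0, 10.1), (9.8, 12.1), (10.2, 11.8)])
ranked = db.query([(0.0, 0.1), (0.1, 2.1)])
print(ranked)
```

In a real system the top-ranked candidates would then go through the geometric check discussed on the next slide before being accepted as a place match.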
Geometric Check How good a rejection filter is the geometric check?
Kidnapped Robot / Relocalization
Trajectory synthesis
Indoor VSLAM with View Maps
Place Recognition after 1 week
Challenges
• Robust place recognition
  - Use more stable features, e.g., lines [Jana Kosecka]
  - Learn discriminating features with their geometry
  - Relax the geometry
    - Sub-parts: chairs, tables can move
    - No geometry, e.g., FAB-MAP [Cummins and Newman]
• Map repair: how to integrate new information
  - Update local metric maps with changes
  - What happens when PR fails?
Visual environment change
Challenges for lifelong maps:
• Map stitching
• Map repair
• View deletion
• Robust recognition
View deletion strategy
1. View clusters
• Distance measure between views: c(v, v’) = k/m – 1, with k inliers in m matches
• A cluster of a set S is a maximally connected subset of S
• The neighborhood of v is the set S of views reachable from v within distance n_d and angle n_a
2. LRU algorithm
• Max size Q for any neighborhood
• Preferentially thin clusters
• Delete oldest clusters if necessary
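One way to read the view deletion strategy above is sketched below: group the views of a neighborhood into connected clusters, and while the neighborhood exceeds the maximum size Q, drop the oldest view from the largest multi-view cluster, falling back to the oldest view overall once only singleton clusters remain. The function names, the `linked` predicate (standing in for a threshold on the slide's distance measure), and the tie-breaking choices are assumptions for illustration, not the published algorithm.

```python
def clusters(views, linked):
    """Maximally connected subsets of `views` under the symmetric relation `linked(a, b)`."""
    remaining, out = set(views), []
    while remaining:
        seed = remaining.pop()
        comp, stack = {seed}, [seed]
        while stack:
            v = stack.pop()
            near = {w for w in remaining if linked(v, w)}
            remaining -= near
            comp |= near
            stack.extend(near)
        out.append(comp)
    return out

def thin_neighborhood(views, timestamp, linked, max_size):
    """Delete views until the neighborhood holds at most `max_size` (Q on the slide)."""
    views = set(views)
    while len(views) > max_size:
        multi = [g for g in clusters(views, linked) if len(g) > 1]
        # Prefer thinning the most redundant cluster; otherwise drop the oldest singleton.
        target = max(multi, key=len) if multi else views
        victim = min(target, key=lambda v: timestamp[v])
        views.remove(victim)
    return views

# Toy neighborhood: views 1-4 look alike, 5-6 look alike, 7 is unique.
edges = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (5, 6)]}
linked = lambda a, b: frozenset((a, b)) in edges
timestamp = {v: v for v in range(1, 8)}  # a larger value means a more recent view
kept = thin_neighborhood(range(1, 8), timestamp, linked, max_size=5)
print(sorted(kept))
```

Thinning large clusters first keeps at least one exemplar of each distinct appearance of a place for as long as possible, which is the point of preferring cluster thinning over plain oldest-first deletion.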
Visual Odometry
- Stereo: [Matthies, Lacroix, Agrawal, Comport, …]
- Monocular: [Nister 05, …]
- Multi-frame: [Engels 06, Mouragnon 06, …]
- No registration
- High precision
Indoor: Willow Garage PR2
- 1 km indoor trajectories
- Online
Urban Scenes [images courtesy Andrew Comport, INRIA]
o Outdoor sequence in Versailles
o 1 m stereo baseline, narrow FOV
o ~400 m sequence
o Average frame distance: 0.6 m
o Max frame distance: 1.1 m
o 30-88 Hz implementation
Autonomous Off-Road Terrain Traversal LAGR [Learning Applied to Ground Robotics] 200 m autonomous traverse Off-road terrain 15 Hz implementation
Visual SLAM
Optimal solution: Bundle Adjustment
• ~1000 camera poses
• ~1M 3D points
• Several days to solve
• N×N image matching
Visual SLAM
Landmarks
- EKF Visual SLAM [Davison 02, Sim 03, Solá 05, …]: small-scale, O(n²); robustness?
- FastSLAM [Se 03, Eade 07, Howard 07]: large-scale, O(log n)
- Hybrid (PTAM, Submaps, SWF) [Klein 07, Eade 07, Sibley 07]: small scale
• ~1000 camera poses
• ~1M 3D points
• Several days to solve
• N×N image matching
Frames
- Frame-based SLAM [Lu+Milios 97, Gutmann 99, Grisetti 07, Konolige 07/08]: large-scale, O(n); robustness
Vision Tasks realtime Local Maps [Andrew Comport ICRA 2007] Long-range motion estimation Global Maps – Place recognition and local map re-use
Visual Odometry for Motion Estimation
- Local maps: no registration
- Long-range motion estimation: GPS-less estimation
- Stereo: [Matthies, Lacroix, Agrawal, Comport, …]
- Monocular: [Nister 05, …]
- Multi-frame: [Engels 06, Mouragnon 06, …]
- No registration
- Precision?
[figure: points p1, p2, p3 and q1, q2, q3]
Visual Odometry Principle (SfM), 6 DOF
Visual Odometry
• Extract features
  - Harris, FAST, SIFT, CenSurE
• Match features
  - DETECTION, not TRACKING
  - Across successive left images
  - Stereo: across left/right stereo images
• Find largest consistent subset of matches
  - Stereo: 3 non-collinear matches yield motion estimate
  - Monocular: 5 matches yield motion estimate*
  - RANSAC method
• Bundle adjust last N frames and their feature tracks
[figure: left/right image pairs at times T and T+1]
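The "largest consistent subset" step for the stereo case can be sketched as a RANSAC loop over 3-point motion hypotheses. The version below is a simplified illustration under stated assumptions: features are already triangulated to 3D points in both frames, the hypothesis is built from an orthonormal triad rather than a least-squares fit, and inliers are scored by 3D distance rather than image reprojection error. All function names and parameter values are hypothetical.

```python
import math
import random

def sub(a, b): return [a[i] - b[i] for i in range(3)]
def norm(a): return math.sqrt(sum(x * x for x in a))
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def apply(R, t, p):
    """Rigid motion R p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def triad(p1, p2, p3):
    """Orthonormal axes built from three points; None if they are (nearly) collinear."""
    u, n = sub(p2, p1), cross(sub(p2, p1), sub(p3, p1))
    if norm(u) < 1e-9 or norm(n) < 1e-9:
        return None
    e1 = [x / norm(u) for x in u]
    e3 = [x / norm(n) for x in n]
    return [e1, cross(e3, e1), e3]

def rigid_from_3(P, Q):
    """R, t with Q_i ~ R P_i + t from three non-collinear 3D matches."""
    A, B = triad(*P), triad(*Q)
    if A is None or B is None:
        return None
    # R = sum_k b_k a_k^T maps each axis a_k of the first triad onto b_k.
    R = [[sum(B[k][i] * A[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = sub(Q[0], apply(R, [0.0, 0.0, 0.0], P[0]))
    return R, t

def ransac_motion(prev_pts, curr_pts, iters=200, tol=0.05, seed=0):
    """Keep the 3-point hypothesis that explains the most matches."""
    rng, idx, best = random.Random(seed), range(len(prev_pts)), (None, [])
    for _ in range(iters):
        s = rng.sample(idx, 3)
        model = rigid_from_3([prev_pts[i] for i in s], [curr_pts[i] for i in s])
        if model is None:
            continue
        inliers = [i for i in idx
                   if norm(sub(apply(*model, prev_pts[i]), curr_pts[i])) < tol]
        if len(inliers) > len(best[1]):
            best = (model, inliers)
    return best

# Synthetic check: 20 matches, the last 5 corrupted.
rng = random.Random(1)
prev_pts = [[rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(2, 20)] for _ in range(20)]
c, s = math.cos(0.1), math.sin(0.1)
R_true, t_true = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]], [0.5, 0.0, 0.1]
curr_pts = [apply(R_true, t_true, p) for p in prev_pts]
for i in range(15, 20):
    curr_pts[i] = [x + 1.0 for x in curr_pts[i]]
(R_est, t_est), inliers = ransac_motion(prev_pts, curr_pts)
print(len(inliers), t_est)
```

The winning hypothesis and its inlier set would normally be refined afterwards, for example by the bundle adjustment over the last N frames named in the final bullet.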
Challenge of Outdoor Environments
5 datasets
- 3 km to 6 km trajectories (autonomous)
- 10 Hz stereo, 1 m baseline
- Max movement typically 0.8 m
- RTK GPS for ground truth
Solutions
Goal: 5 m error in 5 km (0.1%)
1. Minimize local drift
  - Center-surround features for detection stability
  - Incremental BA
  - Calibration (remove bias)
2. Minimize global angular drift
  - Lever-arm problem
  - IMU accelerometers give global tilt/roll
  - Low-drift IMU for yaw drift
  - Visual SLAM for loop closure
1 mrad ~ 0.06 deg
[figure: 5 m lateral error at the end of a 5 km run]
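The numbers on this slide fit together through the lever-arm relation: a small constant heading error θ over a straight run of length d displaces the end point sideways by about d·θ. A short check, assuming a straight-line run and a constant angular error (both simplifications; the function name is made up for the example):

```python
import math

def cross_track_error(distance_m, heading_error_rad):
    """Lateral offset at the end of a straight run with a constant heading error."""
    return distance_m * math.sin(heading_error_rad)

run_length_m = 5000.0
goal_error_m = 5.0

relative_error_pct = 100.0 * goal_error_m / run_length_m  # the slide's 0.1% goal
one_mrad_in_deg = math.degrees(1e-3)                       # roughly 0.06 deg
drift_m = cross_track_error(run_length_m, 1e-3)            # close to the 5 m goal

print(relative_error_pct, one_mrad_in_deg, drift_m)
```

So a global angular error of only about 1 mrad already uses up the whole 5 m budget, which is why the slide treats angular drift separately from local drift.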
Stable Feature Detection
Corners vs. center-surround
- Harris, FAST: ~8 ms
- Scaled: SIFT ~300 ms, SURF ~150 ms, CenSurE ~15 ms
Agrawal, Blas, Konolige, CenSurE: Center-surround extrema for realtime feature detection and matching, ECCV 2008
Error and Calibration
- Camera-to-vehicle transform T: misalignment
- Stereo system miscalibration => bias
[plots: trajectory, m]
Results, VO 5 km runs RTK GPS Ground Truth Run 1 Run 2
IMU vs. VO
1) IMU:
- High XYZ drift from accelerometers (t²)
- Global gravity normal (noisy) – correct tilt/roll
- Low drift yaw angle (~1 deg/hr, tactical grade IMU)
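The t² growth of accelerometer position drift, against the slow linear growth of yaw drift, can be made concrete with a small calculation. This assumes the simplest possible error model (a constant accelerometer bias integrated twice, and a constant gyro drift rate); the bias value is an arbitrary illustrative number, not a figure from the talk.

```python
def position_drift_m(accel_bias_mps2, t_s):
    """Position error from double-integrating a constant accelerometer bias."""
    return 0.5 * accel_bias_mps2 * t_s ** 2

def yaw_drift_deg(rate_deg_per_hr, t_s):
    """Yaw error from a constant gyro drift rate."""
    return rate_deg_per_hr * t_s / 3600.0

for t in (10, 60, 600):
    # 0.01 m/s^2 is a hypothetical bias; 1 deg/hr is the slide's tactical-grade figure.
    print(t, position_drift_m(0.01, t), yaw_drift_deg(1.0, t))
```

Under these assumptions the position error grows a hundredfold when the time grows tenfold, while yaw stays within a fraction of a degree, which matches the slide's split: use the IMU for angles, not for XYZ.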
VO + IMU
EKF filter: VO drives the predict step, IMU drives the update step.

| Dataset | Length | RMS error | Max error |
|---|---|---|---|
| course 1 DTED 4 run 2 | 3129 m | 5.70 m (0.18%) | 10.06 m (0.32%) |
| course 2B DTED 4 run 4 | 6440 m | 5.10 m (0.08%) | 8.19 m (0.13%) |
| course 2B DTED 5 run 1 | 4712 m | 6.09 m (0.13%) | 10.70 m (0.23%) |
| course 3 DTED 5 run 1 | 5293 m | 4.85 m (0.09%) | 8.58 m (0.16%) |
| course 3 DTED 4 run 1 | 4920 m | 9.16 m (0.19%) | 15.30 m (0.31%) |
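The percentages in the results are simply error divided by trajectory length. The snippet below recomputes them from the lengths and errors given on the slide; only the variable and function names are introduced here.

```python
# (length m, RMS error m, max error m), in the order the runs appear on the slide
runs = [
    (3129, 5.70, 10.06),
    (6440, 5.10, 8.19),
    (4712, 6.09, 10.70),
    (5293, 4.85, 8.58),
    (4920, 9.16, 15.30),
]

def percent_of_length(error_m, length_m):
    return round(100.0 * error_m / length_m, 2)

table = [(length, percent_of_length(rms, length), percent_of_length(mx, length))
         for length, rms, mx in runs]
for row in table:
    print(row)
```

Two of the five runs come in under the 0.1% RMS goal stated earlier in the talk; the others fall between 0.13% and 0.19%.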
VO Conclusion
1. Visual odometry can provide precise trajectories in GPS-less environments
  - Good features have high frame match rates
  - Incremental bundle adjustment improves accuracy
  - ~5 cm / √m, ~0.15 deg / √m
2. Integration with IMU is necessary for large-scale precision
  - Noisy gravity normal corrects tilt/roll
  - High-quality IMU for yaw correction
Visual SLAM using Skeletons
- Local registration is a small optimization problem (VO)
- Loop closure is a larger but reducible optimization problem
Marginalization c z q
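For readers unfamiliar with the term, marginalization in a linearized least-squares system is usually carried out with a Schur complement. The block below is the generic textbook form, not a transcription of the slide's figure; the split of the unknowns into frame variables $c$ and feature variables $q$ is an assumed reading of the slide's labels, and $H$, $b$ are the usual normal-equation blocks.

$$
\begin{bmatrix} H_{cc} & H_{cq} \\ H_{qc} & H_{qq} \end{bmatrix}
\begin{bmatrix} \delta c \\ \delta q \end{bmatrix}
=
\begin{bmatrix} b_c \\ b_q \end{bmatrix}
\quad\Longrightarrow\quad
\left( H_{cc} - H_{cq} H_{qq}^{-1} H_{qc} \right) \delta c
= b_c - H_{cq} H_{qq}^{-1} b_q
$$

The reduced system involves only the frame variables, which is the sense in which feature variables can be eliminated to leave frame-to-frame constraints.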
Long-Baseline Matching
• Match using CenSurE features
• Good matches up to 10 m baseline
  - High sensitivity
  - High selectivity
  - High accuracy
• Not invariant to Z-axis rotation
Example: frame 9 vs. frame 463, 6.42 m distance, 866 features, 315 matched, 101 inliers
FrameSLAM Results, Versailles Rond
- 133 frames, 29 links
- 35 ms PCG
[figure: VO result vs. FrameSLAM result]
FrameSLAM Results, Indoor Lab [courtesy Robert Sim]
o Indoor lab sequence
o 12 cm stereo baseline, wide FOV
o ~100 m sequence, ~8200 key frames
o 17 tack points in the VSLAM graph
o Green crosses are uncorrected VO; cyan environment points
o Red segments are VSLAM-corrected poses; blue environment points
FrameSLAM Results, Crusher 5K x 2
- 42K key frames
- 2.2K link frames
- 286 links
- 3.3 s PCG
[figure: VO run 1, VO run 2, RTK GPS run 1]
Small-area 3D Reconstruction: Leaving Flatland
Morisset, Subramanian [SRI]; Rusu [TUM]
3D Reconstruction Pipeline
[diagram labels: Stereo images; IMU, odometry; VSLAM; 3D pose estimation; Place recognition; Hokuyo point cloud; Registered point clouds; Octree voxels; Planes; Meshes; Maps]
FrameSLAM Conclusion
1. VO provides accurate local registration
2. Reduction to frame-frame constraints eliminates all feature variables => approximation
3. Further reduction of frames to skeletons gives a compact system => large systems can be solved quickly
4. Some method of place recognition is required for closing loops
  - Many: [Ishiguro 01, Ulrich 00, Barbosa 02, …]
  - Recent: [Cummins 07, Pronobis 06, …]
5. In small areas, realtime 3D reconstruction