Annotating RGBD Images of Indoor Scenes
Yu-Shiang Wong and Hung-Kuo Chu
National Tsing Hua University, CGV LAB
SA2014.SIGGRAPH.ORG

Outline
• Motivation
• Related Works
• Annotation Procedure
• User Study

Motivation
• Scene understanding is a popular topic.
• RGBD datasets with high-quality semantic annotations are valuable for:
  – Learning
  – Evaluation
• Two fundamental problems: data acquisition and annotation

RGBD Indoor Datasets
• Cornell-RGBD (2011-12): 24 labeled office scenes
• NYU2 (2011-12): 1449 labeled indoor scenes
  – 408,000+ RGBD video frames (unlabeled)
• SUN3D (2013): 415+ fully captured rooms
  – 10+ rooms are fully labeled; annotations are propagated through the video.
• UZH & ETH 3D Scanned Point Datasets (2014): 42 fully captured rooms
  – High-quality point clouds (unlabeled)
  – Object Detection and Classification from Large-Scale Cluttered Indoor Scans (EG 2014)
• …

Motivation
• Data annotation is a painstaking and time-consuming task.
• OMG! So much data needs to be annotated.

Motivation
• Data annotation is a painstaking and time-consuming task.
• Interactive tool for annotating RGBD indoor scenes
• We need a good tool!

Motivation
• Data annotation is a tedious and time-consuming task.
• Interactive tool for annotating RGBD indoor scenes
• Leverage both the cognitive ability of humans and the computational power of machines.

RELATED WORKS

Image Annotation
• LabelMe: a database and web-based tool for image annotation. Russell et al., IJCV 2007
• SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels. Xiao et al., ICCV 2013
• Cheaper by the Dozen: Group Annotation of 3D Data. Boyko et al., UIST 2014

Scene Understanding using RGBD Data
Image-based
• Indoor segmentation and support inference from RGBD images. Silberman et al., ECCV 2012
• RGB-(D) scene labeling: Features and algorithms. Ren et al., CVPR 2012
Proxy-based
• Imagining the unseen: Stability-based cuboid arrangements for understanding cluttered indoor scenes. Shao et al., SIGGRAPH Asia 2014
• PanoContext: A whole-room 3D context model for panoramic scene understanding. Zhang et al., ECCV 2014
• Holistic scene understanding for 3D object detection with RGBD cameras. Lin et al., ICCV 2013
• 3D-based reasoning with blocks, support, and stability. Jia et al., CVPR 2013

Annotation Procedure: Overview
• Input: RGB-D image
• Output: segmentation, labels, box proxies, support structure
[Figure: input image, machine and user, and the annotated output with labels picture, lamp, pillow, stand, bed]

Annotation Procedure: Overview
Pipeline: Input RGB-D Image → Extract Room → Draw Scribbles → Estimate Boxes → Annotate Label and Structure → Output Annotated 3D Structure
[Figure: pipeline diagram divided into a Machine Session and a User Session; example labels picture, lamp, pillow, stand, bed]

Annotation Procedure: Preprocessing
• Estimate normals.
• Perform over-segmentation using both the color and normal maps.
  – Efficient graph-based image segmentation [Felzenszwalb et al. 2004]
  – The coarser segmentation is used for room estimation.
  – The finer segmentation is used for user-assisted object segmentation.
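
The coarse and fine segmentations can be illustrated with a minimal sketch of the graph-based merging rule from [Felzenszwalb et al. 2004]. This is not the authors' code: the 1-D toy "image", the helper `chain_edges`, and the values of the scale parameter `k` are assumptions chosen for illustration, and each feature value merely stands in for a combined color and normal difference.

```python
def segment_graph(num_nodes, edges, k):
    """Graph-based merging in the style of Felzenszwalb and Huttenlocher.

    edges: list of (weight, a, b); k: scale parameter (larger k gives coarser segments).
    Returns a list mapping each node to the root of its segment.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes  # largest edge weight merged inside each segment

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge only if the edge is no heavier than both relaxed internal differences.
        if w <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = w
    return [find(i) for i in range(num_nodes)]


def chain_edges(features):
    """Hypothetical helper: connect neighbours in a 1-D row of per-pixel features."""
    return [(abs(features[i] - features[i + 1]), i, i + 1)
            for i in range(len(features) - 1)]


# Toy 1-D "image": three flat regions separated by a small and a large jump.
row = [0.0, 0.1, 0.0, 1.0, 1.1, 1.0, 5.0, 5.1]
fine = segment_graph(len(row), chain_edges(row), k=0.5)
coarse = segment_graph(len(row), chain_edges(row), k=4.0)
print(len(set(fine)), "fine segments;", len(set(coarse)), "coarse segments")
```

Running the same routine twice with different `k` is one simple way to obtain the two granularities mentioned on the slide; the real tool may derive them differently.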

Annotation Procedure: Extracting Room Layout
• [Silberman 2012]
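
The slide only cites [Silberman 2012] for room extraction and does not say how the floor and wall planes are fitted. As a rough illustration of the kind of plane hypothesis involved, the sketch below uses RANSAC, which is an assumption swapped in here as a common choice, not a method confirmed by the slides. The threshold, iteration count, seed, and toy points are all hypothetical.

```python
import math
import random


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_points(p, q, r):
    """Return (unit normal n, offset d) with n . x = d, or None if the points are collinear."""
    n = cross(sub(q, p), sub(r, p))
    length = math.sqrt(dot(n, n))
    if length < 1e-9:
        return None
    n = tuple(c / length for c in n)
    return n, dot(n, p)


def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Keep the sampled plane that has the most points within `threshold` of it."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [p for p in points if abs(dot(n, p) - d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


# Toy scene: a flat floor at z = 0 plus a few points belonging to objects above it.
floor = [(0.5 * i, 0.5 * j, 0.0) for i in range(6) for j in range(5)]
clutter = [(1.0, 1.0, 0.4), (2.0, 0.5, 0.9), (0.5, 1.5, 1.3),
           (1.5, 1.0, 0.7), (2.5, 2.0, 1.1)]
(normal, offset), inliers = ransac_plane(floor + clutter)
print(normal, offset, len(inliers))
```

A hypothesis like this is what the user would later confirm or correct when checking the floor and walls.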

Annotation Procedure: User Scribbles
• Check the floor and wall hypotheses.
  – If the hypotheses fail, the user clicks the segments to identify the floor and walls.
• The user draws scribbles to extract the object segments.
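
One plausible way a scribble could select an object is to take the union of the fine over-segmentation regions it passes through. The slides do not describe the tool's actual mechanism, so the sketch below is an assumption; `segment_map`, the scribble coordinates, and both function names are hypothetical.

```python
def segments_under_scribble(segment_map, scribble):
    """Return the ids of all over-segmentation regions touched by the scribble pixels."""
    return {segment_map[r][c] for r, c in scribble}


def object_mask(segment_map, segment_ids):
    """Boolean mask that is True wherever a pixel belongs to a selected region."""
    return [[seg in segment_ids for seg in row] for row in segment_map]


# Toy 3 x 5 label image from the finer segmentation (one integer id per pixel).
segment_map = [
    [0, 0, 1, 1, 2],
    [0, 3, 3, 1, 2],
    [4, 3, 3, 5, 5],
]
scribble = [(1, 1), (1, 2), (0, 2)]  # (row, col) pixels along the stroke

ids = segments_under_scribble(segment_map, scribble)
mask = object_mask(segment_map, ids)
for row in mask:
    print("".join("#" if inside else "." for inside in row))
```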

Annotation Procedure: Estimating Boxes
• Box orientation = finding an orthogonal basis in 3D (3 unknown directions).
• We assume one direction of the box is parallel to the floor normal (leaving 1 unknown direction; the last one follows by cross product).
Box fitting method:
1. Filter the point cloud by KNN.
2. Project the box's point cloud onto the floor plane.
3. Fit a line in 2D to extract the major direction.
4. Use a cross product to obtain the last direction.
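
The four box-fitting steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' C++ implementation: the slide does not give the KNN filtering rule, so a mean-distance outlier test is assumed, and the 2D line fit is done as a total-least-squares (principal axis) fit. The names `knn_filter` and `fit_box` and all parameter values are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)


def knn_filter(points, k=3, std_ratio=2.0):
    """Step 1 (assumed rule): drop points whose mean k-NN distance is unusually large."""
    mean_d = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d.append(sum(dists[:k]) / k)
    mu = sum(mean_d) / len(mean_d)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_d) / len(mean_d))
    limit = mu + std_ratio * sigma + 1e-9
    return [p for p, d in zip(points, mean_d) if d <= limit]


def fit_box(points, floor_normal):
    """Steps 2-4: return three orthonormal box axes and the box size along each."""
    up = normalize(floor_normal)
    helper = (1.0, 0.0, 0.0) if abs(up[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(up, helper))  # first axis spanning the floor plane
    v = cross(up, u)                  # second axis spanning the floor plane
    # Step 2: project the points onto the floor plane (2D coordinates in u, v).
    xy = [(dot(p, u), dot(p, v)) for p in points]
    mx = sum(x for x, _ in xy) / len(xy)
    my = sum(y for _, y in xy) / len(xy)
    sxx = sum((x - mx) ** 2 for x, _ in xy)
    syy = sum((y - my) ** 2 for _, y in xy)
    sxy = sum((x - mx) * (y - my) for x, y in xy)
    # Step 3: direction of the best-fit 2D line (principal axis of the scatter).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    major = tuple(math.cos(theta) * a + math.sin(theta) * b for a, b in zip(u, v))
    # Step 4: the last direction is the cross product of the two known ones.
    minor = cross(up, major)
    axes = (major, minor, up)
    size = tuple(max(dot(p, ax) for p in points) - min(dot(p, ax) for p in points)
                 for ax in axes)
    return axes, size


# Toy object: a 4 x 1 x 1 box rotated 30 degrees about the floor normal, plus one stray point.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
box_pts = [(c * x - s * y, s * x + c * y, z)
           for x in [i * 0.5 - 2.0 for i in range(9)]
           for y in (-0.5, 0.0, 0.5)
           for z in (0.0, 0.5, 1.0)]
clean = knn_filter(box_pts + [(10.0, 10.0, 5.0)])
axes, size = fit_box(clean, floor_normal=(0.0, 0.0, 1.0))
print(axes[0], size)
```

Fixing one axis to the floor normal reduces the orientation search to a single angle in the floor plane, which is why a 2D line fit is sufficient.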

Annotation Procedure: Annotate Label and 3D Structure
User tasks:
1. Type in the object label.
2. Drag an arrow to specify the support relationships.
[Figure: annotated scene with labels picture, lamp, pillow, stand, bed]
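
The labels and support arrows can be thought of as a small forest over the boxes. The sketch below shows one possible data structure; the class `SceneAnnotation`, its methods, and the specific support relations in the example are hypothetical and only reuse the label names that appear on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class SceneAnnotation:
    labels: dict = field(default_factory=dict)        # box id -> typed label
    supported_by: dict = field(default_factory=dict)  # box id -> id of its supporter

    def add_box(self, box_id, label):
        self.labels[box_id] = label

    def add_support(self, child, parent):
        """Record an arrow 'child is supported by parent', rejecting cycles."""
        node = parent
        while node is not None:
            if node == child:
                raise ValueError("support relation would create a cycle")
            node = self.supported_by.get(node)
        self.supported_by[child] = parent

    def support_chain(self, box_id):
        """Labels of everything underneath a box, nearest supporter first."""
        chain = []
        node = self.supported_by.get(box_id)
        while node is not None:
            chain.append(self.labels[node])
            node = self.supported_by.get(node)
        return chain


# Hypothetical example scene; the relations are illustrative, not taken from the talk.
scene = SceneAnnotation()
for box_id, label in enumerate(["floor", "bed", "pillow", "stand", "lamp"]):
    scene.add_box(box_id, label)
scene.add_support(1, 0)  # bed on floor
scene.add_support(2, 1)  # pillow on bed
scene.add_support(3, 0)  # stand on floor
scene.add_support(4, 3)  # lamp on stand
print(scene.support_chain(2))  # ['bed', 'floor']
```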

Annotation Procedure: Box Quality Refinement (Optional)
User tasks:
1. Adjust the orientation of boxes.
2. Adjust the size of boxes.

USER STUDY

User Study: Settings
• Select 50 scenes across 7 scene classes from NYU2.
• Recruit 2 users.
• Each user is asked to annotate the 50 scenes.
• Target classes: 24 merged object classes
  – List: bed, chair, cabinet, dresser, television, night stand, table, sofa, picture, pillow, …
• Each scene contains 3-6 objects.

User Study: Results
• System process time (calculating normals, fitting planes and boxes): < 3 sec [in C++]
• Annotation time (50 scenes):

Task Type          Mean time per box   Mean time per scene   Total time
Check Room         --                  1.6 sec               1.3 min
Draw Scribbles     16 sec              1 min                 51 min
Type Labels        4 sec               17 sec                13 min
Drag Supports      2 sec               9 sec                 7.5 min
Boxes Adjustment   11 sec              35 sec                29 min (Accuracy = 64%)
TOTAL                                                        101 min
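
As a quick sanity check on the timings above, the per-task totals can be summed and compared with the per-scene means. The numbers are copied from the slide; the variable names are illustrative. The rows add up to roughly 101.8 min, and the per-scene means imply roughly 102.2 min over 50 scenes, so the reported 101 min presumably reflects rounding of the individual entries.

```python
# Figures copied from the results slide (totals in minutes, per-scene means in seconds).
total_min = {
    "Check Room": 1.3,
    "Draw Scribbles": 51,
    "Type Labels": 13,
    "Drag Supports": 7.5,
    "Boxes Adjustment": 29,
}
per_scene_sec = {
    "Check Room": 1.6,
    "Draw Scribbles": 60,
    "Type Labels": 17,
    "Drag Supports": 9,
    "Boxes Adjustment": 35,
}
num_scenes = 50

sum_of_totals = sum(total_min.values())
from_scene_means = sum(per_scene_sec.values()) * num_scenes / 60
scribble_share = total_min["Draw Scribbles"] / sum_of_totals

print(f"sum of task totals:     {sum_of_totals:.1f} min")
print(f"from per-scene means:   {from_scene_means:.1f} min")
print(f"share spent scribbling: {scribble_share:.0%}")
```

Drawing scribbles accounts for about half of the annotation time, with box adjustment the next largest share.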

Demo

Conclusion
• An interactive system to facilitate annotating RGBD indoor scenes.
• Generates high-quality ground truth data with rich annotations:
  – Object segments
  – Object labels
  – 3D geometry
  – 3D structure
[Figure: annotated scene with labels picture, lamp, table]

Ongoing Work
• The major bottleneck lies in the manual operations:
  – Drawing scribbles
  – Refining box proxies
  – Typing labels
  – Specifying structure
• Incorporate inference algorithms and 3D structure analysis to reduce the user's manual burden.

THANK YOU!