

Learning Semantics-aware Distance Map with Semantics Layering Network for Amodal Instance Segmentation
ACM MM 2019 Poster
Ziheng Zhang∗, Anpei Chen∗, Ling Xie, Jingyi Yu, Shenghua Gao†
ShanghaiTech University
Presented by Zhixuan Li, 2019/11/15

What is Amodal Instance Segmentation?
• The occluded part of an object should be predicted as well
• [Figure: RGB input and ground truth, with the modal region and the amodal region marked; amodal region -> yellow part of the refrigerator]

Normal semantic segmentation network
• Predicts labels for all visible pixels
• Each pixel can have only one label

A natural variant from modal to amodal
• Predict labels for all visible and invisible pixels
• Each pixel can have more than one label
• Occluded region: >= 2 labels / pixel

Comparison of the prediction data structures of the two methods above
• Same: both MODAL and AMODAL (natural naïve version) output an H*W*C volume, where C is the number of classes
• Different, MODAL: argmax across the C channels at each position -> one label / position
• Different, AMODAL (natural naïve version): keep the predictions from all channels -> multiple labels / occluded position, one label / non-occluded position
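
The difference between the two output conventions can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the function names are hypothetical, and the fixed score threshold used to decide which channels are kept in the amodal case is an assumption, since the slide only says that predictions from all channels are kept.

```python
import numpy as np

def modal_labels(scores):
    """scores: (H, W, C) per-class scores. Returns an (H, W) map with one label per pixel."""
    return scores.argmax(axis=-1)

def amodal_labels_naive(scores, threshold=0.5):
    """Returns an (H, W, C) boolean volume; several channels may be True at one pixel.

    The threshold is an assumption made for this sketch.
    """
    return scores >= threshold

# Toy 1x2 image with 3 classes: the second pixel is covered by two classes.
scores = np.array([[[0.9, 0.1, 0.2],
                    [0.8, 0.7, 0.1]]])

modal = modal_labels(scores)                # one label per position
amodal = amodal_labels_naive(scores)        # multi-label where occluded
labels_per_pixel = amodal.sum(axis=-1)      # number of labels kept at each pixel
```

In this toy input the modal map assigns class 0 to both pixels, while the naïve amodal volume keeps two labels at the second (occluded) pixel.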

What information did we ignore in the GT?
• Relative depth order
• Example: the girl is closer to the camera than the surfboard

Amodal naïve + depth order version
• MODAL: dimensions H*W*C; channels mean classes; argmax
• AMODAL (natural naïve version): dimensions H*W*C; channels mean classes; keep all
• AMODAL (naïve + depth version): dimensions H*W*K; channels mean relative depth order; in each depth, use argmax to get one class
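
One way to read the naïve + depth version is sketched below. The slide does not say how the per-depth class is scored, so this sketch assumes a hypothetical (K, H, W, C) score volume, one set of class scores per depth layer, and reduces it to the H*W*K label volume with an argmax inside each layer.

```python
import numpy as np

def depth_layered_labels(layer_scores):
    """layer_scores: (K, H, W, C) class scores for each of K relative depth layers.

    Returns an (H, W, K) integer volume: one class index per pixel per depth layer.
    The (K, H, W, C) input layout is an assumption made for this sketch.
    """
    labels = layer_scores.argmax(axis=-1)   # (K, H, W): argmax within each depth
    return np.moveaxis(labels, 0, -1)       # (H, W, K)

rng = np.random.default_rng(0)
layer_scores = rng.random((2, 4, 5, 3))     # K=2 layers, 4x5 image, C=3 classes
layered = depth_layered_labels(layer_scores)
```

A real system would also need a way to mark empty layers, for example a background class; that detail is left out here.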

Limitation of this naïve + depth version
• Not all areas of one object have the same relative depth
• Example: of the person's left and right legs, one is in front of the horse and the other is behind it

SLN (Semantics Layering Network) from this paper
• Different pixels have different depths

Illustrations

SLN Architecture

Sem-dist map
• Visibility level L: at least L objects must be “removed” to make the pixel visible
• The sem-dist map M
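
The visibility level can be illustrated per pixel with a small NumPy sketch. It covers only the visibility level, not the full sem-dist map M, whose definition the slide does not give. The input format, one depth value per instance per pixel with `np.inf` where the instance is absent, and the `-1` marker are assumptions made for this example.

```python
import numpy as np

def visibility_levels(depths):
    """depths: (N, H, W) floats; depths[i, y, x] is the depth of instance i at (y, x),
    or np.inf where instance i does not cover that pixel (hypothetical input format).

    Returns (N, H, W) ints: how many other instances lie nearer to the camera at
    that pixel, i.e. how many must be "removed" before instance i is visible there.
    Pixels not covered by instance i are marked -1.
    """
    present = np.isfinite(depths)
    # closer[i, j, y, x] is True when instance j is strictly nearer than instance i.
    closer = depths[None, :, :, :] < depths[:, None, :, :]
    levels = closer.sum(axis=1)
    return np.where(present, levels, -1)

# Toy 1x2 image: instance 0 is the horse, instance 1 is the person.
# At pixel 0 the person's leg is in front of the horse; at pixel 1 it is behind.
depths = np.array([[[2.0, 2.0]],
                   [[1.0, 3.0]]])
levels = visibility_levels(depths)
```

Because the level is computed per pixel, the same person can be at level 0 at one pixel and level 1 at another, which a single per-object depth order cannot express.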

Experiments
• COCOA dataset (natural scenes): 2500, 1250 and 1250 images are used for training, validation and testing
• D2SA dataset (supermarket goods): 5600 images

Metrics for amodal segmentation
• Mean average precision (AP) and mean average recall (AR)

Results on the COCOA validation set
• AR10 and AR100: average recall with 10 and 100 segments per image
• Things (countable): person, car, etc.
• Stuff (uncountable): sky, grass, etc.
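
The idea behind a recall figure with a cap of N segments per image can be sketched as follows. This is a simplified illustration, not the benchmark's evaluation code: it uses a single IoU threshold (0.5 here, an assumption) and does not enforce one-to-one matching between predictions and ground truth. The function names are hypothetical.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def recall_at_n(pred_masks, scores, gt_masks, n, iou_thresh=0.5):
    """Fraction of ground-truth masks matched by any of the n highest-scoring predictions."""
    order = np.argsort(scores)[::-1][:n]          # indices of the top-n predictions
    kept = [pred_masks[i] for i in order]
    hits = sum(
        any(mask_iou(p, gt) >= iou_thresh for p in kept)
        for gt in gt_masks
    )
    return hits / len(gt_masks) if len(gt_masks) > 0 else 0.0

# Toy example on a 4x4 image with two ground-truth segments.
gt1 = np.zeros((4, 4), dtype=bool); gt1[:2, :2] = True
gt2 = np.zeros((4, 4), dtype=bool); gt2[2:, 2:] = True
stray = np.zeros((4, 4), dtype=bool); stray[0, 3] = True

preds = [gt1.copy(), stray, gt2.copy()]
pred_scores = [0.9, 0.8, 0.1]

recall_top2 = recall_at_n(preds, pred_scores, [gt1, gt2], n=2)
recall_top3 = recall_at_n(preds, pred_scores, [gt1, gt2], n=3)
```

With only two segments allowed, the low-scoring but correct third prediction is cut off and one ground-truth segment is missed; allowing three recovers it.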

COCOA dataset

Conclusions
• State of the art (SOTA) on the COCOA and D2SA datasets
• Good baseline code