
Weakly-Supervised Video Object Segmentation Danny Cosentino Project Leader: Kevin Duarte

Video Object Segmentation (VOS) • Given a first-frame segmentation, segment the object throughout the video • Applications in video editing and surveillance

YouTube-VOS Dataset • 4,453 Videos • 3,471 Training • 474 Validation • 508 Testing • 94 Object Categories • 7,755 Objects • 197,272 Annotations • Annotations every 5 frames

Scribble Annotations • Pixel-level segmentations are time-consuming and costly to produce • Simpler annotations can be used instead – Scribbles • Each scribble is generated from five random points in the segmentation
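The scribble generation above can be sketched as follows. This is a minimal illustration, not the project's actual code: the slide only says five random points are used, so joining them with straight segments and clipping the result to the mask are assumptions, and `scribble_from_mask` is a hypothetical name.

```python
import numpy as np

def scribble_from_mask(mask, num_points=5, rng=None):
    """Build a synthetic scribble from random points inside a binary mask.

    Assumption: consecutive points are joined by straight segments and the
    result is clipped to the mask. The slides do not specify this step.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = mask.astype(bool)
    ys, xs = np.nonzero(mask)
    scribble = np.zeros(mask.shape, dtype=bool)
    if len(ys) == 0:
        return scribble  # no object in this frame, so no scribble

    # Sample foreground pixels (with replacement only for tiny objects).
    idx = rng.choice(len(ys), size=num_points, replace=len(ys) < num_points)
    pts = np.stack([ys[idx], xs[idx]], axis=1)

    # Rasterize each segment by interpolating between its endpoints.
    for (y0, x0), (y1, x1) in zip(pts[:-1], pts[1:]):
        n = int(max(abs(y1 - y0), abs(x1 - x0))) + 1
        rr = np.round(np.linspace(y0, y1, n)).astype(int)
        cc = np.round(np.linspace(x0, x1, n)).astype(int)
        scribble[rr, cc] = True

    # Keep only scribble pixels that lie on the object.
    return scribble & mask
```

Clipping to the mask keeps the scribble on the object even when a segment between two points crosses background, which can happen for non-convex shapes.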

YouTube-VOS S2S Network N. Xu, L. Yang, Y. Fan, J. Yang, D. Yue, Y. Liang, B. L. Price, S. Cohen, and T. S. Huang. YouTube-VOS: Sequence-to-sequence video object segmentation. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part V, pages 603–619, 2018.

Implementation • Training: – Select a series of 8 frames from each video • Resize to 256 x 448 • Randomly reverse the frames • Random horizontal flip – Randomly sample an object in the video – Learning rate set to 10e-5 – Batch size of 9 videos – Model converges in ~120 epochs
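The clip sampling and augmentation steps above might look roughly like this in NumPy. It is a sketch under stated assumptions: the 8 frames are taken as consecutive, resizing is nearest-neighbour, each augmentation fires with probability 0.5, and the object is drawn from those visible in the first frame of the augmented clip. None of these details appear on the slide, and `nearest_resize` and `sample_training_clip` are hypothetical helpers.

```python
import numpy as np

def nearest_resize(x, out_h, out_w):
    """Nearest-neighbour resize for arrays shaped (T, H, W, ...)."""
    h, w = x.shape[1:3]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows][:, :, cols]

def sample_training_clip(frames, labels, clip_len=8, size=(256, 448), rng=None):
    """Return (clip, mask) for one training sample.

    frames: (T, H, W, 3) image array; labels: (T, H, W) integer object ids,
    with 0 as background. Requires T >= clip_len.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Assumption: the 8 frames are consecutive.
    start = rng.integers(0, len(frames) - clip_len + 1)
    clip = nearest_resize(frames[start:start + clip_len], *size)
    lab = nearest_resize(labels[start:start + clip_len], *size)

    if rng.random() < 0.5:   # randomly reverse the frame order
        clip, lab = clip[::-1], lab[::-1]
    if rng.random() < 0.5:   # random horizontal flip
        clip, lab = clip[:, :, ::-1], lab[:, :, ::-1]

    # Randomly sample one object id visible in the first frame of the clip.
    ids = np.unique(lab[0])
    ids = ids[ids > 0]
    obj = rng.choice(ids) if ids.size else 0
    mask = (lab == obj) & (lab > 0)
    return np.ascontiguousarray(clip), mask
```

Nearest-neighbour resizing is used for the labels so that object ids are never blended; a real pipeline would likely use bilinear interpolation for the RGB frames.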

YouTube-VOS • ~31% overall score on the YouTube-VOS Challenge

YouTube-VOS Challenge Results
                         Overall   J_Seen   J_Unseen   F_Seen   F_Unseen
Week 4                   0.24      0.35     0.20       0.25     0.16
Week 5                   0.26      0.39     0.21       0.29     0.17
Week 6                   0.31      0.43     0.24       0.36     0.21
First Frame Scribbles    0.16      0.20     0.13       0.18     0.13
Week 3: 0.16 0.22 0.18 0.10 (only four of the five values survive, so the column assignment after Overall is uncertain)
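As a sanity check on the table, the YouTube-VOS overall score is the mean of the four per-category metrics (J and F on seen and unseen categories). The short sketch below recomputes it for the complete rows; the `overall_score` helper and the 0.01 tolerance (to absorb rounding in the reported two-decimal values) are assumptions made for illustration.

```python
def overall_score(j_seen, j_unseen, f_seen, f_unseen):
    """Mean of the four YouTube-VOS metrics."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4.0

# (reported overall, J_Seen, J_Unseen, F_Seen, F_Unseen) from the slide
results = {
    "Week 4": (0.24, 0.35, 0.20, 0.25, 0.16),
    "Week 5": (0.26, 0.39, 0.21, 0.29, 0.17),
    "Week 6": (0.31, 0.43, 0.24, 0.36, 0.21),
    "First Frame Scribbles": (0.16, 0.20, 0.13, 0.18, 0.13),
}

for name, (reported, *metrics) in results.items():
    recomputed = overall_score(*metrics)
    # Reported values have two decimals, so allow a small tolerance.
    consistent = abs(recomputed - reported) <= 0.01
    print(f"{name}: reported={reported:.2f} recomputed={recomputed:.3f} ok={consistent}")
```

Week 3 is left out because one of its values is missing from the slide text.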

First Frame Scribbles

Weakly-Supervised Loss • Superpixels are generated for the frame • A superpixel is labeled foreground if a scribble passes through it
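The superpixel rule above can be illustrated with a small NumPy sketch. It covers only the pseudo-label step, since the loss formula itself is not recoverable from the slide text. The square-grid `grid_superpixels` is a stand-in assumption for whatever superpixel algorithm was actually used, and both function names are hypothetical.

```python
import numpy as np

def grid_superpixels(h, w, cell=8):
    """Stand-in superpixels: square cells, each with a unique integer id.

    Assumption: a real pipeline would use a proper superpixel algorithm;
    the slides do not say which one.
    """
    rows = np.arange(h) // cell
    cols = np.arange(w) // cell
    n_cols = (w + cell - 1) // cell
    return rows[:, None] * n_cols + cols[None, :]

def foreground_from_scribble(superpixels, scribble):
    """Mark every superpixel touched by the scribble as foreground."""
    hit_ids = np.unique(superpixels[scribble.astype(bool)])
    return np.isin(superpixels, hit_ids)

# Usage: a short horizontal scribble crossing two cells of a 16 x 16 frame.
sp = grid_superpixels(16, 16, cell=8)
scribble = np.zeros((16, 16), dtype=bool)
scribble[2, 2:10] = True
fg = foreground_from_scribble(sp, scribble)
```

How superpixels that no scribble touches are treated (background or ignored) is not stated on the slide, so the sketch returns only the foreground mask.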

Thank You…