Nonchronological Video Synopsis and Indexing
Authors: Yael Pritch, Alex Rav-Acha, Shmuel Peleg.
Presented by Yossi Maimon.

Overview
Today, the amount of captured video is growing dramatically. Public and private places are surrounded by surveillance cameras. Public places include airports, museums, government institutions and so on.

Problem
Each location requires anywhere from several cameras to a few hundred cameras in order to cover the whole site. Surveillance cameras in public places capture video 24/7. London has more than a million surveillance cameras. As a result, searching for activities from the last few hours or days takes hours or days, which in practice makes the surveillance footage irrelevant.

Current solutions
• Fast forwarding.
• Key frames:
  ◦ Arbitrary: selecting every X-th frame.
  ◦ Dynamic: the algorithm selects more frames from periods of activity.
All of these solutions treat the frame as the building block.

Synopsis and Indexing
User: “I would like to watch, in five minutes, a synopsis video of the last week.”

Synopsis and indexing - Concepts
• The idea is to create a video synopsis according to a user query.
• The video synopsis will contain the important data and activities from the raw video.
• Activities from different times are presented at the same time.
• Each activity will have a pointer to its original time and place in the raw video.

Synopsis example

What are we going to talk about?
The article describes two approaches to building a synopsis from the raw video:
1. Low level: pixel-based approach.
2. High level: object-based approach.

Pixel-based approach
• The video synopsis should be substantially shorter than the raw video.
• Maximum activity/interest from the raw video should appear in the synopsis video.
• The dynamics of the objects should be preserved in the synopsis video.
• Visible seams and fragmented objects should be avoided.
• Pixels are shifted only in time, never in space.

Pixel-based approach (cont.)
Assume N input frames, with 1 ≤ t ≤ N the frame index and (x, y) the spatial coordinate of a pixel.
M – the time mapping of each pixel.
I(x, y, t) – pixel in the raw video.
S(x, y, t) – pixel in the synopsis video.
Since the spatial location does not change, only the time: S(x, y, t) = I(x, y, M(x, y, t)).
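The mapping S(x, y, t) = I(x, y, M(x, y, t)) can be illustrated with a minimal sketch. It assumes the video is stored as nested lists indexed `video[t][y][x]` with 0-based frame indices; the function name `apply_time_mapping` is hypothetical and does not come from the article.

```python
def apply_time_mapping(raw, mapping):
    """Build S(x, y, t) = I(x, y, M(x, y, t)).

    raw[t][y][x] is a pixel of the raw video I (0-based frame index t).
    mapping[t][y][x] is the source frame M(x, y, t) of each synopsis pixel.
    Only the time coordinate changes; x and y stay where they are.
    """
    return [
        [
            [raw[mapping[t][y][x]][y][x] for x in range(len(mapping[t][y]))]
            for y in range(len(mapping[t]))
        ]
        for t in range(len(mapping))
    ]


# Raw video: 4 frames of 1x2 pixels; each value encodes its frame and column.
raw = [[[10 * t + x for x in range(2)]] for t in range(4)]
# One synopsis frame: left pixel taken from frame 0, right pixel from frame 3.
mapping = [[[0, 3]]]
synopsis = apply_time_mapping(raw, mapping)
```

Here the single synopsis frame shows, side by side, pixels that were recorded three frames apart.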

Synopsis options

Pixel-based approach (cont.)
The time shift M is obtained by minimizing the following cost function: E(M) = Ea(M) + αEd(M).
Ea – the loss of activity: the total of active pixels that are in I (raw) but not in S (synopsis).
Ed – the discontinuity across seams.

Pixel-based approach (cont.)
Active pixel: measured by the difference of the pixel from the background.
χ(x, y, t) = ||I(x, y, t) - B(x, y, t)|| (B is the background).
New formulation of the cost terms Ea(M) and Ed(M).
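The formulas for the two cost terms did not survive on this slide. One formulation consistent with the verbal definitions above is sketched below. It is an assumption rather than a quotation: the symbol e_i is introduced here for the six unit offsets to a pixel's spatio-temporal neighbors, and the squared norm in the second term is likewise assumed.

```latex
\chi(x,y,t) = \lVert I(x,y,t) - B(x,y,t) \rVert

E_a(M) = \sum_{(x,y,t)\in I} \chi(x,y,t)
       \;-\; \sum_{(x,y,t)\in S} \chi\bigl(x,y,M(x,y,t)\bigr)

E_d(M) = \sum_{(x,y,t)\in S} \sum_{i}
         \bigl\lVert S\bigl((x,y,t)+e_i\bigr) - I\bigl((x,y,M(x,y,t))+e_i\bigr) \bigr\rVert^{2}
```

The first term is the activity present in the raw video minus the activity that made it into the synopsis. The second is small when the neighbors of a pixel in the synopsis look like its neighbors in the raw video, that is, when no seam is visible.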

Pixel-based approach (cont.)

Pixel-based approach (cont.)
The problem can be represented as a graph.
Pixel => node; its weight is derived from the activity cost.
Neighbor => edge; its weight is derived from the discontinuity cost.
Since each pixel in the synopsis video can come from any time, the complexity is high.

Object-based synopsis
Moving to a high-level implementation: objects (tubes) instead of pixels. The purpose is to detect and track objects in the raw video and place them in the synopsis video. Objects are ranked according to their importance. Goals: maximum activity, minimum overlap, maximum continuity.

Object-based synopsis (cont.)
Background: in short videos the background does not change, but in surveillance cameras it does (lighting, static objects). Therefore, in long videos, it should be recalculated every several minutes. Background subtraction and min-cut are used to segment the foreground objects.

System diagram

Object-based synopsis - definitions
Activity cost: favors a synopsis video with maximum activity. It penalizes objects that are not mapped to a valid time in the synopsis. If only some pixels of a tube are mapped, the cost is computed over the unmapped pixels only.
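A minimal sketch of such an activity cost follows. It assumes each tube is summarized by one activity value per frame and that a tube is placed by choosing the synopsis frame of its first frame; the name `activity_cost` and this data layout are illustrative assumptions, not the article's definitions.

```python
def activity_cost(frame_activity, start, synopsis_len):
    """Sum the activity of the tube frames that are left out of the synopsis.

    frame_activity[k] is the total activity of the tube's k-th frame.
    start is the synopsis frame that the tube's first frame is shifted to,
    or None when the tube is dropped entirely.
    """
    if start is None:
        # The whole tube is unmapped, so all of its activity is lost.
        return sum(frame_activity)
    # Partially mapped tube: only frames outside [0, synopsis_len) count.
    return sum(
        activity
        for k, activity in enumerate(frame_activity)
        if not 0 <= start + k < synopsis_len
    )
```

For example, a three-frame tube placed at frame 8 of a 10-frame synopsis loses only its last frame, so only that frame's activity is charged.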

Object-based synopsis – definitions
• Collision cost: a collision cost is calculated for every two shifted tubes. The expression gives a low penalty to pixels whose color is similar to the background.
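One way to obtain a collision cost with this property is to multiply the activity values of the two tubes wherever they occupy the same pixel at the same synopsis time, since activity is small for pixels that resemble the background. The sketch below assumes that form and a dictionary layout for tubes; both are illustrative assumptions.

```python
def collision_cost(tube_a, shift_a, tube_b, shift_b):
    """Sum of activity products over pixels that both shifted tubes occupy.

    Each tube maps (x, y, t) in raw-video time to an activity value
    (the pixel's difference from the background). shift_a and shift_b are
    the time shifts that move each tube into the synopsis.
    """
    shifted_b = {(x, y, t + shift_b): act for (x, y, t), act in tube_b.items()}
    return sum(
        act * shifted_b.get((x, y, t + shift_a), 0.0)
        for (x, y, t), act in tube_a.items()
    )


tube_a = {(0, 0, 0): 2.0, (1, 0, 1): 3.0}
tube_b = {(0, 0, 5): 4.0, (1, 0, 6): 0.5}
overlapping = collision_cost(tube_a, 0, tube_b, -5)  # b moved onto a
separate = collision_cost(tube_a, 0, tube_b, 0)      # b left in place
```

The second shared pixel contributes little because tube_b is nearly background-colored there (activity 0.5).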

Object-based synopsis – definitions
Temporal consistency cost
• Preserves the chronological order of events (for example, two people talking, or two events with a causal relation).
• The cost is calculated according to the spatio-temporal distance between tubes.
• C is a penalty for objects that do not preserve temporal consistency.
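The exact expression is not shown on the slide. As a sketch under stated assumptions, the cost below charges the penalty C only when a pair of tubes swaps its chronological order, and scales it by an interaction weight that decays exponentially with the spatio-temporal distance. The exponential form and the parameter `sigma` are assumptions made for illustration.

```python
import math


def temporal_consistency_cost(start_a, start_b, new_a, new_b, distance,
                              sigma=10.0, penalty_c=1.0):
    """Penalize a pair of tubes whose chronological order is reversed.

    start_a, start_b: original start times of the two tubes.
    new_a, new_b: their start times in the synopsis.
    distance: spatio-temporal distance between the tubes; closer tubes are
    assumed to interact more strongly, so reversing them costs more.
    """
    interaction = math.exp(-distance / sigma)
    order_kept = (start_b - start_a) * (new_b - new_a) >= 0
    return 0.0 if order_kept else penalty_c * interaction
```

With this shape, two people who talked to each other (small distance) are strongly discouraged from being reordered, while unrelated, distant objects can be rearranged almost freely.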

Energy between tubes
This energy is used to maximize activity while avoiding conflicts and overlap between objects. α and β are user parameters. Reducing β leads to overlapping objects; increasing it leads to a sparse video.
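To make the roles of α and β concrete, here is a toy sketch that combines the three costs and searches for the best start times. Everything about the setup is a simplifying assumption: tubes are reduced to (original start, length, lane) triples, each frame carries one unit of activity, tubes collide only when they share a lane, and the search is brute force because the example is tiny.

```python
from itertools import product


def overlap(s1, len1, s2, len2):
    """Number of frames shared by the intervals [s1, s1+len1) and [s2, s2+len2)."""
    return max(0, min(s1 + len1, s2 + len2) - max(s1, s2))


def energy(tubes, starts, synopsis_len, alpha, beta):
    """Activity loss plus, per pair, alpha * order penalty + beta * collision."""
    total = 0.0
    for (_orig, length, _lane), s in zip(tubes, starts):
        # Activity loss: frames that fall outside the synopsis.
        total += length - overlap(s, length, 0, synopsis_len)
    for i in range(len(tubes)):
        for j in range(i + 1, len(tubes)):
            (o1, l1, lane1), (o2, l2, lane2) = tubes[i], tubes[j]
            # Temporal term: 1 if the chronological order is reversed.
            total += alpha * ((o2 - o1) * (starts[j] - starts[i]) < 0)
            # Collision term: shared frames, only for tubes in the same lane.
            if lane1 == lane2:
                total += beta * overlap(starts[i], l1, starts[j], l2)
    return total


def best_arrangement(tubes, synopsis_len, alpha, beta):
    """Try every combination of start times and keep the cheapest one."""
    candidates = product(range(synopsis_len), repeat=len(tubes))
    return min(candidates,
               key=lambda s: energy(tubes, s, synopsis_len, alpha, beta))


# (original start, length, lane): two tubes share lane 0, one is in lane 1.
tubes = [(0, 3, 0), (100, 3, 0), (200, 3, 1)]
separated = best_arrangement(tubes, synopsis_len=6, alpha=0.5, beta=1.0)
overlapped = best_arrangement(tubes, synopsis_len=6, alpha=0.5, beta=0.0)
```

With β = 1 the two tubes in lane 0 are placed one after the other. With β = 0 nothing discourages overlap, so all tubes can start at frame 0, which matches the remark that reducing β leads to overlapping objects.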

Lower bound
The length of the synopsis video is bounded from below by the longest activity. Long activities cannot be shortened by temporal rearrangement. Two options for dealing with this:
◦ Display only part of the activity.
◦ Cut the activity into several shorter activities and present them simultaneously (stroboscopic effect).

Stroboscopic Panoramic Synopsis

System diagram

SYNOPSIS OF ENDLESS VIDEO
The algorithm lets the user watch a synopsis video of the endless raw video (surveillance cameras). The algorithm is divided into two phases:
1. Online phase: collecting and analyzing the raw video.
2. Response phase: building a synopsis in response to a user query.

Online phase
• Creating a background video by temporal median.
• Object (tube) detection and segmentation.
• Inserting detected objects into the object queue.
• Removing objects from the object queue when a space limit is reached.
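The temporal-median step can be sketched in a few lines. The example assumes grey-level frames stored as nested lists `frames[t][y][x]`; the function name is hypothetical.

```python
from statistics import median


def temporal_median_background(frames):
    """Per-pixel median over time; frames[t][y][x] are grey levels."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


frames = [
    [[10, 10]],
    [[10, 200]],  # an object briefly covers the right pixel
    [[12, 10]],
]
background = temporal_median_background(frames)
```

Because the object covers the right pixel in only one of the three frames, the median ignores it and recovers the background value.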

Response phase
• Constructing a time-lapse video of the changing background.
• Selecting tubes for the synopsis video and computing the optimal temporal arrangement of these tubes.
• Stitching the tubes and the background into a coherent video.

Synopsis generation
• Generating a background video.
• Computing a consistency cost for each object and for each possible time in the synopsis.
• Determining which tubes should appear in the synopsis and at what time.
• Combining the selected tubes with the background time-lapse to get the final synopsis.

Reducing unrelated frames
Removing stationary frames: surveillance cameras have long periods with no activity. Such frames can be filtered out during the online phase by recording only when activity is detected.
Short activity: activity shorter than a second has no importance. Therefore, we take only one frame out of every 10.
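Both filters can be combined in one pass, as in this minimal sketch. It assumes a per-frame activity measure is already available; the function name, the `threshold` parameter and its default are illustrative assumptions.

```python
def select_frames(activity_per_frame, step=10, threshold=0.0):
    """Return the indices of frames worth keeping.

    Only every step-th frame is examined, and an examined frame is kept
    only if its activity measure exceeds threshold.
    """
    return [
        i
        for i in range(0, len(activity_per_frame), step)
        if activity_per_frame[i] > threshold
    ]


# 20 idle frames followed by 20 active frames.
activity = [0] * 20 + [5] * 20
kept = select_frames(activity)
```

Of the 40 input frames, only frames 20 and 30 are kept: frames 0 and 10 are examined but show no activity.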

The Object Queue
In an endless video, not all objects can be kept in the queue because space is limited. The common method is to throw out the oldest objects, but this limits the queries a user can make. Our approach is to throw out objects based on low importance (activity), collision potential and age, using user-defined thresholds (uniform, dynamic). Object properties such as: Activity, time
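A possible eviction rule is sketched below. The slide does not say how the three criteria are combined, so the weighted score, the weights and the dictionary keys are all assumptions made for illustration.

```python
def evict(queue, capacity, w_activity=1.0, w_collision=1.0, w_age=1.0):
    """Keep the capacity best-scoring objects and drop the rest.

    Each object is a dict with the hypothetical keys 'activity',
    'collision' (collision potential) and 'age', all scaled to [0, 1].
    """
    def score(obj):
        # Activity argues for keeping an object; collision potential
        # and age argue for discarding it.
        return (w_activity * obj["activity"]
                - w_collision * obj["collision"]
                - w_age * obj["age"])

    return sorted(queue, key=score, reverse=True)[:capacity]


queue = [
    {"id": "a", "activity": 0.9, "collision": 0.1, "age": 0.5},
    {"id": "b", "activity": 0.1, "collision": 0.1, "age": 0.9},
    {"id": "c", "activity": 0.5, "collision": 0.0, "age": 0.1},
]
kept_ids = [obj["id"] for obj in evict(queue, capacity=2)]
```

Object "b" is old and shows little activity, so it is the one removed. An old but very active object would survive, unlike with a plain oldest-first policy.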

Time-lapse background
What should it do?
• It should represent the background changes over time (for example, day-night transitions).
• It should represent the background of the activity tubes.
Ht – uniform histogram. Ha – activity histogram.
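One way to serve both goals is to blend the two histograms and pick background frames at equal steps of the blended cumulative histogram. The sketch below assumes that scheme; the blending weight `lam` and the function name are not taken from the slide.

```python
def pick_background_times(activity_hist, n_frames, lam=0.5):
    """Choose n_frames time bins from a blend of Ha and Ht.

    activity_hist[i] is the activity in time bin i (Ha before normalizing).
    The uniform histogram Ht gives every bin the same weight.
    lam = 1 follows the activity only; lam = 0 samples time uniformly.
    """
    n_bins = len(activity_hist)
    total = sum(activity_hist)
    # Normalized Ha; fall back to uniform when there is no activity at all.
    ha = [a / total for a in activity_hist] if total else [1.0 / n_bins] * n_bins
    blended = [lam * a + (1.0 - lam) / n_bins for a in ha]

    # Pick the bins where the cumulative histogram crosses equally spaced
    # levels. A very busy bin may be picked more than once.
    picks, cumulative, k = [], 0.0, 0
    for i, weight in enumerate(blended):
        cumulative += weight
        while k < n_frames and cumulative >= (k + 0.5) / n_frames - 1e-9:
            picks.append(i)
            k += 1
    return picks


uniform_picks = pick_background_times([0, 0, 10, 10], n_frames=2, lam=0.0)
activity_picks = pick_background_times([0, 0, 10, 10], n_frames=2, lam=1.0)
```

With `lam=0.0` the background frames are spread evenly over the day; with `lam=1.0` they come from the periods in which the activity tubes were recorded.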

Consistency with background
Assumption: the pixels on the object's border are similar to the background. We define the cost of stitching an object to the background.
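The slide's formula is not preserved. A simple cost in the same spirit, assumed here for illustration, sums the absolute differences between the object's border pixels and the candidate background at the same positions.

```python
def stitching_cost(border_pixels, background):
    """Sum of absolute differences along the object's border.

    border_pixels maps (x, y) to the grey level of the object's border there.
    background[y][x] is the candidate background frame.
    """
    return sum(
        abs(value - background[y][x])
        for (x, y), value in border_pixels.items()
    )


border = {(0, 0): 100, (1, 0): 110}
day_background = [[100, 105]]
night_background = [[20, 25]]
day_cost = stitching_cost(border, day_background)
night_cost = stitching_cost(border, night_background)
```

An object recorded in daylight is much cheaper to stitch onto a daytime background than onto a night-time one, so this cost steers objects toward backgrounds with similar lighting.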

Stitching the synopsis video
Stitching all tubes together will cause color blending. The boundaries of each tube are consistent with the background. Suggested approach: the background is the same (except for lighting), so each object is stitched independently.

Foreground-background phase transition
A moving object becomes stationary; a stationary object starts moving.
Problems:
• Background objects will appear and disappear for no reason.
• Moving objects will disappear when they stop moving, rather than becoming part of the background.

Indexing
Original video frames of active periods should be stored together with the object-based queue. Each selected object has a time stamp. Clicking an object directs the user to the corresponding time in the raw video, according to the time stamp.
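The lookup behind a click can be sketched as follows. The tube representation (a bounding box plus original and synopsis start frames) and all key names are hypothetical, chosen only to keep the example small.

```python
def original_time(tubes, synopsis_frame, x, y):
    """Return the raw-video frame of the object under a click, or None.

    Each tube is a dict with the hypothetical keys 'orig_start' and
    'synopsis_start' (frame indices), 'length' (in frames) and
    'box' = (x0, y0, x1, y1), a bounding box in image coordinates.
    """
    for tube in tubes:
        offset = synopsis_frame - tube["synopsis_start"]
        x0, y0, x1, y1 = tube["box"]
        if 0 <= offset < tube["length"] and x0 <= x <= x1 and y0 <= y <= y1:
            # Same offset into the tube, counted from its original start.
            return tube["orig_start"] + offset
    return None


tubes = [
    {"orig_start": 5000, "synopsis_start": 10, "length": 30, "box": (0, 0, 50, 50)},
]
hit = original_time(tubes, synopsis_frame=12, x=20, y=20)
miss = original_time(tubes, synopsis_frame=50, x=20, y=20)
```

A click on the object two frames after it enters the synopsis leads to frame 5002 of the raw video. A click at a time when the object is no longer shown returns `None`.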

Q&A

Thank you
