
ETISEO
Benoît GEORIS, François BREMOND and Monique THONNAT
ORION Team, INRIA Sophia Antipolis, France
Nice, May 10-11th 2005

Fair Evaluation (1/4)
- Unbiased and transparent evaluation protocol
- Large participation
- Meaningful evaluation

Fair Evaluation (2/4)
- Unbiased and transparent evaluation protocol
  - Each participant should have the same effective time for testing and evaluation
  - Main decisions should be collegial: common agreement on videos, ground truth and metrics
  - Everybody keeps control over the dissemination of their results

Fair Evaluation (3/4)
- Large participation: what people want
  - Meet partner expectations: large variety of videos
  - Need minimum effort for tuning and testing: small video sets
- Large participation: what can we propose
  - Possibility to adapt algorithm experimentation (having a mask, choosing options such as shadows, low contrast, wind…)
  - Enabling a gradation of difficulty from easy to hard
  - Minimizing overhead for result creation (format, data exchange, …)

Fair Evaluation (4/4)
- Meaningful evaluation
  - Clear visualization (straightforward metrics and graphical result presentation)
  - For each partner, presentation of results for the problems studied by ETISEO, one after another, with variations from easy to hard
  - For a specific problem, global comparison of partner performances (e.g., sensitivity of event recognition with respect to image resolution)

Video Selection (1/4)
- New or already published videos?
- Contextual information associated to videos
- Video characterization

Video Selection (2/4)
- New or already published videos?
  - Advantages of new videos:
    - Fair, since available time is the same for everyone
    - Dedicated to specific problems, with gradations of difficulty
  - Advantages of old videos:
    - Easy because they are available and many people have already tested them
    - Enable comparison with algorithms outside the ETISEO project
  - Mixed solution
    - Using both old and new videos
    - Sharing videos with other ongoing evaluation programs

Video Selection (3/4)
- Contextual information associated to videos
  - 3D empty scene model
    - Minimum information: a few 3D distances drawn on the image
    - Maximum information: 3D scene model made available
  - Camera calibration
    - Set of 2D and 3D points
    - Calibration matrix, taking distortion into account or not
  - 3D models of objects of interest
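To illustrate what a calibration matrix provides, the sketch below projects a 3D scene point to 2D image coordinates with a 3x4 pinhole projection matrix, ignoring distortion. The matrix values and the name `project_point` are hypothetical and are not ETISEO calibration data.

```python
def project_point(P, point_3d):
    """Project a 3D point (x, y, z) to pixel coordinates with a 3x4 matrix P."""
    x, y, z = point_3d
    homogeneous = (x, y, z, 1.0)
    # Multiply each row of P by the homogeneous point.
    u, v, w = (sum(row[i] * homogeneous[i] for i in range(4)) for row in P)
    if w == 0:
        raise ValueError("point projects to infinity")
    return u / w, v / w

# Hypothetical calibration: focal length 500 px, principal point (320, 240),
# camera at the origin looking along +z, no lens distortion.
P = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(project_point(P, (1.0, 0.5, 10.0)))  # (370.0, 265.0)
```

With a set of 2D and 3D point pairs instead of a matrix, participants would have to estimate such a matrix themselves before they could use it this way.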

Video Selection (4/4)
- Video characterization
  - Partners should specify what they want and what they don’t among the 30 possibilities
  - From what partners want, we can select x (5?) problems (e.g., dynamic occlusion) to be studied and generate a series of y (10?) videos
  - From what partners do not want, we should provide tools to prevent interference from other simultaneous problems (e.g., wind during occlusion)

Data terminology (1/2)
- Video sequences, video clips and scenes
- Blobs, moving regions, physical objects of interest and contextual objects
- Ground truth, annotation and reference data
- Criteria and metrics

Data terminology (2/2)
- Definition of video analysis tasks
  1) Detection of physical objects of interest
  2) Classification of physical objects of interest
  3) Tracking of physical objects of interest
  4) Event recognition
- Delimitation of tasks to be evaluated
  - Different types of combination (only task 1 vs combined task 1&2)
  - Evaluation of each task whatever the combination

Ground truth and Metrics
- For each task, definition of ground truth and metrics
- Annotation tool (Viper, Reading tool, …)
- Format for ground truth (XML, MPEG-7, …)
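The slides leave the metrics open. As one common possibility for the detection task, the sketch below matches detected bounding boxes to ground-truth boxes by overlap and reports precision and recall. The overlap threshold of 0.5, the greedy matching and the function names are assumptions for illustration, not the metrics agreed for ETISEO.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, threshold=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    unmatched = list(ground_truth)
    true_positives = 0
    for det in detections:
        # Pick the still-unmatched ground-truth box with the largest overlap.
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical boxes for one frame: one good detection, one false alarm.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
det = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(precision_recall(det, gt))  # (0.5, 0.5)
```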

MPEG-7 Example
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<Mpeg7 xmlns="urn:mpeg7:schema:2001" xmlns:mpeg7="urn:mpeg7:schema:2001" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg7:schema:2001 Mpeg7-2001.xsd">
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="VideoType">
      <Video>
        <MediaTime>
          <MediaTimePoint>2004-01-27T12:55:00:25F1000</MediaTimePoint>
          <MediaDuration>PT0H1M5S725N1000F</MediaDuration>
        </MediaTime>
        <SpatioTemporalDecomposition>
          <MovingRegion id="O4">
            <SpatioTemporalLocator>
              <FigureTrajectory type="rectangle">
                <MediaTime>
                  <MediaTimePoint>2004-01-27T12:55:00:25F1000</MediaTimePoint>
                  <MediaDuration>PT0H1M5S725N1000F</MediaDuration>
                </MediaTime>
                <Vertex>
                  <KeyTimePoint>
                    <MediaTimePoint>2004-01-27T12:55:00:25F1000</MediaTimePoint>
                    <MediaTimePoint>2004-01-27T12:55:00:75F1000</MediaTimePoint>
                    <MediaTimePoint>2004-01-27T12:55:00:125F1000</MediaTimePoint>
                  </KeyTimePoint>
                  <InterpolationFunctions>
                    <KeyValue type="startPoint">41.000000</KeyValue>
                    <KeyValue type="firstOrder">51.000000</KeyValue>
                    <KeyValue param="0.100000" type="secondOrder">61.000000</KeyValue>
                  </InterpolationFunctions>
                  <InterpolationFunctions>
                    <KeyValue type="startPoint">42.000000</KeyValue>
                    <KeyValue type="firstOrder">52.000000</KeyValue>
                    <KeyValue param="0.200000" type="secondOrder">62.000000</KeyValue>
                  </InterpolationFunctions>
                </Vertex>
              </FigureTrajectory>
            </SpatioTemporalLocator>
          </MovingRegion>
        </SpatioTemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>
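A time point such as 2004-01-27T12:55:00:25F1000 can be converted to an ordinary timestamp before ground truth and results are compared. The sketch below assumes the trailing `nnnFNNN` part means nnn fractions of 1/NNN second, as in MPEG-7 media time point notation, and handles only the full date-time form used in the example. The function name is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Full form only: YYYY-MM-DDThh:mm:ss:nnnFNNN
TIME_POINT = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2}):(\d+)F(\d+)$"
)

def parse_media_time_point(text):
    """Convert a full-form media time point string to a datetime."""
    match = TIME_POINT.match(text)
    if match is None:
        raise ValueError(f"unsupported time point: {text!r}")
    year, month, day, hour, minute, second, count, per_second = map(
        int, match.groups()
    )
    base = datetime(year, month, day, hour, minute, second)
    # count fractions of 1/per_second second, rounded to microseconds.
    return base + timedelta(microseconds=round(count * 1_000_000 / per_second))

print(parse_media_time_point("2004-01-27T12:55:00:25F1000"))
```

Under this reading, the three key time points of the example are 25 ms, 75 ms and 125 ms after 12:55:00.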

XML Example
<br_frame type="normal" id="5539" time_year="2001" time_month="11" time_day="17" time_hour="0" time_min="18" time_sec="27" time_ms="800" camera_sector="YZER" camera_area="HALL01" camera_id="C05">
  <list_actors>
    <annotation_actor id="0" class="group">
      <time start_time_hour="0" start_time_min="18" start_time_sec="6" start_time_ms="0" />
      <info3d width="61" height="161" x="137" y="-368" z="0" />
    </annotation_actor>
  </list_actors>
  <list_activities>
    <annotation_activity id="4" priority="20" class="event" sub_class="stopped">
      <list_video_frames best_camera_area="HALL01" best_camera_id="C05">
        <video_frame id="5539" camera_area="HALL01" camera_id="C05" />
      </list_video_frames>
      <time start_time_hour="0" start_time_min="18" start_time_sec="14" start_time_ms="600" />
      <list_activity_participators>
        <participator id="0" role="source" />
      </list_activity_participators>
    </annotation_activity>
    <annotation_activity id="6" priority="4" class="alarm" sub_class="fighting">
      <list_video_frames best_camera_area="HALL01" best_camera_id="C05">
        <video_frame id="5539" camera_area="HALL01" camera_id="C05" />
      </list_video_frames>
      <time start_time_hour="0" start_time_min="18" start_time_sec="18" start_time_ms="800" />
      <list_activity_participators>
        <participator id="0" role="source" />
      </list_activity_participators>
    </annotation_activity>
  </list_activities>
</br_frame>
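An annotation frame in this XML format can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of the example to list the actors and the activities they take part in. The element name `info3d` assumes that the space in the extracted text is an artifact, and `read_frame` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the frame above (time and video-frame elements omitted).
SAMPLE = """<br_frame type="normal" id="5539" camera_area="HALL01" camera_id="C05">
  <list_actors>
    <annotation_actor id="0" class="group">
      <info3d width="61" height="161" x="137" y="-368" z="0" />
    </annotation_actor>
  </list_actors>
  <list_activities>
    <annotation_activity id="4" priority="20" class="event" sub_class="stopped">
      <list_activity_participators>
        <participator id="0" role="source" />
      </list_activity_participators>
    </annotation_activity>
    <annotation_activity id="6" priority="4" class="alarm" sub_class="fighting">
      <list_activity_participators>
        <participator id="0" role="source" />
      </list_activity_participators>
    </annotation_activity>
  </list_activities>
</br_frame>"""

def read_frame(xml_text):
    """Return (frame id, {actor id: class}, [(class, sub_class, participant ids)])."""
    frame = ET.fromstring(xml_text)
    actors = {a.get("id"): a.get("class") for a in frame.iter("annotation_actor")}
    activities = [
        (act.get("class"), act.get("sub_class"),
         [p.get("id") for p in act.iter("participator")])
        for act in frame.iter("annotation_activity")
    ]
    return frame.get("id"), actors, activities

frame_id, actors, activities = read_frame(SAMPLE)
print(frame_id, actors, activities)
```

All attribute values come back as strings, so numeric fields such as `priority` or the `info3d` coordinates need explicit conversion before use.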

Ground Truth Definition: example