Towards Semantic Multimodal Video Annotation
Authors: Marco Grassi, Christian Morbidoni, m and Francesco Piazza
Presented by Wang Dongsheng, IIS Lab, Korea University, 2011.01

Contents
• Abstract
• Introduction
• Multimodal video annotation
  – Video annotation tools
  – Annotation schemas
• Semantic Web
  – Human Emotion Ontology
  – Faceted Browsing Paradigm
  – Semantic Video Annotation Tools
• Video annotation survey
  – Result discussion
• A Roadmap towards Semantic Multimodal Annotation
• Conclusion
• Feeling

Abstract
• Nowadays, Semantic Web techniques find applications in several research fields
• Prospects for multimodal video annotation
  – Enhance annotation management
  – Promote effective sharing of multimodal data and annotations
• Gain insight via a survey into
  – how the video annotation task is commonly performed
  – how the created annotations are managed
  – how these tasks could be improved using Semantic Web techniques
• Discuss the results of the survey
• Trace a roadmap towards the application of Semantic Web techniques

Introduction
• The underlying mechanisms of human communication and emotion form a multimodal process
  – Involving human language, speech, gesture, gaze, and facial expression
• Starting point in this kind of study
  – Collection of experimental data, labeled and analysed across a wide spectrum of modalities
  – Identification of the most relevant features and their correlations
• Intrinsic difficulty of multimodal video annotation: lack of a unique standardization for
  – the descriptors used in the annotations
  – a web architecture for sharing and managing them
• Gain insight via a survey into
  – how the video annotation task is commonly performed
  – how the created annotations are managed
  – how these tasks could be improved using Semantic Web techniques
• Discuss the results of the survey
• Trace a roadmap towards the application of Semantic Web techniques

Multimodal video annotation
• Characteristics
  – Time-consuming: more than one hour of labeling for just one minute of video
  – Complex
  – Expensive
  – Error prone
• A wide set of information to annotate
  – Speech, facial displays, gestures
  – Emotional state
  – Turn-taking management and sequencing in the conversation

Video annotation tools
• Several specialized software applications support the annotation task
  – Anvil
  – ELAN: linguistic annotation tool for the creation of text annotations for audio and video files
  – OntoELAN: inherits from ELAN and extends the tool with an ontology-based annotation approach
  – The EXMARaLDA system: consists of a data model, a set of XML formats, and a number of tools for the creation, management and analysis of spoken language corpora
  – TASX Annotator: XML-based annotation of multimodal data on multiple tiers
  – MacVisSTA: a software program to code different aspects of behavior (speech, gaze, gesture, etc.)
• Most of them
  – Support hierarchical multi-layered annotations and visualization of the waveform and pitch contour
  – Allow complex searches within a single generated annotation file, but only simple text searches across multiple files
• Disadvantages
  – Desktop applications: search queries can be performed only locally on the user's PC
  – No handling of different user permissions: everyone can read, modify, or create new annotations
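
To make "multiple tiers" and "simple text searches across multiple files" concrete, here is a minimal sketch of a hypothetical tier-based data model. It is loosely in the spirit of tools like ELAN or TASX but is not any real tool's file format; the class names, tier names, and file names are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start_ms: int  # start time in the video, in milliseconds
    end_ms: int    # end time in the video, in milliseconds
    value: str     # free-text label


@dataclass
class AnnotationFile:
    video: str
    # Hypothetical layout: tier name -> time-aligned annotations on that tier
    tiers: dict[str, list[Annotation]] = field(default_factory=dict)


def text_search(files, needle):
    """Plain substring search across many files: no semantics, no concept mapping."""
    hits = []
    for f in files:
        for tier, annotations in f.tiers.items():
            for a in annotations:
                if needle.lower() in a.value.lower():
                    hits.append((f.video, tier, a.start_ms, a.end_ms, a.value))
    return hits


files = [
    AnnotationFile("session1.mp4", {
        "speech": [Annotation(0, 1200, "hello there")],
        "gesture": [Annotation(300, 900, "wave right hand")],
    }),
    AnnotationFile("session2.mp4", {
        "gesture": [Annotation(100, 700, "Wave left hand")],
        "emotion": [Annotation(0, 2000, "joy")],
    }),
]

print(text_search(files, "wave"))
```

A search for "happiness" returns nothing here even though "joy" is semantically close. That gap is the limitation of plain text search that the later slides address with ontologies.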

Annotation schemas
• Different schemas have been developed to supply standardized vocabularies for annotation
• Features
  – Terminology encoded in XML for machine processability
  – Used by the video annotation tools
• Examples
  – MUMIN multimodal coding scheme: gestures and facial displays in interpersonal communication
  – EARL: encoded in XML Schema; offers a powerful structure for describing emotions by dimensions, intensity, etc.
  – W3C: published a working draft specifying the Emotion Markup Language (EmotionML)
  – BML (Behavior Markup Language)
• Disadvantages of XML Schema
  – XML does not supply enough expressiveness
  – It does not allow mapping between similar concepts expressed in different annotation schemas

Semantic Web
• The Semantic Web is an initiative that aims to improve the current state of the WWW by making the content of web pages accessible to machines
• Semantic Web techniques are suitable for applications that require
  – advanced data integration
    » linking data from multiple sources
  – powerful data modeling
    » representing expressive semantic descriptions
    » providing inferencing power
• The case of multimodal video annotation
  – The task generates a huge quantity of metadata covering several facets of human communication: language, speech, gesture, facial expression, etc.
  – It requires complex models and annotation schemas, as well as standardization

The Semantic Web
• Consists of
  – URIs (Uniform Resource Identifiers), which unambiguously identify entities
  – RDF (Resource Description Framework), which expresses information in an unambiguously interpretable format
    » RDF triples provide a machine-accessible and machine-processable representation
    » They can be encoded using an XML syntax
  – Ontologies, which define the semantics of the encoded data
    » A formal, explicit description of the concepts and properties of a represented domain
    » They make it possible to share a common understanding of the structure of information
    » Expressed in a machine-accessible language such as OWL
• The triples are then stored in a triplestore, a purpose-built database for the storage and retrieval of RDF triples
• The information becomes available as an interconnected knowledge base
  – Advanced search queries can be performed on such a knowledge base
  – Results can be displayed and explored using innovative data visualization paradigms, like faceted browsing
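
The triple model above can be illustrated with a toy, in-memory store that matches (subject, predicate, object) patterns, where None plays the role of a query variable. This is only a sketch of the idea: a real triplestore would be queried with SPARQL (or accessed through a library such as rdflib), and the http://example.org/ URIs and property names below are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """A toy in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        # None acts as a wildcard, like a variable in a basic triple pattern
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


EX = "http://example.org/"  # hypothetical namespace
store = TripleStore()
store.add(EX + "ann1", EX + "annotates", EX + "video1")
store.add(EX + "ann1", EX + "hasEmotion", EX + "Joy")
store.add(EX + "ann2", EX + "annotates", EX + "video1")
store.add(EX + "ann2", EX + "hasGesture", EX + "Wave")

# Join two patterns: annotations of video1 that carry the emotion Joy
video1_anns = {s for s, _, _ in store.match(p=EX + "annotates", o=EX + "video1")}
joyful = [s for s, _, _ in store.match(p=EX + "hasEmotion", o=EX + "Joy")
          if s in video1_anns]
print(joyful)
```

Because every fact has the same three-part shape, data from different sources can be merged into one store and queried together, which is the data-integration benefit named on the previous slide.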

Human Emotion Ontology
• HEO (the Human Emotion Ontology) provides a set of high-level features that can be
  – refined using lower-level concepts and properties
  – linked to other, more specific ontologies
• Currently used to encode affective information extracted from web pages, using Sentic Computing
  » a novel AI tool for sentiment analysis
• HEO provides a wide set of properties to describe emotions in many different ways: by category, dimension, action tendency, appraisal, and more
• Used to create a description framework offering
  – enough flexibility: allows the use of a wide and extensible set of descriptors
  – interoperability: maps different description models
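
The interoperability point, mapping different description models onto shared concepts, can be sketched as follows. The two schemas, their labels, and the concept URIs are hypothetical placeholders, not the real HEO namespace or any published schema; the point is only that once labels resolve to a common concept, one query covers annotations made with either schema.

```python
# Placeholder concept URIs: NOT the real HEO namespace
HEO = "http://example.org/heo#"

# Hypothetical label sets from two different annotation schemas,
# each mapped onto the same shared ontology concepts
SCHEMA_MAPPINGS = {
    "schemaA": {"joy": HEO + "Happiness", "anger": HEO + "Anger"},
    "schemaB": {"happy": HEO + "Happiness", "angry": HEO + "Anger"},
}


def to_concept(schema: str, label: str) -> str:
    """Resolve a schema-specific label to a shared ontology concept."""
    return SCHEMA_MAPPINGS[schema][label.lower()]


# (video, schema used by the annotator, label in that schema)
annotations = [
    ("video1", "schemaA", "joy"),
    ("video2", "schemaB", "Happy"),
    ("video3", "schemaB", "angry"),
]


def videos_with(concept: str) -> list[str]:
    """Find videos annotated with a concept, regardless of the source schema."""
    return [v for v, schema, label in annotations
            if to_concept(schema, label) == concept]


print(videos_with(HEO + "Happiness"))
```

In an actual ontology this mapping would be expressed in OWL (for example with equivalence or subclass axioms) and resolved by a reasoner, rather than by a hand-written dictionary.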

HEO

Faceted browsing paradigm
• Faceted classification
  – Allows the assignment of multiple categories to an object
  – Enables the classifications to be ordered in multiple ways
  – Makes it possible to perform searches combining
    » a textual approach
    » a navigational approach
• Faceted search
  – Navigates a multidimensional information space by
    » concurrently writing queries in a text box
    » progressively narrowing choices in each dimension
• Available tools
  – SIMILE Exhibit API: a set of JavaScript files that makes it easy to create rich interactive web pages, including maps, timelines, and galleries, with very detailed client-side filtering
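
The combination of a textual and a navigational approach can be sketched in a few lines: filter by a free-text query, then by the facet values the user has selected, and recompute the per-value counts that a faceted menu would display. The items and facet names are made up for illustration, and this Python sketch only mirrors the idea; it is not how SIMILE Exhibit (which runs as JavaScript in the browser) is implemented.

```python
from collections import Counter

# Hypothetical annotation records, each classified along several facets
items = [
    {"id": "a1", "text": "greeting at start", "modality": "gesture", "emotion": "joy"},
    {"id": "a2", "text": "raised voice", "modality": "speech", "emotion": "anger"},
    {"id": "a3", "text": "smile while greeting", "modality": "face", "emotion": "joy"},
    {"id": "a4", "text": "pointing", "modality": "gesture", "emotion": "neutral"},
]
FACETS = ("modality", "emotion")


def faceted_search(items, query="", constraints=None):
    constraints = constraints or {}
    # Textual approach: keep items whose text contains the query
    hits = [it for it in items if query.lower() in it["text"].lower()]
    # Navigational approach: every selected facet value must match
    hits = [it for it in hits
            if all(it[f] == v for f, v in constraints.items())]
    # Per-facet value counts for the remaining items, used to guide narrowing
    counts = {f: Counter(it[f] for it in hits) for f in FACETS}
    return hits, counts


hits, counts = faceted_search(items, query="greeting")
print([h["id"] for h in hits], dict(counts["modality"]))

# Narrow further by selecting a value in the "modality" facet
hits, counts = faceted_search(items, query="greeting", constraints={"modality": "face"})
print([h["id"] for h in hits])
```

Each added constraint shrinks the result set and updates the counts, which is what "progressively narrowing choices in each dimension" means in practice.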

Semantic video annotation tools
• There are several semantic video annotation tools
  – Although not designed for multimodal annotation, they provide interesting features for annotation
• Tools
  – M-OntoMat-Annotizer (M = multimedia)
    » Supplies a graphical interface for loading and processing visual content (images and videos)
    » Extracts visual features and associates them with domain ontology concepts
  – VIA (Video Image Annotation Tool)
    » Allows users to import its descriptors from predefined OWL ontologies
  – SVAT (Semantic Video Annotation Tool)
    » Enables film analysts
    » Supports a plug-in
  – Project Pad
    » A web-based system for collaborative media annotation

Video annotation survey
• The survey was divided into 5 sections
  – First: the multimodal video annotation tools used
    » Aimed to inquire about the users' level of satisfaction and which features should be modified or added
  – Second: the schemas used for annotation creation, and annotation sharing
  – Third and fourth: aimed to inquire about the level of knowledge of
    » the Semantic Web
    » the existing semantic video annotation tools mentioned above
    » in order to evaluate the applicability of Semantic Web techniques
  – Final: presented a simple demonstrative web site based on the faceted browsing paradigm

The survey shows remarkable satisfaction of the participants with the software they use
• Tools: the result of a long and continuous development process
• Difficulties: dealing with many annotation layers; intrinsic complexity
• Dissatisfaction: search functionalities; missing user permissions

Many of them use more than one schema

Data transfer
• Mass storage and email
• Video sharing services (e.g. YouTube) are ignored

A roadmap towards semantic multimodal annotation
• Survey results
  – Multimodal video annotation does not require big improvements regarding the creation of annotations
  – Efforts should rather focus on the encoding and management of the created annotations, aiming to
    » enhance the effectiveness of search queries
    » promote larger annotation sharing
• The application of Semantic Web techniques is particularly suitable for this purpose
• Semantic multimodal video annotation
  – Implement an efficient web-based architecture for annotation management and sharing
  – Aim: evolve the current data encoding, based on XML documents structured according to XML Schema, towards semantic data encoding based on RDF instances
    » In addition, this allows expressing the semantics of the annotations in an unambiguous, machine-processable format
  – Raise the existing annotation schemas to the level of ontologies
  – Create a mapping between the different existing multimodal schemas

A roadmap towards semantic multimodal annotation
• It is not necessary to develop brand-new desktop applications
  – It suffices to provide tools that convert the annotations and annotation schemas into a semantic-aware format
• Several techniques and tools have been developed using standard XML technology, such as XSLT (Extensible Stylesheet Language Transformations)
  – An XML-based language for transforming XML documents into other XML documents
  – The original document is not changed; a new document is created based on the existing one
  – This grants full compatibility with the existing software
• Such a technique can be implemented as
  – a standalone application
  – a plug-in for existing open-source software
• For users
  – They keep using their usual annotation tools and annotation schemas
  – Just one simple additional step makes the annotations available through the web
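
The conversion step can be sketched as a small read-only transformation from an annotation XML file to RDF in the N-Triples syntax. The slide names XSLT for this; Python's standard library has no XSLT processor (lxml provides one), so this sketch swaps in xml.etree.ElementTree to show the same idea. The input layout, the example.org namespaces, and the property names are hypothetical, not the format of any real annotation tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified annotation file (not a real tool's format)
SOURCE = """<annotations video="session1.mp4">
  <tier name="gesture">
    <annotation id="g1" start="300" end="900">wave</annotation>
  </tier>
  <tier name="emotion">
    <annotation id="e1" start="0" end="2000">joy</annotation>
  </tier>
</annotations>"""

BASE = "http://example.org/ann/"     # hypothetical namespace for resources
VOCAB = "http://example.org/vocab#"  # hypothetical property vocabulary
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def string_literal(value: str) -> str:
    """N-Triples string literal with the basic escapes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def int_literal(value: str) -> str:
    """N-Triples literal typed as xsd:integer."""
    return f'"{int(value)}"^^<{XSD_INT}>'


def to_ntriples(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)  # the source XML is only read, never modified
    video = f"<{BASE}{root.get('video')}>"
    lines = []
    for tier in root.iter("tier"):
        for ann in tier.iter("annotation"):
            subj = f"<{BASE}{ann.get('id')}>"
            lines += [
                f"{subj} <{VOCAB}annotates> {video} .",
                f"{subj} <{VOCAB}tier> {string_literal(tier.get('name'))} .",
                f"{subj} <{VOCAB}start> {int_literal(ann.get('start'))} .",
                f"{subj} <{VOCAB}end> {int_literal(ann.get('end'))} .",
                f"{subj} <{VOCAB}value> {string_literal(ann.text.strip())} .",
            ]
    return lines


print("\n".join(to_ntriples(SOURCE)))
```

As with XSLT, the original file is untouched and a new document is produced, so existing desktop tools keep working on their own format while the generated triples can be loaded into a triplestore.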

A roadmap towards semantic multimodal annotation
• Once the annotations are encoded in a semantic-aware format and published on the web, a web application can be developed
• The web application
  – Displays all the created annotations, providing different views to organize them, e.g. timelines or tabular representations
  – Offers faceted menus
    » Automatically created based on the properties of the ontologies
    » Used to intuitively filter the annotations by adding or removing constraints on the faceted properties
  – Uses web interfaces based on the faceted browsing paradigm to
    » display and browse the videos and the annotations
    » perform advanced queries
  – Should provide the common functionalities of the desktop video annotation tools
• User and group management
  – A de facto standard feature of today's web applications, but completely missing in common desktop applications for multimodal video annotation
  – Effective handling of different user and group permissions for managing the published video annotations
    » Well supported by the web architecture
    » Can be easily implemented in the web application
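
The user and group permissions that desktop tools lack can be sketched as a simple access check that a web application might run on each request. The permission model here (owner has full rights, other users get the actions granted to any group they belong to) and all names are hypothetical assumptions for illustration, not the design proposed by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)


@dataclass
class AnnotationSet:
    video: str
    owner: str
    # Hypothetical access-control list: group name -> allowed actions
    acl: dict[str, set[str]] = field(default_factory=dict)


def can(user: User, action: str, ann: AnnotationSet) -> bool:
    """Return True if the user may perform the action on the annotation set."""
    if user.name == ann.owner:
        return True  # in this sketch, owners may do everything
    return any(action in ann.acl.get(g, set()) for g in user.groups)


ann = AnnotationSet("session1.mp4", owner="alice",
                    acl={"lab": {"read", "write"}, "public": {"read"}})
bob = User("bob", {"lab"})
carol = User("carol", {"public"})
dave = User("dave")  # belongs to no group

print(can(bob, "write", ann), can(carol, "write", ann),
      can(carol, "read", ann), can(dave, "read", ann))
```

Keeping the check on the server side means every published annotation set is protected consistently, in contrast to the desktop setting where anyone with the file can read, modify, or create annotations.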

Conclusion
• Multimodal video annotation is a complex task involving several fields
  – Human communication
  – Emotions
  – Gesture research
• Introducing Semantic Web techniques can enhance the task
• On the basis of the results of their survey, the authors trace a roadmap towards the implementation of semantic multimodal video annotation, with efficient management and sharing
• Purpose
  – Push forward the productivity of the multimodal video annotation task
  – Enhance the accessibility of videos and annotations
  – Promote their sharing, making them available on the web as an interconnected knowledge base

Feeling
• Common users need some time to learn the GUIs of these tools and become familiar with them
• Desktop applications connecting with a web interface