MPEG-7 Video Retrieval using Bayesian Networks
Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Guadix
Departamento de Ciencias de la Computación e Inteligencia Artificial, E.T.S.I. Informática, Universidad de Granada

Introduction
A brief overview of our work on designing a search engine, based on Bayesian networks, that retrieves MPEG-7 videos through their text annotations.

Overview
§ Preliminaries: introduction to
  • Information Retrieval
  • The MPEG-7 standard
  • Bayesian networks
§ MPEG-7 video retrieval models based on Bayesian networks.

Preliminaries (I) – Information Retrieval
Information Retrieval is concerned with the representation, storage, organisation and accessing of information items.
Indexing + Querying + Retrieval

Preliminaries (II) – MPEG-7
Multimedia Content Description Interface: a standard for describing multimedia content with metadata:
• The content of a multimedia file: concepts, moving objects, who is speaking, ...
• Aspects related to the management of the content, i.e., duration, structure, format and size of the file, number of frames per shot, ...
Tools:
• Descriptors, Schemes, Data Definition Language.

Preliminaries (III) – MPEG-7
• Descriptors: elements and their data representation (a time to represent a duration, a histogram to represent a colour, or a string to represent a title).
• Schemes: the structure and semantics of the relationships among elements (a film divided into scenes and shots, with a textual description at the scene level and descriptions of colour, movement and audio amplitude at the shot level).
• DDL (Data Definition Language): the language used to extend or modify the previous set of tools. It is a variant of XML Schema; therefore, description files are XML files.

Preliminaries (IV) – MPEG-7
<?xml version="1.0" encoding="UTF-8"?>
<Mpeg7>
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="VideoType">
      <Video id="1">
        <MediaTime>
          <MediaTimePoint>T00:00:0F30000</MediaTimePoint>
          <MediaDuration>PT16M33S11772N30000F</MediaDuration>
        </MediaTime>
        <TemporalDecomposition gap="false" overlap="false">
          <VideoSegment id="shot1_1">
            <MediaTime>
              <MediaTimePoint>T00:00:0F30000</MediaTimePoint>
              <MediaDuration>PT3S22112N30000F</MediaDuration>
            </MediaTime>
            <TextAnnotation confidence="0.500000">
              <FreeTextAnnotation>
                A tv presenter is reporting information about a meeting of the security council in UN.
              </FreeTextAnnotation>
            </TextAnnotation>
          </VideoSegment>
          <VideoSegment id="shot1_2">
            <MediaTime>
              <MediaTimePoint>T00:03:22112F30000</MediaTimePoint>
              <MediaDuration>PT9S18288N30000F</MediaDuration>
            </MediaTime>
            <TextAnnotation confidence="0.500000">
              <FreeTextAnnotation>
                Collin Powell is speaking about the USA position in the Iraq crisis.
              </FreeTextAnnotation>
            </TextAnnotation>
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>
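A description like the one on this slide could be read with Python's standard xml.etree.ElementTree module to collect the free-text annotation of each shot. This is only a minimal sketch: the embedded sample is a simplified copy of the slide's description (timing elements, the xsi:type attributes and any namespace declarations are left out so that it parses as is), and the names SAMPLE_DESCRIPTION and segment_annotations are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified sample modelled on the slide. Assumption: no XML namespaces
# and no xsi:type attributes, so no prefix declarations are needed.
SAMPLE_DESCRIPTION = """<Mpeg7>
  <Description>
    <MultimediaContent>
      <Video id="1">
        <TemporalDecomposition gap="false" overlap="false">
          <VideoSegment id="shot1_1">
            <TextAnnotation confidence="0.500000">
              <FreeTextAnnotation>
                A tv presenter is reporting information about a meeting
                of the security council in UN.
              </FreeTextAnnotation>
            </TextAnnotation>
          </VideoSegment>
          <VideoSegment id="shot1_2">
            <TextAnnotation confidence="0.500000">
              <FreeTextAnnotation>
                Collin Powell is speaking about the USA position
                in the Iraq crisis.
              </FreeTextAnnotation>
            </TextAnnotation>
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>"""


def segment_annotations(xml_text):
    """Return {segment id: free-text annotation} for every VideoSegment."""
    root = ET.fromstring(xml_text)
    result = {}
    for segment in root.iter("VideoSegment"):
        note = segment.find("TextAnnotation/FreeTextAnnotation")
        if note is not None and note.text:
            # Collapse the whitespace introduced by pretty-printing.
            result[segment.get("id")] = " ".join(note.text.split())
    return result


annotations = segment_annotations(SAMPLE_DESCRIPTION)
for segment_id, text in annotations.items():
    print(segment_id, "->", text)
```

The resulting dictionary (shot identifier to annotation text) is the kind of input an indexing step would need; a real MPEG-7 file would also require namespace handling.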

Preliminaries (V) – MPEG-7
From the point of view of IR, the structure of a video is seen conceptually as a hierarchy:
[Figure: tree with the Video at the root, Scenes (Scene 1 to Scene 3) below it, Shots below each scene, and Frames at the lowest level.]

Preliminaries (VI) – Bayesian networks
§ Graphical models able to represent and efficiently manipulate n-dimensional probability distributions.
§ The knowledge obtained from a problem is encoded in a Bayesian (belief) network by means of its qualitative and quantitative components:

Preliminaries (VII) – Bayesian networks
• Qualitative part: a Directed Acyclic Graph G = (V, E):
  1. V (nodes): random variables, and
  2. E (arcs): (in)dependence relationships.
• Quantitative part: a set of conditional distributions:
  1. drawn from the graph structure,
  2. representing the strength of the relationships,
  3. stored in each node.
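The two components can be illustrated with a very small network in Python. The sketch below is not taken from the slides: the three binary nodes (two terms T1 and T2 pointing to a shot Sh) and every probability value are made-up illustrations. The qualitative part is the PARENTS mapping, the quantitative part is the CPT mapping, and probabilities are computed by brute-force enumeration of the joint distribution.

```python
from itertools import product

# Qualitative part: a DAG given as node -> tuple of parents (hypothetical).
PARENTS = {"T1": (), "T2": (), "Sh": ("T1", "T2")}

# Quantitative part: one conditional distribution per node, stored as
# {parent values: P(node = 1 | parent values)}. All numbers are made up.
CPT = {
    "T1": {(): 0.5},
    "T2": {(): 0.5},
    "Sh": {(0, 0): 0.0, (1, 0): 0.6, (0, 1): 0.4, (1, 1): 1.0},
}


def joint(assignment):
    """Joint probability of a full assignment: product of the local terms."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] == 1 else 1.0 - p_true
    return prob


def probability(target, evidence=None):
    """P(target = 1 | evidence), enumerating every assignment."""
    evidence = evidence or {}
    nodes = list(PARENTS)
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(assignment)
        denominator += p
        if assignment[target] == 1:
            numerator += p
    return numerator / denominator


print(probability("Sh"))             # prior belief in the shot
print(probability("Sh", {"T1": 1}))  # belief once term T1 is observed
```

With these illustrative numbers the belief in Sh rises from about 0.5 to about 0.8 when T1 is observed. Enumeration is exponential in the number of nodes, which is why it only serves as an illustration.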

MPEG-7 Video Retrieval Models based on Bayesian Networks (I)
Taking advantage of the structure of an MPEG-7 video (video, scenes, shots, frames) and of the free-text annotation tags in the .xml file…

MPEG-7 Video Retrieval Models based on Bayesian Networks (II)
[Figure: Bayesian network linking a layer of term nodes (T1, T2, ..., T12) to shot nodes (Sh1 to Sh8), scene nodes (S1 to S3) and the video node (V).]

MPEG-7 Video Retrieval Models based on Bayesian Networks (III)
Assessment of probability distributions:
• Prior probability in term nodes: p(ti) = 1/M.
• Probability distributions in the rest of the nodes: P(U | pa(U)).
Problem: a large number of parents.
Solution: probability functions.
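To see why a large number of parents is a problem, note that an explicit table for a binary node grows exponentially with the number of parents, whereas a probability function computes each value on demand. The slides do not show which function is used; the weighted-sum form below is only one possible choice, given here as an assumption, with hypothetical weights w(X, U).

```latex
% Size of an explicit conditional table for a binary node U with k parents:
\[
  \#\,\text{configurations of } pa(U) = 2^{k}, \qquad k = |Pa(U)| .
\]
% Assumed canonical form (an illustration, not taken from the slides):
% add the weights of the parents that are relevant in the configuration pa(U).
\[
  p(u^{+} \mid pa(U)) \;=\; \sum_{X \in Pa(U),\; x^{+} \in pa(U)} w(X, U),
  \qquad w(X, U) \ge 0, \quad \sum_{X \in Pa(U)} w(X, U) \le 1 .
\]
```

Under this assumption only k weights are stored per node instead of 2^k table entries.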

MPEG-7 Video Retrieval Models based on Bayesian Networks (IV)
1. Instantiate the query terms.
2. Run a propagation algorithm: compute p(u | Q) for every unit U.
3. Generate a ranking.
Problem:
• A large number of nodes in the graph.
• A complex topology.
Solution: evaluation of the probability functions in each layer.
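The three steps can be sketched in Python. The sketch rests on assumptions that the slides do not confirm: M is taken to be the number of terms, and each unit's probability is assumed to be a weighted sum of the probabilities of its parents, so that one pass per layer is enough. The toy terms, weights and unit names are hypothetical.

```python
def propagate(query, terms, layers):
    """Return {unit: p(unit | query)}, evaluated layer by layer.

    layers is ordered from shots up to videos; each layer maps a unit to
    {parent name: weight}. Assumption: weighted-sum probability functions.
    """
    prior = 1.0 / len(terms)  # assumption: M is the number of terms
    # Step 1: query terms are instantiated; other terms keep the prior.
    belief = {t: (1.0 if t in query else prior) for t in terms}
    units = {}
    # Step 2: each layer only needs the values of the layers before it.
    for layer in layers:
        for unit, weights in layer.items():
            value = sum(w * belief[parent] for parent, w in weights.items())
            belief[unit] = value
            units[unit] = value
    return units


def ranking(units):
    """Step 3: sort the units by decreasing probability."""
    return sorted(units.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy collection: four terms, two shots, one scene, one video.
TERMS = ["powell", "iraq", "council", "presenter"]
LAYERS = [
    {"Sh1": {"presenter": 0.5, "council": 0.5},
     "Sh2": {"powell": 0.5, "iraq": 0.5}},
    {"S1": {"Sh1": 0.5, "Sh2": 0.5}},
    {"V": {"S1": 1.0}},
]

RESULT = ranking(propagate({"powell"}, TERMS, LAYERS))
for unit, score in RESULT:
    print(unit, score)
```

For the query {"powell"}, the shot annotated with that term (Sh2) comes first and the unrelated shot (Sh1) last, with the scene and the video in between.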

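MPEG-7 Video Retrieval Models based on Bayesian Networks (V)
In shots: [equation missing from the extracted slide]
In scenes and videos: [equation missing from the extracted slide]
where vij and wij: [definitions missing from the extracted slide]
Exact propagation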

MPEG-7 Video Retrieval Models based on Bayesian Networks (VI)
Once a relevance probability has been assigned to each unit, which units are offered to the user? Those that provide an accurate context, wide enough to be a good response to the query.
How? By transforming the Bayesian network into an influence diagram.
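The decision step that an influence diagram adds can be illustrated with a much simpler stand-in: for each unit, compare the expected utility of retrieving it with that of skipping it. This is not the model of the slides, which is not specified in the extracted text; the utility values, the function names and the independent treatment of each unit are all assumptions made for illustration.

```python
# Hypothetical utilities, keyed by (decision, unit is relevant).
UTILITY = {
    ("retrieve", True): 1.0,
    ("retrieve", False): -1.0,
    ("skip", True): -0.5,
    ("skip", False): 0.0,
}


def expected_utility(decision, p_relevant):
    """Average the utility of a decision over the relevance of the unit."""
    return (p_relevant * UTILITY[(decision, True)]
            + (1.0 - p_relevant) * UTILITY[(decision, False)])


def best_decision(p_relevant):
    """Pick the decision with the highest expected utility."""
    return max(("retrieve", "skip"),
               key=lambda decision: expected_utility(decision, p_relevant))


def select_units(relevance):
    """Return (unit, expected utility) for the units worth retrieving."""
    chosen = [(unit, expected_utility("retrieve", p))
              for unit, p in relevance.items()
              if best_decision(p) == "retrieve"]
    return sorted(chosen, key=lambda item: item[1], reverse=True)


# Relevance probabilities as a propagation step might produce them (made up).
SELECTED = select_units({"Sh1": 0.25, "Sh2": 0.625, "S1": 0.4375})
print(SELECTED)
```

With these made-up utilities a unit is retrieved once its relevance probability exceeds 0.4. A full influence diagram would also let the utility of a unit depend on its containing scene or video, which is how the choice of context could be modelled.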

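MPEG-7 Video Retrieval Models based on Bayesian Networks (VII)
[Figure: influence diagram over shot nodes (Sh4, Sh5, Sh6), scene nodes (S3, S4) and a video node (V2), extended with nodes labelled D1 to D5 and U1, U3, U4, U5.]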

MPEG-7 Video Retrieval Models based on Bayesian Networks (VIII)
Integrated tool:
• Video capture from TV.
• Automatic annotations from subtitles.
• Manual annotations based on ontologies.
• Querying and obtaining the best units.
• Automatic generation of a video with the results.

MPEG-7 Video Retrieval Models based on Bayesian Networks (IX)
Lalmas and Graves' model:
[Figure: network with a video node (V), scene nodes (S1, S2) and shot nodes (Sh1 to Sh6), together with the description schemes MediaInformationDS, MediaProfileDS, CreationInformationDS, MediaQualityDS and MediaFormatDS, concept nodes (C2 to C11) and the terms "bbc", "dog", "Creation" and "and".]

