Multimedia Segmentation and Summarization
Dr. Jia-Ching Wang, Honorary Fellow, ECE Department, UW-Madison
- Slides: 44

Outline
- Introduction
- Speaker Segmentation
- Video Summarization
- Conclusion

What is Multimedia?
- Image
- Video
- Speech
- Audio
- Text

Multimedia Everywhere
- Fax machines: transmission of binary images
- Digital cameras: still images
- iPod / iPhone & MP3
- Digital camcorders: video sequences with audio
- Digital television broadcasting
- Compact disk (CD), digital video disk (DVD)
- Personal video recorder (PVR, TiVo)
- Images on the World Wide Web
- Video streaming, video conferencing
- Video on cell phones, PDAs
- High-definition televisions (HDTV)
- Medical imaging: X-ray, MRI, ultrasound
- Military imaging: multi-spectral, satellite, microwave

What is Multimedia Content?
- Multimedia content: the syntactic and semantic information inherent in digital material.
- Example: text document
  - Syntactic content: chapter, paragraph
  - Semantic content: key words, subject, type of text document, etc.
- Example: video document
  - Syntactic content: scene cuts, shots
  - Semantic content: motion, summary, index, caption, etc.

Why Do We Need to Know Multimedia Content?
- Information processing (archiving, indexing, delivery, access, and other processing) requires in-depth knowledge of the content to optimize performance.

How to Know Multimedia Content?
- Multimedia content analysis
  - The computerized understanding of the semantic and syntactic content of a multimedia document
- Multimedia content analysis usually involves
  - Segmentation: segmenting the multimedia document into units
  - Classification: classifying each unit into a predefined type
  - Annotation: annotating the multimedia document
  - Summarization: summarizing the multimedia document

Multimedia Segmentation and Summarization
- Multimedia segmentation
  - Syntactic content
- Multimedia summarization
  - Semantic/syntactic content
- The result of temporal segmentation can benefit video summarization

Multimedia Segmentation
- Image segmentation
- Video segmentation
  - Scene change, shot change
- Audio segmentation
  - Audio class change
- Speech segmentation
  - Speaker change detection
- Text segmentation
  - Word segmentation, sentence segmentation, topic change detection

Multimedia Summarization
- Image summarization
  - Region of interest
- Video summarization
  - Storyboard, highlight
- Audio summarization
  - Main theme in music, chorus in a song, event sounds in an environmental sound stream
- Speech summarization
  - Speech abstract
- Text summarization
  - Abstract

What is Speaker Segmentation?
- It is also called speaker change detection (SCD)
- Assumption: no two speaker streams overlap
[Figure: audio timeline with segments labeled speaker 1, speaker 2, speaker 3]

Supervised vs. Unsupervised SCD
- Supervised manner: the acoustic data are made up of distinct speakers who are known a priori
  - Recognition-based solution
- Unsupervised manner: no prior knowledge about the number and identities of the speakers
  - Metric-based criterion
  - Model selection-based criterion

Supervised Speaker Segmentation -- Gaussian Mixture Model
- Gaussian mixture modeling (GMM)
  $$p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x; \mu_i, \Sigma_i)$$
  - $x$ is a $d$-dimensional random vector.
  - $w_i$, $i = 1, \ldots, M$, is the mixture weight.
  - $\mu_i$ is the mean vector.
  - $\Sigma_i$ is the covariance matrix.
- The incoming audio stream is classified into one of $D$ classes in a maximum likelihood manner at time $t$:
  $$\hat{c}_t = \arg\max_{1 \le c \le D} \; p(x_t \mid \lambda_c)$$
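
The maximum-likelihood classification step can be illustrated with a small sketch. It assumes diagonal covariance matrices and hand-set model parameters; the names `log_gaussian_diag`, `gmm_log_likelihood`, `classify`, and the two toy speaker models are hypothetical and do not come from the talk.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumption)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """log p(x | lambda) for a GMM given as a list of (weight, mean, var)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(x, models):
    """Return the class whose GMM assigns x the highest likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(x, models[name]))

# Hypothetical, hand-set models: D = 2 classes, M = 2 components each.
models = {
    "speaker_1": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_2": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

print(classify([0.2, 0.1], models))
print(classify([5.5, 5.8], models))
```

In practice the component parameters would be trained (for example with EM) on labeled data for each known speaker; here they are fixed by hand only to keep the example short.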

Supervised Speaker Segmentation -- Hidden Markov Model

Unsupervised Speaker Segmentation -- Sliding Window Strategy & Detection Criterion
- Metric-based criterion (the dissimilarities between the acoustic feature vectors are measured)
  - Kullback-Leibler distance
  - Mahalanobis distance
  - Bhattacharyya distance
- Model selection-based criterion
  - Bayesian information criterion (BIC)
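
A metric-based sliding-window detector can be sketched as follows. This is only an illustration under stated assumptions: each window is summarized by one diagonal-covariance Gaussian, the metric is the symmetric Kullback-Leibler distance, and the toy feature stream is made up. All function names are hypothetical.

```python
import math

def fit_diag_gaussian(frames):
    """Mean and per-dimension variance of a window of feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    # A small floor keeps every variance strictly positive.
    var = [max(sum((f[k] - mean[k]) ** 2 for f in frames) / n, 1e-6)
           for k in range(d)]
    return mean, var

def kl_diag(p, q):
    """KL(p || q) for two diagonal-covariance Gaussians (mean, var)."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        math.log(b / a) + (a + (m1 - m2) ** 2) / b - 1.0
        for m1, a, m2, b in zip(mp, vp, mq, vq)
    )

def symmetric_kl(p, q):
    return kl_diag(p, q) + kl_diag(q, p)

def distance_curve(frames, win):
    """Distance between the windows left and right of each point t."""
    return {
        t: symmetric_kl(fit_diag_gaussian(frames[t - win:t]),
                        fit_diag_gaussian(frames[t:t + win]))
        for t in range(win, len(frames) - win + 1)
    }

# Toy 1-D stream (made-up values): the statistics shift at frame 20.
frames = [[0.1 * (i % 3)] for i in range(20)] + \
         [[5.0 + 0.1 * (i % 3)] for i in range(20)]
curve = distance_curve(frames, win=10)
peak = max(curve, key=curve.get)
print(peak)
```

A real detector would compare the distance curve against a threshold to decide which peaks count as speaker changes, and choosing that threshold is the hard part of metric-based methods.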

Bayesian Information Criterion
- Model selection
  - Choose one among a set of candidate models $M_i$, $i = 1, 2, \ldots, m$, and the corresponding model parameters to represent a given data set $D = (D_1, D_2, \ldots, D_N)$.
- Model posterior probability
  $$P(M_i \mid D) = \frac{P(D \mid M_i)\, P(M_i)}{P(D)}$$
- Bayesian information criterion
  - Maximized log data likelihood for the given model, with a model complexity penalty
  - Bayesian information criterion of model $M_i$:
    $$\mathrm{BIC}(M_i) = \log P(D \mid \hat{\theta}_i, M_i) - \frac{1}{2}\, d_i \log N$$
    where $\hat{\theta}_i$ is the maximum-likelihood estimate of the model parameters and $d_i$ is the number of independent parameters in the model parameter set.
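
Unsupervised Segmentation Using Bayesian Information Criterion
- First model: the whole analysis window is described by a single model (no speaker change)
- Second model: the window is split at a hypothesized change point, and each part is described by its own model
- Bayesian information criterion: the two models are compared by their BIC values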
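
A minimal sketch of a BIC-based change test, assuming each segment is modeled by one Gaussian with diagonal covariance (a simplification chosen here so the example needs no matrix library). The quantity computed is the BIC of the two-segment model minus the BIC of the single-segment model, so a positive value favors a speaker change. The function names, the penalty weight `lam`, and the toy data are hypothetical.

```python
import math

def log_det_diag(frames):
    """log|Sigma| for a diagonal covariance estimated from the frames."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for k in range(d):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-6))  # floor avoids log(0)
    return total

def delta_bic(frames, i, lam=1.0):
    """BIC(two segments split before frame i) minus BIC(one segment)."""
    n, d = len(frames), len(frames[0])
    left, right = frames[:i], frames[i:]
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # The two-segment model has one extra Gaussian: d means + d variances.
    penalty = 0.5 * lam * (2 * d) * math.log(n)
    return gain - penalty

# Toy 1-D data (made up): `same` is homogeneous, `changed` shifts at frame 21.
pattern = [-1.0, 0.0, 1.0]
same = [[pattern[i % 3]] for i in range(42)]
changed = same[:21] + [[x[0] + 5.0] for x in same[21:]]

print(delta_bic(same, 21) > 0)
print(delta_bic(changed, 21) > 0)
```

With full covariance matrices the extra parameter count becomes d + d(d+1)/2 and the log-determinants are taken of the full matrices; the structure of the test stays the same.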

Disadvantages of Conventional Unsupervised Speaker Change Detection
- For metric-based methods, it is not easy to choose a suitable threshold
- For BIC, it is not easy to detect speaker segments shorter than 2 seconds

Proposed Method -- Misclassification Error Rate
- Sliding window pairs
- Feature vector distribution
[Figure: feature vector distributions, panels "Same speaker" and "Different speakers"]

Mathematical Analysis

Discussion
- Generative and discriminant classifiers are both applicable
- Key point: discriminant classifiers have the benefit of requiring less data
  - A smaller scanning window size can be used
  - The ability to detect short speaker change segments increases

Speaker Segmentation Using Misclassification Error Rate
- Steps
  - Preprocessing
    - Framing, feature extraction
  - Hypothesized speaker change point selection
  - Forcing 2-class labels
  - Training a discriminant hyperplane
  - Inside data recognition and calculation of the misclassification error rate
  - Accept/reject the hypothesized speaker change point
- Significance
  - The unsupervised speaker segmentation problem is solved by supervised classification
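
The steps above can be sketched in a few lines. This is not the talk's classifier: as a stand-in for the trained discriminant hyperplane it uses the hyperplane halfway between the two window means, and the acceptance threshold `max_error` is an assumed value. All names and the toy data are hypothetical.

```python
def mean_vector(frames):
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]

def misclassification_error_rate(left, right):
    """Force labels 0/1 on the two windows, fit a hyperplane, re-classify."""
    m0, m1 = mean_vector(left), mean_vector(right)
    # Stand-in hyperplane w.x + bias = 0 halfway between the window means.
    w = [hi - lo for lo, hi in zip(m0, m1)]
    bias = -sum(wk * (lo + hi) / 2.0 for wk, lo, hi in zip(w, m0, m1))

    def predict(x):
        return 1 if sum(wk * xk for wk, xk in zip(w, x)) + bias > 0 else 0

    # "Inside data recognition": classify the same frames used for training.
    errors = sum(predict(x) != 0 for x in left)
    errors += sum(predict(x) != 1 for x in right)
    return errors / (len(left) + len(right))

def is_change_point(frames, t, win, max_error=0.2):
    """Accept the hypothesized change point t if the windows separate well."""
    left, right = frames[t - win:t], frames[t:t + win]
    return misclassification_error_rate(left, right) <= max_error

# Toy 1-D features (made up): `same` has one speaker, `changed` switches at 9.
same = [[0.1 * (i % 3)] for i in range(18)]
changed = same[:9] + [[5.0 + x[0]] for x in same[9:]]

print(is_change_point(same, 9, 9))
print(is_change_point(changed, 9, 9))
```

When both windows come from the same speaker the forced labels are arbitrary, so the error rate stays near chance (0.5); when the speakers differ the two windows separate and the error rate drops toward zero.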

Experimental Results
Method     F-score   Precision   Recall
Proposed   71.8      70.2        81.3
BIC        63.3      54.4        75.7

Video Summarization
- Dynamic vs. static video summarization
  - Dynamic video summarization
    - Sports highlight, movie trailer
  - Static video summarization
    - Storyboard
      - Visual-based approach
      - Incorporation of semantic information

Static Video Summarization -- Visual-Based Approach
- Example
- Problem
  - Is the summarization ratio adjustable?
  - How can an effective storyboard be generated under a given summarization ratio?

How to Generate an Effective Storyboard
- Question: assume there are n frames and the summarization ratio is r/n. How do we select the best r frames?
- Complexity
  - There are C(n, r) different choices

How to Generate an Effective Storyboard
- From a visual viewpoint
  - The most visually distinct frames should be extracted
  - Dissimilarity between two frames is measured by low-level visual features
- How to select the best r frames from n frames
  - Solution: maximize the overall pairwise dissimilarities
  - Complexity: C(n, r) x C(r, 2)
  - Infeasible: C(n, r) is usually huge
- Fact
  - Human beings usually browse a storyboard sequentially
- Optimal solution in a sequential sense
  - Maximize the sum of dissimilarities between sequentially adjacent images in a storyboard

How to Maximize the Dissimilarity Sum of the Extracted Images
- Lattice-based representative frame extraction approach
  - Extract key components from the temporal sequence
  - Dynamic programming can be applied
- Example: how to select the best 4 images from an 8-image sequence

How to Maximize the Adjacent Dissimilarity Sum of the Extracted Images
- Original images: O(1), O(2), O(3), O(4), O(5), O(6), O(7), O(8)
- Extracted images: E(1), E(2), E(3), E(4)
- E(1) ← O(i); E(2) ← O(j); E(3) ← O(k); E(4) ← O(l); where i < j < k < l
- Each legal left-to-right path represents a way to extract images
- Each transition results in an adjacent dissimilarity
- In this example, the adjacent dissimilarity sum of the extracted images is D[ O(1), O(3) ] + D[ O(3), O(4) ] + D[ O(4), O(7) ]
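
The lattice search can be written as a short dynamic program. The sketch below is one possible formulation, not the talk's implementation: `select_frames`, the scalar toy "frames", and the absolute-difference dissimilarity are all assumptions made for illustration.

```python
def select_frames(frames, r, dissim):
    """Pick r of the n frames, in order, maximizing the sum of
    dissimilarities between consecutively selected frames."""
    n = len(frames)
    neg_inf = float("-inf")
    # best[k][j]: best sum when the k-th selected frame (0-based) is frames[j]
    best = [[neg_inf] * n for _ in range(r)]
    back = [[None] * n for _ in range(r)]
    for j in range(n - r + 1):
        best[0][j] = 0.0
    for k in range(1, r):
        # Frame j must leave room for k earlier and r-1-k later selections.
        for j in range(k, n - r + k + 1):
            for i in range(k - 1, j):
                score = best[k - 1][i] + dissim(frames[i], frames[j])
                if score > best[k][j]:
                    best[k][j] = score
                    back[k][j] = i
    # Trace the best left-to-right path back through the lattice.
    j = max(range(r - 1, n), key=lambda col: best[r - 1][col])
    total = best[r - 1][j]
    path = [j]
    for k in range(r - 1, 0, -1):
        j = back[k][j]
        path.append(j)
    path.reverse()
    return path, total

# Toy example: scalar "frames" and absolute difference as dissimilarity.
frames = [0, 0, 10, 10, 0, 0, 10, 10]
path, total = select_frames(frames, 4, lambda a, b: abs(a - b))
print(path, total)
```

For real video the `dissim` argument would compare low-level visual features (for example color histograms) of two frames; the dynamic program itself does not depend on that choice.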

How to Maximize the Adjacent Dissimilarity Sum of the Extracted Images

Complexity Comparison
- Select 4 images from an 8-image sequence
  - Lattice-based approach
    - 45 dissimilarity comparisons
  - Optimal approach
    - 420 dissimilarity comparisons
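
Both counts can be reproduced with a few lines, under the assumption that the lattice evaluates one dissimilarity per legal transition between consecutive extracted positions and that the exhaustive approach evaluates all C(r, 2) pairs for each of the C(n, r) selections. The function names are hypothetical.

```python
from math import comb

def lattice_comparisons(n, r):
    """Transitions in the lattice: one dissimilarity per legal (i, j) pair."""
    count = 0
    for k in range(1, r):                    # k-th extracted image, 0-based
        for j in range(k, n - r + k + 1):    # legal positions for image k
            count += j - (k - 1)             # legal predecessors i < j
    return count

def exhaustive_comparisons(n, r):
    """C(n, r) candidate selections, C(r, 2) pairwise dissimilarities each."""
    return comb(n, r) * comb(r, 2)

print(lattice_comparisons(8, 4))     # 45
print(exhaustive_comparisons(8, 4))  # 420
```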

Segment-Based Solution

Experimental Results

Incorporation of the Semantic Information
- Conventional
  - The static summarized images are extracted according to low-level visual features
- Disadvantage
  - It is difficult to grasp the main story without the support of semantically significant information
- We present a semantic-based static video summarization
  - Each extracted image has an annotation
  - Related images are connected by edges
  - 'Who', 'what', 'where', and 'when' are used to list all extracted images

The Proposed Architecture
- Shot annotation: mapping visual content to text
- Concept expansion: provides an alternative view and dependency information when measuring the relation of two annotations
- Relational graph construction

Concept Tree Construction
- The concept tree denotes the dependency structure of the expanded words
- Meronym
  - 'Wheel' is a meronym of 'automobile'
- Holonym
  - 'Tree' is a holonym of 'bark', of 'trunk', and of 'limb'
- Pencil: used for: Draw
- Salesperson: location of: Store
- Motorist: capable of: Drive
- Eat breakfast: effect of: Full stomach

Concept Tree Reorganization
- Who: names of people, subset of "person" in WordNet
- Where: "social group," "building," and "location" in WordNet
- What: all the other words that do not belong to "who" or "where"
- When: found by searching for time-period phrases

Relational Graph Construction -- Relation of Two Concept Trees
- The relation of the two concept trees
- The relation of the two roots
- The relation of the two children

Relational Graph Construction -- Remove Unimportant Vertices and Edges
- Remove edges with smaller weights, i.e., lower relation
- Remove vertices with smaller term frequency-inverse document frequency (TF-IDF)
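
The pruning step can be illustrated with a toy sketch. It makes several assumptions that are not stated in the talk: each vertex is identified by a single annotation term, TF-IDF is computed as (term count / annotation length) x log(N / document frequency) over the shot annotations, a term's score is its maximum over shots, and both thresholds are arbitrary. All names and data are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Highest TF-IDF score of each term over a list of tokenized annotations."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n / df[term])
            scores[term] = max(scores.get(term, 0.0), score)
    return scores

def prune(vertices, edges, scores, min_score, min_weight):
    """Drop low-scoring vertices, then weak edges and edges left dangling."""
    kept = {v for v in vertices if scores.get(v, 0.0) >= min_score}
    kept_edges = {
        (a, b): w for (a, b), w in edges.items()
        if w >= min_weight and a in kept and b in kept
    }
    return kept, kept_edges

# Made-up shot annotations and relation weights.
docs = [["man", "car"], ["man", "store"], ["man", "car", "road"]]
edges = {("man", "car"): 0.9, ("car", "road"): 0.8, ("car", "store"): 0.1}
scores = tf_idf(docs)
vertices, kept_edges = prune({"man", "car", "store", "road"}, edges,
                             scores, min_score=0.1, min_weight=0.5)
print(sorted(vertices), kept_edges)
```

A term that appears in every annotation ("man" here) gets a zero inverse document frequency, so its vertex and all of its edges are removed.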

The Final Relational Graph
- Comparison with a conventional storyboard

Conclusion
- A novel speaker segmentation criterion is proposed: misclassification error rate
  - The unsupervised speaker segmentation problem is solved by supervised classification with label forcing
  - A discriminant classifier allows the proposed approach to use a smaller scanning window size
  - The ability to detect short speaker change segments increases
- Two new static video summarization approaches are proposed
  - Lattice-based representative frame extraction
    - Uses only low-level visual features
    - The summarization ratio is adjustable
    - Under a given summarization ratio, the dissimilarity sum of sequentially adjacent images is maximized
  - Concept-organized representative frame extraction
    - Incorporates semantic information
    - Mines the four kinds of concept entities: who, what, where, and when
    - People can efficiently grasp the overall structure of the story and understand the main points of the content

Future Work
- Multimedia segmentation
  - Speech segmentation
  - Audio segmentation
  - Video segmentation
- Multimedia summarization
  - Video summarization
    - Static, dynamic
  - Speech summarization
  - Audio summarization

Thank all of you for your attendance!