Discovering Collocation Patterns: from Visual Words to Visual Phrases















Discovering Collocation Patterns: from Visual Words to Visual Phrases
Junsong Yuan, Ying Wu and Ming Yang
CVPR '07
Discovering Visual Collocation
An exciting idea: detour
• Related Work: J. Sivic et al. CVPR 04, B. C. Russell et al. CVPR 06, G. Wang et al. CVPR 06, T. Quack et al. CIVR 06, S. C. Zhu et al. IJCV 05, …
Confrontation
• Spatial characteristics of images
  – over-counting co-occurrence frequency
• Uncertainty in visual patterns
  – Continuous visual feature → quantized word
  – Visual synonym and polysemy
Our Approach
Selecting visual phrases
• Visual collocations may occur by chance
• Selecting phrases by a likelihood ratio test:
  – H0: occurrence of phrase P is randomly generated
  – H1: phrase P is generated by a hidden pattern
• Prior:
• Likelihood:
• Check whether words are co-located by chance or the co-location is statistically meaningful
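The prior and likelihood formulas did not survive on the slide, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes that under H0 the words of a phrase occur independently, so the expected frequency of the phrase is the product of the individual word frequencies, and it scores a phrase by the log ratio of observed to expected frequency. The function name `log_likelihood_ratio` and the toy word groups are hypothetical.

```python
import math


def log_likelihood_ratio(phrase, groups):
    """Log ratio of observed phrase frequency to the frequency expected
    if its words occurred independently (assumed stand-in for H0)."""
    n = len(groups)
    # Fraction of word groups that contain every word of the phrase.
    observed = sum(1 for g in groups if phrase <= g) / n
    # Expected fraction under independence: product of word frequencies.
    expected = 1.0
    for word in phrase:
        expected *= sum(1 for g in groups if word in g) / n
    if observed == 0.0 or expected == 0.0:
        return float("-inf")
    return math.log(observed / expected)


# Toy word groups (illustrative only).
groups = [set("ABFP"), set("CDES"), set("ABFT"), set("CDEX"), set("ABDK")]
score_ab = log_likelihood_ratio({"A", "B"}, groups)  # co-occur more than chance
score_ad = log_likelihood_ratio({"A", "D"}, groups)  # co-occur less than chance
```

A positive score suggests the words appear together more often than independence would predict; a negative score suggests the opposite.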
Discovery of visual phrases
• Word groups: A B F P; C D E S; A B F T; C D E X; A B D K; ……
• FIM → Closed Frequent Word-sets ( |P|>=2 ): AB, AE, AF, ABF, ABE, CD, CE, DE, BE, BF, CDE
• Pair-wise Student t-test → phrases ranked by likelihood ratio L(P): AB 15.7, AF 14.3, ABF 12.2, BF 10.9, CD 9.7
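The slide names a pair-wise Student t-test but does not give its form. As a rough illustration, the sketch below uses the t-statistic commonly applied to text collocations: the observed co-occurrence rate of two words is compared with the rate expected under independence, with a Bernoulli variance estimate. This is an assumed substitute, and the function name `collocation_t_score` is hypothetical.

```python
import math


def collocation_t_score(word_a, word_b, groups):
    """t-statistic for 'word_a and word_b co-occur more often than chance'."""
    n = len(groups)
    p_a = sum(1 for g in groups if word_a in g) / n
    p_b = sum(1 for g in groups if word_b in g) / n
    p_ab = sum(1 for g in groups if word_a in g and word_b in g) / n
    expected = p_a * p_b                 # mean under independence
    variance = p_ab * (1.0 - p_ab) / n   # Bernoulli variance of the sample mean
    diff = p_ab - expected
    if variance == 0.0:
        # Degenerate case: the pair co-occurs always or never.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(variance)


# Toy word groups (illustrative only).
groups = [set("ABFP"), set("CDES"), set("ABFT"), set("CDEX"), set("ABDK")]
t_ab = collocation_t_score("A", "B", groups)
```

With so few groups the statistic is not meaningful in practice; the example only shows how the quantities fit together.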
Frequent Itemset Mining (FIM)
• If an itemset is frequent, then all of its subsets must also be frequent
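The subset property above is what lets a level-wise miner prune candidates: a k-itemset is only worth counting if all of its (k-1)-subsets are already frequent. The sketch below is a small Apriori-style miner plus a filter for closed itemsets (those with no superset of equal support). It is a generic illustration, not the implementation used in the talk, and the names `frequent_itemsets` and `closed_only` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: count} for itemsets seen in >= min_support transactions."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets; keep a candidate only if every
        # (k-1)-subset is frequent (the pruning property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_support:
                current[cand] = count
        result.update(current)
        k += 1
    return result


def closed_only(itemsets):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        s: c
        for s, c in itemsets.items()
        if not any(s < other and c == oc for other, oc in itemsets.items())
    }


# Toy transactions (illustrative only).
transactions = [set("ABFP"), set("CDES"), set("ABFT"), set("CDEX"), set("ABDK")]
frequent = frequent_itemsets(transactions, min_support=2)
closed_phrases = {s for s in closed_only(frequent) if len(s) >= 2}
```

Restricting to closed itemsets of size two or more mirrors the "|P|>=2" condition and avoids listing subsets that carry no extra support information.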
Phrase Summarization
• Measuring the similarity between visual phrases by KL-divergence (Yan et al., SIGKDD 05)
• Clustering visual phrases by Normalized-cut
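To make the similarity step concrete, here is a minimal sketch of a symmetric KL-divergence between two discrete distributions. It assumes each visual phrase is described by a probability vector; what that vector ranges over is not stated on the slide, so the example vectors are invented. The clustering step (Normalized-cut) is not shown, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.
    eps guards against log(0) when q has zero entries."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def symmetric_kl(p, q):
    """Symmetrized divergence, usable as a pairwise dissimilarity."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Invented distributions for two phrases (illustrative only).
phrase_1 = [0.5, 0.5]
phrase_2 = [0.9, 0.1]
dissimilarity = symmetric_kl(phrase_1, phrase_2)
```

A pairwise dissimilarity matrix built this way could then be converted to affinities and passed to a graph-partitioning method such as Normalized-cut.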
Pattern Summarization Results
• Face database: summarizing top-10 phrases into 6 semantic phrase patterns
• Car database: summarizing top-10 phrases into 2 semantic phrase patterns
Partition of visual word lexicon
Metric learning method:
• Neighborhood components analysis (NCA), Goldberger et al., NIPS 05
  – improves the leave-one-out performance of the nearest neighbor classifier
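The quantity NCA improves can be written down directly: each point picks a neighbor at random with probability proportional to exp(-squared distance) in the projected space, and the objective is the expected number of points that pick a neighbor of their own class. The sketch below only evaluates that objective for a given linear map; it does not perform the gradient-based optimization, and how labels are obtained for visual words is not covered here. The function names and toy data are hypothetical.

```python
import math


def transform(A, x):
    """Apply the linear map A (list of rows) to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def nca_objective(A, points, labels):
    """Expected number of points correctly classified by a stochastic
    leave-one-out nearest neighbor rule under the linear map A."""
    projected = [transform(A, x) for x in points]
    total = 0.0
    for i, zi in enumerate(projected):
        weights = []
        for j, zj in enumerate(projected):
            if i == j:
                weights.append(0.0)  # a point never selects itself
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(zi, zj))
                weights.append(math.exp(-d2))
        norm = sum(weights)
        if norm == 0.0:
            continue  # all neighbors numerically unreachable
        same_class = sum(w for w, lj in zip(weights, labels) if lj == labels[i])
        total += same_class / norm
    return total


# Toy 1-D data with two well-separated classes (illustrative only).
points = [[0.0], [0.1], [5.0], [5.1]]
labels = [0, 0, 1, 1]
good = nca_objective([[1.0]], points, labels)       # keeps classes apart
collapsed = nca_objective([[0.0]], points, labels)  # maps all points together
```

A map that keeps the classes apart scores close to the number of points, while a map that collapses everything to one location scores much lower, which is the signal an optimizer would follow.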
Evaluation
• K-NN spatial group: K=5
• Two image-category databases: car (123 images) and face (435 images)
• Precision of visual phrase lexicon
  – the percentage of visual phrases Pi ∈ Ψ that are located in the foreground object
• Precision of background word lexicon
  – the percentage of background words Wi ∈ Ω− that are located in the background
• Percentage of images that are retrieved:
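Two pieces of the setup above can be sketched in a few lines, under stated assumptions. First, a K-NN spatial group is taken here to mean a visual word together with the words of its K nearest feature points in the same image, using Euclidean distance on point coordinates. Second, lexicon precision is computed as the fraction of items flagged as correctly located (foreground for phrases, background for background words). Both function names and the toy inputs are hypothetical, and the retrieval percentage is not covered because its formula is missing from the slide.

```python
def knn_spatial_groups(points, words, k=5):
    """For each feature point, return the set made of its own visual word
    and the words of its k nearest neighbors in the image plane."""
    groups = []
    for i, (xi, yi) in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2,
        )
        groups.append({words[i]} | {words[j] for j in others[:k]})
    return groups


def lexicon_precision(correctly_located):
    """Fraction of lexicon entries whose location flag is True."""
    if not correctly_located:
        return 0.0
    return sum(1 for flag in correctly_located if flag) / len(correctly_located)


# Toy feature points with word labels (illustrative only).
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
words = ["A", "B", "C", "D"]
groups = knn_spatial_groups(points, words, k=1)

# Toy foreground flags for four phrases (illustrative only).
precision = lexicon_precision([True, True, False, True])
```

Each returned group can serve as one transaction for frequent itemset mining; because groups are sets, a word repeated among the neighbors is counted once.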
Results: visual phrases from car category
• Visual phrase pattern 1: wheels
• Visual phrase pattern 2: car bodies
• Different colors represent different semantic meanings
Results: visual phrases from face category
Comparison