Multimedia search engine Michal Krsek UISK Charles University
- Slides: 20
Multimedia search engine Michal Krsek, UISK Charles University at Prague & CESNET Ivan Doležal, CESNET Michal Illich, Jyxo
Electronic Media • TV & radio • Organized in channels • Zero democracy in programming (by channel management) • Centralized production (big guys business)
Internet • Not only web (audio/video and others) – remember archie.sura.net? • IPTV / Live / Video on demand • Navigation only via web => not easy to find a specific program in A/V
Search options I • Voice recognition – Language identification – Accents • Video recognition – Text interpretation (bush vs. Bush) – Low video quality
Search options II • Indexing of web pages – Yahoo! does (google bomb target) • Metadata – “Out-of-band metadata” (as in the librarian world) – Metadata in files (added during editing or encoding)
Project description • Started in 2003 (oh yes, one year before Truveo) • “Google for audio and video on Internet” • No support from content owners • Modular concept • Start with .cz Internet
Technical description I • Crawler – Crawls the web and collects addresses (URLs) – Exports URLs of multimedia files – Software written by Jyxo (Linux console app)
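The export step above can be pictured as a filter over the collected addresses. This is only a minimal sketch: the slides do not say how the Jyxo crawler recognises multimedia files, so the extension list and the function names `is_media_url` and `export_media_urls` are assumptions made for illustration.

```python
from urllib.parse import urlparse
import posixpath

# Assumed extension list; the slides do not name the formats the crawler exports.
MEDIA_EXTENSIONS = {
    ".avi", ".mpg", ".mpeg", ".wmv", ".asf", ".mov", ".rm", ".mp3", ".wav", ".ogg",
}


def is_media_url(url: str) -> bool:
    """Return True when the URL path ends in one of the assumed A/V extensions."""
    path = urlparse(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in MEDIA_EXTENSIONS


def export_media_urls(urls):
    """Keep only multimedia addresses, dropping duplicates but preserving order."""
    seen = set()
    exported = []
    for url in urls:
        if is_media_url(url) and url not in seen:
            seen.add(url)
            exported.append(url)
    return exported
```

Matching on the file extension is cheap, but it misses streams served from script URLs; a real crawler would probably also look at the Content-Type header.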
Technical description II • Distiller – Imports addresses of multimedia files – Distills metadata (and makes XML files) – Makes screenshots (if the file contains video) – C# software and MPlayer (Windows apps) – Runs in a distributed environment
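The "makes XML files" step might look roughly like the sketch below. The real distiller is written in C# and its XML schema is not shown on the slides, so the element names `media` and `field` and the function `metadata_to_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(url: str, metadata: dict) -> str:
    """Serialise distilled metadata as one XML record (hypothetical schema)."""
    root = ET.Element("media", {"url": url})
    # Sort the keys so the output is stable from run to run.
    for key in sorted(metadata):
        field = ET.SubElement(root, "field", {"name": key})
        field.text = str(metadata[key])
    return ET.tostring(root, encoding="unicode")
```

One record per file keeps the distiller independent of the database: any station can write its XML and the import step can pick the files up later.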
Technical description III • Database – Imports XML metadata files into a fulltext DB – Serves as back end for web queries – And other fulltext things (i.e. language)
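To illustrate what a fulltext back end does with the imported metadata, here is a toy inverted index. It is not the project's database (the slides do not name the engine); the class `FulltextIndex` and its methods are assumptions for the sake of the example.

```python
import re
from collections import defaultdict


class FulltextIndex:
    """Toy inverted index: maps each word to the set of URLs whose metadata contains it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase and split on non-word characters.
        return re.findall(r"\w+", text.lower())

    def add(self, url, text):
        """Index the metadata text of one multimedia file."""
        for token in self._tokens(text):
            self._postings[token].add(url)

    def search(self, query):
        """Return the sorted URLs that contain every word of the query."""
        tokens = self._tokens(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)
```

A production engine would add ranking, stemming and language handling on top of this basic lookup.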
Pipeline diagram: crawling (crawls web pages, gets addresses, filters A/V addresses) -> distillation (gets metadata from multimedia files) -> indexing / search (holds fulltext database, provides back end for queries) -> www.yournamehere.edu
Distillation • Process description – Get URL from DB – Get metadata from the file available at the URL – Get screenshots at 1, 30, 50 sec – Save metadata & screenshots
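The screenshot step could be driven by one MPlayer call per offset. The sketch below only builds the command lines and does not run them. The options `-ss`, `-frames`, `-nosound` and `-vo jpeg:outdir=...` are standard MPlayer options as far as I know, but the project's actual invocation is not given on the slides, so treat the exact flags and the per-offset directory layout as assumptions.

```python
SCREENSHOT_OFFSETS_SEC = (1, 30, 50)  # offsets named on the slide


def screenshot_commands(url, out_dir):
    """Build one MPlayer command line per screenshot offset (assumed flags)."""
    commands = []
    for offset in SCREENSHOT_OFFSETS_SEC:
        commands.append([
            "mplayer",
            "-ss", str(offset),   # seek to the offset in seconds
            "-frames", "1",       # decode a single frame, then exit
            "-nosound",           # audio is not needed for a screenshot
            # A separate directory per offset so the frames do not overwrite each other.
            "-vo", f"jpeg:outdir={out_dir}/{offset:03d}",
            url,
        ])
    return commands
```

Each list can be passed to `subprocess.run` on a machine where MPlayer is installed; a timeout is advisable because slow servers may never deliver the requested frame.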
Distillation • Use of Win32 applications – Native players (WMP, RP, Qt) for metadata – MPlayer for screenshots • Takes one minute on average – Slow servers/bandwidth – Streaming without fast forward
Distiller.GRID • At one minute per URL => need 16 years to distill 8.500.000 URLs • Ideal application for GRID computing – No need for real-time response – Huge amount of computing time needed • Two ways to create a GRID – Build a dedicated system – Use current capacities
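The 16-year figure follows from the one-minute average per URL quoted on the previous slide. A quick check of the arithmetic, with the worker count as a free parameter (the function name and the 100-machine example are illustrative assumptions):

```python
URLS = 8_500_000
MINUTES_PER_URL = 1                  # average distillation time from the slides
MINUTES_PER_YEAR = 60 * 24 * 365


def years_to_distill(urls, minutes_per_url, workers=1):
    """Wall-clock years needed if the work is split evenly across the workers."""
    return urls * minutes_per_url / (workers * MINUTES_PER_YEAR)


print(round(years_to_distill(URLS, MINUTES_PER_URL), 1))  # 16.2
```

With a hypothetical pool of 100 machines running in parallel, the same workload shrinks to roughly 59 days of continuous operation, which is why the job suits a GRID.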
Computing machines • PC/Windows based • HW independent • Secure environment – Security of hosting system – Security of distillation process • Well connected • No need to run 24x7 • Easy to manage
Configuration • ~100 PCs in student labs • Running on demand during weekends • Virtual machines (MS VPC 2004) in hosting system (Win XP) • Three different HW configurations • Peak rate about 5000 URLs per minute • SQL database as back end -> pull distribution of work
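"Pull distribution of work" means each station asks the database for its next URL instead of having work pushed to it. The sketch below shows the idea with SQLite. The project's actual SQL server and schema are not described on the slides, so the `jobs` table and the function names are hypothetical.

```python
import sqlite3


def create_queue(conn, urls):
    """Create a hypothetical job table and load the URLs as pending jobs."""
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, url TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', worker TEXT)"
    )
    conn.executemany("INSERT INTO jobs (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()


def pull_job(conn, worker):
    """Claim the oldest pending job for this worker, or return None if nothing is left."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, url FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        job_id, url = row
        # The status check in the UPDATE guards against another worker
        # having claimed the same row in the meantime.
        updated = conn.execute(
            "UPDATE jobs SET status = 'taken', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, job_id),
        )
        if updated.rowcount != 1:
            return None
        return url
```

Pulling suits lab PCs that come and go during a weekend: a machine that is switched off simply stops asking, and no scheduler has to track which stations are alive.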
Actual status I • HW – 20 crawlers – 2 servers for fulltext DB (<1.400 USD) – Distillation stations (X office PC) – Connected by 1 Gb/s to CESNET2 -> GEANT2
Actual status II • Database – EU + .com, .edu – > 13.000 URLs – > 8.000 valid – > 2.800.000 with screenshots
Live show?
Want to test? • URLs – http://multimedia.jyxo.cz – http://videoserver.cesnet.cz/videoarchiv_en.php – For XML interface send me an e-mail
Questions? Comments? Michal Krsek, Michal.Krsek@cesnet.cz (academic service, cooperation) Michal Illich, michal@illich.cz (business service)