Andreas Wagner (CERN IT/OIS), Eduardo Alvarez (CERN IT/OIS)
- Slides: 38
Andreas Wagner – CERN IT/OIS
Eduardo Alvarez – CERN IT/OIS
Sergio Fernandez – CERN IT/OIS
CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it

Summary
• Introduction to search
• Inside CERN Search
• New Search Solution
  – Concepts, collections, pipelines, stages, architecture
  – Search features
• Demo
• Conclusions and future work

What is Search?
• Search is the art of balancing three factors:
  – Recall
    • Of all matching documents, how many were returned?
  – Precision
    • Of the returned documents, how many match the query?
  – Relevancy
    • How well does a document match the query?

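The first two factors can be made concrete with a small calculation. The following is a minimal sketch, assuming the usual set-based definitions of precision and recall; the document identifiers and the `precision_recall` helper are invented for illustration and are not part of CERN Search.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a single query.

    returned: ids of the documents the engine returned
    relevant: ids of all documents that really match the query
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Precision: share of the returned documents that match.
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: share of the matching documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the whole index.
returned = ["d1", "d2", "d3", "d7"]
relevant = ["d1", "d2", "d3", "d4", "d5", "d6"]
p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Returning more documents tends to raise recall and lower precision, which is why the two have to be balanced; relevancy then decides the order of whatever is returned.
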
Enterprise Search
• Wide range of document sources:
  – Web pages
  – File systems
  – Databases
  – Directories (people and places)
  – Document repositories (CDS, EDMS, Indico, …)
  – Structured CMS data
    • Sharepoint, Drupal, Twiki
• Variety of metadata
• Different access protection schemes
• Different retrieval methods and frequencies

Enterprise Search
• Components of Enterprise Search:
  – Search engine / search technology
  – Integration within existing infrastructure (authentication, authorization)
  – Document retrieval (in collaboration with data owners)
    • Not only Web pages
    • Database/XML data (CDS, Indico, phone data)
  – Protected documents (in collaboration with data owners)
    • Access to the document data
    • In addition, information about ACLs is needed
  – Ranking of documents (in collaboration with data owners)
  – Enterprise Search is not only a question of the search technology used!

What about Google
• What makes Google Web search so good
  – Huge Web space analysis capabilities
  – Huge amount of usage data used for “voting” on the results: the most popular results are promoted
  – Substantial resources to tune and correct results:
    • usage data analysis
    • taking popular events into account
    • hand-edited results for popular single-keyword searches
  – Personalized filtering of results
    • Based on: location, preferences, search history, …
• The above is valid for all public Web search engines (Yahoo, Bing)
• At the same time, Web search is not Enterprise search!

Search at CERN?
• Why a Search Service? If…
  – Every system usually has its own search system
• Probably one of the best places for this service
• Quite a lot of different content sources
• High rate of new content
• Solutions are not always optimal
• Centralize the search of content

CERN Search
• A central search solution to provide
  – for users: a single entry point for searching information on several content sources at the same time
  – for service providers: a search backend service
    » TWiki, Drupal, Sharepoint, JACOW, Groups
• Start of project in February 2006
  – Based on a commercial product from FAST (Microsoft subsidiary and market leader)
• CERN Search in production since 2007
• Present resources: 1 PJAS & some fraction of a staff

CERN Search Recent Progress
• 2009: Migration to FAST ESP 5.3
• 2010: Reorganization of the indexed Web space (improved relevancy)
• 2010: Twiki protected pages indexed; service used as default Twiki search
• 1Q 2011: Indico protected documents + material
• 1Q 2011: Index of the Sharepoint content
• 3Q 2011: Migration to FAST Search Server 2010 for Sharepoint

Overview of Indexed Documents
Documents indexed by CERN Search

Source                   2011      2010      2009      2008
CERN Websites         1561670   1456510   1787805    829542
CDS                   1116921   1048360   1040694    936018
TWiki Pages             68055     60796       ---
Indico                1531908    311208    255365    432339
JACOW                  169204    144388       ---
Phonebook               26426     29819     25629     23982
Sharepoint            6721347       ---       ---
Central SSO Websites      ---       ---

Total: 11 million, 3 million, 2 million
(eSpace/Groups Archive)

Concepts I
• Document
• Pipeline
• Processing Stage
• Collection
• Crawler (Files, Web)
[Diagram labels: Collection A, Collection B]

Concepts II
Document Content Flow
[Diagram labels: Document retrieval, Document processing, Document indexing; Connectors (Push & Pull); Filter API, Content API, Query API]

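The relationship between documents, collections, pipelines and processing stages can be sketched in a few lines. This is only an illustration of the concepts, not the FAST API: the `Document` class, the two stage functions, the `PIPELINES` mapping and the in-memory `index` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str
    collection: str
    fields: Dict[str, str] = field(default_factory=dict)


# A processing stage takes a document and returns the (modified) document.
Stage = Callable[[Document], Document]


def normalize_title(doc: Document) -> Document:
    # Collapse repeated whitespace in the title.
    doc.fields["title"] = " ".join(doc.fields.get("title", "").split())
    return doc


def add_source_tag(doc: Document) -> Document:
    # Record which collection the document came from.
    doc.fields["source"] = doc.collection
    return doc


# Each collection is bound to one pipeline, i.e. an ordered list of stages.
PIPELINES: Dict[str, List[Stage]] = {
    "web": [normalize_title, add_source_tag],
    "indico": [add_source_tag],
}


def process(doc: Document, index: Dict[str, Document]) -> None:
    """Run the collection's pipeline, then hand the document to the index."""
    for stage in PIPELINES[doc.collection]:
        doc = stage(doc)
    index[doc.doc_id] = doc


index: Dict[str, Document] = {}
process(Document("1", "web", {"title": "  CERN   Search  "}), index)
print(index["1"].fields)
```

A custom processing stage, such as the ACL resolution or static rank stages mentioned later, would simply be another function added to the relevant pipeline list.
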
Indexing Protected Content I
• To allow indexing of protected content we need to:
  – Retrieve the document
    • The search engine needs access to the document
  – Obtain the document ACLs
    • To be able to decide who is allowed to find a document
    • Often not trivial, since most systems answer the question “Does a given user have the right to access a given document?” and not “Who has access to a given document?”
This is due to often complex permission models, including inheritance, fine granularity of permissions, and permissions that change during the document lifecycle.

Indexing Protected Content II
• Document Processing
  – Resolve ACLs to SIDs
  – Sent to the indexer with the document
• FSA (FAST Security Authorization) component
  – Active Directory integration, i.e. based on CERN accounts and e-groups
[Diagram labels: Document Repository; Document, ACL; Document Processing; Doc + ACL; CERN Search, Search Index; Active Directory Users & Groups]

Authentication / Authorisation
• Query Processing
  – Authentication by the Front-End
  – FSA creates a filter with expanded user credentials and groups
[Diagram labels: Search Front End, Authentication (SSO) & Search; Query & Identity; CERN Search, Search Index; Group Membership; Active Directory Users & Groups]

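The two halves of this scheme (ACLs resolved to SIDs at indexing time, the user expanded to their own SID plus group SIDs at query time) can be sketched as follows. This is a simplified illustration under stated assumptions, not the FSA implementation: the `SIDS` and `GROUPS_OF` tables stand in for Active Directory, the SID strings are made up, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

# Hypothetical directory data standing in for Active Directory.
SIDS = {"alice": "S-1", "bob": "S-2", "atlas-members": "S-10", "it-ois": "S-11"}
GROUPS_OF = {"alice": {"atlas-members"}, "bob": {"it-ois"}}


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    allowed_sids: FrozenSet[str]


def index_document(doc_id: str, text: str, acl_names: List[str]) -> IndexedDoc:
    """Indexing time: resolve ACL entries to SIDs and store them with the doc."""
    return IndexedDoc(doc_id, text, frozenset(SIDS[name] for name in acl_names))


def user_sids(user: str) -> Set[str]:
    """Query time: expand the user into their own SID plus their group SIDs."""
    return {SIDS[user]} | {SIDS[group] for group in GROUPS_OF.get(user, set())}


def search(index: List[IndexedDoc], term: str, user: str) -> List[str]:
    """Return matching documents the authenticated user is allowed to find."""
    sids = user_sids(user)
    return [
        doc.doc_id
        for doc in index
        if term in doc.text.lower() and doc.allowed_sids & sids
    ]


index = [
    index_document("twiki-1", "ATLAS trigger notes", ["atlas-members"]),
    index_document("indico-7", "OIS trigger meeting", ["it-ois", "alice"]),
    index_document("web-3", "Trigger overview", ["bob"]),
]
print(search(index, "trigger", "alice"))  # group access and direct access
print(search(index, "trigger", "bob"))
```

Storing the allowed SIDs with each document turns the hard question ("who has access to this document?") into a one-time cost at indexing, while the query side only needs the user's group membership.
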
FAST Search for Sharepoint Cluster Architecture

Index Profile
• Final representation of each document
• Set of attributes to index (Managed Properties)
  – Title
  – Author
  – Last modified date
  – ACLs
• Define which properties are queryable, refiners, or sortable
• Define FullTextIndex properties
• Define mappings to FullTextIndex
• Flexible

Result Ranking – Rank Profiles

Ranking Issues at CERN
• Flat Web space
  – Lack of metadata (copy-paste, poorly maintained HTML meta tags, ...)
  – Isolated sites (not many inter-links, only to the CERN main page)
• Good experience with well-structured content
  – Indico, CDS
• How to improve ranking?
  – Manual tuning of results: promote, demote
  – Modify the rank profile
  – Custom processing stage for static rank points
• Not easy
  – Manpower intensive
  – Requires a better understanding of the indexed data
  – No magic solution: the rank profile has to be balanced across different collections

Changes to FAST ESP Products
• Before: only one product
  – FAST ESP 5.3 (standalone product)
• Now: several possibilities
  – FAST
    • FAST Search Server 2010 for Internal Applications (FSIA)
    • FAST Search Server 2010 for Internet Sites (FSIS)
  – Microsoft + FAST
    • FAST Search Server 2010 for Sharepoint (FS4SP)
      – Same core
      – Configuration and out-of-the-box (OTB) pipeline adapted for Sharepoint
      – Reduced set of tools; others migrated to Sharepoint or Powershell cmdlets

FAST Search for Sharepoint Architecture Overview

FAST Search for Sharepoint Topology
[Diagram labels: Sharepoint Crawler; Sharepoint Sites, Web Sites, File Shares, Exchange public folders, Lotus Notes; FAST Enterprise Crawler; Search Centre]

Server Architecture
• Two systems (Production + Dev)
• Using Sharepoint Central Service
• Production
  – 1 admin node
  – 1 crawler + pre-processing node
  – 4-node index cluster
    • Both roles: Indexer and Search
    • 2 rows
      – Backup
      – Query performance
    • 2 columns
      – Easily handles more than 30 million documents
  – High reliability on critical components
    • Content Distributors, QueryServers, Document Processors

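The row/column layout of the index cluster can be illustrated with a toy model: columns partition the index (capacity), rows replicate each partition (backup and query throughput). This is a sketch under assumptions, not how FS4SP distributes content internally; in particular the hash-based `column_for` routing rule and the in-memory `nodes` grid are invented for the example.

```python
import zlib
from typing import Dict, List

COLUMNS = 2  # index partitions: more columns means more documents
ROWS = 2     # replicas of every partition: backup and query performance

# nodes[row][column] maps doc_id -> text; 2 rows x 2 columns = 4 index nodes.
nodes: List[List[Dict[str, str]]] = [
    [{} for _ in range(COLUMNS)] for _ in range(ROWS)
]


def column_for(doc_id: str) -> int:
    # Assumed routing rule: a stable hash of the document id picks the column.
    return zlib.crc32(doc_id.encode("utf-8")) % COLUMNS


def index_document(doc_id: str, text: str) -> None:
    """Write the document to its column in every row (replication)."""
    col = column_for(doc_id)
    for row in range(ROWS):
        nodes[row][col][doc_id] = text


def search(term: str, row: int) -> List[str]:
    """Answer a query from one row by fanning out over all its columns."""
    hits: List[str] = []
    for col in range(COLUMNS):
        hits += [d for d, text in nodes[row][col].items() if term in text]
    return sorted(hits)


for i in range(10):
    index_document(f"doc-{i}", f"page {i} about search")

# Either row holds a complete copy of the index, so both give the same answer.
assert search("search", 0) == search("search", 1)
print(len(search("search", 0)))
```

Because each row is a full copy, queries can be spread over the rows, and one row can keep serving results if a node in the other row fails.
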
FAST Search for Sharepoint New Features (I)
• New query suggestions model
  – Based on dictionary and common user queries
• Best Bets & Visual Best Bets
• Custom search experience (per user/role)
• New management system (Microsoft style)
  – SCOM, Powershell, …
• Sharepoint integration
• Phonetic and nickname search
• Thumbnails and previews in results

FAST Search for Sharepoint New Features (II)
• Entity extraction
• Office Web Apps integration
• Relevance improvements with social behaviour
  – Click-through relevancy
• Enhanced results refinement
  – Deep results refinement
  – Based on any managed properties
  – Similar results
• Federated search

Migration Process
• Migrate pipelines
• Adapt retrieval and pre-processing scripts
• Port custom processing stages
• Migrate the feed process to use Sharepoint crawlers (file shares)
• Customize the Search Centre to offer the same functionality as the old system
• Create general helper tools
  – Manage index profile
  – Manage keywords, best bets, …

Examples
• Best Bets & Visual Best Bets

Examples
• Visual Refiners

Examples
• Federated search examples (Google, Bing, Twitter)

Search Driven Application

Conclusions
• Successfully migrated all the content from the old system
  – Experience in the same technology
• Reduced tools and help for content other than Sharepoint
• But,
  – New interesting features, Sharepoint integration
  – Complete Search Centre
  – More community behind
  – High cohesion between Sharepoint and Search Services

Next Steps
• Integration with Drupal
  – Customized pre-processing, index and query
• Index SSO centrally managed sites
  – Own SSO crawling, get ACLs, processing
• Continue evolving the new system
  – Take advantage of all FS4SP features
    • Office WebApps, Visual Refiners, phonetic search, ...
  – Together with content providers, improve
    • Relevancy, Best Bets, ...

CERN Search @
• CERN Search: http://cern.ch/search
• and also via:
  – CERN Intranet & Public Pages
  – TWiki
  – IT, HR, PH Websites
  – JACOW

Questions?