IBM Software Group | DB2 Data Management Software

DB2 Net Search Extender
IBM DB2 Data Management
March 2003

IBM DB2 Net Search Extender © 2003 IBM Corporation

Agenda
§ Overview of Search Products in IBM
§ Product Objectives: DB2 Net Search Extender
§ Product Overview
§ Key Features
§ Positioning of the Text Extender family
§ Customer Scenarios
§ Future direction

Products with Text Search in IBM Today

Brokered search
§ IBM Lotus Extended Search: a brokered, index-free search for parallel, distributed, heterogeneous search of specific content sources. Bundled with DB2 II, II for Content, and WP.

Index-based search
§ WebSphere Portal search: portal product that includes a full-text search library designed for high-precision search on small and mid-sized collections
§ DB2 Information Integrator for Content: federated text and parametric search for content and data sources; web crawler for indexing web sites
§ Lotus Discovery Server: knowledge management system for full-text search and expertise location
§ DB2 Net Search Extender: DB2 extension for fast, scalable full-text search with an SQL/MM dialect, for text stored in DB2 and federated databases; integrated with DB2 Content Manager

Objective of the DB2 Net Search Extender
§ DB2 Net Search Extender is recommended for applications that need:
– A full-text search to handle, for example, the demands of e-business applications with important textual content
– A relational database and a rich data schema to support the application requirements
§ DB2 Net Search Extender is:
– An extension to DB2 designed to provide excellent text search capabilities for e-business applications
– Seamlessly integrated into the SQL query language through an extension of Structured Query Language: SQL/MM (Multimedia)
§ DB2 Net Search Extender is NOT:
– An Internet search product like Google
– A generalized free-text search product like Verity or Autonomy

Overview of DB2 Net Search Extender
§ Provides a parallel, scalable, full-text search
§ Delivers excellent performance and scalability
§ Tailored for e-business applications, such as e-commerce and content management, that have text search requirements
§ Works seamlessly with text documents contained in DB2 and other federated databases
§ Extends existing DB2 applications easily by using standard extensions to SQL
§ Provides very fast indexing and dynamic index update, which are the basis for a high-speed search solution
§ Integrates with the DB2 Control Center for seamless, easy-to-use administration

DB2 Net Search Extender Key Features
§ Search functions: options to refine the search process
– Boolean operations
– Proximity search for words in the same sentence or paragraph
– "Fuzzy" search for words with a spelling similar to the search term
– Wildcard search, using front, middle, and end masking
– Thesaurus support to broaden the query
– Search within sections of documents for more targeted results
– Search on numeric attributes
– Support for search in 37 languages
– Highlight function
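To make wildcard masking and fuzzy search concrete, here is a small conceptual sketch in Python. It is not Net Search Extender's implementation or query syntax. It assumes that '%' masks any run of characters and '_' masks exactly one character, and it uses Levenshtein edit distance as a stand-in for NSE's own fuzzy matching. The vocabulary plays the role of the terms in a text index; all function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a masked search term into a compiled regular expression.

    Assumption for this sketch: '%' masks any run of characters and '_'
    masks exactly one character. Front ('%ware'), middle ('s_ftware') and
    end ('soft%') masking are all handled by the same translation.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_matches(pattern, vocabulary):
    """Return the indexed terms that match the masked pattern, sorted."""
    regex = wildcard_to_regex(pattern)
    return sorted(w for w in vocabulary if regex.fullmatch(w))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_matches(term, vocabulary, max_edits=1):
    """Return indexed terms within `max_edits` edits of the search term."""
    term = term.lower()
    return sorted(w for w in vocabulary if edit_distance(term, w.lower()) <= max_edits)


vocabulary = {"software", "softwear", "hardware", "soft", "search"}
print(wildcard_matches("soft%", vocabulary))     # → ['soft', 'software', 'softwear']
print(wildcard_matches("%ware", vocabulary))     # → ['hardware', 'software']
print(wildcard_matches("s_ftware", vocabulary))  # → ['software']
print(fuzzy_matches("sofware", vocabulary))      # → ['software']
```

A real engine would not scan the whole vocabulary for every masked or fuzzy term; it would use sorted term dictionaries or similar structures. The linear scan keeps the idea visible.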

DB2 Net Search Extender Key Features
§ Search results: presentation of search results and responsiveness
– Set a result limit on queries where a high hit count is anticipated
– Built-in SQL functionality works with the optimizer to automatically select the best access plan for the expected search results
– Order the results by document score
– Return results quickly: a high-performance search solution
§ Search methods: programming mechanisms tailored to different e-business requirements
– SQL scalar search function: for general text search applications
– SQL table-valued function: for general text search on views and presorted indexes
– Text search stored procedure: for high-performance, dedicated text search
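The value of a result limit combined with ordering by score can be sketched in a few lines of Python. This is an illustration under stated assumptions, not NSE's ranking: the score here is simply the share of a document's words that equal the query term, and `search_with_limit` is a hypothetical name. The point is that keeping only the best `limit` hits avoids sorting every match when the hit count is high.

```python
import heapq
from typing import Dict, List, Tuple


def score(text: str, term: str) -> float:
    """Illustrative score: share of the document's words equal to the term."""
    words = text.lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0


def search_with_limit(docs: Dict[int, str], term: str, limit: int) -> List[Tuple[int, float]]:
    """Return at most `limit` (doc_id, score) hits, highest score first.

    heapq.nlargest keeps only `limit` candidates at a time, so a query with
    a very high hit count never needs a full sort of every match.
    """
    hits = ((doc_id, score(text, term)) for doc_id, text in docs.items())
    matching = (hit for hit in hits if hit[1] > 0)
    return heapq.nlargest(limit, matching, key=lambda hit: hit[1])


docs = {
    1: "db2 search search",
    2: "search",
    3: "db2 tools",
    4: "search engine for db2",
}
print(search_with_limit(docs, "search", limit=2))  # → [(2, 1.0), (1, 0.6666666666666666)]
```

`heapq.nlargest` runs in roughly O(n log k) for n matches and a limit of k, which is why a small limit stays cheap even when many documents match.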

SQL Search: SQL Scalar Search Function
§ The recommended search method, useful in most situations
§ Use wherever standard SQL would be used
§ Use when text search results are combined with other conditions
§ Integrated with the DB2 optimizer for excellent performance where a join of data is needed
Diagram (arrows are data flows): an SQL scalar search ("CONTAINS") is sent to the DB2 server, which extracts the matching primary keys from the text index, joins them with the DB2 table, and returns the results.
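The data flow in the diagram (extract matching primary keys from the index, then join them with the table and any other conditions) can be mimicked with Python and SQLite. In DB2 the text condition would be a CONTAINS predicate, typically written along the lines of `CONTAINS(description, '"laptop"') = 1`; check the NSE reference for the exact search-argument syntax. Everything below is an assumption for illustration: the `products` table, the dictionary that stands in for the text index, and the temporary `hits` table that carries the extracted keys into the join.

```python
import sqlite3
from collections import defaultdict


def build_text_index(rows):
    """Hypothetical stand-in for a text index: word -> set of primary keys."""
    index = defaultdict(set)
    for pk, text in rows:
        for word in text.lower().split():
            index[word].add(pk)
    return index


def text_search_join(conn, index, word, max_price):
    """Extract matching primary keys from the index, then join them with the table."""
    keys = sorted(index.get(word.lower(), set()))
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS hits (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM hits")
    conn.executemany("INSERT INTO hits (id) VALUES (?)", [(k,) for k in keys])
    # The text condition (via hits) is combined with an ordinary relational predicate.
    return conn.execute(
        "SELECT p.id, p.description FROM products AS p "
        "JOIN hits AS h ON p.id = h.id "
        "WHERE p.price <= ? ORDER BY p.id",
        (max_price,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL, description TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, 899.0, "lightweight laptop with long battery life"),
        (2, 349.0, "budget laptop for students"),
        (3, 29.0, "laptop sleeve"),
        (4, 499.0, "tablet with keyboard"),
    ],
)
index = build_text_index(conn.execute("SELECT id, description FROM products"))
print(text_search_join(conn, index, "laptop", max_price=400))
# → [(2, 'budget laptop for students'), (3, 'laptop sleeve')]
```

In DB2 the optimizer decides how to combine the text predicate with the price condition; here the order (keys first, then join and filter) is fixed by hand.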

Text Search on Views: SQL Table-Valued Function for Search
§ Use where you would normally use the SQL scalar function, but want to exploit text indexes on views or presorted text indexes
Diagram (arrows are data flows): a TextSearch table-valued search function ("db2ext.textsearch") is sent to the DB2 server, which extracts the matching primary keys from the text index, joins them with the DB2 table, and returns the results.
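One way to picture a presorted text index is sketched below in Python. The assumption is that documents are ordered once, at index creation, by a presort key, and that posting lists keep that order, so the first N hits can be returned without scoring or sorting. This is a conceptual model, not NSE's internal design. The class, the `textsearch` method (named only to echo db2ext.textsearch), and the rows standing in for a view are all hypothetical.

```python
from collections import defaultdict
from itertools import islice


class PresortedTextIndex:
    """Hypothetical sketch of a presorted text index.

    Documents are ordered once, at index creation, by a presort key (here a
    popularity rank). Posting lists keep that order, so a search can return
    the first `limit` hits without scoring or sorting all matches.
    """

    def __init__(self, rows, presort_key):
        # rows: iterable of (primary_key, text, rank) tuples, e.g. from a view
        self.postings = defaultdict(list)
        for pk, text, _rank in sorted(rows, key=presort_key):
            for word in set(text.lower().split()):
                self.postings[word].append(pk)

    def textsearch(self, word, limit):
        """Yield matching primary keys in presort order, like rows of a table function."""
        yield from islice(self.postings.get(word.lower(), []), limit)


# Made-up rows, as they might come from a view joining a catalog with sales data.
view_rows = [
    (1, "db2 search guide", 3),
    (2, "search engines explained", 1),
    (3, "db2 administration", 2),
    (4, "advanced search tuning", 4),
]
index = PresortedTextIndex(view_rows, presort_key=lambda row: row[2])
print(list(index.textsearch("search", limit=2)))  # → [2, 1]
```

The trade-off is that the result order is fixed when the index is built; a query that needs a different order would have to sort the hits itself.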

High-Performance, Dedicated Text Search: Stored Procedures for Search
§ Use for high-performance, high-scalability applications that need text-search-only queries
§ Use for queries that do not need to join text search results with the results of other complex SQL conditions
Diagram (arrows are data flows): a TextSearch stored procedure search ("db2ext.textsearch") is sent to the DB2 server, which answers it from the text index and its cache; the columns held in the cache are taken from the DB2 table and defined at text index creation.
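The idea behind the cache in this diagram, columns chosen at index creation and served from memory so that a search never reads the table, is sketched below in Python. It is a hypothetical model, not NSE's cache format: `CachedTextIndex`, its parameters, and the sample table are invented for illustration. Clearing the table after indexing shows that the search is answered from the index and cache alone.

```python
from collections import defaultdict


class CachedTextIndex:
    """Hypothetical sketch: text index plus an in-memory cache of chosen columns.

    The cached columns are fixed when the index is created. A search is then
    answered from the postings and the cache alone, without reading the table.
    """

    def __init__(self, table, text_column, cached_columns):
        self.postings = defaultdict(set)
        self.cache = {}
        for pk, row in table.items():
            self.cache[pk] = {col: row[col] for col in cached_columns}
            for word in row[text_column].lower().split():
                self.postings[word].add(pk)

    def search(self, word):
        """Return (primary_key, cached_columns) for each matching document."""
        return [(pk, self.cache[pk]) for pk in sorted(self.postings.get(word.lower(), ()))]


table = {
    1: {"title": "DB2 tips", "price": 10, "body": "fast text search in db2"},
    2: {"title": "SQL basics", "price": 15, "body": "joins and views"},
    3: {"title": "Search tuning", "price": 25, "body": "text search performance"},
}
index = CachedTextIndex(table, text_column="body", cached_columns=("title", "price"))
table.clear()  # the table is no longer consulted at search time
print(index.search("search"))
# → [(1, {'title': 'DB2 tips', 'price': 10}), (3, {'title': 'Search tuning', 'price': 25})]
```

The same property has two costs: the cache is a snapshot that stays stale until the index is updated, and its size is bounded by available memory.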

DB2 Net Search Extender Key Features
§ Indexing: very fast indexing and dynamic index update are the basis for a high-speed search solution
– Provides fast indexing of large data volumes
– Provides incremental updates of indexes
– Indexes text documents stored in DB2 and federated databases
– Offers a choice of the command line or the DB2 Control Center as the interface for indexing
– Supports language-specific stopword lists to reduce the index size and improve search speed
– Monitors the progress of indexing
– Optional: supports presorted text indexes
– Optional: caches table columns in main memory at indexing time to avoid physical read operations at search time
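Why stopword lists shrink an index can be shown by counting posting entries with and without them. The sketch below is illustrative only: the per-language stopword lists are tiny and made up, and the count of (term, document) entries is used as a rough proxy for index size. It does not reproduce NSE's stopword files or index format.

```python
import re
from collections import defaultdict

# Tiny, made-up stopword lists for illustration; real lists are much longer.
STOPWORDS = {
    "en": {"the", "a", "an", "of", "and", "in", "to", "is"},
    "de": {"der", "die", "das", "und", "in", "zu", "ist"},
}


def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs, stopwords=frozenset()):
    """Build word -> set of primary keys, skipping words in `stopwords`."""
    postings = defaultdict(set)
    for pk, text in docs.items():
        for term in tokenize(text):
            if term not in stopwords:
                postings[term].add(pk)
    return postings


def posting_count(postings):
    """Total number of (term, document) entries: a rough proxy for index size."""
    return sum(len(pks) for pks in postings.values())


docs = {
    1: "The history of the DB2 extenders",
    2: "Search in DB2 and the federated databases",
    3: "The index is updated",
}
full = build_index(docs)
slim = build_index(docs, STOPWORDS["en"])
print(posting_count(full), posting_count(slim))  # → 16 9
```

Very common words such as "the" appear in almost every document, so dropping them removes long posting lists that would rarely help a query.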

Indexing
§ Very fast indexing and dynamic index update are the basis for a high-speed search solution
Diagram (components: DB2 server, Instance Services, index, RDBMS tables, log table): insert, update, and delete operations on the RDBMS tables fire a trigger that writes to a log table; an "UPDATE INDEX…" command reads the log table and the tables and updates the index.
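The trigger and log-table mechanism in this diagram can be reproduced in miniature with Python's built-in sqlite3 module. This is a sketch of the pattern, not NSE's actual log table schema or UPDATE INDEX command: the `index_log` table, the trigger names, and `update_index` are hypothetical. Triggers record which rows changed; the update step replays those entries against an in-memory index and then removes them from the log.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
-- Hypothetical log table: one row per change, consumed by the index update.
CREATE TABLE index_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER, op TEXT);
CREATE TRIGGER docs_ins AFTER INSERT ON docs BEGIN
  INSERT INTO index_log (doc_id, op) VALUES (NEW.id, 'I');
END;
CREATE TRIGGER docs_upd AFTER UPDATE ON docs BEGIN
  INSERT INTO index_log (doc_id, op) VALUES (NEW.id, 'U');
END;
CREATE TRIGGER docs_del AFTER DELETE ON docs BEGIN
  INSERT INTO index_log (doc_id, op) VALUES (OLD.id, 'D');
END;
""")

postings = defaultdict(set)  # term -> ids of documents containing it
doc_terms = {}               # doc id -> terms currently indexed for it


def update_index(conn):
    """Apply pending log entries to the in-memory index, then delete them."""
    rows = conn.execute("SELECT seq, doc_id, op FROM index_log ORDER BY seq").fetchall()
    for _seq, doc_id, op in rows:
        # Remove whatever was previously indexed for this document.
        for term in doc_terms.pop(doc_id, set()):
            postings[term].discard(doc_id)
        if op in ("I", "U"):
            row = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
            if row is not None:  # the row may have been deleted since it was logged
                terms = set(row[0].lower().split())
                doc_terms[doc_id] = terms
                for term in terms:
                    postings[term].add(doc_id)
    if rows:
        conn.execute("DELETE FROM index_log WHERE seq <= ?", (rows[-1][0],))
    conn.commit()
    return len(rows)


conn.execute("INSERT INTO docs VALUES (1, 'db2 net search')")
conn.execute("INSERT INTO docs VALUES (2, 'text extender')")
update_index(conn)  # indexes both new rows

conn.execute("UPDATE docs SET body = 'net search extender' WHERE id = 2")
conn.execute("DELETE FROM docs WHERE id = 1")
print(sorted(postings.get("search", set())))  # → [1] (changes only logged so far)
update_index(conn)
print(sorted(postings.get("search", set())))  # → [2]
```

Because only logged rows are re-read, the cost of an update depends on how much changed rather than on the size of the table, which is the point of incremental index update.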

Indexing Performance with DB2 Net Search Extender
§ DB2 Net Search Extender shows excellent scalability and performance when used with a partitioned database setup.
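A rough picture of why partitioning helps: each partition indexes only its own share of the rows, and a query fans out to all partition indexes before the hits are merged. The Python sketch below assumes modulo hashing on the primary key and uses threads only to show the fan-out and merge structure; in a partitioned DB2 database the partitions are separate database partitions, and Python threads would not give real CPU parallelism for this work. All names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_of(pk: int, n_partitions: int) -> int:
    """Hypothetical partitioning rule: hash (here simply modulo) on the primary key."""
    return pk % n_partitions


def build_partition_index(rows):
    """Index the rows of one partition: word -> set of primary keys."""
    postings = defaultdict(set)
    for pk, text in rows:
        for word in text.lower().split():
            postings[word].add(pk)
    return postings


def build_partitioned_index(docs, n_partitions):
    """Split documents by partition and index every partition concurrently."""
    partitions = [[] for _ in range(n_partitions)]
    for pk, text in docs.items():
        partitions[partition_of(pk, n_partitions)].append((pk, text))
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        return list(pool.map(build_partition_index, partitions))


def search_partitions(indexes, word):
    """Fan the query out to every partition index and merge the hits."""
    word = word.lower()
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        per_partition = list(pool.map(lambda idx: idx.get(word, set()), indexes))
    return sorted(set().union(*per_partition))


docs = {1: "db2 search", 2: "search tuning", 3: "db2 admin", 4: "federated search"}
indexes = build_partitioned_index(docs, n_partitions=2)
print(search_partitions(indexes, "search"))  # → [1, 2, 4]
```

Each partition index is independent, so adding partitions divides the indexing work; the merge step is the only place where results from different partitions meet.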

Summary
§ Search and information mining is a complex problem
– The amount of accessible data (petabytes)
– Diversity of sources, types, and formats
§ Despite heterogeneity, users would like seamless use of all kinds of information
– Parametric and text
– Multilingual
– Without syntax/protocol differences
§ And they want good results!
§ We have core technologies
– Historic trends are toward integrating technologies
– Key IBM products are being extended with search and mining capabilities
– Search and mining technologies are evaluated as standalone products as well as embedded components
– There is a search product available to solve your business problem

Backup charts

More info
§ DB2 V8.1 announcement:
– at: http://www.ibmlink.ibm.com/usalets&parms=H_202-214
– found at: http://www-3.ibm.com/software/data/db2/udb/v8/
§ "What's new in DB2" PDF document:
– http://www-3.ibm.com/software/data/db2/udb/pdfs/db2q0.pdf
§ DB2 Net Search Extender web site with Data Sheet:
– http://www-3.ibm.com/software/data/db2/extenders/netsearch/

Positioning the Three DB2 Text-Based Extenders
§ DB2 Net Search Extender V8: for use with DB2 UDB V8
– The strategic product going forward
– Improves on the capabilities of both TIE and NSE V7
– Merges the functionality of the TIE and NSE V7 products
– Backward compatible with DB2 NSE V7 and TIE V7 applications
§ DB2 Net Search Extender V7
– Designed to support web site traffic
– Uses a faster underlying search engine than Text Extender
– Caches all potential results
– Data scalability limited only by physical memory
– Less SQL functionality and flexibility than the Text Information Extender
§ DB2 Text Information Extender (TIE) V7.2
– Uses the same underlying search engine as NSE
– Has the SQL flexibility of Text Extender
§ DB2 Text Extender (TE): the original text extender
– Limited new investment in this Extender
– High functionality but limited scalability

DB2 Net Search Extender: Formats and Languages
§ The supported text document formats are:
– HTML: Hypertext Markup Language (document models supported)
– XML: Extensible Markup Language (document models supported)
– GPP: General Purpose format (flat text with user-defined tags; document models supported)
– TEXT: flat text
– INSO: plug-in for the Outside In filtering software by Stellent
§ Language support is defined as follows:
– Tokenization of textual data
– Language-specific processing where required (e.g., a "new paragraph" indicator for Hindi)
– Support for DBCS languages using the proven bi-gram approach to tokenization
§ Languages/codesets are grouped as follows:
– 19 Group One languages (English through Korean)
– 15 Group Two languages (Arabic through Turkish)
– 17 Group Three languages (Albanian through Vietnamese)
– 5 Group Four languages (Indonesian through Telugu/India)
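The bi-gram approach for DBCS languages can be illustrated briefly. Chinese and Japanese text is written without spaces between words, so instead of segmenting words with a dictionary, each run of such characters is indexed as overlapping two-character tokens; a query run is tokenized the same way and matched token by token. The Python sketch below assumes approximate Unicode ranges for the scripts involved and is not NSE's tokenizer; `bigram_tokenize` is a hypothetical name.

```python
import re

# Approximate "double-byte" script ranges for this sketch: Hiragana/Katakana,
# CJK ideographs (including Extension A) and Hangul syllables.
DBCS_RUN = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]+")
WORD = re.compile(r"\w+")


def bigram_tokenize(text):
    """Words for alphabetic text, overlapping character bigrams for DBCS runs."""
    tokens = []
    pos = 0
    for match in DBCS_RUN.finditer(text):
        tokens.extend(WORD.findall(text[pos:match.start()].lower()))
        run = match.group()
        if len(run) == 1:
            tokens.append(run)  # a lone character is its own token
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        pos = match.end()
    tokens.extend(WORD.findall(text[pos:].lower()))
    return tokens


print(bigram_tokenize("DB2 全文検索"))  # → ['db2', '全文', '文検', '検索']
```

A query for 検索 produces the single token 検索, which is among the document's tokens, so it matches without any word dictionary. The cost is some false positives and a larger index than word-based tokenization would need.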