IBM Software Group
Business Intelligence at IBM: Architecture and Direction
Isabelle Claverie-Bergé, Certified IT Specialist, IBM Software Group
© IBM Corporation
A strong investment
Extending the value of the data warehouse
§ DB2 family: $1B investment in innovation
§ DB2 V9.5 "Viper 2", DB2 RTI
§ IBM Dynamic Warehouse, IBM OmniFind Analytics
§ IBM BCU, DB2 Data Warehouse Edition V9
Solutions with our partners, to help our clients grow
§ 16,000+ partners and a PartnerWorld support program
§ Consulting and System Integrators
§ BI solution centers in Dallas, Singapore, Tokyo, Hursley
Bringing real-time Business Intelligence closer to the user, and integrating it into SOAs
§ IBM DB2 RTI
§ IBM Information Server
§ IBM MDM
Absorbing growth
IBM DB2 Multi-Partition Concept
§ One database can reside on several separate computers
§ Shared nothing (function shipping)
  – Each partition accesses only its local data
§ Several logical partitions can be on the same machine
  – Physical or logical partitioning is transparent to the database
§ Database catalog on partition 0, DB catalog cache on the other partitions
§ Fast communication needed (Gigabit Ethernet, switch)
Large Wireless Carrier: Understanding customers in real time
Business Challenge
§ 360 Degree Customer View
§ Unified Customer Contact Information
§ Churn prediction
Solution
§ Warehouse w/ Near Real-time Feeds
§ Load over 1B Call Records/Day (up to 1.6B)
§ 10 Billion transactions per day
§ 32 TB Raw Data
§ 1,000s of Concurrent Users
§ 7,000 Customer Care Users
§ Up to 37,000 queries/day
§ DB2 DWE, SAS, 16 x 8 P5 pSeries
Business Benefits
§ Fraud Detection < 4 hours
§ Campaign Responses Up 66-300%
§ Margin per Customer up 20%
Technology Benefits
§ Scale & Performance
§ Users
§ Volatility
§ Data
[Diagram labels: Call Data Records; DB2 Continuous Data Load; 171 TB inc. HA Mirror]
DB2 Delivers Data The Way You Need It
Flexible data partitioning: World's Richest Slice & Dice Capability
§ DISTRIBUTE BY HASH
§ PARTITION BY RANGE
§ ORGANIZE BY DIMENSIONS
[Figure: Distribute: table T1 distributed across 3 database partitions (Node 1, Node 2, Node 3). Partition: table spaces TS1 (Jan) and TS2 (Feb). Organize: dimensions North, South, East, West. Compress (V9).]
No Partitioning
[Figure: Data]
Distribute by Hash: Divide & Conquer Parallelism
[Figure: data distributed across partitions P1, P2, P3, P4]
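A minimal sketch of the hash-distribution idea, in Python rather than DB2. The slide does not show DB2's hashing function or its distribution map, so `zlib.crc32` is only a stand-in, and the names `partition_for`, `distribute` and the sample rows are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # P1..P4 on the slide


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a distribution key to a 0-based partition number.

    crc32 is an assumption used for illustration; it is not DB2's hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(rows, key_column):
    """Place every row on exactly one partition, chosen by hashing its key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(str(row[key_column]))].append(row)
    return dict(partitions)


# Made-up sample rows, distributed on customer_id.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 101)]
placed = distribute(rows, "customer_id")
```

Because the partition depends only on the key, rows with the same key always land on the same partition, which is what lets each partition work on its local data.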
Hash + Partition by Range: Partition Elimination
Massive Parallelism with Massive IO Reduction
[Figure: partitions P1, P2, P3, P4, each split into ranges 2006 and 2005]
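Partition elimination can be sketched as follows, assuming rows are first hashed across four partitions and then grouped by year inside each partition. This is a toy model in Python, not DB2 behaviour; `build`, `total_for_year`, the modulo hash and the sample rows are hypothetical.

```python
from collections import defaultdict


def build(rows, num_partitions=4):
    """storage[hash_partition][year] -> rows (hash is simplified to id % n)."""
    storage = defaultdict(lambda: defaultdict(list))
    for row in rows:
        storage[row["id"] % num_partitions][row["year"]].append(row)
    return storage


def total_for_year(storage, year):
    """Sum amounts for one year, touching only that year's range in each partition."""
    scanned = 0
    total = 0
    for ranges in storage.values():
        for row in ranges.get(year, []):  # other ranges are never read
            scanned += 1
            total += row["amount"]
    return total, scanned


# Made-up data: ids 0..49 belong to 2005, ids 50..99 to 2006.
rows = [{"id": i, "year": 2005 if i < 50 else 2006, "amount": 1} for i in range(100)]
storage = build(rows)
total_2006, rows_scanned = total_for_year(storage, 2006)
```

In this sketch the query for 2006 reads half of the stored rows, while the work is still spread over all four hash partitions.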
Hash + Range + MDC + Compression
High density, High Value, Low IO Reads
[Figure: partitions P1, P2, P3, P4, each split into ranges 2006 and 2005]
IBM DB2 Parallel Processing
§ Two types of parallelism
  – Intra-partition parallelism: parallel processing within one partition
  – Inter-partition parallelism: operations are executed in parallel on each database partition
§ Scalability: SQL query performance proportional to number of partitions (BW environment)
§ All SQL statements: UPDATE, DELETE, INSERT, JOINS, GROUP BY, INDEX/TABLE SCANS, SORT
§ Tools: index creation, backup and restore, table reorganization
[Figure: a SELECT ... FROM ... statement under inter-partition parallelism (distributed table over Database Partition 0 and Database Partition 1) and under intra-partition parallelism (several processes)]
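Inter-partition parallelism can be illustrated as a scatter/gather: each partition computes a partial result on its local rows and a coordinator combines them. This is only a sketch under that assumption; Python threads stand in for database partitions and do not demonstrate a real speed-up. The names `local_sum` and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def local_sum(partition_rows):
    """Work done on one partition, using only its local data."""
    return sum(row["amount"] for row in partition_rows)


def parallel_sum(partitions):
    """Coordinator: run the same operation on every partition, then combine."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_sum, partitions))
    return sum(partials)


# Made-up data: amounts 0..99 spread round-robin over four partitions.
partitions = [[{"amount": a} for a in range(p, 100, 4)] for p in range(4)]
result = parallel_sum(partitions)
```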
IBM Information Management
Introducing IBM Balanced Warehouse™: A fast track to warehousing
Balanced Warehouse: SIMPLE, FLEXIBLE, OPTIMIZED
[Diagram labels: IBM DB2® Warehouse; Simplicity; Reliability & Performance; Extended Insight]
Balanced Configuration Unit (BCU): preconfigured, pretested allocation of software, storage and hardware to support a specified combination of function and scale
Simplicity
§ Predefined configurations for reduced complexity
§ One number to contact for complete solution support
Flexibility for growth
§ Add BCUs to address increasing demands
§ Multiple on-ramps for different needs
§ Reliable, nonproprietary hardware for reusability
Optimized performance
§ Preconfigured and certified for guaranteed performance
§ Based on best practices for reduced risk
Better than an appliance
© 2007 IBM Corporation
DB2 Data Stream Engine Architecture
[Diagram labels: Feed Handler Plug-ins (Market Data Handler, RFID Handler); External Message Bus; Shared Memory; DB2 Data Stream Engine; DSE Query Interface (Realtime/Historical Queries); DB2 Backing Store (DB2 with DPF)]
DB2 Data Stream Engine
• Shared memory management
• Data cache & persistence
• High availability
• Statistics maintenance
• Query processing
DSE Feed Processing & Storage
[Diagram labels: Messages from data source; Feed Handler; Shared Memory; Entity (example: IBM); 10-minute windows :10, :20, :30, :40, :50, 1:00]
Feed handler steps
§ Transform to internal format
§ Update metadata
§ Store events
§ Apply business logic & publish
§ Publish derived data (aggregates, etc.)
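The 10-minute windows and derived aggregates can be sketched like this. It is an illustration only, not the DSE's internal format: the window is identified by its start time, and the function names, the aggregate fields and the sample events are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def window_start(ts: datetime, minutes: int = 10) -> datetime:
    """Floor a timestamp to the start of its window (10 minutes by default)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def aggregate(events):
    """Group (symbol, timestamp, price, volume) events per entity and window."""
    windows = defaultdict(lambda: {"count": 0, "volume": 0, "last_price": None})
    for symbol, ts, price, volume in events:
        agg = windows[(symbol, window_start(ts))]
        agg["count"] += 1
        agg["volume"] += volume
        agg["last_price"] = price  # events are assumed to arrive in time order
    return dict(windows)


# Made-up events for one entity.
events = [
    ("IBM", datetime(2004, 1, 1, 12, 3), 90.2, 1000),
    ("IBM", datetime(2004, 1, 1, 12, 7), 90.3, 500),
    ("IBM", datetime(2004, 1, 1, 12, 12), 90.1, 200),
]
windows = aggregate(events)
```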
DSE Persistence
§ Standard Relational Tables
§ Master Detail Schema
  – Symbol Table (Entities & Metadata)
  – Tick Table (Events)

dse.trade_symbols
symbol_name  symbol_id
IBM          1
HPQ          2

dse.trade_ticks
symbol_id  tstamp                 price  volume
1          Jan 1, 2004 12:00      90.2   1000
1          Jan 1, 2004 12:00:01   90.3   500
2          Jan 1, 2004 12:00      21.7   700
2          Jan 1, 2004 12:00:04   21.6   200
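The master-detail schema can be tried out with a small sketch. It uses Python's built-in `sqlite3` in place of DB2, so the `dse.` schema prefix is dropped and the column types and ISO timestamp format are assumptions; the rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trade_symbols (          -- master: entities and metadata
    symbol_id   INTEGER PRIMARY KEY,
    symbol_name TEXT NOT NULL UNIQUE
);
CREATE TABLE trade_ticks (            -- detail: one row per event
    symbol_id INTEGER NOT NULL REFERENCES trade_symbols(symbol_id),
    tstamp    TEXT    NOT NULL,
    price     REAL    NOT NULL,
    volume    INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO trade_symbols VALUES (?, ?)", [(1, "IBM"), (2, "HPQ")])
conn.executemany("INSERT INTO trade_ticks VALUES (?, ?, ?, ?)", [
    (1, "2004-01-01 12:00:00", 90.2, 1000),
    (1, "2004-01-01 12:00:01", 90.3, 500),
    (2, "2004-01-01 12:00:00", 21.7, 700),
    (2, "2004-01-01 12:00:04", 21.6, 200),
])

# Historical query: total traded volume per symbol, joining detail to master.
volume_by_symbol = conn.execute("""
    SELECT s.symbol_name, SUM(t.volume)
    FROM trade_ticks AS t
    JOIN trade_symbols AS s ON s.symbol_id = t.symbol_id
    GROUP BY s.symbol_name
    ORDER BY s.symbol_name
""").fetchall()
```

Storing the symbol name once in the master table keeps the tick table narrow, which matters when the detail table receives every event.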
Dynamic Warehousing
Every Person, Every Transaction, Every Asset…
[Diagram labels: Unstructured Information, Extracted Knowledge, Heterogeneous Content; Search and Text Analytics; Extended Data Warehouse Capabilities; Mixed Workload Performance; Scalability & Configurability; Analytics as Part of a Business Process Management; In-line Analytics; Real-time Access, In-context; Information Integration; Master Data Management; Industry Specific Models]
Extended Insight: Introducing IBM OmniFind Analytics Edition
§ Rich analysis interface for combining structured and unstructured data
§ Combines search, text analytics and data visualization

Unstructured analytics framework
Structured data → extracted metadata
Original data          Category / Item
Call Taker: James      [Call Taker] James
Date: Aug. 30, 2002    [Date] 2002/08/30
Duration: 10 min.      [Duration] 10 min.
CustomerID: ADC00123   [CustomerID] ADC00123

Unstructured data → linguistic analysis
Original data: Q: I do not know how to install an additional harddisk in NetVista. I need quick support.
Category / Item: [product] harddisk; [product] NetVista; [request] install; [service] support

Analysis tools: search, visualization and interactive mining
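The mapping from original data to category/item pairs can be mimicked with a toy sketch. It is not OmniFind's linguistic analysis: the term dictionary `TERMS` is a hypothetical hand-written lookup built from the slide's example, and the function names are assumptions.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical dictionary standing in for real linguistic analysis.
TERMS = {"harddisk": "product", "netvista": "product",
         "install": "request", "support": "service"}


def normalize_date(text):
    """Turn 'Aug. 30, 2002' into '2002/08/30'; return other text unchanged."""
    m = re.fullmatch(r"([A-Z][a-z]{2})\. (\d{1,2}), (\d{4})", text)
    if not m or m.group(1) not in MONTHS:
        return text
    month = MONTHS.index(m.group(1)) + 1
    return f"{m.group(3)}/{month:02d}/{int(m.group(2)):02d}"


def extract_structured(fields):
    """Structured fields become (category, item) pairs, with dates normalized."""
    return [(name, normalize_date(value) if name == "Date" else value)
            for name, value in fields.items()]


def extract_terms(text):
    """Tag known words in free text with their category, in order of appearance."""
    tags = []
    for word in re.findall(r"[A-Za-z]+", text):
        category = TERMS.get(word.lower())
        if category and (category, word) not in tags:
            tags.append((category, word))
    return tags


record = {"Call Taker": "James", "Date": "Aug. 30, 2002",
          "Duration": "10 min.", "CustomerID": "ADC00123"}
question = ("I do not know how to install an additional harddisk in NetVista. "
            "I need quick support.")
metadata = extract_structured(record)
tags = extract_terms(question)
```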
MDM and Data Warehousing
§ Master Data Management (MDM) and Data Warehousing (DW) complement each other; they have significant synergies
  – MDM and DW both provide quality data to the business, but MDM is valuable beyond the DW for two reasons
    • Latency
    • Feedback
  – MDM and DW have different use cases
    • MDM provides a “golden” source of truth that is used collaboratively for authoring and operationally in the transactional / operational environment, and that supports the delivery of "quality" Master Data to a DW system
    • DW systems are a multidimensional collection of historical transactional data, which may include more than Master Data, used to determine trends and create forecasts
    • Introducing MDM enhances the value of existing DWs by improving data integrity and closing the loop with transaction systems
[Diagram labels: Analytic Services (DW Models, Identity Services & Predictive Analytics); Services; Metadata; Data]
End
Research topics
IBM Master Data Management: Data Federation
§ Applicable MDM Services allow for federation of data from the MDM domains as well as additional sources
§ Thus providing the requesting application with all relevant data in a synchronized manner
§ Integrated with IBM DB2 Information Integrator
§ Example:
  – The requesting application submits a request for the MDM “GetParty” service; MDM is configured to initiate retrieval of data from a non-master data source using DB2 Information Integrator; this data is included in the response to the requesting application; the data federation activity is transparent to the requesting application
[Diagram labels: Requesting Application; MDM Service Request; MDM Service; DATA; Response includes MDM data augmented with data from other sources; Information Server; DB2 Database(s); Other Database(s)]
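The GetParty example can be pictured with a minimal sketch. It is not the IBM MDM API: the in-memory dictionaries stand in for the master data repository and a non-master source, and every name and value here (`get_party`, `fetch_external_orders`, the party and order records) is made up for illustration.

```python
# Hypothetical stand-ins for the master data repository and another source.
MASTER_PARTIES = {"P-1": {"party_id": "P-1", "name": "Example Customer"}}
EXTERNAL_ORDERS = {"P-1": [{"order_id": "O-7", "total": 120.0}]}


def fetch_external_orders(party_id):
    """Retrieve data from the non-master source (federation step)."""
    return EXTERNAL_ORDERS.get(party_id, [])


def get_party(party_id, federated_sources=None):
    """Return master data, augmented with data from configured extra sources.

    The caller only sees one response; it does not know which fields were
    federated in from elsewhere.
    """
    party = dict(MASTER_PARTIES[party_id])  # copy, so master data is untouched
    for field, fetch in (federated_sources or {}).items():
        party[field] = fetch(party_id)
    return party


response = get_party("P-1", {"orders": fetch_external_orders})
```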
IBM MDM – Common Components (1/2) IBM MASTER DATA MANAGEMENT
IBM MDM – Common Components (2/2)
IBM MASTER DATA MANAGEMENT
§ Integration Services
§ Lifecycle Management Services
§ Hierarchy & Relationship Management
§ Master Data Event Management
§ Information Integrity
§ Authoring
§ Base Services
§ Master Data Repository: Metadata, Reference Data, Master Data, History Data
IBM Master Data Management
Sophisticated data integration: faster implementation time and lower cost of ownership than competitors
[Diagram labels: Source Systems; IBM Information Server (Understand, Clean, Transform, Deliver); IBM Master Data Management (Collaborative MDM, Operational MDM, Analytical MDM); Customer, Product, Supplier, Customer / Shipping Location, Account; Event Management; Data Quality Management; Data Governance; Industry SOA Business Processes]
From the Data Warehouse to the Information Warehouse
The Information Warehouse is an enterprise warehouse that can deliver the right version of the information (single version of the truth) in its broader context, hosted in a single, scalable database.
[Diagram labels: Information Warehouse; Information Integration; SOA; Industry Models & Solutions; Integrated Data Warehouse; Mining; Master Data Management; ETL; OLAP; Enterprise Data Warehouse; Entity Analytics; In-Line Analytics]
The information warehouse
[Diagram labels: Data Architect; BI Designer; DBA; BI Specialist; Integrated Design Center (Data Modeling, Data Transform, Data Mining, OLAP Enablement, In-Line Analytics); DSS Applications; Web-based Administration Console; DB2 UDB ESE; Enterprise Information Integration; Triggers; MQs; ODS; Stored Procs; DFs; EDW; WebSphere II OmniFind Edition; Text Plug-In Annotator; Event Categorization; Find Words & Roots; Predict; Identify Language; Search Index; Extracted Metadata and Facts; Data Warehouse; Rules Engine; Search Application; Reports; Any Application]
The Information service
From project mode to a flexible architecture (SOA)
§ Tools & Applications: dashboards, real time and streams, intelligence
§ Standards-based: e.g., XQuery, JSR 170, JDBC, Web Services...
§ Information as a Service: information, data & content, metadata management
§ Heterogeneous Applications & Information: DB2, abc…, IBM Content Manager, xyz…, Oracle
§ Real time: e.g., tailored online help, reference data synchronization…
§ Extraction: e.g., Basel II, business optimization…
§ And more…
Topic 1: UIMA
[Diagram labels: Text, Chat, Email, Audio, Video; Collection Processing Engine (CPE): Collection Reader, CAS Initializer, CAS, Aggregate Analysis Engine (Analysis Engine, Annotator), CAS Consumers; Ontologies; Search Engine Index; DBs; Knowledge Bases]
Identify Relevant Entities → Build Structure
§ People, Places, Organizations, Relationships
§ Parts, Problems, Conditions
§ Topics, Products, Interests, Sentiment
§ Times, Events, Threats, Plots, Associations
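The flow through a collection processing engine (reader, annotators, consumer) can be sketched in a few lines. This is a toy Python model, not the Apache UIMA API: a plain dictionary stands in for the CAS, and all function names, the `PRODUCTS` set and the sample texts are hypothetical.

```python
import re

PRODUCTS = {"netvista", "harddisk"}  # hypothetical entity dictionary


def collection_reader(texts):
    """Yield one CAS-like container per document."""
    for doc_id, text in enumerate(texts):
        yield {"id": doc_id, "text": text, "annotations": []}


def token_annotator(cas):
    """Annotate every word with its lowercase form and character span."""
    for m in re.finditer(r"\w+", cas["text"]):
        cas["annotations"].append(("token", m.group().lower(), m.start(), m.end()))


def product_annotator(cas):
    """Build on token annotations to mark known product names."""
    for kind, value, begin, end in list(cas["annotations"]):
        if kind == "token" and value in PRODUCTS:
            cas["annotations"].append(("product", value, begin, end))


def index_consumer(index, cas):
    """Consumer: store entity annotations in an inverted index."""
    for kind, value, _, _ in cas["annotations"]:
        if kind != "token":
            index.setdefault((kind, value), set()).add(cas["id"])


def run_pipeline(texts, annotators):
    index = {}
    for cas in collection_reader(texts):
        for annotate in annotators:  # the ordered list plays the aggregate engine
            annotate(cas)
        index_consumer(index, cas)
    return index


index = run_pipeline(
    ["Install a harddisk in NetVista", "Need quick support"],
    [token_annotator, product_annotator],
)
```

Each annotator only reads and appends to the shared container, so annotators can be added or reordered without changing the reader or the consumer.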
How OmniFind Enables UIMA Solutions
Provides a supported UIMA implementation to deliver text analytics capabilities
[Diagram labels: Crawlers; Parsing; Text; Base Annotators (Identify Language, Find Words & Roots, Parts of Speech); Indexing; OmniFind Index; OmniFind Enhanced Metadata; Searching]
How OmniFind Enables UIMA Solutions (continued)
Provides a supported UIMA implementation to deliver text analytics capabilities
[Diagram labels: OmniFind; Text; Collection Processing Engine (Identify Language, Find Words & Roots, Parts of Speech, Named-entity extraction, Identify Relationships); Third Party Annotators; OmniFind Index; External Data Store; Enhanced Metadata; Third Party Applications]
Topic 2: XML storage in DB2 9 pureXML
[Diagram labels: OmniFind; Text; Collection Processing Engine (Identify Language, Find Words & Roots, Parts of Speech, Named-entity extraction, Identify Relationships); Third Party Annotators; OmniFind Index; Enhanced Metadata; Third Party Applications]
Reference Architecture for Event-Driven Middleware
[Diagram labels: Data Source 1 … Data Source N; Intelligent, Time-dependent, Pub/Sub, and Routing Hub; App 1 … App N; DBMS (short-term storage); ETL; Data Warehouse (long-term storage)]
DB Tradeoffs for Event-Handling
1. Latency for Consistency
2. Throughput for Persistence
ESB responsible for:
§ High-throughput data handling
§ Low-latency messaging and routing
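The hub in this architecture can be reduced to a small publish/subscribe sketch. It is an assumption-laden toy, not an ESB: the `Hub` class, its topic names and the in-memory list used as short-term storage are all hypothetical.

```python
from collections import defaultdict


class Hub:
    """Topic-based pub/sub hub that also keeps events in short-term storage."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.short_term_store = []  # stand-in for the DBMS in the diagram

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Persisting before routing is a design choice of this sketch; it
        # trades some latency for persistence, one of the tradeoffs listed.
        self.short_term_store.append((topic, event))
        for handler in self._subscribers[topic]:
            handler(event)


hub = Hub()
trades_seen, rfid_seen = [], []
hub.subscribe("trades", trades_seen.append)   # App 1
hub.subscribe("rfid", rfid_seen.append)       # App N

hub.publish("trades", {"symbol": "IBM", "price": 90.2})  # Data Source 1
hub.publish("rfid", {"tag": 17, "x": 3.5, "y": 1.0})     # Data Source N
```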
Requirements for Event-Driven Applications
[Chart: increasing capability along several axes, listed from lowest to highest]
§ Event throughput (events/sec/server): 1's, 10's, 100's, 1000's, 10,000's, 100,000's
§ Responsiveness: near real-time (< sec); soft real-time (scheduled, ms); hard real-time (deterministic, µs)
§ Richness (Event Processing Language): message-at-a-time filter/route; sequences, thresholds, groups; general multi-stream pattern specifications; trained patterns; untrained patterns; inductive reasoning
§ Scalability: single server; event server clusters; managed ESB with event services; collaborating domains; Internet scale (100,000's endpoints)
§ Ease of use: simple event pattern tool support; tools for designing event flow; tools for distributed deployment; integration with processes, workflows; tools for integrating content behavior models
§ Reference points: Transactional, OLTP, OLAP, Data Mining, Data Warehouse
Middleware for Time-Dependent Internet Traffic
[Chart: same axes and scales as the "Requirements for Event-Driven Applications" chart (event throughput, responsiveness, richness, scalability, ease of use), with Internet Traffic positioned on it]
Middleware for RFID Applications
[Chart: same axes and scales as the "Requirements for Event-Driven Applications" chart, with RFID for retail, distribution, manufacturing positioned on it]
Middleware for Surveillance Applications
[Chart: same axes and scales as the "Requirements for Event-Driven Applications" chart, with Surveillance Markets positioned on it]
Middleware for Financial Services
[Chart: same axes and scales as the "Requirements for Event-Driven Applications" chart, with Financial market information and program trading positioned on it]
Intelligence Application
[Pyramid, bottom to top: Signal (sensors); Data (streams); Information (facts); Knowledge (fact relationships); Intelligence (applied knowledge); Control]
Sources: E-mail, Voice, Image, Video, IMS, TV/Radio Broadcast, Web Traffic, etc.
§ Daily Internet Traffic Volume: 2002: 23 PB; 2007: 647 PB (est.)
§ Email: 1999: 610 Billion Emails (11 PB); 2002: 11 Trillion Emails; 2006: 22 Trillion Emails (est.)
§ Telephony: 2002: 187 Billion minutes; emerging VoIP
§ Instant Messaging: 2002: 41 Million users; 2003: 275 Million users
Streaming Data Example: Soccer Events
[Diagram labels: Informix Real-time Loader (RTL/DSE); In-Memory Database; Periodic writes to database; Informix Dynamic Server; DB2; SQL queries; Timestamped data history]
§ Ball, players, and referees are RF tagged (26 transmitters)
§ Position and speed data are streamed to RTL/DSE (± 1.5 cm, 100K messages/s)
§ RTL/DSE stores time-stamped data in the database at a rate of 7K-12K messages/sec
§ Prototyped and planned for use in World Cup Soccer 2006
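The pattern of an in-memory buffer with periodic writes to a database can be sketched as below. This is an illustration, not the RTL/DSE implementation: `sqlite3` replaces Informix/DB2, the flush is triggered by a message count rather than a timer, and the class, table and column names are assumptions.

```python
import sqlite3


class PeriodicWriter:
    """Buffer incoming position messages in memory and write them in batches."""

    def __init__(self, conn, batch_size):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS positions (tag INTEGER, ts REAL, x REAL, y REAL)"
        )

    def on_message(self, tag, ts, x, y):
        self.buffer.append((tag, ts, x, y))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write the whole buffer in one transaction, then empty it."""
        if self.buffer:
            self.conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?)", self.buffer)
            self.conn.commit()
            self.buffer.clear()


conn = sqlite3.connect(":memory:")
writer = PeriodicWriter(conn, batch_size=100)

# Made-up stream: 26 transmitters, 10 time steps each.
for step in range(10):
    for tag in range(26):
        writer.on_message(tag, step * 0.01, float(tag), float(step))

stored_before_flush = conn.execute("SELECT COUNT(*) FROM positions").fetchone()[0]
writer.flush()
stored_after_flush = conn.execute("SELECT COUNT(*) FROM positions").fetchone()[0]
```

Batching is what lets the stored rate (7K-12K messages/sec on the slide) sit below the incoming rate while recent data stays queryable in memory.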