Context Knowledge Management for Armament Safety
Stuart Madnick, Lynn Wu
MIT Sloan School of Management
{smadnick, linwu}@mit.edu

Information Integration & Re-Use Projects
Stuart Madnick (smadnick@mit.edu): Context Knowledge Management Approach to “Armament Safety Management”
Numbered projects in the diagram:
(1) COntext INterchange (COIN)
(2) Stakeholder Perceptions of Security
(3) Economic model of alternatives to EU Database Directive
(4) System Dynamics Modeling of State Stability
(5) Total Data Quality (TDQM) Program
Other diagram labels: MIT Information Quality (MIT-IQ) Program; Technologies; Applications (account aggregation); RFID; IT Infrastructure; Others …; Security Analysis; Military Logistics; Data Quality; Pros and cons of data standards; Financial Services; Strategy, Policy & Legal Issues; Security

COntext INterchange (COIN) Project
[Diagram: Receivers (Applications, Browsers, ODBC Driver, Web Publishing, TRUSTED AGENTS); OUTPUT PROCESSING; CONTEXT MEDIATION; INPUT PROCESSING; Sources (Data bases, Web Pages, Semistructured text)]
* Automatic conflict detection and conversion
* Automatic web wrapping
- Derived data
- Source selection
- Source attribution
- Multi-source query plan and execution
APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.

Key COIN Technologies
• Web Wrapper
  - Extract selected information from web (HTML+XML …
  - Allows web to be treated as large relational SQL database
  - Can handle dynamic web sites, cookies, “login”, etc.
  - Performs SQL Joins & Unions involving DB’s + Web
• Context Mediator
  - Resolve semantic (meaning) differences
  - Enable meaningful aggregation & comparison

Context: Multiple Perspectives ... old lady or young lady?

Role of Context
[Diagram: the date strings 06-05-07, 05-06-07 and 07-06-05, shown with the $, £ and ¥ contexts and a "?"]
CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)
Data: Databases, Web data, E-mail
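
The three date strings on this slide can be read as one calendar date under three conventions. A minimal Python sketch, assuming the $ context writes month-day-year, the £ context day-month-year, and the ¥ context year-month-day (the slide does not spell these formats out; `CONTEXT_DATE_FORMATS` and `interpret` are illustrative names):

```python
from datetime import date, datetime

# Assumed date conventions per context; the slide only shows the symbols.
CONTEXT_DATE_FORMATS = {
    "$": "%m-%d-%y",  # month-day-year
    "£": "%d-%m-%y",  # day-month-year
    "¥": "%y-%m-%d",  # year-month-day
}


def interpret(text: str, context: str) -> date:
    """Parse a date string according to the convention of its context."""
    return datetime.strptime(text, CONTEXT_DATE_FORMATS[context]).date()


# The three strings from the slide, each read in its own context.
readings = {
    "$": interpret("06-05-07", "$"),
    "£": interpret("05-06-07", "£"),
    "¥": interpret("07-06-05", "¥"),
}

# The same string read in the wrong context silently gives another date.
misread = interpret("06-05-07", "£")
```

Under these assumptions all three readings agree, while `misread` is a valid but different date, which is why the context has to travel with the data.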

Types of Context: Representational, Ontological, Temporal
Representational
  Example: Currency: $ vs €; Scale factor: 1 vs 1000
  Temporal: Francs before 2000, € thereafter
Ontological
  Example: Revenue: includes vs excludes …
  Temporal: Revenue: excludes interest before 1994 but incl. …

The 1999 Overture
Unit-of-measure mixup tied to loss of $125 Million Mars Orbiter
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. ... The navigators (JPL) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin (the contractor).”
Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.

Context Knowledge Management for Armament Safety: Motivation
• Context Knowledge Management is an important challenge.
• Semantic inconsistency is present in databases, even in the military.
  - For example, what does accident rate really mean?
• Army Ground Accident Rate: # accidents/period-of-time
  1. Per year
  2. Per month
  3. Per total actual personnel strength
  4. Per operational personnel strength
• How do we address such semantic inconsistencies?
  - How do we interpret different accident rates?
  - Need context knowledge management

Motivating Example

Unit A
  Weapon: A123
  Accident Rate: 0.01 per week
  Injury Rate: 77 per month per pro-rated strength
  Nuclear Test Safety Exclusion Zone: 2500 feet
  Radioactivity: 1 curie
Unit B
  Weapon: A123
  Accident Rate: 0.52 per year
  Injury Rate: 170 per month per personnel strength
  Nuclear Test Safety Exclusion Zone: 762 meters
  Radioactivity: 3.7 x 10^10 Bq

Contexts (semantic heterogeneity):
  0.01/week ↔ 0.52/year
  77/week/prs ↔ 170/ps
  2500 feet ↔ 762 meters
  1 curie ↔ 3.7 x 10^10 Bq

In the military, there are many ways to measure safety.
1. Accident and injury rates can be measured on a per-week, per-month, or per-year basis.
2. Nuclear testing data generally uses the U.S. customary measurement system, since most nuclear testing has been done in the US. To conform with international standards, the US government has slowly been converting the units to the metric system. However, even within the metric system, there is confusion between SI units and non-SI units.
Disclaimer: The data above is artificial and is used for demonstration only.
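
The three unit equivalences on this slide can be checked with a few lines of Python. This is a sketch only: the function names are illustrative, and the 52-weeks-per-year factor is the one implied by the slide's 0.52/year figure.

```python
WEEKS_PER_YEAR = 52        # implied by the slide's per-week and per-year rates
METERS_PER_FOOT = 0.3048   # exact by definition of the international foot
BQ_PER_CURIE = 3.7e10      # exact by definition of the curie


def per_week_to_per_year(rate: float) -> float:
    """Convert a rate counted per week into a rate counted per year."""
    return rate * WEEKS_PER_YEAR


def feet_to_meters(feet: float) -> float:
    """Convert a length in feet to meters."""
    return feet * METERS_PER_FOOT


def curies_to_becquerels(curies: float) -> float:
    """Convert radioactivity in curies to becquerels."""
    return curies * BQ_PER_CURIE


accident_rate_per_year = per_week_to_per_year(0.01)  # Unit A's weekly rate
exclusion_zone_m = feet_to_meters(2500)
activity_bq = curies_to_becquerels(1)
```

The injury-rate pair (per pro-rated strength vs. per personnel strength) is left out because the slide does not give the strength figures needed to convert between them.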

Source Context Differences

Unit A
  Accident Rate: Army Ground Accident Rate (per week)
  Injury Rate: Active Army Military Injury Rate (per month)
  Nuclear Test Safety Exclusion Zone (radius): Meter
  Radioactivity: Curie
Unit B
  Accident Rate: Army Ground Accident Rate (per year)
  Injury Rate: USAR & ARNG Military Injury Rate (per month)
  Nuclear Test Safety Exclusion Zone (radius): Meter
  Radioactivity: Becquerel
Unit C
  Accident Rate: Army Ground Accident Rate (per week)
  Injury Rate: Active Army Military Injury Rate (per week)
  Nuclear Test Safety Exclusion Zone (radius): Kilometer
  Radioactivity: TBq
Unit D
  Accident Rate: Army Ground Accident Rate (per month)
  Injury Rate: Army Civilian Employee Injury Rate (per month)
  Nuclear Test Safety Exclusion Zone (radius): Feet
  Radioactivity: MBq

Scenario
• A general wants to see a composite report on all four units.
  - Direct queries on all four units would return incomparable data.
  - Without mediation, Unit B seems to be doing poorly.

         Accident Rate   Injury Rate   Exposure   Radioactivity
Unit A   0.01            0.037         762        1
Unit B   2               0.08          762        37 x 10^10
Unit C   0.05            0.08          0.762      0.037
Unit D   0.028           0.01          2500       37.04

Standardization: often not a solution
• Works in small systems.
• Legitimate reasons for diversity (e.g., different needs) lead to multiple standards
  - Unit 1 uses accident rate per year
  - Unit 2 uses accident rate per month
• Standards are costly to develop
  - DoD started data standardization in 1991; by 2000, it had standardized only ~1.2% of 1 million data elements*
• Standards evolve over time
  - Nuclear tests used the US customary measurement system; the field is now moving toward the SI standard
* Rosenthal, A., Seligman, L. and Renner, S. (2004) "From Semantic Integration to Semantics Management: Case Studies and a Way Forward", ACM SIGMOD Record, 33(4), 44-50.

The Context Interchange Approach
[Diagram: Shared Ontologies (Concept: Accident Rate); Conversion Libraries (f(): Per Week, Per Year); Context Management Administrator; Context Mediator; Source Context; Receiver Context; Context Transformation]
1. Receiver query: Select accidentRate From unitA
2. Mediated query: Select accidentRate x 52 From unitA
3. Source value 0.01 is returned to the receiver as 0.52
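
The three numbered steps can be sketched in Python as a query rewrite driven by declared contexts. This is an illustration under stated assumptions, not COIN's implementation: `Context`, `PERIOD_FACTORS`, `conversion_factor` and `mediate_query` are hypothetical names, and only the week-to-year factor of 52 comes from the slide.

```python
from dataclasses import dataclass

# Conversion library: factor from a source period to a receiver period.
# Only the week -> year factor (52) appears on the slide.
PERIOD_FACTORS = {
    ("week", "year"): 52.0,
    ("year", "week"): 1 / 52.0,
}


@dataclass(frozen=True)
class Context:
    name: str
    period: str  # reporting period of accidentRate in this context


def conversion_factor(source: Context, receiver: Context) -> float:
    """Look up the factor that turns a source value into a receiver value."""
    if source.period == receiver.period:
        return 1.0
    return PERIOD_FACTORS[(source.period, receiver.period)]


def mediate_query(column: str, table: str, source: Context, receiver: Context) -> str:
    """Rewrite the receiver's query so the answer arrives in the receiver's context."""
    factor = conversion_factor(source, receiver)
    if factor == 1.0:
        return f"SELECT {column} FROM {table}"
    return f"SELECT {column} * {factor:g} FROM {table}"


unit_a = Context("unitA", "week")      # source reports per week
receiver = Context("receiver", "year")  # receiver expects per year

mediated = mediate_query("accidentRate", "unitA", unit_a, receiver)
result = 0.01 * conversion_factor(unit_a, receiver)
```

The receiver never writes the `* 52` itself; it appears only because the two declared contexts differ, and it disappears when they match.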

Aggregated results in receiver context of Unit C
Columns: Accident Rate (per week), Injury Rate (per week), Nuclear Test Safety Exclusion Zone (kilometer), Radioactivity (TBq); values are shown with mediation and with no mediation.
Unit A: 0.1232  0.01  0.009  0.037  0.762  0.037  1
Unit B: 0.038  2  0.08  0.762  0.037  37 x 10^10
Unit C: 0.05  0.08  0.762  0.037
Unit D: 0.07  0.028  0.1234  0.01  0.762  2500  0.037  37.04

Conclusion
• Many different contexts are used to evaluate safety measurements within the military.
• An aggregator is needed to gather and integrate the various data.
• Automatic context mediation plays a critical role.
• Context Interchange enables meaningful aggregation.
• For more information: http://context2.mit.edu/coin

Another Example: Regional Comparison Shoppers
US, Sweden, France, UK

COIN Conceptual Model (Ontology)

Ontology and Conversion Function
[Diagram labels: format, temporalEntity, basic, scaleFactor, currency, monetaryValue, kind, price, taxRate, organization; Legend: is_a relationship, attribute, modifier]
Example source: src_turkey(Product, Vendor, QuoteDate, Price)
context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy.mm.dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy
context_d is_a context_b: scaleFactor: 1e3
context_e is_a context_d: format: yyyy-mm-dd
context_f is_a context_c: kind: base+tax
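
The is_a links between contexts can be read as inheritance with overrides. A minimal Python sketch of that reading follows; the modifier values are copied from the slide, while the `CONTEXTS` layout and the `resolve` helper are illustrative and not COIN's actual representation.

```python
# Each entry: (parent context or None, modifiers declared locally).
CONTEXTS = {
    "context_a": (None, {"currency": "KRW", "scaleFactor": 1000,
                         "kind": "base", "format": "yyyy.mm.dd"}),
    "context_b": (None, {"currency": "TRL", "scaleFactor": 1e6,
                         "kind": "base+tax", "format": "dd-mm-yyyy"}),
    "context_c": (None, {"currency": "USD", "scaleFactor": 1,
                         "kind": "base+tax+SH", "format": "mm/dd/yyyy"}),
    "context_d": ("context_b", {"scaleFactor": 1e3}),
    "context_e": ("context_d", {"format": "yyyy-mm-dd"}),
    "context_f": ("context_c", {"kind": "base+tax"}),
}


def resolve(name: str) -> dict:
    """Return the full modifier set of a context by following is_a links."""
    chain = []
    current = name
    while current is not None:
        parent, modifiers = CONTEXTS[current]
        chain.append(modifiers)
        current = parent
    resolved = {}
    for modifiers in reversed(chain):  # ancestors first, local overrides last
        resolved.update(modifiers)
    return resolved


context_e = resolve("context_e")
context_f = resolve("context_f")
```

With this reading, context_e only has to state what differs from context_d, which in turn only states what differs from context_b.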

Demo – Same Context
No semantic differences
Meaningful data returned

Compose only relevant conversions (b → e)
(a) Select Vendor, Price From src_turkey Where Product=“Samsung SyncMaster 173P”;
    Conversion for scale factor
(b) Select Vendor, QuoteDate, Price From src_turkey Where Product=“Samsung SyncMaster 173P”;
    Conversion for scale factor
    Conversion for date format
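
One way to see why query (a) needs one conversion and query (b) needs two is to compare only the modifiers attached to the selected columns. The Python sketch below assumes that Price carries currency, scaleFactor and kind, that QuoteDate carries format, and that context_e inherits everything else from context_b; the column-to-modifier mapping and the function name are hypothetical.

```python
# Fully resolved modifier values for the source (b) and receiver (e) contexts.
CONTEXT_B = {"currency": "TRL", "scaleFactor": 1e6,
             "kind": "base+tax", "format": "dd-mm-yyyy"}
CONTEXT_E = {"currency": "TRL", "scaleFactor": 1e3,
             "kind": "base+tax", "format": "yyyy-mm-dd"}

# Assumed mapping from columns of src_turkey to the modifiers they carry.
COLUMN_MODIFIERS = {
    "Vendor": [],
    "QuoteDate": ["format"],
    "Price": ["currency", "scaleFactor", "kind"],
}


def relevant_conversions(columns, source, receiver):
    """List (column, modifier) pairs whose values differ between contexts."""
    needed = []
    for column in columns:
        for modifier in COLUMN_MODIFIERS[column]:
            if source[modifier] != receiver[modifier]:
                needed.append((column, modifier))
    return needed


query_a = relevant_conversions(["Vendor", "Price"], CONTEXT_B, CONTEXT_E)
query_b = relevant_conversions(["Vendor", "QuoteDate", "Price"], CONTEXT_B, CONTEXT_E)
```

Because the date format differs between the two contexts but query (a) never selects QuoteDate, no date conversion is composed for it.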

Auto-reconciliation for auxiliary source (b → f)
Introduced because of context difference in auxiliary source

Detection and Explication (b → a)

Mediated Query (b → a)
Annotations on the query:
- Date format for receiver
- Price definition – remove tax
- Scale factor
- Date format for auxiliary source olsen
- Currency

Interoperate: hard-wired approaches
(a) BFS approach: Brute-force between pair-wise sources
(b) BFC approach: Brute-force between contexts
(c) Internal standard approach: Adopting a standard
[Diagrams: sources numbered 1 to 6 and an internal standard]
context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy-mm-dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy
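
A rough count of how many conversion programs each hard-wired approach needs makes the comparison concrete. The slide gives no formulas, so the Python sketch below rests on explicit assumptions: conversions are directed (one program per ordered pair), and with an internal standard each source needs one conversion into the standard and one out of it. The function names are illustrative.

```python
def bfs_conversions(num_sources: int) -> int:
    """Brute force between sources: one program per ordered pair of sources."""
    return num_sources * (num_sources - 1)


def bfc_conversions(num_contexts: int) -> int:
    """Brute force between contexts: one program per ordered pair of contexts."""
    return num_contexts * (num_contexts - 1)


def standard_conversions(num_sources: int) -> int:
    """Internal standard: each source converts to and from the standard."""
    return 2 * num_sources


# The diagram shows six sources; three contexts (a, b, c) are listed.
counts = {
    "BFS": bfs_conversions(6),
    "BFC": bfc_conversions(3),
    "standard": standard_conversions(6),
}
```

Under these assumptions the pairwise counts grow quadratically while the internal standard grows linearly, but every one of those programs still has to be written and maintained by hand.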

Flexibility and Scalability
Not flexible: Need to update/add many conversion programs.
Flexible: Update the declarative knowledge base.
• Why can other approaches not fully benefit from general-purpose conversions?
  - The decision whether to invoke a conversion is made inside the conversion program

How COIN Scales
• Semantic differences cannot be standardized away
• Must be flexible and scalable
• Component conversions are defined for each modifier
• Overall conversions are automatically composed by an abductive reasoning engine
• Composition via symbolic equation solver and a shortest path algorithm
• Inheritance enabled
• COIN is a good solution
  - Modularization, declarativeness
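
The idea of composing an overall conversion from component conversions along a shortest path can be illustrated with a small graph search. The Python sketch below is a stand-in only: it uses plain breadth-first search over hypothetical length-unit conversions, not COIN's abductive reasoning engine or symbolic equation solver, and all names in it are illustrative.

```python
from collections import deque

# Hypothetical component conversions; each key is an edge (from_unit, to_unit).
COMPONENTS = {
    ("feet", "meter"): lambda x: x * 0.3048,
    ("meter", "feet"): lambda x: x / 0.3048,
    ("meter", "kilometer"): lambda x: x / 1000,
    ("kilometer", "meter"): lambda x: x * 1000,
}


def shortest_path(start: str, goal: str) -> list:
    """Breadth-first search for the fewest-step chain of component conversions."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for src, dst in COMPONENTS:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    raise ValueError(f"no conversion path from {start} to {goal}")


def compose(start: str, goal: str):
    """Build one conversion function by chaining the components on the path."""
    path = shortest_path(start, goal)
    steps = [COMPONENTS[(a, b)] for a, b in zip(path, path[1:])]

    def converted(value: float) -> float:
        for step in steps:
            value = step(value)
        return value

    return converted


feet_to_km = compose("feet", "kilometer")
zone_km = feet_to_km(2500)
```

No feet-to-kilometer program is written by hand here; it falls out of the two component conversions, which is the scaling argument the slide makes.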

The 1805 Overture
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20. The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind. The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.
Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.

EXTRA SLIDES

Yet Another Context Example (Basis for Demo)
[Diagram: Users & Appl. Systems; Context Mediation Services; Wrapper Services (*); sources below]
Datastream*: Company Name DAIMLER-BENZ; Net Income 614,995; Sales 97,736,992
WorldScope*: Company Name DAIMLER-BENZ AG; Net Income 346,577; Sales 56,268,168
Disclosure*: Company Name DAIMLER BENZ CORP; Net Income 615,000; Sales 97,737,000
OANDA Web Server*: O&A DEM-USD Exchange Rate: 1.00 German Mark = 0.58 US Dollar as of 12/31/93
* Wrapper Services

Some Context Differences
Context definitions:
Disclosure
  Currency Used: Country of Incorporation
  Currency Conversion: Money Amount As_Of_Date
  Currency Symbols: 3 Letters
  Scale Factor: 1
  Company Names: Disclosure Names
  Date Style: American with ‘/’ as separator
Worldscope
  Currency Used: USD
  Currency Conversion: Money Amount As_Of_Date
  Currency Symbols: 3 Letters
  Scale Factor: 1000
  Company Names: Worldscope Names
  Date Style: American with ‘/’ as separator
DataStream
  Currency Used: Country of Incorporation
  Currency Conversion: Money Amount As_Of_Date
  Currency Symbols: 2 Letters
  Scale Factor: 1000
  Company Names: DataStream Names
  Date Style: European with ‘-’ as separator
Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with ‘/’ as a separator

Domain Model
[Figure: domain model diagram. Legible labels: exchangeRate, date, format, company, companyFinancials, companyName, countryName, currencyType, scaleFactor, string, number. Legend: Inheritance, Attribute, Modifier]
Some currency context possibilities:
• Currency is stated explicitly as part of record
• Currency not stated, but the same for all (e.g., US $)
• Currency not stated or constant, but inferred by country

COIN System Architecture
[Diagram with three process groups: SERVER PROCESSES, MEDIATOR PROCESSES, CLIENT PROCESSES]
Components: Web Client; ODBC-compliant Apps (e.g. Microsoft Excel); ODBC-Driver; HTTPD-Daemon (cgi-scripts); SQL Compiler; Context Mediator; Optimizer; Executioner; COIN Repository; Data Store for Intermediate Results; Wrapper; WWW Gateway; Web-site
Data flow labels: SQL Query; Datalog Query; Mediated Query; Optimized Query Plan; Results

System Demonstration: Single Source Queries with Mediation
Q6. Scenario: Using Context Interchange, you can look at the Disclosure data using Datastream Context.
Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.
Capabilities Demonstrated: Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name …
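
Two of the conversions named here, scale factor and date format, can be sketched in Python. The sketch assumes the reading of the "Some Context Differences" slide in which Disclosure uses scale factor 1 with American '/' dates and Datastream uses scale factor 1000 with European '-' dates; two-digit years are also an assumption, the sample values are hypothetical, and company-name and currency conversion are omitted.

```python
from datetime import datetime

# Assumed context modifiers (see the lead-in); names are illustrative.
DISCLOSURE = {"scale_factor": 1, "date_format": "%m/%d/%y"}     # American, '/'
DATASTREAM = {"scale_factor": 1000, "date_format": "%d-%m-%y"}  # European, '-'


def convert_amount(amount: float, source: dict, receiver: dict) -> float:
    """Re-express an amount stated in source units of scale in receiver units."""
    return amount * source["scale_factor"] / receiver["scale_factor"]


def convert_date(text: str, source: dict, receiver: dict) -> str:
    """Re-write a date string from the source style into the receiver style."""
    parsed = datetime.strptime(text, source["date_format"])
    return parsed.strftime(receiver["date_format"])


# Hypothetical Disclosure values, not taken from the slides.
amount = convert_amount(2_500_000, DISCLOSURE, DATASTREAM)
as_of = convert_date("12/31/93", DISCLOSURE, DATASTREAM)
```

Each conversion depends only on the pair of context declarations, so pointing the same query at a different receiver context means swapping the receiver dictionary rather than editing the functions.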

Demonstration @ context2.mit.edu
Source Context

Context Metadata (Partial)

Conflict Detection and Mediation
Mediated Query in Datalog
- Date convert
- Scale factor convert
- Name convert

Mediated SQL Query & Result
Mediated SQL Query
- Adjust scale factor
- Date format conversion
- Name conversion
Final results – from Disclosure but in Datastream context

More Complex Example (4 sources: DB + Web)
Databases: WorldcAF, DiscAF, DStreamAF. Web source: quotes.

select WorldcAF.TOTAL_ASSETS, DiscAF.NET_SALES, DiscAF.NET_INCOME,
       DStreamAF.TOTAL_EXTRAORD_ITEMS_PRE_TAX, quotes.Last
from WorldcAF, DiscAF, DStreamAF, quotes
where WorldcAF.COMPANY_NAME = "DAIMLER-BENZ AG"
  and DStreamAF.AS_OF_DATE = "01/05/94"
  and WorldcAF.COMPANY_NAME = DStreamAF.NAME
  and WorldcAF.COMPANY_NAME = DiscAF.COMPANY_NAME
  and WorldcAF.COMPANY_NAME = quotes.Cname;

Conflict Table (1st part)

Conflict Table (2nd part)

Generated SQL (1st Part)

select worldcaf.total_assets, discaf.net_sales,
       ((discaf.net_income*0.001)*olsen.rate),
       (dstreamaf2.total_extraord_items_pre_tax*olsen2.rate),
       quotes.Last
from (select date1, 'European Style -', '01/05/94', 'American Style /'
      from datexform
      where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform,
     (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws,
     (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws,
     (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup2,
     (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp
      from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes,
     (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen,
     (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp
      from discaf) discaf,

Generated SQL (Continued - Partial)

     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf,
     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf2,
     (select char3_currency, char2_currency from currency_map where char3_currency <> 'USD') currency_map,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes2,
     (select exchanged, 'USD', rate, '01/05/94' from olsen where expressed='USD' and date='01/05/94') olsen2,
     (select Cname, Last from quotes) quotes
where currencytypes.country = discaf.location_of_incorp
  and currencytypes.currency = olsen.exchanged
  and dstreamaf.currency = dstreamaf2.currency
  and dstreamaf2.currency = currency_map.char2_currency
  and olsen.date = discaf.latest_annual_data
  and currency_map.char3_currency = currencytypes2.currency
  and currencytypes2.currency = olsen2.exchanged
  and name_map_dt_ws.dt_names = dstreamaf2.name
  and name_map_ds_ws.ds_names = discaf.company_name
  and ticker_lookup2.ticker = quotes.Cname
  and datexform.date1 = dstreamaf2.as_of_date
  and currencytypes.currency <> 'USD'
  and currency_map.char3_currency <> 'USD'
union
select worldcaf2.total_assets, discaf2.net_sales,
       ((discaf2.net_income*0.001)*olsen3.rate),
       dstreamaf4.total_extraord_items_pre_tax,
       quotes2.Last
from (select date1, 'European Style -', '01/05/94', 'American Style /'
      from datexform
      where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform2,
     (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws2,
     (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws2,
     (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup22,
     (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp
      from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf2,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes3,
     (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen3,
     (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp
      from discaf) discaf2,
     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf3,
     (select 'USD', char2_currency from currency_map where char3_currency='USD') currency_map2,
     etc

Final Result

Execution Trace (1st Part - Partials)
Parallel Execution ...
Retrieving data from Web source

Execution Trace (Continued - Partials)
...
Stock price returned from Web source
Another Web source used (for currency conversion)
...

Appendix: Sample Applications
• Airfare, Car Rental and Merged Travel
• Weather
• Global Price Comparison
• Airfare Aggregation
• Disaster Relief
• TASC Financial Example
• Web Services Demo
• Corporate Householding

Appendix: COIN Web-Wrapper Technology
User or Program (via SQL Query):
  Select Edgar.Net_income From Edgar Where Edgar.Ticker='intc' and Edgar.Form='10-Q'
Web Wrapper Generator: SQL Side, HTML Side, Web page spec file*
Data record returned: Ticker INTC; Net Income 1,983
* Spec file contains: Schema, Navigation rules and Extraction rules.