Data Quality: Opportunities, Data, and Examples

  • Slides: 26

Data Woes
We are agents of CHANGE!
The Kübler-Ross grief cycle: …roller-coaster ride of activity and passivity as the person wriggles and turns in their desperate efforts to avoid the change.

Better and More Data
– Level of analysis
  • Take a quick look at what data to use and why
  • Linking data from disparate and third-party sources
– Explore data types
– Typical issues & tricks
  • Cross validation and sourcing
  • Reverse look-up
  • GIS layering
  • Backfill from text correlated to codes
– Information from operations
  • Text analytics

General Organizational Overview
An information business focused on risk taking. Make. Sell. Serve.
• Sales and Distribution: Producer Segmentation, Market Planning, Revenue Forecasting, Cross sell and Up sell, Retention and Profitability
• Underwriting: Risk Selection and Pricing, Portfolio Management, Premium Adequacy, Billing and Collections Management
• Claims: Payment Accuracy, Claim Collaboration
  – Fraud Detection
  – Subrogation
  – Risk Transfer
  – 3rd Party Deductible
  – Reinsurance Recoverable

Same Problems – Different Lines of Business
• Personal – Auto, HO, Umbrella
• Small Commercial – BOP, CPP
• Middle Market Commercial – CPP w/GL, CP, Crime, CIM, B&M, WC, Auto
• Large Commercial Accounts
• Commercial Auto
• Workers Comp
• Umbrella/Excess
• Specialty Lines – D&O, EPL, E&O, Farm, FI

Data Types and Forms
• Structured data, semi-structured data, unstructured data
• Text, spatial, pictographic, graphic, voice, video

Multiple Data Systems which must be pulled together for analysis. Great opportunity for cross-validation and sourcing.
Systems shown: Vendors/Partners; Archive, Legacy Systems; Current System; Claim; Medical Data (Bill Review, PPO, Case Management, Paradigm Data); External Data; Policy; Multiple Underwriting Systems; Multiple States; Billing Systems; Finance Systems; CRM Systems, other data
ACTIONS
• Identify data systems
• Get the right data from the right systems
• Overcome internal organizational barriers
• Bridge to legacy systems and archived data
• Augment to create a rich data mining environment
• Expect the need to negotiate for resources
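
One way to picture the cross-validation opportunity is to compare the same field across two systems that share a key. The sketch below is illustrative only: the policy numbers, field names, and the two sample extracts are invented for the example and do not come from the deck.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Mismatch = Tuple[str, str, Optional[str], Optional[str]]

# Hypothetical extracts from two systems, keyed by policy number.
policy_system: Dict[str, Record] = {
    "P-1001": {"insured_name": "Acme Tools", "state": "NY"},
    "P-1002": {"insured_name": "Birch Cafe", "state": "NJ"},
    "P-1003": {"insured_name": "Cedar Farms", "state": "PA"},
}
billing_system: Dict[str, Record] = {
    "P-1001": {"insured_name": "Acme Tools", "state": "NY"},
    "P-1002": {"insured_name": "Birch Cafe", "state": "CT"},
    "P-1004": {"insured_name": "Dune Motors", "state": "NY"},
}


def cross_validate(
    a: Dict[str, Record], b: Dict[str, Record], fields: List[str]
) -> Tuple[List[Mismatch], List[str], List[str]]:
    """Compare shared keys field by field and list keys present on one side only."""
    mismatches: List[Mismatch] = []
    for key in sorted(a.keys() & b.keys()):
        for field in fields:
            value_a, value_b = a[key].get(field), b[key].get(field)
            if value_a != value_b:
                mismatches.append((key, field, value_a, value_b))
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    return mismatches, only_in_a, only_in_b


mismatches, only_policy, only_billing = cross_validate(
    policy_system, billing_system, ["insured_name", "state"]
)
```

Disagreements on shared keys point to a field that needs a source of record, while keys found in only one system suggest a sourcing gap or a legacy bridge that is still missing.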

Some typical external data sources and vendors
• Dun & Bradstreet
• Experian
• Bureau of Labor Statistics
• Market Stance
• AM Best
• Equifax
• US Census
• Claritas
• Melissa Data
• ISO
• GIS vendors
• U&C data sets
• Code sets for ICDs and CPTs
• …

Data Glitches – historical and on-going
Systemic changes to data, not process related:
– Changes in data layout / data types
– Changes in scale / format
– Temporary reversion to defaults
– Missing and default values
– Gaps in time series
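
Two of the glitches listed above, gaps in a time series and values that have reverted to a default, can be screened for with very little code. This is a minimal sketch, not a method from the deck: the function names, the list of candidate default values, and the 20% threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

YearMonth = Tuple[int, int]


def find_month_gaps(months: Iterable[YearMonth]) -> List[YearMonth]:
    """Return the (year, month) periods missing between the first and last period seen."""
    indexes = sorted({year * 12 + (month - 1) for year, month in months})
    if not indexes:
        return []
    present = set(indexes)
    return [
        (i // 12, i % 12 + 1)
        for i in range(indexes[0], indexes[-1] + 1)
        if i not in present
    ]


def suspicious_defaults(
    values: List[Optional[str]],
    candidates: Tuple[Optional[str], ...] = ("", "N/A", "1900-01-01", "99999", None),
    min_share: float = 0.2,
) -> Dict[Optional[str], float]:
    """Report candidate default values that make up at least min_share of the column."""
    if not values:
        return {}
    counts = Counter(values)
    total = len(values)
    return {
        value: counts[value] / total
        for value in candidates
        if counts[value] / total >= min_share
    }


gaps = find_month_gaps([(2020, 11), (2020, 12), (2021, 2)])
flags = suspicious_defaults(
    ["1985-03-02", "1900-01-01", "1900-01-01", "1972-07-15", "1900-01-01"]
)
```

A flagged value is only a prompt for investigation; whether "1900-01-01" is a system default or a legitimate date depends on the source system.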

Process Reasons for poor data entry

Defining Issues – sample
Source Data
1 – Define Issues

MORE ISSUES… Mapping across sources: Same Fact, Different Terms
Data Element Concept
– Name: Country Identifiers
– Unique ID: 5769
– Fields shown without values: Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
– Values: Algeria, Belgium, China, Denmark, Egypt, France, . . . Zimbabwe
Data Elements
– Unique ID: 4572
– Fields shown without values: Name, Context, Definition, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others

ISO 3166 English Name   ISO 3166 French Name   ISO 3166 2-Alpha Code   ISO 3166 3-Alpha Code   ISO 3166 3-Numeric Code
Algeria                 L'Algérie              DZ                      DZA                     012
Belgium                 Belgique               BE                      BEL                     056
China                   Chine                  CN                      CHN                     156
Denmark                 Danemark               DK                      DNK                     208
Egypt                   Egypte                 EG                      EGY                     818
France                  La France              FR                      FRA                     250
. . .
Zimbabwe                                       ZW                      ZWE                     716
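
When different sources record the same fact in different terms, a common approach is to map every known representation to one canonical code before joining. The sketch below uses only the rows visible on the slide and picks the 2-alpha code as the canonical form; that choice, the function names, and the case-insensitive matching are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Rows as shown on the slide: English name, French name, 2-alpha, 3-alpha, 3-numeric.
# The slide shows no separate French name for Zimbabwe, so it is left as None here.
Row = Tuple[str, Optional[str], str, str, str]
ROWS: List[Row] = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", None, "ZW", "ZWE", "716"),
]


def build_lookup(rows: List[Row]) -> Dict[str, str]:
    """Map every representation of a country (case-insensitive) to its 2-alpha code."""
    lookup: Dict[str, str] = {}
    for english, french, alpha2, alpha3, numeric in rows:
        for term in (english, french, alpha2, alpha3, numeric):
            if term:
                lookup[term.casefold()] = alpha2
    return lookup


LOOKUP = build_lookup(ROWS)


def to_alpha2(term: str) -> Optional[str]:
    """Return the canonical 2-alpha code for a term, or None if it is unknown."""
    return LOOKUP.get(term.strip().casefold())
```

With this in place, a source that stores "Danemark", one that stores "DNK", and one that stores "208" all resolve to "DK", and unknown terms come back as None so they can be reviewed rather than silently dropped.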

Data Filling
• Manual
• Statistical imputation
• Temporal
• Spatial-temporal
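
The slide names the filling approaches without detailing them. As a rough illustration, the sketch below shows the simplest form two of them could take: carrying the last observed value forward in time (one possible reading of "temporal") and replacing missing values with the mean of the observed ones (a basic form of statistical imputation). Both function names and both interpretations are assumptions, not the presenter's method.

```python
from statistics import mean
from typing import List, Optional


def forward_fill(series: List[Optional[float]]) -> List[Optional[float]]:
    """Temporal fill: replace each missing value with the most recent observed value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)  # stays None until the first observation appears
    return filled


def mean_impute(series: List[Optional[float]]) -> List[Optional[float]]:
    """Statistical fill: replace each missing value with the mean of observed values."""
    observed = [value for value in series if value is not None]
    if not observed:
        return list(series)  # nothing to estimate from
    average = mean(observed)
    return [average if value is None else value for value in series]


monthly_reserve = [None, 10.0, None, None, 12.0]
by_time = forward_fill(monthly_reserve)
by_mean = mean_impute([10.0, None, 14.0])
```

Forward fill keeps leading gaps as missing because there is no earlier value to carry, while mean imputation fills every gap but flattens any trend in the series.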

Geographic Hierarchy

Deriving Data = Power
• Totals: Household Income
• Trends: Rate of Medical Bill Increases
• Ratios: Claims/Premium, Target/Median
• Friction: Level of inconvenience, ratio of rental to damage
• Sequences: Lawyer-Doctor, Auto-Life Policy
• Circumstances: Minimal Impact Severe Trauma
• Temporal: Loss shortly after adding collision
• Spatial: Distance to Service, proximity of stakeholders
• Logged: Progress Notes, Diaries
• Who did it, When, “Why”
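
Two of the derived items above lend themselves to a short worked example: the Claims/Premium ratio and the temporal signal "loss shortly after adding collision". The record layout, field names, and the 30-day window below are hypothetical, chosen only to make the sketch runnable.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PolicyClaim:
    """Hypothetical joined policy and claim record."""
    premium: float
    claims_paid: float
    collision_added: date
    loss_date: date


def loss_ratio(record: PolicyClaim) -> Optional[float]:
    """Ratio feature: claims paid divided by premium (None when premium is not positive)."""
    if record.premium <= 0:
        return None
    return record.claims_paid / record.premium


def days_to_loss(record: PolicyClaim) -> int:
    """Temporal feature: days between adding collision coverage and the loss."""
    return (record.loss_date - record.collision_added).days


def loss_soon_after_coverage(record: PolicyClaim, window_days: int = 30) -> bool:
    """Flag losses that fall within window_days after coverage was added."""
    return 0 <= days_to_loss(record) <= window_days


example = PolicyClaim(
    premium=1000.0,
    claims_paid=650.0,
    collision_added=date(2023, 3, 1),
    loss_date=date(2023, 3, 11),
)
```

Neither raw field says much on its own; the derived ratio and the elapsed-days flag are what a model or a reviewer can act on.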

Deriving Data = Power (Cont’d)
• Behavioral: Deviation from past usage, spike buying
• Experience Profiles: Vendor, Doctor, Premium Audit
• Channel: How applied, How reported, Service Chain
• Legal Jurisdiction: Venue Disposition, Rules
• Demographics: Working, Weekly wage, lost income
• Firmographics: Industry Class Code vs. Injuries Claimed
• Inflation: Wage, Medical, Goods, Auto, COLA
• Gov’t Statistics: Crime Rate, Employment, Traffic
• Other Stats: Rents, Occupancy, Zoning, Managed Care

“Search” versus “Discover”

                           Search (goal-oriented)   Discover (opportunistic)
Structured Data            Data Retrieval           Data Mining
Unstructured Data (Text)   Information Retrieval    Text Mining

Searching
• Input Value: [Jim]
• Word Replacement Lists: Jimmy, Jim, James → JAMES
• Transformed Input Value: [JAMES]
• Returns “Similar Matches”
• All Records Found: Jimmy, Jim, James
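
The search flow on this slide can be sketched as a lookup table applied to both the query and the stored values. The replacement list below contains only the names from the slide; the function names and the choice to normalize by uppercasing are assumptions for illustration.

```python
from typing import Dict, List

# Word replacement list: each variant maps to one standard form.
REPLACEMENTS: Dict[str, str] = {
    "JIM": "JAMES",
    "JIMMY": "JAMES",
}


def transform(value: str) -> str:
    """Uppercase the value and swap in its standard form if one is listed."""
    key = value.strip().upper()
    return REPLACEMENTS.get(key, key)


def search(query: str, records: List[str]) -> List[str]:
    """Return every record whose transformed value equals the transformed query."""
    target = transform(query)
    return [record for record in records if transform(record) == target]


matches = search("Jim", ["Jimmy", "Jim", "James", "Joan"])
```

Because the same transformation is applied on both sides, a search for "Jim" finds Jimmy, Jim, and James, while unrelated names such as "Joan" are left out.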

Motivation for Text Mining
• Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation).
• Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery.
Chart: 10% structured numerical or coded information; 90% unstructured or semi-structured information.

Convergence of Disciplines Example

Techniques for attacking text data:
• Rules-based
• Statistical Text Analysis and Clustering
• Linguistic and Semantic Clustering
• Support Vector Machines
• Pattern Matching or other statistical algorithms
• Neural Networks
• Combination of methods from above
Text is like a data iceberg

Claims processing – Progress notes and Diaries
CLAIMS ADJUSTER
Service
• Medical Management Staff
• Special Investigation Unit
• NICB
• Vendor Management
• Consulting Engineers
• Hearing Representative
• Structured Settlement Unit
• Recovery Staff
• Legal Staff
• Home Office Staff
• Field Office Claim Staff
• Insured Risk Manager
• Agent or Broker
• Diary forward – “call Dr Jones next week”
• Business Rule – large loss review
• System Reminder – update case reserves
• Correspondence Tracking – legal letter sent

Semantic processing: Named Entity Extraction
• Identify and type language features
• Examples:
  – People names
  – Company names
  – Geographic location names
  – Dates
  – Monetary amounts
  – Phone #, ZIP codes, SSN, FEIN
  – Others… (domain specific)
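
Some of the entity types listed above have regular surface forms and can be picked out with pattern rules alone. The sketch below is a rules-based stand-in, not a full named entity extractor: the patterns assume US-style formats (MM/DD/YYYY dates, dollar amounts, dashed phone numbers and SSNs, five-digit ZIP codes), and the sample note is invented. People, company, and location names generally need dictionaries or statistical and linguistic models and are not attempted here.

```python
import re
from typing import Dict, List

# Assumed US-style formats; each pattern uses only non-capturing groups so
# findall() returns the full matched text.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "MONEY": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Return every match for each entity type, keyed by the type label."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


note = (
    "Call Dr Jones at 555-123-4567 on 03/15/2021 about the "
    "$12,500.00 reserve; insured ZIP 10001."
)
found = extract_entities(note)
```

Typing each match (DATE, MONEY, and so on) is what turns free-text progress notes into fields that can be joined back to structured claim data.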

Feedback to UW
