Mirror Outlier Detection in Foreign Trade Data
Markos Fragkakis, NTTS 2009

Introduction
  • Foreign Trade (FT) data
  • Improvement of FT quality is essential
  • Quality can be assessed using several dimensions (e.g. accuracy, timeliness, clarity)
  • We focus on accuracy using outlier detection
  • Methods for outlier detection (e.g. threshold, model-based)
  • Presentation of the Mirror Outlier Detection application

Methodology
  • Univariate detection in time series (value, quantity, supplementary quantity)
  • Median Absolute Deviation
  • Robust
    ◦ median, not mean
    ◦ non-parametric
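As an illustration of the Median Absolute Deviation idea on this slide, here is a minimal Python sketch. The slides do not state the scoring rule or cut-off used by the application, so the 0.6745 scaling constant and the 3.5 threshold (the commonly used "modified z-score" convention) are assumptions, as are the function name and the sample series.

```python
from statistics import median


def mad_outliers(series, threshold=3.5):
    """Return (index, score) pairs whose robust score exceeds the threshold.

    Assumption: the modified z-score 0.6745 * (x - median) / MAD with a
    cut-off of 3.5; the slides do not say which rule the application uses.
    """
    med = median(series)
    # MAD: the median of absolute deviations from the median
    mad = median([abs(x - med) for x in series])
    if mad == 0:
        # Degenerate case (at least half of the values are identical): no scores
        return []
    scores = [0.6745 * (x - med) / mad for x in series]
    return [(i, s) for i, s in enumerate(scores) if abs(s) > threshold]


# Hypothetical monthly trade values with one suspicious spike at index 5
monthly_values = [100, 102, 98, 101, 99, 500, 100, 103]
flagged = mad_outliers(monthly_values)
```

Because both the centre and the spread are medians, the single spike barely shifts them, which is the robustness the slide refers to. The sign of the score (positive or negative) is kept so that it can later be compared with the mirror series.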

Mirror Outlier Detection
  • Characterization of outliers according to the mirror flow
  • Possible outlier types:
    ◦ Green: outlier appears in mirror (same sign)
    ◦ Red: outlier does not appear in mirror
    ◦ Violet: outlier appears in mirror (opposite sign)
    ◦ Black: mirror series not present
    ◦ Pink: mirror series not present (confidentiality)
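The five colour codes above can be read as a small decision rule. The sketch below is one possible encoding in Python, not the application's actual logic; the function name, the parameter names, and the representation of an outlier's direction as +1 or -1 are all assumptions made for illustration.

```python
def classify_mirror_outlier(sign, mirror_sign=None,
                            mirror_present=True, mirror_confidential=False):
    """Assign a colour to an outlier found in the reported series.

    sign          -- +1 or -1, direction of the outlier in the reported series
    mirror_sign   -- +1, -1, or None if the mirror series shows no outlier
    mirror_present      -- False if no mirror series exists
    mirror_confidential -- True if the mirror is missing for confidentiality
    (All parameter names are hypothetical.)
    """
    if not mirror_present:
        return "pink" if mirror_confidential else "black"
    if mirror_sign is None:
        return "red"      # outlier does not appear in the mirror
    if mirror_sign == sign:
        return "green"    # appears in the mirror with the same sign
    return "violet"       # appears in the mirror with the opposite sign


colour = classify_mirror_outlier(sign=+1, mirror_sign=-1)
```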

Additional functionalities
  • Outlier classification (error in dimension, not observed values)
    ◦ Swapping of observations between series
    ◦ Copy of observations
    ◦ Time delay (hidden green outlier)
  • Outlier detection in short series (product code changes)
  • Reporting for
    ◦ Detected outliers per country (e-mailed)
    ◦ Summary reporting
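The "time delay (hidden green outlier)" case can be pictured as an outlier that has a same-sign counterpart in the mirror series, but in a neighbouring period rather than the same one. The Python sketch below shows one way such a check might look. The function name, the dictionary representation of mirror outliers, and the maximum lag of two periods are assumptions; the slides do not describe how the application searches for delays.

```python
def find_delayed_match(period, sign, mirror_outliers, max_lag=2):
    """Look for a same-sign mirror outlier shifted by 1..max_lag periods.

    period          -- integer index of the period with the reported outlier
    sign            -- +1 or -1, direction of the reported outlier
    mirror_outliers -- dict mapping period index -> sign of the mirror outlier
    Returns the shift (mirror period minus reported period) or None.
    (Hypothetical helper; max_lag=2 is an arbitrary illustrative choice.)
    """
    for lag in range(1, max_lag + 1):
        # Check the nearest shifts first, later periods before earlier ones
        for shifted in (period + lag, period - lag):
            if mirror_outliers.get(shifted) == sign:
                return shifted - period
    return None


# Hypothetical case: reported outlier in period 6, mirror outlier in period 7
shift = find_delayed_match(period=6, sign=+1, mirror_outliers={7: +1})
```

An outlier that would otherwise be coloured red can then be reinterpreted as a hidden green one when a shift is found.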

Example of detected outlier

Example of error due to swap

Error due to time delay

Technical Information
  • MOD-DB has an RDBMS repository for storing outlier data (support for Oracle, MySQL)
  • Implemented in Java (portability, maintainability)
  • Command Line Interface
  • Performance issues
    ◦ Large volumes of data cause a bottleneck in the DB
    ◦ Storage is in question (several GBs per month)

Architecture

Proposal for new platform
  • Use a multi-dimensional viewer
  • Enable OLAP functions (slice, dice, roll-up, drill-down)
  • Create dynamic charts from data
  • Estimated variables (indices from raw outlier data)
  • Data mining could be performed for extracting inferences from data
    ◦ Log-linear models
  • Pin-point values of poor data involving high
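To make the OLAP vocabulary on this slide concrete, here is a minimal Python sketch of "slice" and "roll-up" over a handful of outlier records. It only illustrates the operations; it is not the proposed platform. The record fields, the helper names, and the sample rows are all made up for the example.

```python
from collections import Counter

# Made-up outlier records; field names are hypothetical
records = [
    {"reporter": "DE", "partner": "FR", "period": "2008-01", "outlier_type": "green"},
    {"reporter": "DE", "partner": "FR", "period": "2008-02", "outlier_type": "red"},
    {"reporter": "DE", "partner": "IT", "period": "2008-01", "outlier_type": "red"},
    {"reporter": "FR", "partner": "DE", "period": "2008-01", "outlier_type": "green"},
]


def slice_records(rows, **fixed):
    """Slice: keep only rows where every given dimension has the given value."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def rollup(rows, *dims):
    """Roll-up: count rows per combination of the chosen dimensions."""
    return Counter(tuple(r[d] for d in dims) for r in rows)


per_reporter = rollup(records, "reporter")
red_by_pair = rollup(slice_records(records, outlier_type="red"), "reporter", "partner")
```

Drill-down corresponds to calling `rollup` with more dimensions (for example adding `"period"`), and dice to slicing on several dimensions at once.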

Conclusions
  • Use of mirror flow for outlier characterisation
  • New features
  • Improving quality
  • Enable building a new platform for data exploration
  • Expansion of MOD to other FT data outside the EU, and to other domains

Questions
Thank you for your attention