Robustness for High-dimensional Data, Rob. HD 2004, Vorau


Robustness for High-dimensional Data, Rob. HD 2004, Vorau, May 5th-8th, 2004
Signal Extraction from Time Series
Roland Fried, Universidad Carlos III de Madrid, Spain
Ursula Gather, Universität Dortmund, Germany

Hemodynamic variables of a critically ill patient
Arterial pressures, pulmonary artery pressures, central venous pressure, heart rate, pulse, temperature
[Figure: time series of the variables; x-axis: Time [minutes], 0 to 3500; y-axis: 0 to 300]

Goals of Clinical Data Analysis
• Intelligent Alarm Systems
• Clinical Decision Support
• Signal extraction
• Outlier detection / classification
• Level shift / trend detection
• Dimension reduction
• Fast, automatic algorithms
• Good clinical interpretability

• Signal extraction from univariate (yt)
Signal + noise model:
– Signal: smooth with a few sudden shifts
– Observational noise: symmetric, mean zero
– Spiky noise: measurement artifacts …
Moving window (yt-m, …, yt, …, yt+m) of width n = 2m+1 for approximation of the signal at time t
Choice of m: bias, admissible time delay, variance, robustness, computation time

Location based filtering
[Figure: heart rate with running mean and running median (window length 31); x-axis: Time [min], 0 to 250; y-axis: 60 to 85]
Problems:
• Running mean not robust
• Running median not smooth
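The contrast between the two location based filters can be illustrated with a minimal sketch. The helper name `running_filter` and the toy series are assumptions made for illustration only; they are not taken from the slides.

```python
from statistics import mean, median

def running_filter(y, m, location):
    """Apply `location` (e.g. mean or median) to each window (y[t-m], ..., y[t+m])."""
    n = len(y)
    return [location(y[t - m:t + m + 1]) for t in range(m, n - m)]

# Toy series (hypothetical): constant level 70 with a single spike at position 5
y = [70.0] * 5 + [170.0] + [70.0] * 5
run_mean = running_filter(y, 2, mean)    # the spike leaks into every window containing it
run_med = running_filter(y, 2, median)   # the spike is removed completely
```

With window width 5, the running mean is pulled up wherever the spike falls inside the window, while the running median stays at the underlying level.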

• Robust Linear Regression
Extract local linear trend from the moving window by
– Least median of squares (LMS)
– Repeated median (RM)
– L1 regression
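As a hedged illustration of the repeated median, the sketch below uses Siegel's standard definition: the slope is the median over i of the median over j ≠ i of the pairwise slopes (yj - yi)/(j - i), and the level at the window centre is the median of yi - slope·i. The function name and the toy window are assumptions; the formulas shown on the original slide were not preserved.

```python
from statistics import median

def repeated_median(window):
    """Repeated median fit to a window of odd length n = 2m+1.

    Returns (level, slope), with the level taken at the window centre.
    Positions are coded as -m, ..., m.
    """
    n = len(window)
    m = n // 2
    pos = range(-m, m + 1)
    # Inner median: pairwise slopes through point i; outer median: over all i
    slope = median(
        median((window[j] - window[i]) / (pos[j] - pos[i])
               for j in range(n) if j != i)
        for i in range(n)
    )
    level = median(window[i] - slope * pos[i] for i in range(n))
    return level, slope

# Toy window (hypothetical): line 1 + 2*i for i = -3..3 with one large outlier
window = [2.0 * i + 1.0 for i in range(-3, 4)]
window[1] += 50.0
level, slope = repeated_median(window)
```

This direct evaluation needs O(n²) operations per window.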

Simulations: MSE for Level Approximation
Different
– numbers of outliers and
– sizes of
  • outliers
  • level shifts
  • trends
(Davies, Fried, Gather 2004)
[Figure: simulation settings in which L1, RM and LMS perform best]

Application: heart rate
[Figure: heart rate series with level approximations; x-axis: Time, 0 to 250; y-axis: 60 to 85]
LMS not stable, Repeated Median and L1 similar

• Modified Double Window (DW) Filters
Double Window Modified Trimmed Mean Filter (Lee, Kassam, 1985)
Regression based analogues:
1) Apply RM to (yt-k, …, yt+k)
2) Estimate σt from regression residuals
3) Trim all yt+i with large regression residuals in (yt-m, …, yt+m)
4) Apply least squares (TRM filter) or repeated median (MRM filter)
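The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scale is estimated by the MAD of the inner-window residuals with the usual Gaussian consistency factor 1.4826, and an observation is trimmed when its absolute residual exceeds d = 2 scale units. The names `rm_fit`, `ls_fit` and `double_window_level` are hypothetical.

```python
from statistics import mean, median

def rm_fit(pos, vals):
    """Repeated median line; returns (level at position 0, slope)."""
    n = len(vals)
    slope = median(
        median((vals[j] - vals[i]) / (pos[j] - pos[i])
               for j in range(n) if j != i)
        for i in range(n)
    )
    return median(v - slope * p for p, v in zip(pos, vals)), slope

def ls_fit(pos, vals):
    """Least squares line; returns (level at position 0, slope)."""
    pbar, vbar = mean(pos), mean(vals)
    sxx = sum((p - pbar) ** 2 for p in pos)
    slope = sum((p - pbar) * (v - vbar) for p, v in zip(pos, vals)) / sxx
    return vbar - slope * pbar, slope

def double_window_level(window, k, final_fit=ls_fit, d=2.0):
    """Level estimate at the centre of an outer window of width 2m+1."""
    m = len(window) // 2
    pos = list(range(-m, m + 1))
    # Step 1: repeated median on the inner window (y[t-k], ..., y[t+k])
    inner_pos, inner = pos[m - k:m + k + 1], window[m - k:m + k + 1]
    level, slope = rm_fit(inner_pos, inner)
    # Step 2: scale from the inner-window residuals (MAD is an assumption here)
    scale = 1.4826 * median(abs(v - level - slope * p)
                            for p, v in zip(inner_pos, inner))
    # Step 3: trim outer-window observations with large residuals
    kept = [(p, v) for p, v in zip(pos, window)
            if abs(v - level - slope * p) <= d * scale]
    # Step 4: refit on the remaining observations (ls_fit: TRM, rm_fit: MRM)
    kept_pos, kept_vals = zip(*kept)
    return final_fit(kept_pos, kept_vals)[0]

# Toy window (hypothetical): trend 10 + 0.5*i with two outliers at the end
window = [10.0 + 0.5 * i for i in range(-5, 6)]
window[9], window[10] = 40.0, 45.0
trm = double_window_level(window, k=2, final_fit=ls_fit)
mrm = double_window_level(window, k=2, final_fit=rm_fit)
```

Passing `ls_fit` or `rm_fit` as `final_fit` switches between the TRM and MRM variants of step 4.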

Removal of spikes
                                      Med   RM    MTM   TRM, MRM
Spikes which can be dampened          m     m-1   k     k-1
Spikes which can be removed completely:
  steady state                        m     m-1   k     k-1
  trend period                        0     m-1   0     k-1
Length of outlier patches important for choice of inner window

Efficiency for Gaussian Noise, m=10
[Figure: efficiency (0 to 100) versus slope (0.0 to 0.5). Location based: Median; MTM, k=10; MTM, k=5. Regression: RM; MRM, k=7; TRM, k=7]
Efficiency of modified filters high

Shift Preservation
[Figure: two panels, slope b = 0.0 and b = -0.5; x-axis: number of outliers (0 to 10) at the end of the window; y-axis: Max. (max. for outliers of sizes 1, …, 10); curves: Med, RM, MTM and DW filters MTM, MRM, TRM]
Location based filters blur shifts during trends, double window filters reduce blurring of shifts

Series with outliers, shifts and trends
[Figure: series with filter outputs; x-axis: Time, 0 to 300; y-axis: -5 to 15]
RM smooth, MTM good at shifts, TRM compromise

• Hybrid Filter
FIR-Median Hybrid (FMH) Filter (Heinonen and Neuvo, 1987, 1988)
– Predictive FMH Filter, M=3
– Combined FMH Filter, M=5
Predictive / Combined Repeated Median Hybrid (RMH) Filter
– Half-window averages replaced by half-window medians
– One-sided linear predictors replaced by one-sided repeated medians
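A predictive hybrid filter with M=3 subfilters takes the median of a forward prediction from the past half-window, the central observation, and a backward prediction from the future half-window. The sketch below is an illustration under assumptions: the one-sided linear predictor is computed as a least squares line extrapolated to the centre, and the names `ls_fit`, `rm_fit` and `predictive_hybrid` are hypothetical.

```python
from statistics import mean, median

def ls_fit(pos, vals):
    """Least squares line; returns (value extrapolated to position 0, slope)."""
    pbar, vbar = mean(pos), mean(vals)
    sxx = sum((p - pbar) ** 2 for p in pos)
    slope = sum((p - pbar) * (v - vbar) for p, v in zip(pos, vals)) / sxx
    return vbar - slope * pbar, slope

def rm_fit(pos, vals):
    """Repeated median line; returns (value extrapolated to position 0, slope)."""
    n = len(vals)
    slope = median(
        median((vals[j] - vals[i]) / (pos[j] - pos[i])
               for j in range(n) if j != i)
        for i in range(n)
    )
    return median(v - slope * p for p, v in zip(pos, vals)), slope

def predictive_hybrid(window, fit=ls_fit):
    """Median of forward prediction, central value and backward prediction.

    fit=ls_fit gives a predictive FMH-type filter, fit=rm_fit a PRMH-type one.
    """
    m = len(window) // 2
    forward, _ = fit(list(range(-m, 0)), window[:m])          # from the past half
    backward, _ = fit(list(range(1, m + 1)), window[m + 1:])  # from the future half
    return median([forward, window[m], backward])

# Toy window (hypothetical): line 3 + 2*i for i = -3..3 with a spike at the centre
window = [3.0 + 2.0 * i for i in range(-3, 4)]
window[3] = 100.0
pfmh = predictive_hybrid(window, fit=ls_fit)
prmh = predictive_hybrid(window, fit=rm_fit)
```

Because only three subfilter outputs enter the final median, a single central spike is removed, but the least squares predictors themselves remain sensitive to outliers in the half-windows.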

Removal of spikes
                                      SM    RM    PFMH, CFMH   PRMH, CRMH
Spikes which can be dampened          m     m-1   1            1
Spikes which can be removed completely:
  steady state                        m     m-1   1            1
  trend period                        0     m-1   1            0
Hybrid filters more limited than DW filters
RMH filters more robust than FMH filters

Efficiency for Gaussian Noise, m=10
[Figure: efficiency (0 to 100) versus slope (0.0 to 0.5). Location based: Median. Regression: RM. FMH: PFMH, CFMH. RMH: PRMH, CRMH]
Hybrid filters less efficient than DW filters
RMH filter almost as efficient as FMH filter

Shift Preservation
[Figure: two panels, slope b = 0.0 and b = -0.5; x-axis: number of outliers (0 to 10) at the end of the window; y-axis: Max. (max. for outliers of sizes 1, …, 10); curves: Med, RM; FMH: PFMH, CFMH; RMH: PRMH, CRMH]
Combined hybrid filters blur shifts in trends slightly, but even less than DW filters

Series with outliers, trends, shifts and extremes
[Figure: series with filter outputs; x-axis: Time, 0 to 300; y-axis: -5 to 20]
RM smooth, PFMH preserves extremes, PRMH more robust

• Signal extraction from d-variate (Yt) (e.g. Peña & Box, 1987)
Factor model: process of r latent variables, matrix of loadings
10 vital parameters
[Figure: results over time; series of extracted factors and series of "model errors"]

Conclusion
• Extraction of time-varying mean from contaminated time series by robust regression
• LMS very robust, but slow and unstable; repeated median robust, fast and stable
• Double window and hybrid filters improve preservation of shifts and extremes
• Adaptive window width selection
• Full online version
• Multivariate signal extraction

References
Bernholt, T., Fried, R. (2003). Computing the update of the repeated median regression line in linear time. Information Processing Letters 88, 111-117.
Davies, P. L., Fried, R., Gather, U. (2004). Robust signal extraction for on-line monitoring data. J. Statistical Planning and Inference 122, 65-78.
Fried, R. (2004). Robust filtering of time series with trends. J. Nonparametric Statistics, to appear.
Fried, R., Bernholt, T., Gather, U. (2004). Repeated median and hybrid filters. Technical Report 10/2004, SFB 475, University of Dortmund, Germany.
Gather, U., Fried, R. (2004a). Robust scale estimation for local linear temporal trends. Tatra Mountains Mathematical Publications 26, 87-101.
Gather, U., Fried, R. (2004b). Methods and algorithms for robust filtering. Proceedings of COMPSTAT 2004, to appear.

Statistical demands
Analysis must (left), procedure needs (right):
• work automatically: unique solution
• work online: low computation time
• resist measurement artifacts (many data situations possible): high breakdown point, low bias curves
• attenuate observational noise: satisfactory efficiency
No claim of optimality, compromise needed

Desirable properties
• Noise attenuation: efficiency
• Stability: continuity
• Removal of spikes: exact fit, robustness
• Preservation of shifts and extremes: exact fit, robustness
• Trend tracking: invariance
• Online analysis: fast algorithms

Computation time (millisec.), window width n = 2m+1
                                              m=10    m=15
– Least median of squares: O(n²)              5.40    11.15
– Repeated median: O(n²)                      2.60    4.50
  online: O(n) (Bernholt, Fried 2003)         0.62    0.83
– L1 regression: O(n log n)                   2.40    4.70

Finite Sample Replacement Breakdown Point
k*: smallest number of contaminated observations which can cause a spike of any size in the extracted signal
k*      TL2   TL1   TRM   TLMS
n=21    1     7     10    10
n=31    1     10    15    15

Robustness
Smallest number k* of contaminated observations which can cause a spike of any size in the extracted signal:
k*      L2    L1    RM    LMS
n=21    1     7     10    10
n=31    1     10    15    15

Finite-sample efficiency relative to L2 (MSE)
[Figure: efficiency in % for Med, L1, RM and LMS; slopes 0.00, 0.05, 0.10; window widths n=11, n=21, n=31, n=61]
Repeated median and L1 regression never much worse than median

Finite-sample Efficiency w.r.t. L2 (MSE)
[Figure: efficiency in % for Med, L1, RM and LMS; slopes 0.0 and 0.1; window widths n=21, n=31, n=61]
Repeated median and L1 regression never much worse than median

Performance when outliers present
[Figure: MSE for level; labels: shift of size, number of outliers]
• LMS << Repeated median < L1 regression
• LMS often better for large outliers
• Median good only for negligible slope

Application: heart rate
[Figure: time series and level approximations]
LMS not stable, Repeated Median and L1 similar

• Outlier Replacement
General strategy to improve repeated median
Given … and …: prediction residual …, … if …
'Optimal choice' of d0 and d1?
Special cases:
– d0 > d1 = 0: Trimming
– d0 = d1 > 0: Winsorization
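The replacement rule itself was not preserved in the slide text. The sketch below is one plausible reading that is consistent with the two special cases listed: an observation whose prediction residual exceeds d0 scale units is replaced by the prediction plus or minus d1 scale units. The function name, the argument names and the rule are assumptions, not the authors' stated formula.

```python
def replace_outlier(y, prediction, scale, d0, d1):
    """Replace y if its prediction residual is larger than d0 * scale.

    Assumed rule: the replacement is prediction + sign(residual) * d1 * scale,
    so d1 = 0 gives trimming and d0 = d1 gives winsorization.
    """
    residual = y - prediction
    if abs(residual) > d0 * scale:
        sign = 1.0 if residual > 0 else -1.0
        return prediction + sign * d1 * scale
    return y

# Hypothetical values: prediction 10, scale 2, observation 20 (residual 10 > 3 * 2)
trimmed = replace_outlier(20.0, 10.0, 2.0, d0=3.0, d1=0.0)     # set to the prediction
winsorized = replace_outlier(20.0, 10.0, 2.0, d0=3.0, d1=3.0)  # clipped at 10 + 3 * 2
unchanged = replace_outlier(12.0, 10.0, 2.0, d0=3.0, d1=3.0)   # residual within bounds
```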

• Online Outlier Replacement for RM
Outlier region centered at …
Robust scale estimation from the residuals, e.g. Rousseeuw and Croux' …: very good for shifts and inliers
Replace … by … if …, e.g. …
(Gather, Fried, 2004a; Fried, 2004)

Scale Approximation
Based on the residuals:
• MAD
• Length of the shortest half
• Rousseeuw and Croux' Qa
• Nested scale statistic
(Gather, Fried, 2003)
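Two of the listed scale statistics are easy to sketch. The versions below are raw statistics without the consistency factors and finite-sample corrections that are normally applied; the function names and the example residuals are assumptions for illustration.

```python
from statistics import median

def mad(residuals):
    """Median absolute deviation about the median (no consistency factor)."""
    centre = median(residuals)
    return median(abs(r - centre) for r in residuals)

def shortest_half_length(residuals):
    """Length of the shortest interval containing h = n//2 + 1 of the residuals."""
    x = sorted(residuals)
    h = len(x) // 2 + 1
    return min(x[i + h - 1] - x[i] for i in range(len(x) - h + 1))

# Hypothetical residuals with one gross outlier
residuals = [0, 1, 2, 3, 4, 5, 100]
scale_mad = mad(residuals)
scale_lsh = shortest_half_length(residuals)
```

Both statistics ignore the gross outlier in this example, since each depends only on the central half of the residuals.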

• Outlier replacement
e.g. with robust scale estimation from the residuals:
– MAD: works fine
– Length of the shortest half: better worst case
– Rousseeuw and Croux' Qa: very good for shifts and inliers
(Gather, Fried, 2004a; Fried, 2004)

Application: Heart Rate
[Figure: heart rate, LMS and RM with outlier replacement; x-axis: time, 0 to 250; y-axis: 60 to 85]

Sinusoid, 10% patchy additive N(0, 9 s) outliers
[Figure: trimming with Qa; LMS]

• Level Shift Detection
EWMA, CUSUM, runs etc. are not robust against outliers
Robust majority rule:
– Compute …
– Detect positive LS at t+j0 if …
dLS: clinically relevant threshold, e.g. dLS = 2

Simulated Time Series with Shifts
[Figure: LMS and RM with outlier replacement and shift detection; x-axis: time, 0 to 500; y-axis: -2 to 18]

Trend invariance
Replacing yt-m, …, yt, …, yt+m by yt-m - mb, …, yt, …, yt+m + mb does not change the extracted signal
Invariant: RM, TRM, MRM, PFMH, PRMH
Not invariant: SM, MTM, CFMH, CRMH
Lipschitz continuity
Filters: SM, RM, PFMH, CFMH, PRMH, CRMH
Constants (as extracted, in order): 1, 2k+1, 4/(k-1), 2k+1
Not continuous: MTM, TRM, MRM
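Trend invariance can be checked numerically: adding a linear trend i·b (i = -m, …, m) to the window should leave the repeated median level at the centre unchanged, while the running median generally changes. The sketch below is illustrative only; `rm_level`, `add_trend` and the example window are hypothetical.

```python
from statistics import median

def rm_level(window):
    """Repeated median level at the centre of a window of odd length."""
    n = len(window)
    m = n // 2
    pos = range(-m, m + 1)
    slope = median(
        median((window[j] - window[i]) / (pos[j] - pos[i])
               for j in range(n) if j != i)
        for i in range(n)
    )
    return median(window[i] - slope * pos[i] for i in range(n))

def add_trend(window, b):
    """Add i*b to the observation at position i = -m, ..., m (centre unchanged)."""
    m = len(window) // 2
    return [v + b * (i - m) for i, v in enumerate(window)]

# Hypothetical window and slope
window = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
b = 0.5
rm_before, rm_after = rm_level(window), rm_level(add_trend(window, b))
med_before, med_after = median(window), median(add_trend(window, b))
```

In this example the repeated median level agrees before and after the trend is added (up to floating-point rounding), whereas the median moves from 3.0 to 3.5.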

Efficiency for Gaussian noise
[Figure: four panels of efficiency (0 to 100) versus slope (0.0 to 0.5); one panel row labelled f=0.6]

Efficiency for Gaussian Noise
[Figure: efficiency (0 to 100) versus slope (0.0 to 0.5) for autocorrelation f=0.0 and f=0.6; left panels: Med, RM, MTM and DW filters MTM, MRM, TRM; right panels: Med, RM, FMH filters PFMH, CFMH and RMH filters PRMH, CRMH]
Efficiency of repeated median high, of hybrid filter low

Efficiency for Gaussian Noise
[Figure: efficiency (0 to 100) versus slope (0.0 to 0.5). Panel "Modified filter": Med, MTM, RM and DW filters MTM, MRM, TRM. Panel "Hybrid filter": Med, RM; FMH: PFMH, CFMH; RMH: PRMH, CRMH]
Efficiency of modified filter high, of hybrid filter low

Shift Preservation


Shift Preservation
[Figure: Max. RMSE for increasing number of outliers (0 to 10) at the end; panels for slope b = 0.0 and b = -0.5; left: Med, MTM, RM and DW filters MTM, MRM, TRM; right: Med, RM, FMH filters PFMH, CFMH and RMH filters PRMH, CRMH]
Double window and hybrid filter reduce blurring of shifts

Removal of spikes
                                      SM    RM    MTM   TRM, MRM   PFMH, CFMH   PRMH, CRMH
Spikes which can be removed completely:
  steady state                        m     m-1   k     k-1        1            1
  trend period                        0     m-1   0     k-1        1            0
Spikes which can be dampened          m     m-1   k     k-1        1            1

Outlier patch in the center


Outlier patch in the center
[Figure: "Max. RMSE for increasing number of outliers at the end"; x-axis: number of outliers (0 to 10); panels for slope b = 0.0 and b = -0.5; left: Med, RM, MTM and DW filters MTM, MRM, TRM; right: Med, RM, FMH filters PFMH, CFMH and RMH filters PRMH, CRMH]
Double window and hybrid: info about length of patches needed

[Figure legend: green: SM; blue: RM; red: MRMS; purple: MRM; orange: MTM; yellow: DWMTM]

[Figure legend: green: SM; blue: RM; red: TRMS; purple: MRM; orange: MTM; yellow: DWMTM]
Online estimates
[Figure legend: blue: RM; yellow: MRM; green: TRM]

Hybrid filters in practice
[Figure: Med, RM; FMH: PFMH, CFMH; RMH: PRMH, CRMH]

• Robust Window Width Selection
Assumption of a constant slope only locally valid
Example: smoothing of a maximum by a linear fit
[Figure: braces marking window widths 5, 11 and 5]
Short window width:
+ small time delay
+ preservation of extremes
Large window width:
+ better noise attenuation
+ robustness
Adjust window width when slope changes

Basic Algorithm for Window Width Selection
Consider sign of the residuals: …
where nt = 2mt+1 is the window width at time point t
… for noise with symmetric d.f.
Choose 0 …

Sawtooth signal overlaid by Gaussian noise and additive outliers
[Figure: x-axis: time, 0 to 300; y-axis: -5 to 10]
Repeated median with adaptive window width, d=0.7, 4 < mt < 16

Sawtooth signal overlaid by Gaussian noise and additive outliers
[Figure: x-axis: time, 0 to 300; y-axis: -5 to 10]
Repeated median, m=10
Repeated median with adaptive window width, d=0.7, 5 ≤ mt ≤ 15

Application: Pulmonary Artery Pressure
[Figure: x-axis: time, 0 to 2500; y-axis: 20 to 50]
RM with adaptive width, outlier replacement and shift detection

Loadings of rotated factors constant?
[Figure: loadings for observations 1-1000 and observations 1000-2000]
But: outliers, artifacts, ...

Loadings for filtered time series
[Figure: loadings for observations 1-1000 and observations 1001-2000]