Event Abstraction for Process Mining using Supervised Learning

Niek Tax
In collaboration with Natalia Sidorova, Wil van der Aalst, Reinder Haakma (Philips Research)

Overview
• Motivating example
• Method
• Evaluation
• Plugin
• Conclusion

Smart Home Event Log
• Binary sensors
  • Motion sensors
  • Open/close sensors
  • Power sensors
• Start/end times of when each sensor sensed activity
• Case notion
  • A day (cut-off point at midnight)
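The case notion above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `split_into_daily_cases`, the tuple layout `(sensor, start, end)`, and the choice to assign an event to the day of its start time are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def split_into_daily_cases(events):
    """Group sensor events into cases keyed by the calendar day of their start time.

    Assumption: an event that spans midnight belongs to the day it started on.
    """
    cases = defaultdict(list)
    # Sort by start time so every case is a chronologically ordered trace.
    for sensor, start, end in sorted(events, key=lambda e: e[1]):
        cases[start.date()].append((sensor, start, end))
    return dict(cases)


# Hypothetical sensor readings, invented for illustration only.
events = [
    ("water", datetime(2016, 1, 15, 23, 58), datetime(2016, 1, 15, 23, 59)),
    ("medicine cabinet", datetime(2016, 1, 16, 0, 5), datetime(2016, 1, 16, 0, 6)),
    ("dishwasher", datetime(2016, 1, 15, 8, 30), datetime(2016, 1, 15, 9, 30)),
]
cases = split_into_daily_cases(events)
```

Two events a few minutes apart end up in different cases when midnight falls between them, which is the price of a fixed cut-off point.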

Inductive Miner result

Inductive Miner result on high-level abstraction of the same log

Event Abstraction
Low-level log → Abstract → High-level log

Problem setting
Event log with annotated traces and unannotated traces

Annotated traces
Case  Low-level label      High-level label
1     medicine cabinet     Taking medicine
1     dish & cups cabinet  Taking medicine
1     water                Taking medicine
1     dish & cups cabinet  Eating
1     dishwasher           Eating
2     dish & cups cabinet  Taking medicine
2     medicine cabinet     Taking medicine
2     water                Taking medicine
2     dish & cups cabinet  Eating
2     cutlery drawer       Eating
2     dish & cups cabinet  Eating
2     dishwasher           Eating
…     …                    …

Unannotated traces
Case  Low-level label      High-level label
x     dish & cups cabinet  ?
x     medicine cabinet     ?
x     water                ?
x     dish & cups cabinet  ?
x     medicine cabinet     ?
x     water                ?
x     dish & cups cabinet  ?
x     dishwasher           ?
x+1   dish & cups cabinet  ?
x+1   medicine cabinet     ?
x+1   water                ?
x+1   dishwasher           ?
…     …                    …

Supervised Event Abstraction
• Learn: annotated traces → low-level to high-level mapping
• Apply: mapping + unannotated traces → traces with high-level annotations
• Create: traces with high-level annotations → high-level log

Creating a high-level log from traces with high-level annotations

Labeled traces
Case  Low-level label      High-level label
x     dish & cups cabinet  Taking medicine
x     medicine cabinet     Taking medicine
x     water                Taking medicine
x     dish & cups cabinet  Taking medicine
x     medicine cabinet     Taking medicine
x     water                Taking medicine
x     dish & cups cabinet  Eating
x     dishwasher           Eating
x+1   dish & cups cabinet  Taking medicine
x+1   medicine cabinet     Taking medicine
x+1   water                Taking medicine
x+1   dishwasher           Eating
…     …                    …

High-level log
Case  Label
x     Taking medicine
x     Eating
x+1   Taking medicine
x+1   Eating
…     …
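The Create step shown in the tables can be sketched as collapsing each run of consecutive events that share a case and a high-level label into one high-level event. This is an illustrative sketch, not the plugin's code: the function name `to_high_level_log` and the tuple layout `(case, low_level_label, high_level_label)` are assumptions, and the sketch would merge two back-to-back instances of the same high-level activity into one.

```python
from itertools import groupby


def to_high_level_log(annotated_events):
    """Collapse consecutive events with the same (case, high-level label) into one event."""
    # groupby only merges adjacent items, which is exactly the run-collapsing we want.
    return [key for key, _run in groupby(annotated_events, key=lambda e: (e[0], e[2]))]


TM, EAT = "Taking medicine", "Eating"
labeled_traces = [
    ("x", "dish & cups cabinet", TM),
    ("x", "medicine cabinet", TM),
    ("x", "water", TM),
    ("x", "dish & cups cabinet", TM),
    ("x", "medicine cabinet", TM),
    ("x", "water", TM),
    ("x", "dish & cups cabinet", EAT),
    ("x", "dishwasher", EAT),
    ("x+1", "dish & cups cabinet", TM),
    ("x+1", "medicine cabinet", TM),
    ("x+1", "water", TM),
    ("x+1", "dishwasher", EAT),
]
high_level_log = to_high_level_log(labeled_traces)
```

Here `high_level_log` holds four (case, label) pairs: Taking medicine and Eating for case x, then the same two for case x+1.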

The effect of classification errors

Labeled traces
Case  Low-level label      High-level label
x     dish & cups cabinet  Taking medicine
x     medicine cabinet     Taking medicine
x     water                Eating (misclassified)
x     dish & cups cabinet  Taking medicine
x     medicine cabinet     Taking medicine
x     water                Taking medicine
x     dish & cups cabinet  Eating
x     dishwasher           Eating
x+1   dish & cups cabinet  Taking medicine
x+1   medicine cabinet     Taking medicine
x+1   water                Taking medicine
x+1   dishwasher           Eating
…     …                    …

High-level log
Case  Label
x     Taking medicine
x     Eating
x     Taking medicine
x     Eating
x+1   Taking medicine
x+1   Eating
…     …

The effect of classification errors

Labeled traces
Case  Low-level label      High-level label
x     dish & cups cabinet  Taking medicine
x     medicine cabinet     Taking medicine
x     water                Taking medicine
x     dish & cups cabinet  Taking medicine
x     medicine cabinet     Taking medicine
x     water                Eating (misclassified)
x     dish & cups cabinet  Eating
x     dishwasher           Eating
x+1   dish & cups cabinet  Taking medicine
x+1   medicine cabinet     Taking medicine
x+1   water                Taking medicine
x+1   dishwasher           Eating
…     …                    …

High-level log
Case  Label
x     Taking medicine
x     Eating
x+1   Taking medicine
x+1   Eating
…     …
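A small sketch can make the contrast between the two error positions concrete. It assumes, as an illustration only, that the high-level log is built by collapsing runs of equal consecutive labels; the helper name `collapse` and the label sequences are hypothetical and modeled on case x.

```python
from itertools import groupby


def collapse(labels):
    """Merge each run of equal consecutive labels into a single high-level event."""
    return [label for label, _run in groupby(labels)]


TM, EAT = "Taking medicine", "Eating"

ground_truth = [TM, TM, TM, TM, TM, TM, EAT, EAT]
# One wrong label in the middle of the 'Taking medicine' run.
mid_run_error = [TM, TM, EAT, TM, TM, TM, EAT, EAT]
# One wrong label right at the boundary between the two activities.
boundary_error = [TM, TM, TM, TM, TM, EAT, EAT, EAT]

truth_events = collapse(ground_truth)
mid_run_events = collapse(mid_run_error)
boundary_events = collapse(boundary_error)
```

A single mid-run error splits one activity into three high-level events, so the trace grows from two events to four, while the same error at the boundary leaves the high-level trace unchanged.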

Learning steps
Annotated traces → Extract features → Data set → Train model → Low-level to High-level mapping

Features
• Concept extension
• Organizational extension
• Time extension

Concept extension

Input log
Case  Low-level label      High-level label
x     dish & cups cabinet  Taking medicine
x     medicine cabinet     Taking medicine
x     water                Taking medicine
x     dish & cups cabinet  Taking medicine
x     medicine cabinet     Taking medicine
x     water                Taking medicine
x     dish & cups cabinet  Eating
x     dishwasher           Eating
x+1   dish & cups cabinet  Taking medicine
x+1   medicine cabinet     Taking medicine
x+1   water                Taking medicine
x+1   dishwasher           Eating
…     …                    …

For a window of low-level labels (of a chosen window size), count the high-level labels it occurs with in the whole log:

High-level label  Count in whole log
Taking medicine   24
Eating            1

Estimate multinoulli over high-level labels:
P(Taking medicine | window) = 24/25
P(Eating | window) = 1/25
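The counting step above can be sketched as follows. This is a hedged illustration rather than the published feature extractor: the function name `window_label_distribution` is hypothetical, and it assumes the window consists of the last `window_size` low-level labels up to and including the current event, which the slides do not spell out.

```python
from collections import Counter, defaultdict


def window_label_distribution(traces, window_size):
    """Estimate P(high-level label | window of low-level labels) by counting over all traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        lows = [low for low, _high in trace]
        for i, (_low, high) in enumerate(trace):
            # Assumed window: the current low-level label plus its predecessors.
            window = tuple(lows[max(0, i - window_size + 1): i + 1])
            counts[window][high] += 1
    # Normalise each window's counts into a categorical (multinoulli) distribution.
    return {
        window: {high: n / sum(counter.values()) for high, n in counter.items()}
        for window, counter in counts.items()
    }


TM, EAT = "Taking medicine", "Eating"
# Toy annotated traces, invented for illustration.
traces = [
    [("dish & cups cabinet", TM), ("medicine cabinet", TM), ("water", TM),
     ("dish & cups cabinet", EAT), ("dishwasher", EAT)],
    [("dish & cups cabinet", TM), ("medicine cabinet", TM), ("water", TM),
     ("dishwasher", EAT)],
]
dist = window_label_distribution(traces, window_size=1)
```

With a window size of 1, the label "dish & cups cabinet" is ambiguous in this toy log (two of its three occurrences belong to Taking medicine), which is why larger windows can help.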

Organizational extension
• Same as concept extension, but with windows from:
  • Resource
  • Group
  • Role

Time extension
• Project timestamps to time in week, day, or month
• Fit Gaussian Mixture Model for each high-level label
[Figure: P(label) over the week, Monday to Sunday]
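The projection step can be sketched with the standard library alone. The helper names and the choice of seconds as the unit are assumptions for illustration; fitting the Gaussian Mixture Model on the projected values is not shown here.

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def time_in_day(ts):
    """Seconds elapsed since midnight."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second


def time_in_week(ts):
    """Seconds elapsed since Monday 00:00 (datetime.weekday() returns 0 for Monday)."""
    return ts.weekday() * SECONDS_PER_DAY + time_in_day(ts)


def time_in_month(ts):
    """Seconds elapsed since the first day of the month at 00:00."""
    return (ts.day - 1) * SECONDS_PER_DAY + time_in_day(ts)


# 19 January 2016 was a Tuesday.
ts = datetime(2016, 1, 19, 1, 0, 0)
projected = (time_in_day(ts), time_in_week(ts), time_in_month(ts))
```

Each projection turns a timestamp into a single number on a repeating scale, so events of the same activity that recur at similar times land close together.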

Conditional Random Field
• Unstable predictions cause large errors in the high-level log
• Conditional Random Fields are an extension of logistic regression to sequence data (Daphne Koller, 2008)
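For reference, the standard linear-chain form of a Conditional Random Field is given below. The slides do not state which CRF variant is used, so the linear-chain structure is an assumption; here y is the sequence of high-level labels, x the sequence of low-level events with their features, f_k are feature functions and λ_k their learned weights.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

When no feature function depends on the previous label y_{t-1}, the model factorizes into an independent multinomial logistic regression per event. The dependence on y_{t-1} is what lets the model discourage isolated label flips.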

K-fold Cross Validation
• Split the log with annotations into k folds
• In each round, use one fold for testing and the remaining folds for training
• Copy the predictions for the test fold into a log with predicted annotations
• After all rounds, every trace has predicted annotations
• Create the predicted high-level log from the predicted annotations
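The evaluation loop can be sketched as below. Everything here is a hypothetical stand-in: `k_fold_predictions`, `train_majority` and `predict_majority` are illustrative names, the round-robin fold assignment is an assumption, and the majority-label model is only a placeholder for the trained sequence model.

```python
from collections import Counter


def k_fold_predictions(traces, k, train, predict):
    """Return one predicted-annotation trace per input trace, each predicted while held out."""
    predicted = [None] * len(traces)
    for fold in range(k):
        # Assumed fold assignment: trace i belongs to fold i mod k.
        train_set = [t for i, t in enumerate(traces) if i % k != fold]
        model = train(train_set)
        for i, trace in enumerate(traces):
            if i % k == fold:
                lows = [low for low, _high in trace]
                # Copy the predictions for the held-out trace into the result log.
                predicted[i] = list(zip(lows, predict(model, lows)))
    return predicted


def train_majority(train_traces):
    """Placeholder model: most frequent high-level label per low-level label."""
    counts = {}
    for trace in train_traces:
        for low, high in trace:
            counts.setdefault(low, Counter())[high] += 1
    return {low: c.most_common(1)[0][0] for low, c in counts.items()}


def predict_majority(model, lows):
    return [model.get(low, "unknown") for low in lows]


TM, EAT = "Taking medicine", "Eating"
traces = [[("medicine cabinet", TM), ("dishwasher", EAT)] for _ in range(4)]
predicted = k_fold_predictions(traces, k=2, train=train_majority, predict=predict_majority)
```

Because each trace is predicted by a model that never saw it, the resulting log of predicted annotations can be turned into a predicted high-level log and compared with the ground truth.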

Levenshtein Similarity
Predicted high-level log vs. ground truth high-level log

Case  Label
x     Taking medicine
x     Eating
x     Taking medicine
x     Eating
x     Taking medicine

• Max length is 5
• 2 deletions
• Levenshtein similarity = 1 - 2/5 = 0.6
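The similarity computation can be sketched as follows: edit distance between two label sequences, normalized by the length of the longer one. The function names are illustrative, and the example assumes that the five-event trace is the predicted one and that the ground truth has three events, which the slide text does not make explicit.

```python
def levenshtein_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning sequence a into b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """1 minus the edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein_distance(a, b) / longest


TM, EAT = "Taking medicine", "Eating"
predicted_trace = [TM, EAT, TM, EAT, TM]  # assumed predicted high-level trace
ground_truth_trace = [TM, EAT, TM]        # assumed ground truth high-level trace

distance = levenshtein_distance(predicted_trace, ground_truth_trace)
similarity = levenshtein_similarity(predicted_trace, ground_truth_trace)
```

Under these assumptions the distance is 2 (two deletions) and the longer trace has 5 events, matching the 1 - 2/5 computation on the slide.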

Smart home data set
• Human behavior annotations as high-level labels
• XES extensions used for features
  • Concept: sensor that generated the event
  • Lifecycle: ‘start’ when the sensor switched on, ‘complete’ when it switched off
  • Time: the timestamp of the sensor change point
• Leave-One-Day-Out Cross Validation
• Levenshtein similarity of traces: 0.7042

Synthetic data evaluation
• Artificial Digital Photocopier dataset by J. C. Bose
• Low-level process is too large to be understandable
• No one-to-one mapping from low-level to high-level events exists
• 10-fold Cross Validation
• Levenshtein similarity: 0.9667

Inductive Miner on high-level result

Two versions of the plugin

Configuration screen

Conclusion
• When we have traces with high-level label annotations, we can learn a mapping from low-level events to high-level events
• XES extensions provide us with attributes with clear semantics that can be used to generate features for a low-level to high-level mapping
• Supervised event abstraction sometimes allows us to see high-level structure that is not observable from process models discovered from low-level events
• Supervised event abstraction can help to make a process model that is too large to comprehend more comprehensible

Questions