Discovery of the Branching Conditions in Process Models
Some slides are adapted and extended from those prepared by Luciano García-Bañuelos
Discovery of control flow from an event log. Fahland, van der Aalst (2011). Simplifying mined process models, BPM 2011.
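Data perspective? Branching points. [Figure: process model with branching points and the attributes age, salary, amount, length, and installment; one transition is labelled Inv]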
Cons of the Approach of Anne Rozinat @ BPM 2006
1. Problems with invisible and duplicate tasks.
2. The event log needs to be 100% compliant with the control-flow of the process model.
The control-flow alignment again provides the solution.
22-12-2021
ProM’s decision miner / 1
[Figure: Log → aligning → Alignments → Replay → learning instances for Branching Point A, Branching Point B, and Branching Point C]
M. de Leoni, W. M. P. van der Aalst, "Data-Aware Process Mining: Discovering Decisions in Processes Using Alignments". In Proc. of the 28th ACM Symposium on Applied Computing (SAC 2013), Track "Enterprise Engineering".
ProM’s decision miner / 2
[Figure: two example traces aligned with the process model (tasks ELA, RAD, CI, ASA, ACA, Inv). First trace: ELA (Amount=8500, Len=2), RAD (Age=25, Salary=2000), CI (Installm=750). Second trace: RAD (Age=35, Salary=3500), CI (Installm=1200).]
Instances for the ASA/ACA branching point:
Amount Installm Salary Age Len Task
8500 750 2000 25 2 ASA
N.Avail. 1200 3500 35 N.Avail. ACA
Instances for the Inv branching point:
Amount Installm Salary Age Len Task
8500 N.Avail. 2000 25 2 INV
N.Avail. N.Avail. 3500 35 N.Avail. INV
ProM’s decision miner / 2
Amount Installm Salary Age Len Task
8500 750 2000 25 1 ASA
N.Avail. 1200 3500 35 N.Avail. ACA
11000 450 2500 27 2 ACA
… … …
Decision tree learning. Implemented in ProM!
Decision tree:
amount < 10000: Approve Simple Application (ASA)
amount ≥ 10000:
  age < 35: Approve Simple Application (ASA)
  age ≥ 35: Approve Complex Application (ACA)
Resulting guards:
Approve Simple Application (ASA): (amount < 10000) ∨ (amount ≥ 10000 ∧ age < 35)
Approve Complex Application (ACA): amount ≥ 10000 ∧ age ≥ 35
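
The step from a learned decision tree to guards can be sketched as follows. This is a minimal illustration, not ProM's implementation: the tuple encoding of the tree and the helper names `paths` and `guards` are assumptions made for this example. Each root-to-leaf path becomes a conjunction, and the conjunctions that end in the same task are joined by a disjunction.

```python
# Hypothetical encoding (not ProM's): an inner node is
# (attribute, threshold, low_subtree, high_subtree), where low_subtree
# handles "attribute < threshold"; a leaf is the task label.
TREE = ("amount", 10000,
        "ASA",
        ("age", 35, "ASA", "ACA"))


def paths(node, conds=()):
    """Yield (leaf_label, conditions) for every root-to-leaf path."""
    if isinstance(node, str):
        yield node, conds
        return
    attr, thr, low, high = node
    yield from paths(low, conds + (f"{attr} < {thr}",))
    yield from paths(high, conds + (f"{attr} ≥ {thr}",))


def guards(tree):
    """Group the path conjunctions by leaf label and join them with ∨."""
    by_label = {}
    for label, conds in paths(tree):
        by_label.setdefault(label, []).append(" ∧ ".join(conds))
    return {label: " ∨ ".join(f"({c})" for c in conjs)
            for label, conjs in by_label.items()}


GUARDS = guards(TREE)
for task, guard in GUARDS.items():
    print(task, ":", guard)
```

For the tree on this slide, the sketch reproduces the two guards listed for ASA and ACA.
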
Validation with a synthetic process model
Validation with the BPI Challenge Log
• A Petri net was drawn using different plug-ins.
• The event log was split into two sets:
1. Training set to discover the guards
2. Test set to test the quality of the discovered guards
• Guards were discovered in a few seconds using 500 MB of memory.
• Unfortunately, only 2 attributes are present in the event log: request date and amount requested.
• The average data-flow conformance was 0.85.
Insight gained on the BPI Challenge event log
• Request is accepted: 3000 < Amount < 49000
• Request is cancelled: not (3000 < Amount < 49000)
• Request provisionally accepted and, later, declined: Amount < €5350
• REG DATE appears in no guards: the procedure has not changed over time (i.e. no concept drift in the data-flow conditions).
Decision miner: Not a panacea!
• It cannot discover expressions of the form “v op v”, such as installment > salary
For the guard installment > salary (transition Inv), the decision miner would return:
installment = 1760 ∧ salary ≤ 1750 ∨
installment = 1810 ∧ salary ≤ 1800 ∨
installment = 1875 ∧ salary ≤ 1850 ∨
installment = 1960 ∧ salary ≤ 1950 ∨
installment = 1975 ∧ salary ≤ 1970 ∨
installment = 2000 ∧ salary ≤ 1990 ∨ …
A solution
• Our solution combines
− Tools for dynamic analysis of software, specifically likely invariant discovery
− Theory of machine learning
M. de Leoni, M. Dumas and L. García-Bañuelos, "Discovering Branching Conditions from Business Process Execution Logs". In Proc. of the 16th International Conference on Fundamental Approaches to Software Engineering (FASE 2013).
Daikon
• Tool for discovery of likely invariants from execution logs
• How it works:
1. Analyzes the set of variables in the code of interest
2. Instantiates a set of predefined invariant templates
3. Traverses the execution log
− Falsifying some invariants
− Gathering the statistical support for the remaining templates
4. Discards some invariants based on:
− Subsumption
− Statistical support
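
The four steps above can be sketched in a few lines. This is a toy illustration of template-based invariant discovery, not Daikon's actual code or API: the observations are made-up values, only three binary templates are used, and the statistical-support part of steps 3 and 4 is left out.

```python
from itertools import permutations
import operator

# Toy observations (hypothetical values, not taken from a real log).
OBSERVATIONS = [
    {"installment": 1760, "salary": 1750, "length": 1, "age": 25},
    {"installment": 1810, "salary": 1800, "length": 2, "age": 35},
    {"installment": 2000, "salary": 1990, "length": 2, "age": 27},
]

# Predefined binary templates "x op y" over pairs of variables.
TEMPLATES = {"<": operator.lt, "≤": operator.le, "=": operator.eq}


def likely_invariants(observations):
    # Step 1: collect the variables of interest.
    variables = sorted(observations[0])
    # Step 2: instantiate every template for every ordered pair of variables.
    candidates = {(x, op, y)
                  for x, y in permutations(variables, 2)
                  for op in TEMPLATES}
    # Step 3: traverse the observations and drop falsified candidates.
    for row in observations:
        candidates = {(x, op, y) for (x, op, y) in candidates
                      if TEMPLATES[op](row[x], row[y])}
    # Step 4 (subsumption only): "x < y" implies "x ≤ y", so the weaker
    # invariant is discarded when the stronger one survived.
    return {(x, op, y) for (x, op, y) in candidates
            if not (op == "≤" and (x, "<", y) in candidates)}


INVARIANTS = likely_invariants(OBSERVATIONS)
for x, op, y in sorted(INVARIANTS):
    print(x, op, y)
```

On these toy rows the surviving invariants include `salary < installment` and `length < age`, which are of the "v op v" form that a decision tree over single attributes cannot express.
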
Daikon: Tool for mining likely invariants
CID Amount Installm Salary Age Len Task
13210 2000 25 1 NR
13220 25000 1200 3500 35 2 NE
13221 9000 450 2500 27 2 NE
13219 8500 750 2000 25 1 ASA
13220 25000 1200 3500 35 2 ACA
13221 9000 450 2500 27 2 ASA
… … … …
Likely invariants returned by Daikon, per task:
NR: installment > salary, amount ≥ 5000, length < age, …
NE: installment ≤ salary, amount ≥ 5000, length < age, …
ASA: installment ≤ salary, amount ≤ 9500, length < age, …
ACA: installment ≤ salary, amount ≥ 10000, length < age, …
BranchMiner (Conjunctive)
• Information Gain (IG) quantifies the discriminating power of a predicate (with respect to two different outcomes)
• Approach:
− Use Daikon for discovering invariants
− Combine invariants in a conjunction so as to maximize the overall IG
[Figure: a1: installment > salary, a2: amount ≥ 5000, a3: length < age, …; IG(a1) = 0.8, IG(a2) = 0, IG(a3) = 0, IG(a1∧a2) = 0, …]
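
A minimal sketch of the idea, under stated assumptions: IG is computed as the entropy of the outcomes minus the weighted entropy after splitting on the predicate, and the conjunction is built greedily. The greedy search, the function names, and the toy rows are assumptions made for this example; the slide only says that the conjunction should maximize the overall IG, and the IG values in the figure come from different data.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))


def info_gain(rows, labels, predicate):
    """Entropy reduction obtained by splitting the rows on the predicate."""
    yes = [l for r, l in zip(rows, labels) if predicate(r)]
    no = [l for r, l in zip(rows, labels) if not predicate(r)]
    n = len(labels)
    return (entropy(labels)
            - (len(yes) / n) * entropy(yes)
            - (len(no) / n) * entropy(no))


def best_conjunction(rows, labels, atoms):
    """Greedily add the atom that most increases the IG of the conjunction."""
    chosen, best_gain = [], 0.0
    while True:
        scored = []
        for name in atoms:
            if name in chosen:
                continue
            names = chosen + [name]
            gain = info_gain(rows, labels,
                             lambda r, ns=names: all(atoms[n](r) for n in ns))
            scored.append((gain, name))
        if not scored:
            return chosen, best_gain
        gain, name = max(scored)
        if gain <= best_gain:  # no atom improves the conjunction any further
            return chosen, best_gain
        chosen.append(name)
        best_gain = gain


# Toy instances (hypothetical values): rejected exactly when installment > salary.
ROWS = [
    {"installment": 1760, "salary": 1750, "amount": 6000},
    {"installment": 1200, "salary": 3500, "amount": 25000},
    {"installment": 450, "salary": 2500, "amount": 9000},
    {"installment": 2000, "salary": 1990, "amount": 8000},
]
LABELS = ["reject", "accept", "accept", "reject"]
ATOMS = {
    "a1": lambda r: r["installment"] > r["salary"],
    "a2": lambda r: r["amount"] >= 5000,
}

CHOSEN, GAIN = best_conjunction(ROWS, LABELS, ATOMS)
print(CHOSEN, GAIN)
```

Here `a2` holds for every row and therefore has zero IG, so the search keeps only `a1`.
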
BranchMiner (Disjunctive)
• Approach
− Partition the event log using a decision tree
− Derive conjunctive branching conditions for each partition
− Combine the conjunctive conditions of partitions with the same outcome in a disjunctive condition so as to maximize IG
[Figure: for the outcome Notify Rejection, the decision tree yields Partition 1, Partition 2, …; the conjunctive BranchMiner derives CONJ1, CONJ2, … from them. IG(CONJ1) = 0.4, IG(CONJ2) = 0.45, IG(CONJ3) = 0.5; IG(CONJ1∨CONJ2) = 0.78, IG(CONJ1∨CONJ3) = 0.6, …]
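
The last step, combining conjunctions into a disjunction, might look like the sketch below. It assumes the conjunctive conditions per partition are already given (the decision-tree partitioning is not shown) and simply tries every subset of them, which is an assumption of this example rather than the search strategy of the actual tool. The rows and the two conditions are made up.

```python
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))


def info_gain(rows, labels, predicate):
    """Entropy reduction obtained by splitting the rows on the predicate."""
    yes = [l for r, l in zip(rows, labels) if predicate(r)]
    no = [l for r, l in zip(rows, labels) if not predicate(r)]
    n = len(labels)
    return (entropy(labels)
            - (len(yes) / n) * entropy(yes)
            - (len(no) / n) * entropy(no))


def best_disjunction(rows, labels, conjunctions):
    """Try every non-empty subset and keep the disjunction with the highest IG."""
    best_gain, best_names = 0.0, ()
    for k in range(1, len(conjunctions) + 1):
        for names in combinations(sorted(conjunctions), k):
            gain = info_gain(
                rows, labels,
                lambda r, ns=names: any(conjunctions[n](r) for n in ns))
            if gain > best_gain:
                best_gain, best_names = gain, names
    return best_names, best_gain


# Toy instances (hypothetical): a request is rejected for one of two reasons.
ROWS = [
    {"installment": 1760, "salary": 1750, "amount": 6000},
    {"installment": 500, "salary": 2000, "amount": 60000},
    {"installment": 450, "salary": 2500, "amount": 9000},
    {"installment": 1200, "salary": 3500, "amount": 25000},
]
LABELS = ["reject", "reject", "accept", "accept"]
CONJUNCTIONS = {
    "CONJ1": lambda r: r["installment"] > r["salary"],
    "CONJ2": lambda r: r["amount"] > 50000,
}

NAMES, GAIN = best_disjunction(ROWS, LABELS, CONJUNCTIONS)
print(" ∨ ".join(NAMES), GAIN)
```

Each condition alone explains only one of the two rejections, so its IG is low; the disjunction separates the outcomes completely, mirroring how IG(CONJ1∨CONJ2) exceeds the individual IG values in the figure.
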
Linear and polynomial expressions
• Approach
− Select all numerical variables and generate some derived (a.k.a. latent) variables using an arithmetic operator, e.g., salary_div_installment, meaning “salary/installment”
− Augment the instance table with the values for latent variables
− Run the discovery method for conjunctive/disjunctive conditions
Instance table:
CID Amount Installm Salary Age Len Task
13210 2000 25 1 NR
13220 25000 1200 3500 35 2 NE
13221 9000 450 2500 27 2 NE
13219 8500 750 2000 25 1 ASA
13220 25000 1200 3500 35 2 ACA
13221 9000 450 2500 27 2 ASA
… … … …
Instance table augmented with latent variables:
CID Amount Installm Salary Sal/Inst Age Len Age+Len Task
13210 2000 1.00 25 1 26 NR
13220 25000 1200 3500 2.92 35 2 37 NE
13221 9000 450 2500 5.56 27 2 29 NE
13219 8500 750 2000 2.67 25 1 26 ASA
13220 25000 1200 3500 2.92 35 2 37 ACA
13221 9000 450 2500 5.56 27 2 29 ASA
… … … … …
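
The augmentation step can be sketched as follows. The function name `augment`, the triple format for derived variables, and the naming scheme `x_op_y` (modelled on salary_div_installment) are assumptions made for this example; missing values are represented as `None` and skipped.

```python
import operator

# Arithmetic operators available for building latent variables.
OPS = {"plus": operator.add, "div": operator.truediv}


def augment(row, derived):
    """Return a copy of row extended with latent variables.

    derived is a list of (x, op_name, y) triples, e.g.
    ("salary", "div", "installment") adds "salary_div_installment".
    """
    out = dict(row)
    for x, op, y in derived:
        if row.get(x) is None or row.get(y) is None:
            continue  # a value is not available for this instance
        if op == "div" and row[y] == 0:
            continue  # avoid division by zero
        out[f"{x}_{op}_{y}"] = round(OPS[op](row[x], row[y]), 2)
    return out


# Instance 13219 from the table above.
ROW = {"amount": 8500, "installment": 750, "salary": 2000,
       "age": 25, "length": 1, "task": "ASA"}
DERIVED = [("salary", "div", "installment"), ("age", "plus", "length")]

AUGMENTED = augment(ROW, DERIVED)
print(AUGMENTED["salary_div_installment"], AUGMENTED["age_plus_length"])
```

With the latent variable in place, a condition such as installment > salary can be written as salary_div_installment < 1 (for positive installments), which compares a single variable with a constant.
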
Assessment with an event log of 3000 traces (figure annotations: ~2 and ~12 secs.)
How to generalize to the n-ary case?
Future work: BranchMiner
• Completing the integration in ProM
− The conjunctive branch miner is implemented in ProM.
− The disjunctive branch miner is implemented as a command-line tool.
• Validation with real-life logs
− Currently assessed with synthetic event logs.
• Dealing with event logs with deviating traces
− Traces need to be repaired by guessing the values assigned to variables in moves in model.
− Again machine-learning techniques: Bayesian networks?