Complete and Interpretable Conformance Checking of Business Processes
Luciano García-Bañuelos (University of Tartu, Estonia), Nick van Beest (Data61 | CSIRO, Australia), Marlon Dumas (University of Tartu, Estonia), Marcello La Rosa and Willem Mertens (Queensland University of Technology, Australia)
Conformance checking: use cases
1. Compliance auditing – detect deviations with respect to a normative model (unfitting behavior)
2. Model maintenance – identify both unfitting behavior and additional model behavior
3. Automated process model discovery – iterative model improvement
Given a process model M and an event log L, explain the differences between the process behavior observed in M and L
State of the art
Current approaches:
• are designed to identify the number and exact location of the differences
• do not provide a "high-level" diagnosis that allows analysts to easily pinpoint differences:
– they are unable to identify differences across traces
– they are unable to fully characterize extra model behavior not present in the log
An example
Log traces: ABCDEH, ACBDEH, ABCDFH, ACBDFH, ABDEH, ABDFH
Desired conformance output:
• task C is optional in the log
• the cycle including IGDF is not observed in the log
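A naive check for optional tasks can be sketched in a few lines (illustrative only, not the paper's method): a task absent from some trace *may* be optional. Note that this check also flags E and F, which are the two branches of an exclusive choice rather than skips, which is exactly why the PES relations (conflict vs. optionality) are needed for a correct diagnosis.

```python
# Naive sketch: a task that occurs in some but not all traces is a candidate
# for being optional. This conflates true skips (C) with exclusive choices
# (E vs. F), which the PES-based comparison disambiguates.
def optional_tasks(traces):
    """Return tasks that occur in some but not all traces."""
    all_tasks = set().union(*map(set, traces))
    return {t for t in all_tasks if any(t not in trace for trace in traces)}

log = ["ABCDEH", "ACBDEH", "ABCDFH", "ACBDFH", "ABDEH", "ABDFH"]
print(sorted(optional_tasks(log)))  # ['C', 'E', 'F'] — E and F are exclusive alternatives, not skips
```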
Our approach A method for business process conformance checking that: 1. Identifies all differences between the behavior in the model and the behavior in the log 2. Describes each difference via a natural language statement
How does it work?
Prime event structure (PES)
A Prime Event Structure (PES) is a graph of events, where each event e represents one occurrence of a task in the modeled system (e.g., a business process). Multiple occurrences of the same task are therefore represented by different events.
Pairs of events in a PES are related by one of the following binary relations:
• Causality: event e is a prerequisite for e'
• Conflict: e and e' cannot occur in the same execution
• Concurrency: no order can be established between e and e'
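A minimal sketch of a PES as a data structure (hypothetical naming, not the authors' implementation): causality is stored as a transitively closed partial order, conflict is symmetric and inherited along causality, and concurrency is derived as the absence of both.

```python
# Sketch of a Prime Event Structure. Events are opaque ids with task labels;
# concurrency is not stored but derived from causality and conflict.
from itertools import product

class PES:
    def __init__(self, labels, causes, conflicts):
        self.labels = labels                      # event -> task label
        self.causes = set(causes)                 # direct causality pairs (e, e')
        self._close_causality()
        self.conflicts = set(conflicts) | {(b, a) for a, b in conflicts}
        self._inherit_conflict()

    def _close_causality(self):
        # Transitive closure of the causality relation.
        changed = True
        while changed:
            changed = False
            for (a, b), (c, d) in product(list(self.causes), repeat=2):
                if b == c and (a, d) not in self.causes:
                    self.causes.add((a, d))
                    changed = True

    def _inherit_conflict(self):
        # Conflict is inherited along causality: if e # f and f < g, then e # g.
        changed = True
        while changed:
            changed = False
            for (e, f) in list(self.conflicts):
                for (a, b) in self.causes:
                    if a == f and (e, b) not in self.conflicts:
                        self.conflicts |= {(e, b), (b, e)}
                        changed = True

    def concurrent(self, e, f):
        # Concurrent = distinct, not causally ordered, not in conflict.
        return (e != f and (e, f) not in self.causes
                and (f, e) not in self.causes
                and (e, f) not in self.conflicts)

# Events: a causes both b and c; b and c are unordered and not in conflict.
pes = PES({"a": "A", "b": "B", "c": "C"}, {("a", "b"), ("a", "c")}, set())
print(pes.concurrent("b", "c"))  # True
```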
From event log to PES Log: PO runs:
From model to PES BPMN model Petri net
From model to PES Branching process
From model to PES
Complete prefix unfolding (annotated with cutoff events and their corresponding events)
PES prefix unfolding
From the complete prefix unfolding (with cutoff and corresponding events) to the PES prefix unfolding
Loop relations
Comparing PESs
Log PES (EL) vs. model PES prefix unfolding (EM)
Comparing PESs (cont'd)
match C, match D
→ In the log, C is optional after {A, B}, whereas in the model it is not (task skipping)
Elementary mismatch patterns
Unfitting behavior patterns:
• Relation mismatch patterns:
1. Causality–Concurrency
2. Conflict
• Event mismatch patterns:
3. Task skipping
4. Task substitution
5. Unmatched repetition
6. Task relocation
7. Task insertion / absence
Additional model behavior patterns:
8. Unobserved acyclic interval
9. Unobserved cyclic interval
Example: Causality / Concurrency
Example: Task substitution
Unobserved cyclic interval: PES and PES prefix unfolding
Log PES (EL) vs. model PES prefix unfolding (EM)
Pomsets (partially ordered multisets)
• A pomset is a directed acyclic graph where:
– the nodes are configurations
– the edges represent direct causality relations between configurations
– each edge is labeled by an event
• Unlike an event structure, a pomset has no conflict relation, since a pomset represents one possible execution
• The behavior of a PES can be characterized by the set of pomsets it induces
• For a PES prefix, the set of induced pomsets is infinite when the prefix captures cyclic behavior via cutoff–corresponding (cc) pairs, so we cannot enumerate all of its pomsets to compare with the PES of the log
• Instead, we extract a set of elementary pomsets (inspired by the notion of elementary paths), which collectively cover all pomsets induced by a PES prefix, without unfolding cyclic behavior infinitely
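The traces induced by a small acyclic pomset can be enumerated as its topological orderings. The following sketch (illustrative only, not the paper's tooling) shows how one pomset with two concurrent events yields two traces, which is why a pomset is a more compact behavior representation than a set of traces.

```python
# Sketch: an acyclic pomset as a DAG of labeled events; its induced traces are
# the topological orderings, read through the labeling function.
def linearizations(events, edges, labels):
    """Yield every label sequence consistent with the partial order."""
    def rec(done, remaining):
        if not remaining:
            yield tuple(labels[e] for e in done)
            return
        for e in remaining:
            preds = {a for (a, b) in edges if b == e}
            if preds <= set(done):                 # all causes already executed
                yield from rec(done + [e], remaining - {e})
    yield from rec([], set(events))

# A causes both B and C; B and C are concurrent.
runs = set(linearizations({"a", "b", "c"}, {("a", "b"), ("a", "c")},
                          {"a": "A", "b": "B", "c": "C"}))
print(runs)  # two runs: A B C and A C B
```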
Unobserved cyclic interval: expanded prefix with elementary pomsets
Unobserved cyclic interval: creating a PSP using the expanded prefix
Two unobserved elementary acyclic pomsets:
• s3 [a5, a9]
• s9 [a8, a9]
Two unobserved elementary cyclic pomsets:
• s5 [a3, a6, a4, a7]
• s11 [a4, a7, a3, a6]
Verbalization of elementary mismatch patterns (change pattern / condition / verbalization)

Causality / Concurrency
– if e' < e: In the log, after σ, λ(e') occurs before λ(e), while in the model they are concurrent
– else: In the model, after σ, λ(f') occurs before λ(f), while in the log they are concurrent

Conflict
– if e' || e: In the log, after σ, λ(e') and λ(e) are concurrent, while in the model they are mutually exclusive
– else if f' || f: In the model, after σ, λ(f') and λ(f) are concurrent, while in the log they are mutually exclusive
– else if e' < e: In the log, after σ, λ(e') occurs before λ(e), while in the model they are mutually exclusive
– else: In the model, after σ, λ(f') occurs before λ(f), while in the log they are mutually exclusive

Task skipping
– if e ≠ ⊥: In the log, after σ, λ(e) is optional
– else: In the model, after σ, λ(f) is optional

Task substitution
– In the log, after σ, λ(f) is substituted by λ(e)

Unmatched repetition
– In the log, λ(e) is repeated after σ

Task relocation
– if e ≠ ⊥: In the log, λ(e) occurs after σ instead of σ'
– else: In the model, λ(f) occurs after σ instead of σ'

Task insertion / absence
– if e ≠ ⊥: In the log, λ(e) occurs after σ and before σ'
– else: In the model, λ(f) occurs after σ and before σ'

Unobserved acyclic interval
– In the log, the interval … does not occur after σ

Unobserved cyclic interval
– In the log, the cycle involving the interval … does not occur after σ
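Template-based verbalization along the lines of the table can be sketched as follows (hypothetical template keys and bindings; detection of the mismatch and computation of the shared prefix σ are assumed to have happened already):

```python
# Sketch: each mismatch pattern maps to a sentence template; the detected
# mismatch supplies the bindings (shared prefix sigma, task labels involved).
TEMPLATES = {
    "task_skipping_log":    "In the log, after {sigma}, {e} is optional",
    "task_substitution":    "In the log, after {sigma}, {f} is substituted by {e}",
    "unmatched_repetition": "In the log, {e} is repeated after {sigma}",
    "causality_concurrency_log":
        "In the log, after {sigma}, {e1} occurs before {e2}, "
        "while in the model they are concurrent",
}

def verbalize(pattern, **bindings):
    """Render the natural-language statement for a detected mismatch."""
    return TEMPLATES[pattern].format(**bindings)

print(verbalize("task_skipping_log", sigma="{A, B}", e="C"))
# In the log, after {A, B}, C is optional
```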
Implementation
• Standalone Java tool: ProConformance
• OSGi plugin for Apromore: Compare
– Input: BPMN process model and a log (MXML or XES format). Also accepts:
• two BPMN models, for model comparison
• two logs, for log delta analysis
– Output: set of difference statements
Evaluation
1. Qualitative evaluation on a real-life process:
– traffic fines management process in Italy, with 150,370 traces (231 distinct traces)
2. Quantitative evaluation on two large process model collections:
– IBM Business Integration Unit (BIT): 735 models
– SAP R/3: 604 models
3. User evaluation (academics vs. practitioners)
Qualitative evaluation: traffic fines model
Qualitative evaluation: trace alignment
• "Replay a Log on Petri Net for Conformance Analysis": 205 misalignments out of 231 alignments
• "Replay a Log on Petri Net for All Optimal Alignments": 406 misalignments out of 412 alignments
Qualitative evaluation: verbalization
15 distinct statements in total, e.g.:
1. In the log, "Send for credit collection" occurs after "Payment" and before the end state
2. In the model, after "Insert fine notification", "Add penalty" occurs before "Appeal to judge", while in the log they are concurrent
3. In the log, after "Add penalty", "Receive results appeal from prefecture" is substituted by "Appeal to judge"
4. In the log, the cycle involving "Insert date appeal to prefecture, Send appeal to prefecture, Receive result appeal from prefecture, Notify result appeal to offender" does not occur after "Insert fine notification"
Qualitative evaluation: verbalization
2. In the model, after "Insert fine notification", "Add penalty" occurs before "Appeal to judge", while in the log they are concurrent
→ cannot be detected by trace alignment, as its diagnostics are provided at the level of individual traces
4. In the log, the cycle involving "Insert date appeal to prefecture, Send appeal to prefecture, Receive result appeal from prefecture, Notify result appeal to offender" does not occur after "Insert fine notification"
→ cannot be entirely detected by trace alignment, as this difference concerns additional model behavior, while alignment-based ETC conformance only detects escaping edges
Qualitative evaluation: summary Verbalization: • produces a more compact yet more understandable diagnosis • exposes behavioral differences that are difficult or impossible to identify using trace alignment
Quantitative evaluation
• For each model, we generated an event log using the ProM plugin "Generate Event Log from Petri Net"
• This plugin generates a distinct log trace for each possible execution sequence in the model
• The tool was only able to parse 274 models from the BIT collection and 438 models from the R/3 collection, running into out-of-memory exceptions for the remaining models
• Total: 712 sound workflow nets
Quantitative evaluation: model complexity
Quantitative evaluation: log size
[chart: total log size, in events]
Quantitative evaluation: time performance
[charts: execution time vs. log size (# events) for the BIT and SAP collections, at noise levels from 0% to 20%, for verbalization and for trace alignment]
Quantitative evaluation: results
[chart: number of statements vs. misalignments vs. escaping edges]
Quantitative evaluation: summary
• Verbalization, although generally slower than trace alignment, shows reasonable execution times (within 10 s)
• In extreme cases (logs with over 8,000 events in distinct traces and a high number of differences), execution time is still below 2 minutes
• Verbalization consistently produces a more compact difference diagnosis than trace alignment
User evaluation
• Online survey:
– a simple Petri net with 31 nodes (10 visible transitions), created from a real-life claims handling process model
– respondents were told this model was accompanied by a log with 53 traces
• Output of the alignment method (misalignments + Petri net with alignment information overlaid)
vs.
• Output of the verbalization method (list of statements)
User evaluation
• Respondents compared both methods using questions based on the Technology Acceptance Model:
1. What is the easiest approach for checking the conformance of an event log to a process model?
2. What is the easiest approach for identifying the differences between a process model and an event log?
3. What is the most useful approach for checking the conformance of an event log to a process model?
4. What is the most useful approach for identifying the differences between a process model and an event log?
5. Which approach would you likely use for checking the conformance of an event log to a process model?
6. Which approach would you likely use for identifying the differences between a process model and an event log?
• Seven-point Likert scale: from "Strongly prefer Alignment" to "Strongly prefer Verbalization"
• Background variables: academic vs. professional; experience in process modelling; confidence in modelling with Petri nets
User evaluation: hypotheses
H1: respondents have a preference for verbalization
H2: respondents with less experience, familiarity, confidence, and competence in the use of Petri nets have a stronger preference for verbalization
User evaluation: results
• Academics (38 responses):
– more familiar with Petri nets
– more competent in working with Petri nets
– analysed and created more models in the past 12 months
• Professionals (33 responses):
– less familiar with Petri nets
– mostly rely on professional training
User evaluation: results
• H1:
– tested for the full sample and for the two cohorts separately
– for the full sample there is no general preference for our method: the median was zero ("neutral")
– professionals did show a preference for verbalization (especially regarding ease of use), while academics preferred alignment, so H1 is supported for the professionals cohort only
• H2:
– respondents with more experience, familiarity, confidence, and competence in working with Petri nets have a stronger preference for alignments
– H2 is supported by the results
User evaluation: summary • Academics prefer alignment • Professionals prefer verbalization • Overall, people with less expertise in the use of Petri nets show a stronger preference for verbalization
Limitations of the approach
• The input log is assumed to consist of sequences of event labels:
– timestamps are ignored
– event payloads are ignored
• Simplicity of the concurrency oracle used (α+), leading to occasional difficulties in the presence of:
– short loops
– skipped and/or duplicated tasks
• No visual representation of differences (text only)
• No option to use different levels of abstraction
• No statistical support for differences: all are treated as equally important, even if some may be very infrequent
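The concurrency notion behind the oracle mentioned above can be sketched as follows (a simplified α-style check, not the α+ oracle itself): two tasks are deemed concurrent when each directly follows the other somewhere in the log. Short loops such as "ABA" would wrongly trigger this check, which is precisely the limitation noted in the slide.

```python
# Simplified alpha-style concurrency check: a and b are concurrent iff both
# "a directly followed by b" and "b directly followed by a" occur in the log.
# Short loops produce the same directly-follows pattern and fool this check.
def alpha_concurrency(traces):
    df = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    return {(a, b) for (a, b) in df if (b, a) in df and a != b}

log = ["ABCDEH", "ACBDEH", "ABCDFH", "ACBDFH", "ABDEH", "ABDFH"]
print(alpha_concurrency(log))  # B and C are concurrent (both BC and CB occur)
```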
Future work
• Employ a more accurate concurrency oracle (e.g., local α)*
• Group related statements, trading accuracy for greater interpretability
• Add statistical support to statements
• Visual representation of differences in addition to natural language statements (e.g., via "representative" runs)
• Capture non-control-flow deviance:
– analysis of underlying data
– resources
– temporal aspects
• Use differences as a basis for model repair
*Armas-Cervantes, A., Dumas, M., & La Rosa, M. (2016). Discovering Local Concurrency Relations in Business Process Event Logs. https://eprints.qut.edu.au/97615
Thank you for your attention