Fuzzy Duplicates Analysis with ACL Prepared by Kevin
Fuzzy Duplicates Analysis with ACL Prepared by: Kevin Legere Date: April 3 rd, 2013
ACL | Transforming Audit and Risk Agenda § Overview § Example § FUZZYDUP command § OMIT() Function § Script Editor and RECOFFSET § Q&A © 2012 ACL Services Ltd. 2
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. Overview § What is a "Fuzzy Duplicate"? – Match based on criteria where the values are not exact but very close » EX: "ACL Services" and "ACL Service" § Typically used for: » » Keyword matching Invoice Number matching Vendor Name matching* Employee Name matching § Can be simple or complex » Completely depends on your approach and desired accuracy * focus for this presentation 3
ACL | Transforming Audit and Risk Overview § Simple Match Examples: – Exact or 100% match » "ACL" = "ACL" – Force Upper or Lower case » "ACL" = UPPER("acl") » "acl" = LOWER("ACL") – Removal of special characters » "ACL" = EXCLUDE("*ACL. " "!@#$%^&*(). ") – Only compare numbers or letters » "ACL" = INCLUDE(UPPER("ACL 123") "ABCDEFGHIJKLMNOPQRSTUVWXYZ") » "123" = INCLUDE("ACL 123" "1234567890") © 2012 ACL Services Ltd. 4
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. Overview § Complex Match Examples: – Removal of company type indicators (LLC, INC, LTD, etc) » "ACL Services Ltd. " = "ACL Services" – Percent of word match AKA letter by letter » "ACL Services" • "ACL Service" 11/12 character match or 91. 6% match – Word by Word* » "ACL Services" • • • "ACL Champions" "ACL" "Services" "Champions" = 50% match – Levenshtein distance – Sounds like – NYSIIS *Most used by ACL Consultants 5
ACL | Transforming Audit and Risk Vendor Master Analysis © 2012 ACL Services Ltd. 6
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. Vendor Master Analysis § Fuzzy Duplicates on Vendor Name – Possible Risk » Payments are being sent to more than one vendor – May not involve risk. The desire can be to normalize the vendor master list to ensure that duplicates do not exist. » Ideally, one unique vendor should exist in your vendor master list with one or more address records in your vendor address table 7
ACL | Transforming Audit and Risk Vendor Master Analysis § Sample file contains 75 vendors – Only Vendor Code and Vendor Name § Where do you start for Vendor Name matching? – Look for exact duplicates – Focus on Simple matching – Sort or Summarize! © 2012 ACL Services Ltd. 8
ACL | Transforming Audit and Risk Vendor Master Analysis § Step 1: Summarize your Vendor Master File » Choose Vendor Name as your key field » Add Vendor Code as the Other Fields for Summarizing » Be sure to check "Presort" © 2012 ACL Services Ltd. 9
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 10 Vendor Master Analysis § Step 2: Quickly comb over the data to identify a common trend. » We will focus on this issue, in the sample data: » Create a computed field that corrects the trend (or cleans the data).
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 11 Vendor Master Analysis § Functions used in Default Value text box: INCLUDE(UPPER(ALLTRIM(Vendor_Name)) 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') § Within ACL, the computed field will return the following:
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 12 Vendor Master Analysis § Step 3: Perform a Duplicates Command on the computed field
ACL | Transforming Audit and Risk Vendor Master Analysis § Results are as follows: © 2012 ACL Services Ltd. 13
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 14 FUZZYDUP command § ACL 9. 3 has new features that make Fuzzy Duplicate analysis easier – – FUZZYDUP command OMIT() function ISFUZZYDUP() function LEVDIST() function § Important parameters to understand – Levenshtein Distance – Difference Percentage
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 15 FUZZYDUP command § Syntax – FUZZYDUP ON {key_field} <OTHER fields> {LEVDISTANCE value} <DIFFPCT value><RESULTSIZE value> <EXACT> TO table_name § Example – FUZZYDUP ON Vendor_Name OTHER ALL LEVDISTANCE 2 DIFFPCT 50 TO My_Results § Levenshtein Distance (LEVDISTANCE) » The number of edits required to make the strings equal • EX: "Smith" and "Smythe" have a Levenshtein Distance of 2 § Difference Percentage (DIFFPCT) » The threshold for percentage difference between two strings • • EX: "Smith" and "Smythe" have a Percentage Difference of 40% (2/5) * 100%
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 16 OMIT() Function § When Do I use OMIT()? – When you want to refine fuzzy duplicate analysis – Look for repeating strings you want to remove from your Vendor Name field § Syntax – OMIT(string 1, string 2 <, case_sensitive>) – Specify T to make substrings specified for removal case-sensitive, or F to ignore case § Example – OMIT(Vendor_Name " Ltd, Inc, Corporation" F)
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 17 Script Editor and RECOFFSET
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 18
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 19 Contact Information Kevin Legere ACL Services Ltd. Implementation Consultant 1550 Alberni Street, Vancouver, BC, Canada V 6 G 1 A 5 kevin_legere@acl. com | @aclkevin www. acl. com/linkedin | www. acl. com/twitter | www. acl. com/facebook
ACL | Transforming Audit and Risk © 2012 ACL Services Ltd. 20
- Slides: 20