Introduction to RFID Data Mining Prof Kesheng Wang























Introduction to RFID Data Mining
Prof. Kesheng Wang
Knowledge Discovery Laboratory
Department of Production and Quality Engineering
Norwegian University of Science and Technology
kesheng.wang@ntnu.no
Contents
• A quick look back
• The form of RFID data
• Interpretation of the data
• What can we do with the data
• Suggestions for data analysis
• Conclusions
Broad Applications of RFID Technology
• Inventory Management
• Electronic Toll Collection
• Asset Tracking
• Medical
What do we get from the RFID supplier?
• EPC
• XML data model
• EPCIS standard
EPC and EPCIS
• Electronic Product Code (EPC): provides a unique, serialized identifier for any kind of object.
• Electronic Product Code Information Services (EPCIS): an EPCglobal standard for sharing EPC-related information between trading partners.
• EPCIS provides important new capabilities to improve efficiency, security, and visibility in the global supply chain, and complements lower-level EPCglobal tag, reader, and middleware standards.
• In a supply chain:
  – What: Product
  – Where: Location
  – When: Time
  – Why: Disposition, business step
Data elements in EPCIS standard
• WHAT
  – EPC: can be a list (Object or Transaction Events) or parent/child (Aggregation or Transaction Events). It is possible to include any unique identity in the EPC field.
  – Business Transaction: includes a type (e.g. Purchase Order, Invoice, Bill of Lading) and a number. By including the Business Transaction number in a business event, it is possible to relate EPCs to a Business Transaction, e.g. state that EPCs 1-5 are in Purchase Order CompanyA-123.
• WHERE
  – Read Point: indicates the location where an event took place, e.g. DC X conveyor belt #2
  – Business Location: describes where the object is immediately after the event occurs, e.g. DC X Shipping Area
• WHEN
  – Event Time: states when an event took place
  – Record Time: indicates when the event was received through the EPCIS Capture Interface
• WHY
  – Business Step: indicates what business operation was taking place at the time of the event, e.g. Receiving, Picking, Loading, Shipping
  – Disposition: describes the status of the product immediately after the event occurs, e.g. Sellable, In Progress, Non Sellable, Destroyed
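The data elements above can be pictured as a single record per event. The following is a minimal sketch only: the class name `EpcisEvent` and its field names are illustrative assumptions, not part of the EPCIS standard, and the sample values are taken from the examples used elsewhere in these slides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class EpcisEvent:
    """Hypothetical container for one EPCIS event (names are assumptions)."""
    # WHAT: the objects the event is about
    epc_list: List[str]
    # WHERE: where it was read, and where the object is afterwards
    read_point: str
    biz_location: str
    # WHEN: when the event took place
    event_time: datetime
    # WHY: business context, optional in this sketch
    biz_step: Optional[str] = None
    disposition: Optional[str] = None
    # Further WHAT / WHEN elements, also optional here
    biz_transaction: Optional[str] = None
    record_time: Optional[datetime] = None


event = EpcisEvent(
    epc_list=["7071371.00000001"],
    read_point="7080000000419.1",
    biz_location="7080000000419",
    event_time=datetime(2012, 3, 19, 9, 4),
    biz_step="Receiving",
    disposition="Sellable",
)
```

Grouping the fields by What, Where, When and Why keeps the record aligned with the four questions, which makes it easier to decide later which fields a given mining problem actually needs.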
Interpretation of the data file When What Why Where
Data processing Easy for maintenance Add fields such as ‘quality’
Bulky Object Movements
[Figure: goods move in bulk from Factory through Dist. Center 1 and Dist. Center 2 to store 1 / store 2 and shelf 1 / shelf 2. Group sizes shown: 10 pallets (1000 cases), 20 cases (1000 packs), 10 packs (12 sodas). Groups carry hierarchical labels such as 1.1, 1.1.1 and 1.1.1.2.]
Example record: [{i1, i2, …, i10000}, Dist Center 1, 01/01/08, 01/03/08]
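Because items move together in bulk, many per-item records can be merged into one record of the form [{items}, location, time in, time out]. A minimal sketch of that grouping follows; the function name `compress_stays` and the input layout are assumptions made for illustration, and the item identifiers are invented.

```python
from collections import defaultdict


def compress_stays(item_stays):
    """Merge per-item stays that share location and time window.

    item_stays: iterable of (epc, location, time_in, time_out) tuples.
    Returns a list of (set_of_epcs, location, time_in, time_out) records.
    """
    groups = defaultdict(set)
    for epc, location, time_in, time_out in item_stays:
        groups[(location, time_in, time_out)].add(epc)
    return [
        (epcs, location, time_in, time_out)
        for (location, time_in, time_out), epcs in groups.items()
    ]


# Hypothetical per-item stays: three items travel together, one does not.
item_stays = [
    ("i1", "Dist Center 1", "01/01/08", "01/03/08"),
    ("i2", "Dist Center 1", "01/01/08", "01/03/08"),
    ("i3", "Dist Center 1", "01/01/08", "01/03/08"),
    ("i4", "Dist Center 2", "01/02/08", "01/04/08"),
]
records = compress_stays(item_stays)
```

Here four per-item rows collapse into two group records; with pallets of thousands of items the saving is correspondingly larger.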
Non-Spatial Generalization
• Category level: Clothing
• Type level: Outerwear, Shoes, …
• SKU level: Shirt, Jacket, …
• EPC level: Shirt 1 … Shirt n
[Figure label: Interesting Level]
Path Generalization
• Full path: dist. center → truck → backroom → shelf → checkout
• Store View: Transportation → backroom → shelf → checkout
• Transportation View: dist. center → truck → Store
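Path generalization can be sketched as a mapping that collapses detailed locations into a coarser one, followed by merging of consecutive repeats. The mappings `STORE_VIEW` and `TRANSPORT_VIEW` and the function name below are assumptions chosen to mirror the two views on this slide.

```python
# Each view maps the locations it is NOT interested in to one coarse label.
STORE_VIEW = {"dist. center": "Transportation", "truck": "Transportation"}
TRANSPORT_VIEW = {"backroom": "Store", "shelf": "Store", "checkout": "Store"}


def generalize_path(path, view):
    """Replace locations using `view` and merge consecutive duplicates."""
    generalized = []
    for location in path:
        label = view.get(location, location)  # unmapped locations stay as-is
        if not generalized or generalized[-1] != label:
            generalized.append(label)
    return generalized


path = ["dist. center", "truck", "backroom", "shelf", "checkout"]
store_path = generalize_path(path, STORE_VIEW)
transport_path = generalize_path(path, TRANSPORT_VIEW)
```

The store manager's view keeps in-store detail and hides transport steps, while the transportation view does the opposite; both are derived from the same underlying path.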
Example Trajectory
(Factory, T1, T2) → (Shipping, T3, T4) → (Warehouse, T5, T6) → (Shelf, T7, T8) → (Checkout, T9, T10)
What Can Product Flows Tell?
Why was the milk discarded?
FlowGraph
• Tree-shaped workflow
  – Nodes: Locations
  – Edges: Transitions
• Each node is annotated with:
  – Distribution of durations at the node
  – Distribution of transition probabilities
  – Significant duration and transition exceptions
[Figure: example flowgraph with nodes factory, truck, warehouse, backroom, storage, shelf]
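The node annotations can be estimated directly from trajectories. The sketch below is a simplification under stated assumptions: it keys nodes by location name only (rather than by position in a tree), uses plain numbers for times, and the function name `build_flowgraph`, the `"END"` marker and the sample trajectories are all invented for illustration.

```python
from collections import Counter, defaultdict


def build_flowgraph(trajectories):
    """Collect per-location durations and transition probabilities.

    trajectories: list of paths, each a list of (location, t_in, t_out).
    Returns (durations, probabilities).
    """
    durations = defaultdict(list)
    transitions = defaultdict(Counter)
    for trajectory in trajectories:
        for i, (location, t_in, t_out) in enumerate(trajectory):
            durations[location].append(t_out - t_in)
            is_last = i + 1 == len(trajectory)
            next_location = "END" if is_last else trajectory[i + 1][0]
            transitions[location][next_location] += 1
    probabilities = {
        location: {
            nxt: count / sum(counts.values()) for nxt, count in counts.items()
        }
        for location, counts in transitions.items()
    }
    return dict(durations), probabilities


# Two hypothetical item trajectories with numeric timestamps.
trajectories = [
    [("factory", 0, 2), ("truck", 2, 3), ("shelf", 3, 8)],
    [("factory", 0, 1), ("truck", 1, 3), ("warehouse", 3, 9)],
]
durations, probabilities = build_flowgraph(trajectories)
```

Exceptions could then be flagged by comparing the statistics of a subset of items (for example, discarded ones) against these overall distributions.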
Data Flow Analysis
A Supposed Product Distribution Network
A Supposed Product Distribution Network
Data sheet from different business locations

EventTime         EpcList           ReadPoint        BizLocation
19.03.2012 09:04  7071371.00000001                   7080000000419
19.03.2012 09:04  7071371.00000006  7080000000419.1  7080000000419
19.03.2012 09:03  7071371.00000007  7080000000419.1  7080000000419
19.03.2012 09:04  7071371.00000009  7080000000419.1  7080000000419
19.03.2012 09:04  7071371.00000010  7080000000419.1  7080000000419
19.03.2012 09:01  7071371.00000011                   7080000000419
19.03.2012 09:04  7071371.00000014  7080000000419.1  7080000000419
19.03.2012 09:01  7071371.00000019  7080000000419.1  7080000000419
19.03.2012 09:01  7071371.00000020  7080000000419.1  7080000000419

EventTime         EpcList           ReadPoint        BizLocation
19.03.2012 13:02  7071371.00000007  7080000000421.1  7080000000421
19.03.2012 13:01  7071371.00000009  7080000000421.1  7080000000421
19.03.2012 13:01  7071371.00000010  7080000000421.1  7080000000421
19.03.2012 13:00  7071371.00000019  7080000000421.1  7080000000421
19.03.2012 13:00  7071371.00000024  7080000000421.1  7080000000421
19.03.2012 13:01  7071371.00000038  7080000000421.1  7080000000421
19.03.2012 13:00  7071371.00000041                   7080000000421
19.03.2012 13:03  7071371.00000045  7080000000421.1  7080000000421
19.03.2012 13:02  7071371.00000050  7080000000421.1  7080000000421

EventTime         EpcList           ReadPoint        BizLocation
19.03.2012 09:30  7071371.00000002  7080000000420.1  7080000000420
19.03.2012 09:33  7071371.00000003  7080000000420.1  7080000000420
19.03.2012 09:30  7071371.00000004  7080000000420.1  7080000000420
19.03.2012 09:30  7071371.00000005  7080000000420.1  7080000000420
19.03.2012 09:31  7071371.00000008  7080000000420.1  7080000000420
19.03.2012 09:30  7071371.00000012  7080000000420.1  7080000000420
19.03.2012 09:31  7071371.00000013  7080000000420.1  7080000000420
19.03.2012 09:30  7071371.00000015  7080000000420.1  7080000000420
19.03.2012 09:32  7071371.00000016  7080000000420.1  7080000000420

EventTime         EpcList           ReadPoint        BizLocation
19.03.2012 18:00  7071371.00000002                   7080000000423
19.03.2012 17:30  7071371.00000005  7080000000423.2  7080000000423
19.03.2012 17:32  7071371.00000008  7080000000423.2  7080000000423
19.03.2012 17:30  7071371.00000012                   7080000000423
19.03.2012 17:32  7071371.00000018  7080000000423.2  7080000000423
19.03.2012 17:30  7071371.00000029  7080000000423.2  7080000000423
19.03.2012 18:01  7071371.00000030  7080000000423.2  7080000000423
19.03.2012 18:01  7071371.00000031  7080000000423.2  7080000000423
19.03.2012 17:31  7071371.00000034  7080000000423.2  7080000000423

Only four keywords are kept here: EventTime, EpcList, ReadPoint and BizLocation
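Keeping only these four fields from an EPCIS XML file might look like the sketch below. It is an assumption-laden illustration: the sample document is a simplified, namespace-free fragment written for this example (real EPCIS documents wrap events in further elements and use XML namespaces), and the function name `extract_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample; not a complete EPCIS document.
SAMPLE = """<EventList>
  <ObjectEvent>
    <eventTime>2012-03-19T09:04:00Z</eventTime>
    <epcList>
      <epc>7071371.00000006</epc>
    </epcList>
    <action>OBSERVE</action>
    <readPoint><id>7080000000419.1</id></readPoint>
    <bizLocation><id>7080000000419</id></bizLocation>
  </ObjectEvent>
</EventList>"""


def extract_rows(xml_text):
    """Return one row per EPC with the four fields kept on the slide."""
    root = ET.fromstring(xml_text)
    rows = []
    for event in root.iter("ObjectEvent"):
        event_time = event.findtext("eventTime")
        read_point = event.findtext("readPoint/id")
        biz_location = event.findtext("bizLocation/id")
        # An event may list several EPCs; emit one row for each of them.
        for epc in event.findall("epcList/epc"):
            rows.append({
                "EventTime": event_time,
                "EpcList": epc.text,
                "ReadPoint": read_point,
                "BizLocation": biz_location,
            })
    return rows


rows = extract_rows(SAMPLE)
```

Emitting one row per EPC flattens the event list into the tabular data sheet form, which is what the later route-building step works on.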
A Supposed Product Distribution Network
• For the supposed problem, only the location is considered and the time factor is ignored, which means:
  – “BizLocation” is used to trace the items instead of “ReadPoint”
  – The time spent in storage and transportation is not considered
  – A route list is formed on the basis of “EpcList” and “EventTime”
Route list (columns: EpcList, followed by alternating BizLocation and EventTime pairs; cell values as extracted):
EpcList 7071371.00000001 7071371.00000002 7071371.00000003 7071371.00000004 7071371.00000005 19.03.2012 08:02 19.03.2012 08:30 19.03.2012 08:32 BizLocation 7080000000418 7080000000418 EventTime 19.03.2012 09:30 19.03.2012 09:33 19.03.2012 09:30 BizLocation 7080000000419 7080000000420 EventTime 19.03.2012 14:34 19.03.2012 14:02 19.03.2012 14:01 19.03.2012 14:32 BizLocation 7080000000419 7080000000420 EventTime 19.03.2012 16:03 19.03.2012 15:00 19.03.2012 15:04 19.03.2012 16:00 BizLocation 7080000000422 7080000000423 EventTime 19.03.2012 18:00 19.03.2012 17:01 19.03.2012 17:03 19.03.2012 17:30 BizLocation 7080000000422 7080000000423 EventTime 19.03.2012 19:30 19.03.2012 18:32 19.03.2012 18:31 19.03.2012 19:02 7080000000429 7080000000427 7080000000428 EventTime 19.03.2012 09:04 19.03.2012 13:01 19.03.2012 14:32 19.03.2012 17:01 19.03.2012 18:03 BizLocation 7080000000426
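Forming the route list amounts to grouping the events by EPC and ordering each group by event time. A minimal sketch, assuming rows of (EventTime, EPC, BizLocation) in the date format shown in the data sheets; the function name `build_routes` is hypothetical, and the four sample rows are a small subset of the data sheet rows, not complete routes.

```python
from collections import defaultdict
from datetime import datetime


def build_routes(rows):
    """Return {epc: [biz_location, ...]} ordered by event time."""
    events_by_epc = defaultdict(list)
    for event_time, epc, biz_location in rows:
        timestamp = datetime.strptime(event_time, "%d.%m.%Y %H:%M")
        events_by_epc[epc].append((timestamp, biz_location))
    # Sorting the (timestamp, location) pairs orders each route in time;
    # the timestamps themselves are then dropped, as on the slide.
    return {
        epc: [location for _, location in sorted(events)]
        for epc, events in events_by_epc.items()
    }


rows = [
    ("19.03.2012 18:00", "7071371.00000002", "7080000000423"),
    ("19.03.2012 13:02", "7071371.00000007", "7080000000421"),
    ("19.03.2012 09:30", "7071371.00000002", "7080000000420"),
    ("19.03.2012 09:03", "7071371.00000007", "7080000000419"),
]
routes = build_routes(rows)
```

Event time is used only to order the locations; once the route is built, durations are discarded, in line with the simplification stated in the bullets above.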
A Supposed Product Distribution Network
• Once item quality is added to the data, association rules are used to find potential relationships between business location and quality (IBM® SPSS® Modeler as the tool)
• The data sheet has to be restructured as follows so that the software accepts it:
EPC biz1 biz2 biz3 biz4 biz5 biz6 biz7 biz8 biz9 biz10 biz11 biz12 Unqualified
7071371.00000001 T T F F F T F F
7071371.00000002 T F F F T F
7071371.00000003 T F T F F T
7071371.00000004 T F T F F F
7071371.00000005 T F F T F F
7071371.00000006 T T F F F T F F
7071371.00000007 T T F F F
7071371.00000008 T F F T F F
7071371.00000009 T T F F F
Where:
  – “biz” means “BizLocation”
  – T (True) means the event occurred at that location for the item, while F (False) means it did not
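The restructuring into T/F columns can be sketched as follows. Everything here is illustrative: the function name `to_flag_table`, the assignment of location codes to biz1, biz2, … by list position, and the sample routes and quality labels are assumptions, since the slides do not state which code corresponds to which column.

```python
def to_flag_table(routes, unqualified, locations):
    """Turn routes into rows of T/F flags, one column per business location.

    routes: {epc: [biz_location, ...]}
    unqualified: set of EPCs judged unqualified
    locations: ordered list of location codes; position i becomes "biz<i+1>"
    """
    table = []
    for epc, route in routes.items():
        visited = set(route)
        row = {"EPC": epc}
        for index, location in enumerate(locations, start=1):
            row[f"biz{index}"] = "T" if location in visited else "F"
        row["Unqualified"] = "T" if epc in unqualified else "F"
        table.append(row)
    return table


# Hypothetical inputs, for illustration only.
locations = ["7080000000418", "7080000000419", "7080000000420"]
routes = {
    "7071371.00000001": ["7080000000418", "7080000000419"],
    "7071371.00000002": ["7080000000418", "7080000000420"],
}
table = to_flag_table(routes, {"7071371.00000002"}, locations)
```

Each row is now a fixed-width record of flags, which is the transactional format that association rule tools typically expect.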
A Supposed Product Distribution Network
• A stream is built in IBM® SPSS® Modeler
  – The Apriori algorithm is used
  – The input and target fields are defined
A Supposed Product Distribution Network
• As shown in the table below, biz5 is the node most strongly associated with unqualified items, so something may be going wrong at that node
• Meanwhile, the route 10-5-3-1 is where most unqualified items occur

Consequent       Antecedent                                          Confidence %  Rule Support %
Unqualified = T  biz5 = T and biz3 = T                               30.83         9.25
Unqualified = T  biz5 = T and biz3 = T and biz1 = T                  30.83         9.25
Unqualified = T  biz10 = T                                           24.00         4.80
Unqualified = T  biz10 = T and biz5 = T                              24.00         4.80
Unqualified = T  biz10 = T and biz3 = T                              24.00         4.80
Unqualified = T  biz10 = T and biz1 = T                              24.00         4.80
Unqualified = T  biz10 = T and biz5 = T and biz3 = T                 24.00         4.80
Unqualified = T  biz10 = T and biz5 = T and biz1 = T                 24.00         4.80
Unqualified = T  biz10 = T and biz3 = T and biz1 = T                 24.00         4.80
Unqualified = T  biz10 = T and biz5 = T and biz3 = T and biz1 = T    24.00         4.80
Unqualified = T  biz5 = T                                            21.78         9.80
Unqualified = T  biz5 = T and biz1 = T                               21.78         9.80
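The two columns of the rule table can be reproduced by hand on a small example. The sketch below assumes the usual definitions: confidence is the share of records matching the antecedent that are also unqualified, and rule support is the share of all records matching both antecedent and consequent. The function name `rule_metrics` and the ten sample records are invented for illustration; this is not the SPSS Modeler implementation.

```python
def rule_metrics(records, antecedent, consequent="Unqualified"):
    """Return (antecedent support %, confidence %, rule support %).

    records: list of dicts mapping field name to True/False
    antecedent: list of field names that must all be True
    """
    total = len(records)
    matching = [r for r in records if all(r[name] for name in antecedent)]
    both = [r for r in matching if r[consequent]]
    antecedent_support = 100.0 * len(matching) / total
    confidence = 100.0 * len(both) / len(matching) if matching else 0.0
    rule_support = 100.0 * len(both) / total
    return antecedent_support, confidence, rule_support


# Ten hypothetical items: four passed through biz5, one of them unqualified.
records = (
    [{"biz5": True, "Unqualified": True}]
    + [{"biz5": True, "Unqualified": False}] * 3
    + [{"biz5": False, "Unqualified": False}] * 6
)
antecedent_support, confidence, rule_support = rule_metrics(records, ["biz5"])
```

Under these definitions, dividing rule support by confidence gives the antecedent support; for the first rule in the table, 9.25 / 30.83 ≈ 0.30, which would mean roughly 30% of all items passed through both biz5 and biz3.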
Conclusions
• EPCIS standard
  – Provides an effective way of exchanging data
• Define the data mining problem
  – Be clear about what we are interested in
• Extracting interesting information
  – Define the information needed for the supposed problem
  – Extract the information from the original .xml dataset
  – Flowgraph analysis, data cleaning, classification, clustering, trend analysis, frequent/sequential pattern analysis
• Selecting suitable data mining approaches
  – Association rules, decision tree, …
Thanks for your attention!