Machine Learning Assisted IPC-based Sorting at IP Australia

Abhinay Mukunthan, Senior Examiner of Patents, IP Australia

Examination Workflow at IP Australia
• New Application Filed
• Technology Sorting: sub-class level classification to select examination section
• Classification: sub-group level classification in IPC (and CPC if AU original)
• Examination

IP Australia’s Patent Examination Structure
• 14 examination workgroups
• Work is allocated to each examination group based on IPC sub-class and, on rare occasions, by main-group or sub-group
– Example: Medical devices examines A61B, C, D, F, G, H, J, M, N and C12M
– Example: Bio-technology examines A01H, A01K 67, C12N, Q, R, S, C04B 20, G01N 33/48-52 and G01N 33/566-98

Technology Sorting – Existing Practice
• Currently performed by examiners, who allocate an IPC sub-class to all incoming non-national phase applications
– Where required, a main-group or sub-group is also allocated
• Estimated time spent: an average of 2.5 minutes per application across around 11,400 cases per year, equating to roughly 64 working days per year
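The time estimate above can be checked with a short calculation. This is only a sketch of the arithmetic: the three input figures come from the slide, while the length of a working day is not stated in the source, so the code derives the value implied by the reported 64 days rather than assuming one.

```python
# Figures quoted on the slide.
MINUTES_PER_CASE = 2.5
CASES_PER_YEAR = 11_400
REPORTED_WORKING_DAYS = 64

# Total sorting effort per year.
total_minutes = MINUTES_PER_CASE * CASES_PER_YEAR
total_hours = total_minutes / 60

# The slide does not state the length of a working day; this is the
# value implied by dividing the yearly hours by the reported 64 days.
implied_hours_per_day = total_hours / REPORTED_WORKING_DAYS

print(f"{total_hours:.0f} hours of sorting per year")
print(f"{implied_hours_per_day:.2f} hours per working day implied")
```

The result is 475 hours per year, which matches the reported 64 working days if a working day is a little under seven and a half hours.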

Technology Sorting – Machine Learning Trial
• PAC (Patent Auto-Classifier) – a machine learning based system that performs technology sorting of PCT applications by allocating IPC main-group level information
• PAC is built around a training database of applications from 2010-2016, including:
– All IP Australia classified national, convention and PCT applications
– All AU national phase applications with existing IPC symbols from other offices
• Results from a three-month trial (late 2018)
– 86% accuracy when sorting to the right examination workgroup

Overview of the System
• Sits outside existing case-management and other IT systems
• Uses a hierarchical training and prediction model, mirroring the IPC structure
• Always selects the top three results at each level of the hierarchy before traversing to the next level
• Once a selection of results is available at the required level of detail, the top three overall results are selected

A Simplified Example of the Algorithm
• Claim being classified: An unmanned aerial vehicle based wireless sensor network using satellite positioning.
– Required level of detail: IPC main-group
[Figure: IPC tree traversed level by level]
• Section: A, B, C, D, E, F, G, H
• Class: B60, B62, B64; G01, G05, G06; H01, H03, H04
• Sub-class: B64B, B64C, B64D; G01P, G01R, G01S; H04B, H04L, H04W
• Main-group: B64C 27, B64C 29, B64C 39; G01S 7, G01S 11, G01S 19; H04W 4, H04W 40, H04W 84
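The level-by-level traversal in this example can be sketched as a top-three descent through the tree. This is one possible reading of the slides, not PAC's actual code: the tree fragment uses only symbols named on the slide, the scores in `TOY_SCORES` are invented for illustration, and `toy_score` and `top_k_descent` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# Toy fragment of the IPC hierarchy, limited to symbols named on the slide.
CHILDREN: Dict[str, List[str]] = {
    "ROOT": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "B": ["B60", "B62", "B64"],
    "G": ["G01", "G05", "G06"],
    "H": ["H01", "H03", "H04"],
    "B64": ["B64B", "B64C", "B64D"],
    "G01": ["G01P", "G01R", "G01S"],
    "H04": ["H04B", "H04L", "H04W"],
    "B64C": ["B64C 27", "B64C 29", "B64C 39"],
    "G01S": ["G01S 7", "G01S 11", "G01S 19"],
    "H04W": ["H04W 4", "H04W 40", "H04W 84"],
}

# Invented scores standing in for each node's probability output.
TOY_SCORES: Dict[str, float] = {
    "H": 0.9, "B": 0.8, "G": 0.7,
    "H04": 0.9, "B64": 0.8, "G01": 0.7,
    "H04W": 0.9, "B64C": 0.8, "G01S": 0.7,
    "H04W 84": 0.9, "B64C 39": 0.8, "G01S 19": 0.7,
}


def toy_score(symbol: str) -> float:
    """Return the invented score for a symbol (0.1 if none was assigned)."""
    return TOY_SCORES.get(symbol, 0.1)


def top_k_descent(
    children: Dict[str, List[str]],
    score: Callable[[str], float],
    depth: int,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Descend `depth` levels, keeping the k best-scoring nodes at each level."""
    kept: List[Tuple[str, float]] = [("ROOT", 1.0)]
    for _ in range(depth):
        # Score every child of the nodes kept at the previous level.
        candidates = [
            (child, score(child))
            for parent, _ in kept
            for child in children.get(parent, [])
        ]
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        kept = candidates[:k]
    return kept


# Four levels: section, class, sub-class, main-group.
result = top_k_descent(CHILDREN, toy_score, depth=4)
print(result)
```

Keeping three candidates at each level, rather than one, lets a claim that spans several technologies (here aircraft, positioning and wireless networks) retain a branch for each until the final selection.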

Training the Algorithm – an overview
[Figure: three linked columns]
• Symbol: H04L 1/00, H04L 5/00, H04L 7/00, H04L 9/00, H04L 12/00, H04L 13/00, H04L 15/00
• Document: AU 2015236595, AU 2014265895, AU 2017236598, AU 2016325697, AU 2015100156
• Terms: Encode, Cipher, Transmission, Encryption, Authentication, Token, Secret
Each symbol has a cluster of representative documents and each document has a cluster of weighted terms
• Terms in a document are weighted based on:
– Frequency within the same document
– Inverse-frequency across different documents
• Example: Terms like “Encryption” and “Transmission” occur frequently within the same document but not across many documents: higher weighting
• Example: Terms like “Figure” and “Abstract” occur frequently within all documents: lower weighting
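The weighting described above matches the general idea of TF-IDF. The slides do not give the exact formula PAC uses, so the following is a minimal sketch of the textbook form (term frequency multiplied by the log of inverse document frequency), with invented toy documents and a hypothetical function name `tf_idf`.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by in-document frequency times log inverse document frequency."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented toy documents; real input would be tokenised patent text.
docs = [
    ["figure", "abstract", "encryption", "encryption", "cipher"],
    ["figure", "abstract", "transmission", "transmission", "encode"],
    ["figure", "abstract", "token", "secret", "authentication"],
]

weights = tf_idf(docs)
print(weights[0])
```

In this toy data, “figure” and “abstract” appear in every document and receive a weight of zero, while “encryption”, repeated within a single document, receives the highest weight in that document.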

Training the Algorithm – looking a bit deeper
• Each node (symbol) in the IPC is trained with a sample of documents with that particular symbol using an extreme gradient boosting (XGBoost) algorithm
• Term similarities are measured using word embeddings pre-trained over IPA patent documents (using the word2vec algorithm)
• The combination of term weighting and word embedding is used to measure the distance between patent documents
• When a new application is classified, each IPC node runs its own customised algorithm to provide a probability score for that application belonging to that node
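The slides do not say how term weights and embeddings are combined. One common approach, shown here purely as an assumption, is to average the term vectors of a document using the term weights and compare documents by cosine distance. The two-dimensional vectors in `TOY_EMBEDDINGS` are invented, and `doc_vector` and `cosine_distance` are hypothetical helper names; the per-node XGBoost classifiers are not shown.

```python
import math
from typing import Dict, List, Tuple

# Invented 2-D embeddings; a real system would load word2vec vectors.
TOY_EMBEDDINGS: Dict[str, Tuple[float, float]] = {
    "encryption": (0.9, 0.1),
    "cipher": (0.8, 0.2),
    "rotor": (0.1, 0.9),
}


def doc_vector(
    term_weights: Dict[str, float],
    embeddings: Dict[str, Tuple[float, ...]],
) -> List[float]:
    """Weighted average of the embeddings of a document's known terms."""
    dims = len(next(iter(embeddings.values())))
    total = [0.0] * dims
    weight_sum = 0.0
    for term, weight in term_weights.items():
        vec = embeddings.get(term)
        if vec is None:
            continue  # skip terms without an embedding
        for i, value in enumerate(vec):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0:
        return total
    return [value / weight_sum for value in total]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 minus cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Term weights as they might come out of a weighting step (values invented).
doc_a = doc_vector({"encryption": 0.5}, TOY_EMBEDDINGS)
doc_b = doc_vector({"cipher": 0.4}, TOY_EMBEDDINGS)
doc_c = doc_vector({"rotor": 0.6}, TOY_EMBEDDINGS)

print(cosine_distance(doc_a, doc_b), cosine_distance(doc_a, doc_c))
```

The point of the embedding step is visible even in this toy case: the documents containing “encryption” and “cipher” share no terms, yet they end up closer to each other than to the document containing “rotor”.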

Future Work
• Integrate PAC with existing IT systems using process automation tools (PEGA Robotics)
• Run trials with an expanded training data set including USPTO and EPO applications and classifications
• Trial predictions at full IPC sub-group level to provide suggestions to examiners
• Expand the system to work with USPTO and EPO CPC data sets and provide CPC suggestions
• Collaborate with other offices on their efforts in AI and ML-assisted classification

Thank you
Abhinay Mukunthan, Senior Examiner of Patents, IP Australia
Abhinay.Mukunthan@ipaustralia.gov.au