Open data sets from the EPO: Linked data, full-text data
EU Datathon 2020
Martin Kracker, European Patent Office
28 February 2020

Agenda
§ EPO and patent information
§ EP full-text data for text analytics
§ Linked open EP data

A simple contract
Patents:
§ Confer exclusivity on the patent applicant
§ Reveal the invention to the public

What information do patent documents contain?
§ Bibliographic data
  • Title, abstract
  • Applicant, inventor, legal representative
  • Dates
  • Technical classifications
  • Links to other patents (forming “families”), like
    − Earlier filings (priorities, ...)
    − Citations
§ Text and images
  • Detailed description of invention
  • Claims, drawings

EP patent publications
§ One patent application is published one or more times, each time with a different publication kind and publication date (see https://www.epo.org/searching-for-patents/helpful-resources/first-time-here/definitions.html and the Patent Information Tour http://e-courses.epo.org/wbts/pi_tour/index.html)
§ The most frequently used publication kinds are:
  • EP-A documents: published after 18 months
    − A1 with search report
    − A2 without search report (search report A3 to follow later)
  • EP-B documents:
    − B1 granted patent
§ Typical publication sequences: A1; A1+B1; A2+A3+B1

EPO’s Patent Information dissemination
§ Human access
  • EP full-text search
  • EP Bulletin search
  • European Publication Server
  • Global Patent Index
  • PATSTAT Online
  • European Patent Register
  • Espacenet
  • Global Dossier
  • Common Citation Document
§ Computer access
  • Web services
    − Open Patent Services
    − European Publication Server
  • Data products
    − EP (Linked Data, XML, PDF/A, EBD)
    − worldwide (DOCDB, INPADOC)
    − PATSTAT data

Agenda
§ EPO and Patent Information
§ EP full-text data for text analytics
§ Linked open EP data

Key facts of “EP full-text data for text analytics”
§ Data product containing XML-tagged titles, abstracts, descriptions, (amended) claims and search reports of EP publications
§ It does not contain bibliographic data, images, PDFs or TIFFs
§ Design goal:
  − simplicity
  − low entry barrier for NLP (machine learning, AI, linguistic research, ...)
§ Open license, free of charge, updated regularly (at least annually)
§ Launched: March 2019; updated: mid-September 2019; new update planned for March 2020
§ See epo.org/bulk-data

Simple key-value format
§ Key: EP, publication number, publication kind, publication date, language, text type
§ Value: text (XML-tagged)

Data format, volume and access
§ A set of 36 unzipped text files (200 GB in total)
§ Tab-separated files; the text is structured as XML
§ Every file contains the publications for a range of 100 000 publication numbers
§ Recommended access: self-service download from Google Cloud Platform
  − Requires a Google account
  − Estimated download costs, charged by Google: 25 $
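
The tab-separated layout can be read with a few lines of Python. This is a minimal sketch, not EPO-provided code: the column order (authority, publication number, kind, date, language, text type, text) is an assumption read off the key-value slide, and the sample line and the names `FullTextRecord`, `parse_line` and `plain_text` are made up for illustration.

```python
from dataclasses import dataclass
from xml.etree import ElementTree


@dataclass
class FullTextRecord:
    authority: str
    number: str
    kind: str
    date: str
    language: str
    text_type: str
    text: str  # XML-tagged text (the "value" part of the line)


def parse_line(line: str) -> FullTextRecord:
    """Split one tab-separated line into its key fields and its XML-tagged value."""
    # Limit the split so that any tab inside the text value stays in the last field.
    fields = line.rstrip("\n").split("\t", 6)
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    return FullTextRecord(*fields)


def plain_text(record: FullTextRecord) -> str:
    """Strip the XML tags from the value, keeping only the character data."""
    # Wrap in a dummy root element because the value may hold several sibling elements.
    root = ElementTree.fromstring(f"<root>{record.text}</root>")
    return "".join(root.itertext()).strip()


# Hypothetical sample line; real files may differ in detail.
sample = "EP\t1000000\tA1\t2000-05-17\ten\tTITLE\t<p>Apparatus for <b>testing</b></p>"
record = parse_line(sample)
print(record.kind, record.text_type, plain_text(record))
```

Real texts may contain markup or entities that a strict XML parser rejects, so a production reader would probably need more tolerant handling than `ElementTree.fromstring`.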

Possible use case
§ Identify relevant publications (result: publication numbers) with a data set or tool containing bibliographic patent information. Examples:
  • Linked open EP data
  • Espacenet
  • Google Patents
  • EPO’s Open Patent Services (OPS)
  • PATSTAT
  • Global Patent Index
  • Commercial tools, ...
§ Retrieve publication texts from “EP full-text data for text analytics”
§ Text analytics
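
The two-step workflow (first find publication numbers, then pull their texts) could be sketched as follows. Everything here is illustrative: the seven-column layout is the same assumption as for the key-value format, the text-type values `TITLE` and `CLAIM` are assumed names, and `select_texts` is a hypothetical helper.

```python
import io
from typing import Iterable, Iterator, Set, Tuple


def select_texts(
    lines: Iterable[str], wanted_numbers: Set[str], text_type: str
) -> Iterator[Tuple[str, str]]:
    """Yield (publication number, text) for the wanted publications and text type."""
    for line in lines:
        fields = line.rstrip("\n").split("\t", 6)
        if len(fields) != 7:
            continue  # skip lines that do not match the assumed layout
        _authority, number, _kind, _date, _language, current_type, text = fields
        if number in wanted_numbers and current_type == text_type:
            yield number, text


# Step 1 (hypothetical result): publication numbers found with a bibliographic tool.
wanted = {"1000001"}

# Step 2: stream the full-text file; a small in-memory stand-in is used here.
fake_file = io.StringIO(
    "EP\t1000000\tA1\t2000-05-17\ten\tTITLE\t<p>First title</p>\n"
    "EP\t1000001\tA1\t2000-05-17\ten\tTITLE\t<p>Second title</p>\n"
    "EP\t1000001\tA1\t2000-05-17\ten\tCLAIM\t<claim>...</claim>\n"
)
selected = list(select_texts(fake_file, wanted, "TITLE"))
print(selected)
```

Streaming line by line keeps memory use flat, which matters for files of this size.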

Agenda
§ EPO and Patent Information
§ EP full-text data for text analytics
§ Linked open EP data

Key facts of “Linked open EP data”
§ Data product containing EP bibliographic data and the CPC scheme
§ Format: Linked data (aka Semantic Web) (RDF)
§ Open license, free of charge, updated weekly
§ Target user group: non-experts in patents, web developers, data scientists
§ Launched: April 2018; see epo.org/linked-data

Patent classification system (IPC, CPC)
Example titles: “Bottle opener”, “Method of opening a bottle”, “Lift arrangement”, “Cork lifter”, “Cork remover”

Patent classification system (IPC, CPC)
§ They have a hierarchical structure; CPC: 250 000 symbols
  A              Human necessities
  A47            Furniture; Domestic articles and appliances, ...
  A47J           Kitchen equipment, coffee mills, spice mills, ...
  A47J 37        Baking, Roasting, Grilling, Frying
  A47J 37/06     Roasters, Grills, Sandwich grills
  A47J 37/08     Bread toasters
  A47J 37/0814   ... with automatic ejection or timing means
  A47J 37/0821   ... with mechanical clockwork timers
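
The upper levels of the hierarchy can be read directly off a symbol. The sketch below is a hedged illustration, not an official parser: the regular expression and the function name `cpc_ancestors` are assumptions, and it only derives section, class, subclass and main group. The nesting among subgroups (for example which subgroups sit under A47J 37/08) is defined by the scheme itself and cannot in general be computed from the digits.

```python
import re
from typing import List

# Assumed shape: section letter, two-digit class, subclass letter, main group "/" subgroup.
_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def cpc_ancestors(symbol: str) -> List[str]:
    """Return section, class, subclass, main group and the full symbol, in that order."""
    match = _CPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a full CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    prefix = f"{section}{cls}{subclass}"
    return [
        section,                              # e.g. "A"
        f"{section}{cls}",                    # e.g. "A47"
        prefix,                               # e.g. "A47J"
        f"{prefix} {main_group}",             # e.g. "A47J 37"
        f"{prefix} {main_group}/{subgroup}",  # the normalised full symbol
    ]


levels = cpc_ancestors("A47J 37/0814")
print(levels)
```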

Linked Data: HTTP names as unique identifiers
§ Every business object gets an HTTP name (URI) as a globally unique identifier.
  • Application identifier: http://data.epo.org/linked-data/id/application/EP/98925243
  • Publication identifier: http://data.epo.org/linked-data/publication/EP/1010425/A1
§ In any web browser, each HTTP name returns some useful data about that resource in a standard format.
§ It can also return relationships to other resources, using their HTTP names.
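
Because the identifiers follow a regular pattern, they can be assembled from the parts of a patent number. The sketch below simply mirrors the two example URIs on this slide; the function names are hypothetical, and the live service may use a different path layout, so treat the patterns as assumptions. The request object is only built, not sent, and asking for Turtle through the `Accept` header is general Linked Data practice rather than a documented feature of this service.

```python
import urllib.request

# Base path copied from the slide's example URIs (assumption: still valid).
BASE = "http://data.epo.org/linked-data"


def application_uri(authority: str, application_number: str) -> str:
    """Build an application identifier following the slide's example."""
    return f"{BASE}/id/application/{authority}/{application_number}"


def publication_uri(authority: str, publication_number: str, kind: str) -> str:
    """Build a publication identifier following the slide's example."""
    return f"{BASE}/publication/{authority}/{publication_number}/{kind}"


def describe_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare (but do not send) a GET request asking for a machine-readable format."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


app = application_uri("EP", "98925243")
pub = publication_uri("EP", "1010425", "A1")
request = describe_request(pub)
print(app)
print(pub)
```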

Linked data can be seen as a huge network (“graph”)
[Figure: graph with nodes including a priority, a KR application and a KR publication]

Example with major classes and relationships
[Figure label: Market / Value]

Linked data can be seen as a huge network (“graph”)
[Figure label: Technology trends]

Linked data can be seen as a huge network (“graph”)
[Figure label: Value]

Linked data can be seen as a huge network (“graph”)
[Figure label: Competitor watch, inventor identification]

The product page
epo.org/linked-data

API – Interactive features
Simple browser for data exploration
§ Nice presentation of resources
§ Click to change focus

API – Parameterized URIs
Linked data API
§ Retrieve one resource or a list of resources
§ Filter
§ Sort
§ Define return format
§ Custom views

SPARQL queries
Powerful query language
§ for RDF graphs
§ for heterogeneous data sets
§ to explore data
§ to explore structure (meta-data)
§ federated queries
§ Solr text index

Don't forget: it is pure data

<http://data.epo.org/linked-data/publication/EP/1676702/B1/->
    rdfs:label "EP 1676702 B1" ;
    patent:application <http://data.epo.org/linked-data/id/application/EP/05027699> ;
    patent:publicationAuthority <http://data.epo.org/linked-data/id/st3/EP> ;
    patent:publicationDate "2008-11-26"^^xsd:date ;

patent:publicationKind_B1
    rdfs:label "B1"@en .

<http://data.epo.org/linked-data/id/application/EP/01945281>
    patent:applicationNumber "01945281" .

patent:publicationKind_A1
    rdfs:label "A1"@en .

Download
§ about 650 million triples
§ about 60 GB (N-Triples format)
§ Updated weekly
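
A dump of this size is best processed as a stream, one triple per line. The following is a simplified sketch rather than a conforming N-Triples parser: the regular expression only handles IRI subjects and predicates (no blank nodes), the function names are hypothetical, and the two sample lines are built from the data fragments and the `patent:` prefix shown in these slides.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Simplified pattern: IRI subject, IRI predicate, then an IRI or a literal object.
_TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)\s*\.\s*$'
)


def iter_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject IRI, predicate IRI, raw object term) for each matching line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _TRIPLE.match(line)
        if match:
            subject, predicate, obj = match.groups()
            yield subject, predicate, obj


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count how often each predicate occurs, without holding the data in memory."""
    return Counter(predicate for _subject, predicate, _obj in iter_triples(lines))


sample = [
    '<http://data.epo.org/linked-data/id/application/EP/01945281> '
    '<http://data.epo.org/linked-data/def/patent/applicationNumber> "01945281" .',
    '<http://data.epo.org/linked-data/publication/EP/1676702/B1/-> '
    '<http://data.epo.org/linked-data/def/patent/publicationDate> '
    '"2008-11-26"^^<http://www.w3.org/2001/XMLSchema#date> .',
]
triples = list(iter_triples(sample))
counts = count_predicates(sample)
print(len(triples), counts.most_common(1))
```

For anything beyond quick statistics, a dedicated RDF library or a triple store is likely the better choice.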

Benefits of linked data for data consumers
Target group: data scientists, web developers, ...
§ Very simple data format: “triples”
§ Re-use of established ontologies (classes, properties)
§ Infrastructure and standards already exist: the Web and various W3C recommendations
Less "data friction" when combining different data sets

Patent information can add value to other data
§ Patent data
  • Technical terms
  • Names (inventors, applicants)
  • Classifications
  • Dates and numbers
  • Citations
§ Other data: national statistics, company registers, geographical records, academic journals, dictionaries and encyclopaedias, trade mark data, court decisions, annual reports, technical magazines, telephone directories, classification data, economic data, national patent data, Library of Congress, image collections, standards, government subsidies

Linked Open Data cloud
[Figure: cloud diagram snapshots from 2008, 2009, 2010, 2011, 2014 and 2017]
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Possible use cases
§ Demo web application: epolod.org, which combines the EPO LOD data set with DBpedia (the linked-data version of Wikipedia)
§ Combining with other data sets in Linked Data format or other data formats
  • Environmental data (e.g. pollution) and “green patents” (CPC class “Y02”)
  • CORDIS (EU projects)
  • Scientific literature (Springer Nature publishing house)
§ Analysis / visualisation
  • Effects of policy decisions
  • …

CPC Browser
https://worldwide.espacenet.com/patent/cpc-browser#

SPARQL query retrieving applications classified as Y02

prefix cpc: <http://data.epo.org/linked-data/def/cpc/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix patent: <http://data.epo.org/linked-data/def/patent/>

SELECT DISTINCT * {
  ?application rdf:type patent:Application ;
      patent:classificationCPCAdditional/rdfs:label ?cpc ;  # Y-classification is always assigned additionally
      patent:publication/patent:titleOfInvention ?title .
  FILTER( STRSTARTS( ?cpc, "Y02"))
  FILTER( LANG( ?title) = "en")
} LIMIT 30
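
A query like this can be sent from a script using the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter). The sketch below only builds the request and shows how a JSON result would be flattened; it does not contact any server. The endpoint URL is a placeholder, not the EPO's actual address, and `build_sparql_request` and `extract_rows` are hypothetical helper names.

```python
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: placeholder endpoint; look up the real one in the product documentation.
ENDPOINT = "https://example.org/sparql"

QUERY = """
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix patent: <http://data.epo.org/linked-data/def/patent/>
SELECT DISTINCT * {
  ?application rdf:type patent:Application ;
      patent:classificationCPCAdditional/rdfs:label ?cpc ;
      patent:publication/patent:titleOfInvention ?title .
  FILTER( STRSTARTS( ?cpc, "Y02"))
  FILTER( LANG( ?title) = "en")
} LIMIT 30
"""


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def extract_rows(results: dict) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON results document into {variable: value} rows."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


request = build_sparql_request(ENDPOINT, QUERY)

# Made-up response in the standard SPARQL JSON results shape, for illustration only.
fake_response = {
    "head": {"vars": ["application", "cpc", "title"]},
    "results": {"bindings": [
        {"cpc": {"type": "literal", "value": "Y02E 10/50"},
         "title": {"type": "literal", "xml:lang": "en", "value": "Example title"}},
    ]},
}
rows = extract_rows(fake_response)
print(rows)
```

To run it for real, the prepared request would be passed to `urllib.request.urlopen` and the response body decoded with `json.loads` before calling `extract_rows`.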

Resources
§ General
  • Patent Information tour: http://e-courses.epo.org/wbts/pi_tour/index.html
  • Catalog of PI products: epo.org/bulk-data, see “Overview data & tools”
  • Discussion forums: epo.org/forums
  • Helpdesk: epal@epo.org
  • Questions: Martin Kracker, mkracker@epo.org
§ Linked open EP data
  • Documentation: epo.org/linked-data
  • Webinar (11.03.2020): epo.org/learning-events to register
  • Webinar recordings: epo.org/pi-videos
§ EP full-text data for text analytics
  • Documentation: epo.org/bulk-data

Thank you for your attention!
Questions: mkracker@epo.org
Martin Kracker
European Patent Office, Directorate Publication