Discovering Spatial and Temporal Links among RDF Data

Discovering Spatial and Temporal Links among RDF Data Panayiotis Smeros and Manolis Koubarakis WWW 2016 Workshop: Linked Data on the Web (LDOW 2016) April 12, 2016 - Montréal, Canada

Outline • Introduction • Background • Developed Methods • Implementation • Experimental Evaluation • Conclusions 12/04/2016 Discovering Spatial and Temporal Links among RDF Data 2

Spatial and Temporal Link Discovery
• Establish semantic relations (links) between entities
• Enrich the information of datasets with geospatial and temporal characteristics

From Locations to Complex Geometries
• GeoNames, OpenStreetMap, etc. are dominated by location (point) information
• GeoSPARQL Standard
• Datasets with rich geospatial and temporal information
  – Corine Land Cover (http://datahub.io/dataset/corine-land-cover)
  – Urban Atlas (http://datahub.io/dataset/urban-atlas)
  – Products from Satellite Images (http://datahub.io/dataset/sentinel2)
• State-of-the-art works focus on distance-based (similarity) relations
More spatial and temporal relations can be discovered!

Link Discovery in Fire Monitoring (Example)
[Figure: three datasets (Land Cover, Municipalities, Fire); successive builds add the relations "threatens" and "intersects".]

Heterogeneity: Geospatial Datasets

GeoSPARQL:
  _:1 rdf:type geo:Geometry .
  _:1 geo:hasGeometry "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(10 20)"^^geo:wktLiteral .

stRDF:
  _:1 rdf:type strdf:Geometry .
  _:1 strdf:hasGeometry "<gml:Point crsName="EPSG:2100"><gml:coordinates>10,20</gml:coordinates></gml:Point>"^^strdf:GML .

W3C GEO:
  _:1 rdf:type wgs84Geo:Point .
  _:1 wgs84Geo:lat "10"^^xsd:double .
  _:1 wgs84Geo:long "20"^^xsd:double .

• Different Vocabularies
• Different Serializations of Geometries
• Geometries expressed in Different Coordinate Reference Systems (CRS)
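To make the serialization and CRS differences concrete, here is a minimal Python sketch that reduces two of the representations above to a (CRS, WKT) pair. The helper names `split_wkt_literal` and `point_from_wgs84_geo` are hypothetical and are not part of the work presented in the talk. The sketch assumes that a GeoSPARQL wktLiteral without a leading CRS URI is interpreted in CRS84 (longitude, latitude order), and that W3C Basic Geo coordinates are WGS 84 latitude and longitude.

```python
import re

# CRS assumed when a wktLiteral carries no explicit CRS URI (GeoSPARQL default).
DEFAULT_CRS = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional "<crs-uri>" prefix followed by the WKT text itself.
_WKT_LITERAL = re.compile(r"^\s*(?:<(?P<crs>[^>]+)>\s*)?(?P<wkt>.+?)\s*$", re.S)


def split_wkt_literal(lexical):
    """Split the lexical form of a wktLiteral into (crs_uri, wkt_text)."""
    match = _WKT_LITERAL.match(lexical)
    if match is None:
        raise ValueError("empty wktLiteral")
    return match.group("crs") or DEFAULT_CRS, match.group("wkt")


def point_from_wgs84_geo(lat, long_):
    """Build a CRS84 WKT point (longitude first) from W3C Basic Geo lat/long."""
    return DEFAULT_CRS, f"POINT({long_} {lat})"


if __name__ == "__main__":
    literal = "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(10 20)"
    print(split_wkt_literal(literal))
    print(point_from_wgs84_geo(10.0, 20.0))
```

Note the axis order: EPSG:4326 lists latitude first and CRS84 lists longitude first, so the same location is written POINT(10 20) in one and POINT(20 10) in the other.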

Heterogeneity: Geospatial Datasets
• Different Sampling Values
• Different Granularity
• Different Rounding Effects

Heterogeneity: Temporal Datasets

RDF:
  _:1 ex:hasBirthday "1989-09-24T11:05:00+01:00"^^xsd:dateTime .

stRDF:
  _:1 ex:hasAffiliation ex:UoA "[2007-09-01T00:00+03:00, 2015-08-31T00:00+04:00)"^^strdf:Period .

• Different Vocabularies
• Different Time Zones
• Time Instants and Periods

Outline • Introduction • Background • Developed Methods • Implementation • Experimental Evaluation • Conclusions

Link Discovery (Definition)

State-of-the-art Spatial Relations
• Dimensionally Extended 9-Intersection Model
• Egenhofer’s Model
• OGC Simple Features Model
  Intersects, Overlaps, Equals, Touches, Disjoint, Contains, Crosses, CoveredBy and Within
• Region Connection Calculus
  – e.g., RCC8
• Cardinal Direction Calculus
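As an illustration of how the Dimensionally Extended 9-Intersection Model underlies named relations such as Within or Touches, the Python sketch below matches a 9-character intersection matrix against the commonly used OGC patterns. It covers only a subset of the relations listed above; Crosses and Overlaps are left out because their patterns depend on the dimensions of the geometries. The function names are hypothetical, and computing the matrix from actual geometries is assumed to be done elsewhere by a geometry library.

```python
# DE-9IM patterns for a subset of the OGC Simple Features relations.
# "T" means any non-empty intersection (0, 1 or 2), "F" means empty,
# "*" means "do not care".
PATTERNS = {
    "equals": ("T*F**FFF*",),
    "disjoint": ("FF*FF****",),
    "touches": ("FT*******", "F**T*****", "F***T****"),
    "contains": ("T*****FF*",),
    "within": ("T*F**F***",),
}


def matches(matrix, pattern):
    """Return True if a 9-character DE-9IM matrix satisfies a pattern."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("matrix and pattern must have 9 characters")
    for cell, want in zip(matrix, pattern):
        if want == "*":
            continue
        if want == "T":
            if cell == "F":
                return False
        elif cell != want:
            return False
    return True


def holds(relation, matrix):
    """Check a named relation; intersects is defined as not disjoint."""
    if relation == "intersects":
        return not holds("disjoint", matrix)
    return any(matches(matrix, p) for p in PATTERNS[relation])


if __name__ == "__main__":
    inside = "2FF1FF212"  # a polygon strictly inside another polygon
    print(holds("within", inside), holds("contains", inside))
```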

State-of-the-art Temporal Relations
• Allen’s Interval Calculus
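Allen's Interval Calculus distinguishes thirteen mutually exclusive relations between two periods. The following Python sketch classifies a pair of periods, each given as a (start, end) tuple with start strictly before end. The function name `allen_relation` and the relation labels are choices made for this illustration, not identifiers from the presented implementation.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from period a to period b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if not (a_start < a_end and b_start < b_end):
        raise ValueError("periods must satisfy start < end")
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # The periods overlap partially and share no endpoint.
    return "overlaps" if a_start < b_start else "overlapped-by"


if __name__ == "__main__":
    print(allen_relation((1, 5), (3, 7)))
```

The endpoints can be any mutually comparable values, for example numbers or timezone-aware datetime objects.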

Outline • Introduction • Background • Developed Methods • Implementation • Experimental Evaluation • Conclusions

Introduced Relations

Introduced Transformations (1/2)
• Vocabulary Transformation
  – converts the vocabulary of geometry literals into GeoSPARQL
• Serialization Transformation
  – converts the serialization of geometries into WKT
• CRS Transformation
  – converts the CRS of geometries into the World Geodetic System (WGS 84)
• Validation Transformation
  – converts invalid geometries (e.g., self-intersecting polygons) to valid ones
• Simplification Transformation
  – simplifies geometries according to a given distance tolerance
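The slide does not say which algorithm the Simplification Transformation uses. As one possible reading of "a given distance tolerance", the Python sketch below applies the Douglas-Peucker algorithm to a line given as a list of (x, y) points. Both the choice of algorithm and the function names are assumptions made for illustration.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop points closer than tolerance to the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


if __name__ == "__main__":
    line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
    print(simplify(line, 0.5))
```

A larger tolerance removes more points, which reduces the cost of later relation checks at the price of geometric precision.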

Introduced Transformations (2/2)
• Envelope Transformation
  – computes the envelope (minimum bounding rectangle) of geometries
• Area Transformation
  – computes the area of geometries in square metres
• Points-To-Centroid Transformation
  – computes the centroid of a cluster of points
• Time-Zone Transformation
  – converts the time zone of time elements to Coordinated Universal Time (UTC)
• Period Transformation
  – converts time instants to periods with the same starting and ending point
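Four of these transformations are simple enough to sketch with the Python standard library. The function names below are hypothetical and the code is not the plugin implementation. Geometries are assumed to be plain lists of (x, y) points, and time elements are assumed to be ISO 8601 strings that carry a UTC offset. The Area Transformation is omitted because computing square metres requires a suitable projection.

```python
from datetime import datetime, timezone


def envelope(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def centroid_of_points(points):
    """Arithmetic mean of a cluster of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def to_utc(timestamp):
    """Parse an ISO 8601 timestamp with an offset and convert it to UTC."""
    dt = datetime.fromisoformat(timestamp)
    if dt.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return dt.astimezone(timezone.utc)


def instant_to_period(instant):
    """Represent an instant as a period whose start and end coincide."""
    return (instant, instant)


if __name__ == "__main__":
    birthday = to_utc("1989-09-24T11:05:00+01:00")
    print(birthday.isoformat())
    print(instant_to_period(birthday))
    print(envelope([(0, 0), (2, 1), (1, 3)]))
```

Once instants are expressed as periods in UTC, instants and periods can be compared with the same interval relations.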

Techniques for Checking the Relations
• Cartesian Product Technique (Naive)
  – Exhaustive checks between the pairs of the entities of datasets
  – Complete
  – Complexity: O(|S||T|) checks
• Blocking Technique
  – Decreases the number of checks
  – Divides the entities into blocks
  – Complexity: O(|S||T|) checks (worst case), O(|L|) checks (best case)
* |S|, |T|: number of entities in datasets S and T; |L|: number of links between datasets S and T
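The difference between the two techniques can be sketched in Python as follows. This is an illustration under stated assumptions and not the algorithm from the talk: entities are reduced to bounding boxes (min_x, min_y, max_x, max_y), blocks are assumed to be cells of a uniform grid, `cell_size` is a hypothetical parameter, and a bounding-box intersection test stands in for the exact relation check.

```python
import math
from collections import defaultdict
from itertools import product


def mbr_intersects(a, b):
    """Stand-in relation check: do two bounding boxes intersect?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cartesian_links(source, target, relation=mbr_intersects):
    """Naive technique: check every (source, target) pair."""
    links, checks = set(), 0
    for (s_id, s), (t_id, t) in product(source.items(), target.items()):
        checks += 1
        if relation(s, t):
            links.add((s_id, t_id))
    return links, checks


def blocks_of(mbr, cell_size):
    """All grid cells (blocks) that a bounding box overlaps."""
    min_x, min_y, max_x, max_y = mbr
    xs = range(math.floor(min_x / cell_size), math.floor(max_x / cell_size) + 1)
    ys = range(math.floor(min_y / cell_size), math.floor(max_y / cell_size) + 1)
    return product(xs, ys)


def blocking_links(source, target, cell_size, relation=mbr_intersects):
    """Blocking technique: only check pairs that share at least one block."""
    index = defaultdict(list)
    for t_id, t in target.items():
        for block in blocks_of(t, cell_size):
            index[block].append(t_id)
    links, checked = set(), set()
    for s_id, s in source.items():
        for block in blocks_of(s, cell_size):
            for t_id in index.get(block, ()):
                if (s_id, t_id) in checked:
                    continue  # pair already compared in another shared block
                checked.add((s_id, t_id))
                if relation(s, target[t_id]):
                    links.add((s_id, t_id))
    return links, len(checked)


if __name__ == "__main__":
    S = {"s1": (0, 0, 1, 1), "s2": (10, 10, 11, 11)}
    T = {"t1": (0.5, 0.5, 2, 2), "t2": (20, 20, 21, 21), "t3": (10.5, 10.5, 10.8, 10.8)}
    print(cartesian_links(S, T))
    print(blocking_links(S, T, cell_size=5))
```

Two intersecting boxes always share at least one grid cell, so in this sketch blocking finds the same links as the exhaustive comparison while performing fewer checks when entities are spread out. When every entity falls into the same block, the number of checks degrades to the Cartesian product.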

Blocking Technique (algorithm)
[Figure: two panels showing entities e1 and e2 over blocks b1 to b4, with block assignments (e1: b1, b2; e2: b2) and (e1: b1, b2; e2: b2, b4).]

Blocking Technique (accuracy)

Outline • Introduction • Background • Developed Methods • Implementation • Experimental Evaluation • Conclusions

Extensions to the Silk Framework
• Implemented as Plugins
• Transparent to all the applications of Silk (Single Machine, MapReduce and Workbench)
• Included in the default Silk distribution (from release 2.6.1 and above)
• https://github.com/silk-framework/silk

Outline • Introduction • Background • Developed Methods • Implementation • Experimental Evaluation • Conclusions

Real-world Scenario (Fire Monitoring)
• Which fires (hotspots) threaten forests?
• Which municipalities are threatened by fires?

Dataset | #Entities | Geometries: Type | Geometries: #Points | Time Elements: Type | Time Elements: #Instants
Municipalities from Greek Administrative Geography (GAG) | 325 | Polygons | 979,929 | Periods | 650
Forests from CORINE Land Cover of Greece (CLCG) | 4,868 | Polygons | 8,004,058 | Periods | 9,736
Hotspots of Greece (HG) | 37,048 | Polygons | 148,192 | Instants | 37,048

• Using Silk: Discover the relation intersects between HG-GAG and HG-CLCG

Real-world Scenario (Fire Monitoring)
[Figure: Land Cover (CLCG), Municipalities (GAG) and Fire (HG) datasets, connected by the relation "intersects".]

Environment of Experiments
• Single machine environment
  – 2 Intel Xeon E5620 processors, 12 MB L3 cache, 2.4 GHz, 32 GB RAM, RAID-5, 4 disks, 32 MB cache, 7200 rpm
• Distributed environment
  – cluster provided by the European Public Cloud Provider Interoute (1 Master Node + 20 Slave Nodes: 2 CPUs, 4 GB RAM, 10 GB disk)
• More details: http://silk.di.uoa.gr

Experiment 1: Adjusting the Spatial Blocking Factor (sbf)
[Plot: time (seconds) and number of links against the spatial blocking factor, for HG-CLCG and HG-GAG.]

Experiment 2: Adjusting the number of Entities per Dataset
[Plot: time (seconds) and number of links against the number of entities per dataset, for Silk (Baseline), Silk (Best sbf), Strabon and Silk (MR).]

Outline • Introduction • Background • Developed Methods • Implementation • Experimental Evaluation • Conclusions

Conclusions & Future Work

Thanks for your attention! Questions?