1 Gegevensbanken (Databases)
Outlook: The Semantic Web, XML, RDF, Linked (Open) Data, NoSQL
Bettina Berendt, Katholieke Universiteit Leuven, Department of Computer Science
http://www.cs.kuleuven.be/~berendt/teaching/
Berendt: Gegevensbanken, 2nd semester 2011/2012
2 Where are we?
Lecturers: ED, KV, BB
Lesson   Topic
1        Introduction, ER
2        EER, (E)ER to relational schema
2        Relational model
3        Relational algebra & relational calculus
4, 5     SQL
6        Connecting programs to databases
7        Functional dependencies & normalisation
8        PHP
10       Database security
11       Memory and file organisation
12       External hashing
13       Index structures
14       Query processing
15-17    Transaction processing and concurrency control
18       Data mining and information retrieval
9        XML (and more about the Web as a database), NoSQL
Callouts: "How do data become powerful? Analysis & combination"; "New topics / outlook"
3 A motivation
Q: About the Internet in general: should it be regarded as one big unordered chaos of websites, or rather as many separate databases (for example, all web pages from Belgium, or all web pages of an Internet provider such as Telenet) that together form the Internet (and thus allow a large, general database to distribute its tasks)?
4 For example: SIG.MA
6 The original vision
The entertainment system was belting out the Beatles' "We Can Work It Out" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control. His sister, Lucy, was on the line from the doctor's office: "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments." Pete immediately agreed to share the chauffeuring.
At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules. (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)
Tim Berners-Lee, James Hendler and Ora Lassila (2001). The Semantic Web. A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American. http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C7084A9809EC588EF21
7 The Semantic Web layer cake (T. Berners-Lee talk at XML 2000)
URI = Uniform Resource Identifier, e.g.:
• URL (Uniform Resource Locator): where to find it (like a person's address)
• URN (Uniform Resource Name): identity (like a person's name, or the ISBN of a book)
OWL: W3C Rec. 2004
OWL 2: W3C Rec. 2009
RDF: W3C Rec. 2004
9 You have data … How should you structure it?
Here's some data about an aircraft:
• medium-altitude, long-endurance unmanned aerial vehicle
• 14.8 meters
• 512 kilograms
• 70 knots
• 400 nautical miles
10 The XML approach is to "wrap" each data item in start/end tags and define this data schema in a DTD or XML Schema
RQ-1.xml:
  <Aircraft>
    <wingspan>14.8 meters</wingspan>
    <weight>512 kilograms</weight>
    <cruise-speed>70 knots</cruise-speed>
    <range>400 nautical miles</range>
    <description>
      medium-altitude, long-endurance unmanned aerial vehicle
    </description>
  </Aircraft>
11 XML Terminology
  <wingspan>14.8 meters</wingspan>
• Start tag: <wingspan>
• Data: 14.8 meters
• End tag: </wingspan>
• Element: start tag, data and end tag together
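As a minimal sketch of what "wrapping data in tags" buys you, the aircraft document can be read back with Python's standard xml.etree.ElementTree module. The helper name element_data is an illustrative choice, not part of the slides; the XML is the RQ-1.xml content inlined as a string.

```python
import xml.etree.ElementTree as ET

# The RQ-1.xml document from the slide, inlined as a string.
RQ1_XML = """
<Aircraft>
  <wingspan>14.8 meters</wingspan>
  <weight>512 kilograms</weight>
  <cruise-speed>70 knots</cruise-speed>
  <range>400 nautical miles</range>
  <description>
    medium-altitude, long-endurance unmanned aerial vehicle
  </description>
</Aircraft>
"""


def element_data(xml_text):
    """Return {tag: data} for every child element of the root element."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text.strip() for child in root}


aircraft = element_data(RQ1_XML)
print(aircraft["wingspan"])
```

Note that the parser only recovers the syntax (tags and data). It has no idea that "14.8 meters" is a length, which is the gap the following slides address.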
12 Why use XML?
• It is a universally accepted standard way of structuring data (syntax).
• It is a W3C recommendation (W3C = World Wide Web Consortium).
• The marketplace supports it with a lot of free/inexpensive tools.
• The alternative to using XML is to define your own proprietary data syntax, and then build your own proprietary tools to support the proprietary syntax (not a very appealing idea).
13 But: What is this XML snippet talking about, i.e., what are the semantics? What is a Predator?
  <Predator> … </Predator>
14 Predator - which one?
• Predator: a medium-altitude, long-endurance unmanned aerial vehicle system.
• Predator: one that victimizes, plunders, or destroys, especially for one's own gain.
• Predator: an organism that lives by preying on other organisms.
• Predator: a company which specializes in camouflage attire.
• Predator: a video game.
• Predator: software for machine networking.
• Predator: a chain of paintball stores.
15 A little more flexibility through namespaces
  <?xml version="1.0" encoding="UTF-8"?>
  <myThings xmlns:h="http://www.mySchemas.org/TR/aircraft/"
            xmlns:f="http://www.yourSchemas.com/animals">
    <h:Predator>
      <h:name>OL231-b</h:name>
      <h:wingspan>14.8 metres</h:wingspan>
    </h:Predator>
    <f:Predator>
      <f:name>Panthera</f:name>
      <f:eats>antelopes</f:eats>
    </f:Predator>
  </myThings>
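A small sketch of how a namespace-aware parser keeps the two kinds of Predator apart, using Python's standard xml.etree.ElementTree. The document string is a well-formed rendering of the two-namespace example; the prefix map NS and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Well-formed two-namespace document (XML declaration omitted for brevity).
DOC = """
<myThings xmlns:h="http://www.mySchemas.org/TR/aircraft/"
          xmlns:f="http://www.yourSchemas.com/animals">
  <h:Predator>
    <h:name>OL231-b</h:name>
    <h:wingspan>14.8 metres</h:wingspan>
  </h:Predator>
  <f:Predator>
    <f:name>Panthera</f:name>
    <f:eats>antelopes</f:eats>
  </f:Predator>
</myThings>
"""

# Prefix-to-URI map used only for querying; prefixes are local shorthand.
NS = {
    "h": "http://www.mySchemas.org/TR/aircraft/",
    "f": "http://www.yourSchemas.com/animals",
}

root = ET.fromstring(DOC.strip())
aircraft_name = root.find("h:Predator/h:name", NS).text
animal_name = root.find("f:Predator/f:name", NS).text

# Internally each tag is expanded to "{namespace-URI}localname".
aircraft_tag = root.find("h:Predator", NS).tag
animal_tag = root.find("f:Predator", NS).tag
print(aircraft_tag)
print(animal_tag)
```

The two Predator elements now have different expanded names, so they can no longer be confused syntactically. What each one means is still not stated anywhere.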
16 Querying XML
Several query languages exist, e.g. XPath, XQuery
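To give a flavour of path-based querying, here is a hedged sketch using the limited XPath subset that Python's standard xml.etree.ElementTree supports (full XPath and XQuery need other tools). The Fleet document, the aircraft "X-2" and all its values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented example data: a small fleet with two aircraft.
FLEET = """
<Fleet>
  <Aircraft id="RQ-1">
    <wingspan unit="m">14.8</wingspan>
    <range unit="nmi">400</range>
  </Aircraft>
  <Aircraft id="X-2">
    <wingspan unit="m">20.1</wingspan>
    <range unit="nmi">900</range>
  </Aircraft>
</Fleet>
"""

root = ET.fromstring(FLEET.strip())

# Path step: all Aircraft children of the root.
ids = [a.get("id") for a in root.findall("./Aircraft")]

# Attribute predicate: the wingspan of the aircraft whose id is RQ-1.
rq1_wingspan = root.find(".//Aircraft[@id='RQ-1']/wingspan").text

# Numeric comparisons are outside the supported subset, so filter in Python.
long_range = [
    a.get("id")
    for a in root.findall("./Aircraft")
    if float(a.findtext("range")) > 500
]
print(ids, rq1_wingspan, long_range)
```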
24 Problems of XML
1. What does nesting mean?
2. What do syntactical variations mean?
3. What do linguistic variations mean?
4. How can we extend our knowledge?
25 1. What does nesting mean?
Schema 1 allows for expressions like:
  <Person>
    <name>Peter Parker</name>
    ...
  </Person>
name being an XML element of Person means: the person HAS-A ...
Schema 2 allows for expressions like:
  <Person>
    <type>Comic-book hero</type>
    ...
  </Person>
type being an XML element of Person means: the person IS-A ...
Problems:
a) we don't know what nesting means,
b) even if we do know, we can't express this in a machine-readable way (at most build it into an application that uses these XML statements, but that would bury meaning in procedures!)
26 2. What do syntactical variations mean?
Schema 1 allows for expressions like:
  <Person>
    <name>Peter Parker</name>
    <birthday>1932-04-12</birthday>
    ...
  </Person>
Schema 2 allows for expressions like:
  <Person name="Peter Parker">
    <type>Comic-book hero</type>
    ...
  </Person>
Problems:
a) what does it mean for some information to be an XML element vs. an XML attribute?
b) even if we do know that they are the same, we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.)
27 3. What do linguistic variations mean?
Schema 1 allows for expressions like:
  <Person>
    <name>Peter Parker</name>
    ...
  </Person>
Schema 2 allows for expressions like:
  <Person>
    <naam>Peter Parker</naam>
    ...
  </Person>
Problems:
a) we do not know whether elements from different data sources that differ by, e.g., natural language are the same or not,
b) even if we do know that they are the same, we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.)
28 4. How can we extend our knowledge?
Schema 1 allows for expressions like:
  <WebResource>
    <type>Picture</type>
    <hasURL>http://www.example.org/Pictures/myPic.png</hasURL>
    <isAbout>Peter Parker</isAbout>
    ...
  </WebResource>
Schema 2 allows for expressions like:
  <WebResource>
    <hasURL>http://www.example.org/Pictures/myPic.png</hasURL>
    <hasLicence>CreativeCommons</hasLicence>
    ...
  </WebResource>
Problems:
a) we cannot refine our schema information by that provided by another source,
b) even if we can be sure about principal linkability (here: via the URL), we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.)
29 Summary: XML is not well-suited for conceptual modelling and therefore not suited for truly semantic markup
XML makes no commitment on:
• Domain-specific ontological vocabulary
• Ontological modelling primitives
It therefore requires pre-arranged agreement on vocabulary and modelling primitives.
Only feasible for closed collaboration:
• agents in a small & stable community
• pages on a small & stable intranet
Not suited for sharing Web resources
30 Solution approach of the "higher levels" of the Semantic Web
1. Break down information into atomic statements: subject-predicate-object
2. Define (in a formal-semantics way) what each component of each statement means:
   a. Give it a URI (uniform resource identifier) to enable uniform meaning specification
   b. Define languages to say more about (specify) the meaning (by relating it to other units of meaning; cf. a dictionary in which each word is explained by other words)
The languages mentioned in 2.b. each add more expressivity:
1. RDF: subject-predicate-object statements (in RDF terminology: a resource has a property with a certain value).
2. RDFS: simple ontology building blocks: class, subclass-of relation, use RDF's type to denote that (e.g.) an individual is an instance of a class (= make it possible to define a schema and its instances), ...
3. OWL: more advanced ontology building blocks: a class (= concept) is disjoint with another one, is the same as another one; a property is functional, symmetric, the inverse of another one; ...
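A minimal sketch of why atomic statements help with the "extend our knowledge" problem: once two sources describe the same URI with subject-predicate-object triples, combining them is just a set union. The vocabulary namespace EX and the helper describe are hypothetical names chosen for this illustration; the picture URL is the one from the XML examples.

```python
# The shared identifier: both sources talk about the same picture URL.
PIC = "http://www.example.org/Pictures/myPic.png"

# Hypothetical vocabulary namespace, invented for this sketch.
EX = "http://example.org/vocab#"

# Source 1: what kind of resource it is and what it is about.
source1 = {
    (PIC, EX + "type", "Picture"),
    (PIC, EX + "isAbout", "Peter Parker"),
}

# Source 2: licensing information about the same resource.
source2 = {
    (PIC, EX + "hasLicence", "CreativeCommons"),
}

# Merging two sets of triples needs no schema negotiation: take the union.
merged = source1 | source2


def describe(graph, subject):
    """Collect all property/value pairs known about one subject."""
    return {p: o for s, p, o in graph if s == subject}


description = describe(merged, PIC)
print(sorted(description))
```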
31 Semantic Web vs. Database
Advantages of using RDF/RDFS/OWL to define an ontology:
• Extensible: much easier to add new properties. Contrast with a database: adding a new column may break a lot of applications.
• Portable: much easier to move an OWL document than to move a database.
Advantages of using a database to define an ontology:
• Mature: database technology has been around a long time and is very mature.
33 RDF model
RDF "statements" consist of resources (= nodes) which have properties which have values (= nodes, strings):
• resource = subject
• property = predicate
• value = object
Example: "http://www.w3.org/TR/REC-rdf-syntax/ has the author Ora Lassila"
  http://www.w3.org/TR/REC-rdf-syntax/  author  "Ora Lassila"
34 RDF Model Example
  http://www.w3.org/TR/REC-rdf-syntax/  dc:Publisher  "W3C"
  http://www.w3.org/TR/REC-rdf-syntax/  dc:Creator    "Ora Lassila"
  http://www.w3.org/TR/REC-rdf-syntax/  dc:Date       "1999-02-22"
35 Complex values
So far, values of properties have been strings.
A graph node (corresponding to a resource) can also be the value of a property:
• arbitrarily complex tree and graph structures are possible
• syntactically, values can be embedded (i.e. lexically in-line) or referenced (linked)
Example:
  http://www.w3.org/TR/REC-rdf-syntax/  dc:Creator  (node)
  (node)  p:Name   "Ora Lassila"
  (node)  p:EMail  "ora.lassila@nokia.com"
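The following sketch models the complex-value example as plain Python tuples and walks from the document to the creator's name. The blank-node label "_:abc" and the helpers objects and follow are assumptions made for illustration; a real application would more likely use an RDF library.

```python
DOC = "http://www.w3.org/TR/REC-rdf-syntax/"

# The creator is a node of its own ("_:abc" is an arbitrary local label),
# which in turn has a name and an e-mail address.
GRAPH = [
    (DOC, "dc:Creator", "_:abc"),
    ("_:abc", "p:Name", "Ora Lassila"),
    ("_:abc", "p:EMail", "ora.lassila@nokia.com"),
]


def objects(graph, subject, predicate):
    """All values of one property of one resource."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def follow(graph, start, path):
    """Follow a chain of properties through the graph, node by node."""
    nodes = [start]
    for predicate in path:
        nodes = [o for node in nodes for o in objects(graph, node, predicate)]
    return nodes


creator_names = follow(GRAPH, DOC, ["dc:Creator", "p:Name"])
creator_mails = follow(GRAPH, DOC, ["dc:Creator", "p:EMail"])
print(creator_names, creator_mails)
```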
36 RDF in XML
  <?xml version="1.0"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:p="http://example.org/persons/1.0/">
    <rdf:Description rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
      <dc:creator rdf:nodeID="abc"/>
    </rdf:Description>
    <rdf:Description rdf:nodeID="abc">
      <p:Name>Ora Lassila</p:Name>
      <p:Email>ora.lassila@nokia.com</p:Email>
      <p:HasHomepage rdf:resource="http://www.nokia.com"/>
      <p:WorksIn rdf:resource="#xyz"/>
    </rdf:Description>
  </rdf:RDF>
37 RDF Schema
• Defines a small vocabulary for RDF:
  • Class, subClassOf, type
  • Property, subPropertyOf
  • domain, range
• The vocabulary can be used to define other vocabularies for your application
Figure (example graph): classes Person, Student and Researcher; property hasSuperVisor with a domain and a range; instances Frank and Jeen, connected through type and hasSuperVisor.
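A rough sketch of what the RDFS vocabulary makes possible: deriving new type statements from subClassOf, domain and range. The triples below are one plausible reading of the example graph (Student is a subclass of Person; hasSuperVisor goes from Student to Researcher; Frank has supervisor Jeen) and are an assumption, as is the function name rdfs_closure. Only four inference rules are implemented, not the full RDFS semantics.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Assumed reading of the example graph on the slide.
TRIPLES = {
    ("Student", SUBCLASS, "Person"),
    ("hasSuperVisor", DOMAIN, "Student"),
    ("hasSuperVisor", RANGE, "Researcher"),
    ("Frank", "hasSuperVisor", "Jeen"),
}


def rdfs_closure(triples):
    """Apply domain, range and subclass rules until nothing new is derived."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))      # subject gets the domain class
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))      # object gets the range class
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))      # instances move up the hierarchy
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))  # subClassOf is transitive
        if new <= graph:
            return graph
        graph |= new


closure = rdfs_closure(TRIPLES)
print(sorted(t for t in closure if t[1] == TYPE))
```

Nothing in the input says that Frank is a Person; that statement is derived from the schema. This is the kind of machine-readable meaning that the XML examples could not express.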
38 RDF Schema syntax in XML
  <rdf:Description rdf:ID="MotorVehicle">
    <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/...#Resource"/>
  </rdf:Description>
  <rdf:Description rdf:ID="Truck">
    <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  <rdf:Description rdf:ID="registeredTo">
    <rdf:type rdf:resource="http://www.w3.org/...#Property"/>
    <rdfs:domain rdf:resource="#MotorVehicle"/>
    <rdfs:range rdf:resource="#Person"/>
  </rdf:Description>
  <rdf:Description rdf:ID="ownedBy">
    <rdf:type rdf:resource="http://www.w3.org/...#Property"/>
    <rdfs:subPropertyOf rdf:resource="#registeredTo"/>
  </rdf:Description>
40 What is this? Can we do something with it?
41 Combined by SIG.MA
42 And how does this work?
Linked Open Data:
• "A way of making the Semantic Web happen" (it is hoped)
• Key concept: leverage the existence of structured data and combine it with the languages and infrastructures of the Web and the Semantic Web
End of 2011: 32 billion triples
43 Data items are identified with HTTP URIs
  pd:cygri  rdf:type         foaf:Person
  pd:cygri  foaf:name        "Richard Cyganiak"
  pd:cygri  foaf:based_near  dbpedia:Berlin
pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt
44 Resolving URIs over the Web
  pd:cygri        rdf:type         foaf:Person
  pd:cygri        foaf:name        "Richard Cyganiak"
  pd:cygri        foaf:based_near  dbpedia:Berlin
  dbpedia:Berlin  dp:population    3.405.259
  dbpedia:Berlin  skos:subject     dp:Cities_in_Germany
From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt
45 Dereferencing URIs over the Web
  pd:cygri          rdf:type         foaf:Person
  pd:cygri          foaf:name        "Richard Cyganiak"
  pd:cygri          foaf:based_near  dbpedia:Berlin
  dbpedia:Berlin    dp:population    3.405.259
  dbpedia:Berlin    skos:subject     dp:Cities_in_Germany
  dbpedia:Hamburg   skos:subject     dp:Cities_in_Germany
  dbpedia:Muenchen  skos:subject     dp:Cities_in_Germany
From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt
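The "follow your nose" idea behind these three slides can be sketched without network access by simulating the Web as a dictionary from URI to the triples returned when that URI is looked up. The WEB dictionary, dereference and crawl are illustrative assumptions; a real client would issue HTTP requests and parse RDF instead.

```python
# Simulated Web: what each URI returns when it is dereferenced.
WEB = {
    "pd:cygri": [
        ("pd:cygri", "rdf:type", "foaf:Person"),
        ("pd:cygri", "foaf:name", "Richard Cyganiak"),
        ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
    ],
    "dbpedia:Berlin": [
        ("dbpedia:Berlin", "dp:population", "3405259"),
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    ],
    "dp:Cities_in_Germany": [
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Muenchen", "skos:subject", "dp:Cities_in_Germany"),
    ],
}


def dereference(uri):
    """Stand-in for an HTTP lookup; unknown URIs return no triples."""
    return WEB.get(uri, [])


def crawl(start, max_hops):
    """Collect triples by dereferencing URIs found in already fetched data."""
    graph, seen, frontier = set(), set(), [start]
    for _ in range(max_hops + 1):
        next_frontier = []
        for uri in frontier:
            if uri in seen:
                continue
            seen.add(uri)
            for triple in dereference(uri):
                graph.add(triple)
                # Subjects and objects that can be looked up are followed next.
                for term in (triple[0], triple[2]):
                    if term in WEB and term not in seen:
                        next_frontier.append(term)
        frontier = next_frontier
    return graph


for hops in range(3):
    print(hops, len(crawl("pd:cygri", hops)))
```

Each additional hop corresponds to one of the three slides: the person's own description, then the data about Berlin, then the other cities in the same category.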
46 What is LOD?
• "A way of making the Semantic Web happen" (it is hoped)
• Key concept: leverage the existence of structured data and combine it with the languages and infrastructures of the Web and the Semantic Web
• Tim Berners-Lee: four principles of Linked Data (http://www.w3.org/DesignIssues/LinkedData)
  • Use URIs to identify things.
  • Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
  • Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
  • Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
47 SPARQL: The standard query language for LOD
"What are all the country capitals in Africa?"
  PREFIX abc: <http://example.com/exampleOntology#>
  SELECT ?capital ?country
  WHERE {
    ?x abc:cityname ?capital ;
       abc:isCapitalOf ?y .
    ?y abc:countryname ?country ;
       abc:isInContinent abc:Africa .
  }
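To show what a SPARQL engine does with such a WHERE clause, here is a toy basic-graph-pattern matcher in plain Python. It is a sketch of the idea only, not how a real triple store is implemented; the data in GRAPH (the ex: identifiers and the three cities) is invented for illustration, and terms starting with "?" are treated as variables.

```python
# Invented example data using the abc: vocabulary of the query.
GRAPH = [
    ("ex:city1", "abc:cityname", "Cairo"),
    ("ex:city1", "abc:isCapitalOf", "ex:country1"),
    ("ex:country1", "abc:countryname", "Egypt"),
    ("ex:country1", "abc:isInContinent", "abc:Africa"),
    ("ex:city2", "abc:cityname", "Nairobi"),
    ("ex:city2", "abc:isCapitalOf", "ex:country2"),
    ("ex:country2", "abc:countryname", "Kenya"),
    ("ex:country2", "abc:isInContinent", "abc:Africa"),
    ("ex:city3", "abc:cityname", "Paris"),
    ("ex:city3", "abc:isCapitalOf", "ex:country3"),
    ("ex:country3", "abc:countryname", "France"),
    ("ex:country3", "abc:isInContinent", "abc:Europe"),
]

# The four triple patterns of the WHERE clause.
QUERY = [
    ("?x", "abc:cityname", "?capital"),
    ("?x", "abc:isCapitalOf", "?y"),
    ("?y", "abc:countryname", "?country"),
    ("?y", "abc:isInContinent", "abc:Africa"),
]


def match(graph, patterns, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(graph, rest, candidate)


results = sorted((b["?capital"], b["?country"]) for b in match(GRAPH, QUERY))
print(results)
```

The shared variables ?x and ?y play the role that join conditions play in SQL: they force the patterns to talk about the same city and the same country.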
48 Connecting to a database … ah … triple store
49 The Linked Open Data Cloud
51 History of the World, Part 1
• Relational databases: mainstay of business
• Web-based applications caused spikes
  • Especially true for public-facing e-commerce sites
• Developers began to front the RDBMS with memcache or to integrate other caching mechanisms within the application (e.g. Ehcache)
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
52 SELECT * FROM members WHERE name LIKE '%kirsten%' ??
Get write lock, update friends table, release write lock ??
53 Reminder: task for the next lesson
Are all ACID properties equally important for the following types of applications? What can you do if speed is very important for your application?
• Online banking
• An online shop (e.g. books/media)
• A social network site
54 Scaling Up
• Issues with scaling up when the dataset is just too big
• RDBMS were not designed to be distributed
• People began to look at multi-node database solutions
• Known as 'scaling out' or 'horizontal scaling'
• Different approaches include:
  • Master-slave
  • Sharding
• All approaches come with their own respective problems
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
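A minimal sketch of sharding, assuming the simplest possible scheme: hash the key and take it modulo the number of shards. The names shard_for and ShardedStore are illustrative, and each "node" is just an in-memory dictionary rather than a separate server.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable (process-independent) hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Key-value store whose data is split over several 'nodes'."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for name in ["kirsten", "pete", "lucy", "ora"]:
    store.put(name, {"name": name})

print(store.get("lucy"))
print([len(shard) for shard in store.shards])
```

One of the "respective problems" shows up immediately: with plain modulo hashing, changing the number of shards moves most keys to a different node, and a query such as the LIKE search on names has to visit every shard.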
55 What is NoSQL?
• Stands for Not Only SQL
• Class of non-relational data storage systems
• Usually do not require a fixed table schema, nor do they use the concept of joins
• All NoSQL offerings relax one or more of the ACID properties (see the CAP theorem below)
• NoSQL is best used in large distributed databases!
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
56 Why NoSQL?
• For data storage, an RDBMS cannot be the be-all/end-all
• Just as there are different programming languages, we need other data storage tools in the toolbox
• A NoSQL solution is more acceptable to a client now than even a year ago
  • Think about proposing a Ruby/Rails or Groovy/Grails solution now versus a couple of years ago
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
57 Dynamo and BigTable
Three major papers were the seeds of the NoSQL movement:
• BigTable (Google)
• Dynamo (Amazon)
  • Gossip protocol (discovery and error detection)
  • Distributed key-value data store
  • Eventual consistency
• CAP Theorem (discussed in a moment)
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
58 CAP Theorem
• Three properties of a system: consistency, availability and partition tolerance
• You can have at most two of these three properties for any shared-data system
• To scale out, you have to partition. That leaves either consistency or availability to choose from
  • In almost all cases, you would choose availability over consistency
• Note that this is a slightly different notion of consistency than the one we are used to from transaction systems (ACID)!
  • http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
59 Availability
• Traditionally thought of as the server/process being available "five 9s" (99.999 %) of the time.
• However, for a system with a large number of nodes, at almost any point in time there is a good chance that a node is down or that there is a network disruption among the nodes.
  • We want a system that is resilient in the face of network disruption
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
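A back-of-the-envelope sketch of why large clusters are rarely fully up, under the simplifying assumption that nodes fail independently and each is up with the same probability: the chance that at least one of n nodes is down is 1 - a^n. The function name and the sample figures are illustrative only.

```python
def p_some_node_down(node_availability, num_nodes):
    """Probability that at least one node is down, assuming independence."""
    return 1.0 - node_availability ** num_nodes


# Compare a single node with clusters of growing size.
for availability in (0.99999, 0.999):
    for nodes in (1, 100, 1000):
        p = p_some_node_down(availability, nodes)
        print(f"availability={availability} nodes={nodes} p(some node down)={p:.5f}")
```

Even with five 9s per node, a thousand nodes already give roughly a one percent chance that something is down at any given moment; with three 9s per node it becomes more likely than not.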
60 Consistency Model
• A consistency model determines rules for visibility and apparent order of updates.
• For example:
  • Row X is replicated on nodes M and N
  • Client A writes row X to node N
  • Some period of time t elapses
  • Client B reads row X from node M
  • Does client B see the write from client A?
  • Consistency is a continuum with tradeoffs
  • For NoSQL, the answer would be: maybe
  • The CAP Theorem states: strict consistency can't be achieved at the same time as availability and partition tolerance.
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
61 Eventual Consistency
• When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
• For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
• Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
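The row-X scenario can be played through with a toy simulation of two replicas that exchange updates in the background. This is a sketch under strong simplifications: versions are plain integers assigned by the caller, the highest version wins, and synchronise stands in for whatever propagation mechanism a real system uses (gossip, hinted handoff, and so on). All names are illustrative.

```python
class Replica:
    """One node holding versioned rows: key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value, version):
        current = self.rows.get(key)
        # Accept the write only if it is newer than what the node has.
        if current is None or version > current[0]:
            self.rows[key] = (version, value)

    def read(self, key):
        entry = self.rows.get(key)
        return None if entry is None else entry[1]


def synchronise(a, b):
    """Exchange rows in both directions; the newest version wins."""
    for source, target in ((a, b), (b, a)):
        for key, (version, value) in list(source.rows.items()):
            target.write(key, value, version)


node_m, node_n = Replica("M"), Replica("N")
for node in (node_m, node_n):
    node.write("X", "old", version=1)      # row X starts out replicated

node_n.write("X", "new", version=2)        # client A writes to node N
read_before = node_m.read("X")             # client B reads from node M too early

synchronise(node_m, node_n)                # updates propagate eventually
read_after = node_m.read("X")              # client B reads again

print(read_before, read_after)
```

Whether client B sees the write depends on whether the read happens before or after propagation, which is the "maybe" from the consistency model slide.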
62 What kinds of NoSQL?
NoSQL solutions fall into two major areas:
• Key/value, or 'the big hash table'
  • Amazon S3 (Dynamo)
  • Voldemort
  • Scalaris
• Schema-less, which comes in multiple flavors: column-based, document-based or graph-based
  • Cassandra (column-based)
  • CouchDB (document-based)
  • Neo4J (graph-based)
  • HBase (column-based)
From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
63 So, can you now answer:
p. 26, Table 2.4: relational databases come out of the comparison badly; why, then, are they used so much?
65 Data mining/information retrieval and Linked Data?
• Crowdsourcing: from unstructured / semi-structured information to structured data
• DM and IR: from unstructured / semi-structured information to structured data
• … and vice versa: LOD as a data source for DM!
66 NoSQL and Linked Data?
"RDF database systems are the only standardized NoSQL solutions available at the moment, being built on a simple, uniform data model and a powerful, declarative query language."
http://blog.datagraph.org/2010/04/rdf-nosql-diff
More ideas: http://webofdata.wordpress.com/2011/05/02/nosql-linked-dataprocessing/
67 NoSQL and Data Mining / Information Retrieval?
Indeed, since scalability is a huge issue!
More in Advanced Databases and Text-Based Information Retrieval, where you'll work with such systems (and, if you want, use LOD …)