XQuery and Hierarchical Naming
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
February 7, 2008

Today
§ Reminder: Homework 1 due 2/12 @ 11:59 PM
§ XQuery and joins
§ Addressing vs. naming
§ Hierarchical names

XQuery’s Basic Form
§ The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes
§ “FLWOR” statement pattern:
    for {iterators that bind variables}
    let {collections}
    where {conditions}
    order by {order-conditions}
    return {output constructor}

Example XML Data
[Figure: tree view of dblp.xml. Below the root are the ?xml declaration and the dblp element; node labels visible in the figure include mastersthesis, inproceedings, and university, with children such as mdate, key, author, title, year, school, crossref, ee, name, and country.]

XQuery and Joins
    for $i in doc("dblp.xml")/dblp/inproceedings,
        $r in $i/crossref/text(),
        $c in doc("dblp.xml")/dblp/conf,
        $n in $c/@name
    where $c = $r
    return <result>{ $i, $c }</result>
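The join above behaves like a nested loop over two node sets. A minimal Python sketch of that idea, assuming a tiny inline document: the sample data and the shape of the `conf` element are hypothetical, chosen only so that the `$c = $r` comparison has something to match.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real dblp.xml is far larger and may differ.
DBLP = """
<dblp>
  <inproceedings key="conf/a">
    <title>Paper A</title>
    <crossref>sigmod-97</crossref>
  </inproceedings>
  <inproceedings key="conf/b">
    <title>Paper B</title>
    <crossref>vldb-98</crossref>
  </inproceedings>
  <conf name="SIGMOD 1997">sigmod-97</conf>
</dblp>
"""

def join_papers_with_confs(xml_text):
    """Nested-loop join: pair each inproceedings with every conf whose
    string value equals the paper's crossref text."""
    root = ET.fromstring(xml_text)
    results = []
    for paper in root.findall("inproceedings"):      # for $i in ...
        for ref in paper.findall("crossref"):        # $r in $i/crossref/text()
            for conf in root.findall("conf"):        # $c in .../conf
                # where $c = $r: compare the element's string value
                conf_value = "".join(conf.itertext()).strip()
                if conf_value == (ref.text or "").strip():
                    results.append((paper.get("key"), conf.get("name")))
    return results

pairs = join_papers_with_confs(DBLP)
print(pairs)
```

Only "Paper A" has a crossref that matches a conf entry, so only that pair is returned. A real engine would typically avoid the quadratic loop, for example by hashing one side of the join.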

Some Uses for Join in XML
§ Translation between values
  § SSN to PennID
§ Joining or combining information
  § Amazon invoice info + UPS tracking info
§ Restructuring information
  § <author><book>…</book> <book>..</book></author> to <book><author>…</author></book>
  § Here, we separate authors from books, then join them back in “upside-down” fashion
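The restructuring case can be sketched in two steps: flatten the nesting into pairs, then regroup the pairs the other way around. A minimal Python sketch, assuming the data is held in plain dictionaries rather than XML; the function name and the sample authors and books are hypothetical.

```python
from collections import defaultdict

def invert_nesting(books_by_author):
    """Turn {author: [book, ...]} into {book: [author, ...]}."""
    # Step 1: separate authors from books by flattening into pairs.
    pairs = [(author, book)
             for author, books in books_by_author.items()
             for book in books]
    # Step 2: join the pairs back together, grouped by book instead.
    authors_by_book = defaultdict(list)
    for author, book in pairs:
        authors_by_book[book].append(author)
    return dict(authors_by_book)

# Hypothetical input: two authors, one shared book.
nested = {"Ann": ["X", "Y"], "Bob": ["Y"]}
inverted = invert_nesting(nested)
print(inverted)
```

The shared book "Y" ends up with both authors, which is the “upside-down” result the slide describes.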

Changing Nesting of XML Content
Re-nesting XML trees is a common operation
Simply nest the query blocks and correlate them – similar to join
    for $u in doc("dblp.xml")/dblp/university
    let $n := $u/name/text(),
        $k := $u/@key
    where $u/country = "USA"
    return
      <ms-theses-92-by-univ>
        { $n }
        { for $mt in $u/../mastersthesis,
              $inst in $mt/school/text()
          where $mt/year/text() = "1992" and ________
          return $mt/title }
      </ms-theses-92-by-univ>

Collections & Aggregation in XQuery
§ Given a collection, we can compute an average, count, etc. of its members:
    <article-authors> {
      for $paper in doc("dblp.xml")/dblp/inproceedings
      let $pauth := $paper/author    (: a collection :)
      return
        <paper>
          { $paper/title }
          <count> { fn:count($pauth) } </count>
        </paper>
    } </article-authors>

Sorting in XQuery
§ We can order the sequence of “result tuples” output by the return clause:
    for $x in doc("dblp.xml")/proceedings
    order by $x/title/text()
    return $x

Querying & Defining Tags
§ Can get a node’s name by querying node-name():
    for $x in doc("dblp.xml")/dblp/*
    return node-name($x)
§ Can construct elements and attributes using computed names:
    for $x in doc("dblp.xml")/dblp/*,
        $year in $x/year,
        $title in $x/title/text()
    return
      element { node-name($x) } {
        attribute { concat("year-", $year) } { $title }
      }

XQuery Summary
§ Very flexible and powerful language for XML
  § Focus is on database-style operations like joins
§ Performs tasks that can’t be done with XPath or XSLT and that are tedious to program in Java:
  § Integrating information from multiple sources
  § Joins, based on correspondences of values
  § Computing count, average, etc.
§ Today, XQuery is available:
  § In RDBMSs (SQL Server, Oracle, DB2) and XML DBMS systems (MarkLogic)
  § As the basis of research prototypes for “XQuery full text”
  § As the basis of “XQueryP” – a Web Services/AJAX programming language based on XQuery but with programming language features
    http://2006.xmlconference.org/programme/presentations/38.html
§ We will discuss data integration and middleware later in the course

Hierarchical Naming Schemes
Thus far, we’ve seen XPath as a hierarchical naming scheme
§ “Content-based naming”: describe the structure and values of a tree structure
§ Assumption: XML tree resides in (or is being sent to) one place
But hierarchy is often used for naming and location

How Do We Find Things on the Internet?
Generally, using one of three means:
§ Addresses or locations: specify where something is, assuming that we understand how to navigate
  § Just like a physical address, we may still need a map!
  § In the Internet, addresses are typically IP addresses – the routers know the map
§ Names: are mapped into addresses via lookup services
  § Best-known example on the Internet: DNS name
  § Cell phone numbers, email addresses, etc. are becoming names
§ Content-based addressing/naming
  § The actual data value is somehow used to find its location
  § The basis of publish-subscribe systems and peer-to-peer architectures

The Simplest Way of Going from Names or Content to Locations
§ Directory-based lookup protocols are very common
§ Examples:
  § Napster 1.0 – peer-to-peer storage with central directory
  § Inverted index – used to look up keywords in information retrieval
  § DNS – distributed hierarchical directory
  § LDAP – hierarchical Directory Information Tree

Napster 1.0, ca 2002
§ Hybrid of peer-to-peer storage with central directory showing what’s currently available
§ What are the trade-offs implicit in this model? Why did it fail?
[Figure: Peers 1, 2, and 3 around a central Napster.com directory listing jjackson-lame and bspears-oops; the peers hold the files jjackson-lame.mp3 and bspears-oops.mp3.]
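The hybrid model can be sketched as a central map from names to the peers that currently hold them; the files themselves never pass through the directory. A minimal Python sketch, not Napster's actual protocol: the class and method names are hypothetical, and which peer holds which file is an assumption.

```python
from collections import defaultdict

class CentralDirectory:
    """Maps a file name to the set of peers currently sharing it."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, peer, files):
        # Called when a peer comes online and announces what it shares.
        for name in files:
            self._peers_by_file[name].add(peer)

    def unregister(self, peer):
        # A peer going offline must be dropped from every entry.
        for peers in self._peers_by_file.values():
            peers.discard(peer)

    def lookup(self, name):
        # Returns peers to contact directly; the transfer is peer-to-peer.
        return sorted(self._peers_by_file.get(name, set()))

directory = CentralDirectory()
directory.register("peer2", ["jjackson-lame.mp3", "bspears-oops.mp3"])
directory.register("peer3", ["jjackson-lame.mp3"])
print(directory.lookup("jjackson-lame.mp3"))
directory.unregister("peer2")
print(directory.lookup("bspears-oops.mp3"))
```

The sketch makes one trade-off visible: lookups are cheap and always current, but every query depends on a single directory.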

Other Services with Similar Directory + Peer Architectures
§ FolderSync – now owned by Microsoft
§ Google Desktop Search with multiple machines
§ BitTorrent trackers are quite similar (we’ll discuss BitTorrent more later)

Inverted Indices
§ A “forward index”: documents to words
§ The “inverted index”: words to word-occurrences
§ The basis of most information retrieval engines, Google, etc.
§ Can handle positional predicates
§ … But how can we reconstruct previews?
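A small sketch may make the “words to word-occurrences” mapping and the positional predicates concrete. This is a minimal Python illustration, assuming whitespace tokenization and lowercasing; the function names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {word: {doc_id: [positions]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def phrase_match(index, first, second):
    """Doc ids where `second` occurs immediately after `first`."""
    hits = []
    for doc_id, positions in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        # Positional predicate: some occurrence of `first` at p, `second` at p + 1.
        if any(p + 1 in following for p in positions):
            hits.append(doc_id)
    return sorted(hits)

# Hypothetical two-document corpus.
docs = {
    "d1": "hierarchical naming on the internet",
    "d2": "naming is hierarchical in dns",
}
index = build_inverted_index(docs)
print(phrase_match(index, "hierarchical", "naming"))
```

Both documents contain both words, but only d1 has them adjacent and in that order. The index stores positions rather than the text itself, which is why reconstructing a preview needs something more, such as the forward index or the original document.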

Naming People and Devices: LDAP
§ Lightweight Directory Access Protocol
§ Hierarchical naming system that can be partitioned and replicated

LDAP’s Schema
LDAP information has an XML-like schema:
§ A unique name in LDAP is called a Distinguished Name, “dn”, and consists of a sequence of attributes representing a hierarchy, from most-specific to least-specific (as in DNS names):
    o = organization; dc = domain component
    ou = organizational unit
    uid = user ID
    cn = common name
    c = country; st = state; l = locality
Can also have objectClass – the type of entity

LDAP Hierarchy
[Figure: LDAP hierarchy diagram. Source: Brad Marshall, LDAP Tutorial, quark.humbug.au/publications/ldap_tut.html]

Querying LDAP
LDAP queries are mostly attribute-value predicates:
§ uid=zives; o=upenn; c=usa
§ (|(cn=Susan Davidson)(cn=Zachary Ives)(cn=Val Tannen))
§ objectclass=posixAccount
§ (!cn=Val Tannen)
How does this differ from XPath? How might we process these queries?
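One possible answer to the processing question is to treat a filter as a small boolean expression tree and evaluate it against each entry. A minimal Python sketch under simplifying assumptions: filters are written as nested tuples rather than parsed from LDAP syntax, matching is exact string equality, and the sample entries (including the uid "vtannen") are hypothetical.

```python
def matches(entry, flt):
    """Evaluate a filter tree against one entry.

    entry: dict mapping attribute name -> list of values.
    flt:   ("eq", attr, value) | ("and", f1, ...) | ("or", f1, ...) | ("not", f)
    """
    op = flt[0]
    if op == "eq":
        _, attr, value = flt
        return value in entry.get(attr, [])
    if op == "and":
        return all(matches(entry, f) for f in flt[1:])
    if op == "or":
        return any(matches(entry, f) for f in flt[1:])
    if op == "not":
        return not matches(entry, flt[1])
    raise ValueError(f"unknown operator: {op}")

# Hypothetical directory entries.
entries = [
    {"uid": ["zives"], "cn": ["Zachary Ives"], "objectclass": ["posixAccount"]},
    {"uid": ["vtannen"], "cn": ["Val Tannen"], "objectclass": ["posixAccount"]},
]

# (|(cn=Susan Davidson)(cn=Zachary Ives)(cn=Val Tannen))
any_of_three = ("or",
                ("eq", "cn", "Susan Davidson"),
                ("eq", "cn", "Zachary Ives"),
                ("eq", "cn", "Val Tannen"))
# (!cn=Val Tannen)
not_val = ("not", ("eq", "cn", "Val Tannen"))

print([e["uid"][0] for e in entries if matches(e, any_of_three)])
print([e["uid"][0] for e in entries if matches(e, not_val)])
```

Scanning every entry is the naive strategy; a real directory server would more likely use per-attribute indexes to narrow the candidates first.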

The Backbone of Internet Naming: Domain Name Service
§ A simple, hierarchical name system with a distributed database – each domain controls its own names
[Figure: DNS tree. Top-level domains (com, edu, …) sit at the top; below them are domains such as amazon, columbia, upenn, and berkeley; below those are names such as cis, sas, and www.]

Top-Level Domains (TLDs)
Mostly controlled by Network Solutions, Inc. today
§ .com: commercial
§ .edu: educational institution
§ .gov: US government
§ .mil: US military
§ .net: networks and ISPs (now also a number of other things)
§ .org: other organizations
§ 244 2-letter country suffixes, e.g., .us, .uk, .cz, .tv, …
§ and a bunch of new suffixes that are not very common, e.g., .biz, .name, .pro, …

Finding the Root
13 “root servers” store entries for all top level domains (TLDs)
DNS servers have a hard-coded mapping to root servers so they can “get started”

Excerpt from DNS Root Server Entries
    ; This file is made available by InterNIC registration services
    ; under anonymous FTP as
    ;     file    /domain/named.root
    ;
    ; formerly NS.INTERNIC.NET
    ;
    .                     3600000  IN  NS  A.ROOT-SERVERS.NET.
    A.ROOT-SERVERS.NET.   3600000      A   198.41.0.4
    ;
    ; formerly NS1.ISI.EDU
    ;
    .                     3600000      NS  B.ROOT-SERVERS.NET.
    B.ROOT-SERVERS.NET.   3600000      A   128.9.0.107
    ;
    ; formerly C.PSI.NET
    ;
    .                     3600000      NS  C.ROOT-SERVERS.NET.
    C.ROOT-SERVERS.NET.   3600000      A   192.33.4.12
(13 servers in total, A through M)

Supposing We Were to Build DNS
§ How would we start? How is a lookup performed?
(Hint: what do you need to specify when you add a client to a network that doesn’t do DHCP?)
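One way to picture a lookup is as a walk down the hierarchy: start at a known root, and follow referrals until some server has the address. A minimal Python sketch of that iterative idea, not the real DNS protocol: the server names, the table layout, and the address 192.0.2.10 (from a range reserved for documentation) are all hypothetical.

```python
# Hypothetical toy data: each "server" either knows an address or refers
# the resolver to the server responsible for a more specific suffix.
SERVERS = {
    "root": {"referrals": {"edu": "edu-server", "com": "com-server"},
             "addresses": {}},
    "edu-server": {"referrals": {"upenn.edu": "upenn-server"},
                   "addresses": {}},
    "upenn-server": {"referrals": {},
                     "addresses": {"www.cis.upenn.edu": "192.0.2.10"}},
    "com-server": {"referrals": {}, "addresses": {}},
}
ROOT_HINT = "root"  # stands in for the hard-coded root server mapping

def resolve(name, servers=SERVERS, start=ROOT_HINT, max_hops=10):
    """Follow referrals from the root. Returns (address, path), where
    address is None if the name cannot be resolved."""
    server = start
    path = [server]
    for _ in range(max_hops):
        zone = servers[server]
        if name in zone["addresses"]:
            return zone["addresses"][name], path
        # Pick the longest delegated suffix that covers the name.
        candidates = [s for s in zone["referrals"]
                      if name == s or name.endswith("." + s)]
        if not candidates:
            return None, path
        server = zone["referrals"][max(candidates, key=len)]
        path.append(server)
    return None, path

print(resolve("www.cis.upenn.edu"))
print(resolve("www.amazon.com"))
```

The first lookup visits the root, then the edu server, then the upenn server; the second stops at the com server with no answer. The sketch leaves out caching, which real resolvers rely on heavily to avoid contacting the root for every query.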

Issues in DNS
§ We know that everyone wants to be “mydomain”.com
  § How does this mesh with the assumptions inherent in our hierarchical naming system?
§ What happens if things move frequently?
§ What happens if we want to provide different behavior to different requestors (e.g., Akamai)?

Next Time…
§ We’ll look at alternative mechanisms for finding things:
  § Publish-subscribe models
  § Gossip protocols, such as in routers
  § Flooding
§ … and soon, peer-to-peer or content-based routing