Representing data with XML
SE-2030
Dr. Mark L. Hornick

Parsing exercise
that is is that is not is that not it it is

Parsing
I never said she STOLE my money

XML: eXtensible Markup Language
XML is a technology for defining markup languages to represent data
  • Designed to define structured data descriptions that are
      • Extensible/customizable
      • Portable across languages
      • Portable across operating system platforms

XML allows you to define your own markup language
  • You can define your own tags
      • You can create your own tag vocabulary
  • XML can use an optional DTD (Document Type Definition) or XSD (XML Schema Definition) to formally describe the data
      • How tags can nest
      • What attributes a tag can/must have
      • i.e., the tag grammar
  • Data that is described by a specified vocabulary and grammar is called an XML application

HTML is an XML application
  • HTML is a description of data in web page documents, and how it is structured
  • Strictly speaking, only XHTML is a true XML application; HTML itself descends from SGML, but the idea of a defined tag vocabulary and grammar is the same

    <!doctype html>
    <html>
    <head>
      <meta charset="UTF-8">
      <title>My web page</title>
    </head>
    <!-- This is a comment -->
    <body>
      <h1>HTML syntax summary</h1>
      <h2>or, all you need to know about HTML</h2>
      <p>This is how you write an HTML document.</p>
      <p>The end.</p>
    </body>
    </html>

XML schemas define how HTML can be structured (for instance, a <title> can appear within a <head>, but not within a <body>)
[Figure: element tree rooted at html, with head and body as children; title, h1, p, strong, and em elements appear below them]

What else is XML good for?
XML can be used as a format for storing data in files in a structured manner
  • Separating content from presentation, so that data created by an application written in Java can be read by an application written in C (or JavaScript, or any other language)

Scenario
Consider a Java collection of Students:

    List<Student> roster;

where

    public class Student {
        String firstname;
        String lastname;
        int id;
        String program;
    }

In Java, we can use serialization to write/store roster to a file, to be read (later) by another Java application
  • Provided the other Java application knows the definition of Student and List
  • It would be much more difficult for a program written in C or JavaScript to read the file
  • Further complications arise when the file is read by an application running on another hardware or OS platform
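As a minimal sketch of the serialization approach described on this slide, the class below writes a roster to a byte array and reads it back. The class name RosterSerializationSketch, the constructor, and the choice to make Student implement Serializable are assumptions for illustration; the slide shows only the four fields. A byte array stands in for the file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class RosterSerializationSketch {

    // Assumption: Student must implement Serializable for this to work;
    // the slide's class definition does not show it.
    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        String firstname;
        String lastname;
        int id;
        String program;

        Student(String firstname, String lastname, int id, String program) {
            this.firstname = firstname;
            this.lastname = lastname;
            this.id = id;
            this.program = program;
        }
    }

    // Serialize the roster into Java's binary object-stream format.
    // Wrapping a FileOutputStream instead would store the same bytes in a file.
    static byte[] write(List<Student> roster) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(roster));
        }
        return bytes.toByteArray();
    }

    // Reading back only works if the Student class is available to the reader.
    @SuppressWarnings("unchecked")
    static List<Student> read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<Student>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Student> roster = List.of(
                new Student("Bill", "Bored", 1111, "SE"),
                new Student("Bob", "Sledd", 1112, "CE"));
        List<Student> copy = read(write(roster));
        System.out.println(copy.size() + " students read back, first is " + copy.get(0).firstname);
    }
}
```

The stored bytes are a Java-specific binary format, which is why a C or JavaScript program would struggle to read them.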

XML allows us to create a document that can be used to represent a portable collection of Students:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <roster>
      <!-- This is a comment -->
      <student lastname="Bored" firstname="Bill">
        <id>1111</id>
        <program>SE</program>
      </student>
      <!-- This is a comment -->
      <student firstname="Bob" lastname="Sledd">
        <id>1112</id>
        <program>CE</program>
      </student>
    </roster>

XML grammars (like the one here) represent all data in plain text, which is most easily interpreted across platforms
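One way a Java program might produce a roster document like this is with the JDK's streaming writer, javax.xml.stream.XMLStreamWriter. The sketch below is only an illustration: the class name RosterXmlWriterSketch, the Student constructor, and the choice of the streaming API are assumptions, and the output is written on a single line without comments or indentation.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class RosterXmlWriterSketch {

    // Same fields as the Student class in the scenario; the constructor is an assumption.
    static class Student {
        String firstname;
        String lastname;
        int id;
        String program;

        Student(String firstname, String lastname, int id, String program) {
            this.firstname = firstname;
            this.lastname = lastname;
            this.id = id;
            this.program = program;
        }
    }

    // Write each Student as a <student> element with firstname/lastname
    // attributes and <id>/<program> child elements.
    static String toXml(List<Student> roster) throws XMLStreamException {
        StringWriter text = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(text);
        xml.writeStartDocument();                 // the XML declaration
        xml.writeStartElement("roster");
        for (Student s : roster) {
            xml.writeStartElement("student");
            xml.writeAttribute("firstname", s.firstname);
            xml.writeAttribute("lastname", s.lastname);
            xml.writeStartElement("id");
            xml.writeCharacters(Integer.toString(s.id));
            xml.writeEndElement();                // </id>
            xml.writeStartElement("program");
            xml.writeCharacters(s.program);
            xml.writeEndElement();                // </program>
            xml.writeEndElement();                // </student>
        }
        xml.writeEndElement();                    // </roster>
        xml.writeEndDocument();
        xml.close();
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(List.of(
                new Student("Bill", "Bored", 1111, "SE"),
                new Student("Bob", "Sledd", 1112, "CE"))));
    }
}
```

Because the result is plain text, any language with an XML parser can read it without knowing anything about the Java classes that produced it.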

The XML grammar itself may be defined by an optional XML Schema Definition (or a Document Type Definition/DTD)

    <xs:schema targetNamespace="urn:StudentFileSchema"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:msdata="urn:schemas-microsoft-com:xmlmsdata">
      <xs:element name="roster">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="student" minOccurs="1" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="id" type="xs:integer" minOccurs="1" maxOccurs="1"/>
                  <xs:element name="program" type="xs:string" minOccurs="1" maxOccurs="1"/>
                </xs:sequence>
                <xs:attribute name="firstname" type="xs:string"/>
                <xs:attribute name="lastname" type="xs:string"/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

XML Schema Definitions (like the one here) define the valid format of the XML data. For instance, the <xs:sequence> tag specifies that the id…program tags MUST appear in only that sequence
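To illustrate how a schema constrains a document, the sketch below validates two small rosters with the JDK's javax.xml.validation API. It rests on several assumptions: the class name RosterValidationSketch is hypothetical, and the embedded schema is a simplified variant of the one on this slide with no targetNamespace, so that un-namespaced <roster> documents can be checked against it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class RosterValidationSketch {

    // Assumption: simplified schema without a targetNamespace.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='roster'><xs:complexType><xs:sequence>"
        + "<xs:element name='student' minOccurs='1' maxOccurs='unbounded'>"
        + "<xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name='id' type='xs:integer'/>"
        + "<xs:element name='program' type='xs:string'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='firstname' type='xs:string'/>"
        + "<xs:attribute name='lastname' type='xs:string'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the document conforms to the schema, false otherwise.
    static boolean isValid(String xml) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document is malformed or violates the schema
        }
    }

    public static void main(String[] args) throws Exception {
        String inOrder = "<roster><student firstname='Bill' lastname='Bored'>"
                + "<id>1111</id><program>SE</program></student></roster>";
        String swapped = "<roster><student firstname='Bill' lastname='Bored'>"
                + "<program>SE</program><id>1111</id></student></roster>";
        System.out.println("id then program: " + isValid(inOrder));
        System.out.println("program then id: " + isValid(swapped));
    }
}
```

The second document contains the same data, but it is rejected because <program> comes before <id>, which the sequence rule forbids.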

How do you read the data from an XML document?
  • Most languages implement XML parsers that can interpret an XML file and extract the data
  • XML parsers can be “told” to use the optional XML Schema to ensure that the XML file being parsed is in a valid format
      • Validation is optional; you can create XML files without creating an XML Schema, but then nothing constrains which tags and attributes may appear (the parser still checks that the file is well-formed)
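As one illustration of a parser extracting data, the sketch below uses the JDK's tree-based DOM parser (javax.xml.parsers.DocumentBuilder) to pull each student's attributes and child elements out of a roster document. The class name RosterDomSketch and the output format are assumptions, and no schema validation is performed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RosterDomSketch {

    // Parse the whole document into a tree, then walk the <student> elements.
    static List<String> describeStudents(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList students = doc.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            // Attributes are read by name; child element text via getTextContent().
            String id = student.getElementsByTagName("id").item(0).getTextContent();
            String program = student.getElementsByTagName("program").item(0).getTextContent();
            result.add(student.getAttribute("firstname") + " "
                    + student.getAttribute("lastname") + " " + id + " " + program);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<roster>"
                + "<student lastname='Bored' firstname='Bill'><id>1111</id><program>SE</program></student>"
                + "<student firstname='Bob' lastname='Sledd'><id>1112</id><program>CE</program></student>"
                + "</roster>";
        describeStudents(xml).forEach(System.out::println);
    }
}
```

Attribute order in the file does not matter to the parser, which is why both students are read correctly even though their attributes are written in different orders.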

How does a SAX Parser work?
  • The parser reads an XML file sequentially, from top to bottom
  • As the text is read, it performs a syntactical analysis, based on the rules of XML
  • The parser generates various events depending on what type of element it encounters during analysis
  • Event handling methods are called, where element information is passed via parameters of the method

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <roster>
      <student firstname="Bill" lastname="Bored">
        <id>1111</id>
        <program>SE</program>
      </student>
      <student firstname="Bob" lastname="Sledd">
        <id>2222</id>
        <program>CS</program>
      </student>
    </roster>

Events generated for this document, in order:

    startDocument()
    startElement(name, attrs)   (name="roster", attrs=none)
    startElement(name, attrs)   (name="student", attrs=…)
    endElement(name)            (name="student")
    startElement/endElement is called again for the 2nd student
    endElement(name)            (name="roster")
    endDocument()

Note: there are several other events and event handling methods.
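The event sequence can be observed with the SAX parser that ships with the JDK. The sketch below records each callback as a string; the class names SaxEventSketch and EventLogger are hypothetical. Note that the real org.xml.sax callbacks take more parameters (uri, localName, qName) than the simplified startElement(name, attrs) shown on the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventSketch {

    // Records every document/element event in the order the parser fires it.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument()");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            events.add("startElement(" + qName + ", " + attrs.getLength() + " attrs)");
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement(" + qName + ")");
        }

        @Override
        public void endDocument() {
            events.add("endDocument()");
        }
    }

    // Parse the XML text and return the list of recorded events.
    static List<String> events(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventLogger logger = new EventLogger();
        parser.parse(new InputSource(new StringReader(xml)), logger);
        return logger.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<roster><student firstname='Bill' lastname='Bored'>"
                + "<id>1111</id></student></roster>";
        events(xml).forEach(System.out::println);
    }
}
```

Unlike a tree-based parser, the handler never sees the whole document at once; it must remember whatever state it needs between callbacks.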

XML SAX Parser Demonstration