CIT 383: Administrative Scripting: XML

Topics
1. What is XML?
2. XML Structure
3. REXML

eXtensible Markup Language
Extensible descriptive markup language framework
– Began as a subset of Standard Generalized Markup Language (SGML).
– Designed to ensure that data remains available after the programs that originally created or read it become obsolete or unusable.

    <?xml version="1.0" encoding="UTF-8"?>
    <inventory>
      <book isbn="0976694042">
        <author>Chris Pine</author>
        <title>Learn to Program</title>
      </book>
    </inventory>
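As a minimal sketch of how such a document can be read from Ruby, the inventory example can be parsed with REXML, the parser covered later in these slides. The variable names are illustrative, and the document is held in a string rather than a file for convenience.

```ruby
require 'rexml/document'

# The inventory document from the slide, held in a string for illustration.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <inventory>
    <book isbn="0976694042">
      <author>Chris Pine</author>
      <title>Learn to Program</title>
    </book>
  </inventory>
XML

doc  = REXML::Document.new(xml)
book = doc.root.elements['book']   # first <book> child of <inventory>

isbn   = book.attributes['isbn']         # attribute value
author = book.elements['author'].text    # element content
title  = book.elements['title'].text

puts "#{title} by #{author} (ISBN #{isbn})"
```

The tag names carry the meaning of each value, so the program can ask for the author or title directly instead of relying on position or formatting.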

Descriptive vs Presentational
Presentational languages describe how documents should look.
– <b>text</b> turns on boldface for text.
– What if you want to change book titles from bold to italics?
– Search and replace won't work if items other than books are bold.
Descriptive languages focus on the meaning.
– <title>xml and you</title>
– Stylesheets describe how to present logical items.
– Can also be used purely for data storage and interchange.
– A/K/A logical or structural markup languages.

XML-based Languages
• Ant
• Atom
• CML
• MathML
• MusicXML
• ODF
• OPML
• RDF
• SAML
• SOAP
• SVG
• VoiceXML
• WML
• XHTML
• XUL

Evolution of XML
1986  SGML standard published as ISO 8879
1987  Unicode proposal published
1991  First volume of Unicode standard
1996  XML work started
1998  XML 1.0 released as a W3C standard
2001  XML Schema language
2004  XML 1.1 released (not widely used)
2007  Unicode 5.0 published

XML Tree Structure

    <todo>
      <title>Monday's List</title>
      <item>Study for midterm</item>
      <item priority="10">Scripting Class</item>
      <item>Bathe cat</item>
    </todo>

The same document viewed as a tree:

    todo
      title: Monday's List
      item: Study for midterm
      item: Scripting Class
        priority: 10
      item: Bathe cat
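The tree view can be sketched in Ruby by walking the parsed document recursively. This uses a well-formed version of the to-do list in which priority is written as an attribute of its item; that reading of the slide is an assumption, and the `outline` helper is a hypothetical name introduced for illustration.

```ruby
require 'rexml/document'

xml = <<~XML
  <todo>
    <title>Monday's List</title>
    <item>Study for midterm</item>
    <item priority="10">Scripting Class</item>
    <item>Bathe cat</item>
  </todo>
XML

# Return one indented line per element, visiting the tree depth-first.
def outline(element, depth = 0)
  label = element.name
  element.attributes.each { |name, value| label += " [#{name}=#{value}]" }
  text = element.text.to_s.strip            # leading text of the element, if any
  label += ": #{text}" unless text.empty?
  lines = ["#{'  ' * depth}#{label}"]
  element.elements.each { |child| lines.concat(outline(child, depth + 1)) }
  lines
end

doc = REXML::Document.new(xml)
tree_lines = outline(doc.root)
puts tree_lines
```

Each level of indentation in the output corresponds to one level of nesting in the markup, which is why proper nesting is required for a document to form a tree.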

Elements and Attributes
An element consists of tags and contents
    <title>Learn to Program</title>
– Begin and end tags are mandatory.
– An empty element may combine both in one tag:
    <isbn number="0976694042" />
Attributes
    number="0976694042"
– Elements may have zero or more attributes.
– Attribute values must always be quoted.
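A short sketch of how these parts appear through REXML, assuming the two slide fragments are wrapped in a hypothetical `<book>` root so they form a single document:

```ruby
require 'rexml/document'

# <book> is a hypothetical wrapper; the two children come from the slide.
xml = '<book><title>Learn to Program</title><isbn number="0976694042" /></book>'
doc = REXML::Document.new(xml)

title = doc.root.elements['title']
isbn  = doc.root.elements['isbn']

title_name       = title.name                 # tag name
title_text       = title.text                 # contents between begin and end tags
title_attr_count = title.attributes.size      # this element has zero attributes

isbn_number = isbn.attributes['number']       # attribute value, always a string
isbn_text   = isbn.text                       # nil: an empty element has no contents

puts "#{title_name}: #{title_text} (#{title_attr_count} attributes)"
puts "isbn number=#{isbn_number}, text=#{isbn_text.inspect}"
```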

Text
XML declaration specifies character encoding
    <?xml version="1.0" encoding="UTF-8"?>
Encodings
– Unicode: universal character set, encoded as UTF-8, UTF-32
– ISO-8859: 8-bit encodings; 8859-1 is Western Europe
Entities
– &#nnnn; encodes the specified Unicode character
– &name; are named character entities, such as
    &lt; is <
    &gt; is >
    &amp; is &
  and, where defined, currency symbols, fractions, Greek letters, math symbols, etc.
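Entity handling can be illustrated with REXML, which decodes entities when element text is read and escapes special characters when text is written. This is a small sketch; the `<note>` element and its contents are made up for the example.

```ruby
require 'rexml/document'

# Reading: entities in the source are decoded in the text returned to the program.
doc = REXML::Document.new('<note>1 &lt; 2 &amp; 3 &gt; 2, and &#65; is A</note>')
decoded = doc.root.text
puts decoded

# Writing: special characters in a text node are escaped when serialized.
escaped = REXML::Text.new('a < b & c').to_s
puts escaped
```

The numeric reference `&#65;` names Unicode code point 65, the letter A, so it decodes to the same character as a literal A in the source.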

XML Syntax Rules
1. There is one and only one root tag.
2. Begin tags must be matched by an end tag.
3. XML tags must be properly nested.
4. XML tags are case sensitive.
5. All attribute values must be quoted.
6. Whitespace within an element's content is part of the text.
7. Line endings are always normalized to LF.
8. HTML-style comments: <!-- comment -->

Correctness
Well-formed
– Conforms to XML syntax rules.
– A conforming parser will not parse documents that are not well-formed.
Valid
– Conforms to XML semantics rules as defined in
  • Document Type Definition (DTD)
  • XML Schema
– A validating parser will not parse invalid documents.
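A rough well-formedness check can be sketched with REXML by attempting a parse and catching the parser's error. The `well_formed?` helper is a hypothetical name, and the sketch checks syntax only; it does not attempt validation against a DTD or XML Schema.

```ruby
require 'rexml/document'

# Returns true when REXML parses the string without error, false when it
# raises a parse error. Well-formedness only; no DTD or schema validation.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

good = '<inventory><book isbn="0976694042"><title>Learn to Program</title></book></inventory>'
bad  = '<inventory><book><title>Learn to Program</book></title></inventory>'  # improperly nested

puts well_formed?(good)
puts well_formed?(bad)
```

The second string breaks the nesting rule: `<title>` is still open when `</book>` appears, so the parser rejects it.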

XML Schema Languages
Document Type Definitions
– Inherited from SGML.
– No support for all XML features.
XML Schema
– Most commonly used.
– Schemas are XML docs.
– A/K/A WXS, XSD
RELAX NG
– REgular LAnguage for XML Next Generation
– XML and non-XML forms.

    <?xml version="1.0" encoding="utf-8" ?>
    <xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Recipient" type="xs:string" />
            <xs:element name="House" type="xs:string" />
            <xs:element name="Street" type="xs:string" />
            <xs:element name="Town" type="xs:string" />
            <xs:element minOccurs="0" name="County" type="xs:string" />
            <xs:element name="PostCode" type="xs:string" />
            <xs:element name="Country">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="FR" />
                  <xs:enumeration value="DE" />
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Ruby XML Parsers
REXML: Ruby Electric XML
– Standard with the Ruby language.
– Slow on large documents.
libxml-ruby
– Ruby bindings for the GNOME libxml2 XML toolkit.
– Very fast (30x as fast as REXML).
Hpricot
– Parses XML as well as HTML.
– Fast (3-4x as fast as REXML).
– Does not check for well-formedness or validity.

Types of Parsing
Tree Parsing (DOM-like)
– Good for small documents.
– Loads entire document into memory.
– Simple API.
Stream Parsing (SAX-like)
– Good for large documents.
– User defines callback methods, passes to API.
– Parser runs callback methods on pattern match.

Tree Parsing
Loads entire XML doc into memory.

    require 'rexml/document'
    include REXML
    input = File.new('data.xml')
    doc = Document.new(input)
    root = doc.root

Search document as a tree using XPath

    doc.elements.each("ch/section") do |e|
      puts e.attributes["title"]
    end
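A self-contained variant of the tree-parsing example, as a sketch: the contents of data.xml are not given in the slides, so the string below is a hypothetical stand-in shaped to match the "ch/section" path.

```ruby
require 'rexml/document'

# Hypothetical stand-in for data.xml: a <ch> root with titled <section> children.
xml = <<~XML
  <ch>
    <section title="What is XML?"/>
    <section title="XML Structure"/>
    <section title="REXML"/>
  </ch>
XML

doc = REXML::Document.new(xml)   # the whole document is now in memory

# Collect the title attribute of every <section> under <ch>.
titles = []
doc.elements.each('ch/section') { |e| titles << e.attributes['title'] }

puts titles
```

Because the full tree is held in memory, the same document can be searched repeatedly with different paths without re-reading the input.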

Stream Parsing
Define listener

    class MyListener
      include REXML::StreamListener
      # Called once for each start tag with the tag name and its attributes.
      def tag_start(*args)
        puts "start: #{args.map { |x| x.inspect }.join(', ')}"
      end
    end

Invoke parser

    require 'rexml/document'
    require 'rexml/streamlistener'
    include REXML
    listen = MyListener.new
    source = File.new('data.xml')
    Document.parse_stream(source, listen)
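A self-contained sketch of the same pattern, using a listener that counts start tags instead of printing them. The `TagCounter` class and the sample document are hypothetical; only `REXML::StreamListener` and `Document.parse_stream` come from the slide.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Counts start tags by element name. Callbacks that are not overridden here
# fall back to the do-nothing defaults supplied by REXML::StreamListener.
class TagCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called by the parser for every start tag; attrs holds the tag's attributes.
  def tag_start(name, attrs)
    @counts[name] += 1
  end
end

xml = '<todo><title>Monday</title><item>Study</item><item>Bathe cat</item></todo>'
listener = TagCounter.new
REXML::Document.parse_stream(xml, listener)

puts listener.counts.inspect
```

The listener keeps only a small hash of counts rather than the document itself, which is what makes stream parsing suitable for large inputs.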

XPath Searches
    doc.search("p")
Find all paragraph tags in document.
    doc.search("/html/body//p")
Find all paragraph tags within the body tag.
    doc.search("//a[@src]")
Find all anchor tags with a src attribute.
    doc.search("//a[@src='google.com']")
Find all a tags with a src attribute of google.com.
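The slide's examples call a `search` method. As a sketch that runs with only the standard parser, the same three path expressions can be evaluated with `REXML::XPath.match`, which is a swapped-in technique rather than the call shown on the slide. The sample HTML-like document is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical well-formed page used to exercise the path expressions.
xml = <<~XML
  <html>
    <head><title>Links</title></head>
    <body>
      <p>First <a src="google.com">search</a></p>
      <div><p>Second <a src="example.org">other</a></p></div>
      <a>no attribute</a>
    </body>
  </html>
XML

doc = REXML::Document.new(xml)

# All <p> elements anywhere below /html/body.
paragraphs = REXML::XPath.match(doc, '/html/body//p')

# All <a> elements that carry a src attribute.
with_src = REXML::XPath.match(doc, '//a[@src]')

# All <a> elements whose src attribute equals google.com.
google = REXML::XPath.match(doc, "//a[@src='google.com']")

puts "paragraphs: #{paragraphs.size}"
puts "anchors with src: #{with_src.size}"
puts "google anchors: #{google.map(&:text).inspect}"
```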
