HTML and XML
Venkat Subramaniam – svenkat@cs.uh.edu
HTML
• HyperText Markup Language
• HTML 4.0 is defined as an SGML application; XHTML is the reformulation that complies strictly with the XML standard
• Presentation details are presented along with the information, using markup
• Browsers act as interpreters/parsers in
  – parsing through HTML documents
  – displaying the contents of the documents
Tags, Elements and Attributes
<STRONG>boldface Text</STRONG> <HR> <TABLE BORDER="1">…</TABLE>
• A tag starts with < and ends with >
• Elements generally have start and end tags
  – start with <TagName>
  – end with </TagName> (optional in some cases)
  – the contents of an element go between the tags
• Attributes
  – Name=Value specifies information about the contents of an element
  – Provided between the tag name and the closing >
  – Multiple attributes are separated by spaces
Tags, Case, Well-formedness
• HTML is relaxed when it comes to case and well-formedness
• <HR> is as good as <hr>, as are <Hr> and <hR>
• <STRONG>This is <I> italics</I> Text</STRONG>
• However,
  – <STRONG>This is <I> italics</STRONG> </I> Text
  – is generally accepted, though not well-formed
  – How does a browser handle this? Try it on different browsers
• XML, on the other hand, must be well-formed and is case sensitive
• XHTML is HTML following XML restrictions
Tags, Line Breaks, Special Characters
• Block-level tags affect a block of text/content
  – HEAD, BODY, P, H1, BR, UL, TABLE
• Inline tags affect only a few letters or words
  – EM, B, IMG
• Line breaks
  – generally inserted automatically by block-level tags
  – not so with inline tags
• Special characters
  – <, >, & and " are special characters
  – To display these, use names (&lt;, &gt;, &amp;, &quot;) or numbers (for example &#60; for <)
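The special-character rule above can be tried out with a short Python sketch. It uses the standard library's `html.escape` and `html.unescape`; the helper name `to_html_text` and the sample string are made up for illustration and are not part of the slides.

```python
from html import escape, unescape

def to_html_text(raw: str) -> str:
    """Replace &, <, > and quote characters with character references."""
    return escape(raw, quote=True)

# Hypothetical text that contains all four special characters.
sample = 'if a < b && b > c then print "ok"'
encoded = to_html_text(sample)

print(encoded)
# A browser (or unescape) turns the references back into the characters.
print(unescape(encoded) == sample)
```

The encoded string is safe to place between tags or inside a quoted attribute value.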
Common Tags
• <HTML> Optional tag indicating content type
• <TITLE> Title of a web page
• <BODY> Content of a web page
• <Hn ALIGN=direction> Level 1 to 6 of header (Times New Roman 24, 18, 14, 12, 10 and 8 points)
  – direction = left, right or center
• <P ALIGN=direction> Space between paragraphs
Text Formatting – Font, Size
• Specifying Font (deprecated in HTML 4.0)
• <FONT SIZE="value" FACE="name1, name2" COLOR="value">
  – Size value may be 1 to 7 (Times 8, 10, 12, 14, 18, 24, 36)
  – Size may also be +n or -n to specify a size n steps larger or smaller
    • May also be altered with <BIG> or <SMALL> tags
  – If name1 is not available on the system, name2 is selected
    • More alternatives may be specified
  – If none of the alternatives is available, the default is chosen
• You may set the default size for the entire document using <BASEFONT SIZE="value">
Text Formatting - Color
• A color value can be specified
  – using a #rrggbb value
  – or using the name of one of 16 predefined colors
• <BODY TEXT="value">
  – Sets the default color for text in the document
• <FONT COLOR="value">
  – Sets the color for the content of this element
Text Formatting - Miscellaneous
• <SUB> for subscript
• <SUP> for superscript
• <STRIKE> for strikeout
• <U> for underline
• <B> or <STRONG> for boldface
• <I> or <EM> for italics
• <CODE>, <KBD>, <SAMP>, <TT> for monospace
• <BLINK> for blinking text
• Comments start with <!-- and end with -->
• All of the tags above have a start and an end tag
Links
• Links are used to relate documents together
  – to navigate, to view, to take some action, etc.
• A link has three parts: destination, label and target
<A HREF="anotherPage.html">Next</A>
  – HREF provides the destination, Next is the label
  – A special attribute called TARGET may be used to tell the browser to display the page in another frame or a new window (_blank)
    • target names are case sensitive
    • <BASE TARGET="…"> in the head section sets the default target for the page
• Good practice is to use relative URLs
  – use absolute URLs for outside web pages
• Links may be of other types: ftp, news, mailto, etc.
Links and Anchors
• You may define an anchor within a document
  – <A NAME="anchorName">…</A>
• You may link to that location in the document by
  – <A HREF="#anchorName">label</A>
  – <A HREF="URL#anchorName">label</A>
Tables
<TABLE>
<TR> <TD>cell 1 content</TD><TD>cell 2 content</TD> </TR>
…
</TABLE>
• TABLE attribute BORDER=n defines the border thickness
  – default is 2
  – If you do not specify it, the border is drawn with space, not a line
  – to add extra space around the table, use HSPACE or VSPACE
• TABLE attribute ALIGN=center will center the table
• TABLE or TD attribute WIDTH=n sets the width in pixels
  – the specified size is ignored if it is too small for the contents
• TD attribute COLSPAN=n specifies the number of columns to span
  – use ROWSPAN to span across rows
• Use <TH> for a table header, centered and boldface
• Use <CAPTION> for a table caption
  – attribute ALIGN=direction (top, bottom, left, right)
Lists
• You may create ordered lists, unordered lists and definition lists
  – May be plain, numbered, bulleted
<OL TYPE=X>
<LI> list item 1</LI>
<LI> list item 2</LI>
</OL>
  – TYPE is optional (defaults to 1 for numbers)
  – A for capital letters, a for small letters, I for capital roman numerals, i for small roman numerals
  – Use START=n for the initial value of the list items
    • always numeric, and converted automatically to the proper type
  – In LI, you may override TYPE and VALUE for this and the following items
Unordered List
• Use <UL> to create an unordered list
• Use attribute TYPE=shape for the bullet type
  – disc for a solid round bullet (default for 1st level)
  – circle for an empty round bullet (default for 2nd level)
  – square for square bullets (default for 3rd level and deeper)
• <LI> may override the type
Definition Lists
• Great for creating lists that describe items
  – Like glossaries
<DL>Text here will appear on its own line
<DT>Text To Appear On Own Line Aligned Left</DT>
<DD> Definition text </DD>
…
</DL>
  – You may have multiple DTs and DDs to allow multiple words or definitions
Images
• The HTML tag IMG allows placement of images
• <IMG SRC="LocationAndNameOfImageFile">
• Attributes
  – BORDER="n"
  – ALT="tooltip or alternate text"
    • specifies text that may appear instead of the image
    • also serves as a tooltip on Windows
    • a required attribute in HTML 4
  – WIDTH="x" HEIGHT="y"
    • allows the browser to reserve space for the image while it displays the text
  – LOWSRC
    • specifies a fast-loading, low-resolution image to be shown first
    • the high-resolution image loads more slowly and replaces the low-resolution image
  – ALIGN
    • align left or right to allow text to wrap around the image
  – HSPACE="pixel" VSPACE="pixel"
    • provides padding on the sides (horizontal and vertical) around the image
BR, CLEAR and Text Wrapping
• The <BR> tag provides a line break
• The CLEAR attribute says do not begin text until the specified margin is clear
  – <BR CLEAR="left">
    • Do not begin text until the left margin is clear of images
  – <BR CLEAR="right">
    • Do not begin text until the right margin is clear of images
  – <BR CLEAR="all">
    • Do not begin text until both margins are clear of images
Forms
• A form has three parts
  – FORM tag with the URL of the action script
  – form elements: text, radio buttons, etc.
  – Submit button to send data to the script
<FORM METHOD=POST ACTION="scriptURL"> … </FORM>
• The method may be POST or GET
  – GET is limited in the amount of information sent
    • the data is sent as part of the query string
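To make the GET case concrete, here is a minimal Python sketch of how name=value pairs end up in a query string. It relies on `urllib.parse` from the standard library; the field names, values and the `example.com` URL are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields as name/value pairs.
fields = {"user": "ada", "course": "HTML & XML"}

# A browser submitting with METHOD=GET appends the encoded pairs to the URL.
query = urlencode(fields)
url = "http://example.com/cgi-bin/register?" + query
print(url)

# The server-side script decodes the same string back into fields.
decoded = parse_qs(query)
print(decoded)
```

Because everything travels in the URL, a long form quickly runs into length limits, which is why POST is preferred for larger submissions.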
FORM elements
• Elements are created using <INPUT TYPE="type" NAME="name" VALUE="initvalue">
  – the name and the user-given value are sent as name=value
  – Use attributes DISABLED or READONLY if desired
• Text box
  – TYPE="text"
  – Attributes: SIZE="n" MAXLENGTH=n
  – both attributes are in number of characters, and optional
  – SIZE defaults to 20
• Password box
  – TYPE="password"
  – A text box where what you type is not shown (asterisks appear instead)
  – Not encrypted when sent to the server, though
FORM elements…
• Radio button
  – TYPE="radio"
  – NAME="radioset"
    • where radioset is the group name for mutually exclusive buttons
    • the browser ensures that only one button of the group is set
    • this is also the name sent to the server-side script
  – attribute CHECKED if you want the button checked initially
  – VALUE="value" is the value sent if this button is checked
• Check box
  – TYPE="checkbox"
  – attribute CHECKED if you want the box checked initially
  – VALUE="value" is the value sent if this box is checked
FORM elements…
• Uploading files
  – TYPE="file"
  – NAME="title" for the server to identify
  – SIZE=n is the number of characters in the field for entering the path/file
    • default 20
  – In the FORM tag, use attribute ENCTYPE="multipart/form-data"
  – METHOD on FORM should be POST
• Hidden fields
  – Useful for maintaining session information
  – TYPE="hidden"
FORM elements…
• Menu
<SELECT NAME="name" SIZE="n" MULTIPLE>
<OPTION SELECTED VALUE="value">label</OPTION>
…
</SELECT>
  – SIZE is the height in lines
  – SELECTED is optional; it marks the initial selection of the menu
• Text Area
  – When one line is not enough
  – <TEXTAREA NAME="name" ROWS="n" COLS="n" WRAP>
  – ROWS defaults to 4 and COLS to 40; WRAP is optional
  – The user may provide up to 32,700 characters
FORM elements…
• Submit button
<INPUT TYPE="submit" VALUE="button text">
  – if you do not provide a value, the word Submit appears
  – if you set the name attribute, the value is sent to the server
• Use TYPE="reset" to provide a clear/reset button
• HTML 4 adds the BUTTON tag, which allows you to change
  – the font
  – the background color
  – the image
<BUTTON TYPE="submit" NAME="name" VALUE="value" STYLE="font: size FontName; background: color">
Text to left of image <IMG SRC="imageFileName"> Text to right of image
</BUTTON>
FORM elements…
• You may also use an image to send information
• <INPUT TYPE="image" SRC="imageFileName" NAME="name">
• The mouse coordinate on which the user clicks is sent
  – as name.x and name.y
  – The top-left of the image is (0, 0)
Organizing Form Elements
• You may put a box around elements
<FORM…>
<FIELDSET>
<LEGEND ALIGN=right>box caption</LEGEND>
… elements …
</FIELDSET>
… other fieldsets
</FORM>
• Simply surround the elements with a FIELDSET element
Running a Script on Input
• It is useful to run a script when the user makes a selection
  – JavaScript is the default scripting language
• Simply add an attribute of an event type to the tag
• Specify the code to execute
  – You may either type the code right there or refer to it
<BUTTON TYPE="button" NAME="Time" ONCLICK="alert('Today is ' + Date())"> Current Time</BUTTON>
We will see this put to work in the JavaScript session
HTML Events
• ONBLUR – user leaves an element that has focus
• ONCHANGE – user modifies the content of an element (like INPUT)
• ONCLICK / ONDBLCLICK – user clicks / double clicks on the specified area
• ONFOCUS – user selects, clicks or tabs to the element
• ONKEYDOWN / ONKEYPRESS – user types something in the specified area
• ONKEYUP – user releases a key after typing
• ONLOAD – page is loaded in the browser
• ONMOUSEDOWN – mouse pressed down over the element
• ONMOUSEMOVE – mouse moved over the element after pointing at it
• ONMOUSEOVER / ONMOUSEOUT – mouse moved onto the element / moved away from the element after being over it
• ONMOUSEUP – mouse released after the click
• ONRESET – form's reset button clicked
• ONSELECT – user selected one or more words in the element
• ONSUBMIT – form's submit button clicked
• ONUNLOAD – browser loads a different page after the specified page
Cascading Style Sheets
• HTML allows specification of fonts, colors, etc.
• These may be scattered throughout the document
  – results in poor maintainability
  – What if you want to change them?
• This is where CSS comes in
• You specify the formatting or styling separately
  – at the top of the document
  – or in a separate document
CSS: Specifying Style
• Instead of defining style all over the document, specify it at the top and simply refer to it in the document
• A specification has two parts:
  – selector
    • the name you associate a style with
  – declarations
    • the definition of how it should look
• The specification may be local, internal or external
• The cascade:
  – local overrides internal, which in turn may override external specifications
CSS: Local Style
• This style applies to the element on which it is declared
• It takes effect locally only
• Useful for altering the style specified internally in the document or externally in another file
CSS: Internal Style
• Specified between <HEAD> and </HEAD>
• Provide one or more selectors
  – Separate by commas for the declarations to apply to all of the selectors
  – Separate by spaces for the declarations to apply only to nested selectors and not to other appearances
• Provide the declarations
  – within the {}, separated by ;
CSS: External Style Sheet
• Writing the style in a separate file allows sharing the style and applying it to more than one page
• Pages link to the style sheet that specifies the style
• You may apply an internal style sheet as well as local styles at the same time
CSS: Defining Classes
• You can define a class or category and a style for that class
• Any element declared to be part of that class will use the style specified for that class
• Classes are defined to belong to a certain selector type using the format selectorName.className
CSS: Defining IDs
• An ID can be defined for individual elements in your document
  – The ID must be unique
• A style can be specified for that tag/element
  – Tag name followed by # followed by the ID
• The style applies only to the element with that ID
• Scripts may also use the ID to identify that element in the document
CSS: DIV and SPAN
• Style may be specified on pre-defined tags
  – like Hn and P
  – how do you apply a style to a wide range of items?
• DIV and SPAN allow you to define areas of the document over which a style may be applied
• DIV is a block-level tag while SPAN is an inline tag
CSS: Font Styles
• font-family
  – specify a list of fonts to choose from
  – font-family: "Times Roman", "Helvetica", "Arial"
• font-style
  – specify whether the font should be italic, oblique, or normal
  – font-style: italic
  – to remove italics, font-style: normal
• font-weight
  – specifies the boldness of text; possible values: bold, bolder, lighter
  – or multiples of 100 between 100 and 900, with 400 for book weight and 700 for bold
  – normal will remove bold
• font-size
  – specify an absolute font size: xx-small, x-small, small, medium, large, x-large, xx-large
  – specify a relative font size: larger, smaller
  – exact point size: 18pt
  – percentage relative size: 200%
CSS: Font Style…
• line-height
  – specifies the space between lines (leading) within a paragraph
  – line-height: 15pt or line-height: 50%
• All the font styles may be specified in a single font declaration as well
  – Specify in the following order, space separated:
    • font-style font-variant font-weight font-size/line-height font-family
  – / separates font-size from line-height
CSS: Text Color Style
• color
  – specify one of 16 colors or #rrggbb or rgb(r, g, b) or rgb(r%, g%, b%)
• background
  – transparent or a color value
  – url(image.gif) to specify an image file name
  – repeat to tile the image, repeat-x for horizontal tiling, repeat-y for vertical tiling
  – fixed to keep the background in place, or scroll for the background to scroll along with the canvas
  – x y for the position of the background image from the top-left corner
CSS: Text Spacing Style
• word-spacing
• letter-spacing
• text-indent
• white-space
  – pre to preserve extra spaces; nowrap to keep elements on the same line; normal to return to normal behavior
• text-align
  – left, center, right, justify
• text-decoration
  – underline, overline, line-through, none, blink
    • blink is not supported by IE, and generally not recommended either
• text-transform
  – capitalize, uppercase, lowercase, none
• font-variant: small-caps shows text in uppercase letters at lowercase size
Markup and XML
• Markup
  – conveying metadata with literals/tags that delimit and describe
  – Generalized Markup Language (GML)
  – Standard Generalized Markup Language (SGML)
    • adopted by ISO
    • popular, but too complex
• eXtensible Markup Language (XML)
  – designed by the World Wide Web Consortium (W3C)
  – subset of SGML
  – simpler to read and write, and simpler to develop parsers for
Why XML?
• HTML is the de facto standard for markup
  – Markup for information presentation
  – Talks about how information looks and is presented
  – Does not let you add markup of your own
• What about the information itself?
• Need to
  – describe information
  – extend the descriptions
  – keep it structured, easy to express and easy to validate
What is XML?
• XML is about extensibility and flexibility
• tags describe and surround the data
• Open, extensible
• Platform independent
• Self describing data
  – Data Exchange
• Supports query and discovery of data
• Dynamic Data Exchange
• Example:
<?xml version="1.0"?>
<equipment>
<pump>
<name> p01 </name>
<pressure units="psi"> 32.23 </pressure>
</pump>
<pump>
<name> p02 </name>
<pressure units="psi"> 22.887 </pressure>
</pump>
</equipment>
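As a rough illustration of "self describing data", the equipment example can be read by a program without any prior agreement beyond the tag names. The sketch below uses Python's standard `xml.etree.ElementTree`; the variable names (`DOC`, `readings`) are illustrative, and whitespace inside the elements has been trimmed.

```python
import xml.etree.ElementTree as ET

# The equipment document from the slide, with both pump elements complete.
DOC = """<?xml version="1.0"?>
<equipment>
  <pump>
    <name>p01</name>
    <pressure units="psi">32.23</pressure>
  </pump>
  <pump>
    <name>p02</name>
    <pressure units="psi">22.887</pressure>
  </pump>
</equipment>"""

root = ET.fromstring(DOC)

# Map each pump name to its (pressure value, units) pair.
readings = {}
for pump in root.findall("pump"):
    pressure = pump.find("pressure")
    readings[pump.findtext("name")] = (float(pressure.text), pressure.get("units"))

print(readings)
```

The tag names carry the meaning of each value and the `units` attribute describes its format, so the consuming program needs no positional conventions.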
What does XML provide?
• Tags delimit content
  – lets you define structure of arbitrary complexity
• Self Describing Data
  – tags describe and name the data being defined
  – the name is related to the information it models/represents
• Standard extensibility
  – in defining new tags and semantics
• Vocabularies
  – descriptions of data used for information exchange
  – within specific domains
• Separates content from presentation
XML System
[Diagram components: XML Document; XML Constraint (DTD, Schema); XML Parser/Processor/Styling; XML APP]
Features of XML technologies
• Well-formed syntax
• Document Type Definitions (DTDs)
  – Capture rules added to extend the core syntax rules
• Document Object Model (DOM)
  – API for manipulating, parsing and creating XML documents
  – provides a tree-structured view of the document
  – Standard API
• Simple API for XML (SAX)
  – Provides events as the document is being parsed
  – Leaves it to the application to keep state and content information
• Styling and Transformation (XSL and XSLT)
The Markup Syntax
• XML Entity
  – A file or stream with a well-formed structure
• Tags delimit the elements of the structure
• XML tags are case-sensitive
• XML uses the Unicode character set
• Names are used to identify structures
  – Names begin with a letter, underscore or colon
    • followed by any name characters, including digits, hyphen and period
[Diagram labels: Start Tag, Attributes, Content, End Tag]
Structure of a Document
• Prolog (optional)
  – comments, processing instructions
  – Document Type Declaration
    • comments, processing instructions
    • Document Type Definitions: Element Declarations, Attribute Declarations, Entity Declarations, Notation Declarations
• Body: Root Element
  – comments, processing instructions
  – Elements
    • Attributes: CDATA, Entities, ID, …
    • PCDATA
    • Entity References
    • CDATA Sections
• Epilog (optional)
  – comments, processing instructions
Markups that go in XML Document
• The following markup may be contained in any XML document
  – Element start and end tags
  – Attributes
  – Comments
  – Entity references
  – Processing instructions
  – Character data sections (CDATA)
  – Document type declarations
A Sample XML File
Elements
• Building blocks of an XML document
• Layout: StartTag Content EndTag
  – <ElementTypeName> … </ElementTypeName>
• Element content may include
  – Other elements
  – Character data
  – Character references
  – Entity references
  – Processing instructions
  – Comments
  – CDATA sections
• Empty elements may be abbreviated to save space
  – <ElementTypeName/> indicates an empty element
Document and Elements
• An XML document may be viewed as a hierarchical tree
[Diagram: a Document holds a Prolog, a root Document Element and an Epilog; each Element may hold any number (*) of Elements. Links represent containment/aggregation.]
Contents
• Element content
  – Contains other elements but no character data
• Mixed content
  – Contains character data and other elements
• Character content
  – Contains nothing but character data
• Empty element
  – Contains nothing
Nesting
• XML requires proper nesting of elements
• Items must be fully contained within their nesting level
• XML is strict about proper nesting, unlike HTML
  – Allowing ambiguity leads to programming complexity
  – Keep-it-simple policy
  – Improper nesting produces a "not well-formed" error
  – This is a fatal error and terminates parsing
Name
• A name
  – begins with an alphabetic character or an underscore
  – followed by alphanumeric characters, periods, hyphens or underscores
Name = (Letter | '_') (Char)*
Char = Letter | Digit | '.' | '-' | '_'
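The two grammar rules above translate almost directly into a regular expression. The Python sketch below is a simplification that assumes ASCII letters and digits only, whereas real XML names may use a much wider range of Unicode characters (and a colon, as noted earlier); `NAME_RE` and `is_name` are illustrative names.

```python
import re

# Name = (Letter | '_') (Char)*
# Char = Letter | Digit | '.' | '-' | '_'
# Assumption: ASCII only, which is narrower than the XML specification.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_name(candidate: str) -> bool:
    """Return True when the whole string matches the simplified grammar."""
    return NAME_RE.fullmatch(candidate) is not None

for candidate in ["pump_01", "_private", "a.b-c", "1pump", "-x", ""]:
    print(repr(candidate), is_name(candidate))
```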
XML String Literals
• Literals are delimited by apostrophes or quotes
  – "hello" 'hi'
• The character used as the delimiter can't appear in the literal
  – "George, What's up!"
  – 'He said "what a nice day!"'
  – The following is not valid: 'what's up'
• What if you need to use both an apostrophe and a quote?
  – XML has no escape character; use the entity references &apos; or &quot;
  – 'I asked George, What&apos;s up, "He said, fine"'
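The delimiter rule can be automated. As a small sketch, Python's standard `xml.sax.saxutils.quoteattr` picks whichever delimiter does not occur in the text and falls back to an entity reference when both occur; the sample strings are adapted from the slide and the variable names are illustrative.

```python
from xml.sax.saxutils import quoteattr

# Only an apostrophe inside: quotes are chosen as the delimiter.
only_apostrophe = quoteattr("George, What's up!")

# Only quotes inside: apostrophes are chosen as the delimiter.
only_quote = quoteattr('He said "what a nice day!"')

# Both inside: quotes delimit, and inner quotes become &quot;.
both = quoteattr('What\'s up, "he said"')

for literal in (only_apostrophe, only_quote, both):
    print(literal)
```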
Attributes
• An element generally describes and contains information
• Attributes provide information that is part of the element rather than being contained in it
  – Generally describe the format of the information, etc.
• Name-value pair
  – attributeName="value"
  – attributeName='value'
  – The value must be a quoted string literal; unquoted numbers are not allowed
  – An attribute may appear only once within a tag
Special Attributes
• xml:space
  – White space is not generally preserved
  – How does one indicate that white space is significant?
  – xml:space tells that white space is encoded into the document
  – Recommends that the white space be preserved
  – Applications may choose to honor or ignore it
  – Must take a value of "preserve" or "default"
• xml:lang
  – Indicates the language/locale of the XML document
• If present, these two attributes apply to all nested elements as well
Special Characters
• White space:
  – Horizontal tab (09), line-feed (0A), carriage-return (0D), space (20)
  – Parsers preserve white space within element content
  – May remove it from attributes and element tags
• End-of-line
  – End of line is generally indicated by
    • a carriage-return followed by a line-feed
    • only a carriage-return
  – XML parsers are required to convert these to a single line-feed
    • UNIX style is favored
Character References
• Character references
  – Represent displayable characters that can't be placed in a well-formed document as is
  – The character may be represented using
    • &# followed by a decimal number representing the character, then ;
    • &#x followed by a hexadecimal number representing the character, then ;
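A short Python sketch shows both forms of a character reference and confirms that a parser turns each back into the same character. The helper names `decimal_ref` and `hex_ref` are made up for illustration; parsing uses the standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

def decimal_ref(ch: str) -> str:
    """Build a decimal character reference such as &#169;"""
    return f"&#{ord(ch)};"

def hex_ref(ch: str) -> str:
    """Build a hexadecimal character reference such as &#xA9;"""
    return f"&#x{ord(ch):X};"

copyright_sign = "\u00a9"
doc = f"<note>{decimal_ref(copyright_sign)} and {hex_ref(copyright_sign)}</note>"
print(doc)

# The parser replaces each reference with the character it names.
parsed_text = ET.fromstring(doc).text
print(parsed_text)
```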
Entity References
• Entity references
  – Think of these as macro definitions
  – Allow insertion of string literals
  – Provide mnemonic equivalence
  – Start with an & and end with a ;
  – Predefined entity references:
    • &amp;, &lt;, &gt;, &apos;, &quot;
• Rather than repeating content, you can refer to where to find it
  – Declare the substitution text in the doctype
  – Refer to it by &name;
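The macro analogy can be seen in a minimal Python sketch: an entity is declared once in the doctype and the parser substitutes its text wherever `&name;` appears. The entity name `site` and its replacement text are hypothetical; the parser is the standard `xml.etree.ElementTree`, which expands internal entities like this one.

```python
import xml.etree.ElementTree as ET

# An internal entity declared in the doctype, plus predefined entities in the body.
DOC = """<!DOCTYPE note [
  <!ENTITY site "Pump Station 7">
]>
<note>Readings from &site;: 3 &lt; 5 &amp; 5 &gt; 3</note>"""

# After parsing, the application sees only the substituted text.
text = ET.fromstring(DOC).text
print(text)
```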
Processing Instructions
• Processing Instructions (PIs) allow you to provide hints to applications as part of the document
• A PI consists of two things:
  – a target followed by an instruction
    • <?target instruction?>
  – The target is an XML name that identifies the application the instruction is intended for
  – The instruction is a string literal
• To avoid confusion with
  – <?xml version="1.0"?>
  – the PI target can't be the string "xml" or "XML"
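To see the target/instruction split in practice, the sketch below parses a document with Python's standard `xml.dom.minidom` and lists the processing instructions it finds at the top level. The `xml-stylesheet` PI and the file name `pumps.xsl` are illustrative assumptions.

```python
from xml.dom import minidom

DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="pumps.xsl"?>'
    "<equipment/>"
)

document = minidom.parseString(DOC)

# Collect (target, instruction) for each PI among the document's children.
# The XML declaration is not reported as a processing instruction.
pis = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(pis)
```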
XML Comments
• Comments may be present anywhere in a document
  – except as part of other markup
• Comments start with <!-- and end with -->
• May contain any string that
  – does not contain -- (two consecutive hyphens)
  – does not end with -
• Entities within comments are not expanded
• Markup within comments is not interpreted
CDATA Sections
• CDATA sections are the parts of a document that will not be interpreted for markup
  – <![CDATA[ non parsed data ]]>
• Starts with the tag
  – <![CDATA[
• Ends with the tag
  – ]]>
• The contained text can't have
  – a string that contains the delimiter ]]>
  – nested CDATA
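A brief Python sketch illustrates both points: text inside a CDATA section reaches the application untouched, and text that itself contains `]]>` has to be split across two sections. The helper `wrap_cdata` is hypothetical, and the splitting trick is a common workaround rather than something the slides prescribe.

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ]]> so the delimiter never appears inside."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Markup characters inside CDATA are not interpreted.
code = "if (a < b && b > c) { run(); }"
parsed_code = ET.fromstring("<code>" + wrap_cdata(code) + "</code>").text
print(parsed_code)

# Text containing the delimiter survives because it is split across two sections.
tricky = "a ]]> b"
parsed_tricky = ET.fromstring("<x>" + wrap_cdata(tricky) + "</x>").text
print(parsed_tricky)
```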
Prolog
• Optional member of an XML document
• Provides hints and information on encoding methods
• Contains
  – Optional XML declaration
  – Optional comments (several)
  – PIs
  – White space characters
  – Optional Document Type Declaration (not the DTD itself)
    • Ties the DTD to the document
XML Declaration
• The XML declaration is optional
• If present
  – Must be the first thing in the document
    • No comments or white space allowed to precede it
  – The xml tag must be lowercase
    • <?xml version="1.0"?>
• Attributes:
  – version: required; allows for future versions
  – encoding: optional; UTF-8, UTF-16, ISO-8859-1 (Latin-1), etc.
  – standalone: optional; yes or no (no when an external DTD is required)
Epilog
• Optional member of an XML document
• Contains
  – Optional comments (several)
  – PIs
  – White space characters
• Relying on it is questionable, since it is optional and most applications may not wait to read it
Well-formed Document
• An XML document is said to be well-formed if
  – The document syntax conforms to the XML specification
  – Elements form a hierarchical tree with a single root node
  – There are no references to external entities
    • unless a DTD is provided
• A well-formed XML document
  – is case sensitive
  – closes every tag
  – has no overlapping tags
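These conditions can be checked mechanically, because a conforming parser must reject anything that is not well-formed. The sketch below wraps Python's standard `xml.etree.ElementTree` in a hypothetical helper, `is_well_formed`, and feeds it the nesting examples from the earlier HTML slide.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True when the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "properly nested": "<STRONG>This is <I>italics</I> Text</STRONG>",
    "overlapping tags": "<STRONG>This is <I>italics</STRONG></I>",
    "case mismatch": "<hr></HR>",
    "unclosed tag": "<p>unclosed",
    "two roots": "<a/><b/>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample passes; a browser would render most of the others as HTML, which is the leniency that XML deliberately gives up.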
Parsers
• An XML processor or parser is an application that reads through an XML document and interprets it
• Parser types
  – Non-validating
    • Ensures the data object/document is well-formed XML
  – Validating
    • Validates, using a DTD, the form and content of a well-formed data object
• Parser implementations
  – Event-driven parsers
    • Parser calls back into the application as it identifies data
    • Applications handle the data
    • Parser does not keep the tree structure or the data after parsing
    • Memory usage is minimal
  – Tree-based parsers
    • A tree structure of the document is built in memory
    • This tree is then manipulated using an interface
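The contrast between the two implementation styles can be sketched with Python's standard library, using `xml.sax` as the event-driven parser and `xml.dom.minidom` as the tree-based one. The handler class `NameCollector` and the shortened pump document are illustrative, not taken from the slides.

```python
import xml.sax
from xml.dom import minidom

DOC = "<equipment><pump><name>p01</name></pump><pump><name>p02</name></pump></equipment>"

class NameCollector(xml.sax.ContentHandler):
    """Event-driven: the application keeps its own state between callbacks."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False

# SAX: the parser calls back into the handler and keeps no tree.
handler = NameCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names

# DOM: the whole tree is built in memory, then queried through an interface.
dom = minidom.parseString(DOC)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("name")]

print(sax_names, dom_names)
```

Both approaches yield the same names; the event-driven version needs more bookkeeping in the application but never holds the whole document in memory.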
XML Parsers
• Several parsers are available
  – Xerces (Apache)
  – JAXP (more of an API, from Sun)
  – MSXML (Microsoft)
  – Expat (James Clark)
  – RXP (Richard Tobin)
  – XP (James Clark)
  – XML4J (IBM)
  – XML::Parser (Clark Cooper)
  – Pyexpat (Jack Jansen)
  – Lark (Tim Bray)
  – TclXML (Steve Ball)
Major APIs
• DOM API
• SAX API
• JDOM
• XSLT
• XPath