LIS 650 lecture 0
Introductory lecture
Thomas Krichel
2004-11-07

today
• Administrative introduction to the course
• Talk about you
• Substantive introduction to the course. The subject matter is not just about HTML
  – web
    • servers
    • clients
  – XML
  – HTML
• Fairly general but abstract
• Probably the toughest lecture in the course

course resources
• Course home page is at http://wotan.liu.edu/home/krichel/lis650w04a
• Subscribe to the class mailing list at https://lists.liu.edu/mailman/listinfo/cwp-lis650krichel
• Me. Do not hesitate to ask. Send me email. I will usually answer to the class mailing list.
• I plan to come here on several days to counsel students. I will announce all times publicly. Students who need extra tuition should ask.

general assessment
• First quiz next lecture.
• If you miss a lecture, let me know in advance.
• In addition to the quizzes, we have
  – the web site assessment
  – the final web site
• Final grade is calculated by computer. Quizzes go through a complicated discounting scheme. It disregards the worst performance.

Web site assessment
• Look at the web site of a university Library and Information Science department.
• A list is at http://informationr.net/wl/
• Write a text that does not describe the web site, but comments on it.
• State the site URL. I will look at it.
• Please keep your text short, no more than 2 pages.
• Ask others for opinions if you want.

the final web site
• Contents should be equivalent to a student essay.
• It should be a contribution to knowledge on a topic.
• Personal sites are no longer allowed.
• Deadline to finish web site: one week after the end of the last lecture.
• You will not be able to change your web site between the deadline and the time that the grade is issued.

course history
• Course was first run as an institute 2002-05-13 to 2002-05-17.
• Title was “Webmastering I: the static web site”.
• To the curriculum committee, this title did not sound academic enough.
• Since “Web Site Architecture and Design” is now the full title, WeSAD (pronounced like “wizard”) is the official abbreviation.
• Webmastering is still what we want to learn.

teaching WeSAD
• WeSAD combines many aspects:
  – Authoring pages
  – Organizing data to fit onto pages
  – Setting the display style of different pages
  – Organizing the contribution of data
  – Maintaining a technical web installation
• Some of them can be learned in a course, but others cannot.
• Emphasis has to be on learnable elements.

teaching philosophy
• Pointing and clicking in computer software is not enough
• Explain underlying principles
• Promote standards
  – XHTML 1.0
  – CSS level 2.1
• Avoid proprietary software

WeSAD contents
• Deals with the maintenance of a passive web site. Such a web site remains the same whatever the user does with it.
• Topics include
  – (x)html
  – css
  – site usability and information architecture, as far as relevant for static web sites
  – http, uri, web server

things this course does not do
• Forms: allow you to design forms that users fill in. But you do not have the programming skills to do something with the form.
• Frames: allow you to put several documents into one physical document. Most experts advise against them.
• We do not cover image maps.
• We do not cover some advanced CSS properties.
• Some other exotic features of HTML are left out.

other courses: webmastering II
• Deals with building active web sites.
  – Users fill in a form
  – Users submit the form
  – Web server returns a page that is specific to the request of the user.
• Teaches a language called PHP, which is widely used to generate such web sites.
  – Introduces you to computer programming
  – Trains your analytical thinking

other courses: webmastering III
• It deals with XML
  – XML is a syntax to encode any kind of data.
  – XML can be constrained to only allow certain types of data (XML Schema)
  – XML can be transformed to render the data in various ways (XSLT)
• The aim is to achieve a separation of contents and presentation of a web page.
• This is an advanced course. It covers both Schema and Transformation.

literature
• I work from the text of the official standard at http://www.w3.org/TR/html4/
• You can work from any HTML book.
• The W3C is the standard making body for the Web. Anything that they say is the standard.
• But some people don't behave according to the standard.

The world wide web
The World Wide Web (Web) is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:
  – A uniform naming scheme for locating resources on the Web (i.e. URIs).
  – Protocols, for access to named resources over the Internet (e.g., http).
  – Hypertext, for easy navigation among resources (e.g., HTML).

URI introduction
• Every resource available on the Web -- HTML document, image, video clip, program, etc. -- has an address that may be encoded by a Uniform Resource Identifier, or "URI".
• URIs typically consist of three pieces:
  – The name of the mechanism used
    • to access the resource
    • or otherwise to “resolve” it
  – The name of the machine hosting the resource.
  – The name of the resource itself, given as a path.

example URI
• http://openlib.org/home/krichel
  This URI may be read as follows: There is a document available via the HTTP protocol, residing on the site openlib.org, accessible via the path "/home/krichel".
• mailto:krichel@openlib.org
  This URI may be read as follows: There is an email user krichel in the domain openlib.org to whom email may be sent.
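
To see the pieces of a URI for yourself, here is a minimal sketch in Python. It uses the standard library function urlsplit. The helper name uri_pieces is my own invention for illustration, not part of any standard.

```python
from urllib.parse import urlsplit

def uri_pieces(uri):
    """Split a URI into (mechanism, machine, path) with the standard library."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

for uri in ("http://openlib.org/home/krichel", "mailto:krichel@openlib.org"):
    scheme, host, path = uri_pieces(uri)
    print(f"{uri}: scheme={scheme!r} host={host!r} path={path!r}")
```

Note that the mailto URI has no machine part. Everything after the colon ends up in the path.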

Internet application protocols
• On the Internet, machines use different application-level protocols to do things.
• Common protocols include
  – http
  – smtp
  – dns
  – telnet
  – ssh
  – ftp
• All of the ones cited are client/server protocols
  – client issues a request
  – server gives a response
• Each of them uses a different default port. A port is a number that tells the machine what to do with the incoming stream of data.
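
As a small illustration of the port idea, the sketch below lists the registered default ports of the protocols named above and looks one up by number. The table and the function name protocol_for_port are only an illustration. A real server can be configured to listen on other ports.

```python
# Default (well-known) port numbers registered for the protocols named above.
DEFAULT_PORTS = {"ftp": 21, "ssh": 22, "telnet": 23, "smtp": 25, "dns": 53, "http": 80}

def protocol_for_port(port):
    """Return the protocol conventionally served on a port, or None if unknown."""
    for name, number in DEFAULT_PORTS.items():
        if number == port:
            return name
    return None

print(protocol_for_port(80))
print(protocol_for_port(22))
```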

http
• The web operates mostly on http, the hypertext transfer protocol.
• The client software runs on the local PC that you are using. It is called
  – a web browser (not politically correct)
  – a user agent (that's better)
• Our server is a piece of hardware called wotan.liu.edu, “wotan” for short
  – It runs the Debian GNU/Linux operating system on an Intel architecture.
  – It provides http daemon software that serves http requests. The particular software is called Apache.

main features of http
• http is insecure. The contents of http transactions (requests/responses) can be observed.
• http is stateless. Each transaction is self-contained. Each transaction has no relationship to the previous one.
• http has a limited vocabulary of requests and responses. It is no good, say, to operate a machine remotely.
• We therefore cannot use it to work on the server.
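
To make the request/response vocabulary concrete, here is a rough sketch that builds the plain text of a minimal http request and takes apart the first line of a response. Nothing is sent over the network. The function names and the sample response line are assumptions made for illustration only.

```python
def build_request(method, path, host):
    """Build the text of a minimal HTTP/1.1 request."""
    return f"{method} {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"

def parse_status_line(line):
    """Split a response status line into (version, status code, reason)."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason

request = build_request("GET", "/home/krichel", "openlib.org")
print(request)
# A made-up response line, of the kind a server sends back first.
print(parse_status_line("HTTP/1.1 200 OK"))
```

The request is readable text, which is why anybody who can watch the connection can observe it. It also carries no memory of earlier requests.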

working with a remote machine
• There are two traditional ways to work with a remote machine
  – issue commands to it
    • used to be done with “telnet”
  – transfer files to and from it
    • used to be done with “ftp”
• Telnet and ftp servers are not available on wotan.liu.edu. Telnet and ftp do not encrypt the communication stream. Therefore they are not secure.

communication with wotan
• The protocol that we use for communicating with the server is the secure shell, ssh for short. It is based on public-key cryptography.
• There are two PC programs commonly used as ssh clients
  – putty for issuing commands
  – winscp for file transfer.
• winscp is the one we will use. It offers a range of other facilities besides file transfer.
• Mac users should investigate a program called “fugu”.

registration time
• As part of the course, you are being provided with web space on the server wotan.liu.edu, at the URL http://wotan.liu.edu/~username where username is a user name that you will choose now.
• It is my intention to maintain this web space for you into the foreseeable future.
• You should also choose a password, now.
• I will now register you.

free software
• I maintain the wotan.liu.edu server, but you can build your own server if
  – you have Internet access
  – you have an old PC to spare
• All the server software, as well as putty and winscp, is free and open source. It is one of my fundamental beliefs that free information should run on free software.
• The library community can learn a lot from the free software community.
• See my talk at http://openlib.org/home/krichel/presentations/new_york_2003-11-07.ppt

installing winscp
• http://winscp.sourceforge.net/eng/download.php has
  – “installation package”, for use if you have administrator rights on the machine where you are installing
  – “application”, for use otherwise, i.e. to just download and run the application
• At installation time, when/if asked about the default interface, I suggest you use “Windows explorer style”, rather than the default “Norton commander style”. You can change that later, so don't panic.

other stuff: installing “user agents”
• Download and install a recent version of at least two browsers. I suggest
  – Mozilla Firefox at http://www.mozilla.org/products/firefox/
  – Opera at http://www.opera.com
  – Netscape Navigator at http://channels.netscape.com/ns/browsers/download.jsp

open a wotan session with winscp
• the host name is “wotan.liu.edu”
• give your user name
• click on “save” and then “ok”; this saves the session
• you will be led to the list of saved sessions
• double click to open the session
• at first connection you will see a warning that you can ignore
• note:
  – you can save the password as part of the session
  – it is risky to do that in a public classroom

initial remote files on wotan
• a set of files starting with a dot.
  – These are places where Linux Masters exert their black magic.
  – Leave them alone.
• a directory called public_html
  – This is the place where web masters exert their magic. You can go into that directory to see the files that you have on your web site at the moment.
  – There should be two files
    • empty.html
    • validated.html

public_html
• Imagine your user name is user and you have a file called file in public_html.
• The web server will map requests to http://wotan.liu.edu/~user/file to show the file public_html/file.
• Here user stands for your user id, file is the file name, and “/” is the directory separator.
• If file ends with “.html” or “.htm”, the web browser will be told that the file is an HTML file. It will be rendered accordingly by the browser.

index.html
• The web server on wotan will map requests to http://wotan.liu.edu/~user to show the file public_html/index.html
• If this file is not there, the server will prepare an HTML document from the list of files that it finds in the directory and send it to the user agent.
• Once you have a file index.html, the web user can no longer see the list of individual files in your directory.
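
The mapping from URL to file can be sketched in a few lines of Python. This is a simplified model, not how Apache is implemented. The function names are made up, and I assume here that home directories live under /home, which the slides do not state. The generated directory listing for a missing index.html is not modelled.

```python
from pathlib import PurePosixPath

def map_request(url_path, home_root="/home"):
    """Map a URL path such as "/~user/file" to a file under public_html.

    home_root is an assumed location of the home directories.
    """
    if not url_path.startswith("/~"):
        raise ValueError("not a per-user URL path")
    user, _, rest = url_path[2:].partition("/")
    if not user:
        raise ValueError("missing user name")
    if rest == "":
        rest = "index.html"  # no file named: fall back to index.html
    return str(PurePosixPath(home_root) / user / "public_html" / rest)

def is_html(file_name):
    """True if the file name ends in ".html" or ".htm"."""
    return file_name.endswith((".html", ".htm"))

print(map_request("/~user/file"))
print(map_request("/~user"))
```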

HTML and XHTML
• HTML is the hypertext markup language.
• HTML is a markup language that is widely used on the Web.
• The latest, and probably last, version of HTML is at http://www.w3.org/TR/html4/
• The W3C, the standard making body for the Web, has issued XHTML, a replacement for HTML that is compatible with XML.
• We will work with XHTML.

SGML HTML XML
• You will probably have come across these terms.
• SGML was developed first. HTML and XML are developed from SGML in different ways.
  – HTML is an SGML application, defined by an SGML DTD.
  – XML is a restricted subset of SGML.
• One common thing here is the ML. It stands for Markup Language.
• Markup is everything in a document that is not content.

procedural/descriptive
• Markup can be given in two ways
• 1: Procedural
  – Codes identify point size, style, font, etc.
  – Usually only understood by the defining tool
  – Example: Microsoft Word
• 2: Descriptive
  – Describes the purpose of text within the document
  – Chapter head, Paragraph, Section Head, TOC
  – Structure and Style are kept separate
  – Example: LaTeX, SGML

SGML
• Standard Generalized Markup Language
• Descriptive approach with three separate layers
  – structure: types of information in document
  – content: the information itself
  – style: defines how to typeset the document
• Developed for the publishing industry by a group of consultants.
• So complicated that no software implements it fully.
• But an important idea that remains of it is the document type definition.

Document Type Definition (DTD)
• The DTD is a non-SGML language that describes SGML document types
• Describes information the document handles, e.g.
  – title
  – chapter
• Relationships between fields, e.g.
  – a chapter contains sections
  – a title comes at the top of the document

XML
• Since SGML is so complicated, it is not good for use on the Web.
• So the W3C has issued XML, the eXtensible Markup Language.
• Every XML document is SGML, but not the opposite.
• Thus XML is like SGML but with many features removed.
• XML defines the syntax that we will use in the course. We have to study that syntax in some detail.

XML elements
• XML is based on elements. There are basically three ways of writing an element.
• The first way is to write <element/>.
• Here element is the name of the element.
• Such an element is called an empty element.
• Example: <bang/>
• This is an empty element, the name of which is “bang”.

non-empty elements
• If name is the name of the element, you can give an element contents by writing <name>contents</name>.
• Here <name> is called the start tag. </name> is called the end tag. Both tags surround the contents of the element.
• Remember the previous slide? Then note that <name/> is just a shortcut for <name></name>.

Examples
• <greeting>bonjour</greeting>
• <greeting>здравствуйте</greeting>
• <sentence>She says <greeting>hello</greeting> to you.</sentence>
• <examples>
    <example>I koh Glos essa, und es duard ma ned wei.</example>
    <example>Ja mogu esti staklo, i ne boli me.</example>
    <example>Kristala jan dezaket, ez det minik ematen.</example>
  </examples>

attributes to elements
• Elements can have attributes. Here is an element with two attributes:
• <name attribute_name_one="value_one" attribute_name_two="value_two"/>
• Here attribute_name_one and attribute_name_two are attribute names and value_one and value_two are attribute values. The element itself is empty.
• Example: <greeting language="french">bonjour</greeting>

more on attributes
• An element cannot have two attributes with the same name.
• Attribute values are simple strings. You cannot have an element inside an attribute value.
• Attribute names are separated from their values by the = sign.
• Attribute values can be enclosed in single or double quotes. It does not matter. Double quotes are more common, so I suggest you use those.

more examples
<poet born="1799" died="1837">
  <name lang="ru">Александр Сергеевич Пушкин</name>
  <name lang="en">Alexander S. Pushkin</name>
  <name lang="fr">Alexandre Pouchkine</name>
</poet>
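
To show that a program can read elements and attributes, here is a small sketch using Python's standard xml.etree.ElementTree module on the poet example. The variable names are my own. Python is just one convenient choice of tool.

```python
import xml.etree.ElementTree as ET

POET = """<poet born="1799" died="1837">
  <name lang="ru">Александр Сергеевич Пушкин</name>
  <name lang="en">Alexander S. Pushkin</name>
  <name lang="fr">Alexandre Pouchkine</name>
</poet>"""

root = ET.fromstring(POET)
# Attributes of the root element come back as a dictionary of strings.
print(root.tag, root.attrib)
# Collect the contents of each <name> child, keyed by its lang attribute.
names = {name.get("lang"): name.text for name in root.findall("name")}
print(names["en"])
```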

XML document
• An XML document is a piece of data that is written in XML.
• But sometimes the author of a document makes a mistake, and, in fact, the XML is wrong in some way.
• If there is no mistake, the document is called well-formed.
• If a document is not well-formed, it really is not an XML document.

some rules for well-formedness
• All elements must be properly nested. You can only close the outer element after all inner elements are closed. Examples
  – <a><b></a></b> not well-formed
  – <a><b></b></a> well-formed
• An attribute must have a value. Thus you cannot write <result abstract>...</result>. The value may be empty like in <result abstract=''>...</result> or <result abstract="">...</result>.
• You cannot have element contents in attributes. Thus you cannot have <structure note="<b>something</b>">...

more rules for well-formedness
• There must be one single element that contains all the others in the document.
  – It is called the root element.
  – All other elements are called children of the root.
  – Whitespace that surrounds the root element is ignored.
  – The root element may be preceded by a prologue. A prologue is anything before the root element.
• There can be other things in an XML document that are not elements.
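
You can let a parser check these rules for you. The sketch below tries to parse each string with Python's standard xml.etree.ElementTree and reports whether the parser accepts it. The helper name is_well_formed and the list of cases are mine, chosen to match the rules on the last few slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

CASES = [
    "<a><b></b></a>",                        # properly nested
    "<a><b></a></b>",                        # badly nested
    "<result abstract>x</result>",           # attribute without a value
    '<result abstract="">x</result>',        # empty attribute value is fine
    '<structure note="<b>something</b>"/>',  # element inside attribute value
    '<name x="1" x="2"/>',                   # same attribute name twice
    "<a/><b/>",                              # more than one root element
]
for case in CASES:
    print(is_well_formed(case), case)
```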

other things: comments
• In an XML document, you can make comments about your code. These are notes to yourself.
• Comments start with <!--
• Comments end with -->
• Example: <!-- this is a comment -->
• Comments cannot be nested.
• They can appear anywhere in the document outside other markup.
• They can enclose elements.

other things: XML declaration
• The XML declaration is a special line that says that what follows is XML and gives some very basic information about that XML. It is trendy to use it.
• It is optional, but if it is there it has to be on the first line.
• You will need to have an XML declaration if your character encoding is not UTF-8. We will come back to this point later.

other things: XML declaration
• Normally the XML declaration looks like
• <?xml version="1.0" encoding="encoding"?>
• where encoding is the character encoding. By default, the character encoding is UTF-8, so if you use that, you do not need to mention it.
• There is now a version "1.1" of XML around, but
  – it is not widely deployed
  – it is not much different from version 1.0

other stuff: document type declaration
• XML documents, like any SGML documents, accept document type declarations.
• A document type declaration tells us something about the vocabulary of elements and attributes used in the document.
• It should appear before the root element, after the XML declaration, if you have one.
• It takes the form <!DOCTYPE mumbojumbo >
• We will come back to the document type declaration later.

HTML
• HyperText Markup Language
• HTML is defined by an SGML DTD. It provides
  – Head, Title, Body, Paragraph, etc.
  – Headings, Bold, Italic, etc.
  – Table, List, Image, etc.
  – Links to other documents
  – Forms and many others

HTML history
• HTML was a very bare-bones language when first invented by Tim Berners-Lee. It did not describe pages with much visual appeal.
• In the 1990s, successful browsers invented “extensions” that aimed to stretch the visual boundaries of HTML.
• Some of these extensions found their way into the official HTML spec issued by the W3C.
• Later the W3C developed style sheets as a way to accommodate display requirements without having to extend HTML.

HTML versions
• HTML 4.01 is the last version of HTML. Besides a frameset DTD, this version has two different DTDs:
  – the loose DTD
  – the strict DTD
• I only cover the elements of the strict DTD.
• The loose DTD has more elements, but all the functionality of these elements is best done with style sheets.
• Thus, the pages created with HTML only will look rather boring.
• But we do cover style sheets later.

XHTML
• XHTML is HTML written in an XML syntax.
• Every XHTML document has to be well-formed XML.
• HTML documents that are not XHTML can violate some well-formedness constraints. For example:
  – HTML element names are not case sensitive.
  – Some HTML elements do not need closing.
  – There is no need for a single root element in an HTML document.

XHTML: pain without gain?
• In this course we study XHTML.
• When I say HTML in the following, I mean XHTML.
• Reasons to study XHTML rather than HTML
  – syntactic rules of XML are easier to understand.
  – any tool that can work with XML can be applied to XHTML, but cannot be applied to HTML.
  – in general XML documents are more computer understandable. This is crucial in the age of the search engine.

Example HTML snippet
<a href="http://openlib.org/home/krichel" title="homepage of Thomas Krichel">Thomas Krichel</a>
  – the whole thing is an <a> element. It creates an anchor. (I use < and > to surround element names.)
  – "href" is an attribute name
  – "http://openlib.org/home/krichel" is the value of the "href" attribute (I surround attribute names with straight quotes)
  – 'Thomas Krichel' is character data.
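
Because XHTML is XML, an ordinary XML parser can take the snippet apart. The sketch below does this with Python's standard xml.etree.ElementTree, and also shows that old-style HTML with an unclosed <br> is rejected by the same tool. The helper name parses_as_xml and the two paragraph strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

SNIPPET = ('<a href="http://openlib.org/home/krichel" '
           'title="homepage of Thomas Krichel">Thomas Krichel</a>')

anchor = ET.fromstring(SNIPPET)
print(anchor.tag)          # element name
print(anchor.get("href"))  # attribute value
print(anchor.text)         # character data

def parses_as_xml(text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(parses_as_xml("<p>one<br/>two</p>"))  # XHTML style: closed empty element
print(parses_as_xml("<p>one<br>two</p>"))   # HTML style: <br> never closed
```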

Characters: concept
• A character set combines two things
  – Character repertoire: a set of characters, e.g. "A", "ﺽ", "‼", "₣"
  – Character code positions: a number defined for each character in the repertoire.
• Character encoding is a way to encode the code positions in bytes.
• To correctly display a document, the user agent needs to know both!
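
The difference between a code position and an encoding shows up in a few lines of Python. This is only a sketch. The character is written as an escape (\u00e9 is e with acute accent) so that the example does not depend on how this file itself is saved.

```python
ch = "\u00e9"                     # e with acute accent
code_position = ord(ch)           # its number in the Unicode character set
utf8 = ch.encode("utf-8")         # the bytes used by the UTF-8 encoding
latin1 = ch.encode("latin-1")     # the bytes used by ISO-8859-1
misread = utf8.decode("latin-1")  # reading UTF-8 bytes with the wrong encoding

print(code_position, utf8, latin1, misread)
# Plain ASCII text gives the same bytes in ASCII and in UTF-8.
print("hello".encode("ascii") == "hello".encode("utf-8"))
```

The same character has one code position but different bytes in different encodings. If the user agent guesses the encoding wrongly, it shows two wrong characters instead of one right one.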

playing safe with characters
• Only use the characters on the US keyboard, don't insert symbols.
• Save as ASCII or UTF-8. All ASCII files are also UTF-8 files.
• Never save as "Unicode" within MS Notepad.
• If you encounter a character that is not on your keyboard, use an SGML entity.
• The SGML entity is the last special SGML thing that we have to study.

SGML entities
• SGML entities are a way to represent non-ASCII characters when only ASCII input is possible.
• Codes can be of the form &code;
  – Ex. &eacute; inserts an e with acute accent.
  – This is called a character entity.
  – Codes are often abbreviations of the character names.
• Codes can be in hex form
  – Ex. &#x26; to insert an ampersand
  – This is called a numeric entity.

XHTML entities
• They are officially defined in three files that are maintained by the W3C
  – http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
  – http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
  – http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
• A sample line is
  <!ENTITY ccedil "&#231;"> <!-- latin small letter c with cedilla, U+00E7 ISOlat1 -->
• <!ENTITY is DTD speak for defining an entity.
• It is followed by the character form and the numeric form of the entity.
• The rest of the line is a comment, of course.

entities used in XML
• There are three that you need to know and use.
  – &lt; stands for <
  – &gt; stands for >
  – &amp; stands for &
• Every time you want to insert <, > or & in the documents, you have to use the entities instead.
• Examples:
  – krichel&#64;openlib.org
  – je suis Fran&ccedil;ais
  – Marks &amp; Spencers
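
If you want to check what an entity stands for, Python's standard html module can translate in both directions. This is a sketch for experimenting. You do not need Python to write entities by hand.

```python
import html

# From entities to characters.
print(html.unescape("je suis Fran&ccedil;ais"))
print(html.unescape("&#x26;"))  # numeric entity in hex form
print(html.unescape("&#38;"))   # the same character in decimal form

# From characters to entities: <, > and & are replaced.
print(html.escape("Marks & Spencers"))
print(html.escape("a < b > c", quote=False))
```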

another look at empty.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title></title>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
  </head>
  <body></body>
</html>

empty.html dissected
• The <!DOCTYPE ...> is an SGML document type declaration. It says that the document contains XHTML of the “strict” flavor.
• The document type declaration is the only thing that we have in the prologue. We could have placed an XML declaration before it but chose not to do so.
• <html> is the root element. It contains some other elements. Some of these we discuss now, others later.

special topic: images
• The appeal of the web to the masses has a lot to do with its capability to transport images.
• Image formats are independent of the web, but there are two classic formats that are widely supported by user agents.
  – GIF
  – JPEG
• There is also a more recent one, Portable Network Graphics, PNG.

GIF
• stands for graphics interchange format.
• developed by CompuServe.
• patent issues around its compression method make the format abhorred by the free software community.
• 256 colors maximum
• uses a lossless compression technique

GIF has three tricks
• interlacing:
  – when downloading the file, the browser can show every eighth row first and fill in the rows between in later passes
  – user gets an idea of the picture before it is sharp
• transparency
  – some GIFs are transparent, so you can see what already exists underneath them
  – technically, the GIF has one color as the background color, and pixels of that color are ignored by the user agent
• animation
  – some GIFs are in fact sequences of GIFs that can be rendered one after the other.
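
Interlacing is easy to picture with a little arithmetic. An interlaced GIF stores its rows in four passes: every eighth row starting at row 0, every eighth row starting at row 4, every fourth row starting at row 2, and finally every second row starting at row 1. The sketch below lists the resulting row order. The function name is mine.

```python
def interlaced_row_order(height):
    """Rows of an interlaced GIF in the order they are stored (four passes)."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]  # (first row, step) per pass
    order = []
    for start, step in passes:
        order.extend(range(start, height, step))
    return order

print(interlaced_row_order(8))
```

After the first pass the user agent has only one row in eight, which is enough for a coarse preview of the whole picture.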

JPEG
• The Joint Photographic Experts Group is a standard-making body for images.
• Its JPEG format can support millions of colors.
• The compression is lossy, i.e. the JPEG file will look like the original image, but not be the same.
• The compression does not work well with drawings.
• There are no copyright and patent problems with JPEG.

Homework
• Look at course home page.
• Install winscp and browsers at home.
• Prepare a one-page max summary of the type of website that you want to build, bring a printed copy with you next week.
• Prepare for quiz at the beginning of next lecture.

http://openlib.org/home/krichel
Please shut down the computers when you are done.
Thank you for your attention!