Basics of Web Technologies 1
Learning Outcomes ◦ The evolution of the Internet, Internet standards organizations, and the difference between the Internet, intranets, and extranets. ◦ The beginning of the World Wide Web, ethical use of information on the Web, Web Accessibility, and future Internet trends. ◦ The Client/Server Model, Internet Protocols, Networks, URLs and Domain Names. 2
The Internet
• An interconnected network of computer networks.
• The Internet is composed of a collection of networks connecting millions of computers together.
• The Internet consists of the hardware and software infrastructure that enables computers to communicate and exchange information.
The Evolution of the Internet
◦ ARPAnet: Advanced Research Projects Agency network; in 1969, four computers were connected
◦ NSFnet: National Science Foundation network
◦ Use of the Internet was originally limited to government, research, and academic use
◦ 1991: commercial ban lifted
Intranet & Extranets Intranet ◦ A private network contained within an organization or business used to share information and resources among coworkers. Extranet ◦ A private network that securely shares part of an organization’s information or operations with external partners 5
The World Wide Web • The World Wide Web (WWW) is a system of interlinked hypertext documents accessed via the Internet. • With a web browser, one can view web pages that may contain text, images, videos, and other multimedia and navigate between them by using hyperlinks. 6
WWW: The World Wide Web
• The World Wide Web (WWW) was developed by Tim Berners-Lee and other research scientists at CERN, the European center for nuclear research, in the late 1980s and early 1990s.
• The WWW follows a client-server model and uses TCP connections to transfer information or web pages from server to client.
• The WWW uses a hypertext model. Hypertext allows interactive access to a collection of documents.
• Documents can hold text (hypertext), graphics, sound, animations, and video.
• Documents are linked together:
  • Non-distributed: all documents are stored locally (e.g. on CD-ROM).
  • Distributed: documents are stored at remote servers on the Internet.
WWW - Hyperlinks (or links)
• Each document contains links (pointers) to other documents.
• A link is represented by an "active area" on screen:
  • Graphic: a button
  • Text: highlighted
• When the user selects a particular link, the client fetches the referenced document from a server for display.
• Links may become invalid:
  • A link is simply a text name for a remote document.
  • The remote document may be moved to a new location while the name in the link remains unchanged.
WWW – Document Representation
• Each WWW document is called a page.
• The initial page for an individual or organization is called a home page.
• A page can contain many different types of information; a page must specify:
  • Content: the actual information
  • Type of content: the type of information, e.g. text, pictures, etc.
  • Links to other documents
• Rather than having a fixed representation for every browser, pages are formatted with a markup language.
• This allows the browser to format the page to fit the display.
• The standard is called HyperText Markup Language (HTML).
Web Documents – Static Documents
• Fixed-content documents that are created and stored on a server. The client can get only a copy of the document.
• The content of the file is determined when the file is created, not when it is used.
• When a client accesses the document, a copy of the document is sent. The user can then use a browsing program to display the document.
Web Documents – Dynamic Documents
• Created by a Web server whenever a browser requests the document.
• When a request arrives, the Web server runs an application program or a script that creates a dynamic document.
• The server returns the output of the program or script as a response to the browser that requested the document.
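The difference between static and dynamic documents can be sketched in a few lines of Python. This is only an illustration: the names STATIC_PAGE, serve_static, and serve_dynamic are hypothetical, and a real Web server would read files from disk and run scripts rather than call functions like these.

```python
from datetime import datetime, timezone

# Static document: the content is fixed when the file is created.
STATIC_PAGE = "<html><body><p>Welcome.</p></body></html>"


def serve_static() -> str:
    """Return a copy of the stored document, unchanged on every request."""
    return STATIC_PAGE


def serve_dynamic(now: datetime) -> str:
    """Build the document at request time (stands in for a server-side script)."""
    return f"<html><body><p>Server time: {now.isoformat()}</p></body></html>"


if __name__ == "__main__":
    print(serve_static())
    print(serve_dynamic(datetime.now(timezone.utc)))
```

Every call to serve_static returns the same text, while the output of serve_dynamic depends on when the request arrives.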
Basic Internet Protocols
• TCP/IP is fundamental to the Internet.
• Email, web browsing, file downloads, and database access are built on top of the TCP and IP protocols.
• TCP is the Transmission Control Protocol.
• TCP extends IP to provide added functionality.
• However, only IP is fundamental to the definition of the Internet.
• IP address:
  • Each device connected to the Internet has a unique numeric IP address.
  • An IPv4 address is a 32-bit number, written as four decimal numbers separated by dots, for example 74.125.224.72.
• Other protocols: UDP, FTP, SMTP, etc.
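To make the "32-bit number written as four decimal numbers" idea concrete, the following sketch converts between the two forms using Python's standard ipaddress module. The helper names to_int and to_dotted are illustrative only.

```python
import ipaddress


def to_int(dotted: str) -> int:
    """Convert a dotted-decimal IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))


def to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal notation."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    number = to_int("74.125.224.72")
    # Each of the four decimal numbers contributes 8 of the 32 bits.
    print(number, format(number, "032b"), to_dotted(number))
```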
UDP – User Datagram Protocol
• An alternative protocol to TCP
• It builds on IP
• Does not provide a two-way connection
• Does not provide guaranteed delivery, unlike TCP
• Faster than TCP for simple tasks
• Used for tasks like streaming video and sending short messages
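A minimal sketch of UDP's connectionless style, assuming both endpoints run on the same machine (the loopback address 127.0.0.1). The function name udp_roundtrip is hypothetical. Note that no connection is set up before sending, and nothing in the protocol guarantees that the datagram arrives; the timeout is there because a lost datagram would otherwise block the receiver forever.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to a receiver on the loopback interface and read it back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee
        # No connection setup: the datagram is simply addressed and sent.
        sender.sendto(message, receiver.getsockname())
        data, _address = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_roundtrip(b"short message"))
```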
FTP: File Transfer Protocol
– People required a protocol to reliably transfer files between any two computers connected to the Internet.
– FTP follows the client/server model.
– An FTP client program enables the user to interact with an FTP server in order to access files on the FTP server computer.
– FTP provides services similar to those available on most filesystems: list directories, create new files, download files, delete files.
– FTP uses TCP connections, and the default server port for FTP is 21.
Web Standards and the W3C Consortium
W3C – World Wide Web Consortium
◦ Develops recommendations and prototype technologies related to the Web
◦ Produces specifications, called Recommendations, in an effort to standardize web technologies
Web Accessibility
WAI – Web Accessibility Initiative
◦ Develops recommendations for Web content developers, Web authoring tool developers, developers of web browsers, and developers of other user agents to facilitate use of the web by those with special needs.
◦ WCAG: Web Content Accessibility Guidelines, http://www.w3.org/WAI/WCAG20/quickref/
Web Architecture • Resources are identified by URIs. • URIs identify any resource (e. g. , multimedia, databases and publications). • Resources have different representations (e. g. , HTML and PDF). 17
Uniform Resource Locator (URL)
• A client that wants to access a Web page needs the address of that page.
• The Uniform Resource Locator (URL) is a standard for specifying any kind of information on the Internet.
• The URL defines four things: protocol, host computer, port, and path.
  Protocol://Host:Port/Path
Domain Name
What is a "Domain Name"?
It is a unique name for your website. To get a domain name, you have to register it with a domain name registration company such as eHost.com. Example: www.google.com
What is the "Domain Name System"?
• It is a hierarchical, decentralized naming system for computers, services, or any resource connected to the Internet or a private network.
• It associates various information with domain names, including IP addresses.
Top-Level Domain Name (TLD)
• A top-level domain (TLD) is the right-most label of the domain name.
• Current generic TLDs: .com, .org, .net, .mil, .gov, .edu, .int, .aero, .asia, .cat, .jobs, .name, .biz, .mobi, .museum, .info, .coop, .post, .pro, .tel, .travel.
WWW – Identifying a web page
– A web page is identified by:
• The protocol used to access the web page.
• The host, which is the computer on which the web page is stored.
• The TCP port that the server is listening on to allow a client to access the web page.
• The path, which is the directory pathname of the web page on the server (the file where the information is located).
– Specific syntax for a Uniform Resource Locator (URL): protocol://computer_name:port/document_name
• The protocol can be, for example, http, https, ftp, or mailto.
WWW – Identifying a web page
– The computer name can be a DNS name or an IP address.
– The TCP port is optional (http uses port 80 as its default port).
– document_name is the path on the server to the web page (file).
– E.g. http://www.yahoo.com/Recreation/Sports/Soccer/index.html
– The protocol is http
– The computer name or DNS name is www.yahoo.com
– The port number is the default port for http, i.e. port 80.
– The document name is /Recreation/Sports/Soccer/index.html
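The four parts of a URL can be pulled apart with Python's standard urllib.parse module, as in the sketch below. The function url_parts and the DEFAULT_PORTS table are illustrative assumptions; the table fills in the default port when the URL omits one.

```python
from urllib.parse import urlsplit

# Default ports used when the URL does not name one (illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def url_parts(url: str) -> dict:
    """Split a URL into protocol, host, port, and path."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
    }


if __name__ == "__main__":
    print(url_parts("http://www.yahoo.com/Recreation/Sports/Soccer/index.html"))
```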
Checkpoint 1. 1 1. Describe the difference between the Internet and an intranet. 2. Describe the difference between the Internet and the Web. 23
The Web Experience
• A user clicks on a link in a browser.
• The browser communicates with a web server using HTTP (request).
• The server sends an HTML document back (response).
• The browser displays the document.
• The user clicks on another link and activates another URL.
[Figure: web client (browser) sends a request to the web server; the web server sends a response back.]
WWW – Client Server Model
– The browser is the client; the WWW (or web) server is the server.
– Browser:
• The browser makes a TCP connection to the web server.
• The browser sends a request for the particular web page that it wishes to display.
• The browser reads the contents of the web page from the TCP connection and displays it in the browser's window.
• The browser closes the TCP connection used to transfer the web page.
– In HTTP/1.0, each separate item in a web page (e.g., pictures, audio) requires a separate TCP connection; HTTP/1.1 can reuse one connection for several items.
– The Hypertext Transfer Protocol (HTTP) specifies the commands that the client (browser) issues to the server (web server) and the responses that the server sends back to the client.
WWW – Client Server Model Figure 1 -1: Web client/server architecture 26
Web Server Basics
• Web servers are continually connected to the Internet.
• All communications between browsers and servers use the Hypertext Transfer Protocol (HTTP).
• Web servers provide responses to browser requests, either existing documents or dynamically built documents.
• When a client is connected, the server reads the HTTP request, performs some lookup function, and sends the HTTP response and the requested data.
Serving a Page
1. The user of the client machine types in a URL.
2. The server name is translated to an IP address via DNS. (In the slide's example, a Netscape client requests http://www.smallco.com/index.html from an Apache server at 192.22.107.5.)
3. The client connects to the server using the IP address and port number.
4. The client determines the path and file to request.
5. The client sends an HTTP request to the server.
6. The server determines which file to send.
7. The server sends a response code and the document.
8. The connection is broken.
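The page-serving sequence can be tried out on one machine with Python's standard http.server and http.client modules. This is a sketch under stated assumptions: the handler class PageHandler, the constant PAGE, and the function fetch_once are hypothetical names, the server listens on the loopback address 127.0.0.1 so no DNS lookup takes place, and only the single path /index.html is served.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The server determines which document to send for the requested path.
        if self.path == "/index.html":
            self.send_response(200)  # response code
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)   # the document
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path: str):
    """Start a local server, request one path, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        conn = HTTPConnection(host, port, timeout=5)  # client connects to IP and port
        conn.request("GET", path)                     # client sends the HTTP request
        response = conn.getresponse()
        body = response.read()
        conn.close()                                  # connection is broken
        return response.status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once("/index.html"))
```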
The basic client/server request/response sequence 36
A dynamic client/server request/response sequence 37
HTTP Protocol
• The Hypertext Transfer Protocol (HTTP) is:
– A protocol used mainly to access data on the World Wide Web
– Designed for document transfer
– A specification of how a server and client communicate
– Usually carried over TCP
– Stateless: each request is handled independently, with no persistent client/server connection required
WWW – HTTP Protocol
– When a user types in http://www.yahoo.com/Recreation/Sports/Soccer/index.html, the browser creates an HTTP GET request message and sends it over a TCP connection to the web server.
– In the above case, the HTTP GET request message would be:
    GET /Recreation/Sports/Soccer/index.html HTTP/1.0
    User-Agent: InternetExplorer/5.0
    Accept: text/html, text/plain, image/gif, audio/au
    (blank line, i.e. "\r\n")
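As a rough illustration of how such a message is assembled, the sketch below builds the bytes of a GET request. The function name build_get_request is hypothetical; a real browser adds more headers than shown here.

```python
def build_get_request(path: str, headers: dict) -> bytes:
    """Assemble an HTTP/1.0 GET request: request line, headers, blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    request = build_get_request(
        "/Recreation/Sports/Soccer/index.html",
        {
            "User-Agent": "InternetExplorer/5.0",
            "Accept": "text/html, text/plain, image/gif, audio/au",
        },
    )
    print(request.decode("ascii"))
```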
WWW – HTTP Request messages
– HTTP request messages are sent from client to server. A request message consists of:
• Request line: the type of request (e.g. GET)
• HTTP headers: additional information such as the browser being used and the media types accepted
• A blank line: the delimiter (carriage return, line feed)
• Message body (optional data): user data, e.g. the contents of a completed form
WWW – HTTP Request messages
– Request line: request type, space, URL, space, HTTP version
– Request type: this field is used in request messages. Version 1.1 of HTTP defines several request types, categorized into methods as follows:
• GET – Used to request a web page from a web server
• POST – Used to send data (e.g. the results of a registration form) to a web server
• HEAD – Requests information about a document but not the document itself. Used by search engines to test the validity of hyperlinks
• PUT / DELETE – Not typically implemented by browsers.
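The structure of the request line (request type, URL, HTTP version, separated by single spaces) can be checked with a short parser. This is a simplified sketch: parse_request_line and KNOWN_METHODS are hypothetical names, and only the methods listed above are accepted.

```python
KNOWN_METHODS = {"GET", "POST", "HEAD", "PUT", "DELETE"}


def parse_request_line(line: str):
    """Split a request line into (method, URL, HTTP version)."""
    fields = line.strip().split(" ")
    if len(fields) != 3:
        raise ValueError(f"expected three fields, got {len(fields)}")
    method, url, version = fields
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown request type: {method}")
    return method, url, version


if __name__ == "__main__":
    print(parse_request_line("GET /index.html HTTP/1.1\r\n"))
```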
WWW – HTTP Response messages
– HTTP response messages are sent from server to client. A response message consists of:
• Status line: success/failure indication, a three-digit number between 100 and 599
• HTTP headers: the type of content returned, e.g. text/html or image/gif
• A blank line: the delimiter
• Message body (optional data): the requested data, e.g. a web page
WWW – HTTP response messages
– Status line: HTTP version, space, status code, space, status phrase
– The status line gives information about the success of the previous HTTP request.
• Status code: this field is used in the response message. It consists of three digits, as follows:
  • Codes in the 100 range are informational (100 – 199).
  • Codes in the 200 range indicate a successful request (200 – 299).
  • Codes in the 300 range redirect the client to another URL, meaning that the document has been moved (300 – 399).
  • Codes in the 400 range indicate an error at the client side: Bad Request, Unauthorized, Not Found (400 – 499).
  • Codes in the 500 range indicate an error at the server side: Internal Error, Service Overloaded (500 – 599).
• Status phrase: this field explains the status code in text form (e.g., OK for 200 and Bad Request for 400).
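Because the first digit of the status code decides its meaning, a client can classify any response with a few range checks. The sketch below uses a hypothetical function name, status_class; the 100 range (informational) is included for completeness even though the ranges discussed here start at 200.

```python
def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its general meaning."""
    if 100 <= code <= 199:
        return "informational"
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirection"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    raise ValueError(f"not a valid HTTP status code: {code}")


if __name__ == "__main__":
    for code in (200, 301, 404, 500):
        print(code, status_class(code))
```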
Example: HTTP request and response messages
Request message (client to server):
    POST /cgi-bin/doc.pl HTTP/1.1
    Accept: */*
    Accept: image/gif
    Accept: image/jpeg
    Content-length: 50
    (blank line)
    (Input information)
Response message (server to client):
    HTTP/1.1 200 OK
    Date: Mon, 07-Jan-02 13:15:14 GMT
    Server: Challenger
    MIME-version: 1.0
    Content-length: 2000
    (blank line)
    (Body of the document)
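A response message like the one in this example can be split into its status line, headers, and body by looking for the blank line that separates them. The following is a simplified sketch, not a complete HTTP parser; the function name parse_response is hypothetical, and repeated headers simply overwrite one another.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, code, phrase, headers, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Server: Challenger\r\n"
        b"Content-length: 13\r\n"
        b"\r\n"
        b"<html></html>"
    )
    print(parse_response(raw))
```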
MIME • MIME – Multipurpose Internet Mail Extensions – Standards for encoding different media types in a message – Originally developed for emailing files and messages in different languages 45
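Web servers use MIME media types in the Content-Type header to tell the browser what kind of data is being returned. Python's standard mimetypes module maps file names to media types, as in this sketch. The function content_type_for is a hypothetical name, and falling back to application/octet-stream for unknown files is an assumption made for the example.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess the MIME media type of a file from its name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Assumed fallback for unknown file types: generic binary data.
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    for name in ("index.html", "logo.gif", "README"):
        print(name, content_type_for(name))
```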
WWW – Caching Web pages
– Downloading HTML documents from servers can be slow due to a number of conditions:
• Parts of the Internet can be congested.
• A dialup connection is typically very slow, 33 Kbps or 56 Kbps.
• A web server can have a lot of clients connecting to it at the same time, causing it to be overloaded.
– If a user returns to a previous HTML document, this could require downloading the document from the server again.
– A browser can hold copies of recently visited pages. This avoids having to download pages again.
– An organization can use an HTTP proxy that caches documents for multiple users, thus improving the speed at which pages can be displayed on each user's computer.
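The caching idea can be sketched as a small class that remembers pages it has already downloaded. This is a deliberately minimal illustration: PageCache and its fetch argument are hypothetical, and real browser and proxy caches also decide when a stored copy has become too old to reuse, which this sketch ignores.

```python
from typing import Callable, Dict


class PageCache:
    """Keep copies of downloaded pages so repeat visits skip the download."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # function that really downloads a URL
        self._pages: Dict[str, bytes] = {}   # URL -> stored copy
        self.downloads = 0                   # how many real downloads happened

    def get(self, url: str) -> bytes:
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
            self.downloads += 1
        return self._pages[url]


if __name__ == "__main__":
    # A stand-in download function; a real one would contact the web server.
    cache = PageCache(lambda url: f"<html>{url}</html>".encode())
    cache.get("http://www.smallco.com/index.html")
    cache.get("http://www.smallco.com/index.html")  # served from the cache
    print(cache.downloads)
```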
WWW – Browser Architecture
[Figure: browser architecture. Input from the keyboard and mouse goes to the controller, which drives the HTML interpreter, optional plugins, the HTTP client, and other clients. Output is sent through the display driver to the display; the clients communicate with the remote server through the network interface.]
WWW – Browser Architecture
– A browser has more components than a server:
• A display driver for painting the screen.
• An HTML interpreter for formatting HTML documents.
• Plugins to display different content (e.g., Shockwave or Real Audio content).
• An HTTP client to fetch HTML documents from a WWW server.
• Other clients for other protocols (e.g., ftp, mail).
• A controller, which must also accept input from the computer user through the mouse or keyboard.