Web and the HTTP Hyper Text Transfer Protocol
World Wide Web • The Web is the collection of machines (Web servers) on the Internet that provide information, particularly HTML documents, via HTTP. • Machines that access information on the Web are known as Web clients. • A Web browser is software used by an end user to access the Web.
Web and HTTP • A web page consists of objects; each object can be: • an HTML file, • a JPEG image, • a Java applet, • an audio file, • … • A web page consists of a base HTML file which includes several referenced objects. • Each object is addressable by a URL, e.g., http://www.someschool.edu/someDept/pic.gif, where www.someschool.edu is the host name and /someDept/pic.gif is the path name.
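The host-name/path-name split of the example URL can be checked with a few lines of Java. This is a minimal sketch using the standard java.net.URI class; the class name UrlParts and the helper hostAndPath are made up for illustration.

```java
import java.net.URI;

class UrlParts {
    // Returns {host name, path name} for an absolute URL.
    static String[] hostAndPath(String url) {
        URI uri = URI.create(url);
        return new String[] { uri.getHost(), uri.getPath() };
    }

    public static void main(String[] args) {
        String[] parts = hostAndPath("http://www.someschool.edu/someDept/pic.gif");
        System.out.println("host name: " + parts[0]);
        System.out.println("path name: " + parts[1]);
    }
}
```

The host name tells the browser which server to contact; the path name is what it later asks that server for.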
Internet Protocols • Communication protocol: how computers talk • telephone “protocol”: how you answer and end a call, what language you speak, etc. • Internet protocols developed as part of ARPANET research • ARPANET began using TCP/IP in 1982 • Designed for use both within local area networks (LANs) and between networks
What is a protocol, and what is HTTP? • A protocol is a standard procedure for defining and regulating communication, • e.g., TCP, UDP, HTTP, etc. • HTTP is the foundation of data communication for the World Wide Web. • HTTP is the Web’s application-layer protocol for transferring various forms of data between server and client, such as • plain text • hypertext • images • video • sound • etc.
HTTP: • It is the standard protocol for communication between web browsers and web servers. • It defines: • how a client and server establish a connection, • how the client requests data from the server, • how the server responds to that request, • how data is transferred from the server back to the client, • and finally, how the connection is closed. • It assumes very little about a particular system, and does not keep state between different message exchanges. • This makes HTTP a stateless protocol. • The communication usually takes place over TCP/IP. • The default TCP port for HTTP is 80, but other ports can also be used.
Hypertext Transfer Protocol (HTTP) • HTTP is based on the request-response communication model: • Client sends a request • Server sends a response • HTTP is a stateless protocol: • The protocol does not require the server to remember anything about the client between requests. • If a particular client asks for the same object twice in a period of a few seconds, the server does not respond by saying that it just served the object to the client; instead, the server resends the object, as it has completely forgotten what it did earlier.
HTTP overview (continued) • HTTP is “stateless”: • The server maintains no information about past client requests; it does not remember any previous requests. • If a particular client asks for the same object twice in a period of a few seconds, the server does not respond by saying that it just served the object to the client; instead, the server resends the object, as it has completely forgotten what it did earlier. • Protocols that maintain “state” are complex: • past history (state) must be maintained, • if the server or client crashes, their views of “state” may be inconsistent and must be reconciled.
How does HTTP work? • HTTP is implemented in two programs: • a client program and a server program, executing on different end systems, • which talk to each other by exchanging HTTP messages. • The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and the server processes access TCP through their socket interfaces.
HTTP • Normally implemented over a TCP connection • (80 is standard port number for HTTP) • Typical browser-server interaction: 1. Client • User enters Web address in browser • Browser uses DNS to locate IP address • Browser opens TCP connection to server • Browser sends HTTP request over connection 2. Server • Server sends HTTP response to browser over connection 3. Client • Browser displays body of response in the client area of the browser window
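The browser-server interaction above can be imitated on a single machine. The following is only a sketch under stated assumptions: both "browser" and "server" run in one Java process and talk over the loopback interface on an ephemeral port (not port 80), the DNS step is skipped, and the class name LoopbackHttpDemo, the path /index.html and the response body are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class LoopbackHttpDemo {

    // Server side: accept one TCP connection, read one request, send one response.
    static void serveOnce(ServerSocket listener) {
        try (Socket s = listener.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            String line = in.readLine();               // request (start) line
            while (line != null && !line.isEmpty()) {  // skip header lines up to the blank line
                line = in.readLine();
            }
            String body = "<html>hello</html>";
            String response = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/html\r\n"
                    + "Content-Length: " + body.length() + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n"
                    + body;
            s.getOutputStream().write(response.getBytes(StandardCharsets.US_ASCII));
            s.getOutputStream().flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side: open a TCP connection, send a GET request, return the status line.
    static String fetchStatusLine() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> serveOnce(listener));
            server.start();
            String statusLine;
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), listener.getLocalPort())) {
                String request = "GET /index.html HTTP/1.1\r\n"
                        + "Host: localhost\r\n"
                        + "Connection: close\r\n"
                        + "\r\n";
                s.getOutputStream().write(request.getBytes(StandardCharsets.US_ASCII));
                s.getOutputStream().flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
                statusLine = in.readLine();            // first line of the response
                while (in.readLine() != null) {
                    // drain headers and body until the server closes the connection
                }
            }
            server.join();
            return statusLine;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchStatusLine());
    }
}
```

A real browser would additionally resolve the host name via DNS and render the body; here only the status line is returned to keep the sketch short.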
HTTP connections • Non-persistent HTTP: • At most one object is sent over a TCP connection; the connection is then closed. • Downloading multiple objects requires multiple connections. • A separate TCP connection is needed to serve each resource (object). • Persistent HTTP: • Multiple objects can be sent over a single TCP connection between client and server. • A single TCP connection is enough to serve multiple resources. • The server leaves the connection open even after serving the request and closes the connection on timeout.
Non-persistent HTTP • Suppose the user enters the URL www.just.edu.jo/~zasharif/Web/SE432/se432.html (the page contains text and a reference to style.css).
1a. The HTTP client initiates a TCP connection to the HTTP server (process) at www.just.edu.jo on port 80.
1b. The HTTP server at host www.just.edu.jo, waiting for TCP connections at port 80, “accepts” the connection, notifying the client.
2. The HTTP client sends an HTTP request message (containing the URL) into the TCP connection socket. The message indicates that the client wants the object Web/SE432/se432.html.
3. The HTTP server receives the request message, forms a response message containing the requested object, and sends the message into its socket.
Non-persistent HTTP (cont.)
4. The HTTP server closes the TCP connection.
5. The HTTP client receives the response message containing the HTML file and displays the HTML. Parsing the HTML file, it finds 10 referenced JPEG objects.
6. Steps 1-5 are repeated for each of the referenced objects.
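The cost of step 6 is easy to put in numbers. The sketch below counts TCP connections for one page, assuming connections are opened one after another (no parallel connections) and that a persistent connection does not time out mid-page; the class and method names are hypothetical.

```java
class ConnectionCount {
    // TCP connections needed to fetch one base HTML file plus its referenced objects.
    static int connectionsNeeded(int referencedObjects, boolean persistent) {
        int totalObjects = 1 + referencedObjects; // base HTML file + referenced objects
        return persistent ? 1 : totalObjects;     // non-persistent: one connection per object
    }

    public static void main(String[] args) {
        System.out.println("non-persistent: " + connectionsNeeded(10, false));
        System.out.println("persistent:     " + connectionsNeeded(10, true));
    }
}
```

For the page with 10 referenced JPEG objects, non-persistent HTTP needs 11 connections where persistent HTTP needs one.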
HTTP Messages • Request and Response Message Formats
HTTP request message: general format • Structure of the request:
1. start line (request line)
2. header field(s)
3. blank (empty) line
4. optional body
Example: HTTP request message • HTTP request message: ASCII (human-readable format) • The request line (GET, POST, HEAD commands) comes first, followed by the header lines. • Each line ends with a carriage return character (\r) and a line-feed character (\n); a carriage return, line feed at the start of a line indicates the end of the header lines.
GET /~zasharif/Web/SE432.html HTTP/1.1\r\n
Host: www.just.edu.jo\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
\r\n
HTTP Request: Start Line • Start line • Example: GET /~zasharif/Web/SE432.html HTTP/1.1 • Three space-separated parts: • HTTP request method • Request-URI (Uniform Resource Identifier) • The Request-URI is the portion of the requested URI that follows the host name (which is supplied by the required Host header field) • In addition to http, some other URL schemes are https, ftp, mailto, and file • HTTP version • We will cover HTTP/1.1, in which the version part of the start line must be exactly as shown
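Splitting a start line into its three space-separated parts can be sketched as follows. The class RequestLine is a hypothetical helper, not part of any library, and it assumes exactly one space between the parts.

```java
class RequestLine {
    final String method;
    final String requestUri;
    final String version;

    RequestLine(String method, String requestUri, String version) {
        this.method = method;
        this.requestUri = requestUri;
        this.version = version;
    }

    // Parses "METHOD Request-URI HTTP-version"; rejects anything else.
    static RequestLine parse(String line) {
        String[] parts = line.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected three space-separated parts: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /~zasharif/Web/SE432.html HTTP/1.1");
        System.out.println(r.method + " | " + r.requestUri + " | " + r.version);
    }
}
```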
HTTP Request: Common Request Methods • GET • Used if link is clicked or address typed in browser • No message-body in request with GET method • POST • Used when submit button is clicked on a form • Form information contained in the message-body of request • HEAD • Requests that only header fields (no body) be returned in the response
What about HTTP/2? Method types • HTTP/1.0: • GET • POST • HEAD: asks the server to leave the requested object out of the response • HTTP/1.1: • GET, POST, HEAD • PUT: uploads the file in the entity body to the path specified in the URL field • DELETE: deletes the file specified in the URL field
HTTP Request: Header field(s) • Header field structure: • Field-name : Field-value • Syntax • Field name is not case sensitive • Field value may continue on multiple lines by starting continuation lines with white space • Field values may contain MIME types, quality values, and wildcard characters(*’s) • Multipurpose Internet Mail Extensions (MIME) • Standardized way to indicate the nature and format of a document (IETF RFC 6838) • Convention for specifying content type of a message • In HTTP, typically used to specify content type of the body of the response • MIME content type syntax: • top-level type / subtype • Examples: text/html, image/jpeg • Example header field with quality values: accept: text/xml, text/html; q=0. 9, text/plain; q=0. 8, image/jpeg, image/gif; q=0. 2, */*; q=0. 1
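Quality values rank the MIME types a client is willing to accept. The sketch below parses the example accept header and orders the entries by preference, assuming that a missing q parameter means q=1.0 and ignoring any parameter other than q; AcceptHeader and MediaRange are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class AcceptHeader {
    static final class MediaRange {
        final String type;
        final double q;

        MediaRange(String type, double q) {
            this.type = type;
            this.q = q;
        }
    }

    // Parses an Accept field value and sorts it from most to least preferred.
    static List<MediaRange> parse(String fieldValue) {
        List<MediaRange> ranges = new ArrayList<>();
        for (String item : fieldValue.split(",")) {
            String[] pieces = item.trim().split(";");
            double q = 1.0; // default quality when no q parameter is given
            for (int i = 1; i < pieces.length; i++) {
                String param = pieces[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2));
                }
            }
            ranges.add(new MediaRange(pieces[0].trim(), q));
        }
        // List.sort is stable, so entries with equal q keep their original order.
        ranges.sort(Comparator.comparingDouble((MediaRange r) -> r.q).reversed());
        return ranges;
    }

    public static void main(String[] args) {
        String value = "text/xml, text/html; q=0.9, text/plain; q=0.8, "
                + "image/jpeg, image/gif; q=0.2, */*; q=0.1";
        for (MediaRange r : parse(value)) {
            System.out.println(r.type + " (q=" + r.q + ")");
        }
    }
}
```

With the example header, text/xml and image/jpeg come first (q=1.0) and the wildcard */* comes last (q=0.1).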
HTTP Request: Common header fields • Host: host name from URL (required) • User-Agent: type of browser sending request • Accept: MIME types of acceptable documents • Connection: value close tells server to close connection after single request/response • Content-Type: MIME type of (POST) body, normally application/x-www-form-urlencoded • Content-Length: bytes in body • Referer (spelled with a single "r" in the HTTP specification): URL of document containing link that supplied URI for this HTTP request
HTTP Response • Structure of the response:
1. status line
2. header field(s)/lines
3. blank line
4. optional body
Example: HTTP response message • The status line (protocol, status code, status phrase) comes first, followed by the header lines, a blank line, and the data (e.g., the requested HTML file).
HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data data ...
The GetResponses program, I • Here’s just the skeleton of the program that produces the output shown under "Viewing the response":
    import java.net.*;
    import java.io.*;

    public class GetResponses {
        public static void main(String[] args) {
            try {
                // ... interesting code goes here ...
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
The GetResponses program, II • Here’s the interesting part of the code:
    URL url = new URL(args[0]);
    URLConnection c = url.openConnection();
    // Header field 0 is the status line of the response.
    System.out.println("Status line: ");
    System.out.println('\t' + c.getHeaderField(0));
    // Fields 1..n are the response headers; a null value marks the end.
    System.out.println("Response headers: ");
    String value = "";
    int n = 1;
    while (true) {
        value = c.getHeaderField(n);
        if (value == null) break;
        System.out.println('\t' + c.getHeaderFieldKey(n++) + ": " + value);
    }
Viewing the response • Here’s an example (from GetResponses):
    > java GetResponses http://www.just.edu.jo
    Status line:
        HTTP/1.1 200 OK
    Response headers:
        Date: Wed, 10 Sep 2017 00:26:53 GMT
        Server: Apache/1.3.26 (Unix) PHP/4.2.2 mod_perl/1.27 mod_ssl/2.8.10 OpenSSL/0.9.6e
        Last-Modified: Tue, 09 Sep 2017 19:24:50 GMT
        ETag: "1c1ad5-1654-3f5e2902"
        Accept-Ranges: bytes
        Content-Length: 5716
        Keep-Alive: timeout=15, max=100
        Connection: Keep-Alive
        Content-Type: text/html
HTTP Response: Common header fields • Connection, Content-Type, Content-Length • Date: date and time at which response was generated (required) • Location: alternate URI if status is redirection • Last-Modified: date and time the requested resource was last modified on the server • Expires: date and time after which the client’s copy of the resource will be out-of-date • ETag: a unique identifier (hashcode) for this version of the requested resource (changes if resource changes)
HTTP Response: Status Codes • The client can initiate requests to the server. • In return, the server responds with status codes and message payloads (e.g., index.html). • The status code is important and tells the client how to interpret the server response. • The HTTP specification defines certain number ranges for specific types of responses. • Three-digit number • First digit is the class of the status code: • 1xx: Informational Messages • 2xx: Successful • 3xx: Redirection • 4xx: Client Error • 5xx: Server Error • Other two digits provide additional information • See http://www.w3.org/Protocols/rfc2616-sec10.html and https://www.w3.org/Protocols/rfc2616.html
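Because the first digit carries the class, a client can classify a response without knowing every individual code. A minimal sketch, with StatusClass and its two helpers as hypothetical names:

```java
class StatusClass {
    // Extracts the three-digit status code from a status line such as "HTTP/1.1 200 OK".
    static int statusCodeOf(String statusLine) {
        String[] parts = statusLine.split(" ", 3); // protocol, status code, status phrase
        return Integer.parseInt(parts[1]);
    }

    // Maps a status code to its class using only the first digit.
    static String classify(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Successful";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: throw new IllegalArgumentException("not an HTTP status code: " + statusCode);
        }
    }

    public static void main(String[] args) {
        int code = statusCodeOf("HTTP/1.1 404 Not Found");
        System.out.println(code + " -> " + classify(code));
    }
}
```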
Common Status Codes • 200 OK • Everything worked, here’s the data • 301 Moved Permanently • requested object moved, new location specified later in this msg (Location: ) • 302 Moved temporarily • URL temporarily out of service, keep the old one but use this one for now • 400 Bad Request • There is a syntax error in your request • 403 Forbidden • You can’t do this, and we won’t tell you why • 404 Not Found • No such document • 408 Request Time-out, 504 Gateway Time-out • Request took too long to fulfill for some reason • 505 HTTP Version Not Supported
Web Caching
Web Caching? • Fetching something over the network is both slow and expensive. • Large responses require many round trips between the client and server. • This requires extra processing from the browser and incurs extra costs for the visitor (bandwidth) and the visited server. • The ability to cache and reuse previously fetched resources is a critical aspect of performance optimization. • Almost every browser comes with an implementation of an HTTP cache. • When you visit a web page, the browser stores the page in its cache so that it loads faster on subsequent visits. • All you need to do is ensure that each server response provides the correct HTTP header directives. • These headers instruct the browser on when, and for how long, it can cache the response.
Web Caching? • Storing copies of recently accessed web pages • Pages are delivered from the cache when requested again • Two kinds of caches: 1. Browser caches 2. Proxy caches • Why cache? • Shorter response time • Reduced bandwidth requirement • Reduced load on servers • Access control and logging
Browser Caching vs. Proxy Caching • Browser Caching • Local hard drive space stores representation of viewed content. • Usefulness • Recently viewed pages (Back/Forward button) • Commonly used images • Proxy Caching • Similar concept, but for multiple users. • Usually implemented on a firewall or separate device known as intermediaries (Proxies). • Usefulness • Latency and network traffic are reduced
Cache Controlling • HTTP Headers • Automatically created by Web server • Sent before HTML • Seen by browser and proxy caches • HTTP Defining Mechanisms • Freshness – Content is able to be loaded from cache without having to check with the original server • Validation – The process of confirming with the original server whether or not cached content is still valid to load
HTTP Response Headers Example • Sample response header (Cache-Control and Expires relate to freshness; Last-Modified and ETag relate to validation):
HTTP/1.1 200 OK
Date: Fri, 20 Oct 2017 13:19:41 GMT
Server: Microsoft IIS/8.5
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 20 Oct 2017 14:19:41 GMT
Last-Modified: Mon, 16 Oct 2017 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1024
Content-Type: text/html
Expires Header and Freshness • Expires: Fri, 20 Oct 2017 14:19:41 GMT • Indicates until when the representation is fresh. • After this time passes, the cache will communicate with the origin server to see if there have been any changes. • Beneficial for static page images as well as continually changing content • The clocks of the Web server and the cache must be synchronized
Last-Modified and Validation • Last-Modified: Mon, 16 Oct 2017 02:28:12 GMT • Validate the cache by looking at the last time the document was altered (Last-Modified). • An If-Modified-Since request is sent to the origin server. • If changes have been made since the date given, the entire document is returned. Otherwise, the cached document can be loaded.
ETag and Validation • ETag: "3e86-410-3596fbbc" • Unique identifier (hash code) created by the server • Changed each time the representation is altered on the origin server • An If-None-Match request is sent to the server and a simple comparison is used to validate the content.
Browser Caching Ex. • When the server returns a response, it includes a set of HTTP headers, e.g., content type, length, caching directives, validation token, etc. • First request: in this example the server returns • a 1024-byte response, • instructs the client to cache it for up to 120 seconds, • and provides a validation token ("x234dff") that can be used after the response has expired to check if the resource has been modified. • The ETag validation token enables efficient resource update checks: • It is used as a fingerprint of the file contents. • No data is transferred if the resource has not changed. • Second request: the client only needs to send the token to the server. • The server checks the token against the current resource. • If the token hasn't changed, the server returns a "304 Not Modified" response. • This means the cached response hasn't changed and can be renewed for another 120 seconds. • Note: there is no need to download the response again, which saves time and bandwidth.
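The server's half of this exchange boils down to one comparison. The sketch below is a simplified model, not a real server: it assumes the client sends back a single token in If-None-Match and that tokens are compared as exact strings; EtagValidation, Response and respond are hypothetical names.

```java
class EtagValidation {
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Decides between "304 Not Modified" (no body) and "200 OK" (full body).
    // ifNoneMatch is the token sent by the client, or null on a first request.
    static Response respond(String currentEtag, String currentBody, String ifNoneMatch) {
        if (ifNoneMatch != null && ifNoneMatch.equals(currentEtag)) {
            return new Response(304, ""); // token unchanged: nothing to download
        }
        return new Response(200, currentBody);
    }

    public static void main(String[] args) {
        String etag = "\"x234dff\"";
        System.out.println(respond(etag, "<html>...</html>", null).status); // first request
        System.out.println(respond(etag, "<html>...</html>", etag).status); // second request
    }
}
```

On the second request only headers cross the network, which is where the saving in time and bandwidth comes from.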
Cache-Control • Each resource can define its caching policy via the Cache-Control HTTP header. • Cache-Control directives control • who can cache the response, • under which conditions, • and for how long. • "no-cache": • the returned response can't be used to satisfy a subsequent request to the same URL without first checking with the server whether the response has changed. • Combined with an ETag token, “no-cache” incurs a round trip to validate the cached response, but eliminates the download if the resource has not changed. • "no-store": • disallows the browser and all intermediate caches from storing any version of the returned response. • Thus, every time the user requests this URL, a request is sent to the server and a full response is downloaded. • "max-age": • specifies the maximum time in seconds that the fetched response is allowed to be reused, counted from the time of the request.
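How a cache might act on these three directives can be sketched in a few lines. This is a deliberately reduced model: it looks only at no-store, no-cache and max-age, ignores Expires and every other directive, and treats a response without max-age as not reusable; the class and method names are invented.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CacheControl {
    // Splits "max-age=3600, must-revalidate" into directive -> value ("" when there is no value).
    static Map<String, String> parse(String fieldValue) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String item : fieldValue.split(",")) {
            String d = item.trim();
            if (d.isEmpty()) {
                continue;
            }
            int eq = d.indexOf('=');
            if (eq < 0) {
                directives.put(d.toLowerCase(Locale.ROOT), "");
            } else {
                directives.put(d.substring(0, eq).trim().toLowerCase(Locale.ROOT),
                        d.substring(eq + 1).trim());
            }
        }
        return directives;
    }

    // True if a stored response of the given age may be reused without contacting the server.
    static boolean canReuseWithoutRevalidation(String fieldValue, long ageSeconds) {
        Map<String, String> d = parse(fieldValue);
        if (d.containsKey("no-store") || d.containsKey("no-cache")) {
            return false; // not stored at all, or must be validated first
        }
        String maxAge = d.get("max-age");
        return maxAge != null && ageSeconds < Long.parseLong(maxAge);
    }

    public static void main(String[] args) {
        String value = "max-age=3600, must-revalidate";
        System.out.println(canReuseWithoutRevalidation(value, 120));  // still fresh
        System.out.println(canReuseWithoutRevalidation(value, 4000)); // expired
    }
}
```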
Web Caching: Proxy
What is a Web Proxy (Web Cache)? • A proxy is a host which relays web access requests from clients • Used when clients do not access the web directly • Used for security, logging, accounting and performance • Typically a Web cache is purchased and installed by an ISP. • For example, a university might install a cache on its campus network and configure all of the campus browsers to point to the cache. [Figure: browser, proxy, web]
Web caches (proxy server) • Goal: satisfy client request without involving origin server • User sets browser: Web accesses via cache • Browser sends all HTTP requests to cache • If object is in cache: cache returns object • Else cache requests object from origin server, then returns object to client [Figure: two clients exchange HTTP requests and responses with the proxy server, which in turn exchanges HTTP requests and responses with two origin servers]
Conditional GET • Goal: don’t send object if cache has up-to-date cached version • no object transmission delay • lower link utilization • Cache: specify date of cached copy in HTTP request: If-modified-since: <date> • Server: response contains no object if cached copy is up to date: HTTP/1.0 304 Not Modified • Case 1, object not modified since <date>: the cache sends an HTTP request message with If-modified-since: <date>; the server replies with HTTP/1.0 304 Not Modified • Case 2, object modified after <date>: the cache sends an HTTP request message with If-modified-since: <date>; the server replies with HTTP/1.0 200 OK <data>
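The server's decision in a conditional GET is a date comparison. The sketch below parses both dates with the JDK's built-in RFC 1123 formatter and returns the status code; it assumes well-formed dates, and ConditionalGet and statusFor are hypothetical names. The two dates are the Last-Modified and Date values from the sample response headers used in the caching slides.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class ConditionalGet {
    // Returns 200 if the object changed after the client's copy was fetched, else 304.
    static int statusFor(String lastModified, String ifModifiedSince) {
        DateTimeFormatter httpDate = DateTimeFormatter.RFC_1123_DATE_TIME;
        ZonedDateTime modified = ZonedDateTime.parse(lastModified, httpDate);
        ZonedDateTime since = ZonedDateTime.parse(ifModifiedSince, httpDate);
        return modified.isAfter(since) ? 200 : 304;
    }

    public static void main(String[] args) {
        // The cached copy is newer than the last modification: nothing to resend.
        System.out.println(statusFor("Mon, 16 Oct 2017 02:28:12 GMT",
                                     "Fri, 20 Oct 2017 13:19:41 GMT"));
    }
}
```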
Web caches (proxy server) 1. The browser: • establishes a TCP connection to the Web cache and • sends an HTTP request for the object to the Web cache. 2. The Web cache: • checks to see if it has a copy of the object stored locally. • If it does: • the Web cache returns the object within an HTTP response message to the client browser. • No request to the origin server is made.
Web caches (proxy server) 3. If the Web cache does not have the object: • The Web cache: • opens a TCP connection to the origin server, • then sends an HTTP request for the object into the cache-to-server TCP connection. • Origin server: • After receiving this request, it sends the object within an HTTP response to the Web cache. • When the Web cache receives the object: • it stores a copy in its local storage and • sends a copy, within an HTTP response message, to the client browser (over the existing TCP connection between the client browser and the Web cache).
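Steps 1 to 3 amount to a lookup with a fallback. The sketch below models the Web cache as an in-memory map and stands in for the origin server with a plain function, so no real HTTP or TCP is involved; it also never expires entries, which a real cache would do. ProxyCache and its members are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ProxyCache {
    private final Map<String, String> store = new HashMap<>();
    private final Function<String, String> origin; // stands in for a request to the origin server
    private int originRequests = 0;

    ProxyCache(Function<String, String> origin) {
        this.origin = origin;
    }

    // Returns the object for the URL, contacting the origin only on a cache miss.
    String get(String url) {
        String cached = store.get(url);
        if (cached != null) {
            return cached;                 // step 2: served from local storage
        }
        originRequests++;
        String object = origin.apply(url); // step 3: fetch from the origin server
        store.put(url, object);            // keep a copy for later requests
        return object;
    }

    int originRequests() {
        return originRequests;
    }

    public static void main(String[] args) {
        ProxyCache cache = new ProxyCache(url -> "object for " + url);
        cache.get("http://www.someschool.edu/someDept/pic.gif");
        cache.get("http://www.someschool.edu/someDept/pic.gif");
        System.out.println("requests sent to origin: " + cache.originRequests());
    }
}
```

The second request for the same URL never reaches the origin, which is the effect the proxy is installed for.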
Web Caches (proxy server) • Note that a Web Cache is both a server and a client at the same time. • When it receives requests from and sends responses to a browser, it is a server. • When it sends requests to and receives responses from an origin server, it is a client. • Web caching has seen deployment in the Internet for two reasons: • First, a Web Cache can substantially reduce the response time for a client request, • Second, Web Caches can substantially reduce traffic on an institution’s access link to the Internet (reducing costs). • Web caches can substantially reduce Web traffic in the Internet as a whole, • thereby improving performance for all applications.
Summary: Web Caching • A cache acts as both client and server: • server for the original requesting client • client to the origin server • Typically the cache is installed by an ISP (university, company, residential ISP) • Why Web caching? • reduce response time for client requests • reduce traffic on an institution’s access link • An Internet dense with caches enables “poor” content providers to effectively deliver content (so too does P2P file sharing)