URLs and Resources
Herng-Yow Chen

Outline
• Navigating the Internet’s resources
• URL syntax, and what the various URL components mean and do
• URL shortcuts that many web clients support: relative URLs and expanded URLs
• URL encoding and character rules
• Common URL schemes
• The future of URLs, including URNs

Navigating a resource by URL
A URL tells a web client three things:
1. URL scheme: how to access the resource
2. Server location: where the resource is hosted
3. Resource path: which particular local resource on the server is being requested

Example (a web page): http://english.csie.ncnu.edu.tw/demo/index.html
• Scheme (how): http
• Host (where): english.csie.ncnu.edu.tw
• Path (what): /demo/index.html

URLs
• URLs can direct you to resources available through protocols other than HTTP.
  ◦ An email account: mailto:hychen@csie.ncnu.edu.tw
  ◦ A file residing on an FTP server: ftp://ftp.ncnu.edu.tw/a_file.txt
  ◦ A video streamed by a video server: rtsp://www.cnn.com/headline.rm
• Most URLs have the same “scheme://server location/path” structure.

URL Syntax
<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
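As a rough illustration of the general syntax, Python's standard `urllib.parse` module can split a URL into these nine components. The URL below is made up for the example: the user, password, port, params, query, and fragment values are assumptions and do not come from the slides.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises every component of the general syntax.
url = ("http://joe:secret@www.joes-hardware.com:8080"
       "/tools/hammers.html;type=d?item=12&color=blue#specs")

parts = urlparse(url)

# Map the slide's component names onto the parsed result.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "params": parts.params,
    "query": parts.query,
    "frag": parts.fragment,
}

for name, value in components.items():
    print(f"{name:>8}: {value}")
```

Note that `urlparse` only separates the params of the last path segment; most real URLs carry just a few of these components.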

Scheme: what protocol to use
• The scheme is the main identifier of how to access a given resource.
• The scheme must start with an alphabetic character, and it is separated from the rest of the URL by the first “:” character.
• Scheme names are case-insensitive.
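The scheme rules above can be sketched in a few lines of Python. The helper name `split_scheme` is hypothetical, and the characters allowed after the first letter (letters, digits, "+", "-", ".") are an assumption taken from the generic URI grammar in RFC 3986 rather than from the slide.

```python
import re

# Assumption (RFC 3986): a letter, followed by letters, digits, "+", "-", or ".".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*$")

def split_scheme(url):
    """Return (scheme, rest); the scheme is lowercased because it is case-insensitive."""
    # The scheme ends at the FIRST ":" in the URL.
    scheme, sep, rest = url.partition(":")
    if not sep or not SCHEME_RE.match(scheme):
        raise ValueError(f"no valid scheme in {url!r}")
    return scheme.lower(), rest

print(split_scheme("HTTP://www.joes-hardware.com/"))
print(split_scheme("mailto:hychen@csie.ncnu.edu.tw"))
```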

Usernames and Passwords
• Many servers require a username and password before you can access data through them. For example:
  ◦ ftp://ftp.prep.ai.mit.edu/pub/gnu
  ◦ ftp://anonymous@ftp.prep.ai.mit.edu/pub/gnu
  ◦ ftp://anonymous:my_passwd@ftp.prep.ai.mit.edu/pub/gnu
  ◦ http://joe:joespasswd@www.joes-hardware.com/sales_info.txt
• The default username and password
  ◦ The default username is “anonymous”.
  ◦ For the default password, Internet Explorer sends “IEUser”, while Netscape sends “mozilla”.

Hosts and Ports
• The host component (an IP address or domain name) identifies the host machine on the Internet that has access to the resource.
• The port component identifies the network port on which the server is listening.
• Different services use different default ports on a machine:
  ◦ HTTP: 80
  ◦ FTP: 21
  ◦ Telnet: 23
  ◦ SMTP: 25
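A minimal sketch of how a client might pick the port to connect to: use the explicit port when the URL has one, otherwise fall back to the default for the service. The function name `effective_port` and the use of lowercase scheme names as dictionary keys are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Default ports from the list above, keyed by scheme name (keys are illustrative).
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23, "smtp": 25}

def effective_port(url):
    """Return the explicit port if present, else the scheme's default (or None)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://www.csie.ncnu.edu.tw/"))      # no port given
print(effective_port("http://www.csie.ncnu.edu.tw:8080/")) # explicit port
print(effective_port("ftp://ftp.ncnu.edu.tw/a_file.txt"))
```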

Paths
• The path component of the URL specifies where on the server machine the resource lives. The path often resembles a hierarchical filesystem path. For example:
  ◦ http://www.csie.ncnu.edu.tw/course/1998.html
  ◦ The path in the URL is “/course/1998.html”, which resembles a filesystem path on a UNIX filesystem.
• The path component for HTTP URLs can be divided into path segments separated by “/”. Each path segment can have its own params component (described later).

Parameters
• For many schemes, a simple host and path to the object just aren’t enough.
• Beyond the port the server is listening on and whether you have access to the resource with a username and password, many protocols require more information to work.
• For example, an FTP URL can carry the transfer type as a parameter:
  ◦ ftp://ftp.ncnu.edu.tw/image.gif;type=a (ASCII transfer)
  ◦ ftp://ftp.ncnu.edu.tw/program.exe;type=i (binary transfer)
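Since each path segment may carry its own params, one way to pull them apart is to split the path on "/" and then split each segment at its first ";". This is only a sketch: the helper name `path_segments` and the second URL (with `sale=false` and `graphics=true`) are hypothetical.

```python
from urllib.parse import urlsplit

def path_segments(url):
    """Split the URL path into (segment, params) pairs, one pair per path segment."""
    path = urlsplit(url).path  # urlsplit leaves ";params" inside the path
    pairs = []
    for segment in path.split("/")[1:]:  # skip the empty string before the leading "/"
        name, _, params = segment.partition(";")
        pairs.append((name, params))
    return pairs

print(path_segments("ftp://ftp.ncnu.edu.tw/image.gif;type=a"))
# Hypothetical URL in which two segments each have their own params.
print(path_segments("http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true"))
```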

Query strings
• Some resources, such as databases, can be queried according to input strings. For example:
  ◦ http://www.xxx.tw/a.cgi?id=123&name=abc
• There is no requirement for the format of the query component, except that some characters are illegal.
• By convention, many gateways expect the query to be formatted as a series of “name=value” pairs, separated by “&” characters.
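The “name=value” convention can be seen by decoding the example query string with Python's standard library. This is a sketch of the convention only; a gateway is free to interpret its query component differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

url = "http://www.xxx.tw/a.cgi?id=123&name=abc"

# Everything after "?" (and before any "#") is the query component.
query = urlsplit(url).query

# Split on "&", then on "=", giving a list of (name, value) pairs.
pairs = parse_qsl(query)
print(pairs)

# Going the other way: build a query string from name/value pairs.
rebuilt = urlencode(pairs)
print(rebuilt)
```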

Query Strings
[Figure: the client requests http://english.csie.ncnu.edu.tw/course/NWSMLViewer.php?lectureid=rctlee-20030909125212 over the Internet; the server passes the query “lectureid=rctlee-20030909125212” to the “viewer” gateway.]

Fragments
• A fragment identifies a finer-grained piece of a resource, such as a section in a large HTML document, so that it can be addressed directly. For example:
  ◦ http://engquiz.csie.ncnu.edu.tw/e-book/html/B001.html#page10
• Because HTTP servers generally deal only with entire objects, not with fragments of objects, clients don’t pass fragments along to servers. That is, the whole object is retrieved, but only part of its content is displayed.
• Note that with the Range Request feature of HTTP/1.1, agents may request byte ranges of objects (covered in later lectures).

Fragments
(a) The user selects a link to “http://www.csie.ncnu.edu.tw/~hychen/web_tech/#Resource”.
(b) The browser makes a request to http://www.csie.ncnu.edu.tw/~hychen/web_tech/ (the fragment is NOT sent to the server).
(c) The server, www.csie.ncnu.edu.tw, returns the entire HTML page.
(d) The browser displays the HTML page, scrolled down to start at the fragment named “Resource”.
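Steps (a) and (b) amount to stripping the fragment before the request is sent and keeping it on the client side for scrolling. A small sketch using `urllib.parse.urldefrag`:

```python
from urllib.parse import urldefrag

link = "http://www.csie.ncnu.edu.tw/~hychen/web_tech/#Resource"

# request_url is what goes to the server; fragment stays in the browser.
request_url, fragment = urldefrag(link)

print("sent to server:", request_url)
print("kept by client:", fragment)
```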

URL shortcuts
• Web clients understand and use a few URL shortcuts.
• Many browsers also support automatic expansion of URLs, where the user can type in a key (memorable) part of a URL, and the browser fills in the rest.
• Topics:
  ◦ Relative URLs
  ◦ Base URLs
  ◦ Resolving relative references
  ◦ Expanded URLs

Relative URLs
• URLs come in two flavors: absolute and relative.
• So far, we have looked only at absolute URLs, which contain all the information you need to access a resource.
• A relative URL, on the other hand, is incomplete. To get all the information needed to access a resource, a relative URL must be interpreted relative to another URL, called its base.

HTML snippet with relative URL

    <HTML>
    <HEAD> <TITLE> Joe’s Tools </TITLE> </HEAD>
    <BODY>
    <H1> Tools page </H1>
    <H2> Hammers </H2>
    <P> Joe’s HARDWARE online has the largest selection of
    <A href="./hammers.html"> hammers </A> on earth.
    </BODY>
    </HTML>

Using a base URL
• Base URL: http://www.joes-hardware.com/tools.html
• Relative URL: ./hammers.html
• New absolute URL: http://www.joes-hardware.com/hammers.html
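The same conversion can be reproduced with `urllib.parse.urljoin`, which resolves a relative reference against a base URL. The first call uses the base and relative URL from the example above; the other two inputs are added here only to show that a bare file name resolves the same way and that an absolute URL passes through unchanged.

```python
from urllib.parse import urljoin

base = "http://www.joes-hardware.com/tools.html"

# The relative URL from the HTML snippet, resolved against the document's URL.
absolute = urljoin(base, "./hammers.html")
print(absolute)

# A bare file name resolves against the same directory as the base.
same_dir = urljoin(base, "hammers.html")
print(same_dir)

# An absolute URL ignores the base entirely.
untouched = urljoin(base, "http://www.csie.ncnu.edu.tw/")
print(untouched)
```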

Base URLs
• The first step in the conversion process is to find a base URL, which can come from a few places.
• Explicitly provided in the resource
  ◦ Use the <BASE> tag to define the base URL.
• Base URL of the encapsulating resource
  ◦ The document does not explicitly specify a base URL.
  ◦ Use the URL of the resource in which the document is embedded as the base, as in the example on the preceding slide.
• No base URL
  ◦ In some instances, there is no base URL. This often means that you have an absolute URL; however, sometimes you just have an incomplete or broken URL.

Resolving relative references
[Figure: resolving relative references]

Expanded URLs
• Some browsers try to expand URLs automatically, either after you submit the URL or while you’re typing. This gives users a shortcut: they don’t have to type in the complete URL.
• Hostname expansion
  ◦ Ex: “yahoo” expands to www.yahoo.com
• History expansion
  ◦ Ex: http://www.ncnu.edu.tw (completed from a previously visited URL)
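The two kinds of expansion could be approximated as follows. This is a toy sketch, not how any particular browser works: the function names, the "www." and ".com" guess, the prefix-matching rule, and the sample history list are all assumptions.

```python
def expand_hostname(typed):
    """Guess a full hostname from a bare site name such as "yahoo"."""
    if "." in typed or "/" in typed or ":" in typed:
        return typed  # already looks like a hostname or URL; leave it alone
    return f"www.{typed}.com"

def expand_from_history(typed, history):
    """Return the first previously visited URL that starts with the typed text."""
    for url in history:
        if url.startswith(typed):
            return url
    return None  # nothing in the history matches

# Hypothetical browsing history.
history = ["http://www.ncnu.edu.tw", "http://www.csie.ncnu.edu.tw/course/1998.html"]

print(expand_hostname("yahoo"))
print(expand_from_history("http://www.n", history))
```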

Shady characters in URLs
• URLs were designed to be portable, so that they can uniformly name all the resources on the Internet. This means that URLs will be transmitted through various protocols.
• Because different protocols (schemes) use different transmission mechanisms, it is important that URLs can be transmitted safely, that is, without losing information, through any protocol over the network.
• Some protocols, such as the Simple Mail Transfer Protocol (SMTP) for email, use a 7-bit encoding for messages; this can strip off certain characters if the source is encoded in 8 bits or more.

Shady characters in URLs
• URLs are permitted to contain only characters from a relatively small, universally safe alphabet.
• In addition to being transportable, URLs should be readable. Hence, invisible, nonprinting characters are also prohibited in URLs, even though these characters may pass through mailers.
• To complicate matters further, URLs also need to be complete. People will want URLs to contain binary data or characters outside the universally safe alphabet, so an escape mechanism was added.

The URL Character Set
• US-ASCII is very portable, due to its long legacy. It uses 7 bits to represent most keys available on an English typewriter, plus a few nonprinting control characters for text formatting and hardware signaling.
• However, it doesn’t support the inflected characters common in European languages, or non-Romanic languages.
• Some URLs also need to contain arbitrary binary data.
• Escape sequences allow arbitrary values to be encoded using a restricted subset of the US-ASCII character set, yielding both portability and completeness.

Encoding mechanism
• An unsafe character is represented by an “escape” notation, consisting of a percent sign (%) followed by two hexadecimal digits.
• For example:
  ◦ ~ (0x7E): http://www.ncnu.edu.tw/%7Ehychen
  ◦ Space (0x20): http://www.abc.com/web%20tools.html
  ◦ % (0x25): http://www.abc.com/100%25satisfaction.html
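The escape notation is easy to reproduce by hand: take the character's byte value and write it as "%" plus two hex digits. The helper `percent_encode_char` below is a hypothetical illustration, and encoding characters as UTF-8 bytes is an assumption that only matters outside US-ASCII. The standard library's `quote` and `unquote` do the same job for whole strings.

```python
from urllib.parse import quote, unquote

def percent_encode_char(ch):
    """Escape one character as "%XX" per byte (UTF-8 assumed for non-ASCII)."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))

for ch in ["~", " ", "%"]:
    print(repr(ch), "->", percent_encode_char(ch))

# Library equivalents for the paths in the examples above.
encoded_space = quote("/web tools.html")
encoded_percent = quote("/100%satisfaction.html")
decoded_tilde = unquote("/%7Ehychen")
print(encoded_space, encoded_percent, decoded_tilde)
```

Note that `quote` leaves "~" unescaped in current Python versions because it is treated as a safe, unreserved character; decoding "%7E" still yields "~".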

Character Restrictions

Character(s)         Restriction
%                    Reserved: escape token
/                    Reserved: path delimiter
. and ..             Reserved: path component
#                    Reserved: fragment delimiter
?                    Reserved: query-string delimiter
;                    Reserved: params delimiter
:                    Reserved: delimits the scheme, user/password, and host/port
$ , +                Reserved
@ & =                Reserved: special meaning in some schemes
{ } | ^ ~ [ ] ’      Restricted: unsafe handling by various transport agents, such as gateways
< > ”                Unsafe; should be encoded: these have meaning outside the scope of the URL
0x00-0x1F, 0x7F      Restricted: fall within the nonprintable range of US-ASCII
> 0x7F               Restricted: do not fall within the 7-bit range of US-ASCII

Common scheme formats
• http, https
• mailto
• ftp
• rtsp, rtspu
• file
• news
• telnet

The Future: URN?
STEP 1: Ask the resource resolver what the Joe’s Hardware URL is.
• The client sends “GET http://purl.oclc.org/jhardware/” over the Internet to purl.oclc.org.
• The client receives from the resolver the current location of the resource: http://www.joes-hardware.com/
STEP 2: Get the actual URL for the resource.
• The client sends “GET http://www.joes-hardware.com” over the Internet to www.joes-hardware.com.

URI: Universal Resource Identifier
• URIs are defined in RFC 1630 (1994). URI is a superset of URL and URN.
• Full URI: proto://hostname/path
  ◦ Ex: http://www.csie.ncnu.edu.tw:80/~hychen/ (identifies the server)
• Partial URI: /path
  ◦ Ex: /~hychen/ (no server mentioned)

URLs information
• http://www.w3.org/Addressing/
  ◦ The W3C page about naming and addressing URIs and URLs.
• http://www.ietf.org/rfc1738.txt
  ◦ RFC 1738, “Uniform Resource Locators (URL),” by T. Berners-Lee, L. Masinter, and M. McCahill.
• http://www.ietf.org/rfc2396.txt
  ◦ RFC 2396, “Uniform Resource Identifiers (URI): Generic Syntax,” by T. Berners-Lee, R. Fielding, and L. Masinter.
• http://www.ietf.org/rfc2141.txt
  ◦ RFC 2141, “URN Syntax,” by R. Moats.
• http://purl.oclc.org
  ◦ The persistent uniform resource locator web site.
• http://www.ietf.org/rfc1808.txt
  ◦ RFC 1808, “Relative Uniform Resource Locators,” by R. Fielding.