
The Internet CT 101 – Computing Systems

Contents • The Internet – Architecture – Addressing – Protocols – DNS – E-Mail – WWW – Security

The Internet • The Internet: An internet that spans the world – Original goal was to develop a means of connecting networks that would not be disrupted by local disasters. – Today it has shifted from an academic research project to a commercial undertaking.

Internet Architecture • Internet Service Provider (ISP) – Tier-1 – Tier-2 – Tier-1 and tier-2 ISPs are networks of routers that collectively provide the Internet’s communication infrastructure • Access ISP: Provides connectivity to the Internet – Traditional telephone (dial-up connection) – Cable connections – DSL – Wireless

Internet Composition

Internet Addressing • IP address: pattern of 32 bits (IPv4) or 128 bits (IPv6); 32-bit addresses are often represented in dotted decimal notation – E.g. 140.203.8.22 represents an Internet address – 10001100 11001011 00001000 00010110 (the same address as a 32-bit pattern) • Mnemonic address (alternative addressing system more suitable for humans) – Domain names – Top-Level Domains
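
The relationship between dotted decimal notation and the underlying bit pattern can be checked with a short Python sketch. The function name to_bit_pattern is made up for this illustration; ipaddress is part of the Python standard library.

```python
import ipaddress

def to_bit_pattern(dotted: str) -> str:
    """Return the 32-bit pattern of an IPv4 address, one group per byte."""
    address = ipaddress.IPv4Address(dotted)   # validates the dotted decimal text
    bits = format(int(address), "032b")       # 32 binary digits, zero padded
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))

print(to_bit_pattern("140.203.8.22"))
# 10001100 11001011 00001000 00010110
```

Each decimal number in the dotted notation is simply one byte of the address written in base 10.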

Package-shipping example

Internet Software Layers • Application: Constructs message with address • Transport: Chops message into packets • Network: Handles routing through the Internet • Link: Handles actual transmission of packets

Following a message through the Internet

Internet Protocols - TCP/IP Protocol Suite • Transport Layer – TCP – UDP • Network Layer – IP (IPv4 and IPv6)

Choosing between TCP and UDP

DNS • IP addresses are difficult to remember – John.Doe@140.203.8.22 would be difficult to remember – If John.Doe’s mail server moves to another machine, then his e-mail address would no longer be valid – Something like John.Doe@wuzwuz.ucg.ie would be more appropriate • A mechanism is needed to translate wuzwuz.ucg.ie to the IP address – DNS was invented to solve this problem • It is a hierarchical, domain-based naming scheme and a distributed database system for implementing this naming scheme • Usage: – To map a name onto an IP address, an application program calls a library procedure, called a resolver, passing it the name as a parameter (e.g. gethostbyname() is a resolver) – The resolver sends a UDP packet to a local DNS server, which looks up the name and returns the IP address to the resolver – The resolver returns the IP address to the application, which can then establish a TCP connection with the destination (or send UDP packets)
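
As a rough illustration of the usage steps above, Python exposes the system resolver through socket.getaddrinfo (a more general relative of gethostbyname()). The wrapper name resolve is an assumption of this sketch, and what it returns depends on the machine and network it runs on.

```python
import socket

def resolve(name: str) -> list[str]:
    """Ask the system resolver for the IP addresses of a host name."""
    results = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address in text form.
    return sorted({info[4][0] for info in results})
```

Calling resolve with a host name hands the name to the resolver, which consults the local DNS server if needed; the application only sees the returned addresses and can then open a TCP connection to one of them.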

DNS Name Space • Internet is divided into over 200 top-level domains – Each domain is divided into sub-domains, which are further partitioned, etc. – All domains can be represented by a tree • The leaves of the tree represent domains that have no sub-domains (but contain machines) • A leaf domain may contain a single host or represent a company and contain thousands of hosts • Top-level domains are either generic domains or country domains

Domain Names • Can be either absolute (ends with a period, e.g. eng.sun.com.) or relative (does not end with a period) – Relative ones have to be interpreted in a context to find the true meaning – Both refer to a specific node in the tree and all the nodes under it • Are case insensitive (edu and EDU are the same thing) • Component names can be up to 63 characters and full names should not exceed 255 characters • There is no rule against registering under two top-level domains (sony.com and sony.nl) • Each domain controls how it allocates the domains under it – e.g. Japan has the domains ac.jp and co.jp that mirror edu and com – The Netherlands doesn’t make this distinction • To create a new domain, permission is required from the domain that will include it; once created, it can create sub-domains without having to ask permission from the domains higher up.
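
The naming rules above (trailing period for absolute names, case insensitivity, 63-character components, 255-character full names) can be expressed as a small check. This is a minimal sketch; the helper names split_labels and is_absolute are invented for the example.

```python
def is_absolute(name: str) -> bool:
    """An absolute domain name ends with a period."""
    return name.endswith(".")

def split_labels(name: str) -> list[str]:
    """Split a domain name into lower-cased components, enforcing the length limits."""
    if len(name) > 255:
        raise ValueError("full name exceeds 255 characters")
    # Drop the trailing period of an absolute name before splitting.
    bare = name[:-1] if is_absolute(name) else name
    labels = bare.lower().split(".")
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError(f"component {label!r} must be 1 to 63 characters long")
    return labels

print(split_labels("ENG.Sun.COM."))   # ['eng', 'sun', 'com']
print(is_absolute("eng.sun.com"))     # False
```

Lower-casing before comparison is what makes edu and EDU refer to the same node.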

Name servers • In theory, one DNS server could service all requests – In practice it would be overloaded • To solve this, the DNS name space is divided into non-overlapping zones – Each zone contains some part of the tree and the name servers holding the zone info • A zone has one primary DNS server (gets its info from disk) • and one or more secondary DNS servers (get their info from the primary DNS server)

Name Servers – Lookup mechanism • In the example, a resolver on flits.cs.vu.nl is looking for the IP address of linda.cs.yale.edu (using a recursive query; some servers don’t implement recursive queries and instead return the address of the next server to try) – The resolver sends a query containing the domain name sought – The query is forwarded by the local name server to the name server for the edu domain, whose address is in its database, and so on • Once the records get back to the cs.vu.nl name server, they are entered in a local cache, in case they are needed later
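
The recursive lookup and the local cache can be imitated with a toy in-memory model. Everything here is hypothetical: the server labels, the ZONES table, and the address 192.0.2.10 (taken from a range reserved for documentation) are placeholders, not real DNS data.

```python
# Each toy server maps a domain suffix either to the next server to ask
# or to a final address record.
ZONES = {
    "cs.vu.nl-server": {"edu": "edu-server"},
    "edu-server": {"yale.edu": "yale-server"},
    "yale-server": {"cs.yale.edu": "cs-yale-server"},
    "cs-yale-server": {"linda.cs.yale.edu": "192.0.2.10"},
}

def query(server, name, trace):
    """Recursive query: the server answers, or asks the next server itself."""
    trace.append(server)
    for suffix, target in ZONES[server].items():
        if name == suffix or name.endswith("." + suffix):
            if target in ZONES:
                return query(target, name, trace)   # forward the query
            return target                           # final answer
    raise LookupError(name)

CACHE = {}

def cached_lookup(name, trace):
    """Local name server: consult the cache before sending any query."""
    if name not in CACHE:
        CACHE[name] = query("cs.vu.nl-server", name, trace)
    return CACHE[name]
```

The first lookup of linda.cs.yale.edu visits four toy servers in turn; a second lookup of the same name is answered from the cache and visits none.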

Internet Corporation for Assigned Names & Numbers (ICANN) • Allocates IP addresses to ISPs who then assign those addresses within their regions. • Oversees the registration of domains and domain names.

Traditional Internet Applications • Electronic Mail (email) – Domain mail server collects incoming mail and transmits outgoing mail – Mail server delivers collected incoming mail to clients via POP3 or IMAP • File Transfer Protocol (FTP) • Telnet and SSH • WWW (World Wide Web)

More Recent Applications • Voice over IP (VoIP) • Internet Radio – N-unicast – Multicast

E-Mail • Architecture and services • User agent • Message formats • Message transfer agents – SMTP • Final delivery

E-Mail Architecture • E-mail system consists of two parts – User agents, which allow people to read and send e-mail • Local programs that provide a command-based or graphical method for interacting with the e-mail system – Message transfer agents, which move the messages from source to destination • Typically system daemons or processes that run in the background, with the job of moving messages

E-Mail functions • E-mail system functions – Composition – refers to the process of creating messages and answers; although any text editor can be used for the text of the message, the system itself can provide assistance with addressing and numerous header fields attached to each message – Transfer – refers to moving messages from the originator to the recipient; this requires establishing a connection to the destination or some intermediate machine, outputting the message and releasing the connection

E-mail functions • E-mail system functions – Reporting – has to do with telling the originator what happened to the message: Was it delivered? Was it rejected? Was it lost? – Displaying – showing the incoming message is important, so that people can read their e-mail; sometimes conversion or a special viewer is required (e.g. if the message is a PS message or an audio file) – Disposition – what the recipient does after the message has been received; possibilities include throwing it away before reading it, throwing it away after reading it, saving it and so on.

E-mail architecture and functions • Distinction between envelope and its contents; – envelope encapsulates the message and contains info needed for transporting the message, such as destination address, priority and security level – Message has two parts: headers (interpreted by the user agent) and body (info for the human recipient)

E-mail user agent • Sending e-mail – User must provide the message and the destination address (user@dns-address) – User agents may support mailing lists • Receiving e-mail – When a user agent is started, it looks at the user’s mailbox before displaying anything – Then it may announce the number of messages in the mailbox

E-mail message format • Basic ASCII e-mail message using RFC 822 – Messages consist of a primitive envelope (described in RFC 821), some number of header fields, a blank line and then the message body – Each header field (logically) consists of a single line of ASCII text containing the field name, a colon and, for most fields, a value – RFC 822 was designed long ago and doesn’t clearly distinguish between the envelope fields and the header fields • This was revised in RFC 2822; however, it wasn’t possible to completely redo it due to its widespread usage
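
The header-fields, blank-line, body layout can be seen by parsing a small message with Python's standard email package. The addresses and text below are invented for the illustration.

```python
from email import message_from_string

# Header fields, then a blank line, then the body.
RAW_MESSAGE = (
    "From: elinor@example.org\n"
    "To: carolyn@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "Are you free at noon?\n"
)

message = message_from_string(RAW_MESSAGE)
for field_name, value in message.items():
    print(f"{field_name} -> {value}")     # one line per header field
print(message.get_payload())              # the body, meant for the human recipient
```

The parser splits each header line at the first colon, and header lookups such as message["subject"] ignore case.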

Email message transfer • The message transfer mechanism is concerned with relaying messages from the originator to the destination – This can be done by establishing a transport-level connection between the source and the destination and then simply transferring the message • SMTP – Simple Mail Transfer Protocol – The source machine establishes a TCP connection to port 25 on the destination machine, where an SMTP daemon listens. This daemon accepts the incoming connections and copies messages from them into the appropriate mailboxes – If a message can’t be delivered, an error report containing the first part of the undeliverable message is returned to the sender – It is a simple ASCII protocol

SMTP Protocol • Connection establishment (on port 25) • Data exchange – the sending machine (operating as a client) waits for the destination machine (operating as a server) to talk first; – the server begins by sending a line of text giving its identity and telling whether it is prepared to receive mail; • if it is not, the client releases the connection and tries again later – If the server is willing to accept mail, the client announces whom the e-mail is coming from and whom it is going to – If such a recipient exists at the server end, the client gets the go-ahead to send the message – The client sends the message and the server acknowledges it • Connection is released
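
The exchange above can be traced with a toy, in-memory stand-in for an SMTP server. No network is involved; the class ToySmtpServer, the function send_mail and the example addresses are assumptions of this sketch. Only the command verbs and the three-digit reply codes (220, 250, 354, 550, 221) follow real SMTP, and the reply texts are made up.

```python
class ToySmtpServer:
    """In-memory imitation of the server side of the dialogue."""

    def __init__(self, domain, mailboxes):
        self.domain = domain
        self.mailboxes = {box: [] for box in mailboxes}
        self.sender = None
        self.recipient = None

    def greeting(self):
        return f"220 {self.domain} ready"            # the server talks first

    def command(self, line):
        verb, _, argument = line.partition(" ")
        if verb == "HELO":
            return f"250 {self.domain} hello {argument}"
        if verb == "MAIL":                           # MAIL FROM:<address>
            self.sender = argument[len("FROM:"):].strip("<>")
            return "250 sender ok"
        if verb == "RCPT":                           # RCPT TO:<address>
            box = argument[len("TO:"):].strip("<>")
            if box not in self.mailboxes:
                return "550 no such mailbox"
            self.recipient = box
            return "250 recipient ok"
        if verb == "DATA":
            return "354 send the message"
        if verb == "QUIT":
            return "221 closing connection"
        return "500 command not recognized"

    def message(self, text):
        self.mailboxes[self.recipient].append((self.sender, text))
        return "250 message accepted"


def send_mail(server, sender, recipient, text):
    """Client side: wait for the greeting, announce sender and recipient, send."""
    transcript = [server.greeting()]
    opening = ("HELO client.example", f"MAIL FROM:<{sender}>", f"RCPT TO:<{recipient}>")
    for line in opening:
        reply = server.command(line)
        transcript.append(reply)
        if not reply.startswith("250"):              # refused: give up politely
            transcript.append(server.command("QUIT"))
            return transcript
    transcript.append(server.command("DATA"))
    transcript.append(server.message(text))
    transcript.append(server.command("QUIT"))
    return transcript
```

A successful delivery produces the reply sequence 220, 250, 250, 250, 354, 250, 221; an unknown recipient stops the exchange at the 550 reply.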

SMTP typical problems • Some old implementations can’t handle messages longer than 64 KB • If the server and client have different timeouts, one of them may give up while the other is still busy, unexpectedly terminating the connection • In some situations infinite mail storms can be triggered – If host 1 holds mailing list A and host 2 holds mailing list B and each list contains an entry for the other one, then a message sent to either list could generate a never-ending amount of e-mail traffic unless it is checked • RFC 2821 defines ESMTP (Extended SMTP) – Clients wanting to use it should start with EHLO instead of HELO; if this is rejected, then the server is a regular SMTP server

Final delivery • Assuming that all machines can send and receive mail all the time, the e-mail model so far works • This model breaks down for people accessing the Internet over a dial-up connection – What happens when Elinor wants to send Carolyn e-mail and Carolyn is not currently online? • One solution is to have a message transfer agent on an ISP machine; since this transfer agent can be online all the time, e-mail can be sent to it 24 hours a day – This solution creates another problem: how does the user get e-mail from the ISP’s message transfer agent? • The solution is to create another protocol that allows user transfer agents (on client PCs) to contact the message transfer agent (on the ISP’s machine) and allows e-mail to be copied from the ISP to the user • One such protocol is POP3 (Post Office Protocol Version 3), RFC 1939

Final delivery (a) Sending and reading mail when the receiver has a permanent Internet connection and the user agent runs on the same machine as the message transfer agent. (b) Reading e-mail when the receiver has a dial-up connection to an ISP.

POP3 • Starts when the user starts the mail reader • Mail reader calls up the ISP (if there is no connection) and establishes a TCP connection with the message transfer agent on port 110; the protocol then goes through three states in sequence: – Authorization • The user logs in by sending a username and password – Transactions • The user collects the e-mails and marks them for deletion – Update • Causes the marked e-mails to be deleted
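
The three states can be modelled as a small state machine. This is a sketch under stated assumptions: ToyPop3Server and its method names are hypothetical, nothing is sent over a network, and only the +OK and -ERR status indicators are taken from the real protocol.

```python
class ToyPop3Server:
    """In-memory imitation of a POP3 session: authorization, transaction, update."""

    def __init__(self, accounts, mailbox):
        self.accounts = accounts        # {username: password}
        self.mailbox = mailbox          # list of message texts held by the ISP
        self.state = "AUTHORIZATION"
        self.marked = set()             # message numbers marked for deletion

    def login(self, user, password):
        if self.state == "AUTHORIZATION" and self.accounts.get(user) == password:
            self.state = "TRANSACTION"
            return "+OK"
        return "-ERR"

    def _valid(self, number):
        return self.state == "TRANSACTION" and 1 <= number <= len(self.mailbox)

    def retr(self, number):
        """Fetch a message (messages are numbered from 1)."""
        return self.mailbox[number - 1] if self._valid(number) else "-ERR"

    def dele(self, number):
        """Mark a message for deletion; nothing is removed yet."""
        if not self._valid(number):
            return "-ERR"
        self.marked.add(number)
        return "+OK"

    def quit(self):
        """Enter the update state, where marked messages are actually deleted."""
        if self.state == "TRANSACTION":
            self.state = "UPDATE"
            self.mailbox[:] = [text for number, text in enumerate(self.mailbox, start=1)
                               if number not in self.marked]
        return "+OK"
```

Note that deletion is deferred: a message marked during the transaction state only disappears once the session ends cleanly with quit.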

IMAP • POP3 works fine for users with one e-mail account with one ISP, accessed from one PC – If mail is accessed from different locations, the user may lose e-mails, security issues may appear, etc. • An alternative final delivery protocol is IMAP (Internet Message Access Protocol), defined in RFC 2060 – Instead of assuming that all messages will be downloaded and the user will work offline after that (like POP3), IMAP assumes that all e-mail will remain on the server indefinitely in multiple mailboxes – Provides extensive mechanisms to read messages or parts of messages, and mechanisms to create, destroy and manipulate multiple mailboxes.

POP3 vs. IMAP

Web Mail • Various companies (e.g. Hotmail and Yahoo) provide e-mail service using Web mail. • Normal message transfer agents listen on port 25 for incoming SMTP connections • Messages are delivered to the user through special web pages; when the user goes to the e-mail Web page, a form is presented in which the user is asked for a login name and password.

WWW • Architectural Overview • Static Web Documents • Dynamic Web Documents • HTTP – The HyperText Transfer Protocol • Performance Enhancements • The Wireless Web

Architecture Overview • Web is a collection of web pages • Each page contains links to other pages • Hypertext – idea of having one page point to another. It is text, displayed on a computer, with references (hyperlinks) to other text that the reader can immediately follow • Browser – program to view pages • Hyperlinks – strings of text that are links to other pages • Example: – Typical web page – The page reached by clicking on Department of Animal Psychology.

Architectural Overview • Browser displays a page on the client machine – When a link is clicked, the browser sends a message to the abcd.com web server asking it for the page – When the page arrives, it is displayed; if it contains a hyperlink to a page on xyz.com and that link is clicked, the browser sends a message to the xyz.com server and the process continues

Client side • Pages are named using URLs (Uniform Resource Locators) (e.g. http://www.abcd.com/products.html) – Name of protocol (http) – DNS name of the machine where the page is located (www.abcd.com) – The name of the file containing the page (products.html) • When the link is selected: – Browser determines the URL (by reading the input) – Browser asks DNS server for IP address of www.abcd.com – DNS replies with IP address: 156.106.192.32 – Browser makes a TCP connection on port 80 to 156.106.192.32 – It sends a request asking for file /products.html – Web server www.abcd.com sends file /products.html – TCP connection is released – Browser displays all the text in /products.html • Web pages are written in a standard language called HTML – A page may also consist of a formatted document in PDF format, an icon in GIF format, a video in MPEG format, a song in MP3 format, or any other format
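
The three parts of a URL listed above can be pulled apart with Python's standard urllib.parse module. The function name split_url is made up for this sketch, and the default port of 80 applies to the http protocol only.

```python
from urllib.parse import urlsplit

def split_url(url: str):
    """Return (protocol, DNS name, port, file path) for an http URL."""
    parts = urlsplit(url)
    port = parts.port or 80          # no explicit port in the URL: use port 80
    return parts.scheme, parts.hostname, port, parts.path

print(split_url("http://www.abcd.com/products.html"))
# ('http', 'www.abcd.com', 80, '/products.html')
```

The DNS name is what the browser hands to the resolver, and the path is what it asks the web server for once the TCP connection is open.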

Client side • The browser may have problems interpreting all of these formats; rather than making browsers larger and larger, a more general solution is adopted. • When a server returns a page, it usually returns some information about the page – MIME type of the page • Pages of type text/html are just displayed directly • If the MIME type is not one of the built-in types, the browser consults an internal table of associations between MIME types and viewers • Two possibilities – Plug-ins – special modules that the browser loads into its memory space – Helper applications – separate processes that take the name of the file to display as a parameter
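
A minimal sketch of the MIME-type table idea follows. The VIEWERS table and the handler names in it are hypothetical; mimetypes.guess_type is a standard-library function that guesses a type from a file name, roughly what a server might do before labelling a response.

```python
import mimetypes

# Hypothetical browser table: MIME type -> viewer (plug-in or helper application).
VIEWERS = {
    "application/pdf": "pdf plug-in",
    "audio/mpeg": "audio helper application",
}

def choose_handler(mime_type):
    """Decide how to show content of the given MIME type."""
    if mime_type == "text/html":
        return "display directly"                 # built-in type
    return VIEWERS.get(mime_type, "no viewer registered")

mime_type, _encoding = mimetypes.guess_type("products.html")
print(mime_type, "->", choose_handler(mime_type))
```

Adding support for a new format then means adding a table entry, not changing the browser itself.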

Server side • Typical web server operations: – Accept TCP connection – Get the name of the file requested – Get the file (from disk) – this can be a lengthy operation, since every disk access takes on average 5 ms (access time) plus the time to read the file (which depends on the file length); – Return the file to the client – Release the TCP connection • Improvements – Maintain a cache of the n most recently used files – Multithreaded server
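
One possible reading of the cache improvement is a least-recently-used policy, sketched below. The class name FileCache and the load callback are assumptions of this example; the slide does not say which replacement policy a real server uses.

```python
from collections import OrderedDict

class FileCache:
    """Keep the n most recently used files in memory."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load                  # function that reads a file from disk
        self.entries = OrderedDict()      # oldest entry first

    def get(self, name):
        if name in self.entries:
            self.entries.move_to_end(name)        # now the most recently used
            return self.entries[name]
        content = self.load(name)                 # slow path: disk access
        self.entries[name] = content
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict the least recently used
        return content
```

A request for a cached file skips the disk access entirely, which is where the saving comes from.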

Server side • If too many requests come in each second, the CPU will not be able to manage the load, no matter how many disks are used in parallel – The solution is to add more nodes (computers), possibly with replicated disks (server farms) – A front end still accepts incoming requests and “sprays” them over multiple CPUs rather than multiple threads – Individual machines may be multithreaded and pipelined as before

URL – Uniform Resource Locator

URL – Uniform Resource Locator • Some common URLs

Stateless and cookies • The Web is stateless, with no concept of a login session; the browser sends a request to the server and gets back a file; the server then forgets it has ever seen that particular client – A quick solution would be to track clients’ IP addresses (not good, since they could be NAT-ed) • Cookies (Netscape) solve this problem by having the server supply additional information when a client requests a page; this info may include a cookie, which is a small (at most 4 KB) file or string – Contains up to five fields – When the browser sends a request for a page, it first checks whether it has a cookie associated with the domain the request is going to. If so, it appends this cookie to the request; the server gets it and interprets it any way it wants
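
The cookie round trip can be sketched with Python's standard http.cookies module. The ToyBrowser class, the helper make_set_cookie, and the cookie name and value are invented for the illustration; a real browser's cookie handling has many more rules (paths, expiry, security flags).

```python
from http.cookies import SimpleCookie

def make_set_cookie(name, value, domain):
    """Server side: build the header line that hands a cookie to the browser."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    return cookie.output()                 # a single "Set-Cookie: ..." line

class ToyBrowser:
    """Client side: remember cookies per domain and send them back."""

    def __init__(self):
        self.jar = {}                      # domain -> {cookie name: value}

    def receive(self, domain, set_cookie_line):
        parsed = SimpleCookie()
        parsed.load(set_cookie_line.split(":", 1)[1].strip())
        for name, morsel in parsed.items():
            self.jar.setdefault(domain, {})[name] = morsel.value

    def cookie_header(self, domain):
        """Header to append to a request for this domain, or None."""
        cookies = self.jar.get(domain)
        if not cookies:
            return None
        return "Cookie: " + "; ".join(f"{n}={v}" for n, v in cookies.items())
```

Because the browser returns the cookie with every later request to the same domain, the server can recognise the client even though each request is otherwise independent.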

HTML – HyperText Markup Language • (a) HTML source code – Markup language containing explicit formatting commands • (b) Formatted page

Hypertext Document Format • Encoded as text file • Contains tags to communicate with browser – Appearance • <h1> to start a level one heading • <p> to start a new paragraph – Links to other documents and content • <a href=...> – Insert images • <img src=...>

A simple Web page

A simple Web page (continued)

An enhanced simple Web page

An enhanced simple Web page (continued)

Typical HTML tags

HTML versions • HTML 1.0 was one-way • Large demand for two-way traffic – HTML 2.0 included forms, which contain boxes or buttons that allow users to fill in information or make choices and send the information back to the server

Forms • (a) HTML source code (containing forms) • (b) Formatted page • Possible response from the user (request from the browser to the server, with filled-in values)

Dynamic Web documents • So far, the model we have used is a static one – A client requests a file, the server sends it back • Content has become more and more dynamic (generated on demand) – Content generation can take place at both server side and client side.

Server side dynamic web page • Steps in processing an HTML form

PHP form processing example (a) A Web page containing a form. (b) A PHP script for handling the output of the form. (c) Output from the PHP script when the inputs are "Barbara" and 24 respectively.
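
The way a browser packs filled-in form values into a request, and the way a server-side script unpacks them, can be sketched with Python's urllib.parse. The field names name and age are assumptions of this example; the original form and PHP script are not reproduced here.

```python
from urllib.parse import urlencode, parse_qs

# Browser side: encode the filled-in values as name=value pairs.
form_data = urlencode({"name": "Barbara", "age": 24})
print(form_data)            # name=Barbara&age=24

# Server side: decode the pairs again before generating the reply page.
fields = parse_qs(form_data)
print(fields["name"][0], fields["age"][0])
```

Every decoded value arrives as text, so a script that needs the age as a number has to convert it itself.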

Client-Side Web Page Generation • CGI, PHP, JSP and ASP solve the problem of handling forms and interaction from forms – They can accept incoming information from forms, look up info in one or more databases, and generate HTML pages with the results – They can’t respond to mouse movements or interact directly with users • Client-side web page generation – Requires scripts embedded in HTML pages that are executed on the client’s machine; starting with HTML 4.0, such scripts are permitted using the <script> tag • JavaScript is one popular client-side scripting language – Use of applets (small Java programs compiled for the JVM) • They can be embedded into the HTML code between <applet> … </applet> tags and can be interpreted by JVM-capable browsers – Use of ActiveX controls (compiled for the native processor, very fast)

JavaScript used for processing a form • Same output as the PHP example

Dynamic web page generation (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript.

HTTP • HyperText Transfer Protocol is the transfer protocol used throughout the World Wide Web • It specifies what messages clients may send to servers and what responses they get back in return • Each interaction consists of one ASCII request followed by one RFC 822 MIME-like response – All clients and all servers should obey this protocol – It is defined in RFC 2616 • Connections – A browser contacts a server by making a TCP connection on port 80 • In HTTP 1.0, after a connection was set up, one request was sent over and one response was sent back; then the TCP connection was released • In HTTP 1.1 persistent connections are supported
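
The ASCII request and the header-style response can be illustrated without any network access. The helper names build_request and parse_response are made up for this sketch, and the parser is deliberately simplified (it assumes one "Name: value" pair per header line).

```python
def build_request(path, host, version="1.0"):
    """Build the text of a GET request; lines end with CR LF."""
    lines = [f"GET {path} HTTP/{version}", f"Host: {host}"]
    return "\r\n".join(lines) + "\r\n\r\n"       # blank line ends the request

def parse_response(raw):
    """Split a response into status code, header fields and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), headers, body

print(build_request("/products.html", "www.abcd.com"))
```

As in an e-mail message, a blank line separates the header fields from the body, which is why the response is described as RFC 822 MIME-like.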

Internet Security • Attacks – Malware (viruses, worms, Trojan horses, spyware, phishing software) – Denial of service – Spam • Protection – Firewalls – Spam filters – Proxy Servers – Antivirus software

Attacks • Virus – software that infects a computer by inserting itself into programs that already reside in the machine. When the host program is executed, the virus is also executed. • Worm – autonomous program that transfers itself through a network, taking up residence in computers and forwarding copies of itself to other computers. • Trojan horse – program that enters a computer system disguised as a desirable program (such as a game or utility package) that is willingly imported by a victim. Once in the computer, the Trojan horse performs additional activities (most likely with harmful effects). • Spyware (sniffing) – software that collects information about the activities of the computer on which it resides and sends that information back to the instigator of the attack. • Phishing – rather than obtaining the information in secret, this technique asks for it explicitly. • DoS (Denial of Service) – attack from other machines in the network that overloads the victim computer with messages. • Spam – abundance of unwanted messages (e-mail)

Protection • Firewall – primary prevention technique to filter traffic passing through a point in the network (e.g. installed on the router/gateway into an organization). Also used to protect individual computers (rather than an entire network) • Spam filters – special-purpose firewalls, designed to block unwanted e-mail • Proxy server – software unit that acts as an intermediary between a client and a server with the goal of shielding the client from adverse actions of the server. This prevents the server from finding out certain details about the client. • Antivirus software – software used to detect and remove known viruses and other infections, specialized in virus control and spyware protection.

Encryption • FTPS, HTTPS, SSL • Public-key Encryption – Public key: Used to encrypt messages – Private key: Used to decrypt messages • Certificates and Digital Signatures

Encryption • Many traditional Internet applications have been altered to incorporate encryption techniques, producing what are called secure versions. – FTPS – secure version of FTP – HTTPS – secure version of HTTP using SSL (Secure Sockets Layer) • They involve public key encryption – One public key is used to encrypt messages – One private key is used to decrypt messages – Public key is first distributed to those that need to send messages in confidence to a particular destination.

Public-key encryption
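
The public-key idea can be tried out with RSA, one well-known public-key scheme, using textbook-sized numbers. This is only an illustration: the primes below are far too small to be secure, and the slides do not say which algorithm SSL or HTTPS uses in any given case.

```python
# Key generation with tiny primes (real keys use primes hundreds of digits long).
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, chosen coprime with phi
d = pow(e, -1, phi)            # private exponent (needs Python 3.8 or later)

def encrypt(message: int) -> int:
    """Anyone holding the public key (n, e) can encrypt."""
    return pow(message, e, n)

def decrypt(ciphertext: int) -> int:
    """Only the holder of the private key (n, d) can decrypt."""
    return pow(ciphertext, d, n)

secret = 65
assert decrypt(encrypt(secret)) == secret
```

The public key can be handed out freely because knowing n and e does not, in practice, reveal d when the primes are large.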

References • Andrew S. Tanenbaum, “Computer Networks”, ISBN: 0-13-066102-3 • J. Glenn Brookshear, “Computer Science – An Overview”, ISBN: 0-321-54428-5