
Operating Systems, Networks, Distributed computing

Simple task – email to your uncle • Use my modem to connect to my Internet Service Provider (ISP) • My ISP decides whether the email is in its local area network (LAN) or in the wide area network (WAN) – the internet • Handoff (possibly via internet) to his ISP • His ISP delivers mail to him

Courtesy of http://home.about.com/

The Internet • How does it work • Where did it come from • How has it grown • Why it should/shouldn’t work • Who owns the internet

Levels of the internet 1. Physical 2. Data Link 3. Network 4. Transport 5. Session 6. Presentation 7. Application

Levels of the internet 1. Physical Basic hardware components for networks: the wire from modem to machine, …. 2. Data Link Format of frames (how data is put together), e.g. bit/byte stuffing, checksums 3. Network Address assignment and packet forwarding methods. ARP (Address Resolution Protocol) is used to map an IP address to a hardware address. 4. Transport TCP (Transmission Control Protocol) is the protocol for transferring packets of data to a remote computer. Lost packets are retransmitted, so data arrives complete and in the correct order.

Levels of the internet 5. Session Establishing a communication session, security, authentication (e.g. passwords) 6. Presentation Computers represent data in different ways (char, integer), so the protocol needs to translate the data to and from the local node's representation. 7. Application Specifications for applications using the network: how to send a request, how to specify a filename over the net, how to respond to a request, etc.

Levels of the Internet • Physical – Wires, etc. • Data Link – Format of data • Network – ARP to find address • Transport – TCP/IP • Session – Attaching • Presentation – Format of data • Application – Telnet, WWW, SMTP, …
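The data link level mentions byte stuffing and checksums. A minimal sketch of both follows. The flag and escape byte values and the XOR mask are borrowed from PPP-style framing and are assumptions for illustration only; the function names are hypothetical, and the checksum is a deliberately simple sum modulo 256 rather than anything a real link layer uses.

```python
FLAG = 0x7E  # assumed frame delimiter (PPP-style value)
ESC = 0x7D   # assumed escape byte (PPP-style value)
MASK = 0x20  # assumed XOR mask applied to escaped bytes


def stuff(payload: bytes) -> bytes:
    """Wrap payload in FLAG bytes, escaping any FLAG/ESC inside it."""
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out.append(ESC)
            out.append(b ^ MASK)
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)


def unstuff(frame: bytes) -> bytes:
    """Reverse stuff(): drop the delimiters and undo the escapes."""
    out = bytearray()
    body = iter(frame[1:-1])
    for b in body:
        if b == ESC:
            out.append(next(body) ^ MASK)
        else:
            out.append(b)
    return bytes(out)


def checksum(payload: bytes) -> int:
    """Toy checksum: sum of all bytes modulo 256."""
    return sum(payload) % 256
```

The point of stuffing is that the delimiter byte can never appear inside the frame body, so the receiver can always find where a frame ends.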

Different protocols • TCP/IP – The protocols that allow machines to communicate • FTP File Transfer Protocol – FTP enables transferring text and binary files over a TCP connection. • Telnet – Telnet enables users to use a TCP connection to log in to remote hosts. • SMTP Simple Mail Transfer Protocol – This protocol enables sending email messages over a TCP connection. • HTTP Hypertext Transfer Protocol – A protocol used to transfer hypertext pages across the World Wide Web. • SNMP Simple Network Management Protocol – A simple protocol that allows network devices (e.g. routers) to be configured from another host on the LAN (remote control).

TCP/IP

SMTP defined • This protocol is dedicated to sending email messages over a TCP connection. SMTP defines a set of rules that allow two programs to send and receive mail over the network. The protocol defines the data structure that is delivered, with information about the sender, the recipient (or several recipients) and, of course, the mail's body.

SMTP in practice • A machine will have an SMTP port that allows connections from the outside world to come in and speak SMTP. • The SMTP server must understand a small number of commands, sent in this order – MAIL FROM: <sender> • Specifies the sender. – RCPT TO: <recipient> • Specifies the recipient. – DATA • Following text is collected as the message.
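As a rough illustration of the three commands, the sketch below builds the lines a client would send for one message. The function name and the example addresses are hypothetical. The single "." line that ends the DATA section comes from the SMTP standard rather than from these slides, and the greeting (HELO), server replies and QUIT are left out to keep the sketch short.

```python
def smtp_commands(sender, recipients, body):
    """Return the command lines for one message, in the order SMTP expects."""
    lines = [f"MAIL FROM:<{sender}>"]                # who the mail is from
    lines += [f"RCPT TO:<{r}>" for r in recipients]  # one line per recipient
    lines.append("DATA")                             # message text follows
    lines += body.splitlines()
    lines.append(".")                                # lone dot ends the message
    return lines


for line in smtp_commands("me@example.org", ["uncle@example.com"], "Hi\nBye"):
    print(line)
```

No network connection is opened here; the sketch only shows the shape of the conversation.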

HTTP: • URL (Uniform Resource Locator) – establishes link to document – http://www.cs.princeton.edu/courses/archive/spring01/cs111/ • http: gives the protocol • www.cs.princeton.edu name of host holding the document • courses/archive/spring01/cs111 folder chain on the host machine • index.html default file name • HTML (HyperText Markup Language) – Language for writing documents read by HTTP
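The same breakdown of a URL into protocol, host and folder chain can be done with Python's standard library. This is only a sketch of the decomposition; the variable names are chosen for illustration.

```python
from urllib.parse import urlsplit

url = "http://www.cs.princeton.edu/courses/archive/spring01/cs111/"
parts = urlsplit(url)

protocol = parts.scheme    # the part before "://"
host = parts.hostname      # machine holding the document
# The folder chain on the host, with empty segments dropped.
folders = [p for p in parts.path.split("/") if p]

print(protocol, host, folders)
```

The default file name (index.html) does not appear in the URL at all; the server supplies it when the path ends in a folder.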

History of the Internet • 1960’s DoD built ARPANET – Experimental network – Would function even in event of nuclear war • 1985 NSF created NSFNET – Based on ARPANET – National backbone service – Free to educational and research institutions • Corporations (e.g. Sprint, MCI) built networks – Networks linked to NSFNET

Internet, as we know it (cont.) • 1993 InterNIC created by NSF – directory and database services (AT&T) – registration services (Network Solutions Inc.) – information services (General Atomics/CERFnet) • 1989 HTTP created by Tim Berners-Lee – Released to high energy physicists in 1991 – Mosaic in 1993 – Netscape in 1994

Internet – key events • 1961 Kleinrock – packet-switching theory • 1966 Roberts – Plan for ARPANET • 1969 – First ARPANET • With much help from Hobbes' Internet Timeline http://www.zakon.org/robert/internet/timeline Copyright (c) 1993-2000 by Robert H Zakon.

First packets sent by Charley Kline at UCLA as he tried logging into SRI. The first attempt resulted in the system crashing as the letter G of LOGIN was entered. (October 29)

Internet – key events (cont.) • 1973 – Ethernet invented – Email is 75% of ARPANET traffic • 1974 – TCP designed • 1978 – TCP split into TCP and IP • 1980 – First virus

Internet – key events (cont.) • 1983 – First nameserver – Cut over to TCP/IP • 1984 – DNS (Domain Name System) introduced • .com, .edu, … – Number of hosts above 1000 • 1988 – CERT (Computer Emergency Response Team) formed in response to Morris worm

Internet – key events (cont.) • 1989 – Number of hosts above 100,000 • 1990 – ARPANET ceases to exist • 1993 – Mosaic; WWW proliferates at a 341,634% annual growth rate of service traffic • 1994 – 25th anniversary of ARPANET

Growth in number of hosts

Growth in number of domains

Growth in number of networks

WWW Growth

The internet -- Why it should not work • Disorganized – Very heterogeneous – Large dose of anarchy – Grew quickly, continues to grow quickly

The Internet -- Why it does work • TCP/IP • Other protocols – HTTP – SMTP

The Internet -- When it doesn’t work • Worms – Can eat up all available bandwidth • Denial of service attacks – Smurf attack • Aim at a network of many hosts on a LAN • Send a ping to that network from a faked address (address spoofing) • Each host responds to the spoofed machine

Numeracy • Let's look at the scenario to paint a picture of the dangerous nature of this attack. Assume a co-location switched network with 100 hosts, and that the attacker has a T1. The attacker sends, say, a 768 kb/s stream of ICMP echo (ping) packets, with a spoofed source address of the victim, to the broadcast address of the "bounce site". These ping packets hit the bounce site's broadcast network of 100 hosts; each of them takes the packet and responds to it, creating 100 ping replies out-bound. If you multiply the bandwidth, you'll see that 76.8 Mbps is used outbound from the "bounce site" after the traffic is multiplied. This is then sent to the victim (the spoofed source of the originating packets). • Courtesy http://www.pentics.net/denial-of-service/white-papers/smurf.cgi
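The multiplication in the quoted scenario can be checked in a couple of lines. The function name is hypothetical, and the sketch assumes every host on the bounce network replies to every ping and that 1 Mb/s is 1000 kb/s.

```python
def amplified_mbps(attack_kbps: float, hosts: int) -> float:
    """Outbound reply traffic in Mb/s when each host answers each ping."""
    return attack_kbps * hosts / 1000


reply_traffic = amplified_mbps(768, 100)
print(reply_traffic)
```

The attacker's own link carries only the original stream; the amplification comes entirely from the bounce site's hosts.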

Who owns the internet? • Central organizations – InterNIC and its successors (for names) – ISPs for transportation • The networking community – Setting and following standards • IETF (Internet Engineering Task Force) • W3C (World Wide Web Consortium) – Network managers

Who owns the internet? (cont. ) • We all do – Information wants to be free – A lot of good social behavior is required

Networking (conclusion) • At the bottom – Hardware generating packets – Protocols for packet transmission – Collision, backoff, … • In the middle – TCP/IP is the key idea • At the top – Some nice tools • Ping, traceroute, … are useful – Ports on my machine for http, ftp, smtp, ping, fingerd, . . .

A few questions • What does it mean to be connected to the internet? – How is the web different from the internet? • What about security? – Child security in chatroom – Would/should you give a credit card number • What’s next?

The big picture • You want to do more than 1 thing • You want to be part of a larger system • Some problems can be spread across machines - distributed computing • Some data must be spread across machines – distributed data bases

Distributed computing • Key idea – Buying 1000 machines of speed x is significantly cheaper than buying one machine of speed 1000 x • Key challenge – Making the machines work together to get effective speedup. Communication between machines is the challenge. • Approaches – Find problems that can be distributed easily

Distributed problems • Problems that can use decentralized computing – Weather prediction • Weather in a location is most affected by weather nearby – Movie generation • Individual frames can be generated separately – Looking for the origin of the universe • Can be localized like weather prediction – Looking for extra terrestrials

Distributed computers • Scales of distributed computing – Beowulf cluster: hundreds of machines • All dedicated to the task – NOW (Network of Workstations): thousands of machines • Using spare cycles – SETI cluster: millions of machines • Screen saver situation

Beowulf cluster • Machines are dedicated to the network • All machines run similar software • Problem is divided into pieces – Each piece is assigned to a machine in the cluster • Problem pieces should be loosely linked – Computation is faster than communication
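Dividing a problem into loosely linked pieces can be sketched on a single machine. Here a thread pool stands in for the cluster's machines, which is purely an assumption for illustration; the function names are hypothetical, and a real cluster would ship each piece over the network.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_pieces):
    """Cut data into at most n_pieces contiguous chunks of similar size."""
    size = max(1, -(-len(data) // n_pieces))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_on_cluster(data, n_machines, work):
    """Give each piece to one worker and collect the partial results."""
    pieces = split(data, n_machines)
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        return list(pool.map(work, pieces))


# Each "machine" sums its own piece; only the small partial sums are combined.
partials = run_on_cluster(list(range(1000)), 4, sum)
total = sum(partials)
print(partials, total)
```

Only one number per piece travels back to the coordinator, which is the loosely linked case where computation dominates communication.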

NOW (Network of Workstations) • Loosely coupled on a LAN • Workstations do other things at times • When free cycles are available, they’re used • Many more machines than Beowulf but less of each machine

SETI • Telescope at Arecibo collects data • Data is processed in real time by fast machines • But, no one looks for weak signals – Too costly • SETI@Home project built to do this

SETI@Home • Receive data from Arecibo – 35 Gbytes per day by snail mail • Break into Work Units (WUs) – 0.25 Mbyte each, so 140,000 WUs per day • Each WU takes 20 hours to process • Need about 115,000 machines to process a day of data
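The figures on this slide follow from simple arithmetic, sketched below. The calculation assumes 1 Gbyte is 1000 Mbyte and that every machine works on its unit around the clock; under those assumptions the machine count comes out slightly above the rounded figure quoted on the slide.

```python
daily_data_mb = 35 * 1000   # 35 Gbytes per day, assuming 1 Gbyte = 1000 Mbyte
work_unit_mb = 0.25         # size of one Work Unit
hours_per_unit = 20         # processing time for one Work Unit

units_per_day = daily_data_mb / work_unit_mb
machine_hours = units_per_day * hours_per_unit
machines_needed = machine_hours / 24   # machines busy 24 hours a day

print(units_per_day, round(machines_needed))
```

Machines that only contribute spare cycles would each do less than this, so the real number of participants needed is higher.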

SETI@Home • Get individual users to download software • Machine idle and screen saver runs software – Download WU – Compute – When finished send back result • Database at Berkeley reassembles results • Progress to date -- Seti@Home. Stats/

Distributed data bases • Data spread across machines in different ways – Web pages • E.g. HTTP – MP3 collections • E.g. Napster, Gnutella – Auction items • E.g. eBay

Client-Server Model • Central server • Clients store and retrieve data from server – File manager – HTTP:

Napster Model • Server is only used to make connections • Communication is peer-to-peer (P2P) – Client to client transfer without a real server • Sits on top of TCP/IP • Uses proprietary protocol

How Napster works • Initial registration – name, password, local directory for files, … • When client connects (TCP/IP connection) – Client provides list of files it will share – Napster updates its central index of available files • When client asks for a file – Napster gives the client a list of online clients that have the file

How Napster works II • When client asks for download – Napster asks supplier to accept a request – Napster tells client how to contact supplier (IP address and port) – Client opens port and fetches file from supplier – Supplier and client report progress/completion status to Napster – Download is done according to a protocol • Napster server directory is continually updated • Client ranks potential servers by bandwidth and ping time
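The central index described on the two Napster slides can be sketched as a small in-memory directory. The class and method names, the addresses and the port number are all hypothetical; Napster's actual protocol was proprietary and is not reproduced here. The sketch covers only the bookkeeping: the file transfer itself happens between the clients.

```python
from collections import defaultdict


class CentralIndex:
    """Toy directory mapping file names to the clients that share them."""

    def __init__(self):
        self.files = defaultdict(set)  # file name -> {(ip, port), ...}
        self.shared = {}               # (ip, port) -> list of file names

    def connect(self, ip, port, filenames):
        """A client comes online and reports the files it will share."""
        peer = (ip, port)
        self.shared[peer] = list(filenames)
        for name in filenames:
            self.files[name].add(peer)

    def disconnect(self, ip, port):
        """A client goes offline; remove its entries from the index."""
        for name in self.shared.pop((ip, port), []):
            self.files[name].discard((ip, port))

    def lookup(self, filename):
        """Return the online clients that have the file, as (ip, port) pairs."""
        return sorted(self.files.get(filename, ()))


index = CentralIndex()
index.connect("10.0.0.1", 6699, ["song.mp3"])
index.connect("10.0.0.2", 6699, ["song.mp3", "other.mp3"])
print(index.lookup("song.mp3"))
```

The asking client would then pick one of the returned suppliers, for example by bandwidth and ping time, and contact it directly.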

Napster Model • Server is only used to make connections • Communication is peer-to-peer (P2P) – Client to client transfer without a real server • Can we do this without a central server?

Gnutella • Gnutella design has no central server • Built on the six degrees of separation (6DOS) idea from Milgram's experiment • Every machine is both client and server • To connect you need – One IP address

How Gnutella works • To connect to the network, – you only need to know the IP address and port of any servant that is already connected. – Your servant announces your presence to all of the servants it is already connected to, and so on until the message propagates throughout the entire network. – Each of these servants then responds to this message with a bit of information about itself: how many files it is sharing, how many KBs of space they take up, etc. • By connecting, you immediately know how much is available on the network to search through.

How Gnutella works - II • To search – You send out a search request, it is propagated through the network, and each servant that has matching terms passes back its result set. – Each servant handles the search query in its own way. – To save on bandwidth, a servant does not have to respond to a query if it has no matching items. The servant also has the option of returning only a limited result set.

How Gnutella works - III • For file sharing, each servant acts as a miniature HTTP web server. • Just like TCP/IP packets, gnutella packets have a TTL (Time To Live). – The TTL starts off at some low number, like 5. – Each time a packet is routed through a servant, the servant lowers the TTL by 1. – Once the TTL hits 0 the packet is no longer forwarded. This helps to keep packets from circling the network forever.
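The effect of the TTL on how far a message spreads can be sketched with a small flooding simulation. The network layout, the servant names and the function name are hypothetical. The sketch also assumes a servant ignores a message it has already seen, which is a simplification added here and not something the slides describe.

```python
from collections import deque


def flood(neighbors, start, ttl):
    """Return the set of servants a message from `start` reaches."""
    reached = {start}
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL used up: this servant does not forward
        for peer in neighbors[node]:
            if peer not in reached:  # assumed duplicate suppression
                reached.add(peer)
                queue.append((peer, remaining - 1))
    return reached


# A hypothetical chain of four servants: a - b - c - d
network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(sorted(flood(network, "a", 2)))
```

With a TTL of 2 the message from "a" reaches "b" and "c" but never arrives at "d", which is how a low starting TTL bounds the traffic a single request can generate.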

What happens next? • Various scenarios… – Which one do you believe?

A few additional topics • Wireless – Is it any different? • Security – Firewalls • Intranets and extranets – Viruses – Worms