Socket Layer
COMS W6998 Spring 2010
Erich Nahum

Outline
  • Sockets API Refresher
  • Linux Sockets Architecture
  • Interface between BSD sockets and AF_INET
  • Interface between AF_INET and TCP/UDP
  • Receive Path
  • Send Path

BSD Socket API
  • Originally developed by UC Berkeley at the dawn of time
  • Used by 90% of network-oriented programs
  • Standard interface across operating systems
  • Simple, well understood by programmers

User Space Socket API
  • socket() / bind() / accept() / listen()
      - Initialization, addressing and handshaking
  • select() / poll() / epoll()
      - Waiting for events
  • send() / recv()
      - Stream oriented (e.g. TCP) Rx / Tx
  • sendto() / recvfrom()
      - Datagram oriented (e.g. UDP) Rx / Tx
  • close(), shutdown()
      - Closing down an association

Standard Socket Sequence
  • The 'server' application:
      socket() -> bind() -> listen() -> accept() -> read() -> write() -> close()
  • The 'client' application:
      socket() -> bind() -> connect() -> write() -> read() -> close()
  • Between the two, in order:
      - 3-way handshake (connect() / accept())
      - data flow to server
      - data flow to client
      - 4-way handshake (close())
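The sequence above can be sketched in user space as follows. This is a minimal illustration, not code from the lecture: it runs both the server and client call sequences inside one process over the loopback interface, skips the client's optional bind(), and the helper name loopback_sequence() is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: runs the server and client call sequences in one
 * process over loopback. Returns the number of bytes the server side
 * read into out, or -1 on error. */
int loopback_sequence(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, cfd, sfd;
    ssize_t n = -1;

    /* Server: socket() -> bind() -> listen() */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    /* Client: socket() -> connect(). The kernel performs the 3-way
     * handshake and queues the connection on the listen backlog. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (cfd >= 0)
            close(cfd);
        close(lfd);
        return -1;
    }

    /* Server: accept() returns a new descriptor for this connection. */
    sfd = accept(lfd, NULL, NULL);
    if (sfd >= 0) {
        /* Client write(), server read(): data flow to server. */
        if (write(cfd, msg, strlen(msg)) == (ssize_t)strlen(msg))
            n = read(sfd, out, outlen);
        close(sfd);
    }
    close(cfd);
    close(lfd);
    return (int)n;
}
```

A caller would pass a short message and a buffer, for example loopback_sequence("ping", buf, sizeof(buf)). A real server and client would normally be separate processes and would loop on read(), since a stream socket may return fewer bytes than were written.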

Socket() System Call
  • Creating a socket from user space is done by the socket() system call:
        int socket(int family, int type, int protocol);
  • On success, a file descriptor for the new socket is returned.
      - The open() system call (for files) also returns a file descriptor.
      - “Everything is a file” Unix paradigm.
  • The first parameter, family, is also sometimes referred to as “domain”.

Socket(): Family
  • A family is a suite of protocols
  • Each family is a subdirectory of linux/net
      - E.g., linux/net/ipv4, linux/net/decnet, linux/net/packet
  • IPv4: PF_INET
  • IPv6: PF_INET6
  • Packet sockets: PF_PACKET
      - Operate at the device driver layer.
      - The pcap library for Linux uses PF_PACKET sockets.
      - The pcap library is used by sniffers such as tcpdump.
  • Protocol Family == Address Family
      - PF_INET == AF_INET (in include/linux/socket.h)
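The PF_* == AF_* equivalence can be checked at compile time from user space. This is a minimal sketch that assumes a Linux system with glibc headers (PF_PACKET is Linux-specific); the function name pf_equals_af() is hypothetical.

```c
#include <sys/socket.h>

/* Compilation fails if any protocol family differs from its address family. */
_Static_assert(PF_INET == AF_INET, "PF_INET differs from AF_INET");
_Static_assert(PF_INET6 == AF_INET6, "PF_INET6 differs from AF_INET6");
_Static_assert(PF_PACKET == AF_PACKET, "PF_PACKET differs from AF_PACKET");
_Static_assert(PF_UNIX == AF_UNIX, "PF_UNIX differs from AF_UNIX");

/* Hypothetical helper: the same check at run time, returns 1 when equal. */
int pf_equals_af(void)
{
    return PF_INET == AF_INET &&
           PF_INET6 == AF_INET6 &&
           PF_PACKET == AF_PACKET &&
           PF_UNIX == AF_UNIX;
}
```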

Address/Protocol Families

    /* Supported address families. */
    #define AF_UNSPEC       0
    #define AF_UNIX         1   /* Unix domain sockets          */
    #define AF_LOCAL        1   /* POSIX name for AF_UNIX       */
    #define AF_INET         2   /* Internet IP Protocol         */
    #define AF_AX25         3   /* Amateur Radio AX.25          */
    #define AF_IPX          4   /* Novell IPX                   */
    #define AF_APPLETALK    5   /* AppleTalk DDP                */
    #define AF_NETROM       6   /* Amateur Radio NET/ROM        */
    #define AF_BRIDGE       7   /* Multiprotocol bridge         */
    #define AF_ATMPVC       8   /* ATM PVCs                     */
    #define AF_X25          9   /* Reserved for X.25 project    */
    #define AF_INET6        10  /* IP version 6                 */
    #define AF_ROSE         11  /* Amateur Radio X.25 PLP       */
    #define AF_DECnet       12  /* Reserved for DECnet project  */
    #define AF_NETBEUI      13  /* Reserved for 802.2LLC project*/
    #define AF_SECURITY     14  /* Security callback pseudo AF  */
    #define AF_KEY          15  /* PF_KEY key management API    */
    ...
    #define AF_ISDN         34  /* mISDN sockets                */
    #define AF_PHONET       35  /* Phonet sockets               */
    #define AF_IEEE802154   36  /* IEEE802154 sockets           */
    #define AF_MAX          37  /* For now..                    */

    include/linux/socket.h

Socket(): Type
  • SOCK_STREAM and SOCK_DGRAM are the most commonly used types.
      - SOCK_STREAM for TCP, SCTP
      - SOCK_DGRAM for UDP
  • SOCK_RAW for RAW sockets.
  • Some families accept more than one type: a Unix domain socket (AF_UNIX), for example, can be either SOCK_STREAM or SOCK_DGRAM.
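The AF_UNIX case can be demonstrated with socketpair() and the SO_TYPE socket option. This is a minimal sketch; the function name unix_pair_type() is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates an AF_UNIX socket pair of the requested
 * type and returns the type the kernel reports through SO_TYPE,
 * or -1 on error. */
int unix_pair_type(int type)
{
    int fds[2];
    int got = -1;
    socklen_t len = sizeof(got);

    if (socketpair(AF_UNIX, type, 0, fds) < 0)
        return -1;
    if (getsockopt(fds[0], SOL_SOCKET, SO_TYPE, &got, &len) < 0)
        got = -1;
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

Calling unix_pair_type(SOCK_STREAM) and unix_pair_type(SOCK_DGRAM) should each return the type that was requested, showing that the same family serves both.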

Socket(): Protocol
  • Protocol is the protocol number within a family.
  • Internet protocol numbers are assigned by IANA
      - http://www.iana.org/assignments/protocol-numbers/
  • For AF_INET, it's usually 0.
      - IPPROTO_IP is 0, see: include/linux/in.h.
  • For SCTP:
      - protocol is IPPROTO_SCTP (132)
      - sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
  • For UDP-Lite:
      - protocol is IPPROTO_UDPLITE (136)

Socket Layer Architecture
  [Figure: layered diagram, top to bottom]
  • User: Application
  • Kernel:
      - BSD Socket Layer (Socket Interface)
      - Protocol families: PF_INET (SOCK_STREAM: TCP, SOCK_DGRAM: UDP), PF_PACKET (SOCK_RAW, SOCK_DGRAM), PF_UNIX, PF_IPX, ….
      - Protocol Layers: IPv4
      - Network Device Layer
      - Device Layer: Ethernet, Intel E1000, Token Ring, PPP, SLIP, FDDI
  • Hardware

Key Concepts
  • Function pointer tables (“ops”)
      - In-kernel interfaces for socket functions
      - Binding between BSD sockets and AF_XXX families
      - Binding between AF_INET and transports (TCP, UDP)
  • Socket data structures
      - struct socket (BSD socket)
      - struct sock (protocol family socket, network state)
          · struct packet_sock (PF_PACKET)
          · struct inet_sock (PF_INET)
              § struct udp_sock
              § struct tcp_sock
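The nesting of socket structures relies on each more specific structure starting with the more general one, so a pointer to the general part can be cast to the specific type. The following user-space sketch illustrates the idea only; all mock_* names and fields are hypothetical and far simpler than the kernel's structures.

```c
#include <stdint.h>

/* Hypothetical stand-ins for struct sock, struct inet_sock, struct udp_sock. */
struct mock_sock {
    int family;
    int state;
};

struct mock_inet_sock {
    struct mock_sock sk;        /* must be the first member */
    uint16_t sport;
    uint16_t dport;
};

struct mock_udp_sock {
    struct mock_inet_sock inet; /* must be the first member */
    int pending;
};

/* Because the general struct sits at offset 0, these casts are valid
 * whenever the pointer really refers to the larger object. */
static struct mock_inet_sock *mock_inet_sk(struct mock_sock *sk)
{
    return (struct mock_inet_sock *)sk;
}

static struct mock_udp_sock *mock_udp_sk(struct mock_sock *sk)
{
    return (struct mock_udp_sock *)sk;
}

/* Generic code holds only a mock_sock pointer yet can reach every layer. */
int mock_udp_sport(struct mock_sock *sk)
{
    return mock_inet_sk(sk)->sport;
}

int mock_udp_pending(struct mock_sock *sk)
{
    return mock_udp_sk(sk)->pending;
}
```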

Socket Data Structures
  • For every socket created by a user space application, there is a corresponding struct socket and struct sock in the kernel. The two are easily confused.
  • struct socket: include/linux/net.h
      - Data common to the BSD socket layer
      - Has only 8 members
      - Any variable “sock” always refers to a struct socket
  • struct sock: include/net/sock.h
      - Data common to the Network Protocol layer (i.e., AF_INET)
      - Has more than 30 members, and is one of the biggest structures in the networking stack.
      - Any variable “sk” always refers to a struct sock.

struct socket

    struct socket {
        socket_state             state;         // SS_CONNECTING etc.
        short                    type;          // SOCK_STREAM etc.
        unsigned long            flags;
        struct fasync_struct    *fasync_list;
        wait_queue_head_t        wait;          // tasks waiting
        struct file             *file;          // back ptr to inode
        struct sock             *sk;            // AF specific state
        const struct proto_ops  *ops;           // AF specific operations
    };

    include/linux/net.h

Socket State

    typedef enum {
        SS_FREE = 0,        /* not allocated                */
        SS_UNCONNECTED,     /* unconnected to any socket    */
        SS_CONNECTING,      /* in process of connecting     */
        SS_CONNECTED,       /* connected to socket          */
        SS_DISCONNECTING    /* in process of disconnecting  */
    } socket_state;

    include/linux/net.h

  • These states are not layer 4 states (like TCP_ESTABLISHED or TCP_CLOSE).

Socket Types

    enum sock_type {
        SOCK_STREAM    = 1,
        SOCK_DGRAM     = 2,
        SOCK_RAW       = 3,
        SOCK_RDM       = 4,
        SOCK_SEQPACKET = 5,
        SOCK_DCCP      = 6,
        SOCK_PACKET    = 10,
    };

    include/linux/net.h

Comment in include/net/sock.h

    /*
     * This structure really needs to be cleaned up.
     * Most of it is for TCP, and not used by any of
     * the other protocols.
     */

struct sock_common

    /* minimal network layer representation of sockets */
    struct sock_common {
        /*
         * first fields are not copied in sock_copy()
         */
        union {
            struct hlist_node       skc_node;        // main hash linkage for lookup
            struct hlist_nulls_node skc_nulls_node;  // main hash for TCP/UDP
        };
        atomic_t                skc_refcnt;
        int                     skc_tx_queue_mapping; // tx queue for this connection

        union {
            unsigned int        skc_hash;            // hash value for lookup
            __u16               skc_u16hashes[2];
        };
        unsigned short          skc_family;          // network address family
        volatile unsigned char  skc_state;           // Connection state
        unsigned char           skc_reuse;           // SO_REUSEADDR setting
        int                     skc_bound_dev_if;    // bound if !=0
        union {
            struct hlist_node       skc_bind_node;     // bind hash linkage
            struct hlist_nulls_node skc_portaddr_node; // bind hash for UDP/Lite
        };
        struct proto           *skc_prot;            // protocol handlers in a net family
    };

    include/net/sock.h

Outline
  • Sockets API Refresher
  • Linux Sockets Architecture
  • Interface between BSD sockets and AF_INET
  • Interface between AF_INET and TCP/UDP
  • Receive Path
  • Send Path

BSD Socket AF Interface
  • Main data structures
      - struct net_proto_family
      - struct proto_ops
  • Key function
      - sock_register(struct net_proto_family *ops)
  • Each address family:
      - Implements the struct net_proto_family.
      - Calls the function sock_register() when the protocol family is initialized.
      - Implements the struct proto_ops for binding the BSD socket layer and protocol family layer.

net_proto_family (BSD Socket Layer / AF Socket Layer)
  • Describes each of the supported protocol families

        struct net_proto_family {
            int family;
            int (*create)(struct net *net, struct socket *sock,
                          int protocol, int kern);
            struct module *owner;
        };

  • Specifies the handler for socket creation
      - The create() function is called whenever a new socket of this family is created
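The registration and creation flow can be mimicked in user space with a table of per-family create() handlers. This is a sketch of the pattern only: every fam_* name is hypothetical, the structures are drastically simplified, and the error codes are chosen to resemble the kernel's rather than copied from it.

```c
#include <errno.h>
#include <stddef.h>

#define FAM_NPROTO 8   /* hypothetical table size */

struct fam_socket {
    int type;
    int created_by;    /* family whose create() handler ran */
};

struct fam_proto_family {
    int family;
    int (*create)(struct fam_socket *sock, int protocol);
};

static const struct fam_proto_family *fam_table[FAM_NPROTO];

/* Plays the role of sock_register(): record the family's handler. */
int fam_register(const struct fam_proto_family *ops)
{
    if (ops->family < 0 || ops->family >= FAM_NPROTO)
        return -ENOBUFS;
    if (fam_table[ops->family])
        return -EEXIST;
    fam_table[ops->family] = ops;
    return 0;
}

/* Plays the role of the BSD layer on socket(): look up the family and
 * hand off to its create() handler. */
int fam_create(int family, int type, int protocol, struct fam_socket *sock)
{
    if (family < 0 || family >= FAM_NPROTO || !fam_table[family])
        return -EAFNOSUPPORT;
    sock->type = type;
    return fam_table[family]->create(sock, protocol);
}

/* A hypothetical family, standing in for inet_create(). */
static int fam_inet_create(struct fam_socket *sock, int protocol)
{
    (void)protocol;
    sock->created_by = 2;
    return 0;
}

const struct fam_proto_family fam_inet_ops = {
    .family = 2,               /* same number as AF_INET */
    .create = fam_inet_create,
};
```

After fam_register(&fam_inet_ops), a call such as fam_create(2, 1, 0, &s) reaches fam_inet_create(), while an unregistered family number fails with -EAFNOSUPPORT.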

INET and PACKET proto_family (BSD Socket Layer / AF Socket Layer)

    static const struct net_proto_family inet_family_ops = {
        .family = PF_INET,
        .create = inet_create,
        .owner  = THIS_MODULE,      /* af_inet.c */
    };

    static const struct net_proto_family packet_family_ops = {
        .family = PF_PACKET,
        .create = packet_create,
        .owner  = THIS_MODULE,      /* af_packet.c */
    };

proto_ops (BSD Socket Layer / AF Socket Layer)
  • Defines the binding between the BSD socket layer and the address family (AF_*) layer.
  • The proto_ops tables contain the functions exported by the AF socket layer to the BSD socket layer.
  • Each table consists of the address family type and a set of pointers to socket operation routines specific to that address family.

struct proto_ops (BSD Socket Layer / AF Socket Layer)

    struct proto_ops {
        int             family;
        struct module  *owner;
        int             (*release)(...);
        int             (*bind)(...);
        int             (*connect)(...);
        int             (*socketpair)(...);
        int             (*accept)(...);
        int             (*getname)(...);
        unsigned int    (*poll)(...);
        int             (*ioctl)(...);
        int             (*compat_ioctl)(...);
        int             (*listen)(...);
        int             (*shutdown)(...);
        int             (*setsockopt)(...);
        int             (*getsockopt)(...);
        int             (*compat_setsockopt)(...);
        int             (*compat_getsockopt)(...);
        int             (*sendmsg)(...);
        int             (*recvmsg)(...);
        int             (*mmap)(...);
        ssize_t         (*sendpage)(...);
        ssize_t         (*splice_read)(...);
    };

    include/linux/net.h

PF_PACKET proto_ops (BSD Socket Layer / AF Socket Layer)

    static const struct proto_ops packet_ops = {
        .family     = PF_PACKET,
        .owner      = THIS_MODULE,
        .release    = packet_release,
        .bind       = packet_bind,
        .connect    = sock_no_connect,
        .socketpair = sock_no_socketpair,
        .accept     = sock_no_accept,
        .getname    = packet_getname,
        .poll       = packet_poll,
        .ioctl      = packet_ioctl,
        .listen     = sock_no_listen,
        .shutdown   = sock_no_shutdown,
        .setsockopt = packet_setsockopt,
        .getsockopt = packet_getsockopt,
        .sendmsg    = packet_sendmsg,
        .recvmsg    = packet_recvmsg,
        .mmap       = packet_mmap,
        .sendpage   = sock_no_sendpage,
    };

    net/packet/af_packet.c
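The table above mixes real handlers with sock_no_* stubs for operations the family does not support. The user-space sketch below imitates that pattern; all demo_* names are hypothetical, and the stub's return value of -EOPNOTSUPP is modeled on what the kernel's sock_no_* helpers return.

```c
#include <errno.h>

struct demo_socket;

/* Hypothetical, much-reduced version of struct proto_ops. */
struct demo_proto_ops {
    int family;
    int (*bind)(struct demo_socket *sock);
    int (*listen)(struct demo_socket *sock, int backlog);
};

struct demo_socket {
    const struct demo_proto_ops *ops;
    int bound;
};

/* Shared stub for families that cannot listen, like sock_no_listen. */
static int demo_no_listen(struct demo_socket *sock, int backlog)
{
    (void)sock;
    (void)backlog;
    return -EOPNOTSUPP;
}

/* Family-specific handler, standing in for packet_bind. */
static int demo_packet_bind(struct demo_socket *sock)
{
    sock->bound = 1;
    return 0;
}

const struct demo_proto_ops demo_packet_ops = {
    .family = 17,              /* same number as PF_PACKET */
    .bind   = demo_packet_bind,
    .listen = demo_no_listen,
};

/* The generic layer never checks the family; it just calls through ops. */
int demo_bind(struct demo_socket *sock)
{
    return sock->ops->bind(sock);
}

int demo_listen(struct demo_socket *sock, int backlog)
{
    return sock->ops->listen(sock, backlog);
}
```

Filling every slot, even with a stub, means the generic layer can call through the table without NULL checks.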

PF_INET proto_ops (BSD Socket Layer / AF Socket Layer)

                     inet_stream_ops (TCP)    inet_dgram_ops (UDP)     inet_sockraw_ops (RAW)
    .family          PF_INET                  PF_INET                  PF_INET
    .owner           THIS_MODULE              THIS_MODULE              THIS_MODULE
    .release         inet_release             inet_release             inet_release
    .bind            inet_bind                inet_bind                inet_bind
    .connect         inet_stream_connect      inet_dgram_connect       inet_dgram_connect
    .socketpair      sock_no_socketpair       sock_no_socketpair       sock_no_socketpair
    .accept          inet_accept              sock_no_accept           sock_no_accept
    .getname         inet_getname             inet_getname             inet_getname
    .poll            tcp_poll                 udp_poll                 datagram_poll
    .ioctl           inet_ioctl               inet_ioctl               inet_ioctl
    .listen          inet_listen              sock_no_listen           sock_no_listen
    .shutdown        inet_shutdown            inet_shutdown            inet_shutdown
    .setsockopt      sock_common_setsockopt   sock_common_setsockopt   sock_common_setsockopt
    .getsockopt      sock_common_getsockopt   sock_common_getsockopt   sock_common_getsockopt
    .sendmsg         tcp_sendmsg              inet_sendmsg             inet_sendmsg
    .recvmsg         sock_common_recvmsg      sock_common_recvmsg      sock_common_recvmsg
    .mmap            sock_no_mmap             sock_no_mmap             sock_no_mmap
    .sendpage        tcp_sendpage             inet_sendpage            inet_sendpage
    .splice_read     tcp_splice_read          --                       --

    net/ipv4/af_inet.c

Outline
  • Sockets API Refresher
  • Linux Sockets Architecture
  • Interface between BSD sockets and AF_INET
  • Interface between AF_INET and TCP/UDP
      - Binding between IP and TCP/UDP (upcall)
      - Binding between AF_INET and TCP (downcall)
  • Receive Path
  • Send Path

AF_INET Transport API (AF_INET Layer / Transport Layer)
  • inet_protos[] (array of struct net_protocol)
      - Interface between IP and the transport layer
      - Is the upcall binding from IP to transport
      - Method for demultiplexing IP packets to the proper transport
  • struct proto
      - Defines the interface for individual protocols (TCP, UDP, etc.)
      - Is the downcall binding for AF_INET to transport
      - Transport-specific functions for the socket API
  • struct inet_protosw
      - Describes the PF_INET protocols
      - Defines the different SOCK types for PF_INET
      - SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW

Recall IP’s inet_protos (BSD Socket Layer / AF Socket Layer)
  [Figure: inet_protos[MAX_INET_PROTOS], an array of net_protocol entries indexed 0, 1, …, MAX_INET_PROTOS. Each entry has the fields handler, err_handler, gso_send_check, gso_segment, gro_receive, gro_complete. Example entries shown: handler = udp_rcv(), err_handler = udp_err(); handler = igmp_rcv(), err_handler = NULL.]
  • Receive binding from the IP layer to the transport layer.
  • inet_init() calls inet_add_protocol(p) to add each protocol to the hash queues.
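The upcall binding amounts to indexing a handler table by the protocol number carried in the IP header. The user-space sketch below illustrates this under simplifying assumptions: the rx_* names are hypothetical, only the IPv4 header length and protocol byte are examined, and the return codes are invented for the example.

```c
#include <stddef.h>

#define RX_MAX_PROTOS 256   /* one slot per possible protocol number */

/* Hypothetical, reduced version of struct net_protocol. */
struct rx_net_protocol {
    int (*handler)(const unsigned char *payload, size_t len);
};

static const struct rx_net_protocol *rx_protos[RX_MAX_PROTOS];

/* Plays the role of inet_add_protocol(): claim a protocol number. */
int rx_add_protocol(const struct rx_net_protocol *prot, unsigned char num)
{
    if (rx_protos[num])
        return -1;
    rx_protos[num] = prot;
    return 0;
}

/* Demultiplex one IPv4 packet to its transport handler (the upcall). */
int rx_ip_deliver(const unsigned char *pkt, size_t len)
{
    const struct rx_net_protocol *p;
    size_t ihl;

    if (len < 20)
        return -1;
    ihl = (size_t)(pkt[0] & 0x0f) * 4;  /* header length in bytes */
    if (ihl < 20 || ihl > len)
        return -1;
    p = rx_protos[pkt[9]];              /* byte 9 is the protocol field */
    if (!p)
        return -2;                      /* no transport registered */
    return p->handler(pkt + ihl, len - ihl);
}

/* Stand-in for udp_rcv(): just reports the transport payload length. */
static int rx_udp_rcv(const unsigned char *payload, size_t len)
{
    (void)payload;
    return (int)len;
}

const struct rx_net_protocol rx_udp_protocol = { .handler = rx_udp_rcv };
```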

struct proto (BSD Socket Layer / AF Socket Layer)

    /* Networking protocol blocks we attach to sockets.
     * socket layer -> transport layer interface
     */
    struct proto {
        void            (*close)(...);
        int             (*connect)(...);
        int             (*disconnect)(...);
        struct sock *   (*accept)(...);
        int             (*ioctl)(...);
        int             (*init)(...);
        void            (*destroy)(...);
        void            (*shutdown)(...);
        int             (*setsockopt)(...);
        int             (*getsockopt)(...);
        int             (*sendmsg)(...);
        int             (*recvmsg)(...);
        int             (*sendpage)(...);
        int             (*bind)(...);
        int             (*backlog_rcv)(...);
        void            (*hash)(...);
        void            (*unhash)(...);
        int             (*get_port)(...);
    };

    include/net/sock.h

udp_prot (BSD Socket Layer / AF Socket Layer)

    struct proto udp_prot = {
        .name              = "UDP",
        .owner             = THIS_MODULE,
        .close             = udp_lib_close,
        .connect           = ip4_datagram_connect,
        .disconnect        = udp_disconnect,
        .ioctl             = udp_ioctl,
        .destroy           = udp_destroy_sock,
        .setsockopt        = udp_setsockopt,
        .getsockopt        = udp_getsockopt,
        .sendmsg           = udp_sendmsg,
        .recvmsg           = udp_recvmsg,
        .sendpage          = udp_sendpage,
        .backlog_rcv       = __udp_queue_rcv_skb,
        .hash              = udp_lib_hash,
        .unhash            = udp_lib_unhash,
        .get_port          = udp_v4_get_port,
        .memory_allocated  = &udp_memory_allocated,
        .sysctl_mem        = sysctl_udp_mem,
        .sysctl_wmem       = &sysctl_udp_wmem_min,
        .sysctl_rmem       = &sysctl_udp_rmem_min,
        .obj_size          = sizeof(struct udp_sock),
        .slab_flags        = SLAB_DESTROY_BY_RCU,
        .h.udp_table       = &udp_table,
    #ifdef CONFIG_COMPAT
        .compat_setsockopt = compat_udp_setsockopt,
        .compat_getsockopt = compat_udp_getsockopt,
    #endif
    };

    net/ipv4/udp.c

inet_protosw (BSD Socket Layer / AF Socket Layer)

    static struct inet_protosw inetsw_array[] = {
        {
            .type     = SOCK_STREAM,
            .protocol = IPPROTO_TCP,
            .prot     = &tcp_prot,
            .ops      = &inet_stream_ops,
            .no_check = 0,
            .flags    = INET_PROTOSW_PERMANENT | INET_PROTOSW_ICSK,
        },
        {
            .type     = SOCK_DGRAM,
            .protocol = IPPROTO_UDP,
            .prot     = &udp_prot,
            .ops      = &inet_dgram_ops,
            .no_check = UDP_CSUM_DEFAULT,
            .flags    = INET_PROTOSW_PERMANENT,
        },
        {
            .type     = SOCK_RAW,
            .protocol = IPPROTO_IP,     /* wild card */
            .prot     = &raw_prot,
            .ops      = &inet_sockraw_ops,
            .no_check = UDP_CSUM_DEFAULT,
            .flags    = INET_PROTOSW_REUSE,
        }
    };

    net/ipv4/af_inet.c

  • On startup, inet_init() registers the TCP, UDP, and raw socket entries of inetsw_array[].
  • Other protocols call inet_register_protosw().
  • inet_unregister_protosw() will not remove protocols with PERMANENT set.
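How a (type, protocol) pair from socket() might be matched against such a table, including the wildcard entry, can be sketched in user space as follows. The sw_* names are hypothetical, and the matching rules are a simplified approximation of what inet_create() does, not a copy of it.

```c
#include <stddef.h>

/* Hypothetical constants mirroring the usual socket type and IP
 * protocol numbers. */
enum { SW_SOCK_STREAM = 1, SW_SOCK_DGRAM = 2, SW_SOCK_RAW = 3 };
enum { SW_IPPROTO_IP = 0, SW_IPPROTO_ICMP = 1,
       SW_IPPROTO_TCP = 6, SW_IPPROTO_UDP = 17 };

struct sw_protosw {
    int type;
    int protocol;
    const char *name;
};

static const struct sw_protosw sw_array[] = {
    { SW_SOCK_STREAM, SW_IPPROTO_TCP, "tcp" },
    { SW_SOCK_DGRAM,  SW_IPPROTO_UDP, "udp" },
    { SW_SOCK_RAW,    SW_IPPROTO_IP,  "raw" },  /* wild card */
};

/* Returns the matching entry, or NULL if the combination is unsupported. */
const struct sw_protosw *sw_lookup(int type, int protocol)
{
    size_t i;

    for (i = 0; i < sizeof(sw_array) / sizeof(sw_array[0]); i++) {
        const struct sw_protosw *p = &sw_array[i];

        if (p->type != type)
            continue;
        if (p->protocol == protocol)        /* exact match */
            return p;
        if (protocol == SW_IPPROTO_IP)      /* caller passed 0: use default */
            return p;
        if (p->protocol == SW_IPPROTO_IP)   /* wildcard entry accepts any */
            return p;
    }
    return NULL;
}
```

Under these rules sw_lookup(SW_SOCK_STREAM, 0) finds the TCP entry, sw_lookup(SW_SOCK_RAW, SW_IPPROTO_ICMP) finds the raw entry through the wildcard, and sw_lookup(SW_SOCK_STREAM, SW_IPPROTO_UDP) finds nothing.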

Relationships
  [Figure: pointer relationships between the socket structures]
  • struct socket: state, type, flags, fasync_list, wait, file, sk, proto_ops
      - sk points to struct sock
      - proto_ops points to struct proto_ops
  • struct sock: sk_common, sk_lock, sk_backlog, ..., (*sk_prot_creator), sk_socket, sk_send_head, ...
      - sk_common is the embedded struct sock_common
      - sk_socket points back to struct socket
  • struct proto_ops (PF_INET, af_inet.c): inet_release, inet_bind, inet_accept, ...
  • struct sock_common: skc_node, skc_refcnt, skc_hash, ..., skc_proto, skc_net
      - skc_proto points to struct proto
  • struct proto: udp_lib_close, ipv4_dgram_connect, udp_sendmsg, udp_recvmsg, ...

Example: inet_accept()

    int inet_accept(struct socket *sock, struct socket *newsock, int flags)
    {
        struct sock *sk1 = sock->sk;
        int err = -EINVAL;
        struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, &err);

        if (!sk2)
            goto do_err;

        lock_sock(sk2);

        WARN_ON(!((1 << sk2->sk_state) &
                  (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT | TCPF_CLOSE)));

        sock_graft(sk2, newsock);

        newsock->state = SS_CONNECTED;
        err = 0;
        release_sock(sk2);
    do_err:
        return err;
    }
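inet_accept() sits in the middle of a two-level dispatch: the BSD layer calls it through the proto_ops table, and it in turn calls the transport through sk_prot. The user-space sketch below imitates that chain. Every chain_* name is hypothetical, locking and state checks are omitted, and a single static child object stands in for a newly established connection.

```c
#include <stddef.h>

struct chain_sock;
struct chain_socket;

/* Transport-level table, standing in for struct proto. */
struct chain_proto {
    struct chain_sock *(*accept)(struct chain_sock *sk, int *err);
};

/* AF-level table, standing in for struct proto_ops. */
struct chain_proto_ops {
    int (*accept)(struct chain_socket *sock, struct chain_socket *newsock);
};

struct chain_sock {
    const struct chain_proto *prot;
    int established;
};

struct chain_socket {
    const struct chain_proto_ops *ops;
    struct chain_sock *sk;
    int connected;
};

/* Transport level: hands back an already-established child. */
static struct chain_sock chain_child = { NULL, 1 };

static struct chain_sock *chain_tcp_accept(struct chain_sock *sk, int *err)
{
    (void)sk;
    *err = 0;
    return &chain_child;
}

/* AF level: mirrors the shape of inet_accept(). */
static int chain_inet_accept(struct chain_socket *sock,
                             struct chain_socket *newsock)
{
    int err = -1;
    struct chain_sock *sk2 = sock->sk->prot->accept(sock->sk, &err);

    if (!sk2)
        return err;
    newsock->sk = sk2;        /* stands in for sock_graft() */
    newsock->connected = 1;   /* stands in for SS_CONNECTED */
    return 0;
}

const struct chain_proto chain_tcp_prot = { chain_tcp_accept };
const struct chain_proto_ops chain_stream_ops = { chain_inet_accept };

/* BSD level: knows nothing about the family, only the ops table. */
int chain_sys_accept(struct chain_socket *sock, struct chain_socket *newsock)
{
    return sock->ops->accept(sock, newsock);
}
```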

Backup
