CSC 660: Advanced OS
System Calls
CSC 660: Advanced Operating Systems

A Different Kind of C
  1. No access to C library.
  2. ISO C99 + GNU C extensions.
  3. No memory protection.
  4. Small fixed-size (8 KB) stack.
  5. Limited floating point support.
  6. Concurrency and synchronization.
  7. Portability.
  8. Coding style and idioms.
  9. Debugging.

No access to C library
  - Why not?
    - Bootstrapping (C library uses system calls…)
    - Performance and size.
  - Kernel equivalent functions
    - Use lib/string.c for string operations.
    - Use printk() instead of printf()

ISO C99
  - Inline functions
        static inline void dog(int tail)
  - Struct assignment (designated initializers)
        struct file_operations fops = {
                .read    = device_read,
                .write   = device_write,
                .open    = device_open,
                .release = device_release
        };
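The two C99 features on this slide also work in ordinary user-space code. The following is a minimal sketch, assuming a hypothetical `struct demo_ops` and helper functions; none of these names are kernel APIs. It shows that designated initializers may appear in any order and that unnamed members are zero-initialized.

```c
#include <stddef.h>

/* Hypothetical operations table, loosely modeled on struct file_operations. */
struct demo_ops {
        int (*open)(void);
        int (*read)(int fd);
        int (*release)(int fd);   /* deliberately left unset below */
};

/* C99 inline function: a small helper the compiler may expand in place. */
static inline int double_it(int x)
{
        return 2 * x;
}

static int demo_open(void)   { return 3; }
static int demo_read(int fd) { return double_it(fd); }

/* Designated initializers: members are named, so their order does not
 * matter, and any member not listed (release) is initialized to NULL. */
static const struct demo_ops ops = {
        .read = demo_read,
        .open = demo_open,
};
```

Callers typically check a member for NULL before calling through it, which is why leaving `release` unset is safe in this sketch.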

GNU C
  - Inline assembly (asm or __asm__ keyword)
        asm ( assembler template
            : output operands
            : input operands
            : list of clobbered registers );
    - Example from arch/i386/signal.c:
        __asm__("movl %%gs,%0" : "=r"(tmp) : "0"(tmp));
  - Branch annotation
    - Optimize branch for most likely decision.
    - likely() and unlikely() macros
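Branch annotation can be tried outside the kernel, since `likely()` and `unlikely()` are thin wrappers around the GCC/Clang builtin `__builtin_expect`. A small sketch, assuming a hypothetical helper `checked_length()` that is not part of any library:

```c
#include <stddef.h>

/* Same shape as the kernel macros; __builtin_expect is a GCC/Clang builtin.
 * The !! normalizes any nonzero value to 1 before the comparison. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: returns the length of s, or -1 if s is NULL. */
static long checked_length(const char *s)
{
        long n = 0;

        if (unlikely(s == NULL))        /* error path: hint that it is rare */
                return -1;

        while (likely(s[n] != '\0'))    /* most characters are not the terminator */
                n++;
        return n;
}
```

The annotations change only code layout and branch prediction hints; the function returns the same values with or without them.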

GNU C
  - asmlinkage
    - Function attribute to allow C functions to be called from assembly language (prevents parameters being placed in registers).
  - volatile
    - Warns compiler that variable may be changed asynchronously by other threads (prevents compiler from optimizing away reads).
  - static inline
    - Inline function expansion to improve speed.

No Memory Protection
  - Kernel traps illegal memory access for users
    - Sends SIGSEGV to kill offending process.
  - No one to look out for kernel.
    - Memory violations result in kernel oops.
  - Kernel memory is not pageable.
    - Uses physical memory, not swap space.

Small Fixed Stack
  - Kernel stack is two 4 KB pages (8 KB).
  - Cannot create many local variables.
  - No deep recursion.

Floating Point
  - Floating point used to be handled by a separate FPU.
    - Integrated into CPU with 80486DX.
    - Still performed with ESCAPE instructions.
  - FPU has own FP registers.
    - Shared with MMX unit.
    - Not saved by default on context switch.
  - Must use FP carefully in kernel
    - Call kernel_fpu_begin() before using FPU.
    - Call kernel_fpu_end() after using FPU.

Concurrency
  - Asynchronous interrupts
    - Interrupt handlers may access resources at the same time as your function.
  - Multiprocessing
    - Another processor may be executing the same function at the same time.
  - Preemptive kernel
    - Scheduler can preempt your kernel thread in favor of another thread.
  - Synchronization solutions
    - Spinlocks
    - Semaphores
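To make the spinlock idea concrete, here is a user-space analogy built on C11 atomics. It is only a sketch: the kernel has its own spinlock type and API, and `struct demo_spinlock` and the `demo_spin_*` functions are hypothetical names invented for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spinlock, not the kernel's implementation. */
struct demo_spinlock {
        atomic_flag locked;
};

static void demo_spin_init(struct demo_spinlock *l)
{
        atomic_flag_clear(&l->locked);
}

/* Try once; returns true if the lock was acquired. */
static bool demo_spin_trylock(struct demo_spinlock *l)
{
        /* test_and_set returns the previous value: false means it was free. */
        return !atomic_flag_test_and_set_explicit(&l->locked,
                                                  memory_order_acquire);
}

static void demo_spin_lock(struct demo_spinlock *l)
{
        while (!demo_spin_trylock(l))
                ;       /* busy-wait ("spin") until the holder releases it */
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Because a waiter burns CPU while spinning, this style of lock only suits very short critical sections; a semaphore puts the waiter to sleep instead.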

Portability
  - Kernel runs on 22 architectures.
    - Different endianness.
    - Different word sizes.
    - Different page sizes.
  - Kernel code must be
    - Endian neutral
    - 64-bit clean
    - Free of assumptions about word or page size.

Portability
  - A char is always 8 bits (may be signed or unsigned).
  - A short is currently 16 bits on all archs.
  - An int is currently 32 bits on all archs.
  - A long may be 32 or 64 bits.
  - A pointer may be 32 or 64 bits.
  - Use explicitly sized types when necessary:
    - s8, u8, s16, u16, s32, u32, s64, u64
  - Use opaque types for portability
    - atomic_t, pid_t
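One way to see what "endian neutral" and "explicitly sized" mean in practice is a user-space sketch like the one below. It assumes `u8` and `u32` can be stood in for by `<stdint.h>` typedefs, and `load_le32()` and `store_le32()` are hypothetical helpers written for this example rather than kernel functions.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel's u8 and u32 (assumption: <stdint.h>). */
typedef uint8_t  u8;
typedef uint32_t u32;

/* Read a 32-bit little-endian value from a byte buffer. This gives the
 * same result on little- and big-endian hosts because the buffer is
 * never reinterpreted as a native integer. */
static u32 load_le32(const u8 *p)
{
        return (u32)p[0] | ((u32)p[1] << 8) | ((u32)p[2] << 16) | ((u32)p[3] << 24);
}

/* Write v into a byte buffer in little-endian order. */
static void store_le32(u8 *p, u32 v)
{
        p[0] = (u8)(v & 0xff);
        p[1] = (u8)((v >> 8) & 0xff);
        p[2] = (u8)((v >> 16) & 0xff);
        p[3] = (u8)((v >> 24) & 0xff);
}
```

Casting the buffer to `u32 *` and dereferencing it would be shorter, but the result would then depend on the host byte order and on alignment.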

Coding Style
  - Indentation
    - Tabs that are 8 characters wide.
  - Braces
    - Conditionals/loops: initial { at end of statement
          if (foo) {
                  …
          } else {
                  …
          }
    - Functions: { on separate line
          int foo()
          {
                  …
          }

Coding Style
  - Naming
    - Lower case, words separated by underscores.
    - Use descriptive names, especially for globals.
  - Functions
    - No longer than 2 screens of text.
    - Fewer than 10 local variables.
  - Comments
    - Describe what and why, not how your code works.
  - Ifdefs
    - Restrict them to include (.h) files.

Idioms
  - do { stmt1; stmt2; } while (0)
    - Found in macros.
    - Allows multi-statement macros in if/else
  - Heavy use of bit operators
    - and (&), or (|), xor (^), not (~)
  - Heavy use of goto
    - Often used to exit control structures on error.
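Both the macro idiom and the goto idiom can be shown in a short user-space sketch. The macro `SWAP_INT` and the functions `sort_pair()` and `copy_first_byte()` are hypothetical names made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Multi-statement macro wrapped in do { } while (0) so that it behaves
 * like a single statement, even in an unbraced if/else. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Hypothetical helper: order two ints so that *lo <= *hi. */
static void sort_pair(int *lo, int *hi)
{
        if (*lo > *hi)
                SWAP_INT(*lo, *hi);     /* the trailing ; ends one statement */
        else
                (void)0;                /* already ordered: nothing to do */
}

/* goto-based cleanup: each failure jumps to the label that undoes only
 * what has been set up so far. Returns 0 on success, -1 on error. */
static int copy_first_byte(const char *path, char *out)
{
        int ret = -1;
        char *buf;
        FILE *f;

        buf = malloc(1);
        if (!buf)
                goto out;

        f = fopen(path, "rb");
        if (!f)
                goto out_free;

        if (fread(buf, 1, 1, f) != 1)
                goto out_close;

        *out = buf[0];
        ret = 0;

out_close:
        fclose(f);
out_free:
        free(buf);
out:
        return ret;
}
```

Without the do/while wrapper, the `else` in `sort_pair()` would fail to compile, because a bare `{ ... };` followed by `else` ends the `if` statement early.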

Kernel Debugging: Oops
  - An oops is a major kernel failure.
    - Ex: dereferencing a null pointer
    - If kernel cannot recover, a panic results.
  - Information sent to console
    - Text description
    - Register contents
    - Stack backtrace

Kernel Debugging: Oops
  Unable to handle kernel NULL pointer dereference at virtual address 0000
  c0203c18
  EIP: 0060:[<c0203c18>] Not tainted
  Using defaults from ksymoops -t elf32-i386 -a i386
  EFLAGS: 00010086
  eax: c137a800 ebx: c0e80200 ecx: c1379050 edx: 0000
  esi: c137a800 edi: c13d0000 ebp: 00000246 esp: c13d1f2c
  ds: 007b es: 007b ss: 0068
  Stack: c1379050 00000002 c137a800 00000008 0000 c137a800 c02060b3 c137a800
         0001221e 0000 c030b004 c030b000 c13fdc10 c02037c0 c137a800 00000293
         c0125b6d 0000 c13fdc28 c13fdc20 c13d00000000
  Call Trace:
    [<c02060b3>] is_complete+0x2c3/0x310
    [<c02037c0>] run+0x30/0x40
    [<c0125b6d>] worker_thread+0x1bd/0x2b0
    [<c0203790>] run+0x0/0x40
    [<c0113b10>] default_wake_function+0x0/0x20
    [<c0108fd6>] ret_from_fork+0x6/0x20
    [<c0113b10>] default_wake_function+0x0/0x20
    [<c01259b0>] worker_thread+0x0/0x2b0

printk()
  - Robust and callable except early in boot
    - Enable early_printk() option for that.
  - Circular log buffer
    - klogd reads /proc/kmsg
    - syslogd gets data from klogd, writes to /var/log/syslog
    - can also access with dmesg
  - Message priorities 0 (high) .. 7 (low)
    - Named: KERN_EMERG, KERN_ALERT, KERN_CRIT, KERN_ERR, KERN_WARNING, KERN_NOTICE, KERN_INFO, KERN_DEBUG

Printing Debugging Information
  - printk()
  - Assertions
    - BUG_ON(bad_condition) causes oops
  - Panics
        if (terrible_condition)
                panic("Terrible condition!");
  - Stack traces
        if (!debug_check) {
                printk(KERN_DEBUG "Check x failed\n");
                dump_stack();
        }

System Calls
  System calls provide the interface between user programs and kernel.
  1. Abstracted hardware interface.
  2. Security and stability.
  3. Allows virtualization.

Hello World
  > cat >hello.c
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
          printf("Hello world!\n");
          return 0;
  }
  > gcc -o hello hello.c
  > ltrace ./hello
  __libc_start_main(0x8048394, 1, 0xbffff914, 0x80483b8, 0x8048400 <unfinished ...>
  printf("Hello world!\n"Hello world!
  ) = 13
  +++ exited (status 0) +++

Hello World
  > strace ./hello
  execve("./hello", ["./hello"], [/* 40 vars */]) = 0
  uname({sys="Linux", node="tara", ...}) = 0
  brk(0) = 0x804a000
  access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
  old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe9000
  open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/etc/ld.so.cache", O_RDONLY) = 3
  fstat64(3, {st_mode=S_IFREG|0644, st_size=50648, ...}) = 0
  old_mmap(NULL, 50648, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fdc000
  close(3) = 0
  access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
  open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
  read(3, "\177ELF"..., 512) = 512
  fstat64(3, {st_mode=S_IFREG|0644, st_size=1222116, ...}) = 0

Hello World
  old_mmap(NULL, 1232428, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xb7eaf000
  old_mmap(0xb7fd1000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x121000) = 0xb7fd1000
  old_mmap(0xb7fda000, 7724, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7fda000
  close(3) = 0
  old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eae000
  set_thread_area({entry_number:-1 -> 6, base_addr:0xb7eae080, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
  munmap(0xb7fdc000, 50648) = 0
  fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe8000
  write(1, "Hello world!\n", 13Hello world!
  ) = 13
  munmap(0xb7fe8000, 4096) = 0
  exit_group(0) = ?

Using a System Call
  - Application
    - Calls printf()
  - C library (glibc)
    - printf() function issues write() system call.
  - Kernel
    - write() system call manages output.
    - Returns a result, or an error code on failure.
  - C library (glibc)
    - Sets global errno variable if an error occurs.
    - Returns to user application.
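The same path can be exercised without printf() by calling the C library's write() wrapper directly. A minimal sketch, assuming a POSIX system; `say()` is a hypothetical helper, not a library function:

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send msg to fd through the write() system call
 * wrapper. Returns the number of bytes written, or -1 with errno set
 * by the C library when the kernel reports an error. */
static ssize_t say(int fd, const char *msg)
{
        return write(fd, msg, strlen(msg));
}
```

For example, `say(STDOUT_FILENO, "Hello world!\n")` issues the same write(1, ..., 13) call that appears in the strace output, while `say(-1, "x")` returns -1 and leaves EBADF in errno.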

Making a System Call
  - Software interrupt
    - Historically: int $0x80
    - Modern: sysenter
  - System call number
    - Put in %eax register before interrupt
    - sys_call_table in arch/i386/kernel/entry.S
  - Parameters
    - 1-5 args: %ebx, %ecx, %edx, %esi, %edi
    - 6+ args: one register has pointer to user space params
  - Returning
    - Return from software interrupt: iret or sysexit
    - Return value stored in %eax register.
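On a current Linux system the register setup described here is normally hidden behind the C library's generic syscall() function, which takes the system call number as its first argument. A small sketch, assuming Linux with glibc; `raw_getpid()` is a hypothetical name used only for this example:

```c
#define _GNU_SOURCE             /* for syscall() in <unistd.h> on glibc */
#include <sys/syscall.h>        /* SYS_getpid */
#include <unistd.h>

/* Invoke getpid by system call number rather than through its usual
 * wrapper. syscall() places the number and any arguments where the
 * architecture expects them and executes the trap instruction. */
static long raw_getpid(void)
{
        return syscall(SYS_getpid);
}
```

The value returned should match what the ordinary getpid() wrapper reports for the same process.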

System Call Macros
  include/asm-i386/unistd.h

  #define _syscall0(type,name) \
  type name(void) \
  { \
  long __res; \
  __asm__ volatile ("int $0x80" \
          : "=a" (__res) \
          : "0" (__NR_##name)); \
  __syscall_return(type,__res); \
  }

  #define _syscall2(type,name,type1,arg1,type2,arg2) \
  type name(type1 arg1,type2 arg2) \
  { \
  long __res; \
  __asm__ volatile ("int $0x80" \
          : "=a" (__res) \
          : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2))); \
  __syscall_return(type,__res); \
  }
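For comparison, this is roughly what a zero-argument stub looks like on 64-bit x86 Linux, where the `syscall` instruction takes the place of `int $0x80` and the number travels in %rax. It is a sketch under those assumptions, not the kernel's or glibc's code: `asm_getpid()` is a hypothetical name, and on any other platform the function falls back to the library's syscall().

```c
#define _GNU_SOURCE             /* for syscall() in <unistd.h> on glibc */
#include <sys/syscall.h>        /* SYS_getpid */
#include <unistd.h>

static long asm_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
        long res;

        /* "=a" ties the result to %rax; "0" loads the call number into
         * the same register on entry. The syscall instruction itself
         * overwrites %rcx and %r11, so both are listed as clobbers. */
        __asm__ volatile ("syscall"
                          : "=a" (res)
                          : "0" ((long)SYS_getpid)
                          : "rcx", "r11", "memory");
        return res;
#else
        /* Other platforms: use the C library's generic wrapper instead. */
        return syscall(SYS_getpid);
#endif
}
```

The operand constraints mirror the `_syscall0` macro: one output in the accumulator register and the call number passed in through that same register.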

Kernel System Call
  arch/i386/kernel/entry.S

  ENTRY(system_call)
          pushl %eax                      # save orig_eax
          SAVE_ALL
          GET_THREAD_INFO(%ebp)
                                          # system call tracing in operation
          testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
          jnz syscall_trace_entry
          cmpl $(nr_syscalls), %eax
          jae syscall_badsys
  syscall_call:
          call *sys_call_table(,%eax,4)
          movl %eax,EAX(%esp)             # store return value
  syscall_exit:
          cli
          movl TI_flags(%ebp), %ecx
          testw $_TIF_ALLWORK_MASK, %cx   # current->work
          jne syscall_exit_work
  restore_all:
          RESTORE_ALL

Defining a System Call
  - System call name: getpid()
  - System call function: sys_getpid()

        asmlinkage long sys_getpid(void)
        {
                return current->tgid;
        }

Adding a System Call
  1. Write system call function
  2. Add entry to end of sys_call_table
       In arch/i386/kernel/entry.S add
           .long sys_mycall
  3. Define system call number for user
       In include/asm-i386/unistd.h
           #define __NR_mycall 289
  4. Compile kernel

Calling your new syscall
  #include <linux/unistd.h>
  #define __NR_current_time 289
  _syscall0(long, current_time)

  #include <stdio.h>

  int main()
  {
          long retval = 1;

          retval = current_time();
          printf("The return value is %ld\n", retval);
          return 0;
  }
