Debugging Techniques
Sarah Diesburg
COP 5641

Overview
- Several tools are available
- Some are more difficult to set up and learn
- Will go over basic tools, then use the next assignment to go over interesting tools

Kernel- vs User-Space Debugging
- Difficulty is higher
  - No built-in debuggers
  - Bugs may be hard to reproduce
- Stakes are higher
  - A fault in the kernel can bring down the whole system or cause unexplained behavior

Types of Bugs
- Incorrect code
  - Example: not storing the correct value in the proper place
- Synchronization error
  - Example: not properly locking a shared variable
- Incorrectly managing hardware
  - Example: sending the wrong operation to the wrong control register

Pitfalls from Personal Experience
- Beware NULL or garbage pointers
- Zero out memory before using it
- Do not reinvent the wheel
  - Use functions already available (e.g. linked lists, strings)
- Beware of any warnings in compilation
- Minimize complexity
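
The first two pitfalls can be illustrated with a minimal user-space sketch. It uses `calloc()` as a stand-in for a zeroing kernel allocator such as `kzalloc()`, and checks every returned pointer before use. The `dev_state` structure and the `dev_state_alloc()` / `dev_state_free()` helpers are hypothetical names invented for this illustration, not part of any driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-device state, used only for this illustration. */
struct dev_state {
    int    open_count;
    char  *buffer;
    size_t buffer_len;
};

/* Allocate a zero-filled state object; returns NULL on failure. */
struct dev_state *dev_state_alloc(size_t buffer_len)
{
    assert(buffer_len > 0);

    /* calloc() zero-fills, so every field starts as 0 or NULL. */
    struct dev_state *st = calloc(1, sizeof(*st));
    if (st == NULL)
        return NULL;

    st->buffer = calloc(buffer_len, 1);
    if (st->buffer == NULL) {
        free(st);            /* do not leak the partially built object */
        return NULL;
    }
    st->buffer_len = buffer_len;
    return st;
}

/* Release a state object; accepts NULL so callers need no special case. */
void dev_state_free(struct dev_state *st)
{
    if (st == NULL)
        return;
    free(st->buffer);
    free(st);
}
```

Because the memory starts out zeroed, a field that was never assigned reads as 0 or NULL instead of garbage, which tends to make mistakes fail early and visibly.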

Debugging Support in the Kernel
- Under the “kernel hacking” menu
  - Not supported by all architectures
- CONFIG_DEBUG_KERNEL
  - Enables other debugging features
- CONFIG_DEBUG_SLUB
  - Checks kernel memory allocation functions
    - Memory overrun
    - Memory initialization

Debugging Support in the Kernel
- CONFIG_LOCKUP_DETECTOR
  - Detects hard and soft lockups
  - Soft lockups: bugs that cause the kernel to loop for more than 60 seconds
  - Hard lockups: bugs that cause a CPU (or core) to loop for more than 60 seconds

Debugging Support in the Kernel
- CONFIG_DEBUG_PAGEALLOC
  - Pages are removed from the kernel address space when freed
- CONFIG_DEBUG_SPINLOCK
  - Catches operations on uninitialized spinlocks and double unlocking
- CONFIG_DEBUG_MUTEXES
  - Detects and reports various mutex violations

Debugging Support in the Kernel
- CONFIG_DEBUG_INFO
  - Enables gdb debugging
- CONFIG_DEBUG_ATOMIC_SLEEP
  - Reports calls to routines that may sleep from inside a critical section
- CONFIG_KGDB*
  - Remotely debug the kernel using gdb

Debugging Support in the Kernel
- CONFIG_MAGIC_SYSRQ
  - For debugging system hangs
- CONFIG_DEBUG_STACKOVERFLOW
  - Helps track down kernel stack overflows
- CONFIG_DEBUG_STACK_USAGE
  - Monitors stack usage and makes statistics available via the magic SysRq key

Debugging Support in the Kernel
- CONFIG_KALLSYMS
  - Causes kernel symbol information to be built into the kernel
- CONFIG_FRAME_POINTER
  - Produces more reliable stack backtraces
- CONFIG_PROFILING
  - For performance tuning

Debugging Support in the Kernel
- Not an exhaustive list

printk (vs. printf)
- Lets one classify messages according to their priority by associating them with different loglevels

      printk(KERN_DEBUG "Here I am: %s:%i\n", __FILE__, __LINE__);

- Eight possible loglevels (0 - 7), defined in <linux/kernel.h>

printk (vs. printf)
- KERN_EMERG
  - For emergency messages
- KERN_ALERT
  - For a situation requiring immediate action
- KERN_CRIT
  - Critical conditions, related to serious hardware or software failures

printk (vs. printf)
- KERN_ERR
  - Used to report error conditions; device drivers often use it to report hardware difficulties
- KERN_WARNING
  - Warnings for less serious problems

printk (vs. printf)
- KERN_NOTICE
  - Normal situations worthy of note (e.g., security-related)
- KERN_INFO
  - Informational messages
- KERN_DEBUG
  - Used for debugging messages

printk (vs. printf)
- Without a specified priority
  - DEFAULT_MESSAGE_LOGLEVEL = KERN_WARNING
- If the message priority < console_loglevel
  - console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL
  - The message is printed to the console one line at a time
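
The filtering rule above can be sketched in user space. This is a simplified model, not kernel code: it assumes the textual "<N>" prefix that kernels of the 2.6 and 3.2 era put in front of a message (there, KERN_DEBUG expands to "<7>"), and the `sim_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Used when a message carries no prefix; 4 is the value of KERN_WARNING. */
#define SIM_DEFAULT_MESSAGE_LOGLEVEL 4

/*
 * Return the loglevel encoded as a leading "<N>" prefix (N in 0..7),
 * or the default level when no prefix is present. If body is non-NULL,
 * it is set to the start of the message text after the prefix.
 */
int sim_message_level(const char *msg, const char **body)
{
    assert(msg != NULL);
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
        if (body != NULL)
            *body = msg + 3;
        return msg[1] - '0';
    }
    if (body != NULL)
        *body = msg;
    return SIM_DEFAULT_MESSAGE_LOGLEVEL;
}

/* A message reaches the console only if its level is below console_loglevel. */
int sim_reaches_console(const char *msg, int console_loglevel)
{
    return sim_message_level(msg, NULL) < console_loglevel;
}
```

Lower numbers mean higher priority, so with a console loglevel of 7 a "<3>" error message is shown while a "<7>" debug message is not.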

printk (vs. printf)
- If both klogd and syslogd are running
  - Messages are appended to /var/log/messages
  - The klog daemon doesn’t save consecutive identical lines, only the first line + the number of repetitions

printk (vs. printf)
- console_loglevel can be modified using /proc/sys/kernel/printk
  - Contains 4 values
    - Current loglevel
    - Default loglevel
    - Minimum allowed loglevel
    - Boot-time default loglevel
  - echo 6 > /proc/sys/kernel/printk
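
A minimal sketch of reading those four values from user space, assuming the file holds four whitespace-separated integers in the order listed above. The `printk_levels` structure and both function names are hypothetical. Writing a single value, as in the echo command, should change only the first field.

```c
#include <assert.h>
#include <stdio.h>

/* The four fields of /proc/sys/kernel/printk, in file order. */
struct printk_levels {
    int console;          /* current console loglevel */
    int default_message;  /* level for messages without a priority */
    int minimum_console;  /* lowest value console may be set to */
    int default_console;  /* boot-time default console loglevel */
};

/* Parse a line such as "4 4 1 7"; returns 0 on success, -1 otherwise. */
int parse_printk_levels(const char *text, struct printk_levels *out)
{
    assert(text != NULL && out != NULL);
    int n = sscanf(text, "%d %d %d %d",
                   &out->console, &out->default_message,
                   &out->minimum_console, &out->default_console);
    return n == 4 ? 0 : -1;
}

/* Read and parse the first line of the given file; returns 0 or -1. */
int read_printk_levels(const char *path, struct printk_levels *out)
{
    char line[128];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char *ok = fgets(line, sizeof(line), fp);
    fclose(fp);
    return ok != NULL ? parse_printk_levels(line, out) : -1;
}
```

On a Linux system, calling `read_printk_levels("/proc/sys/kernel/printk", &levels)` would fill in the current settings.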

How Messages Get Logged
- printk writes messages into a circular buffer that is __LOG_BUF_LEN bytes long
  - If the buffer fills up, printk wraps around and overwrites the beginning of the buffer
- Can specify the -f <file> option to klogd to save messages to a specific file
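
The wrap-around behavior can be sketched with a tiny user-space ring buffer. This is an illustration of the idea only, not the kernel's implementation; `SIM_LOG_BUF_LEN` stands in for __LOG_BUF_LEN and is kept very small so the overwrite is easy to see, and all `sim_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_LOG_BUF_LEN 16   /* tiny on purpose; stands in for __LOG_BUF_LEN */

struct sim_log {
    char   buf[SIM_LOG_BUF_LEN];
    size_t head;    /* index where the next byte will be written */
    size_t total;   /* number of bytes ever written */
};

void sim_log_init(struct sim_log *log)
{
    assert(log != NULL);
    log->head = 0;
    log->total = 0;
}

/* Append msg; once the buffer is full, the oldest bytes are overwritten. */
void sim_log_write(struct sim_log *log, const char *msg)
{
    assert(log != NULL && msg != NULL);
    for (; *msg != '\0'; msg++) {
        log->buf[log->head] = *msg;
        log->head = (log->head + 1) % SIM_LOG_BUF_LEN;
        log->total++;
    }
}

/*
 * Copy the bytes still held in the buffer, oldest first, into out as a
 * NUL-terminated string. Returns the number of bytes copied.
 */
size_t sim_log_read(const struct sim_log *log, char *out, size_t out_size)
{
    assert(log != NULL && out != NULL);
    size_t kept = log->total < SIM_LOG_BUF_LEN ? log->total : SIM_LOG_BUF_LEN;
    size_t start = (log->head + SIM_LOG_BUF_LEN - kept) % SIM_LOG_BUF_LEN;
    assert(out_size > kept);
    for (size_t i = 0; i < kept; i++)
        out[i] = log->buf[(start + i) % SIM_LOG_BUF_LEN];
    out[kept] = '\0';
    return kept;
}
```

After writing 20 bytes into this 16-byte buffer, only the most recent 16 remain readable; the first 4 have been overwritten.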

How Messages Get Logged
- Reading from /proc/kmsg consumes data
- The syslog system call can leave data for other processes (try the dmesg command)

Rate Limiting
- Too many messages may overwhelm the console
- To reduce repeated messages, use

      int printk_ratelimit(void);

- Example

      if (printk_ratelimit()) {
          printk(KERN_NOTICE "The printer is still on fire\n");
      }

Rate Limiting
- To modify the behavior of printk_ratelimit
  - /proc/sys/kernel/printk_ratelimit
    - Number of seconds before re-enabling messages
  - /proc/sys/kernel/printk_ratelimit_burst
    - Number of messages accepted before rate limiting

Using the /proc Filesystem
- Exports kernel information
- Each file under /proc is tied to a kernel function
  - /proc/cpuinfo, /proc/meminfo
- Will give an in-depth example after introducing the character driver next week

The ioctl Method
- Implement additional commands to return debugging information
- Advantages
  - More efficient
  - Does not need to split data into pages
  - Can be left in the driver unnoticed

Debugging by Watching
- strace command
  - Shows system calls, arguments, and return values
  - No need to compile a program with the -g option
  - -t to display when each call is executed
  - -T to display the time spent in the call
  - -e to limit the types of calls
  - -o to redirect the output to a file

Debugging System Faults
- A fault usually ends the current process, while the system continues to work
- Potential side effects
  - Hardware left in an unusable state
  - Kernel resources in an inconsistent state
  - Corrupted memory
- Common remedy
  - Reboot

Oops Messages
- Dereferencing invalid pointers often results in oops messages

      ssize_t faulty_write(struct file *filp, const char __user *buf,
                           size_t count, loff_t *pos)
      {
          /* make a simple fault by dereferencing a NULL pointer */
          *(int *)0 = 0;
          return 0;
      }

Oops Messages

    Unable to handle kernel NULL pointer dereference at virtual address 00000000
     printing eip:
    d083a064
    Oops: 0002 [#1]
    SMP
    CPU:    0
    EIP:    0060:[<d083a064>]    Not tainted
    EFLAGS: 00010246   (2.6.6)
    EIP is at faulty_write+0x4/0x10 [faulty]
    eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
    esi: cf8b2460   edi: cf8b2480   ebp: 00000005   esp: c31c5f74
    ds: 007b   es: 007b   ss: 0068
    Process bash (pid: 2086, threadinfo=c31c4000 task=cfa0a6c0)
    Stack: c0150558 cf8b2460 080e9408 00000005 cf8b2480 00000000 cf8b2460 fffffff7
           080e9408 c31c4000 c0150682 cf8b2460 080e9408 00000005 cf8b2480 00000001
           00000005 c0103f8f 00000001 080e9408 00000005
    Call Trace:
     [<c0150558>] vfs_write+0xb8/0x130
     [<c0150682>] sys_write+0x42/0x70
     [<c0103f8f>] syscall_call+0x7/0xb
    Code: 89 15 00 00 c3 90 8d 74 26 00 83 ec 0c b8 00 a6 83 d0

- "Unable to handle kernel NULL pointer dereference ...": error message
- "EIP is at faulty_write+0x4/0x10 [faulty]": function name, 4 bytes into the function, kernel module name
- Kernel address space if >= 0xc0000000
- "Call Trace": call stack
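
The "EIP is at faulty_write+0x4/0x10 [faulty]" line packs four facts into one string: the function name, the offset into the function, the function's size, and the module. A small user-space sketch that splits such a string is shown here; the `oops_location` structure and `parse_oops_location()` are hypothetical helpers written for this illustration, and the format is assumed to be `symbol+offset/size [module]` with the module part optional.

```c
#include <assert.h>
#include <stdio.h>

struct oops_location {
    char          symbol[64];  /* function name */
    unsigned long offset;      /* bytes into the function */
    unsigned long size;        /* total size of the function in bytes */
    char          module[64];  /* module name, or "" if none was given */
};

/*
 * Parse text such as "faulty_write+0x4/0x10 [faulty]" or
 * "vfs_write+0xb8/0x130". Returns 0 on success, -1 on a malformed string.
 */
int parse_oops_location(const char *text, struct oops_location *loc)
{
    assert(text != NULL && loc != NULL);
    loc->module[0] = '\0';
    /* %lx accepts an optional 0x prefix; %63[^]] reads up to the closing ']'. */
    int n = sscanf(text, "%63[^+]+%lx/%lx [%63[^]]]",
                   loc->symbol, &loc->offset, &loc->size, loc->module);
    return n >= 3 ? 0 : -1;
}
```

For the oops above, this yields the symbol faulty_write, an offset of 4, a size of 16 bytes (0x10), and the module faulty, which points at an instruction near the very start of the function.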

Oops Messages
- Buffer overflow

      ssize_t faulty_read(struct file *filp, char __user *buf,
                          size_t count, loff_t *pos)
      {
          int ret;
          char stack_buf[4];

          memset(stack_buf, 0xff, 20); /* buffer overflow */
          if (count > 4) {
              count = 4; /* copy 4 bytes to the user */
          }
          ret = copy_to_user(buf, stack_buf, count);
          if (!ret) {
              return count;
          }
          return ret;
      }

Oops Messages

    EIP:    0010:[<00000000>]
    Unable to handle kernel paging request at virtual address ffffffff
     printing eip:
    ffffffff
    Oops: 0000 [#5]
    SMP
    CPU:    0
    EIP:    0060:[<ffffffff>]    Not tainted
    EFLAGS: 00010296   (2.6.6)
    EIP is at 0xffffffff
    eax: 0000000c   ebx: ffffffff   ecx: 00000000   edx: bfffda7c
    esi: cf434f00   edi: ffffffff   ebp: 00002000   esp: c27fff78
    ds: 007b   es: 007b   ss: 0068
    Process head (pid: 2331, threadinfo=c27fe000 task=c3226150)
    Stack: ffffffff bfffda70 00002000 cf434f20 00000001 00000286 cf434f00 fffffff7
           bfffda70 c27fe000 c0150612 cf434f00 bfffda70 00002000 cf434f20 00000003
           00002000 c0103f8f 00000003 bfffda70 00002000 bfffda70
    Call Trace:
     [<c0150612>] sys_read+0x42/0x70
     [<c0103f8f>] syscall_call+0x7/0xb
    Code:  Bad EIP value.

- "Unable to handle kernel paging request ...": error message
- Bad address: 0xffffffff points to nowhere
- User-space address space if < 0xc0000000
- "Call Trace": call stack

Oops Messages
- Require the CONFIG_KALLSYMS option to be turned on to see meaningful messages
- Other tricks
  - 0xa5a5a5a5 on the stack: memory not initialized

Asserting Bugs and Dumping Information
- BUG() and BUG_ON(conditional)
  - Cause an oops, which results in a stack trace and an error message
- panic()
  - Causes an oops and halts the kernel

      if (terrible_thing)
          panic("terrible_thing is %ld!\n", terrible_thing);

Asserting Bugs and Dumping Information
- dump_stack()
  - Dumps the contents of the registers and a function backtrace to the console without an oops

System Hangs
- If Ctrl-Alt-Del does not work
  - Two choices
    - Prevent hangs
    - Debug after the fact

System Hangs
- Insert schedule() calls at strategic points
  - Hand the CPU back to the scheduler
  - Do not call if your driver holds a spinlock

System Hangs
- Keyboard locks up, but other things are still working
  - Use the “magic SysRq key”
- To enable magic SysRq
  - Compile the kernel with CONFIG_MAGIC_SYSRQ on
  - echo 1 > /proc/sys/kernel/sysrq
- To trigger magic SysRq
  - Alt-SysRq-<command>
  - echo <command> > /proc/sysrq-trigger

System Hangs
- Key
  - k: kills all processes running on the current console
  - s: synchronizes all disks
  - u: unmounts and remounts all disks in read-only mode
  - b: reboots; make sure to synchronize and remount the disks first

System Hangs
- Key
  - p: prints processor register information
  - t: prints the current task list
  - m: prints memory information
  - See sysrq.txt for more
- Precaution for chasing system hangs
  - Mount all disks as read-only

LXR
- Linux Cross-Reference
- General hypertext cross-referencing tool for the Linux source code
- Can search for variable names, function names, free text
  - Figure out where something is defined and used
  - http://lxr.linux.no/#linux+v3.2.36/