Advanced Char Driver Operations
Linux Kernel Programming
CIS 4930/COP 5641

Topics
  • Managing ioctl command numbers
  • Putting a thread to sleep
  • Seeking on a device
  • Access control

ioctl
  • The input/output control system call
  • For operations beyond simple data transfers
      - Eject the media
      - Report error information
      - Change hardware settings
      - Self-destruct
  • Alternatives
      - Embedded commands in the data stream
      - Driver-specific file systems

ioctl
  • User-level interface (application view)

        int ioctl(int fd, int request, ...);

      - The "..." does not indicate a variable number of arguments
          · That would be problematic for the system call interface
      - In this context, it passes a single optional argument
          · Traditionally a char *argp
          · Just a way to bypass compile-time type checking
  • For more information, see the ioctl man page

ioctl
  • Driver-level interface

        long (*unlocked_ioctl) (struct file *filp,
                                unsigned int cmd,
                                unsigned long arg);

      - cmd is passed from the user unchanged
      - arg can be an integer or a pointer
      - The compiler does not type check it
  • ioctl() has changed since the LDD3 era
      - Modified to remove the big kernel lock (BKL)
      - http://lwn.net/Articles/119652/

Choosing the ioctl Commands
  • Want a numbering scheme that avoids mistakes
      - E.g., issuing a command to the wrong device (changing the baud rate of an audio device)
      - Use ioctl command numbers that are unique across the system
      - Check the ioctl.h files in the source and the directory Documentation/ioctl/

Choosing the ioctl Commands
  • A command number uses four bitfields
      - Defined in include/uapi/asm-generic/ioctl.h (for most architectures)
      - <direction, type, number, size>
          · direction: direction of data transfer
              _IOC_NONE
              _IOC_READ
              _IOC_WRITE
              _IOC_READ | _IOC_WRITE

Choosing the ioctl Commands
  • <direction, type, number, size>
      - type (ioctl device type)
          · 8-bit (_IOC_TYPEBITS) magic number
          · Associated with the device
      - number
          · 8-bit (_IOC_NRBITS) sequential number
          · Unique within the device

Choosing the ioctl Commands
  • <direction, type, number, size>
      - size: size of the user data involved
          · Width is _IOC_SIZEBITS
          · Usually 14 bits, but can be overridden by the architecture

    #define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)

    /* provoke compile error for invalid uses of size argument */
    extern unsigned int __invalid_size_argument_for_IOC;
    #define _IOC_TYPECHECK(t) \
        ((sizeof(t) == sizeof(t[1]) && \
          sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
          sizeof(t) : __invalid_size_argument_for_IOC)
    /* See http://lwn.net/Articles/48354/ */
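To see where each of the four fields lands, they can be packed by hand. The following is only a sketch: it assumes the asm-generic layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction, from least to most significant bit), and the SKETCH_* names, pack_ioctl_cmd() and unpack_size() are hypothetical. Real drivers should use the _IO* macros from the kernel headers.

```c
#include <stdint.h>

/* Hypothetical constants mirroring the asm-generic layout; some
 * architectures use different widths for size and direction. */
#define SKETCH_NRBITS    8
#define SKETCH_TYPEBITS  8
#define SKETCH_SIZEBITS  14

#define SKETCH_NRSHIFT   0
#define SKETCH_TYPESHIFT (SKETCH_NRSHIFT + SKETCH_NRBITS)
#define SKETCH_SIZESHIFT (SKETCH_TYPESHIFT + SKETCH_TYPEBITS)
#define SKETCH_DIRSHIFT  (SKETCH_SIZESHIFT + SKETCH_SIZEBITS)

#define SKETCH_DIR_NONE  0u
#define SKETCH_DIR_WRITE 1u
#define SKETCH_DIR_READ  2u

/* Combine <direction, type, number, size> into one 32-bit command. */
static uint32_t pack_ioctl_cmd(uint32_t dir, uint32_t type,
                               uint32_t nr, uint32_t size)
{
    return (dir << SKETCH_DIRSHIFT) | (type << SKETCH_TYPESHIFT) |
           (nr << SKETCH_NRSHIFT) | (size << SKETCH_SIZESHIFT);
}

/* Extract the size field again by shifting and masking. */
static uint32_t unpack_size(uint32_t cmd)
{
    return (cmd >> SKETCH_SIZESHIFT) & ((1u << SKETCH_SIZEBITS) - 1);
}
```

Under these assumptions, a write command with type 'k', number 1 and a 4-byte argument packs to 0x40046B01.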

Choosing the ioctl Commands
  • Useful macros to create ioctl command numbers
      - _IO(type, nr)
          · arg is an unsigned long (integer)
      - _IOR(type, nr, datatype)
        _IOW(type, nr, datatype)
        _IOWR(type, nr, datatype)
          · arg is a pointer
  • _IO*_BAD macros are kept for backward compatibility
      - They take a number (of bytes) rather than a datatype
      - http://lkml.iu.edu//hypermail/linux/kernel/0310.1/0019.html

Choosing the ioctl Commands
  • Useful macros to decode ioctl command numbers
      - _IOC_DIR(nr)
      - _IOC_TYPE(nr)
      - _IOC_NR(nr)
      - _IOC_SIZE(nr)
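The decode macros are also visible from user space, so a command number can be built and taken apart again in an ordinary program. This sketch assumes a Linux system with the kernel's user-space headers installed; DEMO_IOC_MAGIC, DEMO_IOCXVALUE and describe_cmd() are hypothetical names used only for illustration.

```c
#include <linux/ioctl.h>   /* _IOWR, _IOC_DIR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */
#include <stdio.h>

#define DEMO_IOC_MAGIC 'k'                            /* hypothetical magic number */
#define DEMO_IOCXVALUE _IOWR(DEMO_IOC_MAGIC, 9, int)  /* hypothetical command */

/* Print the four fields of a command number. */
static void describe_cmd(unsigned int cmd)
{
    unsigned int dir = _IOC_DIR(cmd);

    printf("type='%c' nr=%u size=%u dir=%s%s%s\n",
           (char)_IOC_TYPE(cmd),
           (unsigned int)_IOC_NR(cmd),
           (unsigned int)_IOC_SIZE(cmd),
           dir == _IOC_NONE ? "none" : "",
           (dir & _IOC_READ) ? "read " : "",
           (dir & _IOC_WRITE) ? "write" : "");
}
```

Calling describe_cmd(DEMO_IOCXVALUE) prints the magic number, the sequence number, the argument size and both direction bits, since _IOWR marks the transfer as bidirectional.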

Choosing the ioctl Commands
  • The scull example

    /* Use 'k' as magic number (type) field */
    #define SCULL_IOC_MAGIC 'k'
    /* Please use a different 8-bit number in your code */

    #define SCULL_IOCRESET _IO(SCULL_IOC_MAGIC, 0)

Choosing the ioctl Commands
  • The scull example

    /*
     * S means "Set" through a ptr,
     * T means "Tell" directly with the argument value
     * G means "Get": reply by setting through a pointer
     * Q means "Query": response is on the return value
     * X means "eXchange": switch G and S atomically
     * H means "sHift": switch T and Q atomically
     */
    #define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
    #define SCULL_IOCSQSET    _IOW(SCULL_IOC_MAGIC, 2, int)
    #define SCULL_IOCTQUANTUM _IO(SCULL_IOC_MAGIC, 3)
    #define SCULL_IOCTQSET    _IO(SCULL_IOC_MAGIC, 4)
    #define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)

  • Note: X and H set a new value and return the old value

Choosing the ioctl Commands
  • The scull example

    #define SCULL_IOCGQSET    _IOR(SCULL_IOC_MAGIC, 6, int)
    #define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC, 7)
    #define SCULL_IOCQQSET    _IO(SCULL_IOC_MAGIC, 8)
    #define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)
    #define SCULL_IOCXQSET    _IOWR(SCULL_IOC_MAGIC, 10, int)
    #define SCULL_IOCHQUANTUM _IO(SCULL_IOC_MAGIC, 11)
    #define SCULL_IOCHQSET    _IO(SCULL_IOC_MAGIC, 12)
    ...
    #define SCULL_IOC_MAXNR   14

The Return Value
  • When the command number is not supported
      - Return -ENOTTY (according to the POSIX standard)
      - Some drivers may (in conflict with the POSIX standard) return -EINVAL

The Predefined Commands
  • Handled by the kernel first
      - Will not be passed down to device drivers
  • Three groups
      - For any file (regular, device, FIFO, socket)
          · Magic number: "T"
      - For regular files only
      - Specific to the file system type
          · E.g., see ext2_ioctl()

Using the ioctl Argument
  • If it is an integer, just use it directly
  • If it is a pointer
      - Need to check for a valid user address

            int access_ok(int type, const void *addr,
                          unsigned long size);

          · type: either VERIFY_READ or VERIFY_WRITE
          · Returns 1 for success, 0 for failure
              On failure, the driver returns -EFAULT to the caller
          · Defined in <linux/uaccess.h>
          · Mostly called by memory-access routines

Using the ioctl Argument
  • The scull example

    long scull_ioctl(struct file *filp, unsigned int cmd,
                     unsigned long arg)
    {
        int err = 0, tmp;
        int retval = 0;

        /* check the magic number and whether the command is defined */
        if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC) {
            return -ENOTTY;
        }
        if (_IOC_NR(cmd) > SCULL_IOC_MAXNR) {
            return -ENOTTY;
        }
        …

Using the ioctl Argument
  • The scull example

        …
        /* the concept of "read" and "write" is reversed here */
        if (_IOC_DIR(cmd) & _IOC_READ) {
            err = !access_ok(VERIFY_WRITE, (void __user *) arg,
                             _IOC_SIZE(cmd));
        } else if (_IOC_DIR(cmd) & _IOC_WRITE) {
            err = !access_ok(VERIFY_READ, (void __user *) arg,
                             _IOC_SIZE(cmd));
        }
        if (err)
            return -EFAULT;
        …
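The validation steps above can be exercised outside the kernel. The sketch below is a user-space illustration, not the scull code: it assumes the Linux user-space headers are available, and DEMO_IOC_MAGIC, DEMO_IOC_MAXNR, enum user_access and classify_cmd() are hypothetical. It shows why the directions look reversed: a command that reads from the device makes the kernel write into user memory, and vice versa.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define DEMO_IOC_MAGIC 'k'   /* hypothetical magic number */
#define DEMO_IOC_MAXNR 14    /* hypothetical highest command number */

/* Which property of the user buffer the driver would have to verify. */
enum user_access { NEED_NONE, NEED_WRITABLE, NEED_READABLE };

/* Returns -ENOTTY for foreign or unknown commands; otherwise the kind
 * of check the user pointer needs before it is dereferenced. */
static int classify_cmd(unsigned int cmd)
{
    if (_IOC_TYPE(cmd) != DEMO_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > DEMO_IOC_MAXNR)
        return -ENOTTY;

    /* _IOC_READ: user reads from the device, so the kernel writes
     * to user memory and the buffer must be writable. */
    if (_IOC_DIR(cmd) & _IOC_READ)
        return NEED_WRITABLE;
    /* _IOC_WRITE: user writes to the device, so the kernel reads
     * user memory and the buffer must be readable. */
    if (_IOC_DIR(cmd) & _IOC_WRITE)
        return NEED_READABLE;
    return NEED_NONE;
}
```

As in the slide code, a bidirectional command is classified by its read bit, on the assumption that a writable user buffer is also readable.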

Capabilities and Restricted Operations
  • Limit certain ioctl operations to privileged users
  • See <linux/capability.h> for the full set of capabilities
  • To check a certain capability, call

        int capable(int capability);

  • In the scull example

        if (!capable(CAP_SYS_ADMIN)) {
            return -EPERM;
        }

  • CAP_SYS_ADMIN is a catch-all capability for many system administration operations
      - http://lwn.net/Articles/486306/

The Implementation of the ioctl Commands
  • A giant switch statement

        …
        switch(cmd) {
        case SCULL_IOCRESET:
            scull_quantum = SCULL_QUANTUM;
            scull_qset = SCULL_QSET;
            break;

        case SCULL_IOCSQUANTUM: /* Set: arg points to the value */
            if (!capable(CAP_SYS_ADMIN)) {
                return -EPERM;
            }
            retval = __get_user(scull_quantum, (int __user *)arg);
            break;
        …

The Implementation of the ioctl Commands
  • Six ways to pass and receive arguments from user space
      - The caller needs to know the command number

    int quantum;

    ioctl(fd, SCULL_IOCSQUANTUM, &quantum);           /* Set by pointer */
    ioctl(fd, SCULL_IOCTQUANTUM, quantum);            /* Set by value */
    ioctl(fd, SCULL_IOCGQUANTUM, &quantum);           /* Get by pointer */
    quantum = ioctl(fd, SCULL_IOCQQUANTUM);           /* Get by return value */
    ioctl(fd, SCULL_IOCXQUANTUM, &quantum);           /* Exchange by pointer */
    quantum = ioctl(fd, SCULL_IOCHQUANTUM, quantum);  /* Exchange by value */
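The six calling conventions can be tried without a driver by emulating the dispatch in user space. This is a sketch under stated assumptions, not the scull implementation: demo_ioctl(), demo_quantum and the DEMO_* commands are hypothetical, pointers are dereferenced directly where a real driver would use __get_user/__put_user, and the capability checks are omitted.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define DEMO_IOC_MAGIC   'k'   /* hypothetical magic number */
#define DEMO_IOCSQUANTUM _IOW(DEMO_IOC_MAGIC, 1, int)
#define DEMO_IOCTQUANTUM _IO(DEMO_IOC_MAGIC, 3)
#define DEMO_IOCGQUANTUM _IOR(DEMO_IOC_MAGIC, 5, int)
#define DEMO_IOCQQUANTUM _IO(DEMO_IOC_MAGIC, 7)
#define DEMO_IOCXQUANTUM _IOWR(DEMO_IOC_MAGIC, 9, int)
#define DEMO_IOCHQUANTUM _IO(DEMO_IOC_MAGIC, 11)

static int demo_quantum = 4000;   /* stand-in for the driver's setting */

/* User-space stand-in for a driver's ioctl method. */
static long demo_ioctl(unsigned int cmd, unsigned long arg)
{
    int old;

    switch (cmd) {
    case DEMO_IOCSQUANTUM:      /* Set: arg points to the value */
        demo_quantum = *(int *)arg;
        return 0;
    case DEMO_IOCTQUANTUM:      /* Tell: arg is the value */
        demo_quantum = (int)arg;
        return 0;
    case DEMO_IOCGQUANTUM:      /* Get: store through the pointer */
        *(int *)arg = demo_quantum;
        return 0;
    case DEMO_IOCQQUANTUM:      /* Query: value is the return value */
        return demo_quantum;
    case DEMO_IOCXQUANTUM:      /* eXchange: G and S through one pointer */
        old = demo_quantum;
        demo_quantum = *(int *)arg;
        *(int *)arg = old;
        return 0;
    case DEMO_IOCHQUANTUM:      /* sHift: T and Q in one call */
        old = demo_quantum;
        demo_quantum = (int)arg;
        return old;
    default:
        return -ENOTTY;
    }
}
```

A caller passes either a value or an address cast to unsigned long, for example demo_ioctl(DEMO_IOCSQUANTUM, (unsigned long)&quantum), which mirrors how the untyped third argument of ioctl reaches the driver.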

Pros/Cons of ioctl
  • Cons
      - Unregulated means to add new system call API
          · Not reviewed
          · Different for each device
      - 32/64-bit compatibility
      - No way to enumerate
  • Pros
      - read and write with one call
  • Ref
      - http://lwn.net/Articles/191653/

Device Control Without ioctl
  • Writing control sequences into the data stream itself
      - Example: console escape sequences
      - Advantages:
          · No need to implement ioctl methods
      - Disadvantages:
          · Need to make sure that escape sequences do not appear in the normal data stream (e.g., cat a binary file)
          · Need to parse the data stream

Device Control Without ioctl
  • sysfs
      - Can be used to enumerate all exported components
      - Use standard Unix shell commands
  • Netlink
      - Getting/setting socket options
  • debugfs
      - Probably not a good choice since its purpose is debugging
  • relay interface
      - https://www.kernel.org/doc/Documentation/filesystems/relay.txt

SLEEPING


Sleeping
  • Suspend a thread while it waits for some condition
  • Example usage: blocking I/O
      - Data is not immediately available for reads
      - The device is not ready to accept data
          · Output buffer is full

Introduction to Sleeping
  • A process is removed from the scheduler's run queue
  • Certain rules
      - Generally never sleep when running in an atomic context
          · Multiple steps must be performed without concurrent accesses
          · Not while holding a spinlock, seqlock, or RCU lock
          · Not while interrupts are disabled

Introduction to Sleeping
  • After waking up
      - Make no assumptions about the state of the system
      - The resource one is waiting for might be gone again
      - Must check the wait condition again
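The "check the condition again" rule has a direct user-space analogue in POSIX condition variables, which may be easier to experiment with than kernel wait queues. The sketch below is that analogue only, not kernel code: demo_wait(), demo_wake(), demo_flag and the other demo_* names are hypothetical, and the wait_event macros covered in the following slides perform the equivalent re-check internally.

```c
#include <pthread.h>

static pthread_mutex_t demo_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  demo_queue = PTHREAD_COND_INITIALIZER;
static int demo_flag;   /* the wait condition: nonzero means "event happened" */

/* Sleep until demo_flag is set, then consume the event. */
static void demo_wait(void)
{
    pthread_mutex_lock(&demo_lock);
    /* A loop, not an if: after a wakeup another thread may already
     * have consumed the event, so the condition is tested again. */
    while (demo_flag == 0)
        pthread_cond_wait(&demo_queue, &demo_lock);
    demo_flag = 0;
    pthread_mutex_unlock(&demo_lock);
}

/* Publish the event and wake every sleeper; only one will find
 * demo_flag still set, the others go back to sleep. */
static void demo_wake(void)
{
    pthread_mutex_lock(&demo_lock);
    demo_flag = 1;
    pthread_mutex_unlock(&demo_lock);
    pthread_cond_broadcast(&demo_queue);
}
```

Holding the mutex while testing and clearing the flag is what makes the check-then-consume step atomic here; a wakeup on its own says nothing about whether the event is still available.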

Introduction to Sleeping
  • Wait queue: contains a list of processes waiting for a specific event
      - #include <linux/wait.h>
      - To initialize statically, call

            DECLARE_WAIT_QUEUE_HEAD(my_queue);

      - To initialize dynamically, call

            wait_queue_head_t my_queue;
            init_waitqueue_head(&my_queue);

Simple Sleeping
  • Call variants of the wait_event macros
      - wait_event(queue, condition)
          · queue = wait queue head
              Passed by value
          · Waits until the boolean condition becomes true
          · Puts the process into an uninterruptible sleep
              Usually not what you want
      - wait_event_interruptible(queue, condition)
          · Can be interrupted by signals
          · Returns nonzero if the sleep was interrupted
              Your driver should return -ERESTARTSYS

Simple Sleeping
      - wait_event_timeout(queue, condition, timeout)
          · Waits for a limited time (in jiffies)
          · Returns 0 once the timeout expires without the condition having become true
      - wait_event_interruptible_timeout(queue, condition, timeout)

Simple Sleeping
  • To wake up, call variants of the wake_up functions

        void wake_up(wait_queue_head_t *queue);

      - Wakes up all processes waiting on the queue

        void wake_up_interruptible(wait_queue_head_t *queue);

      - Wakes up processes that perform an interruptible sleep

Simple Sleeping
  • Example module: sleepy

    static DECLARE_WAIT_QUEUE_HEAD(wq);
    static int flag = 0;

    ssize_t sleepy_read(struct file *filp, char __user *buf,
                        size_t count, loff_t *pos)
    {
        printk(KERN_DEBUG "process %i (%s) going to sleep\n",
               current->pid, current->comm);
        wait_event_interruptible(wq, flag != 0);
        /* multiple threads can wake up at this point */
        flag = 0;
        printk(KERN_DEBUG "awoken %i (%s)\n",
               current->pid, current->comm);
        return 0; /* EOF */
    }

Simple Sleeping
  • Example module: sleepy

    ssize_t sleepy_write(struct file *filp, const char __user *buf,
                         size_t count, loff_t *pos)
    {
        printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
               current->pid, current->comm);
        flag = 1;
        wake_up_interruptible(&wq);
        return count; /* succeed, to avoid retrial */
    }

Blocking and Nonblocking Operations
  • By default, operations block
      - If no data is available for reads
      - If no space is available for writes
  • Non-blocking I/O is indicated by the O_NONBLOCK flag in filp->f_flags
      - Defined in <linux/fcntl.h>
      - Only open, read, and write calls are affected
      - The call returns -EAGAIN immediately instead of blocking
      - Applications need to distinguish non-blocking returns from EOFs
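From the application side, telling "would block" apart from end of file comes down to checking the return value of read() and then errno. The following sketch uses only standard POSIX calls; set_nonblocking(), try_read_byte() and enum read_status are hypothetical helper names introduced for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum read_status { READ_DATA, READ_EOF, READ_WOULD_BLOCK, READ_ERROR };

/* Switch an open descriptor to non-blocking mode. Returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Try to read one byte and report what happened. */
static enum read_status try_read_byte(int fd, char *out)
{
    ssize_t n = read(fd, out, 1);

    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_EOF;            /* no more data will ever arrive */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;    /* nothing yet; retry later */
    return READ_ERROR;
}
```

On a pipe, for instance, an empty non-blocking read end reports READ_WOULD_BLOCK while a writer still has the pipe open, and READ_EOF only after the write end is closed.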

A Blocking I/O Example
  • scullpipe
      - A read process
          · Blocks when no data is available
          · Wakes a blocking write process when buffer space becomes available
      - A write process
          · Blocks when no buffer space is available
          · Wakes a blocking read process when data arrives

A Blocking I/O Example
  • scullpipe data structure

    struct scull_pipe {
        wait_queue_head_t inq, outq;       /* read and write queues */
        char *buffer, *end;                /* begin of buf, end of buf */
        int buffersize;                    /* used in pointer arithmetic */
        char *rp, *wp;                     /* where to read, where to write */
        int nreaders, nwriters;            /* number of openings for r/w */
        struct fasync_struct *async_queue; /* asynchronous readers */
        struct mutex mutex;                /* mutual exclusion */
        struct cdev cdev;                  /* Char device structure */
    };

A Blocking I/O Example

    static ssize_t scull_p_read(struct file *filp, char __user *buf,
                                size_t count, loff_t *f_pos)
    {
        struct scull_pipe *dev = filp->private_data;

        if (mutex_lock_interruptible(&dev->mutex))
            return -ERESTARTSYS;

        while (dev->rp == dev->wp) { /* nothing to read */
            mutex_unlock(&dev->mutex); /* release the lock */
            if (filp->f_flags & O_NONBLOCK)
                return -EAGAIN;
            if (wait_event_interruptible(dev->inq,
                                         (dev->rp != dev->wp)))
                return -ERESTARTSYS;
            if (mutex_lock_interruptible(&dev->mutex))
                return -ERESTARTSYS;
        }

A Blocking I/O Example

        if (dev->wp > dev->rp)
            count = min(count, (size_t)(dev->wp - dev->rp));
        else /* the write pointer has wrapped */
            count = min(count, (size_t)(dev->end - dev->rp));

        if (copy_to_user(buf, dev->rp, count)) {
            mutex_unlock(&dev->mutex); /* drop the lock on the error path */
            return -EFAULT;
        }
        dev->rp += count;
        if (dev->rp == dev->end)
            dev->rp = dev->buffer; /* wrapped */
        mutex_unlock(&dev->mutex);

        /* finally, awake any writers and return */
        wake_up_interruptible(&dev->outq);
        return count;
    }
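The pointer arithmetic in the read path is easy to check in isolation. This user-space sketch reproduces only the "how many bytes can be copied in one piece" computation and the wrap of the read pointer; contiguous_readable() and advance_rp() are hypothetical helpers, and the locking, copy_to_user and wakeup steps are deliberately left out.

```c
#include <stddef.h>

/* Bytes that can be copied in one contiguous chunk starting at rp.
 * Assumes rp != wp, i.e. the circular buffer is not empty. */
static size_t contiguous_readable(const char *end, const char *rp,
                                  const char *wp, size_t count)
{
    size_t avail;

    if (wp > rp)
        avail = (size_t)(wp - rp);    /* data lies between rp and wp */
    else
        avail = (size_t)(end - rp);   /* wp has wrapped: read up to the end */
    return count < avail ? count : avail;
}

/* Advance rp by n bytes, wrapping to the start when it reaches the end. */
static const char *advance_rp(const char *buffer, const char *end,
                              const char *rp, size_t n)
{
    rp += n;
    return rp == end ? buffer : rp;
}
```

When the write pointer has wrapped, a single read returns only the bytes up to the end of the buffer; the caller gets the remainder with a second read, which is why the function may return fewer bytes than requested.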

LLSEEK()


The llseek Implementation
  • Implements the lseek and llseek system calls
      - Modifies filp->f_pos

    loff_t scull_llseek(struct file *filp, loff_t off, int whence)
    {
        struct scull_dev *dev = filp->private_data;
        loff_t newpos;

        switch(whence) {
        case 0: /* SEEK_SET */
            newpos = off;
            break;

        case 1: /* SEEK_CUR, relative to the current position */
            newpos = filp->f_pos + off;
            break;

The llseek Implementation

        case 2: /* SEEK_END, relative to the end of the file */
            newpos = dev->size + off;
            break;

        default: /* can't happen */
            return -EINVAL;
        }
        if (newpos < 0)
            return -EINVAL;
        filp->f_pos = newpos;
        return newpos;
    }
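The position arithmetic can be tested without a device by factoring it into a pure function. The sketch below mirrors the switch above in user space; compute_newpos() is a hypothetical helper, long long stands in for loff_t, and the SEEK_* constants come from <stdio.h>.

```c
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Compute the new file position, or -EINVAL for a bad whence or a
 * position before the start of the file. */
static long long compute_newpos(long long cur, long long size,
                                long long off, int whence)
{
    long long newpos;

    switch (whence) {
    case SEEK_SET:                /* absolute offset */
        newpos = off;
        break;
    case SEEK_CUR:                /* relative to the current position */
        newpos = cur + off;
        break;
    case SEEK_END:                /* relative to the end of the file */
        newpos = size + off;
        break;
    default:
        return -EINVAL;
    }
    return newpos < 0 ? -EINVAL : newpos;
}
```

Seeking past the end is allowed here, as in the slide code; only a negative result is rejected.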

The llseek Implementation
  • May not make sense for serial ports and keyboard inputs
      - Inform the kernel by calling nonseekable_open in the open method

            int nonseekable_open(struct inode *inode, struct file *filp);

      - Replace the llseek method with no_llseek (defined in <linux/fs.h>) in your file_operations structure