Time Delays and Deferred Work Linux Kernel Programming

Time, Delays, and Deferred Work Linux Kernel Programming CIS 4930/COP 5641

Topics
  • Time accounting
      - Points in time
          - E.g., current time
      - Lengths of time
          - E.g., 25 msec
      - Linux facilities: "clock sources"
  • Time-delayed execution
      - Start performing work at some future time instant
      - Linux facilities: "clock events"
  • https://lkml.org/lkml/2010/11/15/101

PROCESSOR “CLOCKS”

Counting Time
  • Periodic timer interrupts
      - Generated by the timing hardware (e.g., local APIC)
      - Programmed at boot time according to HZ
          - Architecture-dependent value defined in <linux/param.h>
          - Usually 100 to 1,000 interrupts per second
  • Every time a timer interrupt occurs, a kernel counter called jiffies is incremented
      - Nominally initialized to 0 at system boot (current kernels start it at an offset, INITIAL_JIFFIES, so that wraparound bugs surface early)

Using the jiffies Counter
  • jiffies is read-only
  • Example
        #include <linux/jiffies.h>

        unsigned long j, stamp_1, stamp_half, stamp_n;

        j = jiffies;                 /* read the current value */
        stamp_1 = j + HZ;            /* 1 second in the future */
        stamp_half = j + HZ/2;       /* half a second */
        stamp_n = j + n*HZ/1000;     /* n milliseconds */
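The expression n*HZ/1000 uses integer division, so short delays can truncate to zero ticks (with HZ = 100, a 5 ms request becomes 0 jiffies). The following is a minimal userspace sketch of that effect, not kernel code. The function names and the hz parameter are hypothetical; in the kernel HZ is a compile-time constant, and the in-kernel helper msecs_to_jiffies() rounds up in a comparable way.

```c
/*
 * Hypothetical userspace model: hz stands in for the kernel's HZ constant.
 * Overflow for very large ms values is ignored in this sketch.
 */

/* Slide formula: truncating integer division. */
unsigned long ms_to_jiffies_trunc(unsigned long ms, unsigned long hz)
{
    return ms * hz / 1000;
}

/* Round up so that a nonzero delay never becomes zero ticks. */
unsigned long ms_to_jiffies_ceil(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}
```

Rounding up is usually the safer choice for delays, because a delay that lasts slightly too long is rarely a bug while one that expires immediately often is.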

Using the jiffies Counter
  • jiffies may wrap; use these macro functions
        #include <linux/jiffies.h>

        /* check if a is after b */
        int time_after(unsigned long a, unsigned long b);

        /* check if a is before b */
        int time_before(unsigned long a, unsigned long b);

        /* check if a is after or equal to b */
        int time_after_eq(unsigned long a, unsigned long b);

        /* check if a is before or equal to b */
        int time_before_eq(unsigned long a, unsigned long b);
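These macros stay correct across a wrap because they compare the signed difference of the two counters rather than the raw values. A minimal userspace sketch of the idea, assuming a 32-bit tick counter; the names tick_after, tick_before, and naive_after are hypothetical and only model the kernel macros:

```c
#include <stdint.h>

/*
 * Userspace model of a wrapping 32-bit tick counter.
 * tick_after(a, b) is true when a is later than b, provided the two
 * values are less than 2^31 ticks apart.
 */
int tick_after(uint32_t a, uint32_t b)
{
    /* The unsigned difference wraps modulo 2^32; its sign gives the order. */
    return (int32_t)(uint32_t)(b - a) < 0;
}

int tick_before(uint32_t a, uint32_t b)
{
    return tick_after(b, a);
}

/* Naive comparison, shown for contrast: gives the wrong answer across a wrap. */
int naive_after(uint32_t a, uint32_t b)
{
    return a > b;
}
```

For example, with a start stamp of 0xFFFFFFF0 and an end stamp 0x20 ticks later (which wraps to 0x10), tick_after reports the correct order while naive_after does not.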

Using the jiffies Counter
  • A 32-bit counter could wrap every 50 days
  • Tracking larger time intervals
  • timeval and timespec
        #include <linux/time.h>

        struct timespec {
            time_t tv_sec;
            long tv_nsec;
        };

        struct timeval {
            time_t tv_sec;
            suseconds_t tv_usec;
        };

        unsigned long timespec_to_jiffies(struct timespec *value);
        void jiffies_to_timespec(unsigned long jiffies, struct timespec *value);
        unsigned long timeval_to_jiffies(struct timeval *value);
        void jiffies_to_timeval(unsigned long jiffies, struct timeval *value);
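The 50-day figure corresponds to a tick rate of 1,000 per second: 2^32 ticks / 1000 Hz is about 4.29 million seconds, or roughly 49.7 days. A small userspace sketch of that arithmetic, with hypothetical function names:

```c
/* Seconds until a counter of the given bit width wraps at hz ticks per second. */
double wrap_seconds(unsigned bits, double hz)
{
    double ticks = 1.0;
    for (unsigned i = 0; i < bits; i++)
        ticks *= 2.0;            /* ticks = 2^bits */
    return ticks / hz;
}

/* Same quantity expressed in days (86,400 seconds per day). */
double wrap_days(unsigned bits, double hz)
{
    return wrap_seconds(bits, hz) / 86400.0;
}
```

At HZ = 100 the same counter lasts about ten times longer (roughly 497 days), which is why the wrap interval depends on the configured tick rate.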

Knowing the Current Time
  • jiffies represents only the time since the last boot
  • To obtain wall-clock time, use
        #include <linux/time.h>

        /* near microsecond resolution */
        void do_gettimeofday(struct timeval *tv);

        /* based on xtime, near jiffy resolution */
        struct timespec current_kernel_time(void);

Using the jiffies Counter
  • To access the 64-bit counter jiffies_64 on 32-bit machines, call
        #include <linux/jiffies.h>

        u64 get_jiffies_64(void);

Processor-Specific Clocks
  • Typically higher resolution than jiffies
      - Ticks faster than once per msec
  • Typically drifts more than other hardware
  • Implementation is often a CPU register
      - Incremented once per clock cycle
  • Platform-dependent
      - Register may not exist
      - May not be accessible from user space
      - May not be writable
          - Resetting this counter is discouraged
          - Other users/CPUs might rely on it for synchronization
      - May be 64-bit or 32-bit wide
          - Need to worry about overflows for 32-bit counters

Processor-Specific Registers
  • Pentium timestamp counter (TSC)
      - Introduced with the Pentium
      - 64-bit register that counts CPU clock cycles
      - Readable from both kernel space and user space
          - Can be set to require the CPU to be in supervisor mode
      - May not be constant across cores
          - constant_tsc flag in /proc/cpuinfo

RDTSCP (Extension of RDTSC)
  • Partially serializing variant of RDTSC
  • "The RDTSCP instruction waits until all previous instructions have been executed before reading the counter. However, subsequent instructions may begin execution before the read operation is performed"
      - http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf

Processor-Specific Registers
  • To access the counter, include <asm/msr.h> and use the following macros
        /* read into two 32-bit variables */
        rdtsc(low32, high32);

        /* read low half into a 32-bit variable */
        rdtscl(low32);

        /* read into a 64-bit long long variable */
        rdtscll(var64);
  • A 1-GHz CPU overflows the low half of the counter about every 4.3 seconds
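The relationship between the two 32-bit halves and the full 64-bit value, and the wrap interval of the low half, can be sketched in plain userspace C. This is a model of the arithmetic only; it does not read the real counter, and all function names are hypothetical.

```c
#include <stdint.h>

/* Combine the two 32-bit halves of a 64-bit cycle count. */
uint64_t tsc_combine(uint32_t low32, uint32_t high32)
{
    return ((uint64_t)high32 << 32) | low32;
}

/*
 * Elapsed cycles computed from the low halves only.
 * Valid as long as fewer than 2^32 cycles passed between the two reads,
 * because unsigned subtraction wraps modulo 2^32.
 */
uint32_t cycles_elapsed32(uint32_t start_low, uint32_t end_low)
{
    return end_low - start_low;
}

/* Seconds until the low half wraps at the given clock rate in Hz. */
double low_half_wrap_seconds(double cpu_hz)
{
    return 4294967296.0 / cpu_hz;   /* 2^32 cycles */
}
```

For a 1-GHz clock, low_half_wrap_seconds(1e9) is about 4.29 seconds, so measurements based on the low half alone are only safe for intervals shorter than that.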

Architecture Independent (somewhat)
  • get_cycles()
        #include <linux/timex.h>

        cycles_t get_cycles(void);
      - Returns 0 on platforms that have no cycle-counter register
  • sched_clock()
      - Used by the CPU scheduler
      - Number of elapsed nanosecs
      - Monotonically increasing

OPERATION START TIME DELAYS

[Figure: process state diagram with the states ready (competing for execution time), running (executing), and blocked]

Delaying Execution Start Time
  • Busy waiting
  • Sleeping
  • Mixtures of the two

Busy Waiting
  • Not recommended
        while (time_before(jiffies, j1)) {
            cpu_relax();
        }
      - j1 is the jiffies value at the expiration of the delay
      - cpu_relax() is an architecture-specific way of saying that you're not doing much with the CPU
          - E.g., hyperthreaded processors

Busy Waiting
  • May degrade system performance
      - If the kernel does not allow preemption
          - Loop locks the processor for the duration of the delay
          - Processor not performing "useful" work until time j1 is reached
      - If interrupts are disabled when a process enters this loop
          - jiffies will not be updated, so the loop never exits
          - Even for a preemptive kernel

Busy Waiting
  • Behavior of a simple busy-waiting program
        loop {
            /* print begin jiffies */
            /* busy wait for one second */
            /* print end jiffies */
        }
      - Nonpreemptive kernel, no background load
          - Begin: 1686518, end: 1687518
          - Begin: 1687519, end: 1688519
          - Begin: 1688520, end: 1689520

Busy Waiting
  • Nonpreemptive kernel, heavy background load
      - Begin: 1911226, end: 1912226
      - Begin: 1913323, end: 1914323
  • Preemptive kernel, heavy background load
      - Begin: 14940680, end: 14942777
      - Begin: 14942778, end: 14945430
      - The process has been interrupted during its delay

Yielding the Processor (Busy/Sleeping)
  • Explicitly releases the CPU when not using it
        while (time_before(jiffies, j1)) {
            schedule();
        }
      - Behavior similar to busy waiting under a preemptive kernel
          - Still consumes CPU cycles and battery power
          - No guarantee that the process will get the CPU back soon

Questions You Should Ask (Documentation/timers/timers-howto.txt)
  • Atomic vs. non-atomic context?
      - No sleeping in atomic context!
          - Unexpected deadlocks
          - Interrupted process will be delayed on an unrelated event
              - Higher priority process could be delayed
              - Reduced concurrency
      - How to check whether the code can sleep?
          - Driver must keep track
          - Do not use in_atomic()
          - http://lwn.net/Articles/274695/

Non-Sleeping Functions
  • *delay
      - ndelay(unsigned long nsecs)
      - udelay(unsigned long usecs)
      - mdelay(unsigned long msecs)

Sleeping Functions
  • Backed by hrtimers
      - usleep_range(unsigned long min, unsigned long max)
  • Backed by jiffies / legacy timers
      - msleep(unsigned long msecs)
      - msleep_interruptible(unsigned long msecs)
  • msleep is not interruptible
  • msleep_interruptible returns the remaining milliseconds

Desired Precision of Delay
  • The following may not always be true depending on the architecture, configuration, etc.
      - High precision
          - usleep*
      - Low precision
          - msleep*

TIMEOUTS

Timeouts
        #include <linux/wait.h>

        long wait_event_timeout(wait_queue_head_t q, condition, long timeout);
        long wait_event_interruptible_timeout(wait_queue_head_t q, condition, long timeout);
  • Bounded sleep
  • timeout: the number of jiffies to wait, signed
  • If the timeout expires, return 0
  • If the process is awakened before the timeout expires, return the remaining jiffies (the interruptible variant returns -ERESTARTSYS when interrupted by a signal)

Timeouts
  • Example
        wait_queue_head_t wait;

        init_waitqueue_head(&wait);
        wait_event_interruptible_timeout(wait, 0, delay);
      - condition = 0 (no condition to wait for)
      - Execution resumes when
          - Someone calls wake_up()
          - Timeout expires

Timeouts
  • Another way to schedule a timeout
        #include <linux/sched.h>

        signed long schedule_timeout(signed long timeout);
      - timeout: the number of jiffies to delay
      - Requires the caller to set the current process state first
            set_current_state(TASK_INTERRUPTIBLE);
            schedule_timeout(delay);
      - A process may not resume immediately after the timer expires

SOFTWARE INTERRUPTS

[Figure: timeline titled "Software Interrupts (Bottom Halves)" showing interrupt A, the ISR, CS, and the scheduler along a time axis]

Software Interrupts
  • kernel/softirq.c
        char *softirq_to_name[NR_SOFTIRQS] = {
            "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK_IOPOLL",
            "TASKLET", "SCHED", "HRTIMER", "RCU"
        };

KERNEL TIMERS <linux/timer.h>

Kernel Timers
  • A kernel timer schedules a function to run at a specified time
      - E.g., polling a device at regular intervals

Kernel Timers
  • The scheduled function is run as a software interrupt
      - Needs to observe constraints imposed on this interrupt/atomic context
          - Not associated with any user-level process
              - No access to user space
              - The current pointer is not meaningful
          - No sleeping or scheduling may be performed
              - No calls to schedule(), wait_event(), kmalloc(..., GFP_KERNEL), or mutex_*

Kernel Timers
      - Both defined in <asm/hardirq.h>
  • More on kernel timers
      - A task can reregister itself (e.g., polling)
      - Reregistered timer tries to run on the same CPU
      - A potential source of race conditions, even on uniprocessor systems
          - Need to protect data structures accessed by the timer function (via atomic types or spinlocks)

The Timer API
  • Basic building blocks
        #include <linux/timer.h>

        struct timer_list {
            /* ... */
            unsigned long expires;            /* jiffies value when the timer is expected to run */
            void (*function)(unsigned long);  /* called with data as argument */
            unsigned long data;               /* pointer cast to unsigned long */
        };

        void init_timer(struct timer_list *timer);
        struct timer_list TIMER_INITIALIZER(_function, _expires, _data);
        void add_timer(struct timer_list *timer);
        int del_timer(struct timer_list *timer);

The Timer API
  • Example (see jiq.c)

The Timer API
  • Other functions
        /* update the expiration time of a timer */
        int mod_timer(struct timer_list *timer, unsigned long expires);

        /* like del_timer, but SMP safe */
        int del_timer_sync(struct timer_list *timer);

        /* returns true if the timer is scheduled to run */
        int timer_pending(const struct timer_list *timer);

The Implementation of Kernel Timers
  • Requirements
      - Lightweight
      - Scale as the number of timers increases
      - Most timers expire within a few seconds
      - Run on the same registered CPU
  • Solution (details omitted)
      - Per-CPU data structure

HIGH RESOLUTION TIMERS

High Resolution Timers
  • Some operations require more precise wakeup times
  • Expiration is specified in nsecs, instead of jiffies
      - Precision will depend on the hardware available
      - May not work in virtual machines

High Resolution Timers
  • Interface similar to regular timers
      - See include/linux/hrtimer.h
      - ktime_t datatype
          - include/linux/ktime.h
          - Representation depends on architecture
          - Use helper functions!
  • Uses per-CPU red-black trees to sort events
  • Documentation/timers/highres.txt
  • http://lwn.net/Articles/167897/

TASKLETS

Tasklets
  • Resemble kernel timers
      - Always run at interrupt time
      - Run on the same CPU that schedules them
      - Receive an unsigned long argument
      - Can reregister themselves
  • Unlike kernel timers
      - Can only ask for a tasklet to be run later (not at a specific time)

Tasklets
  • Useful with hardware interrupt handling
      - Hardware interrupts must be handled as quickly as possible
      - A tasklet is handled later in a soft interrupt
  • Can be enabled/disabled (nested semantics)
  • Can run at normal or high priority
  • May run immediately, but no later than the next timer tick
  • Cannot be run concurrently with itself

Tasklets
  • Basic building blocks
        #include <linux/interrupt.h>

        struct tasklet_struct {
            /* ... */
            void (*func)(unsigned long);
            unsigned long data;
        };

        void tasklet_init(struct tasklet_struct *t,
                          void (*func)(unsigned long), unsigned long data);
        DECLARE_TASKLET(name, func, data);
        DECLARE_TASKLET_DISABLED(name, func, data);

Tasklet Interface
        /* disable the tasklet; if it is currently running, wait until it
           exits; it will not execute again until it is enabled */
        void tasklet_disable(struct tasklet_struct *t);

        /* disable the tasklet without waiting for a running instance to return */
        void tasklet_disable_nosync(struct tasklet_struct *t);

        /* need the same number of enable calls as disable calls */
        void tasklet_enable(struct tasklet_struct *t);

        /* ignored if the tasklet is already scheduled */
        /* if the tasklet is already running, run it again after it completes */
        void tasklet_schedule(struct tasklet_struct *t);

Tasklet Interface
        /* schedule the tasklet with higher priority */
        void tasklet_hi_schedule(struct tasklet_struct *t);

        /* ensures that the tasklet is not scheduled to run again */
        /* will finish scheduled tasklet */
        void tasklet_kill(struct tasklet_struct *t);
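The nested enable/disable semantics and the "ignore if already scheduled" rule can be illustrated with a toy, single-threaded userspace model. This is only a sketch of the counting behavior described on the slides. The struct toy_tasklet type and the toy_* functions are hypothetical and are not the kernel's implementation.

```c
#include <stdbool.h>

/* Toy single-threaded model; not the kernel's struct tasklet_struct. */
struct toy_tasklet {
    int disable_count;   /* number of outstanding disable calls */
    bool scheduled;      /* true while a run is pending */
    int runs;            /* how many times the handler has run */
};

/* Each disable call must later be matched by one enable call. */
void toy_disable(struct toy_tasklet *t)
{
    t->disable_count++;
}

void toy_enable(struct toy_tasklet *t)
{
    if (t->disable_count > 0)
        t->disable_count--;
}

/* Scheduling an already scheduled tasklet has no additional effect. */
void toy_schedule(struct toy_tasklet *t)
{
    t->scheduled = true;
}

/* Stand-in for softirq processing: runs the handler once if allowed. */
void toy_softirq(struct toy_tasklet *t)
{
    if (t->scheduled && t->disable_count == 0) {
        t->scheduled = false;
        t->runs++;
    }
}
```

In this model, a tasklet that is disabled twice and scheduled twice stays pending until the second toy_enable call, and then runs exactly once.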

WORKQUEUES

Workqueues (may replace tasklets)
  • Similar to tasklets
      - Kernel can request a function to be called later
      - Cannot access the user space
  • Unlike tasklets
      - Queued task may run on a different CPU
      - Workqueue functions are associated with kernel processes
          - Can sleep
      - Can be delayed for an explicit interval

Workqueues
  • Requires struct workqueue_struct
      - Defined in <linux/workqueue.h>
  • To create a workqueue, call
        /* create one workqueue thread per processor */
        struct workqueue_struct *create_workqueue(const char *name);

        /* create a single workqueue thread */
        struct workqueue_struct *create_singlethread_workqueue(const char *name);

Workqueues
  • To submit a task to a workqueue, you need to fill in a work_struct structure
      - At compile time, call
            DECLARE_WORK(name, void (*function)(void *));
      - At runtime, call one of the following
            /* does a more thorough job of initializing the structure */
            INIT_WORK(struct work_struct *work, void (*function)(void *));

            /* does not link the work_struct into the workqueue */
            PREPARE_WORK(struct work_struct *work, void (*function)(void *));

Workqueues
  • To submit work to a workqueue, call either
        int queue_work(struct workqueue_struct *queue, struct work_struct *work);

        /* may specify the delay in jiffies */
        int queue_delayed_work(struct workqueue_struct *queue,
                               struct work_struct *work, unsigned long delay);
  • To cancel a pending workqueue entry, call
        /* returns nonzero if the entry is still pending */
        int cancel_delayed_work(struct work_struct *work);

Workqueues
  • To wait until all work already queued has completed, call
        void flush_workqueue(struct workqueue_struct *queue);
  • To destroy a workqueue, call
        void destroy_workqueue(struct workqueue_struct *queue);

Timers: Relative vs Absolute Time
  • What is the difference?
        period = 5 msec

        /* relative */
        timer_handler() {
            <do work>
            restart_timer(period, RELATIVE)
        }

        /* absolute */
        wake_time = start_time

        timer_handler() {
            <do work>
            wake_time += period
            restart_timer(wake_time, ABSOLUTE)
        }
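One way to see the difference is to simulate both schemes with a handler that always starts slightly late. With relative re-arming the lateness accumulates on every period; with absolute re-arming it does not. The following userspace sketch uses simulated time in microseconds. The function names, the fixed latency, and the choice to arm the first expiry at start + period are assumptions made for illustration.

```c
/*
 * Simulated time in microseconds. Each handler invocation starts
 * latency_us after its programmed expiry (hypothetical fixed latency).
 * Both functions return the time at which the n-th handler run starts.
 */

/* Relative: re-arm for "now + period" from inside the handler. */
long last_run_relative(long start_us, long period_us, long latency_us, int n)
{
    long expiry = start_us + period_us;
    long now = start_us;
    for (int i = 0; i < n; i++) {
        now = expiry + latency_us;   /* handler actually runs late */
        expiry = now + period_us;    /* restart_timer(period, RELATIVE) */
    }
    return now;
}

/* Absolute: re-arm for "wake_time + period", independent of when we ran. */
long last_run_absolute(long start_us, long period_us, long latency_us, int n)
{
    long wake_time = start_us + period_us;
    long now = start_us;
    for (int i = 0; i < n; i++) {
        now = wake_time + latency_us;   /* handler actually runs late */
        wake_time += period_us;         /* restart_timer(wake_time, ABSOLUTE) */
    }
    return now;
}
```

With a 5,000 µs period and 100 µs of latency, the tenth relative run starts at 51,000 µs (1,000 µs of accumulated drift), while the tenth absolute run starts at 50,100 µs (only the latency of that single run).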

Various Delayed Execution Methods
  Columns: Interruptible during the wait | No busy waiting | Good precision for fine-grained delay | Scheduled task can access user space | Can sleep inside the scheduled task
  • Busy waiting: Maybe, No, No, Yes
  • Yielding the processor: Yes, Maybe, No, Yes
  • Timeouts: Maybe, Yes, Yes
  • msleep, ssleep: No, Yes, Yes
  • msleep_interruptible: Yes, No, Yes
  • ndelay, udelay, mdelay: No, No, Maybe, Yes
  • Kernel timers: Yes, Yes, No, No
  • Tasklets: Yes, No, No, No
  • Workqueues: Yes, Yes, No, Yes

Additional References (not including those previously listed in the slides)
  • http://elinux.org/Kernel_Timer_Systems