This chapter describes the locking primitives and thread synchronization mechanisms of the illumos multithreaded kernel. You should design device drivers to take advantage of multithreading. This chapter provides information on the following subjects: locking primitives, thread synchronization, and choosing a locking scheme.
3.1. Locking Primitives
In traditional UNIX systems, every section of kernel code terminates either through an explicit call to sleep() to give up the processor or through a hardware interrupt. illumos operates differently. A kernel thread can be preempted at any time to run another thread. Because all kernel threads share kernel address space and often need to read and modify the same data, the kernel provides a number of locking primitives to prevent threads from corrupting shared data. These mechanisms include mutual-exclusion locks (also known as mutexes), readers/writer locks, and semaphores.
3.1.1. Storage Classes of Driver Data
The storage class of data is a guide to whether the driver might need to take explicit steps to control access to the data. The three data storage classes are:
Automatic (stack) data. Every thread has a private stack, so drivers never need to lock automatic variables.
Global static data. Global static data can be shared by any number of threads in the driver. The driver might need to lock this type of data at times.
Kernel heap data. Any number of threads in the driver can share kernel heap data, such as data allocated by kmem_alloc(9F). The driver needs to protect shared data at all times.
3.1.2. Mutual-Exclusion Locks
A mutual-exclusion lock, or mutex, is usually associated with a set of data and regulates access to that data. A mutex allows only one thread at a time to access that data. The mutex functions are:
mutex_destroy(9F): Releases any associated storage.
mutex_enter(9F): Acquires a mutex.
mutex_exit(9F): Releases a mutex.
mutex_init(9F): Initializes a mutex.
mutex_owned(9F): Tests to determine whether the mutex is held by the current thread. To be used in ASSERT(9F) only.
mutex_tryenter(9F): Acquires a mutex if available, but does not block.
Setting Up Mutexes
Device drivers usually allocate a mutex for each driver data structure. The mutex is typically a field in the structure of type kmutex_t. mutex_init(9F) is called to prepare the mutex for use. This call is usually made at attach(9E) time for per-device mutexes and _init(9E) time for global driver mutexes.
struct xxstate *xsp;
/* ... */
mutex_init(&xsp->mu, NULL, MUTEX_DRIVER, NULL);
/* ... */
For a more complete example of mutex initialization, see Driver Autoconfiguration.
The driver must destroy the mutex with mutex_destroy(9F) before being unloaded. Destroying the mutex is usually done at detach(9E) time for per-device mutexes and _fini(9E) time for global driver mutexes.
Every section of the driver code that needs to read or write the shared data structure must do the following tasks:
Acquire the mutex
Access the data
Release the mutex
The scope of a mutex, that is, the data the mutex protects, is entirely up to the programmer. A mutex protects a data structure only if every code path that accesses the data structure does so while holding the mutex.
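The acquire, access, release sequence can be sketched as follows. This is a minimal illustration, not code from an actual driver: the xxstate layout, the open_count field, and the xx_count_open function are hypothetical, and the non-kernel branch only provides POSIX threads stand-ins so that the sketch builds outside the kernel.

```c
#ifdef _KERNEL
#include <sys/ksynch.h>   /* kmutex_t, mutex_enter(9F), mutex_exit(9F) */
#else
/* Assumption: user-space stand-ins so the sketch builds outside the kernel. */
#include <pthread.h>
typedef pthread_mutex_t kmutex_t;
#define mutex_init(mp, name, type, arg)  pthread_mutex_init((mp), NULL)
#define mutex_enter(mp)                  pthread_mutex_lock(mp)
#define mutex_exit(mp)                   pthread_mutex_unlock(mp)
#define mutex_destroy(mp)                pthread_mutex_destroy(mp)
#endif

/* Hypothetical per-device soft state: "mu" protects "open_count". */
struct xxstate {
    kmutex_t mu;
    int      open_count;
};

/* Acquire the mutex, access the shared data, release the mutex. */
int
xx_count_open(struct xxstate *xsp)
{
    int count;

    mutex_enter(&xsp->mu);
    count = ++xsp->open_count;
    mutex_exit(&xsp->mu);

    return (count);
}
```

The mutex protects open_count only if every other code path that reads or writes the field follows the same pattern.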
3.1.3. Readers/Writer Locks
A readers/writer lock regulates access to a set of data. The readers/writer lock is so called because many threads can hold the lock simultaneously for reading, but only one thread can hold the lock for writing.
Most device drivers do not use readers/writer locks. These locks are slower than mutexes. The locks provide a performance gain only when they protect commonly read data that is not frequently written. In this case, contention for a mutex could become a bottleneck, so using a readers/writer lock might be more efficient. See the rwlock(9F) man page for detailed information. The readers/writer lock functions are:
rw_destroy(9F): Destroys a readers/writer lock
rw_downgrade(9F): Downgrades a readers/writer lock holder from writer to reader
rw_enter(9F): Acquires a readers/writer lock
rw_exit(9F): Releases a readers/writer lock
rw_init(9F): Initializes a readers/writer lock
rw_read_locked(9F): Determines whether a readers/writer lock is held for read or write
rw_tryenter(9F): Attempts to acquire a readers/writer lock without waiting
rw_tryupgrade(9F): Attempts to upgrade a readers/writer lock holder from reader to writer
3.1.4. Semaphores
Counting semaphores are available as an alternative primitive for managing threads within device drivers. See the semaphore(9F) man page for more information. The semaphore functions are:
sema_destroy(9F): Destroys a semaphore.
sema_init(9F): Initializes a semaphore.
sema_p(9F): Decrements a semaphore and possibly blocks.
sema_p_sig(9F): Decrements a semaphore, but does not block if a signal is pending. See Threads Unable to Receive Signals.
sema_tryp(9F): Attempts to decrement a semaphore, but does not block.
sema_v(9F): Increments a semaphore and possibly unblocks a waiter.
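One way a counting semaphore might be used is to hand out a fixed number of resources. The sketch below is only an illustration: the xxslots structure, the XX_NSLOTS value, and the xx_slot_* functions are hypothetical, and the non-kernel branch maps the calls onto POSIX semaphores so that the sketch builds outside the kernel.

```c
#ifdef _KERNEL
#include <sys/ksynch.h>   /* ksema_t, sema_init(9F), sema_tryp(9F), sema_v(9F) */
#else
/* Assumption: user-space stand-ins so the sketch builds outside the kernel. */
#include <stddef.h>
#include <semaphore.h>
typedef sem_t ksema_t;
#define sema_init(sp, val, name, type, arg)  sem_init((sp), 0, (val))
#define sema_tryp(sp)                        (sem_trywait(sp) == 0)
#define sema_v(sp)                           sem_post(sp)
#define sema_destroy(sp)                     sem_destroy(sp)
#endif

#define XX_NSLOTS 2   /* hypothetical number of command slots */

struct xxslots {
    ksema_t free_slots;   /* counts the slots that are still available */
};

void
xx_slots_init(struct xxslots *sp)
{
    sema_init(&sp->free_slots, XX_NSLOTS, NULL, SEMA_DRIVER, NULL);
}

/* Returns 1 if a slot was claimed, 0 if none is free. Never blocks. */
int
xx_slot_tryget(struct xxslots *sp)
{
    return (sema_tryp(&sp->free_slots) != 0);
}

/* Returns a slot and possibly unblocks a thread waiting in sema_p(9F). */
void
xx_slot_put(struct xxslots *sp)
{
    sema_v(&sp->free_slots);
}
```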
3.2. Thread Synchronization
In addition to protecting shared data, drivers often need to synchronize execution among multiple threads.
3.2.1. Condition Variables in Thread Synchronization
Condition variables are a standard form of thread synchronization. They are designed to be used with mutexes. The associated mutex is used to ensure that a condition can be checked atomically, and that the thread can block on the associated condition variable without missing either a change to the condition or a signal that the condition has changed.
The condvar(9F) functions are:
cv_broadcast(9F): Signals all threads waiting on the condition variable.
cv_destroy(9F): Destroys a condition variable.
cv_init(9F): Initializes a condition variable.
cv_signal(9F): Signals one thread waiting on the condition variable.
cv_timedwait_sig(9F): Waits for condition, time-out, or signal. See Threads Unable to Receive Signals.
cv_timedwait(9F): Waits for condition or time-out.
cv_wait(9F): Waits for condition.
cv_wait_sig(9F): Waits for condition, or returns zero on receipt of a signal. See Threads Unable to Receive Signals.
Initializing Condition Variables
Declare a condition variable of type kcondvar_t for each condition. Usually, the condition variables are declared in the driver's soft-state structure. Use cv_init(9F) to initialize each condition variable. Similar to mutexes, condition variables are usually initialized at attach(9E) time. A typical example of initializing a condition variable is:
cv_init(&xsp->cv, NULL, CV_DRIVER, NULL);
For a more complete example of condition variable initialization, see Driver Autoconfiguration.
Waiting for the Condition
To use condition variables, follow these steps in the code path waiting for the condition:
Acquire the mutex guarding the condition.
Test the condition.
If the test results do not allow the thread to continue, use cv_wait(9F) to block the current thread on the condition. The cv_wait(9F) function releases the mutex before blocking the thread and reacquires the mutex before returning. On return from cv_wait(9F), repeat the test.
After the test allows the thread to continue, set the condition to its new value. For example, set a device flag to busy.
Release the mutex.
Signaling the Condition
Follow these steps in the code path to signal the condition:
Acquire the mutex guarding the condition.
Set the condition.
Signal the blocked thread with cv_broadcast(9F).
Release the mutex.
The following example uses a busy flag along with mutex and condition variables to force the read(9E) routine to wait until the device is no longer busy before starting a transfer.
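A minimal sketch of this busy-flag pattern might look like the following. The xxstate layout and the xx_acquire and xx_release function names are hypothetical. In a driver, the waiting side would typically sit in the read(9E) routine and the signaling side in whatever code path finishes the transfer. The non-kernel branch only supplies POSIX threads stand-ins so that the sketch builds outside the kernel.

```c
#ifdef _KERNEL
#include <sys/ksynch.h>   /* kmutex_t, kcondvar_t, cv_wait(9F), cv_broadcast(9F) */
#else
/* Assumption: user-space stand-ins so the sketch builds outside the kernel. */
#include <pthread.h>
typedef pthread_mutex_t kmutex_t;
typedef pthread_cond_t  kcondvar_t;
#define mutex_init(mp, name, type, arg)  pthread_mutex_init((mp), NULL)
#define mutex_enter(mp)                  pthread_mutex_lock(mp)
#define mutex_exit(mp)                   pthread_mutex_unlock(mp)
#define cv_init(cvp, name, type, arg)    pthread_cond_init((cvp), NULL)
#define cv_wait(cvp, mp)                 pthread_cond_wait((cvp), (mp))
#define cv_broadcast(cvp)                pthread_cond_broadcast(cvp)
#endif

/* Hypothetical soft state: "mu" guards "busy", "cv" announces changes to it. */
struct xxstate {
    kmutex_t   mu;
    kcondvar_t cv;
    int        busy;
};

/* Waiting side: block until the device is idle, then mark it busy. */
void
xx_acquire(struct xxstate *xsp)
{
    mutex_enter(&xsp->mu);
    while (xsp->busy)                 /* retest after every wakeup */
        cv_wait(&xsp->cv, &xsp->mu);  /* drops mu while blocked */
    xsp->busy = 1;
    mutex_exit(&xsp->mu);
}

/* Signaling side: clear the flag and wake the waiters. */
void
xx_release(struct xxstate *xsp)
{
    mutex_enter(&xsp->mu);
    xsp->busy = 0;
    cv_broadcast(&xsp->cv);
    mutex_exit(&xsp->mu);
}
```

The while loop, rather than a single if, matters because a woken thread must retest the flag: another waiter might have set busy again before this thread reacquired the mutex.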
3.2.2. cv_wait and cv_timedwait Functions
If a thread is blocked on a condition with cv_wait(9F) and that condition does not occur, the thread waits forever. To avoid that situation, use cv_timedwait(9F), which also returns when a specified time is reached, even if no other thread has performed a wakeup. cv_timedwait takes an absolute wait time as an argument. cv_timedwait returns -1 if the time is reached and the event has not occurred. cv_timedwait returns a positive value if the condition is met.
cv_timedwait(9F) requires an absolute wait time expressed in clock ticks since the system was last rebooted. The wait time can be determined by retrieving the current value with ddi_get_lbolt(9F). The driver usually has a maximum number of seconds or microseconds to wait, so this value is converted to clock ticks with drv_usectohz(9F) and added to the value from ddi_get_lbolt(9F).
The following example shows how to use cv_timedwait(9F) to wait up to five seconds to access the device before returning EIO to the caller.
Although device driver writers generally prefer to use cv_timedwait(9F) over cv_wait(9F), sometimes cv_wait(9F) is a better choice. For example, cv_wait(9F) is better if a driver is waiting on the following conditions:
Internal driver state changes, where such a state change might require some command to be executed, or a set amount of time to pass
Something the driver needs to single-thread
Some situation that is already managing a possible timeout, as when “A” depends on “B,” and “B” is using cv_timedwait(9F)
3.2.3. cv_wait_sig Function
A driver might be waiting for a condition that cannot occur or will not happen for a long time. In such cases, the user can send a signal to abort the thread. Depending on the driver design, the signal might not cause the driver to wake up.
cv_wait_sig(9F) allows a signal to unblock the thread. This capability enables the user to break out of potentially long waits by sending a signal to the thread with kill(1) or by typing the interrupt character. cv_wait_sig(9F) returns zero if it is returning because of a signal, or nonzero if the condition occurred. However, see Threads Unable to Receive Signals for cases in which signals might not be received.
The following example shows how to use cv_wait_sig(9F) to allow a signal to unblock the thread.
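One possible shape for such a wait is sketched below. The xxstate layout and the xx_acquire_intr function are hypothetical, and returning EINTR for an interrupted wait is an assumption about the driver's error convention. In the non-kernel branch, cv_wait_sig is a stand-in that reports a signal only when the hypothetical xx_fake_signal variable is set, so that the sketch builds and can be exercised outside the kernel.

```c
#ifdef _KERNEL
#include <sys/errno.h>
#include <sys/ksynch.h>   /* cv_wait_sig(9F) */
#else
/* Assumption: user-space stand-ins so the sketch builds outside the kernel. */
#include <errno.h>
#include <pthread.h>
typedef pthread_mutex_t kmutex_t;
typedef pthread_cond_t  kcondvar_t;
#define mutex_init(mp, name, type, arg)  pthread_mutex_init((mp), NULL)
#define mutex_enter(mp)                  pthread_mutex_lock(mp)
#define mutex_exit(mp)                   pthread_mutex_unlock(mp)
#define cv_init(cvp, name, type, arg)    pthread_cond_init((cvp), NULL)

int xx_fake_signal;   /* stand-in only: nonzero imitates a pending signal */

static int
cv_wait_sig(kcondvar_t *cvp, kmutex_t *mp)
{
    if (xx_fake_signal)
        return (0);                   /* interrupted by a "signal" */
    (void) pthread_cond_wait(cvp, mp);
    return (1);                       /* woken by cv_signal or cv_broadcast */
}
#endif

struct xxstate {
    kmutex_t   mu;
    kcondvar_t cv;
    int        busy;
};

/* Wait for the device, but let a signal break the wait. */
int
xx_acquire_intr(struct xxstate *xsp)
{
    mutex_enter(&xsp->mu);
    while (xsp->busy) {
        if (cv_wait_sig(&xsp->cv, &xsp->mu) == 0) {
            /* A signal arrived before the condition: tidy up and exit. */
            mutex_exit(&xsp->mu);
            return (EINTR);
        }
    }
    xsp->busy = 1;
    mutex_exit(&xsp->mu);
    return (0);
}
```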
3.2.4. cv_timedwait_sig Function
cv_timedwait_sig(9F) is similar to cv_timedwait(9F) and cv_wait_sig(9F), except that cv_timedwait_sig returns -1 without the condition being signaled after a timeout has been reached, or 0 if a signal (for example, kill(2)) is sent to the thread.
For both cv_timedwait(9F) and cv_timedwait_sig(9F), time is measured in absolute clock ticks since the last system reboot.
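Because cv_timedwait_sig(9F) has three distinct outcomes, a driver might translate the return value in one place. The helper below is a hypothetical sketch. Returning EIO on a timeout follows the cv_timedwait(9F) discussion earlier in this chapter, and returning EINTR for a signal is an assumed convention, not something this chapter specifies.

```c
#ifdef _KERNEL
#include <sys/types.h>    /* clock_t */
#include <sys/errno.h>
#else
#include <errno.h>
#include <time.h>         /* clock_t */
#endif

/*
 * Map the value returned by cv_timedwait_sig(9F) to an error code:
 *   -1  the timeout was reached without the condition being signaled
 *    0  a signal was sent to the thread
 *   >0  the condition was signaled
 */
int
xx_timedwait_sig_error(clock_t rv)
{
    if (rv == -1)
        return (EIO);
    if (rv == 0)
        return (EINTR);
    return (0);
}
```

A nonzero result from the helper would normally cause the caller to release the mutex and return; a result of 0 means the caller should retest its condition.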
3.3. Choosing a Locking Scheme
The locking scheme for most device drivers should be kept straightforward. Using additional locks allows more concurrency but increases overhead. Using fewer locks is less time consuming but allows less concurrency. Generally, use one mutex per data structure, a condition variable for each event or condition the driver must wait for, and a mutex for each major set of data global to the driver. Avoid holding mutexes for long periods of time. Use the following guidelines when choosing a locking scheme:
Use the multithreading semantics of the entry point to your advantage.
Make all entry points re-entrant. You can reduce the amount of shared data by changing a static variable to automatic.
If your driver acquires multiple mutexes, acquire and release the mutexes in the same order in all code paths.
Hold and release locks within the same functional space.
Avoid holding driver mutexes when calling DDI interfaces that can block, for example, kmem_alloc(9F) with KM_SLEEP.
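The lock-ordering guideline above can be illustrated with a short sketch. The xxstate layout, the two mutexes, and both functions are hypothetical. The point is only that every code path takes hw_mu before stats_mu and releases them in the reverse order. The non-kernel branch supplies POSIX threads stand-ins so that the sketch builds outside the kernel.

```c
#ifdef _KERNEL
#include <sys/ksynch.h>
#else
/* Assumption: user-space stand-ins so the sketch builds outside the kernel. */
#include <pthread.h>
typedef pthread_mutex_t kmutex_t;
#define mutex_init(mp, name, type, arg)  pthread_mutex_init((mp), NULL)
#define mutex_enter(mp)                  pthread_mutex_lock(mp)
#define mutex_exit(mp)                   pthread_mutex_unlock(mp)
#endif

/* Hypothetical lock order for this driver: hw_mu first, then stats_mu. */
struct xxstate {
    kmutex_t hw_mu;      /* protects reg */
    kmutex_t stats_mu;   /* protects nwrites */
    int      reg;
    int      nwrites;
};

void
xx_write_reg(struct xxstate *xsp, int val)
{
    mutex_enter(&xsp->hw_mu);
    xsp->reg = val;
    mutex_enter(&xsp->stats_mu);
    xsp->nwrites++;
    mutex_exit(&xsp->stats_mu);
    mutex_exit(&xsp->hw_mu);
}

/* A second code path that needs both locks takes them in the same order. */
void
xx_reset(struct xxstate *xsp)
{
    mutex_enter(&xsp->hw_mu);
    mutex_enter(&xsp->stats_mu);
    xsp->reg = 0;
    xsp->nwrites = 0;
    mutex_exit(&xsp->stats_mu);
    mutex_exit(&xsp->hw_mu);
}
```

If xx_reset instead took stats_mu first, two threads running xx_write_reg and xx_reset concurrently could each hold one mutex while waiting for the other, and neither would make progress.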
To look at lock usage, use lockstat(1M). lockstat(1M) monitors all kernel lock events, gathers frequency and timing data about the events, and displays the data.
See the Multithreaded Programming Guide for more details on multithreaded operations.
3.3.1. Potential Locking Pitfalls
Mutexes are not re-entrant by the same thread. If you already own the mutex, attempting to claim this mutex a second time leads to the following panic:
panic: recursive mutex_enter. mutex %x caller %x
Releasing a mutex that the current thread does not hold causes this panic:
panic: mutex_adaptive_exit: mutex not held by thread
The following panic occurs only on uniprocessors:
panic: lock_set: lock held and only one CPU
The lock_set panic indicates that a spin mutex is held and will spin forever, because no other CPU can release this mutex. This situation can happen if the driver forgets to release the mutex on one code path or becomes blocked while holding the mutex.
A common cause of the lock_set panic is a device with a high-level interrupt calling a routine that blocks, such as cv_wait(9F). Another typical cause is a high-level handler grabbing an adaptive mutex by calling mutex_enter(9F).
3.3.2. Threads Unable to Receive Signals
The sema_p_sig, cv_wait_sig, and cv_timedwait_sig functions can be awakened when the thread receives a signal. A problem can arise because some threads are unable to receive signals. For example, when close(9E) is called as a result of the application calling close(2), signals can be received.
However, when close(9E) is called from within the exit(2) processing that closes all open file descriptors, the thread cannot receive signals.
When the thread cannot receive signals, sema_p_sig behaves as sema_p, cv_wait_sig behaves as cv_wait, and cv_timedwait_sig behaves as cv_timedwait.
Use caution to avoid sleeping forever on events that might never occur. Events that never occur create unkillable (defunct) threads and make the device unusable until the system is rebooted. Signals cannot be received by defunct processes.
To detect whether the current thread is able to receive a signal, use the ddi_can_receive_sig(9F) function. If the ddi_can_receive_sig function returns B_TRUE, then the above functions can wake up on a received signal. If the ddi_can_receive_sig function returns B_FALSE, then the above functions cannot wake up on a received signal, and the driver should use an alternate means, such as the timeout(9F) function, to reawaken.
One important case where this problem occurs is with serial ports. If the remote system asserts flow control and the close(9E) function blocks while attempting to drain the output data, a port can be stuck until the flow control condition is resolved or the system is rebooted. Such drivers should detect this case and set up a timer to abort the drain operation when the flow control condition persists for an excessive period of time.
This issue also affects the qwait_sig(9F) function, which is described in Chapter 7, STREAMS Framework – Kernel Level, in STREAMS Programming Guide.