This chapter describes the locking primitives and thread synchronization mechanisms of the illumos multithreaded kernel. You should design device drivers to take advantage of multithreading. This chapter provides information on the following subjects:

3.1. Locking Primitives

In traditional UNIX systems, every section of kernel code terminates either through an explicit call to sleep(1) to give up the processor or through a hardware interrupt. illumos operates differently. A kernel thread can be preempted at any time to run another thread. Because all kernel threads share kernel address space and often need to read and modify the same data, the kernel provides a number of locking primitives to prevent threads from corrupting shared data. These mechanisms include mutual exclusion locks, which are also known as mutexes, readers/writer locks, and semaphores.

3.1.1. Storage Classes of Driver Data

The storage class of data is a guide to whether the driver might need to take explicit steps to control access to the data. The three data storage classes are:

3.1.2. Mutual-Exclusion Locks

A mutual-exclusion lock, or mutex, is usually associated with a set of data and regulates access to that data. Mutexes provide a way to allow only one thread at a time access to that data. The mutex functions are:


Releases any associated storage.


Acquires a mutex.


Releases a mutex.


Initializes a mutex.


Tests to determine whether the mutex is held by the current thread. To be used in ASSERT(9F) only.


Acquires a mutex if available, but does not block.

Setting Up Mutexes

Device drivers usually allocate a mutex for each driver data structure. The mutex is typically a field in the structure of type kmutex_t. mutex_init(9F) is called to prepare the mutex for use. This call is usually made at attach(9E) time for per-device mutexes and _init(9E) time for global driver mutexes.

For example,

struct xxstate *xsp;
/* ... */
mutex_init(&xsp->mu, NULL, MUTEX_DRIVER, NULL);
/* ... */

For a more complete example of mutex initialization, see Driver Autoconfiguration.

The driver must destroy the mutex with mutex_destroy(9F) before being unloaded. Destroying the mutex is usually done at detach(9E) time for per-device mutexes and _fini(9E) time for global driver mutexes.

Using Mutexes

Every section of the driver code that needs to read or write the shared data structure must do the following tasks:

The scope of a mutex, that is, the data the mutex protects, is entirely up to the programmer. A mutex protects a data structure only if every code path that accesses the data structure does so while holding the mutex.

3.1.3. Readers/Writer Locks

A readers/writer lock regulates access to a set of data. The readers/writer lock is so called because many threads can hold the lock simultaneously for reading, but only one thread can hold the lock for writing.

Most device drivers do not use readers/writer locks. These locks are slower than mutexes. The locks provide a performance gain only when they protect commonly read data that is not frequently written. In this case, contention for a mutex could become a bottleneck, so using a readers/writer lock might be more efficient. The readers/writer functions are summarized in the following table. See the rwlock(9F) man page for detailed information. The readers/writer lock functions are:


Destroys a readers/writer lock


Downgrades a readers/writer lock holder from writer to reader


Acquires a readers/writer lock


Releases a readers/writer lock


Initializes a readers/writer lock


Determines whether a readers/writer lock is held for read or write


Attempts to acquire a readers/writer lock without waiting


Attempts to upgrade readers/writer lock holder from reader to writer

3.1.4. Semaphores

Counting semaphores are available as an alternative primitive for managing threads within device drivers. See the semaphore(9F) man page for more information. The semaphore functions are:


Destroys a semaphore.


Initialize a semaphore.


Decrement semaphore and possibly block.


Decrement semaphore but do not block if signal is pending. See Threads Unable to Receive Signals.


Attempt to decrement semaphore, but do not block.


Increment semaphore and possibly unblock waiter.

3.2. Thread Synchronization

In addition to protecting shared data, drivers often need to synchronize execution among multiple threads.

3.2.1. Condition Variables in Thread Synchronization

Condition variables are a standard form of thread synchronization. They are designed to be used with mutexes. The associated mutex is used to ensure that a condition can be checked atomically, and that the thread can block on the associated condition variable without missing either a change to the condition or a signal that the condition has changed.

The condvar(9F) functions are:


Signals all threads waiting on the condition variable.


Destroys a condition variable.


Initializes a condition variable.


Signals one thread waiting on the condition variable.


Waits for condition, time-out, or signal. See Threads Unable to Receive Signals.


Waits for condition or time-out.


Waits for condition.


Waits for condition or return zero on receipt of a signal. See Threads Unable to Receive Signals.

Initializing Condition Variables

Declare a condition variable of type kcondvar_t for each condition. Usually, the condition variables are declared in the driver's soft-state structure. Use cv_init(9F) to initialize each condition variable. Similar to mutexes, condition variables are usually initialized at attach(9E) time. A typical example of initializing a condition variable is:

cv_init(&xsp->cv, NULL, CV_DRIVER, NULL);

For a more complete example of condition variable initialization, see Driver Autoconfiguration.

Waiting for the Condition

To use condition variables, follow these steps in the code path waiting for the condition:

  1. Acquire the mutex guarding the condition.

  2. Test the condition.

  3. If the test results do not allow the thread to continue, use cv_wait(9F) to block the current thread on the condition. The cv_wait(9F) function releases the mutex before blocking the thread and reacquires the mutex before returning. On return from cv_wait(9F), repeat the test.

  4. After the test allows the thread to continue, set the condition to its new value. For example, set a device flag to busy.

  5. Release the mutex.

Signaling the Condition

Follow these steps in the code path to signal the condition:

  1. Acquire the mutex guarding the condition.

  2. Set the condition.

  3. Signal the blocked thread with cv_broadcast(9F).

  4. Release the mutex.

The following example uses a busy flag along with mutex and condition variables to force the read(9E) routine to wait until the device is no longer busy before starting a transfer.

static int
xxread(dev_t dev, struct uio *uiop, cred_t *credp)
        struct xxstate *xsp;
        /* ... */
        while (xsp->busy)
                cv_wait(&xsp->cv, &xsp->mu);
        xsp->busy = 1;
        /* perform the data access */

static uint_t
xxintr(caddr_t arg)
        struct xxstate *xsp = (struct xxstate *)arg;
        xsp->busy = 0;
Using Mutexes and Condition Variables

3.2.2. cv_wait and cv_timedwait Functions

If a thread is blocked on a condition with cv_wait(9F) and that condition does not occur, the thread would wait forever. To avoid that situation, use cv_timedwait(9F), which depends upon another thread to perform a wakeup. cv_timedwait takes an absolute wait time as an argument. cv_timedwait returns -1 if the time is reached and the event has not occurred. cv_timedwait returns a positive value if the condition is met.

cv_timedwait(9F) requires an absolute wait time expressed in clock ticks since the system was last rebooted. The wait time can be determined by retrieving the current value with ddi_get_lbolt(9F). The driver usually has a maximum number of seconds or microseconds to wait, so this value is converted to clock ticks with drv_usectohz(9F) and added to the value from ddi_get_lbolt(9F).

The following example shows how to use cv_timedwait(9F) to wait up to five seconds to access the device before returning EIO to the caller.

clock_t            cur_ticks, to;
while (xsp->busy) {
        cur_ticks = ddi_get_lbolt();
        to = cur_ticks + drv_usectohz(5000000); /* 5 seconds from now */
        if (cv_timedwait(&xsp->cv, &xsp->mu, to) == -1) {
                 * The timeout time 'to' was reached without the
                 * condition being signaled.
                /* tidy up and exit */
                return (EIO);
xsp->busy = 1;
Using cv_timedwait

Although device driver writers generally prefer to use cv_timedwait(9F) over cv_wait(9F), sometimes cv_wait(9F) is a better choice. For example, cv_wait(9F) is better if a driver is waiting on the following conditions:

3.2.3. cv_wait_sig Function

A driver might be waiting for a condition that cannot occur or will not happen for a long time. In such cases, the user can send a signal to abort the thread. Depending on the driver design, the signal might not cause the driver to wake up.

cv_wait_sig(9F) allows a signal to unblock the thread. This capability enables the user to break out of potentially long waits by sending a signal to the thread with kill(1) or by typing the interrupt character. cv_wait_sig(9F) returns zero if it is returning because of a signal, or nonzero if the condition occurred. However, see Threads Unable to Receive Signals for cases in which signals might not be received.

The following example shows how to use cv_wait_sig(9F) to allow a signal to unblock the thread.

while (xsp->busy) {
        if (cv_wait_sig(&xsp->cv, &xsp->mu) == 0) {
        /* Signaled while waiting for the condition */
                /* tidy up and exit */
                return (EINTR);
xsp->busy = 1;
Using cv_wait_sig

3.2.4. cv_timedwait_sig Function

cv_timedwait_sig(9F) is similar to cv_timedwait(9F) and cv_wait_sig(9F), except that cv_timedwait_sig returns -1 without the condition being signaled after a timeout has been reached, or 0 if a signal (for example, kill(2)) is sent to the thread.

For both cv_timedwait(9F) and cv_timedwait_sig(9F), time is measured in absolute clock ticks since the last system reboot.

3.3. Choosing a Locking Scheme

The locking scheme for most device drivers should be kept straightforward. Using additional locks allows more concurrency but increases overhead. Using fewer locks is less time consuming but allows less concurrency. Generally, use one mutex per data structure, a condition variable for each event or condition the driver must wait for, and a mutex for each major set of data global to the driver. Avoid holding mutexes for long periods of time. Use the following guidelines when choosing a locking scheme:

To look at lock usage, use lockstat(1M). lockstat(1M) monitors all kernel lock events, gathers frequency and timing data about the events, and displays the data.

See the Multithreaded Programming Guide for more details on multithreaded operations.

3.3.1. Potential Locking Pitfalls

Mutexes are not re-entrant by the same thread. If you already own the mutex, attempting to claim this mutex a second time leads to the following panic:

panic: recursive mutex_enter. mutex %x caller %x

Releasing a mutex that the current thread does not hold causes this panic:

panic: mutex_adaptive_exit: mutex not held by thread

The following panic occurs only on uniprocessors:

panic: lock_set: lock held and only one CPU

The lock_set panic indicates that a spin mutex is held and will spin forever, because no other CPU can release this mutex. This situation can happen if the driver forgets to release the mutex on one code path or becomes blocked while holding the mutex.

A common cause of the lock_set panic occurs when a device with a high-level interrupt calls a routine that blocks, such as cv_wait(9F). Another typical cause is a high-level handler grabbing an adaptive mutex by calling mutex_enter(9F).

3.3.2. Threads Unable to Receive Signals

The sema_p_sig, cv_wait_sig, and cv_timedwait_sig functions can be awakened when the thread receives a signal. A problem can arise because some threads are unable to receive signals. For example, when close(9E) is called as a result of the application calling close(2), signals can be received. However, when close(9E) is called from within the exit(2) processing that closes all open file descriptors, the thread cannot receive signals. When the thread cannot receive signals, sema_p_sig behaves as sema_p, cv_wait_sig behaves as cv_wait, and cv_timedwait_sig behaves as cv_timedwait.

Use caution to avoid sleeping forever on events that might never occur. Events that never occur create unkillable (defunct) threads and make the device unusable until the system is rebooted. Signals cannot be received by defunct processes.

To detect whether the current thread is able to receive a signal, use the ddi_can_receive_sig(9F) function. If the ddi_can_receive_sigfunction returns B_TRUE, then the above functions can wake up on a received signal. If the ddi_can_receive_sigfunction returns B_FALSE, then the above functions cannot wake up on a received signal. If the ddi_can_receive_sigfunction returns B_FALSE, then the driver should use an alternate means, such as the timeout(9F) function, to reawaken.

One important case where this problem occurs is with serial ports. If the remote system asserts flow control and the close(9E) function blocks while attempting to drain the output data, a port can be stuck until the flow control condition is resolved or the system is rebooted. Such drivers should detect this case and set up a timer to abort the drain operation when the flow control condition persists for an excessive period of time.

This issue also affects the qwait_sig(9F) function, which is described in Chapter 7, STREAMS Framework – Kernel Level, in STREAMS Programming Guide.