Managing Events and Queueing Tasks
5.1. Managing Events
A system often needs to respond to a condition change such as a user action or system request. For example, a device might issue a warning when a component begins to overheat, or might start a movie player when a DVD is inserted into a drive. Device drivers can use a special message called an event to inform the system that a change in state has taken place.
5.1.1. Introduction to Events
An event is a message that a device driver sends to interested entities to indicate that a change of state has taken place. Events are implemented in illumos as user-defined, name-value pair structures that are managed using the nvlist* functions. (See the nvlist_alloc(9F) man page.) Events are organized by vendor, class, and subclass. For example, you could define a class for monitoring environmental conditions. An environmental class could have subclasses to indicate changes in temperature, fan status, and power.
When a change in state occurs, the device notifies the driver. The driver then uses the ddi_log_sysevent(9F) function to log this event in a queue called sysevent. The sysevent queue passes events to the user level for handling by either the syseventd daemon or the syseventconfd daemon. These daemons send notifications to any applications that have subscribed for notification of the specified event.
Designers of user-level applications have two ways to deal with events:
- An application can use the routines in libsysevent(3LIB) to subscribe with the syseventd daemon for notification when a specific event occurs.
- A developer can write a separate user-level application that responds to an event. This type of application must be registered with syseventadm(1M). When syseventconfd encounters the specified event, the application is run and deals with the event accordingly.
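A minimal subscriber using the first method might look like the following sketch. The JGJG_event class and JGJG_alert subclass are the hypothetical values used in the driver example later in this section, and the program must be linked with -lsysevent; see the sysevent_bind_handle(3SYSEVENT) man page for the authoritative interface descriptions.

#include <libsysevent.h>
#include <stdio.h>
#include <unistd.h>

/* Called on a libsysevent delivery thread for each matching event. */
static void
event_handler(sysevent_t *ev)
{
	(void) printf("event received: class %s, subclass %s\n",
	    sysevent_get_class_name(ev), sysevent_get_subclass_name(ev));
}

int
main(void)
{
	sysevent_handle_t *shp;
	const char *subclass_list[] = { "JGJG_alert" };

	/* Bind a handle to syseventd and register the callback. */
	if ((shp = sysevent_bind_handle(event_handler)) == NULL) {
		perror("sysevent_bind_handle");
		return (1);
	}
	/* Subscribe to the hypothetical class and subclass. */
	if (sysevent_subscribe_event(shp, "JGJG_event",
	    subclass_list, 1) != 0) {
		perror("sysevent_subscribe_event");
		sysevent_unbind_handle(shp);
		return (1);
	}
	(void) pause();		/* events arrive on the handler thread */
	sysevent_unbind_handle(shp);
	return (0);
}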
This process is illustrated in the following figure.
[Figure: a driver logs an event into the sysevent queue; the syseventd and syseventconfd daemons deliver notifications to subscribing applications.]
5.1.2. Using ddi_log_sysevent to Log Events
Device drivers use the ddi_log_sysevent(9F) interface to generate and log events with the system.
ddi_log_sysevent Syntax
ddi_log_sysevent uses the following syntax:

int ddi_log_sysevent(dev_info_t *dip, char *vendor, char *class,
    char *subclass, nvlist_t *attr_list, sysevent_id_t *eidp, int sleep_flag);
where:
- dip: A pointer to the dev_info node for this driver.
- vendor: A pointer to a string that defines the driver's vendor. Third-party drivers should use their company's stock symbol or a similarly enduring identifier. Sun-supplied drivers use DDI_VENDOR_SUNW.
- class: A pointer to a string defining the event's class. class is a driver-specific value. An example of a class might be a string that represents a set of environmental conditions that affect a device. This value must be understood by the event consumer.
- subclass: A driver-specific string that represents a subset of the class argument. For example, within a class that represents environmental conditions, an event subclass might refer to the device's temperature. This value must be intelligible to the event consumer.
- attr_list: A pointer to an nvlist_t structure that lists the name-value attributes associated with the event. Name-value attributes are driver-defined and can refer to a specific attribute or condition of the device. For example, consider a device that reads both CD-ROMs and DVDs. That device could have an attribute named disc_type whose value is either cd_rom or dvd. As with class and subclass, an event consumer must be able to interpret the name-value pairs. For more information on name-value pairs and the nvlist_t structure, see Defining Event Attributes, as well as the nvlist_alloc(9F) man page. If the event has no attributes, set this argument to NULL.
- eidp: The address of a sysevent_id_t structure, which provides a unique identification for the event. ddi_log_sysevent(9F) returns this structure with a system-provided event sequence number and time stamp. See the ddi_log_sysevent(9F) man page for more information on the sysevent_id_t structure.
- sleep_flag: A flag that indicates how the caller wants to handle the possibility of resources not being available. If sleep_flag is set to DDI_SLEEP, the driver blocks until the resources become available. With DDI_NOSLEEP, an allocation will not sleep and cannot be guaranteed to succeed. If DDI_ENOMEM is returned, the driver would need to retry the operation at a later time. Even with DDI_SLEEP, other error returns are possible with this interface, such as system busy, the syseventd daemon not responding, or trying to log an event in interrupt context.
Sample Code for Logging Events
A device driver performs the following tasks to log events:
- Allocate memory for the attribute list with nvlist_alloc(9F).
- Add name-value pairs to the attribute list.
- Use the ddi_log_sysevent(9F) function to log the event in the sysevent queue.
- Call nvlist_free(9F) when the attribute list is no longer needed.
The following example demonstrates how to use ddi_log_sysevent.
char *vendor_name = "DDI_VENDOR_JGJG";
char *my_class = "JGJG_event";
char *my_subclass = "JGJG_alert";
nvlist_t *nvl;
/* ... */
/* nvflag and kmflag are set elsewhere in the driver */
nvlist_alloc(&nvl, nvflag, kmflag);
/* ... */
(void) nvlist_add_byte_array(nvl, propname, (uchar_t *)propval, proplen + 1);
/* ... */
/* eidp is NULL because this driver does not use the event ID */
if (ddi_log_sysevent(dip, vendor_name, my_class,
    my_subclass, nvl, NULL, DDI_SLEEP) != DDI_SUCCESS)
	cmn_err(CE_WARN, "error logging system event");
nvlist_free(nvl);
5.1.3. Defining Event Attributes
Event attributes are defined as a list of name-value pairs. The illumos DDI provides routines and structures for storing information in name-value pairs. Name-value pairs are retained in an nvlist_t structure, which is opaque to the driver. The value in a name-value pair can be a Boolean, an int, a byte, a string, an nvlist, or an array of these data types. An int can be defined as 16 bits, 32 bits, or 64 bits, and can be signed or unsigned.
The steps in creating a list of name-value pairs are as follows:
- Create an nvlist_t structure with nvlist_alloc(9F). The nvlist_alloc interface takes three arguments:
  - nvlp: Pointer to a pointer to an nvlist_t structure.
  - nvflag: Flag that indicates whether the names of the pairs must be unique. If this flag is set to NV_UNIQUE_NAME_TYPE, any existing pair that matches both the name and the type of a new pair is removed from the list. If the flag is set to NV_UNIQUE_NAME, any existing pair with a duplicate name is removed, regardless of its type. Specifying NV_UNIQUE_NAME_TYPE thus allows a list to contain two or more pairs with the same name as long as their types differ, whereas with NV_UNIQUE_NAME only one instance of a pair name can be in the list. If the flag is not set, no uniqueness checking is done, and the consumer of the list is responsible for dealing with duplicates.
  - kmflag: Flag that indicates the allocation policy for kernel memory. If this argument is set to KM_SLEEP, the driver blocks until the requested memory is available. KM_SLEEP allocations might sleep but are guaranteed to succeed. KM_NOSLEEP allocations are guaranteed not to sleep but might return NULL if no memory is currently available.
- Populate the nvlist with name-value pairs. For example, to add a string, use nvlist_add_string(9F). To add an array of 32-bit integers, use nvlist_add_int32_array(9F). The nvlist_add_boolean(9F) man page contains a complete list of interfaces for adding pairs.
To deallocate a list, use nvlist_free(9F).
The following code sample illustrates the creation of a name-value list.
nvlist_t *
create_nvlist(void)
{
	int err;
	char *str = "child";
	int32_t ints[] = {0, 1, 2};
	nvlist_t *nvl;

	err = nvlist_alloc(&nvl, NV_UNIQUE_NAME, KM_SLEEP); /* allocate list */
	if (err)
		return (NULL);
	if ((nvlist_add_string(nvl, "name", str) != 0) ||
	    (nvlist_add_int32_array(nvl, "prop", ints, 3) != 0)) {
		nvlist_free(nvl);
		return (NULL);
	}
	return (nvl);
}
Drivers can retrieve the elements of an nvlist by using the lookup function for the appropriate type, such as nvlist_lookup_int32_array(9F), which takes as an argument the name of the pair to search for. These interfaces work only if either NV_UNIQUE_NAME or NV_UNIQUE_NAME_TYPE was specified when nvlist_alloc(9F) was called. Otherwise, ENOTSUP is returned, because the list could contain multiple pairs with the same name.
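For example, a consumer could retrieve the pairs added by create_nvlist() above with the matching lookup routines. This is a minimal sketch; the returned pointers reference data inside the nvlist and must not be freed separately.

/* look up the string and the int32 array added by create_nvlist() */
char *name;
int32_t *ints;
uint_t nelem;

if (nvlist_lookup_string(nvl, "name", &name) == 0)
	cmn_err(CE_CONT, "name = %s\n", name);
if (nvlist_lookup_int32_array(nvl, "prop", &ints, &nelem) == 0)
	cmn_err(CE_CONT, "prop has %u elements\n", nelem);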
A name-value pair list can be placed in contiguous memory. This approach is useful for passing the list to an entity that has subscribed for notification. The first step is to get the size of the memory block that is needed for the list with nvlist_size(9F). The next step is to pack the list into the buffer with nvlist_pack(9F). The consumer receiving the buffer's contents can unpack the buffer with nvlist_unpack(9F).
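The following sketch shows that sequence. It assumes an already populated list nvl; in practice the pack and unpack steps run in the producer and the consumer, respectively.

char *buf = NULL;
size_t buflen;
nvlist_t *unpacked;

/* determine the encoded size, then pack the list into a buffer */
if (nvlist_size(nvl, &buflen, NV_ENCODE_NATIVE) == 0 &&
    nvlist_pack(nvl, &buf, &buflen, NV_ENCODE_NATIVE, KM_SLEEP) == 0) {
	/* ... hand buf and buflen to the consumer ... */
	if (nvlist_unpack(buf, buflen, &unpacked, KM_SLEEP) == 0)
		nvlist_free(unpacked);	/* consumer frees its copy */
	kmem_free(buf, buflen);		/* free the packed buffer */
}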
The functions for manipulating name-value pairs are available to both user-level and kernel-level developers. You can find identical man pages for these functions both in man pages section 3: Library Interfaces and Headers and in man pages section 9: DDI and DKI Kernel Functions. For a list of functions that operate on name-value pairs, see the following table.
Man Page | Purpose / Functions
---|---
nvlist_add_boolean(9F) | Add name-value pairs to the list. Functions include: nvlist_add_boolean, nvlist_add_byte, nvlist_add_int32, nvlist_add_string, nvlist_add_nvlist, and the corresponding array interfaces.
nvlist_alloc(9F) | Manipulate the name-value list buffer. Functions include: nvlist_alloc, nvlist_free, nvlist_size, nvlist_pack, nvlist_unpack, nvlist_dup, nvlist_merge.
nvlist_lookup_boolean(9F) | Search for name-value pairs. Functions include: nvlist_lookup_boolean, nvlist_lookup_int32, nvlist_lookup_string, and the corresponding array interfaces.
nvlist_next_nvpair(9F) | Get name-value pair data. Functions include: nvlist_next_nvpair, nvpair_name, nvpair_type.
nvlist_remove(9F) | Remove name-value pairs. Functions include: nvlist_remove, nvlist_remove_all.
5.2. Queueing Tasks
This section discusses how to use task queues to postpone processing of some tasks and delegate their execution to another kernel thread.
5.2.1. Introduction to Task Queues
A common operation in kernel programming is to schedule a task to be performed at a later time, by a different thread. The following examples give some reasons that you might want a different thread to perform a task at a later time:
-
Your current code path is time critical. The additional task you want to perform is not time critical.
-
The additional task might require grabbing a lock that another thread is currently holding.
-
You cannot block in your current context. The additional task might need to block, for example to wait for memory.
-
A condition is preventing your code path from completing, but your current code path cannot sleep or fail. You need to queue the current task to execute after the condition disappears.
-
You need to launch multiple tasks in parallel.
In each of these cases, a task is executed in a different context. A different context is usually a different kernel thread with a different set of locks held and possibly a different priority. Task queues provide a generic kernel API for scheduling asynchronous tasks.
A task queue is a list of tasks with one or more threads to service the list. If a task queue has a single service thread, all tasks are guaranteed to execute in the order in which they are added to the list. If a task queue has more than one service thread, the order in which the tasks will execute is not known.
If the task queue has more than one service thread, make sure that the execution of one task does not depend on the execution of any other task. Dependencies between tasks can cause a deadlock to occur.
5.2.2. Task Queue Interfaces
The following DDI interfaces manage task queues. These interfaces are defined in the sys/sunddi.h header file. See the taskq(9F) man page for more information about these interfaces.
Interface | Description
---|---
ddi_taskq_t | Opaque handle
TASKQ_DEFAULTPRI | System default priority
DDI_SLEEP | Can block for memory
DDI_NOSLEEP | Cannot block for memory
ddi_taskq_create(9F) | Create a task queue
ddi_taskq_destroy(9F) | Destroy a task queue
ddi_taskq_dispatch(9F) | Add a task to a task queue
ddi_taskq_wait(9F) | Wait for pending tasks to complete
ddi_taskq_suspend(9F) | Suspend a task queue
ddi_taskq_suspended(9F) | Check whether a task queue is suspended
ddi_taskq_resume(9F) | Resume a suspended task queue
5.2.3. Using Task Queues
The typical usage in drivers is to create task queues in attach(9E). Most ddi_taskq_dispatch invocations are made from interrupt context; the sketch after this paragraph illustrates the pattern.
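The following is a minimal sketch of that pattern, not taken from any particular driver; the xx names, the soft-state structure, and the deferred-work function are hypothetical.

#include <sys/types.h>
#include <sys/kmem.h>
#include <sys/cmn_err.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/* hypothetical per-instance state */
struct xx_state {
	dev_info_t	*xx_dip;
	ddi_taskq_t	*xx_tq;
};

/* deferred work: runs later on a task queue thread, where sleeping is safe */
static void
xx_handle_work(void *arg)
{
	struct xx_state *sp = arg;

	cmn_err(CE_CONT, "xx: deferred work for %s\n",
	    ddi_get_name(sp->xx_dip));
}

static int
xx_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
	struct xx_state *sp;

	/* ... DDI_ATTACH checks elided ... */
	sp = kmem_zalloc(sizeof (*sp), KM_SLEEP);
	sp->xx_dip = dip;
	/* a single service thread preserves dispatch order */
	sp->xx_tq = ddi_taskq_create(dip, "xx_taskq", 1,
	    TASKQ_DEFAULTPRI, 0);
	if (sp->xx_tq == NULL) {
		kmem_free(sp, sizeof (*sp));
		return (DDI_FAILURE);
	}
	ddi_set_driver_private(dip, sp);
	return (DDI_SUCCESS);
}

static uint_t
xx_intr(caddr_t arg)
{
	struct xx_state *sp = (struct xx_state *)arg;

	/* interrupt context cannot sleep, so dispatch with DDI_NOSLEEP */
	if (ddi_taskq_dispatch(sp->xx_tq, xx_handle_work, sp,
	    DDI_NOSLEEP) != DDI_SUCCESS) {
		/* ... note the failure and arrange to retry later ... */
	}
	return (DDI_INTR_CLAIMED);
}

static int
xx_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
	struct xx_state *sp = ddi_get_driver_private(dip);

	/* ... DDI_DETACH checks elided ... */
	ddi_taskq_wait(sp->xx_tq);	/* drain pending tasks */
	ddi_taskq_destroy(sp->xx_tq);
	kmem_free(sp, sizeof (*sp));
	return (DDI_SUCCESS);
}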
To study task queues used in illumos drivers, go to http://www.opensolaris.org/os/. In the left margin menu, click Source Browser. In the Symbol field of the search area, enter ddi_taskq_create. In the Project list, select onnv. Click the Search button. In your search results you should see the USB generic serial driver (usbser.c), the 1394 mass storage HBA FireWire driver (scsa1394/hba.c), and the SCSI HBA driver for Dell PERC 3DC/4SC/4DC/4Di RAID devices (amr.c).
Click the file name amr.c. The ddi_taskq_create function is called in the amr_attach entry point. The ddi_taskq_destroy function is called in the amr_detach entry point and also in the error-handling section of the amr_attach entry point. The ddi_taskq_dispatch function is called in the amr_done function, which is called in the amr_intr function. The amr_intr function is an interrupt-handling function that is an argument to the ddi_add_intr(9F) function in the amr_attach entry point.
5.2.4. Observing Task Queues
This section describes two techniques that you can use to monitor the system resources that a task queue consumes. Task queues export statistics on the use of system time by task queue threads. Task queues also provide DTrace SDT probes that fire when a task queue starts and finishes executing a task.
Task Queue Kernel Statistics Counters
Every task queue has an associated set of kstat counters. Examine the output of the following kstat(1M) command:
$ kstat -c taskq
module: unix                            instance: 0
name:   ata_nexus_enum_tq               class:    taskq
        crtime                          53.877907833
        executed                        0
        maxtasks                        0
        nactive                         1
        nalloc                          0
        priority                        60
        snaptime                        258059.249256749
        tasks                           0
        threads                         1
        totaltime                       0

module: unix                            instance: 0
name:   callout_taskq                   class:    taskq
        crtime                          0
        executed                        13956358
        maxtasks                        4
        nactive                         4
        nalloc                          0
        priority                        99
        snaptime                        258059.24981709
        tasks                           13956358
        threads                         2
        totaltime                       120247890619
The kstat output shown above includes the following information:
- The name of the task queue and its instance number
- The number of scheduled (tasks) and executed (executed) tasks
- The number of kernel threads processing the task queue (threads) and their priority (priority)
- The total time (in nanoseconds) spent processing all the tasks (totaltime)
The following example shows how you can use the kstat command to observe how a counter (the number of scheduled tasks) increases over time:
$ kstat -p unix:0:callout_taskq:tasks 1 5
unix:0:callout_taskq:tasks      13994642
unix:0:callout_taskq:tasks      13994711
unix:0:callout_taskq:tasks      13994784
unix:0:callout_taskq:tasks      13994855
unix:0:callout_taskq:tasks      13994926
Task Queue DTrace SDT Probes
Task queues provide several useful SDT probes. All the probes described in this section have the following two arguments:
- The task queue pointer returned by ddi_taskq_create
- The pointer to the taskq_ent_t structure. Use this pointer in your D script to extract the function and the argument.
You can use these probes to collect precise timing information about individual task queues and individual tasks being executed through them. For example, the following script prints, every 10 seconds, the functions that were scheduled through task queues:
#!/usr/sbin/dtrace -qs

sdt:genunix::taskq-enqueue
{
	this->tq = (taskq_t *)arg0;
	this->tqe = (taskq_ent_t *)arg1;
	@[this->tq->tq_name, this->tq->tq_instance,
	    this->tqe->tqent_func] = count();
}

tick-10s
{
	printa("%s(%d): %a called %@d times\n", @);
	trunc(@);
}
On a particular machine, the above D script produced the following output:
callout_taskq(1): genunix`callout_execute called 51 times
callout_taskq(0): genunix`callout_execute called 701 times
kmem_taskq(0): genunix`kmem_update_timeout called 1 times
kmem_taskq(0): genunix`kmem_hash_rescale called 4 times
callout_taskq(1): genunix`callout_execute called 40 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 256 times
callout_taskq(0): genunix`callout_execute called 702 times
kmem_taskq(0): genunix`kmem_update_timeout called 1 times
kmem_taskq(0): genunix`kmem_hash_rescale called 4 times
callout_taskq(1): genunix`callout_execute called 28 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 228 times
callout_taskq(0): genunix`callout_execute called 706 times
callout_taskq(1): genunix`callout_execute called 24 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 141 times
callout_taskq(0): genunix`callout_execute called 708 times