Recommended Coding Practices
This chapter describes how to write robust drivers. Drivers written according to the guidelines in this chapter are easier to debug. The recommended practices also protect the system from hardware and software faults.
This chapter covers topics that include debugging preparation, the volatile qualifier, and serviceability.
23.1. Debugging Preparation Techniques
Driver code is more difficult to debug than user programs because:
- The driver interacts directly with the hardware
- The driver operates without the protection of the operating system that is available to user processes
Be sure to build debugging support into your driver. This support facilitates both maintenance work and future development.
23.1.1. Use a Unique Prefix to Avoid Kernel Symbol Collisions
The name of each function, data element, and driver preprocessor definition must be unique for each driver.
A driver module is linked into the kernel. The name of each symbol unique to a particular driver must not collide with other kernel symbols. To avoid such collisions, each function and data element for a particular driver must be named with a prefix common to that driver. The prefix must be sufficient to uniquely name each driver symbol. Typically, this prefix is the name of the driver or an abbreviation for the name of the driver. For example, xx_open would be the name of the open(9E) routine of driver xx.
A driver must necessarily include a number of system header files when it is built. The globally visible names within these header files cannot be predicted. To avoid collisions with these names, each driver preprocessor definition must be given a unique name by using an identifying prefix.
A distinguishing driver symbol prefix also helps you decipher system logs and panics when troubleshooting. Instead of seeing an error related to an ambiguous attach function, you see an error message about xx_attach.
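The naming rule can be sketched as follows. This is a minimal illustration for a hypothetical driver named xx. The names XX_MAX_UNITS, xx_open_count, and the simplified argument lists are assumptions made for this example and do not match the real open(9E) and close(9E) signatures.

```c
#include <errno.h>

/* Preprocessor definitions carry the driver prefix in upper case. */
#define XX_MAX_UNITS	4	/* not MAX_UNITS, which a header might define */

/* Data elements carry the prefix as well. */
static int xx_open_count;	/* not open_count */

/*
 * Simplified stand-in for the open(9E) routine of driver xx.
 * A real entry point takes different arguments.
 */
int
xx_open(int unit)
{
	if (unit < 0 || unit >= XX_MAX_UNITS)
		return (ENXIO);
	xx_open_count++;
	return (0);
}

/* Simplified stand-in for the close(9E) routine of driver xx. */
int
xx_close(int unit)
{
	if (unit < 0 || unit >= XX_MAX_UNITS || xx_open_count == 0)
		return (ENXIO);
	xx_open_count--;
	return (0);
}
```

With this convention, a message or stack trace that mentions xx_open or xx_close points directly at the driver involved.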
23.1.2. Use cmn_err to Log Driver Activity
Use the cmn_err(9F) function to print messages to a system log from within the device driver. The cmn_err(9F) function for kernel modules is similar to the printf(3C) function for applications. The cmn_err(9F) function provides additional format characters, such as the %b format to print device register bits. Use the tail(1) command to monitor these messages in /var/adm/messages.
% tail -f /var/adm/messages
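To give a feel for what a %b style bit description does, the following user-space sketch decodes a register value against a description string. It assumes the convention in which the first character of the string encodes the output base and each following group is a 1-based bit number followed by a name. The function xx_format_bits and the bit names are hypothetical, and the exact output of cmn_err(9F) might differ from what this approximation produces.

```c
#include <stdio.h>
#include <string.h>

/* Append at most slen bytes of s to the NUL-terminated string in buf. */
static void
xx_append(char *buf, size_t len, const char *s, size_t slen)
{
	size_t used = strlen(buf);

	if (used + 1 >= len)
		return;
	if (slen > len - used - 1)
		slen = len - used - 1;
	memcpy(buf + used, s, slen);
	buf[used + slen] = '\0';
}

/*
 * User-space approximation of a %b style decode. In a driver, a call
 * of roughly this shape would be used instead:
 *	cmn_err(CE_CONT, "csr=%b\n", csr, "\020\3Intr\2Err\1Enable");
 */
void
xx_format_bits(char *buf, size_t len, unsigned int val, const char *desc)
{
	int base = (unsigned char)*desc++;	/* output base: 8 or 16 */
	int any = 0;

	if (len == 0)
		return;
	if (base == 8)
		(void) snprintf(buf, len, "%o", val);
	else if (base == 16)
		(void) snprintf(buf, len, "%x", val);
	else
		(void) snprintf(buf, len, "%u", val);

	while (*desc != '\0') {
		int bit = (unsigned char)*desc++;	/* 1-based bit number */
		const char *name = desc;

		/* A name runs until the next bit number or the end. */
		while ((unsigned char)*desc > ' ')
			desc++;
		if (bit >= 1 && bit <= 32 && (val & (1u << (bit - 1)))) {
			xx_append(buf, len, any ? "," : "<", 1);
			xx_append(buf, len, name, (size_t)(desc - name));
			any = 1;
		}
	}
	if (any)
		xx_append(buf, len, ">", 1);
}
```

The benefit of this style of output is that a log line names the register bits that are set, so the reader does not have to decode a raw hexadecimal value by hand.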
23.1.3. Use ASSERT to Catch Invalid Assumptions
Assertions are an extremely valuable form of active documentation. The syntax for ASSERT(9F) is as follows:
void ASSERT(EXPRESSION)
The ASSERT macro halts the execution of the kernel if a condition that is expected to be true is actually false. ASSERT provides a way for the programmer to validate the assumptions made by a piece of code.
The ASSERT macro is defined only when the DEBUG compilation symbol is defined. When DEBUG is not defined, the ASSERT macro has no effect.
The following example assertion tests the assumption that a particular pointer value is not NULL:
ASSERT(ptr != NULL);
If the driver has been compiled with DEBUG, and if the value of ptr is NULL at this point in execution, then the following panic message is printed to the console:
panic: assertion failed: ptr != NULL, file: driver.c, line: 56
Because ASSERT(9F) uses the DEBUG compilation symbol, any conditional debugging code should also use DEBUG.
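The relationship between DEBUG and assertions can be sketched in user space as follows. XX_ASSERT, xx_assfail, xx_checks_run, and xx_copy_byte are hypothetical names for this illustration. A driver would use ASSERT(9F) itself, which panics the kernel instead of calling abort().

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef DEBUG
/* Report the failed expression and stop, as a stand-in for a panic. */
static void
xx_assfail(const char *expr, const char *file, int line)
{
	(void) fprintf(stderr, "assertion failed: %s, file: %s, line: %d\n",
	    expr, file, line);
	abort();
}
#define	XX_ASSERT(EX)	((EX) ? (void)0 : xx_assfail(#EX, __FILE__, __LINE__))
#else
/* Without DEBUG the macro expands to nothing that has an effect. */
#define	XX_ASSERT(EX)	((void)0)
#endif

/* Counts DEBUG-only checks; stays 0 in a non-DEBUG build. */
int xx_checks_run;

int
xx_copy_byte(const unsigned char *src, unsigned char *dst)
{
	XX_ASSERT(src != NULL);
	XX_ASSERT(dst != NULL);
#ifdef DEBUG
	/* Other debugging code is guarded by the same symbol. */
	xx_checks_run++;
#endif
	*dst = *src;
	return (0);
}
```

Guarding both the assertions and the extra bookkeeping with the same symbol means that a single compiler flag switches all debugging support on or off together.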
23.1.4. Use mutex_owned to Validate and Document Locking Requirements
The syntax for mutex_owned(9F) is as follows:
int mutex_owned(kmutex_t *mp);
A significant portion of driver development involves properly handling multiple threads. Comments should always be used when a mutex is acquired. Comments can be even more useful when an apparently necessary mutex is not acquired. To determine whether a mutex is held by a thread, use mutex_owned within ASSERT(9F):
void helper(void)
{
    /* this routine should always be called with xsp's mutex held */
    ASSERT(mutex_owned(&xsp->mu));
    /* ... */
}
mutex_owned is only valid within ASSERT macros. You should not use mutex_owned to control the behavior of a driver.
23.1.5. Use Conditional Compilation to Toggle Costly Debugging Features
You can insert code for debugging into a driver through conditional compiles by using a preprocessor symbol such as DEBUG or by using a global variable. With conditional compilation, unnecessary code can be removed in the production driver. Use a variable to set the amount of debugging output at runtime. The output can be specified by setting a debugging level at runtime with an ioctl or through a debugger. Commonly, these two methods are combined.
The following example relies on the compiler to remove unreachable code, in this case, the code following the always-false test of zero. The example also provides a driver-local variable, xxdebug, that can be set in /etc/system or patched by a debugger.
#ifdef DEBUG
/* comments on values of xxdebug and what they do */
static int xxdebug;
#define dcmn_err if (xxdebug) cmn_err
#else
#define dcmn_err if (0) cmn_err
#endif
/* ... */
dcmn_err(CE_NOTE, "Error!\n");
This method handles the fact that cmn_err(9F) has a variable number of arguments. Another method relies on the fact that the macro has one argument, a parenthesized argument list for cmn_err(9F). The macro removes this argument. This macro also removes the reliance on the optimizer by expanding the macro to nothing if DEBUG is not defined.
#ifdef DEBUG
/* comments on values of xxdebug and what they do */
static int xxdebug;
#define dcmn_err(X) if (xxdebug) cmn_err X
#else
#define dcmn_err(X) /* nothing */
#endif
/* ... */
/* Note: double parentheses are required when using dcmn_err. */
dcmn_err((CE_NOTE, "Error!"));
You can extend this technique in many ways. One way is to specify different messages from cmn_err(9F), depending on the value of xxdebug. However, in such a case, you must be careful not to obscure the code with too much debugging information.
Another common scheme is to write an xxlog function, which uses vsprintf(9F) or vcmn_err(9F) to handle variable argument lists.
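A logging function of this kind might look like the following user-space sketch. The names xx_log, xx_debug, and xx_last_msg are hypothetical, and the buffer stands in for the system log. In a driver, the vsnprintf() call would be replaced by vcmn_err(9F) with an appropriate CE_* level.

```c
#include <stdarg.h>
#include <stdio.h>

/* 0 = silent; higher values enable more verbose messages. */
static int xx_debug = 1;

/* Stand-in for the system log so that the sketch runs in user space. */
static char xx_last_msg[128];

/*
 * Log a message if its level is enabled. Returns the formatted
 * length, or 0 when the message is suppressed.
 */
int
xx_log(int level, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (level > xx_debug)
		return (0);

	va_start(ap, fmt);
	/* A driver would call vcmn_err(9F) here instead. */
	n = vsnprintf(xx_last_msg, sizeof (xx_last_msg), fmt, ap);
	va_end(ap);
	return (n);
}
```

Because the level test lives inside one function, call sites stay short, for example xx_log(2, "unit %d reset", unit), and the verbosity can be changed at runtime by adjusting xx_debug.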
23.2. Declaring a Variable Volatile
volatile is a keyword that must be applied when declaring any variable that will reference a device register. Without the use of volatile, the compile-time optimizer can inadvertently delete important accesses. Neglecting to use volatile might result in bugs that are difficult to track down.
The volatile keyword instructs the compiler to use exact semantics for the declared objects, in particular, not to remove or reorder accesses to the object. Two instances where device drivers must use the volatile qualifier are:
- When data refers to an external hardware device register, that is, memory that has side effects other than just storage. Note, however, that if the DDI data access functions are used to access device registers, you do not have to use volatile.
- When data refers to global memory that is accessible by more than one thread, that is not protected by locks, and that relies on the sequencing of memory accesses. Using volatile consumes fewer resources than using a lock.
The following example uses volatile. A busy flag, which is not protected by a lock, prevents a thread from continuing while the device is busy:
while (busy) {
    /* do something else */
}
The testing thread will continue when another thread turns off the busy flag:
busy = 0;
Because busy is accessed frequently in the testing thread, the compiler can potentially optimize the test by placing the value of busy in a register and testing the contents of the register without reading the value of busy in memory before every test. The testing thread would never see busy change, and the other thread would only change the value of busy in memory, leaving the testing thread waiting forever. Declaring the busy flag as volatile forces its value to be read before each test.
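The declaration and the polling loop can be put together as in the following sketch. The function names xx_wait_not_busy and xx_set_busy and the max_polls bound are assumptions added so that the example terminates when run on its own. The sketch is single threaded and only illustrates where the qualifier goes.

```c
/* volatile: the compiler must reload busy from memory on every test. */
static volatile int busy;

/*
 * Poll until busy clears or until max_polls tests have found it set.
 * Returns the number of tests that found busy set.
 */
unsigned int
xx_wait_not_busy(unsigned int max_polls)
{
	unsigned int polls = 0;

	while (busy && polls < max_polls)
		polls++;	/* do something else */
	return (polls);
}

/* Called by the other thread to set or clear the flag. */
void
xx_set_busy(int b)
{
	busy = b;
}
```

Bounding the number of polls is a design choice for this illustration. It lets the caller treat a flag that never clears as an error instead of spinning without end.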
An alternative to the busy flag is to use a condition variable. See Condition Variables in Thread Synchronization.
When using the volatile qualifier, avoid the risk of accidental omission. For example, the following code
struct device_reg {
    volatile uint8_t csr;
    volatile uint8_t data;
};
struct device_reg *regp;
is preferable to the next example:
struct device_reg {
    uint8_t csr;
    uint8_t data;
};
volatile struct device_reg *regp;
Although the two examples are functionally equivalent, the second one requires the writer to ensure that volatile is used in every declaration of type struct device_reg. The first example results in the data being treated as volatile in all declarations and is therefore preferred. As mentioned above, using the DDI data access functions to access device registers makes qualifying variables as volatile unnecessary.
23.3. Serviceability
To ensure serviceability, the driver must be able to take the following actions:
- Detect faulty devices and report the fault
- Remove a device as supported by the illumos hot-plug model
- Add a new device as supported by the illumos hot-plug model
- Perform periodic health checks to enable the detection of latent faults
23.3.1. Periodic Health Checks
A latent fault is one that does not show itself until some other action occurs. For example, a hardware failure occurring in a device that is a cold standby could remain undetected until a fault occurs on the master device. At that point, the system contains two defective devices and might be unable to continue operation.
Latent faults that remain undetected typically cause system failure eventually. Without latent fault checking, the overall availability of a redundant system is jeopardized. To avoid this situation, a device driver must detect latent faults and report them in the same way as other faults.
You should provide the driver with a mechanism for making periodic health checks on the device. In a fault-tolerant situation where the device can be the secondary or failover device, early detection of a failed secondary device is essential to ensure that the secondary device can be repaired or replaced before any failure in the primary device occurs.
Periodic health checks can be used to perform the following activities:
- Check any register or memory location on the device whose value might have been altered since the last poll.
  Features of a device that typically exhibit deterministic behavior include heartbeat semaphores, device timers (for example, local lbolt used by download), and event counters. Reading an updated predictable value from the device gives a degree of confidence that things are proceeding satisfactorily.
- Timestamp outgoing requests such as transmit blocks or commands that are issued by the driver.
  The periodic health check can look for any suspect requests that have not completed.
- Initiate an action on the device that should be completed before the next scheduled check.
  If this action is an interrupt, this check is an ideal way to ensure that the device's circuitry can deliver an interrupt.
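The first two activities in the list above can be sketched as a single check routine. Every name here (xx_health, xx_request, xx_health_check, the XX_ constants) is hypothetical, and the heartbeat value and the current time are passed in as plain arguments. In a driver, the heartbeat would be read from the device, and the routine might be scheduled periodically, for example from a timeout(9F) callback.

```c
#include <stdint.h>

#define	XX_FAULT_HEARTBEAT	0x1	/* heartbeat did not advance */
#define	XX_FAULT_TIMEOUT	0x2	/* a request is overdue */
#define	XX_MAX_REQS		8
#define	XX_REQ_TIMEOUT		5	/* ticks; arbitrary for this sketch */

struct xx_request {
	int		active;		/* nonzero while outstanding */
	uint64_t	issued_at;	/* timestamp taken at issue time */
};

struct xx_health {
	uint32_t		last_heartbeat;	/* value seen at last poll */
	struct xx_request	reqs[XX_MAX_REQS];
};

/*
 * Run one periodic health check. Returns a mask of XX_FAULT_* bits,
 * or 0 if nothing suspicious was found.
 */
int
xx_health_check(struct xx_health *hp, uint32_t heartbeat, uint64_t now)
{
	int faults = 0;
	int i;

	/* The heartbeat should have changed since the last poll. */
	if (heartbeat == hp->last_heartbeat)
		faults |= XX_FAULT_HEARTBEAT;
	hp->last_heartbeat = heartbeat;

	/* Look for timestamped requests that have not completed. */
	for (i = 0; i < XX_MAX_REQS; i++) {
		if (hp->reqs[i].active &&
		    now - hp->reqs[i].issued_at > XX_REQ_TIMEOUT)
			faults |= XX_FAULT_TIMEOUT;
	}
	return (faults);
}
```

Returning a bit mask lets the caller report each kind of fault through the same path that the driver uses for other faults.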