Direct Memory Access (DMA)
Many devices can temporarily take control of the bus. These devices can perform data transfers that involve main memory and other devices. Because the device is doing the work without the help of the CPU, this type of data transfer is known as direct memory access (DMA). The following types of DMA transfers can be performed:
-
Between two devices
-
Between a device and memory
-
Between memory and memory
This chapter explains transfers between a device and memory only.
9.1. DMA Model
The illumos Device Driver Interface/Driver-Kernel Interface (DDI/DKI) provides a high-level, architecture-independent model for DMA. This model enables the framework, that is, the DMA routines, to hide architecture-specific details such as the following:
-
Setting up DMA mappings
-
Building scatter-gather lists
-
Ensuring that I/O and CPU caches are consistent
Several abstractions are used in the DDI/DKI to describe aspects of a DMA transaction:
-
DMA object – Memory that is the source or destination of a DMA transfer.
-
DMA handle – An opaque object returned from a successful ddi_dma_alloc_handle(9F) call. The DMA handle can be used in subsequent DMA subroutine calls to refer to such DMA objects.
-
DMA cookie – A ddi_dma_cookie(9S) structure (ddi_dma_cookie_t) describes a contiguous portion of a DMA object that is entirely addressable by the device. The cookie contains DMA addressing information that is required to program the DMA engine.
Rather than map an object directly into memory, device drivers allocate DMA resources for a memory object. The DMA routines then perform any platform-specific operations that are needed to set up the object for DMA access. The driver receives a DMA handle to identify the DMA resources that are allocated for the object. This handle is opaque to the device driver. The driver must save the handle and pass the handle in subsequent calls to DMA routines. The driver should not interpret the handle in any way.
Operations that provide the following services are defined on a DMA handle:
-
Manipulating DMA resources
-
Synchronizing DMA objects
-
Retrieving attributes of the allocated resources
9.2. Types of Device DMA
Devices perform one of the following three types of DMA:
-
Bus-master DMA
-
Third-party DMA
-
First-party DMA
9.2.1. Bus-Master DMA
The driver should program the device's DMA registers directly in cases where the device acts like a true bus master. For example, a device acts like a bus master when the DMA engine resides on the device board. The transfer address and count are obtained from the DMA cookie to be passed on to the device.
9.2.2. Third-Party DMA
Third-party DMA uses a system DMA engine resident on the main system board, which has several DMA channels that are available for use by devices. The device relies on the system's DMA engine to perform the data transfers between the device and memory. The driver uses DMA engine routines (see the ddi_dmae(9F) function) to initialize and program the DMA engine. For each DMA data transfer, the driver programs the DMA engine and then gives the device a command to initiate the transfer in cooperation with that engine.
9.2.3. First-Party DMA
Under first-party DMA, the device uses a channel from the system's DMA engine to drive that device's DMA bus cycles. Use the ddi_dmae_1stparty(9F) function to configure this channel in a cascade mode so that the DMA engine does not interfere with the transfer.
9.3. Types of Host Platform DMA
The platform on which the device operates provides either direct memory access (DMA) or direct virtual memory access (DVMA).
On platforms that support DMA, the system provides the device with a physical address in order to perform transfers. In this case, the transfer of a DMA object can actually consist of a number of physically discontiguous transfers. An example is when an application transfers a buffer that spans several contiguous virtual pages that map to physically discontiguous pages. To deal with the discontiguous memory, devices for these platforms usually have some kind of scatter-gather DMA capability. Typically, x86 systems provide physical addresses for direct memory transfers.
On platforms that support DVMA, the system provides the device with a virtual address to perform transfers. In this case, memory management units (MMU) provided by the underlying platform translate device accesses to these virtual addresses into the proper physical addresses. The device transfers to and from a contiguous virtual image that can be mapped to discontiguous physical pages. Devices that operate in these platforms do not need scatter-gather DMA capability. Typically, SPARC platforms provide virtual addresses for direct memory transfers.
9.4. DMA Software Components: Handles, Windows, and Cookies
A DMA handle is an opaque pointer that represents an object, usually a memory buffer or address. A DMA handle enables a device to perform DMA transfers. Several different calls to DMA routines use the handle to identify the DMA resources that are allocated for the object.
An object represented by a DMA handle is completely covered by one or more DMA cookies. A DMA cookie represents a contiguous piece of memory that is used in data transfers by the DMA engine. The system divides objects into multiple cookies based on the following information:
-
The ddi_dma_attr(9S) attribute structure provided by the driver
-
Memory location of the target object
-
Alignment of the target object
If an object does not fit within the limitations of the DMA engine, that object must be broken into multiple DMA windows. You can only activate and allocate resources for one window at a time. Use the ddi_dma_getwin(9F) function to position between windows within an object. Each DMA window consists of one or more DMA cookies. For more information, see DMA Windows.
Some DMA engines can accept more than one cookie. Such engines perform scatter-gather I/O without the help of the system. If multiple cookies are returned from a bind, the driver should call ddi_dma_nextcookie(9F) repeatedly to retrieve each cookie. These cookies must then be programmed into the engine. The device can then be programmed to transfer the total number of bytes covered by the aggregate of these DMA cookies.
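The relationship between physically discontiguous pages and cookies can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not how the DDI framework builds cookies: `struct model_cookie`, `build_cookies`, and `MODEL_PAGE_SIZE` are hypothetical names, the page size is assumed to be 4096 bytes, and attribute limits such as `dma_attr_count_max` are ignored. The model merges physically adjacent pages into one cookie and fails when the engine's scatter-gather list is too short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for a DMA cookie: one contiguous range. */
struct model_cookie {
	uint64_t addr;	/* starting physical address */
	size_t size;	/* length in bytes */
};

/*
 * Build cookies from a list of physical page addresses, merging
 * pages that are physically adjacent. Returns the number of cookies
 * written, or 0 if more than max_cookies would be needed (or if
 * npages is 0).
 */
size_t
build_cookies(const uint64_t *pages, size_t npages,
    struct model_cookie *out, size_t max_cookies)
{
	size_t n = 0;

	assert(pages != NULL && out != NULL);
	for (size_t i = 0; i < npages; i++) {
		if (n > 0 && out[n - 1].addr + out[n - 1].size == pages[i]) {
			/* Page follows the current cookie: extend it. */
			out[n - 1].size += MODEL_PAGE_SIZE;
			continue;
		}
		if (n == max_cookies)
			return (0);	/* scatter-gather list too short */
		out[n].addr = pages[i];
		out[n].size = MODEL_PAGE_SIZE;
		n++;
	}
	return (n);
}
```

In this model, a failure return corresponds to the situation in which an object cannot be handled in a single pass and must be divided into DMA windows.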
9.5. DMA Operations
The steps in a DMA transfer are similar among the types of DMA. The following sections present methods for performing DMA transfers.
You do not need to ensure that the DMA object is locked in memory in block drivers for buffers that come from the file system. The file system has already locked the data in memory.
9.5.1. Performing Bus-Master DMA Transfers
The driver should perform the following steps for bus-master DMA.
-
Describe the DMA attributes. This step enables the routines to ensure that the device is able to access the buffer.
-
Allocate a DMA handle.
-
Ensure that the DMA object is locked in memory. See the physio(9F) or ddi_umem_lock(9F) man page.
-
Allocate DMA resources for the object.
-
Program the DMA engine on the device.
-
Start the engine.
-
When the transfer is complete, continue the bus master operation.
-
Perform any required object synchronizations.
-
Release the DMA resources.
-
Free the DMA handle.
9.5.2. Performing First-Party DMA Transfers
The driver should perform the following steps for first-party DMA.
-
Allocate a DMA channel.
-
Use ddi_dmae_1stparty(9F) to configure the channel.
-
Ensure that the DMA object is locked in memory. See the physio(9F) or ddi_umem_lock(9F) man page.
-
Allocate DMA resources for the object.
-
Program the DMA engine on the device.
-
Start the engine.
-
When the transfer is complete, continue the bus-master operation.
-
Perform any required object synchronizations.
-
Release the DMA resources.
-
Deallocate the DMA channel.
9.5.3. Performing Third-Party DMA Transfers
The driver should perform these steps for third-party DMA.
-
Allocate a DMA channel.
-
Retrieve the system's DMA engine attributes with ddi_dmae_getattr(9F).
-
Lock the DMA object in memory. See the physio(9F) or ddi_umem_lock(9F) man page.
-
Allocate DMA resources for the object.
-
Use ddi_dmae_prog(9F) to program the system DMA engine to perform the transfer.
-
Perform any required object synchronizations.
-
Use ddi_dmae_stop(9F) to stop the DMA engine.
-
Release the DMA resources.
-
Deallocate the DMA channel.
Certain hardware platforms restrict DMA capabilities in a bus-specific way. Drivers should use ddi_slaveonly(9F) to determine whether the device is in a slot in which DMA is possible.
9.5.4. DMA Attributes
DMA attributes describe the attributes and limits of a DMA engine, which include:
-
Limits on addresses that the device can access
-
Maximum transfer count
-
Address alignment restrictions
A device driver must inform the system about any DMA engine limitations through the ddi_dma_attr(9S) structure. This action ensures that DMA resources that are allocated by the system can be accessed by the device's DMA engine. The system can impose additional restrictions on the device attributes, but the system never removes any of the driver-supplied restrictions.
ddi_dma_attr Structure
The DMA attribute structure has the following members:
typedef struct ddi_dma_attr {
	uint_t		dma_attr_version;	/* version number */
	uint64_t	dma_attr_addr_lo;	/* low DMA address range */
	uint64_t	dma_attr_addr_hi;	/* high DMA address range */
	uint64_t	dma_attr_count_max;	/* DMA counter register */
	uint64_t	dma_attr_align;		/* DMA address alignment */
	uint_t		dma_attr_burstsizes;	/* DMA burstsizes */
	uint32_t	dma_attr_minxfer;	/* min effective DMA size */
	uint64_t	dma_attr_maxxfer;	/* max DMA xfer size */
	uint64_t	dma_attr_seg;		/* segment boundary */
	int		dma_attr_sgllen;	/* s/g length */
	uint32_t	dma_attr_granular;	/* granularity of device */
	uint_t		dma_attr_flags;		/* Bus specific DMA flags */
} ddi_dma_attr_t;
where:
dma_attr_version
-
Version number of the attribute structure. dma_attr_version should be set to DMA_ATTR_V0.
dma_attr_addr_lo
-
Lowest bus address that the DMA engine can access.
dma_attr_addr_hi
-
Highest bus address that the DMA engine can access.
dma_attr_count_max
-
Specifies the maximum transfer count that the DMA engine can handle in one cookie. The limit is expressed as the maximum count minus one. This count is used as a bit mask, so the count must also be one less than a power of two.
dma_attr_align
-
Specifies alignment requirements when allocating memory from ddi_dma_mem_alloc(9F). An example of an alignment requirement is alignment on a page boundary. The dma_attr_align field is used only when allocating memory. This field is ignored during bind operations. For bind operations, the driver must ensure that the buffer is aligned appropriately.
dma_attr_burstsizes
-
Specifies the burst sizes that the device supports. A burst size is the amount of data the device can transfer before relinquishing the bus. This member is a binary encoding of burst sizes, which are assumed to be powers of two. For example, if the device is capable of doing 1-byte, 2-byte, 4-byte, and 16-byte bursts, this field should be set to 0x17. The system also uses this field to determine alignment restrictions.
dma_attr_minxfer
-
Minimum effective transfer size that the device can perform. This size also influences restrictions on alignment and on padding.
dma_attr_maxxfer
-
Describes the maximum number of bytes that the DMA engine can accommodate in one I/O command. This limitation is only significant if dma_attr_maxxfer is less than (dma_attr_count_max + 1) * dma_attr_sgllen.
dma_attr_seg
-
Upper bound of the DMA engine's address register. dma_attr_seg is often used where the upper 8 bits of an address register are a latch that contains a segment number. The lower 24 bits are used to address a segment. In this case, dma_attr_seg would be set to 0xFFFFFF, which prevents the system from crossing a 24-bit segment boundary when allocating resources for the object.
dma_attr_sgllen
-
Specifies the maximum number of entries in the scatter-gather list. dma_attr_sgllen is the number of cookies that the DMA engine can consume in one I/O request to the device. If the DMA engine has no scatter-gather list, this field should be set to 1.
dma_attr_granular
-
This field gives the granularity in bytes of the DMA transfer ability of the device. An example of how this value is used is to specify the sector size of a mass storage device. When a bind operation requires a partial mapping, this field is used to ensure that the sum of the sizes of the cookies in a DMA window is a whole multiple of granularity. However, if the device does not have a scatter-gather capability, it is impossible for the DDI to ensure the granularity. For this case, the value of the dma_attr_granular field should be 1.
dma_attr_flags
-
This field can be set to DDI_DMA_FORCE_PHYSICAL, which indicates that the system should return physical rather than virtual I/O addresses if the system supports both. If the system does not support physical DMA, the return value from ddi_dma_alloc_handle(9F) is DDI_DMA_BADATTR. In this case, the driver has to clear DDI_DMA_FORCE_PHYSICAL and retry the operation.
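Three of the rules in the member descriptions are simple arithmetic and can be checked in user space. The following is a minimal sketch, not part of the DDI: `encode_burstsizes`, `count_max_is_valid`, and `maxxfer_is_limiting` are hypothetical helper names introduced for illustration. They show the burst-size bitmap encoding, the "one less than a power of two" rule for dma_attr_count_max, and the condition under which dma_attr_maxxfer is significant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode a set of power-of-two burst sizes (in bytes) as a bitmap. */
uint32_t
encode_burstsizes(const uint32_t *sizes, int n)
{
	uint32_t map = 0;

	for (int i = 0; i < n; i++) {
		/* Each burst size must be a nonzero power of two. */
		assert(sizes[i] != 0 && (sizes[i] & (sizes[i] - 1)) == 0);
		map |= sizes[i];
	}
	return (map);
}

/* A count_max value must be one less than a power of two. */
bool
count_max_is_valid(uint64_t count_max)
{
	return ((count_max & (count_max + 1)) == 0);
}

/*
 * maxxfer is only significant when it is below
 * (count_max + 1) * sgllen. This sketch assumes that the product
 * fits in 64 bits.
 */
bool
maxxfer_is_limiting(uint64_t maxxfer, uint64_t count_max, int sgllen)
{
	assert(sgllen > 0);
	return (maxxfer < (count_max + 1) * (uint64_t)sgllen);
}
```

For example, encoding the burst sizes 1, 2, 4, and 16 yields 0x17, which matches the value given in the dma_attr_burstsizes description.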
SBus Example
A DMA engine on an SBus in a SPARC machine has the following attributes:
-
Access to addresses ranging from 0xFF000000 to 0xFFFFFFFF only
-
32-bit DMA counter register
-
Ability to handle byte-aligned transfers
-
Support for 1-byte, 2-byte, and 4-byte burst sizes
-
Minimum effective transfer size of 1 byte
-
32-bit address register
-
No scatter-gather list
-
Operation on sectors only, for example, a disk
A DMA engine on an SBus in a SPARC machine has the following attribute structure:
static ddi_dma_attr_t attributes = {
DMA_ATTR_V0, /* Version number */
0xFF000000, /* low address */
0xFFFFFFFF, /* high address */
0xFFFFFFFF, /* counter register max */
1, /* byte alignment */
0x7, /* burst sizes: 0x1 | 0x2 | 0x4 */
0x1, /* minimum transfer size */
0xFFFFFFFF, /* max transfer size */
0xFFFFFFFF, /* address register max */
1, /* no scatter-gather */
512, /* device operates on sectors */
0, /* attr flag: set to 0 */
};
ISA Bus Example
A DMA engine on an ISA bus in an x86 machine has the following attributes:
-
Access to the first 16 megabytes of memory only
-
Inability to cross a 1-megabyte boundary in a single DMA transfer
-
16-bit counter register
-
Ability to handle byte-aligned transfers
-
Support for 1-byte, 2-byte, and 4-byte burst sizes
-
Minimum effective transfer size of 1 byte
-
Ability to hold up to 17 scatter-gather transfers
-
Operation on sectors only, for example, a disk
A DMA engine on an ISA bus in an x86 machine has the following attribute structure:
static ddi_dma_attr_t attributes = {
DMA_ATTR_V0, /* Version number */
0x00000000, /* low address */
0x00FFFFFF, /* high address */
0xFFFF, /* counter register max */
1, /* byte alignment */
0x7, /* burst sizes */
0x1, /* minimum transfer size */
0xFFFFFFFF, /* max transfer size */
0x000FFFFF, /* address register max */
17, /* scatter-gather */
512, /* device operates on sectors */
0, /* attr flag: set to 0 */
};
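The effect of the dma_attr_seg value in the ISA example can be illustrated with a small user-space check. This is a sketch only: `crosses_segment` is a hypothetical helper, not a DDI routine, and it simply tests whether an address range would span two segments for a given mask such as 0x000FFFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the range [addr, addr + len) spans more than one
 * segment, where seg is a mask in the style of dma_attr_seg
 * (for example, 0x000FFFFF for 1-megabyte segments).
 */
bool
crosses_segment(uint64_t addr, uint64_t len, uint64_t seg)
{
	uint64_t first, last;

	assert(len > 0);
	first = addr & ~seg;			/* segment of first byte */
	last = (addr + len - 1) & ~seg;		/* segment of last byte */
	return (first != last);
}
```

With the ISA mask, a 4096-byte transfer that starts at 0x000FF000 ends exactly at the 1-megabyte boundary and does not cross it, whereas a transfer one byte longer does. A range that crosses would have to be described by separate cookies.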
9.6. Managing DMA Resources
This section describes how to manage DMA resources.
9.6.1. Object Locking
Before allocating the DMA resources for a memory object, the object must be prevented from moving. Otherwise, the system can remove the object from memory while the device is trying to write to that object. A missing object would cause the data transfer to fail and possibly corrupt the system. The process of preventing memory objects from moving during a DMA transfer is known as locking down the object.
The following object types do not require explicit locking:
-
Buffers coming from the file system through strategy(9E). These buffers are already locked by the file system.
-
Kernel memory allocated within the device driver, such as that allocated by ddi_dma_mem_alloc(9F).
For other objects such as buffers from user space, physio(9F) or ddi_umem_lock(9F) must be used to lock down the objects. Locking down objects with these functions is usually performed in the read(9E) or write(9E) routines of a character device driver. See Data Transfer Methods for an example.
9.6.2. Allocating a DMA Handle
A DMA handle is an opaque object that is used as a reference to subsequently allocated DMA resources. The DMA handle is usually allocated in the driver's attach(9E) entry point by calling ddi_dma_alloc_handle(9F). The ddi_dma_alloc_handle function takes the device information that is referred to by dip and the device's DMA attributes described by a ddi_dma_attr(9S) structure as parameters. The ddi_dma_alloc_handle function has the following syntax:
int ddi_dma_alloc_handle(dev_info_t *dip,
ddi_dma_attr_t *attr, int (*callback)(caddr_t),
caddr_t arg, ddi_dma_handle_t *handlep);
where:
dip
-
Pointer to the device's dev_info structure.
attr
-
Pointer to a ddi_dma_attr(9S) structure, as described in DMA Attributes.
callback
-
Address of the callback function for handling resource allocation failures.
arg
-
Argument to be passed to the callback function.
handlep
-
Pointer to a DMA handle to store the returned handle.
9.6.3. Allocating DMA Resources
Two interfaces allocate DMA resources:
-
ddi_dma_buf_bind_handle(9F) – Used with buf(9S) structures
-
ddi_dma_addr_bind_handle(9F) – Used with virtual addresses
DMA resources are usually allocated in the driver's xxstart routine, if an xxstart routine exists. See Asynchronous Data Transfers (Block Drivers) for a discussion of xxstart. These two interfaces have the following syntax:
int ddi_dma_addr_bind_handle(ddi_dma_handle_t handle,
struct as *as, caddr_t addr,
size_t len, uint_t flags, int (*callback)(caddr_t),
caddr_t arg, ddi_dma_cookie_t *cookiep, uint_t *ccountp);
int ddi_dma_buf_bind_handle(ddi_dma_handle_t handle,
struct buf *bp, uint_t flags,
int (*callback)(caddr_t), caddr_t arg,
ddi_dma_cookie_t *cookiep, uint_t *ccountp);
The following arguments are common to both ddi_dma_addr_bind_handle(9F) and ddi_dma_buf_bind_handle(9F):
handle
-
DMA handle that identifies the object for which resources are allocated.
flags
-
Set of flags that indicate the transfer direction and other attributes. DDI_DMA_READ indicates a data transfer from device to memory. DDI_DMA_WRITE indicates a data transfer from memory to device. See the ddi_dma_addr_bind_handle(9F) or ddi_dma_buf_bind_handle(9F) man page for a complete discussion of the available flags.
callback
-
Address of callback function for handling resource allocation failures. See the ddi_dma_alloc_handle(9F) man page.
arg
-
Argument to pass to the callback function.
cookiep
-
Pointer to the first DMA cookie for this object.
ccountp
-
Pointer to the number of DMA cookies for this object.
For ddi_dma_addr_bind_handle(9F), the object is described by an address range with the following parameters:
as
-
Pointer to an address space structure. The value of as must be NULL.
addr
-
Base kernel address of the object.
len
-
Length of the object in bytes.
For ddi_dma_buf_bind_handle(9F), the object is described by a buf(9S) structure pointed to by bp.
Device Register Structure
DMA-capable devices require more registers than were used in the previous examples.
The following fields are used in the device register structure to support a DMA-capable device with no scatter-gather support:
uint32_t dma_addr; /* starting address for DMA */
uint32_t dma_size; /* amount of data to transfer */
The following fields are used in the device register structure to support DMA-capable devices with scatter-gather support:
struct sglentry {
uint32_t dma_addr;
uint32_t dma_size;
} sglist[SGLLEN];
caddr_t iopb_addr; /* When written, informs the device of the next */
/* command's parameter block address. */
/* When read after an interrupt, contains */
/* the address of the completed command. */
DMA Callback Example
In the following example, xxstart is used as the callback function. The per-device state structure is used as the argument to xxstart. The xxstart function attempts to start the command. If the command cannot be started because resources are not available, xxstart is scheduled to be called later when resources are available.
Because xxstart is used as a DMA callback, xxstart must adhere to the following rules, which are imposed on DMA callbacks:
-
Resources cannot be assumed to be available. The callback must try to allocate resources again.
-
The callback must indicate to the system whether allocation succeeded. DDI_DMA_CALLBACK_RUNOUT should be returned if the callback fails to allocate resources, in which case xxstart needs to be called again later. DDI_DMA_CALLBACK_DONE indicates success, so that no further callback is necessary.
static int
xxstart(caddr_t arg)
{
	struct xxstate *xsp = (struct xxstate *)arg;
	struct device_reg *regp;
	ddi_dma_cookie_t cookie;	/* first cookie returned by the bind */
	uint_t ccount;			/* number of cookies for the object */
	int flags;

	mutex_enter(&xsp->mu);
	if (xsp->busy) {
		/* transfer in progress */
		mutex_exit(&xsp->mu);
		return (DDI_DMA_CALLBACK_RUNOUT);
	}
	xsp->busy = 1;
	regp = xsp->regp;
	if ( /* transfer is a read */ ) {
		flags = DDI_DMA_READ;
	} else {
		flags = DDI_DMA_WRITE;
	}
	mutex_exit(&xsp->mu);
	if (ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags, xxstart,
	    (caddr_t)xsp, &cookie, &ccount) != DDI_DMA_MAPPED) {
		/* really should check all return values in a switch */
		mutex_enter(&xsp->mu);
		xsp->busy = 0;
		mutex_exit(&xsp->mu);
		return (DDI_DMA_CALLBACK_RUNOUT);
	}
	/* Program the DMA engine. */
	return (DDI_DMA_CALLBACK_DONE);
}
9.6.4. Determining Maximum Burst Sizes
Drivers specify the DMA burst sizes that their device supports in the dma_attr_burstsizes field of the ddi_dma_attr(9S) structure. This field is a bitmap of the supported burst sizes. However, when DMA resources are allocated, the system might impose further restrictions on the burst sizes that the device can actually use. The ddi_dma_burstsizes(9F) routine can be used to obtain the allowed burst sizes. This routine returns the appropriate burst size bitmap for the device. After DMA resources are allocated, a driver can ask the system for appropriate burst sizes to use for its DMA engine.
#define BEST_BURST_SIZE 0x20 /* 32 bytes */

if (ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags, xxstart,
    (caddr_t)xsp, &cookie, &ccount) != DDI_DMA_MAPPED) {
	/* error handling */
}
burst = ddi_dma_burstsizes(xsp->handle);
/* check which bit is set and choose one burstsize to */
/* program the DMA engine */
if (burst & BEST_BURST_SIZE) {
	/* program DMA engine to use this burst size */
} else {
	/* other cases */
}
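The "other cases" branch usually amounts to falling back through a list of burst sizes in order of preference. The following user-space sketch shows one way to express that choice. `pick_burst` is a hypothetical helper, not a DDI routine, and the `allowed` argument stands in for the bitmap that ddi_dma_burstsizes(9F) returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the first burst size in the driver's preference list that
 * is also set in the allowed bitmap, or 0 if none of the preferred
 * sizes is allowed.
 */
uint32_t
pick_burst(uint32_t allowed, const uint32_t *preferred, size_t n)
{
	assert(preferred != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (allowed & preferred[i])
			return (preferred[i]);
	}
	return (0);
}
```

With a preference list of 0x20, 0x10, and 0x4 and an allowed bitmap of 0x17, this sketch selects the 16-byte burst because the 32-byte burst is not permitted.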
9.6.5. Allocating Private DMA Buffers
Some device drivers might need to allocate memory for DMA transfers in addition to performing transfers requested by user threads and the kernel. Some examples of allocating private DMA buffers are setting up shared memory for communication with the device and allocating intermediate transfer buffers. Use ddi_dma_mem_alloc(9F) to allocate memory for DMA transfers.
int ddi_dma_mem_alloc(ddi_dma_handle_t handle, size_t length,
ddi_device_acc_attr_t *accattrp, uint_t flags,
int (*waitfp)(caddr_t), caddr_t arg, caddr_t *kaddrp,
size_t *real_length, ddi_acc_handle_t *handlep);
where:
handle
-
DMA handle
length
-
Length in bytes of the desired allocation
accattrp
-
Pointer to a device access attribute structure
flags
-
Data transfer mode flags. Possible values are DDI_DMA_CONSISTENT and DDI_DMA_STREAMING.
waitfp
-
Address of callback function for handling resource allocation failures. See the ddi_dma_alloc_handle(9F) man page.
arg
-
Argument to pass to the callback function
kaddrp
-
Pointer on a successful return that contains the address of the allocated storage
real_length
-
Length in bytes that was allocated
handlep
-
Pointer to a data access handle
The flags parameter should be set to DDI_DMA_CONSISTENT if the device accesses the memory in a nonsequential fashion. Synchronization steps that use ddi_dma_sync(9F) should be as lightweight as possible because they are applied frequently to small objects. This type of access is commonly known as consistent access. Consistent access is particularly useful for I/O parameter blocks that are used for communication between a device and the driver.
On the x86 platform, allocation of DMA memory that is physically contiguous has these requirements:
-
The length of the scatter-gather list dma_attr_sgllen in the ddi_dma_attr(9S) structure must be set to 1.
-
Do not specify DDI_DMA_PARTIAL. DDI_DMA_PARTIAL allows partial resource allocation.
The following example shows how to allocate IOPB memory and the necessary DMA resources to access this memory. DMA resources must still be allocated, and the DDI_DMA_CONSISTENT flag must be passed to the allocation function.
if (ddi_dma_mem_alloc(xsp->iopb_handle, size, &accattr,
    DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL, &xsp->iopb_array,
    &real_length, &xsp->acchandle) != DDI_SUCCESS) {
	/* error handling */
	goto failure;
}
if (ddi_dma_addr_bind_handle(xsp->iopb_handle, NULL,
    xsp->iopb_array, real_length,
    DDI_DMA_READ | DDI_DMA_CONSISTENT, DDI_DMA_SLEEP,
    NULL, &cookie, &count) != DDI_DMA_MAPPED) {
	/* error handling */
	ddi_dma_mem_free(&xsp->acchandle);
	goto failure;
}
The flags parameter should be set to DDI_DMA_STREAMING for memory transfers that are sequential, unidirectional, block-sized, and block-aligned. This type of access is commonly known as streaming access.
In some cases, an I/O transfer can be sped up by using an I/O cache. I/O cache transfers one cache line at a minimum. The ddi_dma_mem_alloc(9F) routine rounds size to a multiple of the cache line to avoid data corruption.
The ddi_dma_mem_alloc(9F) function returns the actual size of the allocated memory object. Because of padding and alignment requirements, the actual size might be larger than the requested size. The ddi_dma_addr_bind_handle(9F) function requires the actual length.
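The rounding that makes the actual length larger than the requested length is ordinary power-of-two arithmetic. The following user-space sketch illustrates it. `round_up_to_line` is a hypothetical helper, and the 64-byte line size used in the note below is an assumption for illustration; the real rounding performed by ddi_dma_mem_alloc(9F) is platform specific, which is why drivers should rely on the returned real_length rather than computing the size themselves.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round len up to the next multiple of line. The line size must be
 * a nonzero power of two.
 */
size_t
round_up_to_line(size_t len, size_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);
	return ((len + line - 1) & ~(line - 1));
}
```

Under the assumed 64-byte line size, a 100-byte request would be rounded to 128 bytes, while a 128-byte request would be left unchanged.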
Use the ddi_dma_mem_free(9F) function to free the memory allocated by ddi_dma_mem_alloc(9F).
Drivers must ensure that buffers are aligned appropriately. Drivers for devices that have alignment requirements on down bound DMA buffers might need to copy the data into a driver intermediate buffer that meets the requirements, and then bind that intermediate buffer to the DMA handle for DMA. Use ddi_dma_mem_alloc(9F) to allocate the driver intermediate buffer. Always use ddi_dma_mem_alloc(9F) instead of kmem_alloc(9F) to allocate memory for the device to access.
9.6.6. Handling Resource Allocation Failures
The resource-allocation routines provide the driver with several options when handling allocation failures. The waitfp argument indicates whether the allocation routines block, return immediately, or schedule a callback, as shown in the following table.
| waitfp value | Indicated Action |
|---|---|
| DDI_DMA_DONTWAIT | Driver does not want to wait for resources to become available |
| DDI_DMA_SLEEP | Driver is willing to wait indefinitely for resources to become available |
| Other values | The address of a function to be called when resources are likely to be available |
9.6.7. Programming the DMA Engine
When the resources have been successfully allocated, the device must be programmed. Although programming a DMA engine is device specific, all DMA engines require a starting address and a transfer count. Device drivers retrieve these two values from the DMA cookie returned by a successful call to ddi_dma_addr_bind_handle(9F), ddi_dma_buf_bind_handle(9F), or ddi_dma_getwin(9F). These functions all return the first DMA cookie and a cookie count that indicates whether the DMA object consists of more than one cookie. If the cookie count N is greater than 1, ddi_dma_nextcookie(9F) must be called N-1 times to retrieve all the remaining cookies.
A DMA cookie is of type ddi_dma_cookie(9S). This type of cookie has the following fields:
uint64_t _dmac_ll; /* 64-bit DMA address */
uint32_t _dmac_la[2]; /* 2 x 32-bit address */
size_t dmac_size; /* DMA cookie size */
uint_t dmac_type; /* bus specific type bits */
The dmac_laddress field specifies a 64-bit I/O address that is appropriate for programming the device's DMA engine. If a device has a 64-bit DMA address register, a driver should use this field to program the DMA engine. The dmac_address field specifies a 32-bit I/O address that should be used for devices that have a 32-bit DMA address register. The dmac_size field contains the transfer count. Depending on the bus architecture, the dmac_type field in the cookie might be required by the driver. The driver should not perform any manipulations, such as logical or arithmetic, on the cookie.
ddi_dma_cookie_t cookie;

if (ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags, xxstart,
    (caddr_t)xsp, &cookie, &xsp->ccount) != DDI_DMA_MAPPED) {
	/* error handling */
}
sglp = regp->sglist;
for (cnt = 1; cnt <= SGLLEN; cnt++, sglp++) {
	/* store the cookie parms into the S/G list */
	ddi_put32(xsp->access_hdl, &sglp->dma_size,
	    (uint32_t)cookie.dmac_size);
	ddi_put32(xsp->access_hdl, &sglp->dma_addr,
	    cookie.dmac_address);
	/* Check for end of cookie list */
	if (cnt == xsp->ccount)
		break;
	/* Get next DMA cookie */
	(void) ddi_dma_nextcookie(xsp->handle, &cookie);
}
/* start DMA transfer */
ddi_put8(xsp->access_hdl, &regp->csr,
    ENABLE_INTERRUPTS | START_TRANSFER);
9.6.8. Freeing the DMA Resources
After a DMA transfer is completed, usually in the interrupt routine, the driver can release DMA resources by calling ddi_dma_unbind_handle(9F).
As described in Synchronizing Memory Objects, ddi_dma_unbind_handle(9F) calls ddi_dma_sync(9F), eliminating the need for any explicit synchronization. After calling ddi_dma_unbind_handle(9F), the DMA resources become invalid, and further references to the resources have undefined results. The following example shows how to use ddi_dma_unbind_handle(9F).
static uint_t
xxintr(caddr_t arg)
{
	struct xxstate *xsp = (struct xxstate *)arg;
	uint8_t status;
	volatile uint8_t temp;

	mutex_enter(&xsp->mu);
	/* read status */
	status = ddi_get8(xsp->access_hdl, &xsp->regp->csr);
	if (!(status & INTERRUPTING)) {
		mutex_exit(&xsp->mu);
		return (DDI_INTR_UNCLAIMED);
	}
	ddi_put8(xsp->access_hdl, &xsp->regp->csr, CLEAR_INTERRUPT);
	/* for store buffers */
	temp = ddi_get8(xsp->access_hdl, &xsp->regp->csr);
	ddi_dma_unbind_handle(xsp->handle);
	/* Check for errors. */
	xsp->busy = 0;
	mutex_exit(&xsp->mu);
	if ( /* pending transfers */ ) {
		(void) xxstart((caddr_t)xsp);
	}
	return (DDI_INTR_CLAIMED);
}
After a transfer, the DMA resources should be released. The resources must be reallocated if a different object is to be used in the next transfer. However, if the same object is always used, the resources can be allocated once and then reused, as long as the driver still makes the intervening calls to ddi_dma_sync(9F).
9.6.9. Freeing the DMA Handle
When the driver is detached, the DMA handle must be freed. The ddi_dma_free_handle(9F) function destroys the DMA handle and destroys any residual resources that the system is caching on the handle. Any further references to the DMA handle have undefined results.
9.6.10. Canceling DMA Callbacks
DMA callbacks cannot be canceled directly. Dealing with outstanding callbacks therefore requires some additional code in the driver's detach(9E) entry point. The detach routine must not return DDI_SUCCESS if any outstanding callbacks exist. When DMA callbacks are outstanding, the detach routine must wait for each callback to run. When the callback has finished, detach must prevent the callback from rescheduling itself. Callbacks can be prevented from rescheduling through additional fields in the state structure, as shown in the following example.
static int
xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
/* ... */
mutex_enter(&xsp->callback_mutex);
xsp->cancel_callbacks = 1;
while (xsp->callback_count > 0) {
cv_wait(&xsp->callback_cv, &xsp->callback_mutex);
}
mutex_exit(&xsp->callback_mutex);
/* ... */
}
static int
xxstrategy(struct buf *bp)
{
/* ... */
mutex_enter(&xsp->callback_mutex);
xsp->bp = bp;
error = ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags,
xxdmacallback, (caddr_t)xsp, &cookie, &ccount);
if (error == DDI_DMA_NORESOURCES)
xsp->callback_count++;
mutex_exit(&xsp->callback_mutex);
/* ... */
}
static int
xxdmacallback(caddr_t callbackarg)
{
struct xxstate *xsp = (struct xxstate *)callbackarg;
/* ... */
mutex_enter(&xsp->callback_mutex);
if (xsp->cancel_callbacks) {
/* do not reschedule, in process of detaching */
xsp->callback_count--;
if (xsp->callback_count == 0)
cv_signal(&xsp->callback_cv);
mutex_exit(&xsp->callback_mutex);
return (DDI_DMA_CALLBACK_DONE); /* don't reschedule it */
}
/*
* Presumably at this point the device is still active
* and will not be detached until the DMA has completed.
* A return of 0 means try again later
*/
error = ddi_dma_buf_bind_handle(xsp->handle, xsp->bp, flags,
DDI_DMA_DONTWAIT, NULL, &cookie, &ccount);
if (error == DDI_DMA_MAPPED) {
/* Program the DMA engine. */
xsp->callback_count--;
mutex_exit(&xsp->callback_mutex);
return (DDI_DMA_CALLBACK_DONE);
}
if (error != DDI_DMA_NORESOURCES) {
xsp->callback_count--;
mutex_exit(&xsp->callback_mutex);
return (DDI_DMA_CALLBACK_DONE);
}
mutex_exit(&xsp->callback_mutex);
return (DDI_DMA_CALLBACK_RUNOUT);
}
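The bookkeeping in the preceding example can be exercised outside the kernel. The following is a minimal, single-threaded userland sketch, not driver code: the names sim_state, sim_strategy, sim_callback, sim_detach_begin, and sim_detach_may_finish are hypothetical, and the enum values only stand in for the DDI return codes. The sketch assumes that each failed bind queues exactly one callback, and it omits the mutex and condition variable that the real driver needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland stand-in for the driver's soft state. */
struct sim_state {
	int	callback_count;		/* callbacks queued by failed binds */
	bool	cancel_callbacks;	/* set once detach has started */
};

/* Stand-ins for bind results and callback return codes (assumed names). */
enum sim_bind { SIM_MAPPED, SIM_NORESOURCES, SIM_ERROR };
enum sim_cb { SIM_CALLBACK_RUNOUT = 0, SIM_CALLBACK_DONE = 1 };

/* Strategy path: a bind that fails for lack of resources queues a callback. */
static void
sim_strategy(struct sim_state *s, enum sim_bind result)
{
	if (result == SIM_NORESOURCES)
		s->callback_count++;
}

/*
 * Callback path: retry_result is what the retried bind would return.
 * The callback stays queued only if detach has not started and the
 * retry again fails for lack of resources.
 */
static enum sim_cb
sim_callback(struct sim_state *s, enum sim_bind retry_result)
{
	assert(s->callback_count > 0);
	if (s->cancel_callbacks || retry_result != SIM_NORESOURCES) {
		s->callback_count--;
		return (SIM_CALLBACK_DONE);
	}
	return (SIM_CALLBACK_RUNOUT);
}

/* Detach path: forbid rescheduling, then finish only when count is zero. */
static void
sim_detach_begin(struct sim_state *s)
{
	s->cancel_callbacks = true;
}

static bool
sim_detach_may_finish(const struct sim_state *s)
{
	return (s->callback_count == 0);
}
```

In this model, a callback that runs after sim_detach_begin always returns SIM_CALLBACK_DONE, so the count can only fall. That property is what allows the wait loop in the detach routine to terminate.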
9.6.11. Synchronizing Memory Objects
In the process of accessing the memory object, the driver might need to synchronize the memory object with respect to various caches. This section provides guidelines on when and how to synchronize memory objects.
Cache
CPU cache is a very high-speed memory that sits between the CPU and the system's main memory. I/O cache sits between the device and the system's main memory, as shown in the following figure.

When an attempt is made to read data from main memory, the associated cache checks for the requested data. If the data is available, the cache supplies the data quickly. If the cache does not have the data, the cache retrieves the data from main memory. The cache then passes the data on to the requester and saves the data in case of a subsequent request.
Similarly, on a write cycle, the data is stored in the cache quickly. The CPU or device is allowed to continue executing, that is, transferring data. Storing data in a cache takes much less time than waiting for the data to be written to memory.
With this model, after a device transfer is complete, the data can still be in the I/O cache with no data in main memory. If the CPU accesses the memory, the CPU might read the wrong data from the CPU cache. The driver must call a synchronization routine to flush the data from the I/O cache and update the CPU cache with the new data. This action ensures a consistent view of the memory for the CPU. Similarly, a synchronization step is required if data modified by the CPU is to be accessed by a device.
Additional caches and buffers, such as those in bus extenders and bridges, can exist between the device and memory. Use ddi_dma_sync(9F) to synchronize all applicable caches.
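The stale-read scenario described above can be modeled with a toy simulation. The following userland sketch is an illustration only and makes several assumptions: memory is a single byte, each cache holds one write-back line, and the toy_ names are hypothetical. The sketch does not describe how any real platform or ddi_dma_sync(9F) is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One write-back cache line (toy model). */
struct toy_line {
	uint8_t	data;
	bool	valid;
	bool	dirty;
};

struct toy_system {
	uint8_t		memory;	/* one byte of "main memory" */
	struct toy_line	cpu;	/* CPU cache */
	struct toy_line	io;	/* I/O cache */
};

/* CPU read: served from the CPU cache when the line is valid. */
static uint8_t
toy_cpu_read(struct toy_system *t)
{
	if (!t->cpu.valid) {
		t->cpu.data = t->memory;
		t->cpu.valid = true;
	}
	return (t->cpu.data);
}

/* CPU write: lands in the CPU cache only. */
static void
toy_cpu_write(struct toy_system *t, uint8_t v)
{
	t->cpu.data = v;
	t->cpu.valid = true;
	t->cpu.dirty = true;
}

/* Device read: served from the I/O cache when the line is valid. */
static uint8_t
toy_device_read(struct toy_system *t)
{
	if (!t->io.valid) {
		t->io.data = t->memory;
		t->io.valid = true;
	}
	return (t->io.data);
}

/* Device write: lands in the I/O cache only. */
static void
toy_device_write(struct toy_system *t, uint8_t v)
{
	t->io.data = v;
	t->io.valid = true;
	t->io.dirty = true;
}

/* Flush the I/O cache to memory, then invalidate the CPU cache. */
static void
toy_sync_for_cpu(struct toy_system *t)
{
	if (t->io.valid && t->io.dirty) {
		t->memory = t->io.data;
		t->io.dirty = false;
	}
	t->cpu.valid = false;
}

/* Flush the CPU cache to memory, then invalidate the I/O cache. */
static void
toy_sync_for_dev(struct toy_system *t)
{
	if (t->cpu.valid && t->cpu.dirty) {
		t->memory = t->cpu.data;
		t->cpu.dirty = false;
	}
	t->io.valid = false;
	t->io.dirty = false;
}
```

In this model, a CPU read that follows a device write returns the old value until toy_sync_for_cpu runs, and a device read that follows a CPU write returns the old value until toy_sync_for_dev runs.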
ddi_dma_sync Function
A memory object might have multiple mappings, such as for the CPU and for a device, by means of a DMA handle. A driver with multiple mappings needs to call ddi_dma_sync(9F) if any mappings are used to modify the memory object. Calling ddi_dma_sync ensures that the modification of the memory object is complete before the object is accessed through a different mapping. The ddi_dma_sync function can also inform other mappings of the object if any cached references to the object are now stale. Additionally, ddi_dma_sync flushes or invalidates stale cache references as necessary.
Generally, the driver must call ddi_dma_sync when a DMA transfer completes. The exception to this rule is that deallocating the DMA resources with ddi_dma_unbind_handle(9F) performs an implicit ddi_dma_sync on behalf of the driver. The syntax for ddi_dma_sync is as follows:
int ddi_dma_sync(ddi_dma_handle_t handle, off_t off,
size_t length, uint_t type);
If the object is going to be read by the DMA engine of the device, the device's view of the object must be synchronized by setting type to DDI_DMA_SYNC_FORDEV. If the DMA engine of the device has written to the memory object and the object is going to be read by the CPU, the CPU's view of the object must be synchronized by setting type to DDI_DMA_SYNC_FORCPU.
The following example demonstrates synchronizing a DMA object for the CPU:
if (ddi_dma_sync(xsp->handle, 0, length, DDI_DMA_SYNC_FORCPU)
== DDI_SUCCESS) {
/* the CPU can now access the transferred data */
/* ... */
} else {
/* error handling */
}
Use the flag DDI_DMA_SYNC_FORKERNEL if the only mapping is for the kernel, as in the case of memory that is allocated by ddi_dma_mem_alloc(9F). The system tries to synchronize the kernel's view more quickly than the CPU's view. If the system cannot synchronize the kernel view faster, the system acts as if the DDI_DMA_SYNC_FORCPU flag were set.
9.7. DMA Windows
If an object does not fit within the limitations of the DMA engine, the transfer must be broken into a series of smaller transfers. The driver can break up the transfer itself. Alternatively, the driver can allow the system to allocate resources for only part of the object, thereby creating a series of DMA windows. Allowing the system to allocate resources is the preferred solution, because the system can manage the resources more effectively than the driver can.
A DMA window has two attributes. The offset attribute is measured from the beginning of the object. The length attribute is the number of bytes of memory to be allocated. After a partial allocation, only a range of length bytes that starts at offset has allocated resources.
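The offset and length attributes can be illustrated with simple arithmetic. The following userland sketch assumes that an object is split into equal-sized windows of at most max_window bytes, with a shorter final window. That layout is an assumption made for illustration: the real framework chooses window boundaries from the DMA attributes and the available resources. The toy_ names are hypothetical and are not part of the DDI.

```c
#include <assert.h>
#include <stddef.h>

/* Offset and length of one window within the object (toy model). */
struct toy_window {
	size_t	offset;	/* measured from the beginning of the object */
	size_t	length;	/* bytes covered by this window */
};

/* Number of windows needed to cover object_len bytes. */
static size_t
toy_numwin(size_t object_len, size_t max_window)
{
	assert(max_window > 0);
	return (object_len / max_window +
	    (object_len % max_window != 0 ? 1 : 0));
}

/* Offset and length of window number win, counting from zero. */
static struct toy_window
toy_getwin(size_t object_len, size_t max_window, size_t win)
{
	struct toy_window w;

	assert(win < toy_numwin(object_len, max_window));
	w.offset = win * max_window;
	w.length = object_len - w.offset;
	if (w.length > max_window)
		w.length = max_window;
	return (w);
}
```

For example, under these assumptions a 10000-byte object with a 4096-byte limit needs three windows, and the last window starts at offset 8192 and covers 1808 bytes.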
A DMA window is requested by specifying the DDI_DMA_PARTIAL flag as a parameter to ddi_dma_buf_bind_handle(9F) or ddi_dma_addr_bind_handle(9F). Both functions return DDI_DMA_PARTIAL_MAP if a window can be established. However, the system might allocate resources for the entire object, in which case DDI_DMA_MAPPED is returned. The driver should check the return value to determine whether DMA windows are in use. See the following example.
static int
xxstart (caddr_t arg)
{
struct xxstate *xsp = (struct xxstate *)arg;
struct device_reg *regp = xsp->regp;
ddi_dma_cookie_t cookie;
uint_t ccount;
uint_t flags;
int status;
mutex_enter(&xsp->mu);
if (xsp->busy) {
/* transfer in progress */
mutex_exit(&xsp->mu);
return (DDI_DMA_CALLBACK_RUNOUT);
}
xsp->busy = 1;
mutex_exit(&xsp->mu);
if ( /* transfer is a read */) {
flags = DDI_DMA_READ;
} else {
flags = DDI_DMA_WRITE;
}
flags |= DDI_DMA_PARTIAL;
status = ddi_dma_buf_bind_handle(xsp->handle, xsp->bp,
flags, xxstart, (caddr_t)xsp, &cookie, &ccount);
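if (status != DDI_DMA_MAPPED &&
status != DDI_DMA_PARTIAL_MAP) {
/* Bind failed: clear busy so that a later retry can proceed. */
mutex_enter(&xsp->mu);
xsp->busy = 0;
mutex_exit(&xsp->mu);
return (DDI_DMA_CALLBACK_RUNOUT);
}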
if (status == DDI_DMA_PARTIAL_MAP) {
ddi_dma_numwin(xsp->handle, &xsp->nwin);
xsp->partial = 1;
xsp->windex = 0;
} else {
xsp->partial = 0;
}
/* Program the DMA engine. */
return (DDI_DMA_CALLBACK_DONE);
}
Two functions operate with DMA windows. The first, ddi_dma_numwin(9F), returns the number of DMA windows for a particular DMA object. The other function, ddi_dma_getwin(9F), allows repositioning within the object, that is, reallocation of system resources. The ddi_dma_getwin function shifts the current window to a new window within the object. Because ddi_dma_getwin reallocates system resources to the new window, the previous window becomes invalid.
To avoid data corruption, do not move the DMA windows with a call to ddi_dma_getwin before transfers into the current window are complete. Wait until the transfer to the current window is complete, which is when the interrupt arrives, and then call ddi_dma_getwin.
The ddi_dma_getwin function is normally called from an interrupt routine, as shown in Interrupt Handler Using DMA Windows. The first DMA transfer is initiated as a result of a call to the driver. Subsequent transfers are started from the interrupt routine.
The interrupt routine examines the status of the device to determine whether the device completed the transfer successfully. If not, normal error recovery occurs. If the transfer is successful, the routine must determine whether the logical transfer is complete. A complete transfer includes the entire object as specified by the buf(9S) structure. A partial transfer moves only one DMA window, so the interrupt routine moves the window with ddi_dma_getwin(9F), retrieves a new cookie, and starts another DMA transfer.
If the logical request has been completed, the interrupt routine checks for pending requests. If necessary, the interrupt routine starts a transfer. Otherwise, the routine returns without invoking another DMA transfer. The following example illustrates the usual flow control.
static uint_t
xxintr(caddr_t arg)
{
struct xxstate *xsp = (struct xxstate *)arg;
uint8_t status;
volatile uint8_t temp;
off_t offset;
size_t len;
ddi_dma_cookie_t cookie;
uint_t ccount;
mutex_enter(&xsp->mu);
/* read status */
status = ddi_get8(xsp->access_hdl, &xsp->regp->csr);
if (!(status & INTERRUPTING)) {
mutex_exit(&xsp->mu);
return (DDI_INTR_UNCLAIMED);
}
ddi_put8(xsp->access_hdl, &xsp->regp->csr, CLEAR_INTERRUPT);
/* for store buffers */
temp = ddi_get8(xsp->access_hdl, &xsp->regp->csr);
if ( /* an error occurred during transfer */ ) {
bioerror(xsp->bp, EIO);
xsp->partial = 0;
} else {
xsp->bp->b_resid -= /* amount transferred */ ;
}
if (xsp->partial && (++xsp->windex < xsp->nwin)) {
/* device still marked busy to protect state */
mutex_exit(&xsp->mu);
(void) ddi_dma_getwin(xsp->handle, xsp->windex,
&offset, &len, &cookie, &ccount);
/* Program the DMA engine with the new cookie(s). */
return (DDI_INTR_CLAIMED);
}
ddi_dma_unbind_handle(xsp->handle);
biodone(xsp->bp);
xsp->busy = 0;
xsp->partial = 0;
mutex_exit(&xsp->mu);
if ( /* pending transfers */ ) {
(void) xxstart((caddr_t)xsp);
}
return (DDI_INTR_CLAIMED);
}
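The window-advance decision in the interrupt handler above can be isolated from the hardware. The following userland sketch models only that decision, under the assumption that the partial, windex, nwin, and busy fields behave as in the example. The toy_ names are hypothetical, and the sketch omits locking, register access, and the calls to ddi_dma_getwin(9F) and biodone(9F).

```c
#include <assert.h>
#include <stdbool.h>

/* What the handler should do after one interrupt (toy model). */
enum toy_action {
	TOY_NEXT_WINDOW,	/* move the window and start another transfer */
	TOY_COMPLETE		/* unbind, complete the request, clear busy */
};

/* Subset of the per-device state used by the decision. */
struct toy_xfer {
	bool		partial;	/* DMA windows are in use */
	unsigned	windex;		/* index of the current window */
	unsigned	nwin;		/* total number of windows */
	bool		busy;		/* a logical transfer is in progress */
};

/*
 * Decide the next step after a transfer interrupt. An error ends the
 * logical transfer at once. Otherwise the handler advances to the next
 * window until every window has been transferred.
 */
static enum toy_action
toy_on_interrupt(struct toy_xfer *x, bool error)
{
	if (error)
		x->partial = false;
	if (x->partial && ++x->windex < x->nwin)
		return (TOY_NEXT_WINDOW);	/* busy stays set */
	x->busy = false;
	x->partial = false;
	return (TOY_COMPLETE);
}
```

With three windows, the first two interrupts return TOY_NEXT_WINDOW and the third returns TOY_COMPLETE. An error on any interrupt completes the request immediately without advancing the window index.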