This appendix discusses general issues about hardware that is capable of supporting illumos. The discussion includes the processor, bus architectures, and memory models that are supported by illumos. Various device issues and the PROM used in Sun platforms are also covered.
The material in this appendix is for informational purposes only. This information might be of use during driver debugging. However, many of these implementation details are hidden from device drivers by the illumos DDI/DKI interfaces.
This appendix provides information on the following subjects:
SPARC Processor Issues
This section describes a number of SPARC processor-specific topics such as data alignment, byte ordering, register windows, and availability of floating-point instructions. For information on x86 processor-specific topics, see x86 Processor Issues.
Drivers should never perform floating-point operations, because these operations are not supported in the kernel.
SPARC Data Alignment
All quantities must be aligned on their natural boundaries, using standard C data types:
short integers are aligned on 16-bit boundaries.
int integers are aligned on 32-bit boundaries.
long integers are aligned on 64-bit boundaries on SPARC systems that use the 64-bit (LP64) data model. For information on data models, see Making a Device Driver 64-Bit Ready.
long long integers are aligned on 64-bit boundaries.
Usually, the compiler handles any alignment issues. However, driver writers are more likely to be concerned about alignment because the proper data types must be used to access the devices. Because device registers are commonly accessed through a pointer reference, drivers must ensure that pointers are properly aligned when accessing the device.
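As an illustration, a driver can confirm that a register pointer is aligned before it performs a multi-byte access through that pointer. The following minimal sketch uses uintptr_t arithmetic. The helper names is_aligned() and read_reg32() are hypothetical. In a real driver, register access goes through the DDI access functions rather than through raw pointers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if ptr is aligned on an
 * `align`-byte boundary. `align` must be a nonzero power of two.
 */
static bool
is_aligned(const volatile void *ptr, size_t align)
{
	return (((uintptr_t)ptr & (align - 1)) == 0);
}

/*
 * Hypothetical register read: refuses to dereference a misaligned
 * pointer, which on SPARC would cause an alignment trap.
 */
static bool
read_reg32(const volatile uint32_t *reg, uint32_t *valp)
{
	if (!is_aligned(reg, sizeof (uint32_t)))
		return (false);
	*valp = *reg;
	return (true);
}
```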
Member Alignment in SPARC Structures
Because of the data alignment restrictions imposed by the SPARC processor, C structures also have alignment requirements. The most strictly aligned member of a structure determines the alignment requirement of the whole structure. For example, a structure containing only characters has no alignment restrictions. In contrast, a structure containing a 64-bit member, such as a long long, must be laid out so that this member falls on a 64-bit boundary.
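The following sketch makes the padding visible with offsetof(). The structure and function names are illustrative only. On SPARC, offsetof(struct mixed, value) is 8, so seven padding bytes follow tag. Other ABIs can choose a different long long alignment, so the sketch reports the padding instead of assuming a fixed amount.

```c
#include <stddef.h>

/* Characters only: no member needs more than byte alignment. */
struct chars_only {
	char	a;
	char	b;
	char	c;
};

/*
 * A char followed by a long long: the compiler pads after `tag` so that
 * `value` starts on the boundary that long long requires (64 bits on
 * SPARC). It also pads the end so that arrays of the structure stay
 * aligned.
 */
struct mixed {
	char		tag;
	long long	value;
};

/* Number of padding bytes the compiler placed between tag and value. */
static size_t
mixed_inner_padding(void)
{
	return (offsetof(struct mixed, value) - sizeof (char));
}
```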
SPARC Byte Ordering
The SPARC processor uses big-endian byte ordering. The most significant byte (MSB) of an integer is stored at the lowest address of the integer, and the least significant byte (LSB) is stored at the highest address. For example, for a 64-bit integer, byte 0 is the most significant byte and byte 7 is the least significant byte.
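The following sketch writes a 64-bit value into a byte buffer in big-endian order. The helper name store_be64() is hypothetical. Because the sketch uses shifts instead of casting the buffer, it produces the SPARC layout on any host: byte 0 receives the most significant byte and byte 7 receives the least significant byte.

```c
#include <stdint.h>

/*
 * Store a 64-bit value in big-endian (SPARC) order: the most
 * significant byte goes to the lowest address, buf[0], and the
 * least significant byte goes to the highest address, buf[7].
 */
static void
store_be64(uint8_t buf[8], uint64_t val)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}
```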
SPARC Register Windows
SPARC processors use register windows. Each register window consists of eight in registers, eight local registers, and eight out registers. In addition, eight global registers are shared by all windows. The out registers of one window are the in registers of the next window. The number of register windows ranges from 2 to 32, depending on the processor implementation.
Because drivers are normally written in C, the compiler usually hides the use of register windows. However, you might need to understand register windows when debugging the driver.
SPARC Multiply and Divide Instructions
The Version 7 SPARC processors do not have multiply or divide instructions. The multiply and divide instructions are emulated in software. Because a driver might run on a Version 7, Version 8, or Version 9 processor, avoid intensive integer multiplication and division. Instead, use bitwise left and right shifts to multiply and divide by powers of two.
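The shift-based alternatives look like the following sketch. It uses unsigned 32-bit operands, and the helper names are hypothetical. Compilers usually convert multiplication and division by a constant power of two into shifts on their own. Explicit shifts therefore matter mostly when the power of two is known only at run time.

```c
#include <stdint.h>

/* Multiply by 2^shift without a multiply instruction (shift < 32). */
static inline uint32_t
mul_pow2(uint32_t x, unsigned int shift)
{
	return (x << shift);
}

/*
 * Divide by 2^shift without a divide instruction (shift < 32). This is
 * exact for unsigned operands. For negative signed values, an
 * arithmetic right shift rounds toward negative infinity, whereas C
 * division rounds toward zero.
 */
static inline uint32_t
div_pow2(uint32_t x, unsigned int shift)
{
	return (x >> shift);
}

/* The remainder modulo 2^shift is a mask of the low bits (shift < 32). */
static inline uint32_t
mod_pow2(uint32_t x, unsigned int shift)
{
	return (x & ((UINT32_C(1) << shift) - 1));
}
```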
The SPARC Architecture Manual, Version 9, contains more specific information on the SPARC CPU. The SPARC Compliance Definition, Version 2.4, contains details of the application binary interface (ABI) for SPARC V9. This document describes both the 32-bit SPARC V8 ABI and the 64-bit SPARC V9 ABI. You can obtain this document from SPARC International at http://www.sparc.com.
x86 Processor Issues
Data types have no alignment restrictions. However, extra memory cycles might be required for the x86 processor to properly handle misaligned data transfers.
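Code that must run on both SPARC and x86 should not depend on the x86 tolerance of misaligned access. For data in memory (not device registers), one portable approach is to copy the value through memcpy(). The compiler can then generate whatever access the target permits. The helper name in this sketch is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value from a possibly misaligned address in memory.
 * Dereferencing a misaligned uint32_t pointer works (more slowly) on
 * x86 but traps on SPARC. memcpy() is safe on both. The value is
 * returned in host byte order.
 */
static uint32_t
load_u32_unaligned(const void *p)
{
	uint32_t v;

	(void) memcpy(&v, p, sizeof (v));
	return (v);
}
```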
Drivers should not perform floating-point operations, as these operations are not supported in the kernel.
x86 Byte Ordering
The x86 processors use little-endian byte ordering. The least significant byte (LSB) of an integer is stored at the lowest address of the integer. The most significant byte is stored at the highest address for data items in this processor. For example, byte 7 is the most significant byte for 64-bit processors.
x86 Architecture Manuals
Both Intel Corporation and AMD publish a number of books on the x86 family of processors. See http://www.intel.com and http://www.amd.com/.
To achieve the goal of multiple-platform, multiple-instruction-set architecture portability, host bus dependencies were removed from the drivers. The first dependency issue to be addressed was the endianness, that is, byte ordering, of the processor. For example, the x86 processor family is little-endian while the SPARC architecture is big-endian.
Bus architectures display the same endianness types as processors. The PCI local bus, for example, is little-endian, the SBus is big-endian, the ISA bus is little-endian, and so on.
To maintain portability between processors and buses, DDI-compliant drivers must be endian neutral. Drivers can manage their endianness through runtime checks or through preprocessor directives such as #ifdef _LITTLE_ENDIAN in the source code, but both approaches make long-term maintenance troublesome. In some cases, the DDI framework performs the byte swapping in software. In other cases, hardware can do the byte swapping, either through page-level swapping in the memory management unit (MMU) or through special machine instructions. The DDI framework can take advantage of these hardware features to improve performance.
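The idea of letting the framework handle byte order, instead of #ifdef blocks, can be sketched in user space. In an illumos driver, the device's byte order is declared in the devacc_attr_endian_flags field of a ddi_device_acc_attr_t structure, for example DDI_STRUCTURE_LE_ACC for a little-endian PCI device. Register reads then go through ddi_get32(9F). The handle type and function below are hypothetical stand-ins that show only the software byte-swapping case.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the DDI endian access flags. */
enum hyp_dev_endian {
	HYP_DEV_LITTLE_ENDIAN,
	HYP_DEV_BIG_ENDIAN
};

/* Hypothetical access handle that records the device's byte order. */
struct hyp_acc_handle {
	enum hyp_dev_endian	endian;
};

/*
 * Read a 32-bit device register through the handle. Building the value
 * from bytes with shifts gives the same result on a big-endian or a
 * little-endian host, so the caller needs no #ifdef.
 */
static uint32_t
hyp_get32(const struct hyp_acc_handle *h, const volatile uint8_t *addr)
{
	uint32_t b0 = addr[0], b1 = addr[1], b2 = addr[2], b3 = addr[3];

	if (h->endian == HYP_DEV_LITTLE_ENDIAN)
		return (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24));
	return ((b0 << 24) | (b1 << 16) | (b2 << 8) | b3);
}
```

A real framework would perform a single 32-bit access and then swap the result, or rely on MMU page-level swapping. The sketch reads four separate bytes only to stay portable.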
Along with being endian-neutral, portable drivers must also be independent from data ordering of the processor. Under most circumstances, data must be transferred in the sequence instructed by the driver. However, sometimes data can be merged, batched, or reordered to streamline the data transfer, as illustrated in the following figure. For example, data merging can be applied to accelerate graphics display on frame buffers. Drivers have the option to advise the DDI framework to use other optimal data transfer mechanisms during the transfer.
To improve performance, the CPU uses internal store buffers to temporarily store data. Using internal buffers can affect the synchronization of device I/O operations. Therefore, the driver needs to take explicit steps to make sure that writes to registers are completed at the proper time.
For example, consider the case where access to device space, such as registers or a frame buffer, is synchronized by a lock. The driver needs to check that the store to the device space has actually completed before releasing the lock. The release of the lock does not guarantee the flushing of I/O buffers.
To give another example, when acknowledging an interrupt, the driver usually sets or clears a bit in a device control register. The driver must ensure that the write to the control register has reached the device before the interrupt handler returns. Similarly, after a command is written to the control register, a device might require a delay during which the driver busy-waits. In such a case, the driver must ensure that the write has reached the device before the delay begins.
Where device registers can be read without undesirable side effects, verification of a write can simply consist of reading the register immediately after the write. If that particular register cannot be read without undesirable side effects, another device register in the same register set can be used.
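The following minimal sketch shows the read-back technique. It assumes a memory-mapped control register reached through a volatile pointer, and it assumes that reading that register has no side effects. In an illumos driver, the write and the read would be ddi_put32(9F) and ddi_get32(9F) on an access handle. The function name and the HYP_INTR_ACK bit are hypothetical.

```c
#include <stdint.h>

#define	HYP_INTR_ACK	0x00000001u	/* hypothetical "acknowledge" bit */

/*
 * Acknowledge an interrupt, then read the same register back so that
 * the write is known to have reached the device before the interrupt
 * handler returns. Assumes the read has no side effects on this device.
 */
static uint32_t
hyp_ack_interrupt(volatile uint32_t *csr)
{
	*csr = HYP_INTR_ACK;	/* the write that must reach the device */
	return (*csr);		/* read back to verify the write */
}
```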
System Memory Model
The system memory model defines the semantics of memory operations such as load and store and specifies how the order in which these operations are issued by a processor is related to the order in which they reach memory. The memory model applies to both uniprocessors and shared-memory multiprocessors. Two memory models are supported: total store ordering (TSO) and partial store ordering (PSO).
Total Store Ordering (TSO)
TSO guarantees that the sequence in which store, FLUSH, and atomic load-store instructions appear in memory for a given processor is identical to the sequence in which they were issued by the processor.
Both x86 and SPARC processors support TSO.
Partial Store Ordering (PSO)
PSO does not guarantee that the sequence in which store, FLUSH, and atomic load-store instructions appear in memory for a given processor is identical to the sequence in which they were issued by the processor. The processor can reorder the stores so that the sequence of stores to memory is not the same as the sequence of stores issued by the CPU.
SPARC processors support PSO; x86 processors do not.
For SPARC processors, conformance between issuing order and memory order is provided by the system framework using the STBAR instruction. If two of the above instructions are separated by an STBAR instruction in the issuing order of a processor, or if the instructions reference the same location, the memory order of the two instructions is the same as the issuing order. Enforcement of strong data-ordering in DDI-compliant drivers is provided by the ddi_regs_map_setup(9F) interface. Compliant drivers cannot use the STBAR instruction directly.
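Compliant drivers do not issue STBAR themselves. Still, the ordering problem that STBAR solves can be illustrated in user space with portable C11 atomics. In this sketch, a release store prevents the earlier store to `data` from becoming visible after the store to `ready`. That guarantee is analogous to what STBAR provides between two stores under PSO. The C11 fence is a different mechanism from STBAR and from the DDI access attributes. The variable and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int data;		/* payload written by the producer */
static atomic_bool ready;	/* flag that publishes the payload */

/*
 * Producer: the release store on `ready` guarantees that the store to
 * `data` becomes visible no later than the flag, even on a processor
 * that would otherwise reorder stores.
 */
static void
publish(int value)
{
	data = value;
	atomic_store_explicit(&ready, true, memory_order_release);
}

/*
 * Consumer: the acquire load pairs with the release store, so a
 * consumer that sees the flag set also sees the payload.
 */
static bool
try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return (false);
	*out = data;
	return (true);
}
```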
See the SPARC Architecture Manual, Version 9, for more details on the SPARC memory model.
This section describes device identification, device addressing, and interrupts.
Device identification is the process of determining which devices are present in the system. Some devices are self-identifying, meaning that the device itself provides information that lets the system identify which device driver to use. SBus and PCI local bus devices are examples of self-identifying devices. On SBus, the information is usually derived from a small Forth program stored in the FCode PROM on the device. Most PCI devices provide a configuration space containing device configuration information. See the sbus(4) and pci(4) man pages for more information.
All modern bus architectures require devices to be self-identifying.
Supported Interrupt Types
The illumos platform supports both polling and vectored interrupts. The illumos DDI/DKI interrupt model is the same for both types of interrupts. See Interrupt Handlers for more information about interrupt handling.
This section covers addressing and device configuration issues specific to the buses that the illumos platform supports.
PCI Local Bus
The PCI local bus is a high-performance bus designed for high-speed data transfer. The PCI bus resides on the system board. This bus is normally used as an interconnect mechanism between highly integrated peripheral components, peripheral add-on boards, and host processor or memory systems. The host processor, main memory, and the PCI bus itself are connected through a PCI host bridge, as shown in Machine Block Diagram.
A tree structure of interconnected I/O buses is supported through a series of PCI bus bridges. Subordinate PCI bus bridges can be extended underneath the PCI host bridge to enable a single bus system to be expanded into a complex system with multiple secondary buses. PCI devices can be connected to one or more of these secondary buses. In addition, other bus bridges, such as SCSI or USB, can be connected.
Every PCI device has a unique vendor ID and device ID. Multiple devices of the same kind are further identified by their unique device numbers on the bus where they reside.
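The vendor ID and device ID are the first two 16-bit registers of PCI configuration space, at offsets 0x00 and 0x02. Like the rest of the PCI bus, they are little-endian. The sketch below decodes both IDs from a raw byte copy of configuration space. The structure and function names are hypothetical. A real illumos driver would read these registers with pci_config_get16(9F).

```c
#include <stdint.h>

#define	HYP_PCI_VENDOR_ID_OFF	0x00	/* 16-bit vendor ID */
#define	HYP_PCI_DEVICE_ID_OFF	0x02	/* 16-bit device ID */

struct hyp_pci_ident {
	uint16_t	vendor_id;
	uint16_t	device_id;
};

/* Configuration registers are little-endian regardless of the host. */
static uint16_t
cfg_le16(const uint8_t *cfg, unsigned int off)
{
	return ((uint16_t)(cfg[off] | (cfg[off + 1] << 8)));
}

static struct hyp_pci_ident
hyp_pci_identify(const uint8_t *cfg)
{
	struct hyp_pci_ident id;

	id.vendor_id = cfg_le16(cfg, HYP_PCI_VENDOR_ID_OFF);
	id.device_id = cfg_le16(cfg, HYP_PCI_DEVICE_ID_OFF);
	return (id);
}
```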
The PCI host bridge provides an interconnect between the processor and peripheral components. Through the PCI host bridge, the processor can directly access main memory independent of other PCI bus masters. For example, while the CPU is fetching data from the cache controller in the host bridge, other PCI devices can also access the system memory through the host bridge. The advantage of this architecture is that it separates the I/O bus from the processor's host bus.
The PCI host bridge also provides data access mappings between the CPU and peripheral I/O devices. The bridge maps every peripheral device to the host address domain so that the processor can access the device through programmed I/O. On the local bus side, the PCI host bridge maps the system memory to the PCI address domain so that the PCI device can access the host memory as a bus master. Machine Block Diagram shows the two address domains.
PCI Address Domain
The PCI address domain consists of three distinct address spaces: configuration, memory, and I/O space.
PCI Configuration Address Space
Configuration space is defined geographically. The location of a peripheral device is determined by its physical location within an interconnected tree of PCI bus bridges. A device is located by its bus number and device (slot) number. Each peripheral device contains a set of well-defined configuration registers in its PCI configuration space. The registers are used not only to identify devices but also to supply device configuration information to the configuration framework. For example, base address registers in the device configuration space must be mapped before a device can respond to data access.
The method for generating configuration cycles is host dependent. In x86 machines, special I/O ports are used. On other platforms, the PCI configuration space can be memory-mapped to certain address locations corresponding to the PCI host bridge in the host address domain. When a device configuration register is accessed by the processor, the request is routed to the PCI host bridge. The bridge then translates the access into proper configuration cycles on the bus.
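On x86, the special I/O ports commonly follow PCI configuration mechanism #1. Under that mechanism, the host writes a 32-bit address to port 0xCF8 and then transfers data through port 0xCFC. The sketch below only computes the address word and performs no port I/O. The function name is hypothetical.

```c
#include <stdint.h>

#define	HYP_PCI_CONFIG_ADDR_PORT	0xCF8	/* address port (mechanism #1) */
#define	HYP_PCI_CONFIG_DATA_PORT	0xCFC	/* data port (mechanism #1) */

/*
 * Build the CONFIG_ADDRESS value for PCI configuration mechanism #1:
 *   bit 31      enable configuration cycle
 *   bits 23:16  bus number
 *   bits 15:11  device (slot) number
 *   bits 10:8   function number
 *   bits 7:2    dword-aligned register offset
 */
static uint32_t
hyp_pci_cfg_addr(uint8_t bus, uint8_t dev, uint8_t func, uint8_t reg)
{
	return (UINT32_C(1) << 31 |
	    (uint32_t)bus << 16 |
	    (uint32_t)(dev & 0x1f) << 11 |
	    (uint32_t)(func & 0x07) << 8 |
	    (uint32_t)(reg & 0xfc));
}
```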
PCI Configuration Base Address Registers
The PCI configuration space of each device contains up to six 32-bit base address registers. These registers provide both size and data type information. System firmware assigns base addresses in the PCI address domain to these registers.
Each addressable region can be either memory or I/O space. The value contained in bit 0 of the base address register identifies the type. A value of 0 in bit 0 indicates a memory space and a value of 1 indicates an I/O space. The following figure shows two base address registers: one for memory and the other for I/O types.
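The sketch below decodes a base address register. Bit 0 selects the space. For a memory BAR, bits 2:1 give the address width (10b means 64-bit) and bit 3 marks the region as prefetchable. The flag bits are masked off to recover the base address. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one 32-bit base address register (BAR). */
struct hyp_bar_info {
	bool		is_io;		/* bit 0: 1 = I/O space, 0 = memory */
	bool		is_64bit;	/* memory only: bits 2:1 == 10b */
	bool		prefetchable;	/* memory only: bit 3 */
	uint32_t	base;		/* address with the flag bits masked */
};

static struct hyp_bar_info
hyp_decode_bar(uint32_t bar)
{
	struct hyp_bar_info info = { 0 };

	if (bar & 0x1) {
		/* I/O BAR: bits 1:0 are flags; the rest is the address. */
		info.is_io = true;
		info.base = bar & ~UINT32_C(0x3);
	} else {
		/*
		 * Memory BAR: bits 3:0 are flags. For a 64-bit BAR, the
		 * next BAR holds the upper 32 address bits; only the
		 * lower half is returned here.
		 */
		info.is_64bit = ((bar >> 1) & 0x3) == 0x2;
		info.prefetchable = (bar & 0x8) != 0;
		info.base = bar & ~UINT32_C(0xf);
	}
	return (info);
}
```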
PCI Memory Address Space
PCI supports both 32-bit and 64-bit addresses for memory space. System firmware assigns regions of memory space in the PCI address domain to PCI peripherals. The base address of a region is stored in the base address register of the device's PCI configuration space. The size of each region must be a power of two, and the assigned base address must be aligned on a boundary equal to the size of the region. Device addresses in memory space are memory-mapped into the host address domain so that data access to any device can be performed by the processor's native load or store instructions.
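The power-of-two size rule is what makes the usual size probe work. Software writes all ones to a BAR and reads it back, and the address bits that remain zero reveal the region size. The sketch below works on the value read back from a 32-bit memory BAR, not on real hardware. It also checks the alignment rule. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Given the value read back from a 32-bit memory BAR after all ones
 * were written to it, return the size of the region. The device
 * hardwires the low address bits to zero, so after the four flag bits
 * are masked off, the lowest set bit gives the size. Returns 0 if the
 * BAR is not implemented (reads back as zero).
 */
static uint32_t
hyp_mem_bar_size(uint32_t readback)
{
	uint32_t mask = readback & ~UINT32_C(0xf);

	return (~mask + 1);	/* two's complement isolates the lowest set bit */
}

/* A base is valid only if it is aligned on a boundary equal to the size. */
static bool
hyp_base_ok(uint32_t base, uint32_t size)
{
	return (size != 0 && (size & (size - 1)) == 0 &&
	    (base & (size - 1)) == 0);
}
```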
PCI I/O Address Space
PCI supports 32-bit I/O space. I/O space can be accessed differently on different platforms. Processors with special I/O instructions, such as the Intel processor family, access the I/O space with in and out instructions. Machines without special I/O instructions map I/O space to address locations corresponding to the PCI host bridge in the host address domain. When the processor accesses the memory-mapped addresses, an I/O request is sent to the PCI host bridge. The bridge then translates the addresses into I/O cycles and puts them on the PCI bus. Memory-mapped I/O is performed by the native load/store instructions of the processor.
PCI Hardware Configuration Files
Hardware configuration files should be unnecessary for PCI local bus devices. However, on some occasions drivers for PCI devices need to use hardware configuration files to augment the driver private information. See the driver.conf(4) and pci(4) man pages for further details.
The standard PCI bus has evolved into PCI Express. PCI Express is the next generation high performance I/O bus for connecting peripheral devices in such applications as desktop, mobile, workstation, server, embedded computing and communication platforms.
PCI Express improves bus performance, reduces overall system cost and takes advantage of new developments in computer design. PCI Express uses a serial, point-to-point type interconnect for communication between two devices. Using switches enables users to connect a large number of devices together in a system. Serial interconnect implies fewer pins per device package, which reduces cost and makes the performance highly scalable.
The PCI Express bus has built-in features to accommodate the following technologies:
QoS (Quality of Service)
Hotplugging and hot swap
Advanced power management
RAS (Reliability, Availability, Serviceability)
Improved error handling
A PCI Express interconnect that connects two devices together is called a link. A link can consist of 1, 2, 4, 8, 12, 16, or 32 bidirectional signal pairs, written x1 through x32. These signal pairs are called lanes. Each lane provides a bandwidth of 500 MB/sec in duplex mode. Although PCI-X and PCI Express have different hardware connections, the two buses are identical from a driver writer's point of view. PCI-X is a shared bus, that is, all the devices on the bus share a single set of data lines and signal lines. PCI Express is a switched bus, which enables more efficient use of the bandwidth between the devices and the system bus.
For more information on PCI Express, please refer to the following web site: http://www.pcisig.com/
Typical SBus systems consist of a motherboard (containing the CPU and SBus interface logic), a number of SBus devices on the motherboard itself, and a number of SBus expansion slots. An SBus can also be connected to other types of buses through an appropriate bus bridge.
The SBus is geographically addressed. Each SBus slot exists at a fixed physical address in the system. An SBus card has a different address, depending on which slot it is plugged into. Moving an SBus device to a new slot causes the system to treat this device as a new device.
The SBus uses polling interrupts. When an SBus device interrupts, the system only knows which of several devices might have issued the interrupt. The system interrupt handler must ask the driver for each device whether that device is responsible for the interrupt.
SBus Physical Address Space
The following table shows the physical address space layout of the Sun UltraSPARC 2 computer. A physical address on the UltraSPARC 2 model consists of 41 bits. The 41-bit physical address space is further broken down into multiple 33-bit address spaces, which are identified by the upper 8 bits of the physical address, bits 40 through 33.
2 Gbytes main memory
0x80 – 0xDF: Reserved on Ultra 2
0xE2 – 0xFD: Reserved on Ultra 2
UPA Slave (FFB)
System I/O space
SBus Slot 0
SBus Slot 1
SBus Slot 2
SBus Slot 3
SBus Slot D
SBus Slot E
SBus Slot F
Physical SBus Addresses
The SBus has 32 address bits, as described in the SBus Specification. The following table describes how the Ultra 2 uses the address bits.
Bits 0 through 27: These bits are the SBus address lines used by an SBus card to address the contents of the card.
Bits 28 through 31: Used by the CPU to select one of the SBus slots. These bits generate the SlaveSelect lines.
This addressing scheme yields the Ultra 2 addresses shown in Device Physical Space in the Ultra 2. Other implementations might use a different number of address bits.
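Bits 28 through 31 select the slot and bits 0 through 27 address the card, so an SBus address splits cleanly into a slot number and an offset. The following sketch, with hypothetical helper names, applies that split. For example, the cgsix frame buffer used in the PROM examples later in this appendix is in slot 2 at offset 0x800000, which combines into the SBus address 0x20800000.

```c
#include <stdint.h>

#define	HYP_SBUS_SLOT_SHIFT	28
#define	HYP_SBUS_OFFSET_MASK	0x0fffffffu	/* bits 0-27 */

/* Bits 28-31: slot number (drives the SlaveSelect lines). */
static unsigned int
hyp_sbus_slot(uint32_t addr)
{
	return (addr >> HYP_SBUS_SLOT_SHIFT);
}

/* Bits 0-27: address within the card. */
static uint32_t
hyp_sbus_offset(uint32_t addr)
{
	return (addr & HYP_SBUS_OFFSET_MASK);
}

/* Combine a slot number and an offset into one SBus address. */
static uint32_t
hyp_sbus_addr(unsigned int slot, uint32_t offset)
{
	return ((uint32_t)(slot & 0xf) << HYP_SBUS_SLOT_SHIFT |
	    (offset & HYP_SBUS_OFFSET_MASK));
}
```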
The Ultra 2 has seven SBus slots, four of which are physical. Slots 0 through 3 are available for SBus cards. Slots 4-12 are reserved. The slots are used as follows:
Slots 0-3 are physical slots that have DMA-master capability.
Slots D, E, and F are not actual physical slots, but refer to the onboard direct memory access (DMA), SCSI, Ethernet, and audio controllers. For convenience, these classes of devices are viewed as being plugged into slots D, E, and F.
Some SBus slots are slave-only slots. Drivers that require DMA capability should use ddi_slaveonly(9F) to determine whether their device is in a DMA-capable slot. For an example of this function, see attach Entry Point.
SBus Hardware Configuration Files
Hardware configuration files are normally unnecessary for SBus devices. However, on some occasions, drivers for SBus devices need to use hardware configuration files to augment the information provided by the SBus card. See the driver.conf(4) and sbus(4) man pages for further details.
This section describes issues with special devices.
While most driver operations can be performed without mechanisms for synchronization and protection beyond those provided by the locking primitives, some devices require that a sequence of events occur in order without interruption. In conjunction with the locking primitives, the function ddi_enter_critical(9F) asks the system to guarantee, to the best of its ability, that the current thread will neither be preempted nor interrupted. This guarantee stays in effect until a closing call to ddi_exit_critical(9F) is made. See the ddi_enter_critical(9F) man page for details.
Many chips specify that they can be accessed only at specified intervals. For example, the Zilog Z8530 SCC has a “write recovery time” of 1.6 microseconds. This specification means that a delay must be enforced with drv_usecwait(9F) when writing characters with an 8530. In some instances, the specifications do not make explicit what delays are needed, so the delays must be determined empirically.
Be careful not to compound delays for parts of devices that might exist in large numbers, for example, thousands of SCSI disk drives.
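The following minimal sketch enforces the 8530 write recovery time between register writes. In a driver, the delay would come from drv_usecwait(9F). Here, hyp_usecwait() is a hypothetical user-space stand-in that approximates the delay by spinning on clock(). The 2-microsecond constant is the 1.6-microsecond specification rounded up to whole microseconds. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* 1.6 us write recovery time, rounded up to whole microseconds. */
#define	HYP_Z8530_RECOVERY_USEC	2

/* User-space stand-in for drv_usecwait(9F): spin for at least usec. */
static void
hyp_usecwait(unsigned int usec)
{
	clock_t start = clock();
	clock_t ticks = (clock_t)(((unsigned long long)usec *
	    CLOCKS_PER_SEC + 999999ULL) / 1000000ULL);

	if (start == (clock_t)-1)
		return;		/* processor time unavailable; do not spin */
	while (clock() - start < ticks)
		;
}

/*
 * Write a sequence of bytes to an 8530 register. After each write, wait
 * out the recovery time so that the next access, whether a read or a
 * write, never reaches the chip too soon.
 */
static void
hyp_z8530_write_bytes(volatile uint8_t *reg, const uint8_t *buf, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		*reg = buf[i];
		hyp_usecwait(HYP_Z8530_RECOVERY_USEC);
	}
}
```

Because the delay is paid once per write, the total cost grows linearly with the number of writes. This is the compounding effect to watch for when the same code runs for many devices.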
Internal Sequencing Logic
Devices with internal sequencing logic map multiple internal registers to the same external address. The various kinds of internal sequencing logic include the following types:
The Intel 8251A and the Signetics 2651 alternate the same external register between two internal mode registers. Writing to the first internal register is accomplished by writing to the external register. This write, however, has the side effect of setting up the sequencing logic in the chip so that the next read/write operation refers to the second internal register.
The NEC PD7201 PCC has multiple internal data registers. To write a byte into a particular register, two steps must be performed. The first step is to write into register zero the number of the register into which the following byte of data will go. The data is then written to the specified data register. The sequencing logic automatically sets up the chip so that the next byte sent will go into data register zero.
The AMD 9513 timer has a data pointer register that points at the data register into which a data byte will go. When sending a byte to the data register, the pointer is incremented. The current value of the pointer register cannot be read.
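The two-step access required by PD7201-style sequencing logic can be sketched as a helper. The helper writes the register number to register zero and then writes the data byte. The sketch assumes a single memory-mapped control port, and its names are hypothetical. The chip's internal pointer is shared state, so a real driver must hold a lock across both writes to prevent another access from slipping in between.

```c
#include <stdint.h>

/*
 * Hypothetical helper for a chip with PD7201-style sequencing logic.
 * `ctrl` is the single external control port. The first write selects
 * the internal register. The second write stores the data, after which
 * the chip's pointer automatically returns to register zero.
 *
 * No other access to the chip may occur between the two writes, so
 * callers must hold the lock that serializes access to the chip.
 */
static void
hyp_indirect_write(volatile uint8_t *ctrl, uint8_t regnum, uint8_t value)
{
	*ctrl = regnum;		/* step 1: register zero <- register number */
	*ctrl = value;		/* step 2: selected register <- data */
}
```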
Note the following common interrupt-related issues:
A controller interrupt does not necessarily indicate that both the controller and one of its slave devices are ready. For some controllers, an interrupt can indicate that either the controller is ready or one of its devices is ready but not both.
Not all devices power up with interrupts disabled. Devices that power up with interrupts enabled can begin interrupting at any time.
Some devices do not provide a way to determine that the board has generated an interrupt.
Not all interrupting boards shut off interrupts when told to do so or after a bus reset.
PROM on SPARC Machines
Some platforms have a PROM monitor that provides support for debugging a device without an operating system. This section describes how to use the PROM on SPARC machines to map device registers so that they can be accessed. Usually, the device can be exercised enough with PROM commands to determine whether the device is working correctly.
See the boot(1M) man page for a description of the x86 boot subsystem.
The PROM has several purposes, including:
Bringing the machine up from power on, or from a hard reset
Providing an interactive tool for examining and setting memory, device registers, and memory mappings
Booting the illumos system.
Simply powering up the computer and attempting to use its PROM to examine device registers can fail. The device might be correctly installed, but the mappings needed to reach its registers are specific to illumos and do not become active until the illumos kernel is booted. Upon power up, the PROM maps only essential system devices, such as the keyboard.
Taking a system crash dump using the
Open Boot PROM 3
For complete documentation on the Open Boot PROM, see the Open Boot PROM Toolkit User's Guide and the monitor(1M) man page. The examples in this section refer to a Sun4U architecture. Other architectures might require different commands to perform actions.
The Open Boot PROM is currently used on Sun machines with an SBus or UPA/PCI. The Open Boot PROM uses an “ok” prompt. On older machines, you might have to type “n” to get the “ok” prompt.
If the PROM is in secure mode (the security-mode parameter is not set to none), the PROM password might be required. The password is set in the security-password parameter. The printenv command displays all parameters and their values.
Help is available with the help command.
EMACS-style command-line history is available. Use Control-N (next) and Control-P (previous) to traverse the history list.
The Open Boot PROM uses the Forth programming language. Forth is a stack-based language. Arguments must be pushed on the stack before running the correct command (called a word), and the result is left on the stack.
To place a number on the stack, type its value.
ok 57
ok 68
To add the two top values on the stack, use the + operator.
ok +
The result remains on the stack. The stack is shown with the .s word.
ok .s
bf
The default base is hexadecimal. The hex and decimal words can be used to switch bases.
ok decimal
ok .s
191
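To make the stack behavior concrete, the following C sketch evaluates a line of hexadecimal numbers and + operators the same way the examples above do. Each number is pushed, and + pops the top two values and pushes their sum. It is a toy model with hypothetical names, not an implementation of the Open Boot PROM's Forth interpreter.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define	HYP_STACK_DEPTH	16

struct hyp_stack {
	long	cells[HYP_STACK_DEPTH];
	int	depth;
};

static bool
hyp_push(struct hyp_stack *s, long v)
{
	if (s->depth == HYP_STACK_DEPTH)
		return (false);
	s->cells[s->depth++] = v;
	return (true);
}

/*
 * Evaluate whitespace-separated tokens. Hexadecimal numbers are pushed
 * (the PROM's default base is hexadecimal), and "+" replaces the top
 * two values with their sum. Returns false on an unknown token, stack
 * overflow, or stack underflow.
 */
static bool
hyp_eval(struct hyp_stack *s, const char *line)
{
	const char *p = line;

	while (*p != '\0') {
		char *end;

		if (isspace((unsigned char)*p)) {
			p++;
			continue;
		}
		if (*p == '+' &&
		    (p[1] == '\0' || isspace((unsigned char)p[1]))) {
			if (s->depth < 2)
				return (false);
			s->depth--;
			s->cells[s->depth - 1] += s->cells[s->depth];
			p++;
			continue;
		}
		long v = strtol(p, &end, 16);
		if (end == p ||
		    (*end != '\0' && !isspace((unsigned char)*end)))
			return (false);
		if (!hyp_push(s, v))
			return (false);
		p = end;
	}
	return (true);
}
```

Evaluating "57 68 +" leaves a single value, 0xbf, on the stack. In decimal, that value is 191, which matches the PROM session above.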
See the Forth User's Guide for more information.
Walking the PROM's Device Tree
The pwd, cd, and ls commands walk the PROM device tree to get to the device. The cd command must be used to establish a position in the tree before pwd will work. This example is from an Ultra 1 workstation with a cgsix frame buffer on an SBus.
ok cd /
To see the devices attached to the current node in the tree, use ls.
ok ls
f006a064 SUNW,UltraSPARC@0,0
f00598b0 sbus@1f,0
f00592dc counter-timer@1f,3c00
f004eec8 virtual-memory
f004e8e8 memory@0,0
f002ca28 aliases
f002c9b8 options
f002c880 openprom
f002c814 chosen
f002c7a4 packages
The full node name can be used:
ok cd sbus@1f,0
ok ls
f006a4e4 cgsix@2,0
f0068194 SUNW,bpp@e,c800000
f0065370 ledma@e,8400010
f006120c espdma@e,8400000
f005a448 SUNW,pll@f,1304000
f005a394 sc@f,1300000
f005a24c zs@f,1000000
f005a174 zs@f,1100000
f005a0c0 eeprom@f,1200000
f0059f8c SUNW,fdtwo@f,1400000
f0059ec4 flashprom@f,0
f0059e34 auxio@f,1900000
f0059d28 SUNW,CS4231@d,c000000
Rather than using the full node name in the previous example, you could also use an abbreviation. The abbreviated command-line entry looks like the following example:
ok cd sbus
The name is actually device@slot,offset (for SBus devices). The cgsix device is in slot 2 and starts at offset 0. If an SBus device is displayed in this tree, the device has been recognized by the PROM.
The .properties command displays the PROM properties of a device. Examine these properties to determine which properties the device exports. This information is useful later to ensure that the driver is looking for the correct hardware properties. These properties are the same properties that can be retrieved with ddi_getprop(9F).
ok cd cgsix
ok .properties
character-set            ISO8859-1
intr                     00000005 00000000
interrupts               00000005
reg                      00000002 00000000 01000000
dblbuf                   00 00 00 00
vmsize                   00 00 00 01
...
The reg property defines an array of register description structures containing the following fields:
uint_t bustype;   /* cookie for related bus type */
uint_t addr;      /* address of reg relative to bus */
uint_t size;      /* size of this register set */
For the cgsix example, the address is 0.
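As an illustration, the reg value printed above (00000002 00000000 01000000) decodes into one register description. The sketch below groups a raw array of 32-bit property cells into the three fields listed above. It uses uint32_t for the uint_t fields. The type and function names are hypothetical. A driver would normally obtain the property through the DDI property interfaces rather than parse it by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the register description fields listed above. */
struct hyp_regspec {
	uint32_t	bustype;	/* cookie for related bus type */
	uint32_t	addr;		/* address of reg relative to bus */
	uint32_t	size;		/* size of this register set */
};

/*
 * Split a reg property, given as an array of 32-bit cells, into
 * register descriptions. Returns the number of entries decoded, or 0
 * if ncells is not a multiple of 3 or the result does not fit in out.
 */
static size_t
hyp_parse_reg(const uint32_t *cells, size_t ncells,
    struct hyp_regspec *out, size_t maxout)
{
	size_t n = ncells / 3;

	if (ncells % 3 != 0 || n > maxout)
		return (0);
	for (size_t i = 0; i < n; i++) {
		out[i].bustype = cells[3 * i];
		out[i].addr = cells[3 * i + 1];
		out[i].size = cells[3 * i + 2];
	}
	return (n);
}
```

For the cgsix entry, the first cell, 2, matches the slot number in the node name cgsix@2,0, and the address, 0, matches its offset.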
Mapping the Device
A device must be mapped into memory to be tested. The PROM can then be used to verify proper operation of the device by using data-transfer commands to transfer bytes, words, and long words. If the device can be operated from the PROM, even in a limited way, the driver should also be able to operate the device.
To set up the device for initial testing, perform the following steps:
Determine the SBus slot number the device is in.
In this example, the cgsix device is located in slot 2.
Determine the offset within the physical address space used by the device.
The offset used is specific to the device. In the cgsix example, the video memory happens to start at an offset of 0x800000.
Use the select-dev word to select the SBus device and the map-in word to map the device in.
The select-dev word takes a string of the device path as its argument. The map-in word takes an offset, a slot number, and a size as arguments to map. Like the offset, the size of the byte transfer is specific to the device. In the cgsix example, the size is set to 0x100000 bytes.
In the following code example, the SBus path is displayed as an argument to the select-dev word. The offset, slot number, and size values for the frame buffer are displayed as arguments to the map-in word. Notice the space between the opening quote and / in the select-dev argument. The virtual address to use remains on top of the stack. The stack is shown using the .s word. The value on top of the stack can be assigned a name with the constant word.
ok " /sbus@1f,0" select-dev
ok 800000 2 100000 map-in
ok .s
ffe98000
ok constant fb
Reading and Writing
The PROM provides a variety of 8-bit, 16-bit, and 32-bit operations.
In general, a c (character) prefix indicates an 8-bit (one-byte) operation, a w (word) prefix indicates a 16-bit (two-byte) operation, and an L (longword) prefix indicates a 32-bit (four-byte) operation.
A suffix of ! indicates a write operation. The write operation takes the first two items off the stack. The first item is the address, and the second item is the value.
ok 55 ffe98000 c!
A suffix of @ indicates a read operation. The read operation takes the address off the stack.
ok ffe98000 c@
ok .s
55
A suffix of ? is used to display the value without affecting the stack.
ok ffe98000 c?
55
Be careful when trying to query the device. If the mappings are not set up correctly, trying to read or write could cause errors. Special words are provided to handle these cases. The cprobe, wprobe, and lprobe words, for example, read from the given address but return zero if the location does not respond, or nonzero if it does.
ok fffa4000 c@
Data Access Error
ok fffa4000 cprobe
ok .s
0
ok ffe98000 cprobe
ok .s
0 ffffffffffffffff
A region of memory can be shown with the dump word. This word takes an address and a length, and displays the contents of the memory region in bytes.
In the following example, the fill word is used to fill video memory with a pattern. The fill word takes the address, the number of bytes to fill, and the byte to use. Use wfill and Lfill for words and longwords. This fill example causes the cgsix to display simple patterns based on the byte value.
ok " /sbus" select-dev
ok 800000 2 100000 map-in
ok constant fb
ok fb 10000 ff fill
ok fb 20000 0 fill
ok fb 18000 55 fill
ok fb 15000 3 fill
ok fb 10000 5 fill
ok fb 5000 f9 fill