fbt Provider
The Function Boundary Tracing (FBT) provider provides probes associated with the entry to and return from most functions in the illumos kernel. The function is the fundamental unit of program text. In a well-designed system, each function performs a discrete and well-defined operation on a specified object or series of like objects. Therefore, even on the smallest illumos systems, FBT will provide on the order of 20,000 probes.
Similar to other DTrace providers, FBT has no probe effect when it is not explicitly enabled. When enabled, FBT only induces a probe effect in probed functions. While the FBT implementation is highly specific to the instruction set architecture, FBT has been implemented on both SPARC and x86 platforms. For each instruction set, a small number of functions cannot be instrumented by FBT: so-called leaf functions, which call no other functions and are highly optimized by the compiler. Probes for these functions are not present in DTrace.
Effective use of FBT probes requires knowledge of the operating system implementation. Therefore, it is recommended that you use FBT only when developing kernel software or when other providers are not sufficient. Other DTrace providers, including syscall, sched, proc, and io, can be used to answer most system analysis questions without requiring operating system implementation knowledge.
20.1. Probes
FBT provides a probe at the boundary of most functions in the kernel. The boundary of a function is crossed by entering the function and by returning from the function. FBT thus provides two probes for every function in the kernel: one upon entry to the function, and one upon return from the function. These probes are named entry and return, respectively. The function name and module name are specified as part of the probe; every FBT probe specifies both.
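As a minimal sketch of how the module and function names appear in a probe description, the following D script enables both FBT probes for ufs_read in the ufs module (a UFS function that also appears later in this chapter) and counts how often each fires. The function is chosen only for illustration; substitute any kernel function of interest.

```d
/*
 * FBT probe descriptions take the form fbt:module:function:name,
 * where name is either entry or return.
 */
fbt:ufs:ufs_read:entry,
fbt:ufs:ufs_read:return
{
	/* probename is "entry" or "return", depending on which probe fired. */
	@fires[probename] = count();
}
```

Run it with dtrace -s and interrupt it with Control-C to print the counts. If the ufs module is not loaded, dtrace reports that the description does not match any probes.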
20.2. Probe arguments
20.2.1. entry probes
The arguments to entry probes are the same as the arguments to the corresponding operating system kernel function. These arguments may be accessed in a typed fashion by using the args[] array. They may also be accessed as int64_t values by using the arg0 through argn variables.
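The following is a minimal sketch contrasting the two forms of access. It assumes the declaration putnext(queue_t *q, mblk_t *mp) for putnext(9F); the aggregation and its keys are illustrative. Because args[1] and arg1 hold the same pointer, the two keys should show identical distributions; only the typed form can be dereferenced without a cast.

```d
fbt::putnext:entry
{
	/* Typed access: args[1] is already an mblk_t *, so members can be read. */
	this->typed = (uintptr_t)args[1]->b_wptr - (uintptr_t)args[1]->b_rptr;

	/* Untyped access: arg1 is an int64_t and must be cast before use. */
	this->mp = (mblk_t *)arg1;
	this->untyped = (uintptr_t)this->mp->b_wptr - (uintptr_t)this->mp->b_rptr;

	/* Bytes in the first message block, computed both ways. */
	@len["typed args[1]"] = quantize(this->typed);
	@len["cast arg1"] = quantize(this->untyped);
}
```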
20.2.2. return probes
While a given function only has a single point of entry, it may have
many different points where it returns to its caller. You are usually interested
in either the value that a function returned or the fact that the function
returned at all rather than the specific return path taken. FBT therefore
collects a function's multiple return sites into a single return
probe.
If the exact return path is of interest, you can examine the return probe args[0] value, which indicates the offset (in bytes) of the returning instruction in the function text.
If the function has a return value, the return value is stored in args[1]. If a function does not have a return value, args[1] is not defined.
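A minimal sketch that uses both return probe arguments: for each return site of fop_ioctl, it counts how often each return value was seen. fop_ioctl is used only as an example of a kernel function that returns an int; any function with a return value would do.

```d
fbt::fop_ioctl:return
{
	/*
	 * arg0 (args[0]): byte offset of the returning instruction.
	 * arg1 (args[1]): the value the function returned.
	 */
	@rets[probefunc, arg0, arg1] = count();
}

END
{
	/* Print one line per (return site, return value) pair. */
	printa("%s+0x%x returned %d: %@d times\n", @rets);
}
```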
20.3. Examples
You can use FBT to easily explore the kernel's implementation. The following example
script records the first ioctl(2) from
any xclock
process and then follows the subsequent code
path through the kernel:
/*
* To make the output more readable, we want to indent every function
* entry (and unindent every function return). This is done by
* setting the "flowindent" option.
*/
#pragma D option flowindent
syscall::ioctl:entry
/execname == "xclock" && guard++ == 0/
{
self->traceme = 1;
printf("fd: %d", arg0);
}
fbt:::
/self->traceme/
{}
syscall::ioctl:return
/self->traceme/
{
self->traceme = 0;
exit(0);
}
Running this script results in output similar to the following example:
# dtrace -s ./xioctl.d
dtrace: script './xioctl.d' matched 26254 probes
CPU FUNCTION
  0  => ioctl                                 fd: 3
  0    -> ioctl
  0      -> getf
  0        -> set_active_fd
  0        <- set_active_fd
  0      <- getf
  0      -> fop_ioctl
  0        -> sock_ioctl
  0          -> strioctl
  0            -> job_control_type
  0            <- job_control_type
  0            -> strcopyout
  0              -> copyout
  0              <- copyout
  0            <- strcopyout
  0          <- strioctl
  0        <- sock_ioctl
  0      <- fop_ioctl
  0      -> releasef
  0        -> clear_active_fd
  0        <- clear_active_fd
  0        -> cv_broadcast
  0        <- cv_broadcast
  0      <- releasef
  0    <- ioctl
  0  <= ioctl
The output shows that an xclock
process called
ioctl
on a file descriptor that appears to be associated
with a socket.
You can also use FBT when trying to understand kernel
drivers. For example, the ssd(7D) driver
has many code paths by which EIO
may be returned. FBT can be easily used to determine the precise code path that resulted
in an error condition, as shown in the following example:
fbt:ssd::return
/arg1 == EIO/
{
printf("%s+0x%x returned EIO.\n", probefunc, arg0);
}
For more information on any one return of EIO, you may wish to speculatively trace all fbt probes, and then commit() (or discard()) the speculative buffer based on the return value of a specific function. See Speculative Tracing for details on speculative tracing.
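The approach described above can be sketched in D as follows. This is a minimal sketch under stated assumptions: the unit of work is a single ioctl(2) system call, and fop_ioctl, which returns an errno value, stands in for the "specific function" purely for illustration. Every function boundary crossed during the call is buffered speculatively, and the buffer is committed only if fop_ioctl returned EIO.

```d
#pragma D option flowindent

syscall::ioctl:entry
{
	/* Allocate a speculative buffer for this thread's ioctl(2). */
	self->spec = speculation();
}

fbt:::
/self->spec/
{
	/*
	 * A speculate() with no other actions speculatively records the
	 * default action, so each function boundary crossed is buffered.
	 */
	speculate(self->spec);
}

fbt::fop_ioctl:return
/self->spec/
{
	/* Remember the return value of the function of interest. */
	self->err = arg1;
}

syscall::ioctl:return
/self->spec && self->err == EIO/
{
	/* The call failed with EIO: keep the buffered trace. */
	commit(self->spec);
	self->spec = 0;
	self->err = 0;
}

syscall::ioctl:return
/self->spec/
{
	/* Any other outcome: throw the buffered trace away. */
	discard(self->spec);
	self->spec = 0;
	self->err = 0;
}
```

If speculation() cannot allocate a buffer it returns 0, and that call is simply not traced; the nspec and specsize options control how many speculative buffers exist and how large each one is.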
Alternatively, you can use FBT to understand the functions called within a specified module. The following example lists all of the functions called in UFS:
# dtrace -n fbt:ufs::entry'{@a[probefunc] = count()}'
dtrace: description 'fbt:ufs::entry' matched 353 probes
^C

  ufs_ioctl                            1
  ufs_statvfs                          1
  ufs_readlink                         1
  ufs_trans_touch                      1
  wrip                                 1
  ufs_dirlook                          1
  bmap_write                           1
  ufs_fsync                            1
  ufs_iget                             1
  ufs_trans_push_inode                 1
  ufs_putpages                         1
  ufs_putpage                          1
  ufs_syncip                           1
  ufs_write                            1
  ufs_trans_write_resv                 1
  ufs_log_amt                          1
  ufs_getpage_miss                     1
  ufs_trans_syncip                     1
  getinoquota                          1
  ufs_inode_cache_constructor          1
  ufs_alloc_inode                      1
  ufs_iget_alloced                     1
  ufs_iget_internal                    2
  ufs_reset_vnode                      2
  ufs_notclean                         2
  ufs_iupdat                           2
  blkatoff                             3
  ufs_close                            5
  ufs_open                             5
  ufs_access                           6
  ufs_map                              8
  ufs_seek                            11
  ufs_addmap                          15
  rdip                                15
  ufs_read                            15
  ufs_rwunlock                        16
  ufs_rwlock                          16
  ufs_delmap                          18
  ufs_getattr                         19
  ufs_getpage_ra                      24
  bmap_read                           25
  findextent                          25
  ufs_lockfs_begin                    27
  ufs_lookup                          46
  ufs_iaccess                         51
  ufs_imark                           92
  ufs_lockfs_begin_getpage           102
  bmap_has_holes                     102
  ufs_getpage                        102
  ufs_itimes_nolock                  107
  ufs_lockfs_end                     125
  dirmangled                         498
  dirbadname                         498
If you know the purpose or arguments of a kernel function, you can use FBT to understand how or why the function is being called. For example, putnext(9F) takes a pointer to a queue(9S) structure as its first argument. The q_qinfo member of the queue structure is a pointer to a qinit(9S) structure. The qi_minfo member of the qinit structure has a pointer to a module_info(9S) structure, which contains the module name in its mi_idname member. The following example puts this information together by using the FBT probe in putnext to track putnext(9F) calls by module name:
fbt::putnext:entry
{
@calls[stringof(args[0]->q_qinfo->qi_minfo->mi_idname)] = count();
}
Running the above script results in output similar to the following example:
# dtrace -s ./putnext.d
^C

  iprb                 1
  rpcmod               1
  pfmod                1
  timod                2
  vpnmod               2
  pts                 40
  conskbd             42
  kb8042              42
  tl                  58
  arp                108
  tcp                126
  ptm                249
  ip                 313
  ptem               340
  vuid2ps2           361
  ttcompat           412
  ldterm             413
  udp                569
  strwhead           624
  mouse8042          726
You can also use FBT to determine the time spent in a particular function. The following example shows how to determine the callers of the DDI delaying routines drv_usecwait(9F) and delay(9F), along with the distribution of time, in nanoseconds, spent in each call.
fbt::delay:entry,
fbt::drv_usecwait:entry
{
self->in = timestamp;
}
fbt::delay:return,
fbt::drv_usecwait:return
/self->in/
{
@snoozers[stack()] = quantize(timestamp - self->in);
self->in = 0;
}
This example script is particularly interesting to run during boot. Anonymous Tracing describes the procedure for performing anonymous tracing during system boot. Upon reboot, you might see output similar to the following example:
# dtrace -ae

              ata`ata_wait+0x34
              ata`ata_id_common+0xf5
              ata`ata_disk_id+0x20
              ata`ata_drive_type+0x9a
              ata`ata_init_drive+0xa2
              ata`ata_attach+0x50
              genunix`devi_attach+0x75
              genunix`attach_node+0xb2
              genunix`i_ndi_config_node+0x97
              genunix`i_ddi_attachchild+0x4b
              genunix`devi_attach_node+0x3d
              genunix`devi_config_one+0x1d0
              genunix`ndi_devi_config_one+0xb0
              devfs`dv_find+0x125
              devfs`devfs_lookup+0x40
              genunix`fop_lookup+0x21
              genunix`lookuppnvp+0x236
              genunix`lookuppnat+0xe7
              genunix`lookupnameat+0x87
              genunix`cstatat_getvp+0x134

           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@                    4105
            8192 |@@@@                                     783
           16384 |@@@@@@@@@@@@@@                           2793
           32768 |                                         16
           65536 |                                         0

              kb8042`kb8042_wait_poweron+0x29
              kb8042`kb8042_init+0x22
              kb8042`kb8042_attach+0xd6
              genunix`devi_attach+0x75
              genunix`attach_node+0xb2
              genunix`i_ndi_config_node+0x97
              genunix`i_ddi_attachchild+0x4b
              genunix`devi_attach_node+0x3d
              genunix`devi_config_one+0x1d0
              genunix`ndi_devi_config_one+0xb0
              genunix`resolve_pathname+0xa5
              genunix`ddi_pathname_to_dev_t+0x16
              consconfig_dacf`consconfig_load_drivers+0x14
              consconfig_dacf`dynamic_console_config+0x6c
              consconfig`consconfig+0x8
              unix`stubs_common_code+0x3b

           value  ------------- Distribution ------------- count
          262144 |                                         0
          524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      221
         1048576 |@@@@                                     29
         2097152 |                                         0

              usba`hubd_enable_all_port_power+0xed
              usba`hubd_check_ports+0x8e
              usba`usba_hubdi_attach+0x275
              usba`usba_hubdi_bind_root_hub+0x168
              uhci`uhci_attach+0x191
              genunix`devi_attach+0x75
              genunix`attach_node+0xb2
              genunix`i_ndi_config_node+0x97
              genunix`i_ddi_attachchild+0x4b
              genunix`i_ddi_attach_node_hierarchy+0x49
              genunix`attach_driver_nodes+0x49
              genunix`ddi_hold_installed_driver+0xe3
              genunix`attach_drivers+0x28

           value  ------------- Distribution ------------- count
        33554432 |                                         0
        67108864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3
       134217728 |                                         0
20.4. Tail-call Optimization
When one function ends by calling another function, the compiler can engage in tail-call optimization, in which the function being called reuses the caller's stack frame. This procedure is most commonly used in the SPARC architecture, where the compiler reuses the caller's register window in the function being called in order to minimize register window pressure.
The presence of this optimization causes the return
probe
of the calling function to fire before the entry
probe
of the called function. This ordering can lead to quite a bit of confusion.
For example, if you wanted to record all functions called from a particular
function and any functions that this function calls, you might use the following
script:
fbt::foo:entry
{
self->traceme = 1;
}
fbt:::entry
/self->traceme/
{
printf("called %s", probefunc);
}
fbt::foo:return
/self->traceme/
{
self->traceme = 0;
}
However, if foo ends in an optimized tail-call, the tail-called function, and therefore any functions that it calls, will not be captured: the return probe for foo fires, clearing self->traceme, before the entry probe of the tail-called function fires. The kernel cannot be dynamically deoptimized on the fly, and DTrace does not wish to engage in a lie about how code is structured.
Therefore, you should be aware of when tail-call optimization might be used.
Tail-call optimization is likely to be used in source code similar to the following example:
return (bar());
Or in source code similar to the following example:
(void) bar();
return;
Conversely, function source code that ends like the following example cannot have its call to bar
optimized, because
the call to bar
is not a tail-call:
bar();
return (rval);
You can determine whether a call has been tail-call optimized using the following technique:
- While running DTrace, trace arg0 of the return probe in question. arg0 contains the offset of the returning instruction in the function.
- After DTrace has stopped, use mdb(1) to look at the function. If the traced offset contains a call to another function instead of an instruction to return from the function, the call has been tail-call optimized.
Due to the instruction set architecture, tail-call optimization is far
more common on SPARC systems than on x86 systems. The following example uses mdb
to discover tail-call optimization in the kernel's dup
function:
# dtrace -q -n fbt::dup:return'{printf("%s+0x%x", probefunc, arg0);}'
While this command is running, run a program that performs a dup(2), such as a bash
process.
The above command should provide output similar to the following example:
dup+0x10
^C
Now examine the function with mdb:
# echo "dup::dis" | mdb -k
dup:                            sra       %o0, 0, %o0
dup+4:                          mov       %o7, %g1
dup+8:                          clr       %o2
dup+0xc:                        clr       %o1
dup+0x10:                       call      -0x1278       <fcntl>
dup+0x14:                       mov       %g1, %o7
The output shows that dup+0x10
is a call to the fcntl
function and not a ret
instruction. Therefore,
the call to fcntl
is an example of tail-call optimization.
20.5. Assembly Functions
You might observe functions that seem to enter but never return or vice versa. Such rare functions are generally hand-coded assembly routines that branch to the middle of other hand-coded assembly functions. These functions should not impede analysis: the branched-to function must still return to the caller of the branched-from function. That is, if you enable all FBT probes, you should see the entry to one function and the return from another function at the same stack depth.
20.6. Instruction Set Limitations
Some functions cannot be instrumented by FBT. The exact nature of uninstrumentable functions is specific to the instruction set architecture.
20.6.1. x86 Limitations
Functions that do not create a stack frame on x86 systems cannot be instrumented by FBT. Because the register set for x86 is extraordinarily small, most functions must put data on the stack and therefore create a stack frame. However, some x86 functions do not create a stack frame and therefore cannot be instrumented. Actual numbers vary, but typically fewer than five percent of functions cannot be instrumented on the x86 platform.
20.6.2. SPARC Limitations
Leaf routines hand-coded in assembly language on SPARC systems cannot be instrumented by FBT. The majority of the kernel is written in C, and all functions written in C can be instrumented by FBT.
20.7. Breakpoint Interaction
FBT works by
dynamically modifying kernel text. Because kernel breakpoints also work by
modifying kernel text, if a kernel breakpoint is placed at an entry or return
site before loading DTrace, FBT will
refuse to provide a probe for the function, even if the kernel breakpoint
is subsequently removed. If the kernel breakpoint is placed after loading
DTrace, both the kernel breakpoint and the DTrace probe will correspond to
the same point in text. In this situation, the breakpoint will trigger first,
and then the probe will fire when the debugger resumes the kernel. It is recommended
that kernel breakpoints not be used concurrently with DTrace. If breakpoints
are required, use the DTrace breakpoint
action instead.
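As a minimal sketch, assuming a kernel debugger is available and that stopping the system is acceptable, the breakpoint action can be attached to an FBT probe. Because breakpoint() is a destructive action, the script enables the destructive option (equivalent to dtrace -w). The program name myprog is hypothetical, and ufs_write is used only as an example function.

```d
#pragma D option destructive

/*
 * Enter the kernel debugger the first time a process named "myprog"
 * (a hypothetical program name) enters ufs_write. The guard++ == 0
 * test, as in the xclock example, limits this to a single breakpoint.
 */
fbt::ufs_write:entry
/execname == "myprog" && guard++ == 0/
{
	breakpoint();
}
```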
20.8. Module Loading
The illumos kernel can dynamically load and unload kernel modules. When FBT is loaded and a module is dynamically loaded, FBT automatically provides new probes associated with the new module. If a loaded module has no enabled FBT probes, the module may be unloaded; the corresponding probes will be destroyed as the module is unloaded. If a loaded module has enabled FBT probes, the module is considered busy and cannot be unloaded.
20.9. Stability
The FBT provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Stability.
Element | Name stability | Data stability | Dependency class |
---|---|---|---|
Provider | Evolving | Evolving | ISA |
Module | Private | Private | Unknown |
Function | Private | Private | Unknown |
Name | Evolving | Evolving | ISA |
Arguments | Private | Private | ISA |
As FBT exposes the kernel implementation, nothing about it is Stable, and the Module and Function name and data stability are explicitly Private. The data stability for Provider and Name is Evolving, but all other data stabilities are Private: they are artifacts of the current implementation. The dependency class for FBT is ISA: while FBT is available on all current instruction set architectures, there is no guarantee that FBT will be available on arbitrary future instruction set architectures.