User Process Tracing
DTrace is an extremely powerful tool for understanding the behavior of user processes. DTrace can be invaluable when debugging, analyzing performance problems, or simply understanding the behavior of a complex application. This chapter focuses on the DTrace facilities relevant for tracing user process activity and provides examples to illustrate their use.
33.1. copyin and copyinstr Subroutines
DTrace's
interaction with processes is a little different than most traditional debuggers
or observability tools. Many such tools appear to execute within the scope
of the process, letting users dereference pointers to program variables directly.
Rather than appearing to execute within or as part of the process itself,
DTrace probes execute in the illumos kernel. To access process data, a probe
needs to use the copyin
or copyinstr
subroutines
to copy user process data into the address space of the kernel.
For example, consider the following write(2) system call:
ssize_t write(int fd, const void *buf, size_t nbytes);
The following D program illustrates an incorrect attempt to print the contents of a string passed to the write(2) system call:
syscall::write:entry
{
printf("%s", stringof(arg1)); /* incorrect use of arg1 */
}
If you try to run this script, DTrace will produce error messages similar to the following example:
dtrace: error on enabled probe ID 1 (ID 37: syscall::write:entry): \ invalid address (0x10038a000) in action #1
The arg1
variable, containing the value of the buf parameter, is an address that refers to memory in the process
executing the system call. To read the string at that address, use the copyinstr
subroutine and record its result with the printf
action:
syscall::write:entry
{
printf("%s", copyinstr(arg1)); /* correct use of arg1 */
}
The output of this script shows all of the strings being passed to the write(2) system call. Occasionally, however, you might see irregular output similar to the following example:
0 37 write:entry mada���
The copyinstr
subroutine acts on an input argument
that is the user address of a null-terminated ASCII string. However, buffers
passed to the write(2) system
call might refer to binary data rather than ASCII strings. To print only as
much of the string as the caller intended, use the copyin
subroutine,
which takes a size as its second argument:
syscall::write:entry
{
printf("%s", stringof(copyin(arg1, arg2)));
}
Notice that the stringof
operator is necessary so
that DTrace properly converts the user data retrieved using copyin
to
a string. The use of stringof
is not necessary when using copyinstr
because this function always returns type string
.
33.1.1. Avoiding Errors
The copyin
and copyinstr
subroutines
cannot read from user addresses which have not yet been touched so even a
valid address may cause an error if the page containing that address has not
yet been faulted in by being accessed. Consider the following example:
# dtrace -n syscall::open:entry'{ trace(copyinstr(arg0)); }' dtrace: description 'syscall::open:entry' matched 1 probe CPU ID FUNCTION:NAME dtrace: error on enabled probe ID 2 (ID 50: syscall::open:entry): invalid address (0x9af1b) in action #1 at DIF offset 52
In the above example output, the application was functioning properly,
and the address in arg0
was valid, but it referred to a
page that had not yet been accessed by the corresponding process. To resolve
this issue, wait for kernel or application to use the data before tracing
it. For example, you might wait until the system call returns to apply copyinstr
, as shown in the following example:
# dtrace -n syscall::open:entry'{ self->file = arg0; }' \ -n syscall::open:return'{ trace(copyinstr(self->file)); self->file = 0; }' dtrace: description 'syscall::open:entry' matched 1 probe CPU ID FUNCTION:NAME 2 51 open:return /dev/null
33.2. Eliminating dtrace(1M) Interference
If you trace every call to the write(2) system call, you will cause
a cascade of output. Each call to write
causes the dtrace(1M) command to call write
as
it displays the output, and so on. This feedback loop is a good example of
how the dtrace
command can interfere with the desired data.
You can use a simple predicate to prevent these unwanted data from being traced:
syscall::write:entry
/pid != $pid/
{
printf("%s", stringof(copyin(arg1, arg2)));
}
The $pid
macro variable expands to the process identifier
of the process that enabled the probes. The pid
variable
contains the process identifier of the process whose thread was running on
the CPU where the probe was fired. Therefore the predicate
/pid != $pid/
ensures that the script does not trace any events related
to the running of this script itself.
33.3. syscall Provider
The syscall
provider enables you to trace every
system call entry and return. System calls can be a good starting point for
understanding a process's behavior, especially if the process seems to be
spending a large amount of time executing or blocked in the kernel. You can
use the prstat(1M) command
to see where processes are spending time:
$ prstat -m -p 31337 PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP 13499 user1 53 44 0.0 0.0 0.0 0.0 2.5 0.0 4K 24 9K 0 mystery/6
This example shows that the process is consuming a large amount of system time. One possible explanation for this behavior is that the process is executing a large number of system calls. You can use a simple D program specified on the command-line to see which system calls are happening most often:
# dtrace -n syscall:::entry'/pid == 31337/{ @syscalls[probefunc] = count(); }' dtrace: description 'syscall:::entry' matched 215 probes ^C open 1 lwp_park 2 times 4 fcntl 5 close 6 sigaction 6 read 10 ioctl 14 sigprocmask 106 write 1092
This report shows which system calls are being called most often, in
this case, the write(2) system
call. You can use the syscall
provider to further examine
the source of all the write
system calls:
# dtrace -n syscall::write:entry'/pid == 31337/{ @writes[arg2] = quantize(arg2); }' dtrace: description 'syscall::write:entry' matched 1 probe ^C value ------------- Distribution ------------- count 0 | 0 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1037 2 |@ 3 4 | 0 8 | 0 16 | 0 32 |@ 3 64 | 0 128 | 0 256 | 0 512 | 0 1024 |@ 5 2048 | 0
The output shows that the process is executing many write
system
calls with a relatively small amount of data. This ratio could be the source
of the performance problem for this particular process. This example illustrates
a general methodology for investigating system call behavior.
33.4. ustack Action
Tracing
a process thread's stack at the time a particular probe is activated is often
useful for examining a problem in more detail. The ustack
action
traces the user thread's stack. If, for example, a process that opens many
files occasionally fails in the open(2) system call, you can use the ustack
action
to discover the code path that executes the failed open
:
syscall::open:entry
/pid == $1/
{
self->path = copyinstr(arg0);
}
syscall::open:return
/self->path != NULL && arg1 == -1/
{
printf("open for '%s' failed", self->path);
ustack();
}
This script also illustrates the use of the $1
macro
variable which takes the value of the first operand specified on the dtrace(1M) command-line:
# dtrace -s ./badopen.d 31337 dtrace: script './badopen.d' matched 2 probes CPU ID FUNCTION:NAME 0 40 open:return open for '/usr/lib/foo' failed libc.so.1`__open+0x4 libc.so.1`open+0x6c 420b0 tcsh`dosource+0xe0 tcsh`execute+0x978 tcsh`execute+0xba0 tcsh`process+0x50c tcsh`main+0x1d54 tcsh`_start+0xdc
The ustack
action records program counter (PC) values for the stack and dtrace(1M) resolves those PC values
to symbol names by looking though the process's symbol tables. If dtrace
can't resolve the PC value to a symbol, it will
print out the value as a hexadecimal integer.
If a process exits or is killed before the ustack
data
is formatted for output, dtrace
might be unable to convert
the PC values in the stack trace to symbol names, and will
be forced to display them as hexadecimal integers. To work around this limitation,
specify a process of interest with the -c
or -p
option
to dtrace
. See dtrace(1M) Utility for details on these and other options. If the
process ID or command is not known in advance, the following example D program
that can be used to work around the limitation:
/*
* This example uses the open(2) system call probe, but this technique
* is applicable to any script using the ustack() action where the stack
* being traced is in a process that may exit soon.
*/
syscall::open:entry
{
ustack();
stop_pids[pid] = 1;
}
syscall::rexit:entry
/stop_pids[pid] != 0/
{
printf("stopping pid %d", pid);
stop();
stop_pids[pid] = 0;
}
The above script stops a process just before it exits if the ustack
action has been applied to a thread in that process. This technique
ensures that the dtrace
command will be able to resolve
the PC values to symbolic names. Notice that the value
of stop_pids[pid]
is set to 0 after it has been used to
clear the dynamic variable. Remember to set stopped processes running again
using the prun(1) command
or your system will accumulate many stopped processes.
33.5. uregs[] Array
The uregs[]
array enables you to access individual user registers. The
following tables list indices into the uregs[]
array corresponding
to each supported illumos system architecture.
Constant |
Register |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Constant |
Register |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
On AMD64 platforms, the uregs
array has the same
content as it does on x86 platforms, plus the additional elements listed in
the following table:
Constant |
Register |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The aliases listed in the following table can be used on all platforms:
Constant |
Register |
---|---|
|
program counter register |
|
stack pointer register |
|
first return code |
|
second return code |
33.6. pid Provider
The pid
provider enables you to trace any instruction in a process.
Unlike most other providers, pid
probes are created on
demand based on the probe descriptions found in your D programs. As a result,
no pid
probes are listed in the output of
dtrace -l
until you have enabled them yourself.
33.6.1. User Function Boundary Tracing
The
simplest mode of operation for the pid
provider is as the
user space analogue to the fbt
provider. The following
example program traces all function entries and returns that are made from
a single function. The $1
macro variable (the first operand
on the command line) is the process ID for the process to trace. The $2
macro
variable (the second operand on the command line) is the name of the function
from which to trace all function calls.
#pragma D option flowindent
pid$1::$2:entry
{
self->trace = 1;
}
pid$1::$2:return
/self->trace/
{
self->trace = 0;
}
pid$1:::entry,
pid$1:::return
/self->trace/
{
}
Type in the above example script and save it in a file named userfunc.d, and then chmod
it to be executable. This script
produces output similar to the following example:
# ./userfunc.d 15032 execute dtrace: script './userfunc.d' matched 11594 probes 0 -> execute 0 -> execute 0 -> Dfix 0 <- Dfix 0 -> s_strsave 0 -> malloc 0 <- malloc 0 <- s_strsave 0 -> set 0 -> malloc 0 <- malloc 0 <- set 0 -> set1 0 -> tglob 0 <- tglob 0 <- set1 0 -> setq 0 -> s_strcmp 0 <- s_strcmp ...
The pid
provider can only be used on processes that
are already running. You can use the $target
macro variable
(see Scripting) and
the dtrace
-c
and -p
options
to create and grab processes of interest and instrument them using DTrace.
For example, the following D script can be used to determine the distribution
of function calls made to libc
by a particular subject
process:
pid$target:libc.so::entry
{
@[probefunc] = count();
}
To determine the distribution of such calls made by the date(1) command, save the script in a file named libc.d and execute the following command:
# dtrace -s libc.d -c date dtrace: script 'libc.d' matched 2476 probes Fri Jul 30 14:08:54 PDT 2004 dtrace: pid 109196 has exited pthread_rwlock_unlock 1 _fflush_u 1 rwlock_lock 1 rw_write_held 1 strftime 1 _close 1 _read 1 __open 1 _open 1 strstr 1 load_zoneinfo 1 ... _ti_bind_guard 47 _ti_bind_clear 94
33.6.2. Tracing Arbitrary Instructions
You can use the pid
provider
to trace any instruction in any user function. Upon demand, the pid
provider
will create a probe for every instruction in a function. The name of each
probe is the offset of its corresponding instruction in the function expressed
as a hexadecimal integer. For example, to enable a probe associated with the
instruction at offset 0x1c in function foo
of module bar.so
in the process with PID 123, you can use the following command:
# dtrace -n pid123:bar.so:foo:1c
To enable all of the probes in the function foo
,
including the probe for each instruction, you can use the command:
# dtrace -n pid123:bar.so:foo:
This command demonstrates an extremely powerful technique for debugging
and analyzing user applications. Infrequent errors can be difficult to debug
because they can be difficult to reproduce. Often, you can identify a problem
after the failure has occurred, too late to reconstruct the code path. The
following example demonstrates how to combine the pid
provider
with speculative tracing (see Speculative Tracing) to solve this problem by tracing every instruction
in a function.
pid$1::$2:entry
{
self->spec = speculation();
speculate(self->spec);
printf("%x %x %x %x %x", arg0, arg1, arg2, arg3, arg4);
}
pid$1::$2:
/self->spec/
{
speculate(self->spec);
}
pid$1::$2:return
/self->spec && arg1 == 0/
{
discard(self->spec);
self->spec = 0;
}
pid$1::$2:return
/self->spec && arg1 != 0/
{
commit(self->spec);
self->spec = 0;
}
Executing errorpath.d results in output similar to the following example:
# ./errorpath.d 100461 _chdir dtrace: script './errorpath.d' matched 19 probes CPU ID FUNCTION:NAME 0 25253 _chdir:entry 81e08 6d140 ffbfcb20 656c73 0 0 25253 _chdir:entry 0 25269 _chdir:0 0 25270 _chdir:4 0 25271 _chdir:8 0 25272 _chdir:c 0 25273 _chdir:10 0 25274 _chdir:14 0 25275 _chdir:18 0 25276 _chdir:1c 0 25277 _chdir:20 0 25278 _chdir:24 0 25279 _chdir:28 0 25280 _chdir:2c 0 25268 _chdir:return