Debugging With the Kernel Memory Allocator
The illumos kernel memory (kmem) allocator provides a powerful set of
debugging features that can facilitate analysis of a kernel crash dump. This
chapter discusses these debugging features, and the MDB dcmds and walkers
designed specifically for the allocator. Bonwick (see Related Books and Papers) provides an overview
of the principles of the allocator itself. Refer to the header file
<sys/kmem_impl.h> for the definitions of allocator data structures. The kmem debugging features
can be enabled on a production system to enhance problem analysis, or on development
systems to aid in debugging kernel software and device drivers.
MDB exposes kernel implementation details that are subject to change at any time. This guide reflects the illumos kernel implementation as of the date of publication of this guide. Information provided in this guide about the kernel memory allocator might not be correct or applicable to past or future illumos releases.
9.1. Getting Started: Creating a Sample Crash Dump
This section shows you how to obtain a sample crash dump, and how to invoke MDB in order to examine it.
9.1.1. Setting kmem_flags
The kernel memory allocator contains many advanced debugging features, but these are not enabled by default because they can degrade performance. To follow the examples in this guide, turn these features on. Enable them only on a test system, because they can slow the system down or expose latent problems.
The allocator's debugging functionality is controlled by the kmem_flags tunable. To get started, make sure kmem_flags is set properly:
    # mdb -k
    > kmem_flags/X
    kmem_flags:
    kmem_flags:     f
If kmem_flags is not set to “f”, you should add the line:
    set kmem_flags=0xf
to /etc/system and reboot the system. When the system reboots, confirm that kmem_flags is set to “f”. Remember to remove your /etc/system modifications before returning this system to production use.
9.1.2. Forcing a Crash Dump
The next step is to make sure crash dumps are properly configured. First, confirm that dumpadm is configured to save kernel crash dumps and that savecore is enabled. See dumpadm(1M) for more information on crash dump parameters.
    # dumpadm
          Dump content: kernel pages
           Dump device: /dev/dsk/c0t0d0s1 (swap)
    Savecore directory: /var/crash/testsystem
      Savecore enabled: yes
Next, reboot the system using the -d flag to reboot(1M), which forces the kernel to panic and save a crash dump.
    # reboot -d
    Sep 28 17:51:18 testsystem reboot: rebooted by root
    panic[cpu0]/thread=70aacde0: forced crash dump initiated at user request
    401fbb10 genunix:uadmin+55c (1, 1, 0, 6d700000, 5, 0)
      %l0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ...
When the system reboots, make sure the crash dump succeeded:
    $ cd /var/crash/testsystem
    $ ls
    bounds    unix.0    unix.1    vmcore.0  vmcore.1
If the dump is missing from your dump directory, it could be that the partition is out of space. You can free up space and run savecore(1M) manually as root to subsequently save the dump. If your dump directory contains multiple crash dumps, the one you just created will be the unix.[n] and vmcore.[n] pair with the most recent modification time.
9.1.3. Starting MDB
Now run mdb on the crash dump you created, and check its status:
    $ mdb unix.1 vmcore.1
    Loading modules: [ unix krtld genunix ip nfs ipc ]
    > ::status
    debugging crash dump vmcore.1 (32-bit) from testsystem
    operating system: 5.10 Generic (sun4u)
    panic message: forced crash dump initiated at user request
In the examples presented in this guide, a crash dump from a 32-bit kernel is used. All of the techniques presented here are applicable to a 64-bit kernel, and care has been taken to distinguish pointers (sized differently on 32- and 64-bit systems) from fixed-sized quantities, which are invariant with respect to the kernel data model.
An UltraSPARC workstation was used to generate the example presented. Your results can vary depending on the architecture and model of system you use.
9.2. Allocator Basics
The kernel memory allocator's job is to parcel out regions of virtual memory to other kernel subsystems (these are commonly called clients). This section explains the basics of the allocator's operation and introduces some terms used later in this guide.
9.2.1. Buffer States
The functional domain of the kernel memory allocator is the set of buffers of virtual memory that make up the kernel heap. These buffers are grouped together into sets of uniform size and purpose, known as caches. Each cache contains a set of buffers. Some of these buffers are currently free, which means that they have not yet been allocated to any client of the allocator. The remaining buffers are allocated, which means that a pointer to that buffer has been provided to a client of the allocator. If no client of the allocator holds a pointer to an allocated buffer, this buffer is said to be leaked, because it cannot be freed. Leaked buffers indicate incorrect code that is wasting kernel resources.
9.2.2. Transactions
A kmem transaction is a transition on a buffer between the allocated and free states. The allocator can verify that the state of a buffer is valid as part of each transaction. Additionally, the allocator has facilities for logging transactions for post-mortem examination.
9.2.3. Sleeping and Non-Sleeping Allocations
Unlike the Standard C Library's malloc(3C) function, the kernel memory allocator can block (or sleep), waiting until enough virtual memory is available to satisfy the client's request. This is controlled by the flag parameter to kmem_alloc(9F). A call to kmem_alloc(9F) which has the KM_SLEEP flag set can never fail; it will block forever waiting for resources to become available.
9.2.4. Kernel Memory Caches
The kernel memory allocator divides the memory it manages into a set of caches. All allocations are supplied from these caches, which are represented by the kmem_cache_t data structure. Each cache has a fixed buffer size, which represents the maximum allocation size satisfied by that cache. Each cache has a string name indicating the type of data it manages.
Some kernel memory caches are special purpose and are initialized to allocate only a particular kind of data structure. An example of this is the “thread_cache,” which allocates only structures of type kthread_t. Memory from these caches is allocated to clients by the kmem_cache_alloc() function and freed by the kmem_cache_free() function.
kmem_cache_alloc() and kmem_cache_free() are not public DDI interfaces. Do NOT write code that relies on them, because they are subject to change or removal in future releases of illumos.
Caches whose name begins with “kmem_alloc_” implement the kernel's general memory allocation scheme. These caches provide memory to clients of kmem_alloc(9F) and kmem_zalloc(9F). Each of these caches satisfies requests whose size is greater than the buffer size of the next smaller cache and no greater than its own buffer size. For example, the kernel has kmem_alloc_8 and kmem_alloc_16 caches. In this case, the kmem_alloc_16 cache handles all client requests for 9-16 bytes of memory. Remember that the size of each buffer in the kmem_alloc_16 cache is 16 bytes, regardless of the size of the client request. In a 14 byte request, two bytes of the resulting buffer are unused, since the request is satisfied from the kmem_alloc_16 cache.
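The size-class rule described above can be sketched in a few lines of Python. This is an illustrative model, not allocator code: the table holds only the kmem_alloc_ buffer sizes that appear in this chapter's ::kmastat output, and the helper name cache_for_request is hypothetical.

```python
import bisect

# Buffer sizes of a few kmem_alloc_* caches, taken from the ::kmastat
# output shown later in this chapter (a real system has many more).
CACHE_SIZES = [8, 16, 24, 32, 40]

def cache_for_request(size):
    """Return (cache name, unused bytes) for a kmem_alloc()-style request.

    Hypothetical helper: picks the smallest cache whose buffer size is
    at least the requested size.
    """
    if size <= 0 or size > CACHE_SIZES[-1]:
        raise ValueError("size not covered by this toy table")
    bufsize = CACHE_SIZES[bisect.bisect_left(CACHE_SIZES, size)]
    return "kmem_alloc_%d" % bufsize, bufsize - size

# A 14-byte request lands in kmem_alloc_16 with two bytes unused.
print(cache_for_request(14))
```

Any request from 9 to 16 bytes maps to the same cache here, which is why the per-cache statistics cannot tell you the exact sizes that clients asked for.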
The last set of caches are those used internally by the kernel memory allocator for its own bookkeeping. These include those caches whose names start with “kmem_magazine_” or “kmem_va_”, the kmem_slab_cache, the kmem_bufctl_cache and others.
9.3. Kernel Memory Caches
This section explains how to find and examine kernel memory caches.
You can learn about the various kmem caches on the system by issuing the ::kmastat command:
    > ::kmastat
    cache                        buf    buf    buf    memory     alloc alloc
    name                        size in use  total    in use   succeed  fail
    ------------------------- ------ ------ ------ --------- --------- -----
    kmem_magazine_1                8     24   1020      8192        24     0
    kmem_magazine_3               16    141    510      8192       141     0
    kmem_magazine_7               32     96    255      8192        96     0
    ...
    kmem_alloc_8                   8   3614   3751     90112   9834113     0
    kmem_alloc_16                 16   2781   3072     98304   8278603     0
    kmem_alloc_24                 24    517    612     24576    680537     0
    kmem_alloc_32                 32    398    510     24576    903214     0
    kmem_alloc_40                 40    482    584     32768    672089     0
    ...
    thread_cache                 368    107    126     49152    669881     0
    lwp_cache                    576    107    117     73728       182     0
    turnstile_cache               36    149    292     16384    670506     0
    cred_cache                    96      6     73      8192   2677787     0
    ...
If you run ::kmastat you get a feel for what a “normal” system looks like. This will help you to spot excessively large caches on systems that are leaking memory. The results of ::kmastat will vary depending on the system you are running on, how many processes are running, and so forth.
Another way to list the various kmem caches is with the ::kmem_cache command:
    > ::kmem_cache
    ADDR     NAME                      FLAG    CFLAG  BUFSIZE  BUFTOTL
    70036028 kmem_magazine_1           0020   0e0000        8     1020
    700362a8 kmem_magazine_3           0020   0e0000       16      510
    70036528 kmem_magazine_7           0020   0e0000       32      255
    ...
    70039428 kmem_alloc_8              020f   000000        8     3751
    700396a8 kmem_alloc_16             020f   000000       16     3072
    70039928 kmem_alloc_24             020f   000000       24      612
    70039ba8 kmem_alloc_32             020f   000000       32      510
    7003a028 kmem_alloc_40             020f   000000       40      584
    ...
This command is useful because it maps cache names to addresses, and
provides the debugging flags for each cache in the FLAG column. It is important
to understand that the allocator's selection of debugging features is derived
on a per-cache basis from this set of flags. These are set in conjunction
with the global kmem_flags variable at cache creation time.
Setting kmem_flags while the system is running has no
effect on the debugging behavior, except for subsequently created caches (which
is rare after boot-up).
Next, walk the list of kmem caches directly using MDB's kmem_cache walker:
    > ::walk kmem_cache
    70036028
    700362a8
    70036528
    700367a8
    ...
This produces a list of pointers that correspond to each kmem cache in the kernel. To find out about a specific cache, apply the kmem_cache macro:
    > 0x70039928$<kmem_cache
    0x70039928:     lock
    0x70039928:     owner/waiters
                    0
    0x70039930:     flags           freelist        offset
                    20f             707c86a0        24
    0x7003993c:     global_alloc    global_free     alloc_fail
                    523             0               0
    0x70039948:     hash_shift      hash_mask       hash_table
                    5               1ff             70444858
    0x70039954:     nullslab
    0x70039954:     cache           base            next
                    70039928        0               702d5de0
    0x70039960:     prev            head            tail
                    707c86a0        0               0
    0x7003996c:     refcnt          chunks
                    -1              0
    0x70039974:     constructor     destructor      reclaim
                    0               0               0
    0x70039980:     private         arena           cflags
                    0               104444f8        0
    0x70039994:     bufsize         align           chunksize
                    24              8               40
    0x700399a0:     slabsize        color           maxcolor
                    8192            24              32
    0x700399ac:     slab_create     slab_destroy    buftotal
                    3               0               612
    0x700399b8:     bufmax          rescale         lookup_depth
                    612             1               0
    0x700399c4:     kstat           next            prev
                    702c8608        70039ba8        700396a8
    0x700399d0:     name    kmem_alloc_24
    0x700399f0:     bufctl_cache    magazine_cache  magazine_size
                    70037ba8        700367a8        15
    ...
Several important fields for debugging include bufsize, flags, and name. The name of the kmem_cache (in this case “kmem_alloc_24”) indicates its purpose in the system. The field bufsize indicates the size of each buffer in this cache; in this case, the cache is used for allocations of size 24 and smaller (requests small enough for kmem_alloc_16 or kmem_alloc_8 are served by those caches instead). The field flags indicates what debugging features are turned on for this cache. You can find the debugging flags listed in <sys/kmem_impl.h>. In this case flags is 0x20f, which is KMF_AUDIT | KMF_DEADBEEF | KMF_REDZONE | KMF_CONTENTS | KMF_HASH. This document explains each of the debugging features in subsequent sections.
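Decoding a flags word by hand is error-prone, so a small helper can be useful when reading ::kmem_cache output. The sketch below is an assumption-laden illustration: of the bit values used, only KMF_DEADBEEF (0x2) is stated in this chapter, and the others are taken from <sys/kmem_impl.h> as the author of this example remembers it, so check them against the header on your system. The function name decode_kmf is hypothetical.

```python
# Flag values as recalled from <sys/kmem_impl.h>; only KMF_DEADBEEF (0x2)
# is stated in this chapter, so verify the rest against your header.
KMF_FLAGS = {
    0x001: "KMF_AUDIT",
    0x002: "KMF_DEADBEEF",
    0x004: "KMF_REDZONE",
    0x008: "KMF_CONTENTS",
    0x200: "KMF_HASH",
}

def decode_kmf(flags):
    """Split a cache's flags word into known names plus leftover bits."""
    names = [name for bit, name in sorted(KMF_FLAGS.items()) if flags & bit]
    unknown = flags & ~sum(KMF_FLAGS)  # bits this small table does not name
    return names, unknown

names, unknown = decode_kmf(0x20f)
print(" | ".join(names), hex(unknown))
```

Any bits returned in the second element are flags that this small table does not cover, such as the 0x0020 value shown for the kmem_magazine_ caches.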
When you are interested in looking at buffers in a particular cache, you can walk the allocated and freed buffers in that cache directly:
    > 0x70039928::walk kmem
    704ba010
    702ba008
    704ba038
    702ba030
    ...
    > 0x70039928::walk freemem
    70a9ae50
    70a9ae28
    704bb730
    704bb2f8
    ...
MDB provides a shortcut to supplying the cache address to the kmem walker: a specific walker is provided for each kmem cache, and its name is the same as the name of the cache. For example:
    > ::walk kmem_alloc_24
    704ba010
    702ba008
    704ba038
    702ba030
    ...
    > ::walk thread_cache
    70b38080
    70aac060
    705c4020
    70aac1e0
    ...
Now you know how to iterate over the kernel memory allocator's internal
data structures and examine the most important members of the
kmem_cache data structure.
9.4. Detecting Memory Corruption
One of the primary debugging facilities of the allocator is that it includes algorithms to recognize data corruption quickly. When corruption is detected, the allocator immediately panics the system. This section describes how the allocator recognizes data corruption. You must understand this to be able to debug these problems. Memory abuse typically falls into one of the following categories:
Writing past the end of a buffer
Accessing uninitialized data
Continuing to use a freed buffer
Corrupting kernel memory
Keep these problems in mind as you read the next three sections. They will help you to understand the allocator's design, and enable you to diagnose problems more efficiently.
9.4.1. Freed Buffer Checking: 0xdeadbeef
When the KMF_DEADBEEF (0x2) bit is set in
the flags field of a
kmem_cache, the allocator tries to
make memory corruption easy to detect by writing a special pattern into all
freed buffers. This pattern is
0xdeadbeef. Since a typical
region of memory contains both allocated and freed memory, sections of each
kind of block will be interspersed. The following example is from the kmem_alloc_24 cache:
    0x70a9add8:     deadbeef        deadbeef
    0x70a9ade0:     deadbeef        deadbeef
    0x70a9ade8:     deadbeef        deadbeef
    0x70a9adf0:     feedface        feedface
    0x70a9adf8:     70ae3260        8440c68e
    0x70a9ae00:     5               4ef83
    0x70a9ae08:     0               0
    0x70a9ae10:     1               bbddcafe
    0x70a9ae18:     feedface        139d
    0x70a9ae20:     70ae3200        d1befaed
    0x70a9ae28:     deadbeef        deadbeef
    0x70a9ae30:     deadbeef        deadbeef
    0x70a9ae38:     deadbeef        deadbeef
    0x70a9ae40:     feedface        feedface
    0x70a9ae48:     70ae31a0        8440c54e
The buffers at 0x70a9add8 and 0x70a9ae28 are filled with 0xdeadbeefdeadbeef, which shows that these buffers are free. The buffer redzones are filled with 0xfeedfacefeedface, which indicates they are untouched (no buffer overrun has occurred). See the following section for an explanation of redzones. At 0x70a9ae00 an allocated buffer is located between the two free buffers.
9.4.2. Redzone: 0xfeedface
Note the pattern
0xfeedface in the buffer shown
in the previous section. This pattern is known as the redzone indicator.
This pattern enables the allocator (and a programmer debugging a problem)
to determine whether the boundaries of a buffer have been violated. Following
the redzone is some additional information. The content of that data depends
on other factors (see Memory Allocation Logging).
The redzone and its suffix are collectively called the buftag region. The figure “The Redzone” summarizes this information.
The buftag is appended to each buffer in a cache when any of the KMF_AUDIT, KMF_DEADBEEF, or KMF_REDZONE flags is set in that buffer's cache. The content of the buftag depends on whether KMF_AUDIT is set.
Decomposing the memory region presented above into distinct buffers is now simple:
    0x70a9add8:     deadbeef        deadbeef  \
    0x70a9ade0:     deadbeef        deadbeef   +- User Data (free)
    0x70a9ade8:     deadbeef        deadbeef  /
    0x70a9adf0:     feedface        feedface  -- REDZONE
    0x70a9adf8:     70ae3260        8440c68e  -- Debugging Data
    0x70a9ae00:     5               4ef83     \
    0x70a9ae08:     0               0          +- User Data (allocated)
    0x70a9ae10:     1               bbddcafe  /
    0x70a9ae18:     feedface        139d      -- REDZONE
    0x70a9ae20:     70ae3200        d1befaed  -- Debugging Data
    0x70a9ae28:     deadbeef        deadbeef  \
    0x70a9ae30:     deadbeef        deadbeef   +- User Data (free)
    0x70a9ae38:     deadbeef        deadbeef  /
    0x70a9ae40:     feedface        feedface  -- REDZONE
    0x70a9ae48:     70ae31a0        8440c54e  -- Debugging Data
The buffers at 0x70a9add8 and 0x70a9ae28 are filled with 0xdeadbeefdeadbeef, which shows that these buffers are free. The buffer redzones are filled with 0xfeedfacefeedface, which indicates they are untouched (no buffer overrun has occurred).
0xbaddcafe: Buffer is allocated but uninitialized (see Uninitialized Data: 0xbaddcafe).
0xdeadbeef: Buffer is free.
0xfeedface: Buffer limits were respected (no overflow).
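The decomposition of the memory region above follows from simple address arithmetic, which the following Python sketch reproduces. The sizes are inferred from this chapter's 32-bit example (bufsize 24 and chunksize 40 in the kmem_cache output, so 16 bytes of buftag); they are assumptions about this dump, not constants of the allocator, and the function name layout is hypothetical.

```python
BUFSIZE = 24      # bufsize of kmem_alloc_24
REDZONE_SIZE = 8  # two 32-bit redzone words in this 32-bit dump
BUFTAG_SIZE = 16  # redzone plus two 32-bit debugging words (40 - 24)

def layout(buf_addr):
    """Return the addresses of the regions belonging to one buffer."""
    redzone = buf_addr + BUFSIZE
    return {
        "user_data": buf_addr,
        "redzone": redzone,
        "debug_data": redzone + REDZONE_SIZE,
        "next_buffer": buf_addr + BUFSIZE + BUFTAG_SIZE,
    }

# The three buffers visible in the example dump.
for addr in (0x70a9add8, 0x70a9ae00, 0x70a9ae28):
    print({name: hex(value) for name, value in layout(addr).items()})
```

Each buffer's next_buffer value equals the address of the buffer that follows it in the dump, which confirms the 40-byte stride.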
In the allocated buffer beginning at 0x70a9ae00, the situation is different. Recall from Allocator Basics that there are two allocation types:
The client requested memory using kmem_cache_alloc(9F), in which case the size of the requested buffer is equal to the bufsize of the cache.
The client requested memory using kmem_alloc(9F), in which case the size of the requested buffer is less than or equal to the bufsize of the cache. For example, a request for 20 bytes will be fulfilled from the kmem_alloc_24 cache. The allocator enforces the buffer boundary by placing a marker, the redzone byte, immediately following the client data:
    0x70a9ae00:     5               4ef83     \
    0x70a9ae08:     0               0          +- User Data (allocated)
    0x70a9ae10:     1               bbddcafe  /
    0x70a9ae18:     feedface        139d      -- REDZONE
    0x70a9ae20:     70ae3200        d1befaed  -- Debugging Data
Note the 0xfeedface value at 0x70a9ae18, followed by a 32-bit word containing what seems to be a random value. This number is actually an encoded representation of the size of the buffer. To decode this number and find the size of the allocated buffer, use the formula:
    size = redzone_value / 251
So, in this example,
    size = 0x139d / 251 = 20 bytes.
This indicates that the buffer requested was of size 20 bytes. The division is integer division: 0x139d is 5021 in decimal, and 5021 / 251 is 20 with a remainder of 1. The allocator performs this decoding operation and finds that the redzone byte should be at offset 20. The redzone byte is the hex pattern 0xbb, which is present at 0x70a9ae14 (0x70a9ae00 + 0t20) as expected.
The figure “Redzone Byte” shows the general form of this memory layout.
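The redzone check on the example buffer can be replayed in Python on the words copied from the dump. This is a sketch under stated assumptions: the words are packed big-endian because the dump comes from a SPARC system, the encoded size is taken from the second redzone word as in this 32-bit example, and the function name check_redzone is hypothetical.

```python
import struct

REDZONE_BYTE = 0xbb

# The ten 32-bit words at 0x70a9ae00 from the dump (big-endian SPARC):
# 24 bytes of user data, two redzone words, two debugging words.
words = [0x5, 0x4ef83, 0x0, 0x0, 0x1, 0xbbddcafe,
         0xfeedface, 0x139d, 0x70ae3200, 0xd1befaed]
raw = struct.pack(">10I", *words)

def check_redzone(raw, bufsize):
    """Return (requested size, True if the redzone byte is in place)."""
    # The second redzone word holds the encoded request size.
    (encoded,) = struct.unpack_from(">I", raw, bufsize + 4)
    size = encoded // 251
    return size, raw[size] == REDZONE_BYTE

print(check_redzone(raw, 24))
```

For this buffer the decoded size is 20, and the byte at offset 20 is the leading 0xbb of the word 0xbbddcafe.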
If the allocation size is the same as the bufsize of the cache, the redzone byte overwrites the first byte of the redzone itself, as shown in the figure “Redzone Byte at the Beginning of the Redzone”.
This overwriting results in the first 32-bit word of the redzone being 0xbbedface or 0xfeedfabb, depending on the endianness of the hardware on which the system is running.
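The two possible values follow from where the first byte of the word sits in memory. The Python sketch below stores 0xfeedface in each byte order, overwrites the byte at the lowest address with 0xbb, and reads the word back. The function name is hypothetical.

```python
import struct

def redzone_word_after_overwrite(byteorder):
    """First redzone word after 0xbb overwrites its first byte in memory."""
    fmt = ">I" if byteorder == "big" else "<I"
    mem = bytearray(struct.pack(fmt, 0xfeedface))  # word as laid out in memory
    mem[0] = 0xbb                                  # redzone byte, lowest address
    return struct.unpack(fmt, bytes(mem))[0]

print(hex(redzone_word_after_overwrite("big")))     # 0xbbedface
print(hex(redzone_word_after_overwrite("little")))  # 0xfeedfabb
```

On a big-endian machine the lowest address holds the most significant byte, so the leading 0xfe is replaced. On a little-endian machine the trailing 0xce is replaced instead.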
Why is the allocation size encoded this way? To encode the size, the allocator uses the formula (251 * size + 1). When the size is decoded, the integer division discards the remainder of 1. However, the addition of 1 is valuable because the allocator can check whether the encoded value is valid by testing whether (encoded_value % 251 == 1). In this way, the allocator defends against corruption of the redzone byte index.
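The encoding and its validity check are small enough to express directly. The following Python sketch implements the formula given in the text; the function names are illustrative and are not taken from the allocator source.

```python
def encode_size(size):
    """Encode a request size as the text describes: 251 * size + 1."""
    return 251 * size + 1

def decode_size(encoded):
    """Return the request size, or None if the encoding is not valid."""
    if encoded % 251 != 1:
        return None  # the size word itself has been corrupted
    return encoded // 251

print(hex(encode_size(20)), decode_size(0x139d), decode_size(0x139e))
```

A stray write that changes the encoded word will rarely leave a value whose remainder modulo 251 is still 1, so most corruption of the size word is caught before the allocator trusts the decoded offset.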
9.4.3. Uninitialized Data: 0xbaddcafe
You might be wondering what the suspicious 0xbbddcafe at address 0x70a9ae14 was before the redzone byte got placed over the first byte in the word. It was 0xbaddcafe. When the KMF_DEADBEEF flag is set in the cache, allocated but uninitialized memory is filled with the 0xbaddcafe pattern. When the allocator performs an allocation, it loops across the words of the buffer and verifies that each word contains 0xdeadbeef, then fills that word with 0xbaddcafe.
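The verify-then-fill loop described above can be modeled on a list of 32-bit words. This is a toy model in Python, not the allocator's code: the function name allocate is hypothetical, and raising an exception stands in for the panic that the real allocator would trigger.

```python
FREE_PATTERN = 0xdeadbeef
UNINIT_PATTERN = 0xbaddcafe

def allocate(words):
    """Model the allocation-time check on a free buffer of 32-bit words.

    Raises if any word is not 0xdeadbeef; otherwise refills the buffer
    with 0xbaddcafe in place and returns it.
    """
    for i, word in enumerate(words):
        if word != FREE_PATTERN:
            raise RuntimeError(
                "buffer modified after being freed at offset %#x" % (4 * i))
        words[i] = UNINIT_PATTERN
    return words

print([hex(w) for w in allocate([FREE_PATTERN] * 6)])
```

Passing a buffer in which one word has been overwritten raises an error naming the byte offset of that word, which mirrors the offset reported in the allocator's panic message.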
A system can panic with a message such as:
    panic[cpu1]/thread=e1979420: BAD TRAP: type=e (Page Fault)
    rp=ef641e88 addr=baddcafe occurred in module "unix" due to an
    illegal access to a user address
In this case, the address that caused the fault was 0xbaddcafe: the panicking thread has accessed some data that was never initialized.
9.4.4. Associating Panic Messages With Failures
The kernel memory allocator emits panic messages corresponding to the failure modes described earlier. For example, a system can panic with a message such as:
    kernel memory allocator: buffer modified after being freed
    modification occurred at offset 0x30
The allocator was able to detect this case because it tried to validate that the buffer in question was filled with 0xdeadbeef. At offset 0x30, this condition was not met. Since this condition indicates memory corruption, the allocator panicked the system.
Another example failure message is:
kernel memory allocator: redzone violation: write past end of buffer
The allocator was able to detect this case because it tried to validate
that the redzone byte (
0xbb) was in the location it determined
from the redzone size encoding. It failed to find the signature byte in the
correct location. Since this indicates memory corruption, the allocator panicked
the system. Other allocator panic messages are discussed later.
9.5. Memory Allocation Logging
This section explains the logging features of the kernel memory allocator and how you can employ them to debug system crashes.
9.5.1. Buftag Data Integrity
As explained earlier, the second half of each buftag contains extra information about the corresponding buffer. Some of this data is debugging information, and some is data private to the allocator. While this auxiliary data can take several different forms, it is collectively known as “Buffer Control” or bufctl data.
However, the allocator needs to know whether a buffer's bufctl pointer is valid, since this pointer might also have been corrupted by malfunctioning code. The allocator confirms the integrity of its auxiliary pointer by storing the pointer and an encoded version of that pointer, and then cross-checking the two versions.
As shown in the figure “Extra Debugging Data in the Buftag”, these pointers are the bcp (buffer control pointer) and bxstat (buffer control XOR status). The allocator arranges bcp and bxstat so that the expression bcp XOR bxstat equals a well-known value.
In the event that one or both of these pointers becomes corrupted, the allocator can easily detect such corruption and panic the system. When a buffer is allocated, bcp XOR bxstat = 0xa110c8ed (“allocated”). When a buffer is free, bcp XOR bxstat = 0xf4eef4ee (“freefree”).
You might find it helpful to re-examine the example provided in Freed Buffer Checking: 0xdeadbeef, in order to confirm that the buftag pointers shown there are consistent.
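That consistency check can be carried out mechanically. The Python sketch below applies the XOR test to the three (bcp, bxstat) pairs that follow the redzones in the dump from Freed Buffer Checking: 0xdeadbeef. The constant and function names are hypothetical; the two magic values are the ones given in the text.

```python
ALLOCATED_MAGIC = 0xa110c8ed  # "allocated"
FREE_MAGIC = 0xf4eef4ee       # "freefree"

def buftag_state(bcp, bxstat):
    """Classify a buftag from its bufctl pointer and XOR status word."""
    check = bcp ^ bxstat
    if check == ALLOCATED_MAGIC:
        return "allocated"
    if check == FREE_MAGIC:
        return "free"
    return "corrupt"

# (bcp, bxstat) pairs of the three buffers in the earlier dump.
for bcp, bxstat in [(0x70ae3260, 0x8440c68e),
                    (0x70ae3200, 0xd1befaed),
                    (0x70ae31a0, 0x8440c54e)]:
    print(hex(bcp), buftag_state(bcp, bxstat))
```

The first and third buffers come out as free and the second as allocated, in agreement with the 0xdeadbeef fill and the user data seen in those buffers.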
In the event that the allocator finds a corrupt buftag, it panics the system and produces a message similar to the following:
    kernel memory allocator: boundary tag corrupted
        bcp ^ bxstat = 0xffeef4ee, should be f4eef4ee
Remember, if bcp is corrupt, it is still possible to retrieve its value by taking the value of bxstat XOR 0xf4eef4ee or bxstat XOR 0xa110c8ed, depending on whether the buffer is allocated or free.
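Recovering a damaged bcp is a single XOR, as this Python sketch shows. It assumes that bxstat itself is intact, and it returns both candidates because the buffer state may not be known in advance. The names are hypothetical; the bxstat value is that of the first free buffer in the dump from Freed Buffer Checking: 0xdeadbeef.

```python
ALLOCATED_MAGIC = 0xa110c8ed
FREE_MAGIC = 0xf4eef4ee

def candidate_bcps(bxstat):
    """Return the bufctl pointer implied by bxstat for each buffer state."""
    return {
        "allocated": bxstat ^ ALLOCATED_MAGIC,
        "free": bxstat ^ FREE_MAGIC,
    }

# bxstat of the free buffer at 0x70a9add8.
print({state: hex(bcp) for state, bcp in candidate_bcps(0x8440c68e).items()})
```

Usually only one of the two candidates looks like a plausible kernel address, and that candidate can then be examined as a bufctl.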
9.5.2. The bufctl Pointer
The buffer control (bufctl) pointer contained in the buftag region can have different meanings, depending on the cache's kmem_flags. The behavior toggled by the KMF_AUDIT flag is of particular interest: when the KMF_AUDIT flag is not set, the kernel memory allocator allocates a kmem_bufctl_t structure for each buffer. This structure contains some minimal accounting information about each buffer. When the KMF_AUDIT flag is set, the allocator instead allocates a kmem_bufctl_audit_t, an extended version of the kmem_bufctl_t.
This section presumes the KMF_AUDIT flag is set. For caches that do not have this bit set, the amount of available debugging information is reduced.
The kmem_bufctl_audit_t (bufctl_audit for short) contains additional information about the last transaction that occurred on this buffer. The following example shows how to apply the bufctl_audit macro to examine an audit record. The buffer shown is the example buffer used in Detecting Memory Corruption:
    > 0x70a9ae00,5/KKn
    0x70a9ae00:     5               4ef83
                    0               0
                    1               bbddcafe
                    feedface        139d
                    70ae3200        d1befaed
Using the techniques presented above, it is easy to see that 0x70ae3200 points to the bufctl_audit record: it is the first pointer following the redzone. To examine the bufctl_audit record it points to, apply the bufctl_audit macro:
    > 0x70ae3200$<bufctl_audit
    0x70ae3200:     next            addr            slab
                    70378000        70a9ae00        707c86a0
    0x70ae320c:     cache           timestamp       thread
                    70039928        e1bd0e26afe     70aac4e0
    0x70ae321c:     lastlog         contents        stackdepth
                    7011c7c0        7018a0b0        4
    0x70ae3228:
                    kmem_zalloc+0x30
                    pid_assign+8
                    getproc+0x68
                    cfork+0x60
The addr field is the address of the buffer corresponding to this bufctl_audit record. This is the original address: 0x70a9ae00. The cache field points at the kmem_cache that allocated this buffer. You can use the ::kmem_cache dcmd to examine it as follows:
    > 0x70039928::kmem_cache
    ADDR     NAME                      FLAG    CFLAG  BUFSIZE  BUFTOTL
    70039928 kmem_alloc_24             020f   000000       24      612
The timestamp field represents the time this transaction occurred. This time is expressed in the same manner as gethrtime(3C).
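Because gethrtime(3C) reports nanoseconds since an arbitrary point in the past, a single timestamp means little on its own, but the difference between two timestamps is an elapsed time. A minimal Python sketch, using two consecutive timestamps from the ::kmem_log output shown later in this chapter (the function name is hypothetical):

```python
def hrtime_delta_ns(later_hex, earlier_hex):
    """Nanoseconds between two timestamps printed in hex by MDB."""
    return int(later_hex, 16) - int(earlier_hex, 16)

# Two consecutive transaction timestamps from the ::kmem_log output.
delta = hrtime_delta_ns("e1bd7abe721", "e1bd7aa86fa")
print(delta, "ns =", delta / 1e6, "ms")
```

Comparing timestamps in this way helps to order the transactions on a buffer and to judge how close together two events were.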
thread is a pointer to the thread that performed the last transaction on this buffer. The lastlog and contents pointers point to locations in the allocator's transaction logs. These logs are discussed in detail in Allocator Logging Facility.
Typically, the most useful piece of information provided by bufctl_audit is the stack trace recorded at the point at which the transaction took place. In this case, the transaction was an allocation called as part of executing fork(2).
9.6. Advanced Memory Analysis
This section describes facilities for performing advanced memory analysis, including locating memory leaks and sources of data corruption.
9.6.1. Finding Memory Leaks
The ::findleaks dcmd provides powerful and efficient detection of memory leaks in kernel crash dumps where the full set of kmem debug features has been enabled. The first execution of ::findleaks processes the dump for memory leaks (this can take a few minutes), and then coalesces the leaks by the allocation stack trace. The findleaks report shows a bufctl address and the topmost stack frame for each memory leak that was identified:
    > ::findleaks
    CACHE     LEAKED   BUFCTL CALLER
    70039ba8       1 703746c0 pm_autoconfig+0x708
    70039ba8       1 703748a0 pm_autoconfig+0x708
    7003a028       1 70d3b1a0 sigaddq+0x108
    7003c7a8       1 70515200 pm_ioctl+0x187c
    ------------------------------------------------------
       Total       4 buffers, 376 bytes
Using the bufctl pointers, you can obtain the complete stack backtrace of the allocation by applying the bufctl_audit macro:
    > 70d3b1a0$<bufctl_audit
    0x70d3b1a0:     next            addr            slab
                    70a049c0        70d03b28        70bb7480
    0x70d3b1ac:     cache           timestamp       thread
                    7003a028        13f7cf63b3      70b38380
    0x70d3b1bc:     lastlog         contents        stackdepth
                    700d6e60        0               5
    0x70d3b1c8:
                    kmem_alloc+0x30
                    sigaddq+0x108
                    sigsendproc+0x210
                    sigqkill+0x90
                    kill+0x28
The programmer can usually use the bufctl_audit information and the allocation stack trace to quickly track down the code path that leaks the given buffer.
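When a report is long, it can help to summarize it by calling function before examining individual bufctls. The Python sketch below does this for the four rows of the report above, pasted in as text. It is a post-processing illustration only: the helper name is hypothetical, and it assumes the LEAKED column is a decimal count.

```python
from collections import Counter

# Rows copied from the ::findleaks report:
# CACHE, LEAKED count, BUFCTL, CALLER.
REPORT = """\
70039ba8 1 703746c0 pm_autoconfig+0x708
70039ba8 1 703748a0 pm_autoconfig+0x708
7003a028 1 70d3b1a0 sigaddq+0x108
7003c7a8 1 70515200 pm_ioctl+0x187c
"""

def leaks_by_function(report):
    """Sum leaked-buffer counts per calling function (offset stripped)."""
    totals = Counter()
    for line in report.splitlines():
        cache, leaked, bufctl, caller = line.split()
        totals[caller.split("+")[0]] += int(leaked)
    return totals

for func, count in leaks_by_function(REPORT).most_common():
    print(func, count)
```

Functions that account for many leaked buffers are usually the best place to start reading code.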
9.6.2. Finding References to Data
When trying to diagnose a memory corruption problem, you should
know what other kernel entities hold a copy of a particular pointer. This
is important because it can reveal which thread accessed a data structure
after it was freed. It can also make it easier to understand what kernel
entities are sharing knowledge of a particular (valid) data item. The ::whatis and ::kgrep dcmds can be used to answer these questions. You can apply ::whatis to a value of interest:
    > 0x705d8640::whatis
    705d8640 is 705d8640+0, allocated from streams_mblk
In this case, 0x705d8640 is revealed to be a pointer to a STREAMS mblk structure. To see the entire allocation tree, use ::whatis -a instead:
    > 0x705d8640::whatis -a
    705d8640 is 705d8640+0, allocated from streams_mblk
    705d8640 is 705d8000+640, allocated from kmem_va_8192
    705d8640 is 705d8000+640 from kmem_default vmem arena
    705d8640 is 705d2000+2640 from kmem_va vmem arena
    705d8640 is 705d2000+2640 from heap vmem arena
This reveals that the allocation also appears in the kmem_va_8192 cache. The kmem_va_8192 cache is a kmem cache that is fronting the kmem_va vmem arena. It also shows the full stack of vmem allocations. This illustrates the hierarchical nature of memory allocations in the system; in general, you can determine the type of object referred to by the given address from the name of the most specific kmem cache. The complete list of kmem caches and vmem arenas is displayed by the ::kmastat dcmd.
You can use ::kgrep to locate other kernel addresses that contain a pointer to this mblk:
    > 0x705d8640::kgrep
    400a3720
    70580d24
    7069d7f0
    706a37ec
    706add34
and investigate them by applying ::whatis again:
    > 400a3720::whatis
    400a3720 is in thread 7095b240's stack
    > 706add34::whatis
    706add34 is 706add20+14, allocated from streams_dblk_120
Here one pointer is located on the stack of a known kernel thread, and another is the mblk pointer inside of the corresponding STREAMS dblk structure.
9.6.3. Finding Corrupt Buffers With ::kmem_verify
MDB's ::kmem_verify dcmd implements most of the same checks that the kmem allocator does at runtime. ::kmem_verify can be invoked in order to scan every kmem cache with appropriate kmem_flags, or to examine a particular cache.
Here is an example of using ::kmem_verify to isolate a problem:
    > ::kmem_verify
    Cache Name                      Addr     Cache Integrity
    kmem_alloc_8                    70039428 clean
    kmem_alloc_16                   700396a8 clean
    kmem_alloc_24                   70039928 1 corrupt buffer
    kmem_alloc_32                   70039ba8 clean
    kmem_alloc_40                   7003a028 clean
    kmem_alloc_48                   7003a2a8 clean
    ...
It is easy to see here that the kmem_alloc_24 cache contains what ::kmem_verify believes to be a problem. With an explicit cache argument, the ::kmem_verify dcmd provides more detailed information about the problem:
    > 70039928::kmem_verify
    Summary for cache 'kmem_alloc_24'
      buffer 702babc0 (free) seems corrupted, at 702babc0
The next step is to examine the buffer which ::kmem_verify believes to be corrupt:
    > 0x702babc0,5/KKn
    0x702babc0:     0               deadbeef
                    deadbeef        deadbeef
                    deadbeef        deadbeef
                    feedface        feedface
                    703785a0        84d9714e
The reason that ::kmem_verify flagged this buffer is now clear: The first word in the buffer (at 0x702babc0) should probably be filled with the 0xdeadbeef pattern, not with a 0. At this point, examining the bufctl_audit for this buffer might yield clues about what code recently wrote to the buffer, indicating where and when it was freed.
Another useful technique in this situation is to use ::kgrep to search the address space for references to address 0x702babc0, in order to discover what threads or data structures are still holding references to this freed data.
9.6.4. Allocator Logging Facility
When KMF_AUDIT is set for a cache, the kernel memory allocator maintains a log that records the recent history of its activity. This transaction log records bufctl_audit records. If the KMF_AUDIT and the KMF_CONTENTS flags are both set, the allocator generates a contents log that records portions of the actual contents of allocated and freed buffers. The structure and use of the contents log is outside the scope of this document. The transaction log is discussed in this section.
MDB provides several facilities for displaying the transaction log. The simplest is ::walk kmem_log, which prints out the transactions in the log as a series of bufctl_audit_t pointers:
    > ::walk kmem_log
    70128340
    701282e0
    70128280
    70128220
    701281c0
    ...
    > 70128340$<bufctl_audit
    0x70128340:     next            addr            slab
                    70ac1d40        70bc4ea8        70bb7c00
    0x7012834c:     cache           timestamp       thread
                    70039428        e1bd7abe721     70aacde0
    0x7012835c:     lastlog         contents        stackdepth
                    701282e0        7018f340        4
    0x70128368:
                    kmem_cache_free+0x24
                    nfs3_sync+0x3c
                    vfs_sync+0x84
                    syssync+4
A more elegant way to view the entire transaction log is by using the ::kmem_log command:
    > ::kmem_log
    CPU ADDR     BUFADDR         TIMESTAMP THREAD
      0 70128340 70bc4ea8      e1bd7abe721 70aacde0
      0 701282e0 70bc4ea8      e1bd7aa86fa 70aacde0
      0 70128280 70bc4ea8      e1bd7aa27dd 70aacde0
      0 70128220 70bc4ea8      e1bd7a98a6e 70aacde0
      0 701281c0 70d03738      e1bd7a8e3e0 70aacde0
      ...
      0 70127140 70cf78a0      e1bd78035ad 70aacde0
      0 701270e0 709cf6c0      e1bd6d2573a 40033e60
      0 70127080 70cedf20      e1bd6d1e984 40033e60
      0 70127020 70b09578      e1bd5fc1791 40033e60
      0 70126fc0 70cf78a0      e1bd5fb6b5a 40033e60
      0 70126f60 705ed388      e1bd5fb080d 40033e60
      0 70126f00 705ed388      e1bd551ff73 70aacde0
      ...
The output of ::kmem_log is sorted in descending order by timestamp. The ADDR column is the bufctl_audit structure corresponding to that transaction; BUFADDR points to the actual buffer.
These figures represent transactions on buffers (both allocations and frees). When a particular buffer is corrupted, it can be helpful to locate that buffer in the transaction log, then determine in which other transactions the transacting thread was involved. This can help to assemble a picture of the sequence of events that occurred prior to and after the allocation (or free) of a buffer.
You can employ the ::bufctl command to filter the output of walking the transaction log. The ::bufctl -a command filters the buffers in the transaction log by buffer address. This example filters on buffer 0x70b09578:
    > ::walk kmem_log | ::bufctl -a 0x70b09578
    ADDR     BUFADDR    TIMESTAMP THREAD   CALLER
    70127020 70b09578 e1bd5fc1791 40033e60 biodone+0x108
    70126e40 70b09578 e1bd55062da 70aacde0 pageio_setup+0x268
    70126de0 70b09578 e1bd52b2317 40033e60 biodone+0x108
    70126c00 70b09578 e1bd497ee8e 70aacde0 pageio_setup+0x268
    70120480 70b09578 e1bd21c5e2a 70aacde0 elfexec+0x9f0
    70120060 70b09578 e1bd20f5ab5 70aacde0 getelfhead+0x100
    7011ef20 70b09578 e1bd1e9a1dd 70aacde0 ufs_getpage_miss+0x354
    7011d720 70b09578 e1bd1170dc4 70aacde0 pageio_setup+0x268
    70117d80 70b09578 e1bcff6ff27 70bc2480 elfexec+0x9f0
    70117960 70b09578 e1bcfea4a9f 70bc2480 getelfhead+0x100
    ...
This example illustrates that a particular buffer can be used in numerous transactions.
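The same kind of filtering can be applied offline to log output that has been saved as text. The Python sketch below parses a few rows copied from the ::kmem_log output above and selects the transactions on one buffer. It only post-processes text and does not talk to MDB; the function names are hypothetical.

```python
# A few rows copied from the ::kmem_log output:
# CPU, ADDR (bufctl_audit), BUFADDR, TIMESTAMP, THREAD.
LOG = """\
0 70128340 70bc4ea8 e1bd7abe721 70aacde0
0 701282e0 70bc4ea8 e1bd7aa86fa 70aacde0
0 701281c0 70d03738 e1bd7a8e3e0 70aacde0
0 70127020 70b09578 e1bd5fc1791 40033e60
0 70126fc0 70cf78a0 e1bd5fb6b5a 40033e60
"""

def parse_log(text):
    """Turn whitespace-separated log rows into dictionaries of integers."""
    keys = ("cpu", "addr", "bufaddr", "timestamp", "thread")
    rows = []
    for line in text.splitlines():
        rows.append({k: int(v, 16) for k, v in zip(keys, line.split())})
    return rows

def transactions_for(rows, bufaddr):
    """Keep only the transactions on one buffer, newest first."""
    hits = [r for r in rows if r["bufaddr"] == bufaddr]
    return sorted(hits, key=lambda r: r["timestamp"], reverse=True)

for row in transactions_for(parse_log(LOG), 0x70bc4ea8):
    print(hex(row["addr"]), hex(row["timestamp"]), hex(row["thread"]))
```

Filtering on the thread field instead of bufaddr gives a per-thread view of the same rows.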
Remember that the kmem transaction log is an incomplete record of the transactions made by the kernel memory allocator. Older entries in the log are evicted as needed in order to keep the size of the log constant.
The ::allocdby and ::freedby dcmds provide a convenient way to summarize transactions associated with a particular thread. Here is an example of listing the recent allocations performed by thread 0x70aacde0:
    > 0x70aacde0::allocdby
    BUFCTL      TIMESTAMP CALLER
    70d4d8c0  e1edb14511a allocb+0x88
    70d4e8a0  e1edb142472 dblk_constructor+0xc
    70d4a240  e1edb13dd4f allocb+0x88
    70d4e840  e1edb13aeec dblk_constructor+0xc
    70d4d860  e1ed8344071 allocb+0x88
    70d4e7e0  e1ed8342536 dblk_constructor+0xc
    70d4a1e0  e1ed82b3a3c allocb+0x88
    70a53f80  e1ed82b0b91 dblk_constructor+0xc
    70d4d800  e1e9b663b92 allocb+0x88
By examining the bufctl_audit records, you can understand the recent activities of a particular thread.