MDB Language Syntax
This chapter describes the MDB language syntax, operators, and rules for command and symbol name resolution.
3.1. Syntax
MDB processes commands from standard input. If standard input is a terminal, MDB provides terminal editing capabilities. MDB can also process commands from macro files and from dcmd pipelines, as described below.
A typical command performs two steps:
- Compute the value of an expression. This value is typically a memory address in the target. The current address location is referred to as dot. Use the dot or period character (.) to reference the value of the current address.
- Apply a dcmd to the computed address.
A metacharacter is a newline, space, or tab character, or one of the following characters:
[ ] | ! / \ ? = > $ : ;
A blank is a TAB or a SPACE. A word is a sequence of characters separated by one or more non-quoted metacharacters. An identifier is a sequence of letters, digits, underbars, periods, or back quotes beginning with a letter, underbar, or period. Identifiers are used as the names of symbols, variables, dcmds, and walkers. Commands are delimited by a NEWLINE or semicolon (;).
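The identifier rule and the metacharacter set can be checked mechanically. The following is a minimal Python sketch, not MDB's actual lexer. It assumes ASCII letters, and the names METACHARACTERS, is_metacharacter, and is_identifier are hypothetical.

```python
import re

# Metacharacters listed in the Syntax section, plus NEWLINE, SPACE, and TAB.
METACHARACTERS = frozenset("[]|!/\\?=>$:;") | frozenset(" \t\n")

# Letters, digits, underbars, periods, or back quotes, beginning with a
# letter, underbar, or period (ASCII letters assumed in this sketch).
IDENTIFIER_RE = re.compile(r"[A-Za-z_.][A-Za-z0-9_.`]*")


def is_metacharacter(ch: str) -> bool:
    """True if ch terminates an unquoted word."""
    return ch in METACHARACTERS


def is_identifier(word: str) -> bool:
    """True if the whole word satisfies the identifier rule."""
    return IDENTIFIER_RE.fullmatch(word) is not None
```

For example, is_identifier("specfs`_init") returns True. By contrast, is_identifier("9lives") returns False because an identifier cannot begin with a digit.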
A dcmd is denoted by one of the following words or metacharacters:
/ \ ? = > $character :character ::identifier
Dcmds named by metacharacters or prefixed by a single dollar sign ($) or colon character (:) are provided as built-in operators. These dcmds provide complete compatibility with the command set of the legacy adb(1) utility.
After a dcmd has been parsed, the /, \, ?, =, >, $, and : characters are no longer recognized as metacharacters until the termination of the argument list.
A simple-command is a dcmd followed by a sequence of zero or more blank-separated words. The words are passed as arguments to the invoked dcmd, except as specified under Arithmetic Expansion and Quoting. Each dcmd returns an exit status indicating one of the following outcomes:
- The dcmd succeeded.
- The dcmd failed.
- The dcmd was invoked with invalid arguments.
A pipeline is a sequence of one or more simple-commands, each separated by the vertical bar or pipe character (|). After the pipeline has been parsed, each dcmd is invoked in order from left to right. Each dcmd's output is processed and stored as described in Dcmd Pipelines. After the first dcmd in the pipeline is complete, its processed output is used as input for the second dcmd in the pipeline. When the second dcmd is complete, its output is used as input for the third dcmd in the pipeline, and so on. If any dcmd does not return a successful exit status, the pipeline is aborted.
An expression is a sequence of words that is evaluated to compute a 64-bit unsigned integer value. The words are evaluated using the rules described in Arithmetic Expansion.
3.2. Commands
A command is one of the following:
- pipeline [ ! word ... ] [ ; ]
-
A simple-command or pipeline can optionally be followed by the exclamation point or bang character (!), indicating that the debugger should open a pipe(2). The standard output of the last dcmd in the MDB pipeline is sent to an external process created by executing $SHELL -c followed by the string formed by concatenating the words after the ! character. For more details, refer to Shell Escapes.
- expression pipeline [ ! word ... ] [ ; ]
-
A simple-command or pipeline can be prefixed with an expression. Before the pipeline executes, dot is set to the value of the expression, so any occurrence of the dot or period character (.) in the pipeline evaluates to that value.
- expression1 , expression2 pipeline [ ! word ... ] [ ; ]
-
A simple-command or pipeline can be prefixed with two expressions. The value of the first expression is the new value of dot. The value of the second expression is a repeat count for the first dcmd in the pipeline: the first dcmd is executed expression2 times before the next dcmd in the pipeline is executed. The repeat count applies only to the first dcmd in the pipeline.
- , expression pipeline [ ! word ... ] [ ; ]
-
If the first expression is omitted, dot is not modified. The value of the second expression (the expression after the comma character) is used in exactly the same way as expression2 above.
- expression [ ! word ... ] [ ; ]
-
A command can consist of only an arithmetic expression. The value of the expression is the new value of dot. The previous dcmd pipeline is re-executed using the new value of dot.
- expression1 , expression2 [ ! word ... ] [ ; ]
-
A command can consist of only a dot expression and a repeat count expression. The value of expression1 is the new value of dot. The previous dcmd pipeline is re-executed expression2 times using the new value of dot.
- , expression [ ! word ... ] [ ; ]
-
If the first expression is omitted, dot is not modified. The value of the second expression (the expression after the comma character) is used in exactly the same way as expression2 above.
- ! word ... [ ; ]
-
If the command begins with the ! character, no dcmds are executed. The debugger executes $SHELL -c followed by the string formed by concatenating the words after the ! character.
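Several of the command forms above share the optional expression1 , expression2 prefix, which can be modeled as a small parsing step. This Python sketch assumes each expression is a single integer in the default base (hexadecimal), whereas real MDB evaluates full arithmetic expressions here. The name parse_prefix is hypothetical. A result of None means dot is left unmodified, and the repeat count defaults to 1 when no second expression is given.

```python
from typing import Optional, Tuple


def parse_prefix(prefix: str) -> Tuple[Optional[int], int]:
    """Split an 'expression1,expression2' prefix into (new_dot, count).

    Sketch only: each expression must be a plain integer in the default
    base (hexadecimal); Python's int(..., 16) also accepts a 0x prefix.
    """
    first, sep, second = prefix.partition(",")
    # An omitted first expression leaves dot unmodified (None).
    new_dot = int(first, 16) if first.strip() else None
    # Without a second expression, the first dcmd runs once.
    count = int(second, 16) if sep and second.strip() else 1
    return new_dot, count
```

For example, parse_prefix(",a") returns (None, 10). Dot is unchanged and the repeat count is 0xa.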
3.3. Comments
A word that begins with two forward slash characters (//) causes that word and all the subsequent characters up to a NEWLINE to be ignored.
3.4. Arithmetic Expansion
Arithmetic expansion is performed to determine the value of an expression. MDB commands can be preceded by expressions that represent a start address or a repeat count. Arithmetic expansion can also be performed to compute a numeric argument for a dcmd. An expression can appear in an argument list enclosed in square brackets preceded by a dollar sign ($[ expression ]). In this case, the expression is replaced by its arithmetic value.
Expressions can contain any of the following special words:
- integer
-
The specified integer value. Integer values can be prefixed with 0i or 0I to indicate binary values, 0o or 0O to indicate octal values, 0t or 0T to indicate decimal values, and 0x or 0X to indicate hexadecimal values (the default).
- 0[tT][0-9]+.[0-9]+
-
The specified decimal floating point value, converted to its IEEE double-precision floating point representation.
- 'cccccccc'
-
The integer value computed by converting each character to a byte equal to its ASCII value. Up to eight characters can be specified in a character constant. Characters are packed into the integer in reverse order (right-to-left), beginning at the least significant byte.
- <identifier
-
The value of the variable named by identifier.
- identifier
-
The value of the symbol named by identifier.
- (expression)
-
The value of expression.
- .
-
The value of dot.
- &
-
The most recent value of dot used to execute a dcmd.
- +
-
The value of dot incremented by the current increment.
- ^
-
The value of dot decremented by the current increment.
The increment is a global variable that stores the total bytes read by the last formatting dcmd. For more information on the increment, refer to the discussion of Formatting Dcmds.
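The literal forms in the list above can be illustrated with a short converter. This is a hedged Python sketch that covers only integer literals and 0t floating point literals; parse_literal is a hypothetical name. It assumes values wrap to 64 bits, and for floating point literals it returns the raw IEEE bit pattern, as the list describes.

```python
import struct

MASK64 = (1 << 64) - 1

# Radix prefix letters from the list above; bare tokens default to hex.
RADIX = {"i": 2, "o": 8, "t": 10, "x": 16}


def parse_literal(token: str) -> int:
    """Return the 64-bit unsigned value of an MDB-style numeric literal."""
    if len(token) > 2 and token[0] == "0" and token[1].lower() in RADIX:
        radix, body = RADIX[token[1].lower()], token[2:]
        if radix == 10 and "." in body:
            # 0t<digits>.<digits>: reinterpret the IEEE double as raw bits.
            (bits,) = struct.unpack("<Q", struct.pack("<d", float(body)))
            return bits
        return int(body, radix) & MASK64
    # No prefix: hexadecimal is the default base.
    return int(token, 16) & MASK64
```

With this sketch, parse_literal("10") is 16 because hexadecimal is the default. Similarly, parse_literal("0t10") is 10, and parse_literal("0t1.5") is 0x3ff8000000000000, the IEEE double encoding of 1.5.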
3.4.1. Unary Operators
Unary operators are right associative and have higher precedence than binary operators. The unary operators are:
- #expression
-
Logical negation
- ~expression
-
Bitwise complement
- -expression
-
Integer negation
- %expression
-
Value of a pointer-sized quantity at the object file location corresponding to virtual address expression in the target's virtual address space
- %/[csil]/expression
-
Value of a char-sized, short-sized, int-sized, or long-sized quantity at the object file location corresponding to virtual address expression in the target's virtual address space
- %/[1248]/expression
-
Value of a one-byte, two-byte, four-byte, or eight-byte quantity at the object file location corresponding to virtual address expression in the target's virtual address space
- *expression
-
Value of a pointer-sized quantity at virtual address expression in the target's virtual address space
- */[csil]/expression
-
Value of a char-sized, short-sized, int-sized, or long-sized quantity at virtual address expression in the target's virtual address space
- */[1248]/expression
-
Value of a one-byte, two-byte, four-byte, or eight-byte quantity at virtual address expression in the target's virtual address space
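The arithmetic unary operators and the sized dereference forms can be modeled in a few lines. This Python sketch is illustrative only. FakeAddressSpace is a hypothetical stand-in for a target, and little-endian byte order is the default assumption. A size of 8 stands in for a pointer-sized read on a 64-bit target. The % forms, which read from the object file rather than the address space, are not modeled.

```python
import struct

MASK64 = (1 << 64) - 1


def log_not(v: int) -> int:      # '#expression': logical negation
    return int(v == 0)


def bit_not(v: int) -> int:      # '~expression': bitwise complement
    return ~v & MASK64


def int_negate(v: int) -> int:   # '-expression', wrapping modulo 2**64
    return -v & MASK64


class FakeAddressSpace:
    """Hypothetical stand-in for a target's virtual address space."""

    SIZE_CODES = {1: "B", 2: "H", 4: "I", 8: "Q"}

    def __init__(self, base: int, data: bytes, little_endian: bool = True):
        self.base = base
        self.data = data
        self.order = "<" if little_endian else ">"

    def deref(self, addr: int, size: int = 8) -> int:
        """Model '*/[1248]/addr'; size 8 stands in for pointer-sized '*addr'."""
        off = addr - self.base
        if off < 0 or off + size > len(self.data):
            raise ValueError(f"no mapping for address {addr:#x}")
        code = self.order + self.SIZE_CODES[size]
        (value,) = struct.unpack_from(code, self.data, off)
        return value
```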
3.4.2. Binary Operators
Binary operators are left associative and have lower precedence than unary operators. The binary operators, in order of precedence from highest to lowest, are:
- *
-
Integer multiplication
- %
-
Integer division
- #
-
Left-hand side rounded up to the next multiple of the right-hand side
- +
-
Integer addition
- -
-
Integer subtraction
- <<
-
Bitwise shift left
- >>
-
Bitwise shift right
- ==
-
Logical equality
- !=
-
Logical inequality
- &
-
Bitwise AND
- ^
-
Bitwise exclusive OR
- |
-
Bitwise inclusive OR
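To show how precedence and associativity combine, here is a small evaluator for the binary operators above plus the #, ~, and - unary operators. It is a minimal Python sketch, not MDB's grammar, and it makes several assumptions:
- It accepts only hexadecimal integers and parentheses, with no symbols, variables, or dereferences.
- Operators listed next to each other share one precedence level, as in C (* % # together, + - together, and so on).
- Binary # behaves like the C roundup() idiom, so an exact multiple is left unchanged.

```python
import re

MASK64 = (1 << 64) - 1

# Binary operators by precedence level, highest first (grouping assumed, as in C).
LEVELS = [("*", "%", "#"), ("+", "-"), ("<<", ">>"), ("==", "!="),
          ("&",), ("^",), ("|",)]
PREC = {op: len(LEVELS) - i for i, ops in enumerate(LEVELS) for op in ops}

TOKEN_RE = re.compile(
    r"\s*(<<|>>|==|!=|0[xX][0-9a-fA-F]+|[0-9a-fA-F]+|[-+*%#&^|~()])")


def tokenize(text: str) -> list:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def apply(op: str, a: int, b: int) -> int:
    if op in ("%", "#") and b == 0:
        raise ZeroDivisionError("attempted to divide by zero")
    ops = {
        "*": lambda: a * b,
        "%": lambda: a // b,
        "#": lambda: (a + b - 1) // b * b,      # round up, like C roundup()
        "+": lambda: a + b,
        "-": lambda: a - b,
        "<<": lambda: a << b if b < 64 else 0,  # sketch choice for huge shifts
        ">>": lambda: a >> b,
        "==": lambda: int(a == b),
        "!=": lambda: int(a != b),
        "&": lambda: a & b,
        "^": lambda: a ^ b,
        "|": lambda: a | b,
    }
    return ops[op]() & MASK64                   # values are 64-bit unsigned


class Parser:
    def __init__(self, tokens: list):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def unary(self) -> int:
        # Unary operators recurse into unary(), so they are right associative
        # and bind more tightly than any binary operator.
        tok = self.take()
        if tok == "#":
            return int(self.unary() == 0)
        if tok == "~":
            return ~self.unary() & MASK64
        if tok == "-":
            return -self.unary() & MASK64
        if tok == "(":
            value = self.binary(1)
            if self.take() != ")":
                raise SyntaxError("missing ')'")
            return value
        if tok is None or tok in PREC or tok == ")":
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return int(tok, 16) & MASK64

    def binary(self, min_prec: int) -> int:
        lhs = self.unary()
        while self.peek() in PREC and PREC[self.peek()] >= min_prec:
            op = self.take()
            rhs = self.binary(PREC[op] + 1)     # +1 gives left associativity
            lhs = apply(op, lhs, rhs)
        return lhs


def evaluate(text: str) -> int:
    parser = Parser(tokenize(text))
    value = parser.binary(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

For example, evaluate("1001#1000") gives 0x2000. Similarly, evaluate("1<<4|1") gives 0x11 because << binds more tightly than |.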
3.5. Quoting
Each metacharacter described in Syntax terminates a word unless the metacharacter is quoted. Characters can be quoted by enclosing them in a pair of single quotation marks (') or double quotation marks ("). Quoting characters forces MDB to interpret each character as itself without any special significance. A single quotation mark cannot appear inside single quotation marks. Inside double quotation marks, MDB recognizes the C programming language character escape sequences.
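The quoting rules, together with the // rule from Comments, can be sketched as a word splitter. This Python version simplifies under two assumptions. First, only blanks separate words; the other metacharacters are not handled. Second, C escape sequences inside double quotes are approximated with Python's unicode_escape codec, which accepts ASCII input only. The function names are hypothetical.

```python
BLANKS = " \t"


def _closing_quote(line: str, quote: str, start: int) -> int:
    """Index of the matching quote; backslash escapes count only inside '"'."""
    i = start
    while i < len(line):
        if quote == '"' and line[i] == "\\":
            i += 2                       # skip the escaped character
            continue
        if line[i] == quote:
            return i
        i += 1
    raise SyntaxError("unterminated quoted string")


def split_args(line: str) -> list:
    """Split an argument list into blank-separated words."""
    words, current, in_word, i = [], [], False, 0
    while i < len(line):
        ch = line[i]
        if ch == "\n":
            break                        # NEWLINE ends the command
        if ch in BLANKS:
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            i += 1
        elif not in_word and line.startswith("//", i):
            break                        # comment word: ignore the rest
        elif ch in "'\"":
            end = _closing_quote(line, ch, i + 1)
            text = line[i + 1:end]       # single quotes: taken literally
            if ch == '"':
                # Approximate C escapes (ASCII input only).
                text = text.encode("ascii").decode("unicode_escape")
            current.append(text)
            in_word = True
            i = end + 1
        else:
            current.append(ch)
            in_word = True
            i += 1
    if in_word:
        words.append("".join(current))
    return words
```

For example, split_args("a 'b c' d") yields ["a", "b c", "d"]. A word that starts with // discards the rest of the line, but // inside a word does not.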
3.6. Shell Escapes
The ! character can be used to create a pipeline between an MDB command and the user's shell. Shell escapes are available only when using mdb and not when using kmdb.
If the $SHELL environment variable is set, MDB forks and execs the program named by $SHELL for shell escapes. If $SHELL is not set, /bin/sh is used. The shell is invoked with the -c option followed by a string formed by concatenating the words after the ! character.
The ! character takes precedence over all other metacharacters, except semicolon (;) and NEWLINE. After a shell escape is detected, the remaining characters up to the next semicolon or NEWLINE are passed “as is” to the shell. The output of shell commands cannot be piped to MDB dcmds. The output of commands executed by a shell escape is sent directly to the terminal, not to MDB.
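A shell escape amounts to running $SHELL -c with the text after the ! character. The following Python sketch uses subprocess.run from the standard library. The name shell_escape and its piped_input parameter are hypothetical; piped_input stands in for the output of the last dcmd in the pipeline form. The sketch applies to mdb only, since kmdb has no shell escapes.

```python
import os
import subprocess


def shell_escape(command: str, piped_input: bytes = b"", capture: bool = False):
    """Run `$SHELL -c command`, or `/bin/sh -c command` if SHELL is not set.

    piped_input models the standard output of the last dcmd in an MDB
    pipeline. With capture=False the child's output goes straight to the
    terminal and never comes back to the caller.
    """
    shell = os.environ.get("SHELL", "/bin/sh")
    return subprocess.run(
        [shell, "-c", command],
        input=piped_input,
        stdout=subprocess.PIPE if capture else None,
        check=False,
    )
```

For example, shell_escape("tr a-z A-Z", b"abc\n", capture=True).stdout is b"ABC\n". With the default capture=False, the child writes to the terminal directly.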
3.7. Variables
A variable consists of a name, a corresponding integer value, and a set of attributes. A variable name is a sequence of letters, digits, underbars, or periods. Use the > dcmd or ::typeset dcmd to assign a value to a variable. Use the ::typeset dcmd to manipulate the attributes of a variable.
Each variable's value is represented as a 64-bit unsigned integer. A variable can have one or more of the following attributes: read-only (cannot be modified by the user), persistent (cannot be unset by the user), and tagged (user-defined indicator).
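An MDB variable can be thought of as a name, a 64-bit value, and three attribute flags. This Python sketch is a hypothetical model, not MDB's implementation. Variable and VariableTable are invented names, and assign corresponds only loosely to assigning a value with the > dcmd.

```python
import re
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
NAME_RE = re.compile(r"[A-Za-z0-9_.]+")   # letters, digits, underbars, periods


@dataclass
class Variable:
    name: str
    value: int = 0
    read_only: bool = False    # cannot be modified by the user
    persistent: bool = False   # cannot be unset by the user
    tagged: bool = False       # user-defined indicator, no other effect


class VariableTable:
    def __init__(self):
        self.vars = {}

    def define(self, var: Variable) -> None:
        """Install a variable together with its attributes."""
        self.vars[var.name] = var

    def assign(self, name: str, value: int) -> None:
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid variable name {name!r}")
        var = self.vars.setdefault(name, Variable(name))
        if var.read_only:
            raise PermissionError(f"{name} is read-only")
        var.value = value & MASK64            # stored as 64-bit unsigned

    def unset(self, name: str) -> None:
        if self.vars[name].persistent:
            raise PermissionError(f"{name} is persistent")
        del self.vars[name]

    def get(self, name: str) -> int:
        return self.vars[name].value
```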
The following variables are defined as persistent:
- 0
-
Most recent value printed using the /, \, ?, or = dcmd.
- 9
-
Most recent count used with the $< dcmd.
- b
-
Virtual address of the base of the data section.
- cpuid
-
The CPU identifier corresponding to the CPU on which kmdb is currently executing.
- d
-
Size of the data section in bytes.
- e
-
Virtual address of the entry point.
- hits
-
The number of times the matched software event specifier has been matched. See Event Callbacks.
- m
-
Initial bytes (magic number) of the target's primary object file, or zero if no object file has been read yet.
- t
-
Size of the text section in bytes.
- thread
-
The thread identifier of the current representative thread. The value of the identifier depends on the threading model used by the current target. See Thread Support.
In addition, the MDB kernel and process targets export the current values of the representative thread's register set as named variables. The names of these variables depend on the target's platform and instruction set architecture.
3.8. Symbol Name Resolution
As explained in Syntax, a symbol identifier present in an expression context evaluates to the value of this symbol. The value typically denotes the virtual address of the storage associated with the symbol in the target's virtual address space. A target can support multiple symbol tables including, but not limited to, the following symbol tables:
- Primary executable symbol table
- Primary dynamic symbol table
- Runtime link-editor symbol table
- Standard and dynamic symbol tables for each of a number of load objects (such as shared libraries in a user process, or kernel modules in the illumos kernel)
The target typically searches the symbol tables of the primary executable first, then one or more of the other symbol tables. Note that ELF symbol tables contain only entries for external, global, and static symbols. Automatic symbols do not appear in the symbol tables processed by MDB.
Additionally, MDB provides a private user-defined symbol table that is searched prior to any of the target symbol tables. The private symbol table is initially empty. Use the ::nmadd and ::nmdel dcmds to manipulate the private symbol table.
Use the ::nm -P dcmd to display the contents of the private symbol table. The private symbol table enables you to create symbol definitions for program functions or data that were either missing from the original program or stripped out. These definitions are then used whenever MDB converts a symbolic name to an address, or converts an address to the nearest symbol.
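The search order described above, with the private table consulted before any target table, can be sketched as follows. SymbolResolver and its nmadd and nmdel methods are hypothetical Python names that only echo the ::nmadd and ::nmdel dcmds. The nearest method assumes the simple rule "closest symbol at or below the address". MDB's own address-to-symbol lookup may apply additional rules not modeled here.

```python
from typing import Dict, List, Optional


class SymbolResolver:
    """Hypothetical model: private table first, then target tables in order."""

    def __init__(self, target_tables: List[Dict[str, int]]):
        self.private: Dict[str, int] = {}    # initially empty
        self.target_tables = target_tables   # e.g. primary executable first

    def nmadd(self, name: str, addr: int) -> None:
        self.private[name] = addr

    def nmdel(self, name: str) -> None:
        self.private.pop(name, None)

    def _tables(self) -> List[Dict[str, int]]:
        return [self.private, *self.target_tables]

    def lookup(self, name: str) -> Optional[int]:
        """Symbolic name to address: the first table that has it wins."""
        for table in self._tables():
            if name in table:
                return table[name]
        return None

    def nearest(self, addr: int) -> Optional[str]:
        """Address to 'symbol' or 'symbol+0xoff' (closest symbol at or below)."""
        best = None
        for table in self._tables():
            for name, value in table.items():
                if value <= addr and (best is None or value > best[1]):
                    best = (name, value)
        if best is None:
            return None
        off = addr - best[1]
        return best[0] if off == 0 else f"{best[0]}+{off:#x}"
```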
Because targets contain multiple symbol tables, and each symbol table can include symbols from multiple object files, different symbols with the same name can exist. MDB uses the backquote character (`) as a symbol-name scoping operator to enable you to obtain the value of the desired symbol in this situation.
You can specify the scope used to resolve a symbol name as object`name, file`name, or object`file`name. The object identifier refers to the name of a load object. The file identifier refers to the basename of a source file that has a symbol of type STT_FILE in the symbol table of the specified object. The object identifier's interpretation depends on the target type.
The MDB kernel target expects object to specify the basename of a loaded kernel module. For example, the symbol name specfs`_init evaluates to the value of the _init symbol in the specfs kernel module.
The mdb process target expects object to specify the name of the executable or of a loaded shared library. The value of object can take any of the following forms:
- Exact match (that is, a full path name): /usr/lib/libc.so.1
- Exact basename match: libc.so.1
- Initial basename match up to a period or dot character (.) suffix: libc.so or libc
- Literal string a.out, which is accepted as an alias for the executable
The process target also accepts any of these four forms preceded by an optional link-map ID (lmid). The lmid prefix is specified by an initial LM followed by the link-map ID in hexadecimal followed by an additional backquote character (`). For example, the symbol name LM0`libc.so.1`_init evaluates to the value of the _init symbol in the libc.so.1 library that is loaded on link-map 0 (LM_ID_BASE). The link-map specifier might be necessary to resolve symbol naming conflicts if the same library is loaded on more than one link map. For more information on link maps, refer to the Linker and Libraries Guide and the dlopen(3C) manual page. Link-map identifiers are displayed when symbols are printed according to the setting of the showlmid option, as described in Summary of Command-line Options.
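A scoped symbol name can be split into its parts mechanically, and the four object forms reduce to a few string comparisons. This Python sketch is illustrative. The names parse_scoped, object_matches, and ScopedName are hypothetical. The sketch treats an LM<hex> first component as a link-map ID only when at least two more components follow, so an object literally named something like LM1 would be misread.

```python
import os
import re
from typing import NamedTuple, Optional, Tuple

LMID_RE = re.compile(r"LM([0-9a-fA-F]+)")


class ScopedName(NamedTuple):
    lmid: Optional[int]       # link-map ID, if an LM<hex> prefix was given
    scopes: Tuple[str, ...]   # (object,), (file,), or (object, file)
    name: str


def parse_scoped(symbol: str) -> ScopedName:
    parts = symbol.split("`")
    lmid = None
    m = LMID_RE.fullmatch(parts[0])
    if m is not None and len(parts) >= 3:
        lmid = int(m.group(1), 16)            # link-map ID is hexadecimal
        parts = parts[1:]
    return ScopedName(lmid, tuple(parts[:-1]), parts[-1])


def object_matches(spec: str, path: str, is_executable: bool = False) -> bool:
    """Check an object spec against a load object's path (four forms)."""
    if spec == "a.out":                       # alias for the executable
        return is_executable
    if spec == path:                          # exact match (full path name)
        return True
    base = os.path.basename(path)
    if spec == base:                          # exact basename match
        return True
    # initial basename match that stops just before a '.' suffix
    return base.startswith(spec) and base[len(spec):len(spec) + 1] == "."
```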
In the case of a naming conflict between symbols and hexadecimal integer values, MDB attempts to evaluate an ambiguous token as a symbol first, before evaluating it as an integer value. For example, the token f can refer either to the decimal integer value 15 specified in hexadecimal (the default base), or to a global variable named f in the target's symbol table. If a token is ambiguous, use an explicit 0x or 0X prefix to specify the integer value.
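The symbol-first rule for ambiguous tokens amounts to a short lookup. In this Python sketch, resolve_token is a hypothetical name, and the symbols dictionary is a hypothetical stand-in for the target's symbol tables.

```python
from typing import Dict


def resolve_token(token: str, symbols: Dict[str, int]) -> int:
    """Resolve a bare token: an explicit 0x/0X prefix always means an integer;
    otherwise a symbol with that name wins over the hexadecimal reading."""
    if token[:2] in ("0x", "0X"):
        return int(token[2:], 16)
    if token in symbols:
        return symbols[token]
    return int(token, 16)          # fall back to the default base
```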
3.9. Dcmd and Walker Name Resolution
As described earlier, each MDB dmod provides a set of dcmds and walkers. Dcmds and walkers are tracked in two distinct, global namespaces. MDB also keeps track of a dcmd and walker namespace associated with each dmod. Identically named dcmds or walkers within a given dmod are not allowed. A dmod with this type of naming conflict will fail to load.
Name conflicts between dcmds or walkers from different dmods are allowed in the global namespace. In the case of a conflict, the first dcmd or walker with that particular name to be loaded is given precedence in the global namespace. Alternate definitions are kept in a list in load order.
Use the backquote character (`) in a dcmd or walker name as a scoping operator to select an alternate definition. For example, if dmods m1 and m2 each provide a dcmd d, and m1 is loaded prior to m2, then you can use the scoping operator as shown below to specify the dcmd you want:
- ::d
-
Executes m1's definition of d
- ::m1`d
-
Executes m1's definition of d
- ::m2`d
-
Executes m2's definition of d
If module m1 is unloaded, the next dcmd on the global definition list (m2`d) is promoted to global visibility. Use the ::which dcmd to determine the current definition of a dcmd or walker. Use the ::which -v dcmd to display the global definition list.
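The per-dmod and global namespaces, including promotion on unload, can be modeled with a dictionary of load-ordered lists. This Python sketch models dcmds only; walkers use a separate but analogous namespace. DcmdNamespace, resolve, and which_v are hypothetical names, and resolve takes the name without the leading ::.

```python
from typing import Dict, List, Set


class DcmdNamespace:
    def __init__(self):
        self.modules: Dict[str, Set[str]] = {}   # per-dmod namespaces
        self.defs: Dict[str, List[str]] = {}     # global lists, in load order

    def load(self, module: str, dcmds: List[str]) -> None:
        if len(set(dcmds)) != len(dcmds):
            # Duplicate names within one dmod: the dmod fails to load.
            raise ValueError(f"{module}: duplicate dcmd names")
        self.modules[module] = set(dcmds)
        for name in dcmds:
            self.defs.setdefault(name, []).append(module)

    def unload(self, module: str) -> None:
        for name in self.modules.pop(module):
            self.defs[name].remove(module)      # next definition is promoted
            if not self.defs[name]:
                del self.defs[name]

    def resolve(self, name: str) -> str:
        """Map 'd' or 'module`d' to the 'module`d' definition that would run."""
        if "`" in name:
            module, dcmd = name.split("`", 1)
            if dcmd not in self.modules.get(module, set()):
                raise KeyError(name)
            return f"{module}`{dcmd}"
        return f"{self.defs[name][0]}`{name}"   # first loaded wins

    def which_v(self, name: str) -> List[str]:
        """Whole definition list in load order, in the spirit of '::which -v'."""
        return [f"{m}`{name}" for m in self.defs[name]]
```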
3.10. Dcmd Pipelines
Use the vertical bar (|) operator to pipeline dcmds. The purpose of a pipeline is to pass values from one dcmd or walker to another. The values passed are usually virtual addresses. Pipeline stages might be used to map a pointer from one type of data structure to a pointer to a corresponding data structure, to sort a list of addresses, or to select the addresses of structures with certain properties.
MDB executes each dcmd in the pipeline in order from left to right. The left-most dcmd is executed using the current value of dot, or using the value specified by an explicit expression at the start of the command. A pipe operator (|) causes MDB to create two things: a shared buffer between the output of the dcmd to its left and the MDB parser, and an empty list of values.
As the dcmd executes, its standard output is placed in the pipe and then consumed and evaluated by the parser, as if MDB were reading this data from standard input. Each line must consist of an arithmetic expression terminated by a NEWLINE or semicolon (;). The value of the expression is appended to the list of values associated with the pipe. If a syntax error is detected, the pipeline is aborted.
When the dcmd to the left of a | operator completes, the list of values associated with the pipe is used to invoke the dcmd to the right of the | operator. For each value in the list, dot is set to this value, and the right-hand dcmd is executed. Only the output of the right-most dcmd in the pipeline is written to standard output. If any dcmd in the pipeline produces output to standard error, these messages are written directly to standard error and are not processed as part of the pipeline.
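The pipeline evaluation steps above can be condensed into a short loop. In this Python sketch, dcmds are plain functions that append text to an output list and return a status. Status, PipelineAborted, and run_pipeline are hypothetical names. The sketch also assumes each buffered expression is a plain hexadecimal integer and skips empty lines. Standard error is not modeled.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    """Hypothetical names for the three dcmd exit statuses."""
    OK = 0
    ERR = 1
    USAGE = 2


# A dcmd is modeled as f(dot, out) -> Status; it writes text into `out`.
Dcmd = Callable[[int, List[str]], Status]


class PipelineAborted(Exception):
    pass


def run_pipeline(dot: int, stages: List[Dcmd]) -> str:
    values = [dot]                       # left-most dcmd runs at dot
    for i, dcmd in enumerate(stages):
        out: List[str] = []
        for value in values:             # dot is set to each value in turn
            if dcmd(value, out) is not Status.OK:
                raise PipelineAborted(f"stage {i} did not succeed")
        if i == len(stages) - 1:
            return "".join(out)          # only the last stage reaches stdout
        # The parser consumes the buffered output: each expression ends at
        # a NEWLINE or ';' and its value is appended to the list of values.
        values = []
        for expr in "".join(out).replace(";", "\n").split("\n"):
            if not expr.strip():
                continue
            try:
                values.append(int(expr, 16))
            except ValueError as exc:
                raise PipelineAborted(f"syntax error: {expr!r}") from exc
    return ""
```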
3.11. Formatting Dcmds
The /, \, ?, and = metacharacters are used to denote the special output formatting dcmds. Each of these dcmds accepts an argument list consisting of one or more format characters, repeat counts, or quoted strings. A format character is one of the ASCII characters described below.
Format characters are used to read and format data from the target. A repeat count is a positive integer preceding the format character that is always interpreted in base 10 (decimal). A repeat count can also be specified as an expression enclosed in square brackets preceded by a dollar sign ($[ ]). A string argument must be enclosed in double quotation marks (" "). No blanks are necessary between format arguments.
The formatting dcmds are:
- /
-
Display data from the target's virtual address space starting at the virtual address specified by dot.
- \
-
Display data from the target's physical address space starting at the physical address specified by dot.
- ?
-
Display data from the target's primary object file starting at the object file location corresponding to the virtual address specified by dot.
- =
-
Display the value of dot in each of the specified data formats. The = dcmd is useful for converting between bases and performing arithmetic.
In addition to dot, MDB keeps track of another global value called the increment. The increment represents the distance between dot and the address following all the data read by the last formatting dcmd.
For example, let dot equal address addr, where the data at addr is displayed as a 4-byte integer. After a formatting dcmd is executed with dot equal to addr, the increment is set to 4. The plus (+) operator, described in Arithmetic Expansion, would now evaluate to the value addr+4, and could be used to reset dot to the address of the next data object for a subsequent dcmd.
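The bookkeeping in this example can be captured in a few fields. This Python sketch is a hypothetical model of dot, the increment, and the & value; DotState and its methods are invented names. It uses the 64-bit unsigned arithmetic described in Syntax.

```python
MASK64 = (1 << 64) - 1


class DotState:
    def __init__(self, dot: int = 0):
        self.dot = dot
        self.increment = 0
        self.last_dcmd_dot = dot     # value of '&'

    def ran_formatting_dcmd(self, bytes_read: int) -> None:
        """Record a formatting dcmd that read bytes_read bytes at dot."""
        self.last_dcmd_dot = self.dot
        self.increment = bytes_read

    def plus(self) -> int:           # value of '+'
        return (self.dot + self.increment) & MASK64

    def caret(self) -> int:          # value of '^'
        return (self.dot - self.increment) & MASK64
```

Matching the example, calling ran_formatting_dcmd(4) with dot at 0x1000 makes plus() return 0x1004.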
Most format characters increase the value of the increment by the number of bytes corresponding to the size of the data format. The number of bytes used by each data format is shown below. Use the ::formats dcmd to display the list of format characters from within MDB.
The format characters are:
- +
-
Increment dot by the count (variable size)
- -
-
Decrement dot by the count (variable size)
- B
-
Hexadecimal int (1 byte)
- C
-
Character using C character notation (1 byte)
- D
-
Decimal signed int (4 bytes)
- E
-
Decimal unsigned long long (8 bytes)
- F
-
Double (8 bytes)
- G
-
Octal unsigned long long (8 bytes)
- H
-
Swap bytes and shorts (4 bytes)
- I
-
Address and disassembled instruction (variable size)
- J
-
Hexadecimal long long (8 bytes)
- K
-
Hexadecimal uintptr_t (4 or 8 bytes)
- N
-
Newline
- O
-
Octal unsigned int (4 bytes)
- P
-
Symbol (4 or 8 bytes)
- Q
-
Octal signed int (4 bytes)
- R
-
Binary int (8 bytes)
- S
-
String using C string notation (variable size)
- T
-
Horizontal tab
- U
-
Decimal unsigned int (4 bytes)
- V
-
Decimal unsigned int (1 byte)
- W
-
Default radix unsigned int (4 bytes)
- X
-
Hexadecimal int (4 bytes)
- Y
-
Decoded time32_t (4 bytes)
- Z
-
Hexadecimal long long (8 bytes)
- ^
-
Decrement dot by increment * count (variable size)
- a
-
Dot as symbol+offset
- b
-
Octal unsigned int (1 byte)
- c
-
Character (1 byte)
- d
-
Decimal signed short (2 bytes)
- e
-
Decimal signed long long (8 bytes)
- f
-
Float (4 bytes)
- g
-
Octal signed long long (8 bytes)
- h
-
Swap bytes (2 bytes)
- i
-
Disassembled instruction (variable size)
- n
-
Newline
- o
-
Octal unsigned short (2 bytes)
- p
-
Symbol (4 or 8 bytes)
- q
-
Octal signed short (2 bytes)
- r
-
Whitespace
- s
-
Raw string (variable size)
- t
-
Horizontal tab
- u
-
Decimal unsigned short (2 bytes)
- v
-
Decimal signed int (1 byte)
- w
-
Default radix unsigned short (2 bytes)
- x
-
Hexadecimal short (2 bytes)
- y
-
Decoded time64_t (8 bytes)
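A few of these format characters can be mimicked with Python's struct module to show how repeat counts and sizes add up to the increment. This sketch covers only a subset of the characters and assumes little-endian data. It omits MDB's column layout, quoted strings, and $[ ] counts. The names parse_format_args, format_read, and FORMATS are hypothetical.

```python
import re
import struct

# Subset of the format characters above: struct code and rendering style.
FORMATS = {
    "B": ("B", "{:x}"), "b": ("B", "{:o}"), "c": ("B", "chr"),
    "x": ("H", "{:x}"), "d": ("h", "{:d}"), "u": ("H", "{:d}"), "o": ("H", "{:o}"),
    "X": ("I", "{:x}"), "D": ("i", "{:d}"), "U": ("I", "{:d}"), "O": ("I", "{:o}"),
    "J": ("Q", "{:x}"), "E": ("Q", "{:d}"), "e": ("q", "{:d}"),
}
ARG_RE = re.compile(r"(\d*)([A-Za-z])")


def parse_format_args(args: str) -> list:
    """'2X d' -> [(2, 'X'), (1, 'd')]; repeat counts are always decimal."""
    spec, pos = [], 0
    while pos < len(args):
        if args[pos] in " \t":
            pos += 1
            continue
        m = ARG_RE.match(args, pos)
        if m is None or m.group(2) not in FORMATS:
            raise ValueError(f"unsupported format at {args[pos:]!r}")
        spec.append((int(m.group(1) or "1"), m.group(2)))
        pos = m.end()
    return spec


def format_read(mem: bytes, dot: int, args: str, base: int = 0):
    """Return (rendered values, bytes consumed) for a '/'-style read at dot."""
    start = off = dot - base
    rendered = []
    for count, ch in parse_format_args(args):
        code, style = FORMATS[ch]
        size = struct.calcsize("<" + code)
        for _ in range(count):
            (v,) = struct.unpack_from("<" + code, mem, off)
            rendered.append(chr(v) if style == "chr" else style.format(v))
            off += size
    return rendered, off - start
```

For instance, format_read(mem, 0, "Xdc") consumes 4 + 2 + 1 = 7 bytes. In MDB's terms, that total would become the new increment.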
You can also use the /, \, and ? formatting dcmds to write to the target's virtual address space, physical address space, or object file. First specify one of the following modifiers as the first format character, and then specify a list of words. The words in the list are either immediate values or expressions enclosed in square brackets preceded by a dollar sign ($[ ]).
The write modifiers are:
- v
-
Write the lowest byte of the value of each expression to the target beginning at the location specified by dot
- w
-
Write the lowest 2 bytes of the value of each expression to the target beginning at the location specified by dot
- W
-
Write the lowest 4 bytes of the value of each expression to the target beginning at the location specified by dot
- Z
-
Write the complete 8 bytes of the value of each expression to the target beginning at the location specified by dot
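The write modifiers can be illustrated against an in-memory buffer. This Python sketch assumes little-endian byte order and assumes that successive values are stored at consecutive locations starting at dot. The names write_values and WRITE_SIZES are hypothetical. A real target enforces its own byte order and mappings.

```python
WRITE_SIZES = {"v": 1, "w": 2, "W": 4, "Z": 8}


def write_values(mem: bytearray, dot: int, modifier: str, values,
                 base: int = 0) -> int:
    """Store the low-order bytes of each value from dot; return bytes written."""
    size = WRITE_SIZES[modifier]
    off = start = dot - base
    for v in values:
        # Refuse to grow the buffer; earlier values stay written on error.
        if off < 0 or off + size > len(mem):
            raise IndexError(f"write at {base + off:#x} is outside the buffer")
        low = v & ((1 << (8 * size)) - 1)       # keep only the lowest bytes
        mem[off:off + size] = low.to_bytes(size, "little")
        off += size
    return off - start
```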
You can also use the /, \, and ? formatting dcmds to search for a particular integer value in the target's virtual address space, physical address space, and object file, respectively. First specify one of the following modifiers as the first format character, and then specify a value and optional mask. The value and mask are each either immediate values or expressions enclosed in square brackets preceded by a dollar sign.
If only a value is specified, MDB reads integers of the appropriate size and stops at the address that contains the matching value. If a value V and mask M are specified, MDB reads integers of the appropriate size and stops at the address that contains a value X where (X & M) == V. At the completion of the dcmd, dot is updated to the address of the match. If no match is found, dot is left at the last address that was read.
The search modifiers are:
- l
-
Search for the specified 2-byte value
- L
-
Search for the specified 4-byte value
- M
-
Search for the specified 8-byte value
For both user and kernel targets, an address space is typically composed of a set of discontiguous segments. It is not legal to read from an address that does not have a corresponding segment. If a search reaches a segment boundary without finding a match, the search aborts when the read past the end of the segment boundary fails.
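The search modifiers, the optional mask, and the segment rule can be combined in one sketch. This Python version is illustrative only and rests on these assumptions:
- Segments are hypothetical (start, bytes) pairs, and values are little-endian.
- The scan starts at dot and advances by the search size.
- A read that is not fully contained in one segment ends the search, mirroring the failed read described above.

The returned address is the new dot: either the match or, if there is no match, the last address read.

```python
from typing import List, Optional, Tuple

SEARCH_SIZES = {"l": 2, "L": 4, "M": 8}
Segment = Tuple[int, bytes]              # (start address, contents)


class ReadError(Exception):
    pass


def read_int(segments: List[Segment], addr: int, size: int) -> int:
    for start, data in segments:
        if start <= addr and addr + size <= start + len(data):
            off = addr - start
            return int.from_bytes(data[off:off + size], "little")
    raise ReadError(f"no segment maps {addr:#x}")


def search(segments: List[Segment], dot: int, modifier: str,
           value: int, mask: Optional[int] = None) -> Tuple[bool, int]:
    """Return (found, new dot)."""
    size = SEARCH_SIZES[modifier]
    width = (1 << (8 * size)) - 1
    mask = width if mask is None else mask & width
    value &= width
    addr, new_dot = dot, dot
    while True:
        try:
            x = read_int(segments, addr, size)
        except ReadError:
            return False, new_dot        # read past the segment: search aborts
        new_dot = addr                   # last address successfully read
        if (x & mask) == value:
            return True, addr
        addr += size
```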