7780 mdb could extract NT_PRPSINFO information from core files

Review Request #1478 — Created Feb. 15, 2019 and submitted


This patch add this feature to mdb https://www.illumos.org/issues/7780
It extracts psinfo_t from xdata buffers to present relevant information, also it prints most info that belongs to a NT_PRPSINFO note from elfdump using psinfo_t data.

cneira@Trixie:...illumos-omnios/usr/src/cmd/mdb$   /build/illumos-omnios/proto/root_i386-nd/usr/bin/amd64/mdb  /home/cneira/core
Loading modules: [ libc.so.1 ld.so.1 ]
> ::psinfo
Usage: psinfo [sum: print summary of psinfo_t data | raw: print relevant information of NT_PRPSINFO ]
> ::psinfo raw
        pr_state:   6                   pr_sname:   O
        pr_zomb:    0                   pr_nice:    20
        pr_uid:     100                 pr_gid:     1
        pr_pid:     383885              pr_ppid:    340440
        pr_pgid:    1                   pr_sid:     340440
        pr_addr:    0x00000000          pr_size:    0x6d0
        pr_rssize:  0x38c               pr_wchan:   0
            tv_sec: 1549649231          tv_nsec:    683629550
            tv_sec: 0                   tv_nsec:    1054142
        pr_pri:     59                  pr_oldpri:  40
        pr_cpu:     0
        pr_clname:  TS
        pr_fname:   test
        pr_psargs:  ./test
        pr_syscall: [ SYS#0 ]
            tv_sec: 0                   tv_nsec:    0
        pr_argc:    1                   pr_argv:    0x08039654
        pr_envp:    0x0803965c          pr_wstat:   139
        pr_pctcpu:  0.0%                pr_pctmem:  0.0%
        pr_euid:    100                 pr_egid:    1
        pr_dmodel:  [PR_MODEL_ILP32]
> ::psinfo sum
PID:    383885  (process id)            UID:      100  (real user id)
PPID:   340440  (parent process id)     EUID:     100  (effective user id)
PGID:   383885  (process group id)      GID:        1  (real group id)
SID:    340440  (session id)            EGID:       1  (effective group id)
ZONEID:      0                          CONTRACT:  94
PROJECT:     3                          TASK:      87

START: 2019 Feb  8 15:07:11 (wall timestamp when the process started)
TIME:          0.000000001 seconds   (CPU time used by this process)
CTIME:         0.000000000 seconds   (CPU time used by child processes)
FNAME:  "test"                       (name of the program executed)
PSARGS: "./test"

cneira@Trixie:...illumos-omnios/usr/src/cmd/mdb$   /build/illumos-omnios/proto/root_i386-nd/usr/bin/amd64/mdb  /home/cneira/core.26853
Loading modules: [ libc.so.1 ld.so.1 ]
> ::psinfo sum
PID:     26853  (process id)            UID:      100  (real user id)
PPID:      891  (parent process id)     EUID:     100  (effective user id)
PGID:    26853  (process group id)      GID:        1  (real group id)
SID:       891  (session id)            EGID:       1  (effective group id)
ZONEID:      0                          CONTRACT:  78
PROJECT:     3                          TASK:      69

START: 2019 Jan 30 16:23:46 (wall timestamp when the process started)
TIME:          0.000000009 seconds   (CPU time used by this process)
CTIME:         0.000000000 seconds   (CPU time used by child processes)
FNAME:  "vim"                        (name of the program executed)
PSARGS: "vim t.txt"
> ::psinfo raw
        pr_state:   4                   pr_sname:   T
        pr_zomb:    0                   pr_nice:    20
        pr_uid:     100                 pr_gid:     1
        pr_pid:     26853               pr_ppid:    891
        pr_pgid:    1                   pr_sid:     891
        pr_addr:    0xfffffe12cc0b0070  pr_size:    0x1b88
        pr_rssize:  0xf64               pr_wchan:   0
            tv_sec: 1548876226          tv_nsec:    683326865
            tv_sec: 0                   tv_nsec:    9875073
        pr_pri:     59                  pr_oldpri:  40
        pr_cpu:     0
        pr_clname:  TS
        pr_fname:   vim
        pr_psargs:  vim t.txt
        pr_syscall: [ pollsys ]
            tv_sec: 0                   tv_nsec:    0
        pr_argc:    2                   pr_argv:    0xfffffc7fffdf5c48
        pr_envp:    0xfffffc7fffdf5c60  pr_wstat:   0
        pr_pctcpu:  0.0%                pr_pctmem:  0.0%
        pr_euid:    100                 pr_egid:    1
        pr_dmodel:  [PR_MODEL_LP64]

32 bit core test

cneira@Trixie:~$ file 15\-dlmgmtd\-44
15-dlmgmtd-44:  ELF 32-bit LSB core file 80386 Version 1, from 'dlmgmtd'

cneira@Trixie:~$ /build/illumos-omnios/proto/root_i386-nd/usr/bin/amd64/mdb  /home/cneira/15\-dlmgmtd\-44
Loading modules: [ libc.so.1 libavl.so.1 libnvpair.so.1 libsysevent.so.1 ld.so.1 ]
> ::psinfo
PID:        44  (process id)            UID:        0  (real user id)
PPID:        1  (parent process id)     EUID:      15  (effective user id)
PGID:       44  (process group id)      GID:        0  (real group id)
SID:        44  (session id)            EGID:      65  (effective group id)
ZONEID:      0                          CONTRACT:   8
PROJECT:     0                          TASK:       3

START: 2019 Feb  2 12:27:16   (wall timestamp when the process started)
TIME:  0.21 seconds           (CPU time used by this process)
CTIME: 0.0 seconds           (CPU time used by child processes)
FNAME: dlmgmtd                (name of the program executed)
PSARGS: "/sbin/dlmgmtd"
> ::psinfo -v
        pr_state:   6                   pr_sname:   O
        pr_zomb:    0                   pr_nice:    20
        pr_uid:     0                   pr_gid:     0
        pr_pid:     44          pr_ppid:    1
        pr_pgid:    0                   pr_sid:     44
        pr_addr:    0x0                 pr_size:    0xbd0
        pr_rssize:  0x4fc               pr_wchan:   0
            tv_sec: 1549121236          tv_nsec:    943744935
            tv_sec: 0                   tv_nsec:    21095691
        pr_pri:     60                  pr_oldpri:  39
        pr_cpu:     0
        pr_clname:  TS
        pr_fname:   dlmgmtd
        pr_psargs:  /sbin/dlmgmtd
        pr_syscall: [ lwp_kill ]
            tv_sec: 0                   tv_nsec:    0
        pr_argc:    1                   pr_argv:    0x8047e10
        pr_envp:    0x8047e18           pr_wstat:   134
        pr_pctcpu:  0.0%                pr_pctmem:  0.1%
        pr_euid:    15                  pr_egid:    65
        pr_dmodel:  [PR_MODEL_ILP32]
  • 0
  • 0
  • 60
  • 0
  • 60
Description From Last Updated
  1. Hi Carlos, first, I want to say thanks for putting this together. I know there's a lot of comments below, but there's a lot of things here that are good. Pleae let me know how I can help you work through the feedback. If anything's not clear, please let me know and thanks for taking a first stab at this!

    1. Hi Robert, thank you for taking the time to review this. I'll work on the comments and let you know if I have some doubts on the way. Thanks again!.

  2. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)

    Please make sure to run cstyle as part of git pbchk. The way that these functions are put together isn't correct, it should look like:

    static int
    pct(ushort_t pct)

    I haven't commented on this for the other occurrences in this file.

    1. git-pbchk appears clean.

    2. Hmm, that's odd. cstyle suggests that the functions should be formatted otherwise. I'll point them out explicitly again in the next round of review feedback.

  3. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    If we're calculating this as a uint_t, then why are we returning a signed value? Shouldn't it be an unsigned value?
  4. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)

    I'm not sure that I follow all the calculations that you're doing here. I get that you're trying to conver the 16-bit integer floating point value to something else, but it's not clear where 0x7000 comes from or why you do the check on value below. Would you mind writing a comment that better explains the conversion and what you're expecting as output?

    1. More or less the same as prtpct() from usr/src/cmd/ps/ucbps.c: line 830.
      There is an other problem with this function: argument pct remained unused, so the returned value is just based on stack-garbage.

    2. I added the comments that belong to that function.

  5. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    This should be replaced with a call to proc_sysname(3PROC). Please don't encode the table here.
  6. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    This should be replaced with a call to proc_sysname(3PROC). Longer term we should put the name in the note, since the numbers aren't stable.
  7. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)

    I'm not sure it's worth bothering with leading zeros. Importantly, in a 64-bit process, this can be a 64-bit value so the choice of 8 seems a bit arbitrary, as that only makes sense for a 32-bit process.

  8. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)

    tv_sec and tv_nsec are signed values, not unsigned. Also, both of these are technically long values.

  9. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    See other notes on the choice of 8 leading zeros.
  10. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    I think with dmodel strings we should do the same with the libproc bits I mention below.
    1. I did not see a way to get the process data model from libproc, at least I have not found it in the man page.

    2. I was suggesting that we add new functionality to libproc to be able to translate and print those values as strings.

    3. That would be great then. Could this be done in a separate code review ? or just do it along these changes.

    4. I just added proc_dmodelname syscall to libproc to return the data model string for the process.

  11. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)

    I think in all the cases where we're trying to add new tables, we should look to other ways to do this. For example, a lot of the various signal and fault codes are in libproc, so we should add those there so we don't duplicate these again. If that's something you'd like some help with, let me know. I'd be happy to help with that part. Though maybe we should add the errno to error string to libc.

    1. I have removed the siginfo handling code as the same information was being outputted by ::status, except for errno codes. Should just add the errno to ::psinfo ?.

    2. You can go either way? I think it's fine to include the errno, it can even be a numeric value to begin with.

    3. I'm not including errno on ::psinfo at this point. I would like to modify libproc to return errno codes to strings before.

  12. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)

    Note, this structure should generally be constructed as:

    case ILL_ILLOPC:
            code = "ILL_ILLOPC";
  13. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    With all of these, I think we should be printing out the actual name of the signal we received, not just the subcode. We can get the name of the signal via proc_signame(3PROC).
  14. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    cstyle nit: The indentation here (and with others) is incorrect, the continuation should be indented four spaces, you have it right with the following line, though the last two lines can be joined together.
  15. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    I think it's worth a comment describing where you get these values from and how you figured them out.
  16. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)

    So, a general note on this. Right now, this has been implemented in a way that it exists for mdb on a process, core file, live kernel, kernel crash dump, and the kernel debugger. However, the process information and having this only makes sense in the first two cases. To deal with this, the dcmd needs to be defined as part of the process target.

    While there's not a great place for this, I'd probalby use the libc module as a place to implement this.

    1. I have moved ::psinfo to libc, thanks Robert.

  17. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    I would probably consider structuring the arguments for this in a different way. I'd probalby have ::psinfo show the summary version by default and then add a '-v' flag that describes having verbosity. I don't think it makes sense to have to specify an argument to get something.
  18. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    I would probably declare these variables at the top, with everything else.
  19. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    So, mdb_warn() has an interesting behavior. If you don't have a new line character at the end ('\n'), then it will append the most recent mdb error. If you were trying to get it in this particular case, then you have to keep in mind that it might have been generated by the first one, but was clobbered by the second call.
  20. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    Have you tested what happens when you have a core file of a different bitness in mdb? It seems like this is relying on the sizing of these structures being the same as the underlying host type. Also, what if you're debugging a core file where the elf note is larger that the underlying structure? Wouldn't that end up clobbering memory on the stack?
    I suspect that this will impact how many of the members and offsets are printed.
    1. No, I have not tested this scenario. As I'm reading the buffers available from ::xdata, I thought it was safe just to read the amount of bytes that was reported by the previous mdb_get_xdata call.

    2. It is safe to read it that way; however, the structure you're assigning it to may or may not be the right size for that data. When the library is built 64-bit then it will have the 64-bit version of a psinfo_t. If you're reading the data from a 32-bit core file, then the offsets and sizing will not be right, even though it will fit in. If the opposite can happen with a 32-bit library reading from a 64-bit binary, then this could also happen. I'm not sure if mdb does something to make sure that can't happen off hand (it may rexec to match the bitness of the underlying object being dbugged).

    3. Oh, I see I was only testing cores from programs with data model 32 and 64 bits, but the core itself was 64-bit. I'll need to get a 32-bit core to find out.

    4. I just tested with a 32-bit core and it's working fine. I have added the result to the description

  21. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)
    Minor stylistic thing, but since these are exclusive options, I'd probably suggest that you use an 'else if' for the second case.
  22. usr/src/cmd/mdb/common/mdb/mdb_cmds.c (Diff revision 1)

    The help message here is supposed to be a short summary of the options. However, you can supply a function that is run when someone asks for full help (via the ::help dcmd) which is where this additional explanation should be.

    1. I have not address this yet, an option would be : -v print raw like NT_PRPSINFO data from psinfo_t, default just relevant fields from psinfo_t data.

  2. Unused variable.
  1. Thanks, this looks pretty good. I only have a few inor follow ups.

  2. Thanks for describing the usage here.
  3. Could remove the extra blank?
  4. The static uint_t value should be on one line and pct_value should begin the next (see all the other functions in this file for an example).

  5. Same thing about the function declaration cstyle as with pct_value.
  6. A couple of notes on this:
    Shouldn't we be usying the buff that we used up above to print the data? Otherwise don't we still have the same problem with printing 8 digits in a 64-bit program case?
    Also, there should be a space between the '10' and '?'.
  7. It appears that we set up buff, but then don't do anything with it.
  8. It seems we're filling in buff, but then not using it.
  9. A couple of style notes: you don't need parens around this line and there should be a space before the '?'.
  10. In this case, I think you actually don't want the '\n' character so that way mdb_warn can include the more detailed error reason that mdb_get_xdata() failed automatically.
  11. I'd still recommend adding a more detailed help message if you'd like to. The help information you had before was useful, just that it should be in the usage function callback that you can define with a dcmd.

  2. usr/src/cmd/mdb/Makefile.module (Diff revision 4)
    Duplicated copyright notice.
  2. Please use SYS2STR_MAX preprocessor constant instead of arbitrary value to set the size of this array.

    This line should be:

    char sysname[SYS2STR_MAX];
    1. Gergő thanks a lot for taking the time to check this out. I'll work on the comments during the week. Thanks again.

  3. Format string signedness mismatch, pr_uid and pr_gid are unsigned values.
    This line should look like:

        mdb_printf("\tpr_uid:     %u\t\t\tpr_gid:     %u\n",
  4. pr_sid is signed.

        mdb_printf("\tpr_pgid:    %u\t\t\tpr_sid:     %d\n",
  5. Type of pr_size is size_t, should be casted to ulong_t:

        mdb_printf("\tpr_addr:    %s%*spr_size:    %#lx\n",
            buff, strlen(buff) > spbcols ? minspaces : (spbcols - bufflen), " ",
  6. pr_rssize also should be casted to ulong_t, and length modifiers are missing.

        mdb_printf("\tpr_rssize:  %#lx\t\tpr_wchan:   %#lx\n",
            (ulong_t)psinfo.pr_rssize, (ulong_t)psinfo.pr_lwp.pr_wchan);
  7. Type of tv_nsec is long and tv_sec also derived from that type.

    (Just to make these type-correct. Of course, I'm aware the practical limit of tv_nsec. In the case of tv_sec, maybe somebody after this date and time: Tue, 19 Jan 2038 03:14:07 GMT [without leap seconds] want to see correct value :) .)

        mdb_printf("\tpr_start:\n\t    tv_sec: %ld\t\ttv_nsec:    %ld\n",
  8.     mdb_printf("\tpr_time:\n\t    tv_sec: %ld\t\t\ttv_nsec:    %ld\n",
            psinfo.pr_time.tv_sec, psinfo.pr_time.tv_nsec);
  9.     mdb_printf("\tpr_ctime:\n\t    tv_sec: %ld\t\t\ttv_nsec:    %ld\n",
            psinfo.pr_ctime.tv_sec, psinfo.pr_ctime.tv_nsec);
  10.     mdb_printf("\tpr_argc:    %d\t\t\tpr_argv:    0x%lx\n",
            psinfo.pr_argc, (ulong_t)psinfo.pr_argv);
  11.     mdb_printf("\tpr_euid:    %u\t\t\tpr_egid:    %u\n",
  12.         "UID:     %4u  (real user id)\n",
  13.     mdb_printf("PGID:   %6d  (process group id)\tGID:     %4u"
  14.     mdb_printf("SID:    %6d  (session id)\t\tEGID:    %4u"
  15.     mdb_snprintf(buff, sizeof (buff), "%ld.%d seconds",
  16. Please remove one from the two whitespace after the = operator.

  17.     mdb_snprintf(buff, sizeof (buff), "%ld.%d seconds",
  18. usr/src/lib/libproc/common/libproc.h (Diff revision 4)

    Typo in comment and reference to SIG2STR_MAX was removed to make this comment shorter.
    (I think this reference should be in the block comment, a few lines above.)

    Trailing comment separated by the rest of the line with a tab, see here: C Style and Coding Standards for SunOS, Section 7.3. Trailing Comments, Page 8 (the rules that cstyle (1onbld) try to check, comes from this document).

    #define DMODELSTR_MAX 32    /* max. string length of data model */
  19. usr/src/lib/libproc/common/mapfile-vers (Diff revision 4)

    This should be between proc_content2str (line 208) and proc_finistdio (should be in alphabetical order).

  20. usr/src/lib/libproc/common/proc_names.c (Diff revision 4)

    See the style of how systable[] initializied (one line - one value, and line indentation).

  21. usr/src/man/man3lib/libproc.3lib (Diff revision 4)

    Typo and should be in in alphabetical order.

    .It Sy Perror_printf Ta Sy proc_arg_grab
    .It Sy proc_arg_psinfo Ta Sy proc_arg_xgrab
    .It Sy proc_arg_xpsinfo Ta Sy proc_content2str
    .It Sy proc_dmodelname Ta Sy proc_finistdio
    .It Sy proc_fltname Ta Sy proc_fltset2str
    .It Sy proc_flushstdio Ta Sy proc_get_auxv
    .It Sy proc_get_cred Ta Sy proc_get_priv
    .It Sy proc_get_psinfo Ta Sy proc_get_status
    .It Sy proc_initstdio Ta Sy proc_lwp_in_set
    .It Sy proc_lwp_range_valid Ta Sy proc_signame
    .It Sy proc_sigset2str Ta Sy proc_str2content
    .It Sy proc_str2flt Ta Sy proc_str2fltset
    .It Sy proc_str2sig Ta Sy proc_str2sigset
    .It Sy proc_str2sys Ta Sy proc_str2sysset
    .It Sy proc_sysname Ta Sy proc_sysset2str
    .It Sy proc_unctrl_psinfo Ta Sy proc_walk
  22. usr/src/man/man3lib/libproc.3lib (Diff revision 4)

    This should be in line 1223 instead of 1236 (should be listed in alphabetical order).

  2. usr/src/lib/libproc/common/libproc.h (Diff revision 5)

    A typo in the comment still persist:

    's/syscalls/data models/'

    (DMODELSTR_MAX is for data model strings and not for syscalls, even if these constans at this moment expands to the same value.)

  3. You don't have to duplicate the same file again.

    Instead, you have to create links to the common manual page (in this case it's proc_fltname.3proc).

    Please, edit these files:

    • usr/src/man/man3proc/Makefile:
    395 395   proc_flushstdio.3proc     := LINKSRC = proc_initstdio.3proc
    396 396   proc_finistdio.3proc      := LINKSRC = proc_initstdio.3proc
    397 397   
        398 + proc_dmodelname.3proc     := LINKSRC = proc_fltname.3proc
    398 399   proc_signame.3proc        := LINKSRC = proc_fltname.3proc
    399 400   proc_sysname.3proc        := LINKSRC = proc_fltname.3proc
    400 401   
    401 402   proc_sigset2str.3proc     := LINKSRC = proc_fltset2str.3proc

    • usr/src/pkg/manifests/system-library.man3proc.inc:
    230 230   link path=usr/share/man/man3proc/proc_arg_xgrab.3proc target=proc_arg_grab.3proc
    231 231   link path=usr/share/man/man3proc/proc_arg_xpsinfo.3proc target=proc_arg_psinfo.3proc
        232 + link path=usr/share/man/man3proc/proc_dmodelname.3proc target=proc_fltname.3proc
    232 233   link path=usr/share/man/man3proc/proc_finistdio.3proc target=proc_initstdio.3proc
    233 234   link path=usr/share/man/man3proc/proc_flushstdio.3proc target=proc_initstdio.3proc
    234 235   link path=usr/share/man/man3proc/proc_free_priv.3proc target=proc_get_priv.3proc

    See in the illumos Developer's Guide, Adding a new manual page section:

    A single manual page may be used for multiple manuals. For example, the manual page...

    1. thank you very much for reviewing !

  1. Thanks, this looks pretty good. Thank you for writing the manual page for the new libproc function. I have a few minor notes, some stylistic, some correctness based. Thank you for all the work on this.

  2. I don't think you want the '\n' on this, that way you can get the underlying mdb error printed out that occurred automatically for you.
  3. usr/src/lib/libproc/common/libproc.h (Diff revision 6)
    Is there a reason this isn't joined to the previous line like others?
    1. There was no reason!, is fixed now.

  4. usr/src/lib/libproc/common/proc_names.c (Diff revision 6)

    So, I think we have an interesting question here. The kernel represents the model usually as a uint_t (see ddi_model_convert_from(9F) for example). However, in procfs we end up representing it as a signed character. That's kind of unfortunate. I guess we should probably keep it as a char to match the procfs version. I might actually increase it to an int to match the other versions as we can always place a char argument in an int.

    1. I agree, change is done.

  5. usr/src/lib/libproc/common/proc_names.c (Diff revision 6)

    A couple of notes here and some questions. In new code, I would pretty much avoid strcpy and always use strlcpy even when it means doing extra work as it'll end up keeping the code safer against refactoring.

    However, is there a reason that you're copying the string into the name buffer to then copy it out to the user buffer? Rather than copying it to an intermediate buffer, you can just use a pointer to index into dmdls, or just keep track of the offset and pass that to strncpy. I think you're basing this off of proc_signame(); however, that case actually is modifying the string to make sure that "SIG" is there where as here we don't have to. Instead, I would model this more off of proc_fltname().

    Finally, I don't think it's correct to use PR_MODEL_UNKNOWN in the case where we don't recognize the value. I would probably treat this like proc_fltname and proc_signame and instead of using dmdls[0], do something like:

    len = snprintf(buf, bufsz, "DMODEL#%d", dmodel);
    1. Yes, the code was based on proc_signame. I agree with not returning PR_MODEL_UNKNOWN as that will be harder to debug without knowing the actual value that was passed to proc_dmodelname.

  1. Thanks for all your work on this, it looks good.

    1. Thanks a lot Robert and Gergő for taking the time to check this out.

    2. You're welcome.
  1. Ship It!
Review request changed

Status: Closed (submitted)