11203 Support for NVMe drive firmware updates

Review Request #2013 — Created June 21, 2019 and submitted

pwinder
illumos-gate
general
hans, rm

Adds three new sub-commands to nvmeadm:

  • nvmeadm load-firmware - loads firmware files into the drive's holding area.
  • nvmeadm commit-firmware - takes the firmware in the holding area and commits to a slot.
  • nvmeadm activate-firmware - activates a given slot. The firmware in the active slot is loaded by the drive when it is next reset, and becomes the running firmware.

https://www.illumos.org/issues/11203 has an example of how the commands are used.

This process is used by WDC to update NVMe firmware in their IntelliFlash products during manufacturing process.

  • 0
  • 0
  • 27
  • 2
  • 29
Description From Last Updated
seeemef@mac.com
  1. 
      
  2. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 1)
     
     
  3. 
      
hans
  1. 
      
  2. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 1)
     
     
     
    
      
  3. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 1)
     
     
     

    Is there a reason you're not using offset_t and size_t here?

    1. I have changed them both

  4. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 1)
     
     

    "multiple"

  5. usr/src/cmd/nvmeadm/nvmeadm_dev.c (Diff revision 1)
     
     
     

    Why did you move this up here?

    1. In case you didn't notice my comment:

      The ioctl which does the firmware commit will provide addition status (SCT and SC codes) in n_arg when it succeeds or fails.
      So, we need to assign res here in case this function exits because ret != 0

  6. 
      
pwinder
  1. 
      
  2. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 1)
     
     
     

    offset needs to be > 32 bits for all 32 and 64 bit architectures. The field in the the NVMe command itself is 32 bits, but it is the number of DWORDs. So, theoretically it can be > 32 bits.
    The size could be size_t.

      1. offset_t already typedef-ed as longlong_t (it's 64bit wide independently of the used data model). See in /usr/src/uts/common/sys/types.h, line 242.
        And the last argument of nvme_firmware_load() already has this type.

      2. Please use the "Reply" button to answer questions.

    1. So it is, I'll change it

  3. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 1)
     
     

    Fill fix

  4. usr/src/cmd/nvmeadm/nvmeadm_dev.c (Diff revision 1)
     
     
     

    The ioctl which does the firmware commit will provide addition status (SCT and SC codes) in n_arg when it succeeds or fails.
    So, we need to assign res here in case this function exits because ret != 0

  5. 
      
pwinder
seeemef@mac.com
  1. Question about punctuation but otherwise LGTM.

  2. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2)
     
     

    Possibly a full-stop to end the sentence?

  3. 
      
rm
  1. Thanks for putting this together. I have a few notes below. One thing that I think we should do is file a bug for future work so we can actually commit and activate in the same command. I was a little surprised that there was plumbing for NVME_FWC_SAVE_ACTIVATE and NVME_FWC_ACTIVATE_IMMED, but no way to actually trigger them. Also, I feel like we should add a short-cut to list firmware slots rather than going through the log pages. Again, that's as future usability work (not something I'm suggesting you do).

  2. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2)
     
     
    A size_t should use %z, not %l.
  3. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2)
     
     
    errno needs to be set to zero before calling strtoul, you probably also want to check errno afterwards.
  4. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2)
     
     
    Can we spell out this error message please rather than relying on the user knowing the mathematical syntax?
  5. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2)
     
     
    errno needs to be set to zero before calling strtoul, you probably also want to check errno afterwards.
  6. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2)
     
     
    Same slot comment as above.
  7. usr/src/cmd/nvmeadm/nvmeadm_print.c (Diff revision 2)
     
     
    Here and in the other two switch statements, it's not clear what the message log refers to and its not referenced in the manual page at all. Can we improve this error message for users? It'd be rather hard for them to know where to look. While I know you're referring to /var/adm/messages based on the default config, perhaps there's something more detailed we can say?
  8. usr/src/man/man1m/nvmeadm.1m (Diff revision 2)
     
     
    Please update the date in the manual page.
  9. usr/src/man/man1m/nvmeadm.1m (Diff revision 2)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    The kernel restricts any of these from operating on anything other than namespace zero. I'm not sure it makes sense for us to document that one can specify an optional namespace, as it always impacts the whole controller, no?
  10. usr/src/man/man1m/nvmeadm.1m (Diff revision 2)
     
     
     
     
     
     
     
    It'd really help to indicate what unit these are in. I'm not sure if I expect it to be bytes or if I expect it to be in dwords like in the NVMe spec.
  11. usr/src/man/man1m/nvmeadm.1m (Diff revision 2)
     
     
     
     
     
     
     
     
    We should probalby indicate how to actually list the slots on the device as I think that's the first question I'd have with trying to figure that out.
  12. usr/src/man/man1m/nvmeadm.1m (Diff revision 2)
     
     
     
    I'm not 100% sure where this error message comes from, but as a user it's not the most helpful. While I personally know what this means, it's not clear to most users what action they should next take as a result. If it makes sense to clarify this in the future, then that's fine.
  13. usr/src/uts/common/io/nvme/nvme.c (Diff revision 2)
     
     
     
     
     
     
     
     
     
    It feels like we should actually have a limit on the amount of data that we're asking for from the kernel here to copy in, especially as we're about to do a sleeping allocation. I realize that this isn't wired up right now to something where the arbitrary size is passed in that way, but something to think about.
    1. I'll set a limit on it

  14. usr/src/uts/common/io/nvme/nvme.c (Diff revision 2)
     
     
    Strictly speaking, arithmetic on void * is undefined, but it works with GCC and probably clang. Not sure there's anything to change per se, but just wanted to mention it.
    1. Change it to a caddr

  15. usr/src/uts/common/io/nvme/nvme.c (Diff revision 2)
     
     
    So, I think it's worth explaining in a comment why you're using two pages. I expect that the answer is that you're trying to avoid the SGL issues that would come with larger number of pages, but there could be something else here as well.
    1. Comment added. Yes it is limited to 2 PRPs

  16. usr/src/uts/common/io/nvme/nvme.c (Diff revision 2)
     
     
    Don't we need to validate that action is valid like slot before proceeding with the assignment here?
  17. 
      
pwinder
hans
  1. 
      
  2. usr/src/cmd/nvmeadm/nvmeadm_print.c (Diff revisions 1 - 3)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    Nit: end each sentence with a period?

    1. Sure, ok with that!

    2. Actually, I won't add the punctuation here. I will do (and add the period) at the point of formating

  3. 
      
domag02
  1. Format string-argument type mismatches, loss of data errors and code deduplication.
  2. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3)
     
     

    Format string - variable type mismatch: size is unsigned, please use %zu format specifier instead.

  3. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3)
     
     

    Loss of data (value truncation): unsigned long assigned to unsigned int.
    See here: SEI CERT C Coding Standard, INT31-C. Ensure that integer conversions do not result in lost or misinterpreted data

  4. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3)
     
     

    Format string - variable type mismatch: slot is unsigned, please use %u format specifier instead.

  5. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3)
     
     
  6. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3)
     
     

    Same issues.
    See my possible solution in the footer.

  7. usr/src/uts/common/io/nvme/nvme.c (Diff revision 3)
     
     

    Format string - variable type mismatch: nvme->n_pagesize is signed, please use %d format specifier instead.

  8. Because firmware slot numbers handled in multiple functions,
    this part of code should be moved into a new function.
    For example:

    #define NVME_FW_SLOT_MIN    1
    #define NVME_FW_SLOT_MAX    7
    .
    .
    .
    static uint_t get_slot_num(char *);
    .
    .
    .
    /*
     * Convert str to a valid firmware slot number.
     * Accepts decimal, octal or hexadecimal representations.
     */
    static uint_t
    get_slot_num(char *str)
    {
        ulong_t slot;
        char *valend;
    
        errno = 0;
        slot = strtoul(str, &valend, 0);
    
        if (errno != 0 || (slot == 0 && valend == str))
            errx(-1, "numeric slot required");
    
        if (slot < NVME_FW_SLOT_MIN || slot > NVME_FW_SLOT_MAX) {
            errx(-1, "slot must be in the range of %d to %d",
                NVME_FW_SLOT_MIN, NVME_FW_SLOT_MAX);
        }
    
        return ((uint_t)slot);
    }
    

    Or with strtonum(3C):

    #define NVME_FW_SLOT_MIN    1
    #define NVME_FW_SLOT_MAX    7
    .
    .
    .
    static uint_t get_slot_num(char *);
    .
    .
    .
    /*
     * Convert str to a valid firmware slot number.
     * Accepts only decimal representations.
     */
    static uint_t
    get_slot_num(char *str)
    {
        uint_t slot;
        const char *errstr = NULL;
    
        errno = 0;
        slot = (uint_t)strtonum(str, NVME_FW_SLOT_MIN, NVME_FW_SLOT_MAX,
            &errstr);
    
        if (errstr != NULL) {
            errx(-1, "slot number is %s: \"%s\".\n"
                "slot number must be numeric in the range of %d to %d.",
                errstr, str, NVME_FW_SLOT_MIN, NVME_FW_SLOT_MAX);
        }
    
        return (slot);
    }
    

    Note:

    • Value truncation (loss of data) fixed.
    • Format strings fixed.
    • Magic numbers got meaningful names. (C-style)

    New do_firmware_commit() and do_firmware_activate():

    static int
    do_firmware_commit(int fd, const nvme_process_arg_t *npa)
    {
        char *valend;
        uint_t slot;
        uint16_t sct, sc;
    
        if (npa->npa_argc != 1)
            errx(-1, "requires firmware slot number");
    
        slot = get_slot_num(npa->npa_argv[0]);
    
        if (!nvme_firmware_commit(fd, slot, NVME_FWC_SAVE, &sct, &sc))
            errx(-1, "Failed to commit firmware to slot %u: %s",
                slot, nvme_str_error(sct, sc));
    
        if (verbose)
            (void) printf("firmware committed to slot %u\n", slot);
    
        return (0);
    }
    

    static int
    do_firmware_activate(int fd, const nvme_process_arg_t *npa)
    {
        char *valend;
        uint_t slot;
        uint16_t sct, sc;
    
        if (npa->npa_argc != 1)
            errx(-1, "slot number is required");
    
        slot = get_slot_num(npa->npa_argv[0]);
    
        if (!nvme_firmware_commit(fd, slot, NVME_FWC_ACTIVATE, &sct, &sc))
            errx(-1, "Failed to activate slot %u: %s", slot,
                nvme_str_error(sct, sc));
    
        if (verbose)
            printf("Slot %u activated: %s\n", slot,
                nvme_str_error(sct, sc));
    
        return (0);
    }
    

    You can use these example codes as you wish.

pwinder
hans
  1. Ship It!
  2. 
      
seeemef@mac.com
  1. 
      
  2. usr/src/cmd/nvmeadm/nvmeadm_print.c (Diff revision 4)
     
     

    May I suggest hyphen or even dash are wrong, and the former full-stop was correct.

    1. I wanted to avoid the use of full-stop, as this implies the need for a full-stop at the end. And I don't want to place a full-stop at the end in these strings as it forces the punctuation on the formatter. Admittedly I do add the full-stop when it is printed .... but I'd like to leave it as a choice at the point of printing.

  3. usr/src/cmd/nvmeadm/nvmeadm_print.c (Diff revision 4)
     
     

    May I suggest hyphen or even dash are wrong, and the former full-stop was correct.

  4. 
      
domag02
  1. Although the type of the offset variable is aligned to nvme_firmware_load(), this part of code needs more modification to work as expected.

  2. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 4)
     
     

    Another type range error, offset variable is signed.
    Just like before, introducing a new helper function might help.
    Example:

    /*
     * Offset (in bytes) must not be larger than
     * UINT32_MAX * sizeof(DWORD)
     */
    #define NVME_FW_OFFSETB_MAX ((u_longlong_t)UINT32_MAX << 2)
    
    /* True, if firmware offset (in bytes) is multiple of 4 */
    #define IS_NVME_FW_OFFSETB_ALIGNED(_off)    (((_off) & 3) == 0)
    .
    .
    .
    static offset_t get_fw_offsetb(char *);
    .
    .
    .
    /*
     * Convert str to a valid firmware offset (in bytes).
     */
    static offset_t
    get_fw_offsetb(char *str)
    {
        u_longlong_t offsetb;
        char *valend;
    
        errno = 0;
        offsetb = strtoull(str, &valend, 0);
        if (errno != 0 || *valend != '\0')
            errx(-1, "offset \"%s\" is invalid", str);
    
        if (!IS_NVME_FW_OFFSETB_ALIGNED(offsetb))
            errx(-1, "offset must be multiple of 4");
    
        if (offsetb > NVME_FW_OFFSETB_MAX)
            errx(-1, "offset is too large");
    
        return ((offset_t)offsetb);
    }
    

    Or if you want to get rid of negation in the input,
    like when strtoull("-18446744073709550616", NULL, 0) will result 1000:

    /*
     * Offset (in bytes) must not be larger than
     * UINT32_MAX * sizeof(DWORD)
     */
    #define NVME_FW_OFFSETB_MAX ((longlong_t)UINT32_MAX << 2)
    
    /* True, if firmware offset (in bytes) is multiple of 4 */
    #define IS_NVME_FW_OFFSETB_ALIGNED(_off)    (((_off) & 3) == 0)
    .
    .
    .
    static offset_t get_fw_offsetb(char *);
    .
    .
    .
    /*
     * Convert str to a valid firmware offset (in bytes).
     */
    static offset_t
    get_fw_offsetb(char *str)
    {
        longlong_t offsetb;
        char *valend;
    
        errno = 0;
        offsetb = strtoll(str, &valend, 0);
        if (errno != 0 || *valend != '\0' || offsetb < 0)
            errx(-1, "offset \"%s\" is invalid", str);
    
        if (!IS_NVME_FW_OFFSETB_ALIGNED(offsetb))
            errx(-1, "offset must be multiple of 4");
    
        if (offsetb > NVME_FW_OFFSETB_MAX)
            errx(-1, "offset is too large");
    
        return ((offset_t)offsetb);
    }
    

    With one of these, do_firmware_load() can be simplified:

    do_firmware_load(int fd, const nvme_process_arg_t *npa)
    {
        int fw_fd;
        ssize_t len;
        offset_t offset = 0;
        size_t size;
        char buf[FIRMWARE_READ_BLKSIZE];
    
        if (npa->npa_argc == 0 || npa->npa_argc > 2)
            errx(-1, "requires firmware file name, and an "
                "optional offset");
    
        if (npa->npa_argc == 2)
            offset = get_fw_offsetb(npa->npa_argv[1]);
    
        fw_fd = open(npa->npa_argv[0], O_RDONLY);
    .
    .
    .
    
  3. 
      
domag02
  1. Cleanup.
  2. usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 4)
     
     

    Unnecessary to cast offset variable.

  3. 
      
pwinder
rm
  1. Ship It!
  2. 
      
seeemef@mac.com
  1. Ship It!
  2. 
      
pwinder
  1. 
      
  2. usr/src/uts/common/sys/nvme.h (Diff revisions 4 - 5)
     
     

    I have added a missing word "make" ...
    It now reads "Some constants to make verification ... "

  3. 
      
hans
  1. Ship It!
  2. 
      
pwinder
domag02
  1. Ship It!
  2. 
      
seeemef@mac.com
  1. Ship It!
  2. 
      
pwinder
Review request changed

Status: Closed (submitted)

Loading...