11203 Support for NVMe drive firmware updates
Review Request #2013 — Created June 21, 2019 and submitted
Information | |
---|---|
pwinder | |
illumos-gate | |
Reviewers | |
general | |
hans, rm |
Adds three new sub-commands to nvmeadm:
- nvmeadm load-firmware - loads firmware files into the drive's holding area.
- nvmeadm commit-firmware - takes the firmware in the holding area and commits to a slot.
- nvmeadm activate-firmware - activates a given slot. The firmware in the active slot is loaded by the drive when it is next reset, and becomes the running firmware.
https://www.illumos.org/issues/11203 has an example of how the commands are used.
This process is used by WDC to update NVMe firmware in their IntelliFlash products during manufacturing process.
-
-
-
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 1) Is there a reason you're not using offset_t and size_t here?
-
-
-
-
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 1) offset needs to be > 32 bits for all 32 and 64 bit architectures. The field in the the NVMe command itself is 32 bits, but it is the number of DWORDs. So, theoretically it can be > 32 bits.
The size could be size_t. -
-
usr/src/cmd/nvmeadm/nvmeadm_dev.c (Diff revision 1) The ioctl which does the firmware commit will provide addition status (SCT and SC codes) in n_arg when it succeeds or fails.
So, we need to assign res here in case this function exits because ret != 0
-
Question about punctuation but otherwise LGTM.
-
-
Thanks for putting this together. I have a few notes below. One thing that I think we should do is file a bug for future work so we can actually commit and activate in the same command. I was a little surprised that there was plumbing for NVME_FWC_SAVE_ACTIVATE and NVME_FWC_ACTIVATE_IMMED, but no way to actually trigger them. Also, I feel like we should add a short-cut to list firmware slots rather than going through the log pages. Again, that's as future usability work (not something I'm suggesting you do).
-
-
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2) errno needs to be set to zero before calling strtoul, you probably also want to check errno afterwards.
-
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2) Can we spell out this error message please rather than relying on the user knowing the mathematical syntax?
-
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 2) errno needs to be set to zero before calling strtoul, you probably also want to check errno afterwards.
-
-
usr/src/cmd/nvmeadm/nvmeadm_print.c (Diff revision 2) Here and in the other two switch statements, it's not clear what the message log refers to and its not referenced in the manual page at all. Can we improve this error message for users? It'd be rather hard for them to know where to look. While I know you're referring to /var/adm/messages based on the default config, perhaps there's something more detailed we can say?
-
-
usr/src/man/man1m/nvmeadm.1m (Diff revision 2) The kernel restricts any of these from operating on anything other than namespace zero. I'm not sure it makes sense for us to document that one can specify an optional namespace, as it always impacts the whole controller, no?
-
usr/src/man/man1m/nvmeadm.1m (Diff revision 2) It'd really help to indicate what unit these are in. I'm not sure if I expect it to be bytes or if I expect it to be in dwords like in the NVMe spec.
-
usr/src/man/man1m/nvmeadm.1m (Diff revision 2) We should probalby indicate how to actually list the slots on the device as I think that's the first question I'd have with trying to figure that out.
-
usr/src/man/man1m/nvmeadm.1m (Diff revision 2) I'm not 100% sure where this error message comes from, but as a user it's not the most helpful. While I personally know what this means, it's not clear to most users what action they should next take as a result. If it makes sense to clarify this in the future, then that's fine.
-
usr/src/uts/common/io/nvme/nvme.c (Diff revision 2) It feels like we should actually have a limit on the amount of data that we're asking for from the kernel here to copy in, especially as we're about to do a sleeping allocation. I realize that this isn't wired up right now to something where the arbitrary size is passed in that way, but something to think about.
-
usr/src/uts/common/io/nvme/nvme.c (Diff revision 2) Strictly speaking, arithmetic on void * is undefined, but it works with GCC and probably clang. Not sure there's anything to change per se, but just wanted to mention it.
-
usr/src/uts/common/io/nvme/nvme.c (Diff revision 2) So, I think it's worth explaining in a comment why you're using two pages. I expect that the answer is that you're trying to avoid the SGL issues that would come with larger number of pages, but there could be something else here as well.
-
usr/src/uts/common/io/nvme/nvme.c (Diff revision 2) Don't we need to validate that action is valid like slot before proceeding with the assignment here?
-
Format string-argument type mismatches, loss of data errors and code deduplication.
-
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3) Format string - variable type mismatch:
size
is unsigned, please use%zu
format specifier instead. -
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3) Loss of data (value truncation):
unsigned long
assigned tounsigned int
.
See here: SEI CERT C Coding Standard, INT31-C. Ensure that integer conversions do not result in lost or misinterpreted data -
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3) Format string - variable type mismatch:
slot
is unsigned, please use%u
format specifier instead. -
-
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 3) Same issues.
See my possible solution in the footer. -
usr/src/uts/common/io/nvme/nvme.c (Diff revision 3) Format string - variable type mismatch:
nvme->n_pagesize
is signed, please use%d
format specifier instead. -
Because firmware slot numbers handled in multiple functions,
this part of code should be moved into a new function.
For example:#define NVME_FW_SLOT_MIN 1 #define NVME_FW_SLOT_MAX 7 . . . static uint_t get_slot_num(char *); . . . /* * Convert str to a valid firmware slot number. * Accepts decimal, octal or hexadecimal representations. */ static uint_t get_slot_num(char *str) { ulong_t slot; char *valend; errno = 0; slot = strtoul(str, &valend, 0); if (errno != 0 || (slot == 0 && valend == str)) errx(-1, "numeric slot required"); if (slot < NVME_FW_SLOT_MIN || slot > NVME_FW_SLOT_MAX) { errx(-1, "slot must be in the range of %d to %d", NVME_FW_SLOT_MIN, NVME_FW_SLOT_MAX); } return ((uint_t)slot); }
Or with strtonum(3C):
#define NVME_FW_SLOT_MIN 1 #define NVME_FW_SLOT_MAX 7 . . . static uint_t get_slot_num(char *); . . . /* * Convert str to a valid firmware slot number. * Accepts only decimal representations. */ static uint_t get_slot_num(char *str) { uint_t slot; const char *errstr = NULL; errno = 0; slot = (uint_t)strtonum(str, NVME_FW_SLOT_MIN, NVME_FW_SLOT_MAX, &errstr); if (errstr != NULL) { errx(-1, "slot number is %s: \"%s\".\n" "slot number must be numeric in the range of %d to %d.", errstr, str, NVME_FW_SLOT_MIN, NVME_FW_SLOT_MAX); } return (slot); }
Note:
- Value truncation (loss of data) fixed.
- Format strings fixed.
- Magic numbers got meaningful names. (C-style)
New
do_firmware_commit()
anddo_firmware_activate()
:static int do_firmware_commit(int fd, const nvme_process_arg_t *npa) { char *valend; uint_t slot; uint16_t sct, sc; if (npa->npa_argc != 1) errx(-1, "requires firmware slot number"); slot = get_slot_num(npa->npa_argv[0]); if (!nvme_firmware_commit(fd, slot, NVME_FWC_SAVE, &sct, &sc)) errx(-1, "Failed to commit firmware to slot %u: %s", slot, nvme_str_error(sct, sc)); if (verbose) (void) printf("firmware committed to slot %u\n", slot); return (0); }
static int do_firmware_activate(int fd, const nvme_process_arg_t *npa) { char *valend; uint_t slot; uint16_t sct, sc; if (npa->npa_argc != 1) errx(-1, "slot number is required"); slot = get_slot_num(npa->npa_argv[0]); if (!nvme_firmware_commit(fd, slot, NVME_FWC_ACTIVATE, &sct, &sc)) errx(-1, "Failed to activate slot %u: %s", slot, nvme_str_error(sct, sc)); if (verbose) printf("Slot %u activated: %s\n", slot, nvme_str_error(sct, sc)); return (0); }
You can use these example codes as you wish.
-
-
usr/src/cmd/nvmeadm/nvmeadm_print.c (Diff revision 4) May I suggest hyphen or even dash are wrong, and the former full-stop was correct.
-
usr/src/cmd/nvmeadm/nvmeadm_print.c (Diff revision 4) May I suggest hyphen or even dash are wrong, and the former full-stop was correct.
-
Although the type of the
offset
variable is aligned tonvme_firmware_load()
, this part of code needs more modification to work as expected. -
usr/src/cmd/nvmeadm/nvmeadm.c (Diff revision 4) Another type range error,
offset
variable is signed.
Just like before, introducing a new helper function might help.
Example:/* * Offset (in bytes) must not be larger than * UINT32_MAX * sizeof(DWORD) */ #define NVME_FW_OFFSETB_MAX ((u_longlong_t)UINT32_MAX << 2) /* True, if firmware offset (in bytes) is multiple of 4 */ #define IS_NVME_FW_OFFSETB_ALIGNED(_off) (((_off) & 3) == 0) . . . static offset_t get_fw_offsetb(char *); . . . /* * Convert str to a valid firmware offset (in bytes). */ static offset_t get_fw_offsetb(char *str) { u_longlong_t offsetb; char *valend; errno = 0; offsetb = strtoull(str, &valend, 0); if (errno != 0 || *valend != '\0') errx(-1, "offset \"%s\" is invalid", str); if (!IS_NVME_FW_OFFSETB_ALIGNED(offsetb)) errx(-1, "offset must be multiple of 4"); if (offsetb > NVME_FW_OFFSETB_MAX) errx(-1, "offset is too large"); return ((offset_t)offsetb); }
Or if you want to get rid of negation in the input,
like whenstrtoull("-18446744073709550616", NULL, 0)
will result 1000:/* * Offset (in bytes) must not be larger than * UINT32_MAX * sizeof(DWORD) */ #define NVME_FW_OFFSETB_MAX ((longlong_t)UINT32_MAX << 2) /* True, if firmware offset (in bytes) is multiple of 4 */ #define IS_NVME_FW_OFFSETB_ALIGNED(_off) (((_off) & 3) == 0) . . . static offset_t get_fw_offsetb(char *); . . . /* * Convert str to a valid firmware offset (in bytes). */ static offset_t get_fw_offsetb(char *str) { longlong_t offsetb; char *valend; errno = 0; offsetb = strtoll(str, &valend, 0); if (errno != 0 || *valend != '\0' || offsetb < 0) errx(-1, "offset \"%s\" is invalid", str); if (!IS_NVME_FW_OFFSETB_ALIGNED(offsetb)) errx(-1, "offset must be multiple of 4"); if (offsetb > NVME_FW_OFFSETB_MAX) errx(-1, "offset is too large"); return ((offset_t)offsetb); }
With one of these,
do_firmware_load()
can be simplified:do_firmware_load(int fd, const nvme_process_arg_t *npa) { int fw_fd; ssize_t len; offset_t offset = 0; size_t size; char buf[FIRMWARE_READ_BLKSIZE]; if (npa->npa_argc == 0 || npa->npa_argc > 2) errx(-1, "requires firmware file name, and an " "optional offset"); if (npa->npa_argc == 2) offset = get_fw_offsetb(npa->npa_argv[1]); fw_fd = open(npa->npa_argv[0], O_RDONLY); . . .
-
-
usr/src/uts/common/sys/nvme.h (Diff revisions 4 - 5) I have added a missing word "make" ...
It now reads "Some constants to make verification ... "
Change Summary:
Changes are because of this email from Gergő Mihály Doma.
"I have some bad news.First, get_slot_number() also affected by the negation in the input.
The second is that when I tested get_slot_number() I found a bug
in the strtoul() implementation.
(See here: Bug #11326 - libc: strtoul() forgets to set errno to EINVAL when no conversion could be performed)Because of these reasons, I recommend the following replacements in get_slot_number():
"ulong_t slot;" with "long slot;" and
"strtoul" with "strtol".Another thing what I noticed is in the error handling:
strtol(), strtoll(), etc. functions can set errno to 'ERANGE' which triggers the "numeric slot required"
error message (in get_slot_number()).
This can be confusing, because the entered value is really numeric, just way too large (or too low).
'get_slot_number()' accepts anything which starts as numeric, but 'get_fw_offsetb()' accepts arguments only
numeric, why are they so different?
* If get_slot_number() going to be more restrictive, then a more appropriate error message will be necessary.
For example, when somebody wants to "multi-slotting" (like "1, 2, 3" with double quotes) now accepted as
slot number 1. In a restrictive version with old error message the user will see this: "numeric slot required".
Maybe something more general error message would be better, just like in 'get_fw_offsetb()': "slot \"%s\" is invalid".Gergő"
There was a question of "multi-slotting". It was never intended to do multi-slotting - only one slot can be specified.
I have changed some of the error messages:
root@omnios-athena:~# nvmeadm load-firmware nvme3 KNGND113.bin -234232
nvmeadm: Offset must be numeric and in the range of 0 to 17179869180
root@omnios-athena:~# nvmeadm load-firmware nvme3 KNGND113.bin 234213443444
nvmeadm: Offset must be numeric and in the range of 0 to 17179869180
root@omnios-athena:~# nvmeadm load-firmware nvme3 KNGND113.bin 2344x
nvmeadm: Offset must be numeric and in the range of 0 to 17179869180
root@omnios-athena:~# nvmeadm load-firmware nvme3 KNGND113.bin dsdsf
nvmeadm: Offset must be numeric and in the range of 0 to 17179869180
root@omnios-athena:~# nvmeadm load-firmware nvme3 KNGND113.bin 23443
nvmeadm: Offset must be multiple of 4root@omnios-athena:~# nvmeadm commit-firmware nvme3 -2312
nvmeadm: Slot must be numeric and in the range of 1 to 7
root@omnios-athena:~# nvmeadm commit-firmware nvme3 xxx
nvmeadm: Slot must be numeric and in the range of 1 to 7
root@omnios-athena:~# nvmeadm commit-firmware nvme3 1x
nvmeadm: Slot must be numeric and in the range of 1 to 7
root@omnios-athena:~# nvmeadm commit-firmware nvme3 1345234513451354341
nvmeadm: Slot must be numeric and in the range of 1 to 7
root@omnios-athena:~# nvmeadm commit-firmware nvme3 "1,2"
nvmeadm: Slot must be numeric and in the range of 1 to 7
root@omnios-athena:~# nvmeadm commit-firmware nvme3 1,2
nvmeadm: Slot must be numeric and in the range of 1 to 7
Diff: |
Revision 6 (+840 -95) |
---|