Topo enumeration of NVMe devices

Review Request #2483 - Created Jan. 9, 2020 and updated

Information
Rob Johnston
illumos-gate
11958, 11959
Reviewers
general

These changes extend the disk and pcibus topo plugins to support enumerating NVMe devices in the topo tree. It also provides a platform-specific topo map for the Supermicro SYS-2028U-E1CNRT+ server platform.

Testing was performed on a Supermicro SYS-2028U-E1CNRT+ with a variety of u.2. NVMe devices and an Intel PCIe Add-in card NVMe device. Testing was also performed on a desktop system with an M.2 NVMe device. I verified that the NVMe devices were properly enumerated. The code changes were fully exercised with UMEM_DEBUG enabled and checks were run to verify no memory leaks or other bad memory juju are being introduced by these changes. The code is pbchk-clean and builds cleanly.

Issues

  • 11
  • 14
  • 3
  • 28
Description From Last Updated
It looks like this is missing the EHCI ports. This platform is based on Broadwell, which means that it should ... Robert Mustacchi Robert Mustacchi
Is this actually determined by SPARC and x86? It looks like cmlb actually uses the _FIRMWARE_NEEDS_FDISK and the other macros. ... Robert Mustacchi Robert Mustacchi
It is documented that this can fail. Shouldn't we verify that it actually returns zero? Robert Mustacchi Robert Mustacchi
A couple of high level questions about this function. Right now we call this once for every namespace on the ... Robert Mustacchi Robert Mustacchi
I notice that a lot of other enumeration modules don't call this. Why do we need to do so here? Robert Mustacchi Robert Mustacchi
Should we be putting topo_mod_dprintf's in the property cases here to be consistent with everything else? Robert Mustacchi Robert Mustacchi
Since pgi is only used in the if statement at +800, perhaps we should just scope it there rather than ... Robert Mustacchi Robert Mustacchi
This whole bit is a bit gnarly. Maybe it's worth a function dedicated to this? Robert Mustacchi Robert Mustacchi
So it's not clear to me why we'd expect them to have an MB label. Can you clarify that a ... Robert Mustacchi Robert Mustacchi
Would di_driver_name() be simpler here? It also gets you out of the string allocs. Also, is there a reason we ... Robert Mustacchi Robert Mustacchi
I think it'd be clearer if you just keep track of the size like you do in the other case. ... Robert Mustacchi Robert Mustacchi
Robert Mustacchi

Thanks for putting to this together Rob.

Why did this have to have? Is it because the dependents need to rely on the pgroups now?
This should be topo_usb_metadata.c, not topo_usb_file.c. I think the existing ones need to be updated (I'll look at that). I think topo_usb_file.c was from an older refactoring.
It looks like this is missing the EHCI ports. This platform is based on Broadwell, which means that it should support operating with xhci disabled and falling back on ehci.
Is this actually determined by SPARC and x86? It looks like cmlb actually uses the _FIRMWARE_NEEDS_FDISK and the other macros. Though maybe this is simpler than using the macros in sys/isa_defs.h that seem to control this.
usr/src/lib/fm/topo/modules/common/disk/disk.c (Diff revision 1)
 
 
 
 
Maybe this would be more useful for the disk_enum dprintf to say what it didn't know how to enumerate?
Minor thing, but based on prototypes.c we try to put new lines on either side of the copyright statement and the other block comment.
Elsewhere you use 'AIC' instead of AiC. Maybe pick one for the whole file? Probably the former?
typedef?

typedef?

usr/src/lib/fm/topo/modules/common/disk/disk_nvme.c (Diff revision 1)
 
 
 
 
 
Are target names guaranteed to never have an 's' in them?
Maybe we can make devpath const in this function? It looks like it's read-only.
It is documented that this can fail. Shouldn't we verify that it actually returns zero?
A couple of high level questions about this function.  Right now we call this once for every namespace on the system. But it seems like we don't ever account for the fact that a namespace may not have a blkdev handle attached for a few reasons:

1) We don't support the namespace
2) We don't handle the case where someone uses nvmeadm to detach the namespace.

In particular, if one fails it seems like the whole thing will fail. Is there a reason that this doesn't use the nvme_identify_nsid_t information to try and get this instead of walking devinfo? It seems like getting the capacity for the namespace that way would be better than going via DKIOCGMEDIAINFO or others.
Rather than encode these here can we mkae them macros that sys/nvme.h defines? It seems like it'd be useful in general.
I assume we don't care if this fails because the auth is optional in the hc fmri?

Perhaps this comment should indicate where it happens for the others? Why do we only need an NVME node for the AIC case and not for the U.2, M.2, etc. cases?

I notice that a lot of other enumeration modules don't call this. Why do we need to do so here?
usr/src/lib/fm/topo/modules/common/disk/disk_nvme.c (Diff revision 1)
 
 
 
 
 
 
 
 
Should we fail the whole enumeration if the parent doesn't have a label? What if we're on a PCIe bus without label information (for example emulated NVMe in an HVM)? It seems like we should just ignore ETOPO_PROP_NOENT.
Should we be putting topo_mod_dprintf's in the property cases here to be consistent with everything else?
I believe this cast is unnecessary.
While the existing code ensures that 'path' is always initialized, maybe it's worth initializing it to NULL to make sure that this is safe in the face of a refactor?
With the comment, this becomes a multi-line if statement (even though it's not from a C perspective). Mind putting parens around that?
If this fails, don't we need to call topo_mod_seterrno()?

Here and all the other devfs places, shouldn't we call topo_mod_seterrno() with EMOD_UKNOWN_ENUM or similar? This is true for all the places you use 'goto out'.

Since pgi is only used in the if statement at +800, perhaps we should just scope it there rather than worry about it in the whole function?
This whole bit is a bit gnarly. Maybe it's worth a function dedicated to this?
So it's not clear to me why we'd expect them to have an MB label. Can you clarify that a little bit, at least here? Also, if someone had some NVMe soldered onto the board, would we not want to catch that?

Also, if we require M.2 to have a topo map, what do we do for something like an Intel NUC? Does that mean we won't enumerate the NVMe devices on it?
Would di_driver_name() be simpler here? It also gets you out of the string allocs.

Also, is there a reason we don't use the NVMe class code and sub-class code here like we do for other stuff?
Are we always guaranteed that we're going to have a label for a PCI device? It's not clear to me that virtualized NVMe will hae one per se.
I think it'd be clearer if you just keep track of the size like you do in the other case. I don't think it's obvious to most folks that setting it back to a valid character is loadbearing for the strlen in topo_mod_strfree(), especially since you're restoring it to a different character.
Loading...