11220 Want support for hyper-v

Review Request #1990 - Created June 8, 2019 and updated

Information
Andy Fiddaman
illumos-gate
master
11220
2c869fb...
Reviewers
general

Author: George Wilson george.wilson@delphix.com
Portions Contributed by: Brad Lewis brad.lewis@delphix.com
Portions Contributed by: Pavel Zakharov pavel.zakharov@delphix.com

For reference, this wad includes:

From Delphix:

  • DLPX-52519 DxOS needs support for Hyper-V drivers
  • DLPX-52831 guid is leaked in storvsc_config_one
  • AZ-307 rem_drv of hv_storvsc driver hangs
  • AZ-308 Storvsc driver needs polling capability.
  • ATA reads timeout after Azure update
  • Fix missing mutex_exit() calls.

And a couple of very minor updates from OmniOS to cater for changes in gate:

  • Adjust hyperv for new C99 Makefile macros
  • Remove duplicate definition of __packed
  • Do not package 32bit kernel drivers

Caveats:

This batch represents the initial code drop from Delphix. It works generally well, but it has the following known issues:

  • Panics if booted in a VM with a single virtual CPU
  • Panics if booted with low memory (< 1GiB)
  • Occasionally panics on first boot
  • Time-sync does not function and error messages are logged
  • Console output is too verbose

These are fixed in OmniOS and will be addressed in follow-up issues/RTIs.

In addition to being used to run Delphix in Azure, These drivers have been running in various OmniOS development VMs under Hyper-V on Windows 10 since they were integrated.

Issues

  • 111
  • 0
  • 0
  • 111
Description From Last Updated
I don't see why this (and other THIRDPARTYLICENSE stuff) is requiring an exception. x void x void
Why this line is needed? x void x void
depending x void x void
Isn't $(i386_ONLY) already assumed by the arch definition above? x void x void
This manifest seems to be missing the third party licenses. x void x void
This manifest seems to be missing the third party licenses. x void x void
What is $(SUBARCH_DIR)? x void x void
Seems like extra line. x void x void
Remove. x void x void
Here and below -- these aren't helping anything as they aren't properly expanded, remove? x void x void
Any reason we can't clean this up for new code? Is it smatch clean? Robert Mustacchi Robert Mustacchi
Presumably we can delete this now? Robert Mustacchi Robert Mustacchi
This and the unlock should probably be written to just use flock(3C). Robert Mustacchi Robert Mustacchi
Feels like this should be an unsigned value. Robert Mustacchi Robert Mustacchi
Are we guaranteed that there will never be gaps in the service space and that this won't miss things? Robert Mustacchi Robert Mustacchi
Shouldn't we actually use a bcopy here or make sure that the rest of the bytes are set to a ... Robert Mustacchi Robert Mustacchi
Either this function is right or all the others are wrong, but this is using value_size as a uint32_t, but ... Robert Mustacchi Robert Mustacchi
It feels like we should memset the other bytes here, otherwise they're just random stack garbage and could leak out ... Robert Mustacchi Robert Mustacchi
This just seems messed up. We're treating the input key size and value size as how much data we should ... Robert Mustacchi Robert Mustacchi
Shouldn't this have key_size/value_size checks? Robert Mustacchi Robert Mustacchi
Not checking for memory allocation failure. Robert Mustacchi Robert Mustacchi
Not checking for malloc failure. The problem is that we're using NULL as a sentinel for there being no link ... Robert Mustacchi Robert Mustacchi
What if strlcat would overflow the buffer that we have? Shouldn't we do something about that case? Robert Mustacchi Robert Mustacchi
Maybe we should use routing sockets to not have to shell out awkwardly to commands? Maybe route get will be ... Robert Mustacchi Robert Mustacchi
If this and others overflow, shouldn't we actually not try and execute the command? Robert Mustacchi Robert Mustacchi
This deserves comments as to what it's trying to actually accomplish. It's hard to evaluate its correctness. Robert Mustacchi Robert Mustacchi
At least the buffer size should be sizeof (tmp) and the next bit. Robert Mustacchi Robert Mustacchi
What if these strl* things end up overflowing meaning we haven't actually properly copied things over. Robert Mustacchi Robert Mustacchi
Is this explicitly trying to do interface name globbing? The comments suggest it's not, this means that we should probably ... Robert Mustacchi Robert Mustacchi
It feels like this would be much clearer if we just did the simple thing to find where the last ... Robert Mustacchi Robert Mustacchi
Other functions are returning the errno. This is returning the EAI keys. Robert Mustacchi Robert Mustacchi
Please don't use raw strcpy on buffers. This is a daemon, it should probably be more rigorous. Robert Mustacchi Robert Mustacchi
len needs to be an ssize_t to properly match pread. Robert Mustacchi Robert Mustacchi
Erm, should we really continue executing in the fact of unknown options? This is going to make it really hard ... Robert Mustacchi Robert Mustacchi
Unchecked malloc. Robert Mustacchi Robert Mustacchi
What does pread have to do with anything here? Robert Mustacchi Robert Mustacchi
Erm, do we really want to use an errno to signify this? It doesn't seem consistent with other uses of ... Robert Mustacchi Robert Mustacchi
Is there anything we should assume about the kernel side when we receive a malformed request? Robert Mustacchi Robert Mustacchi
So, depending on the error condition, how does this not turn into an infinite loop? Robert Mustacchi Robert Mustacchi
Do we really want to assume that the default path is sane for this? Should we either set path or ... Robert Mustacchi Robert Mustacchi
Should we actually try to match the exact string 'dhcp' rather than just any substring? This seems brittle in the ... Robert Mustacchi Robert Mustacchi
Why do we only care about a single nameserver? Robert Mustacchi Robert Mustacchi
All of these snprintf declarations definitely should be doing overflow checking. Robert Mustacchi Robert Mustacchi
Please don't assert this. A driver should properly check it. Robert Mustacchi Robert Mustacchi
I don't think we want to warn every time we have an unsupported property. There are lots of these. Robert Mustacchi Robert Mustacchi
If you're using ddi_strtol, you need to check the return value and you should check the end pointer. This is ... Robert Mustacchi Robert Mustacchi
Elsewhere we say that the minimum is HN_MTU_MIN. Shouldn't we use that and not zero? Robert Mustacchi Robert Mustacchi
Why aren't we looking at an error here? Robert Mustacchi Robert Mustacchi
We should probably actually use structure prefixes. Just plain old 'm' is a pretty unsuporting member name and hard to ... Robert Mustacchi Robert Mustacchi
This probably shouldn't be a static thing we use for RSS. In almost every other driver we usually generate random ... Robert Mustacchi Robert Mustacchi
This comment as to how we set the leader could probably be improved. It's not clear why we're setting anything ... Robert Mustacchi Robert Mustacchi
If the link status hasn't changed you shouldn't call mac_link_update(). Robert Mustacchi Robert Mustacchi
Is there some reason this doesn't need to be locked? Robert Mustacchi Robert Mustacchi
So this all seems a bit weird. Why are we setting the link up or down? mac only expects us ... Robert Mustacchi Robert Mustacchi
It seems like this should really be addressed at some point? Why is it important to reuse the DMA handle ... Robert Mustacchi Robert Mustacchi
Shouldn't this actually be DDI_DEVICE_ATTR_V1? Robert Mustacchi Robert Mustacchi
Is this ever actually defined? Robert Mustacchi Robert Mustacchi
Shouldn't we wait a little bit for a pending TX before we try and tear things down and purposefully leak? Robert Mustacchi Robert Mustacchi
See note, for most other drivers this is random data so that way someone can't figure out the RSS mappings ... Robert Mustacchi Robert Mustacchi
When we wait for hardware, we have an upper bound on how long we'll wait. Should we do something similar ... Robert Mustacchi Robert Mustacchi
See earlier tick note. Robert Mustacchi Robert Mustacchi
It seems this can fail. If it does, what should we do? Wouldn't this mean that rx basically won't work? Robert Mustacchi Robert Mustacchi
Are we guaranteed that only a single instance of the storsvc will ever exist and therefore it makes sense for ... Robert Mustacchi Robert Mustacchi
This is definitely incorrect to use here. We do not initialize FM support or ever call ddi_fm_dma_err_get(). This should probably ... Robert Mustacchi Robert Mustacchi
Why are we sizing this on the maximum number of CPUs that the kernel supports? Do we want to actually ... Robert Mustacchi Robert Mustacchi
same_init(9F) says the name should be set to NULL. Robert Mustacchi Robert Mustacchi
It's not clear to me why failing to send a request should result in us panicking the system here. Could ... Robert Mustacchi Robert Mustacchi
Another case of sema_init not needing a string. Robert Mustacchi Robert Mustacchi
So... if we hit this case on non-debug we'll return with ret equals zero, suggesting success. Given that this has ... Robert Mustacchi Robert Mustacchi
sema_init name. Robert Mustacchi Robert Mustacchi
Do we have any idea of what the answer to this was? It's hard to imagine a world where that ... Robert Mustacchi Robert Mustacchi
Why is this assertion safe? What if it happens on non-debug? We don't seem to handle it. Robert Mustacchi Robert Mustacchi
Is there a reason we're not acting on these XXXs? Robert Mustacchi Robert Mustacchi
snprintf! Robert Mustacchi Robert Mustacchi
Why are we saying that we've set this, but ignored the value entirely? If this is intentional because of some ... Robert Mustacchi Robert Mustacchi
The manual pages says that for capabilities that we recognize, then we should return 0 if we can't set it, ... Robert Mustacchi Robert Mustacchi
Why are we setting this same value twice? Robert Mustacchi Robert Mustacchi
What is this XXX about? Robert Mustacchi Robert Mustacchi
Don't we actually need to do a DMA sync here if we have DMA memory? Maybe scsi_sync_cache_pkt? Robert Mustacchi Robert Mustacchi
If we've not implemented abort and we hit a timeout here, is it really safe to just return the packet ... Robert Mustacchi Robert Mustacchi
A comment about this would really be useful. Robert Mustacchi Robert Mustacchi
If we timed out, won't we have already called the completion callback, but we won't have set STATE_GOT_STATUS. That's normally ... Robert Mustacchi Robert Mustacchi
Are these supposed to be utf-16 values and that's why they're labeled as uint16_t values here? I'm just confused as ... Robert Mustacchi Robert Mustacchi
The fact that we're using the same uint16_t based type to represent both utf16 and utf8 strings is really confusing. ... Robert Mustacchi Robert Mustacchi
So, I get that you're trying to match devices that have a ddi networking name that are probably like hv_netsvc0, ... Robert Mustacchi Robert Mustacchi
Is there any chance this can overflow and in which case we should error rather than actually give it back ... Robert Mustacchi Robert Mustacchi
Huh, utf16 to utf8 isn't a simple divide by two. UTF-16 is a variable width encoding as some data may ... Robert Mustacchi Robert Mustacchi
Why is this assert safe? Robert Mustacchi Robert Mustacchi
open(9E) makes no guarantee that this can't happen in parallel. This should really check oflags, devtype, and the credentials. Robert Mustacchi Robert Mustacchi
This should use locking of the sc state so it's consistent with respect to open(9E). Also, why isn't the daemon_task ... Robert Mustacchi Robert Mustacchi
So, does it really make sense to allow a user process to perform a read that's less than the size ... Robert Mustacchi Robert Mustacchi
If we're unconditionally setting the offset of the uio to zero here, shouldn't we also be doing so in read? ... Robert Mustacchi Robert Mustacchi
I'm a little surprised that we're not checking the uio_resid here before we do the uiomove. If the user hasn't ... Robert Mustacchi Robert Mustacchi
The use of daemon busy here doesn't make a lot of sense to me. The reason I'm confused is that ... Robert Mustacchi Robert Mustacchi
So, this doesn't make sense to me for a number of reasons. First, for detach to have been called, all ... Robert Mustacchi Robert Mustacchi
Shouldn't we do a ddi_taskq_wait before destroying it here? What makes it impossible for something to have been added right ... Robert Mustacchi Robert Mustacchi
Why are full non-static inline functions defined in a header file? Robert Mustacchi Robert Mustacchi
Any particular reason we don't care about this failing? Robert Mustacchi Robert Mustacchi
Note, prfind does honor zone context. I don't believe it's possible to run in zone context here, but just wanted ... Robert Mustacchi Robert Mustacchi
You need to take a hold such as p_lock, if you're going to drop pidlock, otherwise once pidlock is dropped, ... Robert Mustacchi Robert Mustacchi
For what it's worth, I would generally suggest zeroing the other registers or the structure so that way you have ... Robert Mustacchi Robert Mustacchi
I don't see any FM support being enabled here or an equivalent being wired up through the use of the ... Robert Mustacchi Robert Mustacchi
I think this is actually a V1 structure. Robert Mustacchi Robert Mustacchi
Shouldn't it be fine if the real size is grater than the actual size? Robert Mustacchi Robert Mustacchi
We no longer need a 32-bit copy as there's only a 64-bit x86 kernel. Robert Mustacchi Robert Mustacchi
Does the hypervisor give us enough information about transitive versus non-transitive failures so we can know if we're just repeating ... Robert Mustacchi Robert Mustacchi
Why do we need strncmp here? Haven't we converted the hypervisor data to something we know is a null terminated ... Robert Mustacchi Robert Mustacchi
Lock name not needed Robert Mustacchi Robert Mustacchi
What is this else case for ILP32? Presuambly we want to be explicit about that. Robert Mustacchi Robert Mustacchi
What happens if we set something a cpu that someone comes and offlines? For example, we disable a CPU after ... Robert Mustacchi Robert Mustacchi
That page alignment is enough for what? Robert Mustacchi Robert Mustacchi
x void

   
exception_lists/copyright (Diff revision 1)
 
 

I don't see why this (and other THIRDPARTYLICENSE stuff) is requiring an exception.

x void

   
usr/src/cmd/Makefile (Diff revision 1)
 
 

h comes before n.

usr/src/cmd/hyperv/scripts/Makefile (Diff revision 1)
 
 

Why this line is needed?

depending

Isn't $(i386_ONLY) already assumed by the arch definition above?

This manifest seems to be missing the third party licenses.

This manifest seems to be missing the third party licenses.

usr/src/uts/intel/Makefile.rules (Diff revision 1)
 
 

What is $(SUBARCH_DIR)?

x void

   
usr/src/uts/intel/hv_heartbeat/Makefile (Diff revision 1)
 
 

Seems like extra line.

Remove.

Here and below -- these aren't helping anything as they aren't properly expanded, remove?

x void

   
usr/src/uts/intel/Makefile.rules (Diff revision 1)
 
 

OK, looks like it is for amd64/ia32, but we likely don't need the ia32 source now at all?

x void

General question -- why BSD licensed source has the CDDL text added? It would only make it impossible for "upstream" to take the fixes.

Robert Mustacchi
First, thanks for putting this together. I've left what's probably a lot of comments here. I've tried to focus on things that are specific to the integration with illumos and things that aren't code style per se. I've tried, though I'm sure I've messed it up some to try and create a difference in importance by signaling with the 'open an issue' note here about the criticality of things. I know there's a lot of things that come from the fact that these were ported.

I've not really gone through all of the hyper-v specs and made sure that all of the constants and means of how we're talking to the hypervisor make sense, I've tried to focus on the integration with different pieces in illumos.
usr/src/cmd/hyperv/kvp/Makefile (Diff revision 1)
 
 
Any reason we can't clean this up for new code? Is it smatch clean?
usr/src/cmd/hyperv/kvp/Makefile (Diff revision 1)
 
 
 
Presumably we can delete this now?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 

Is this something that we're generally planning on syncing with Microsoft or are we planning on making it basically our own thing? I'm trying to figure out how much we should make things be a bit more cleaned up to match some of our general trends or not. I've left out some comments that might be in the more style trend as a result. I've left some of those as comments that are explicitly marked as not issues to try and distinguish them.

usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Really feels like this should be unsigned. It doesn't seem like we should have a negative number of blocks. Same seems true for num_records as num_records gets passed directly to fwrite which takes an unsigned value and if it is negative, that'll be rather wrong.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
This and the unlock should probably be written to just use flock(3C).
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
This isn't a good way to write an update to a file. We should be writing it out to a temporary file and then renaming it. That way we know the file is on stable storage and also that way it can't result in a partial file. It's not clear to me yet if we won't reuse the pool files. If we're open to it, I can bring librename in which simplifies this process a little bit, or at least gets it (hopefully) right once in a library.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
cstyle prefers explicit comparisons.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
Is it worth saving and restoring the error that occurred so we can give someone more information about why the daemon couldn't write to the pool?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Feels like this should be an unsigned value.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
This seems like a good use for recallocarray(3C). I guess the theory is that we don't need to initialize the new memory because we'll be reading into it shortly?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Maybe we should VERIFY0 rather than ignore?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
This seems like the kind of return value we should just check from a sanity point of view. Especially if someone ever changes this path.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
Are we guaranteed that there will never be gaps in the service space and that this won't miss things?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
kvp_update_mem_state also has this exact same logic. Maybe we should centralize it so we don't have to duplicate the loop?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
It would be really nice if key was const and key_size was unsigned.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
Note, this shows a slight discrepancy in how we use num_records. Sometimes we're using it to indicate the number of valid records; however, we're also sometimes using it to represent the amount of memory of the read records array. Because we're always reallocating memory every time, this works, but just leaves a bit of an odd smell. Not sure anything should be changed, but it's a bit weird.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 

Shouldn't we actually use a bcopy here or make sure that the rest of the bytes are set to a '\0' here? Otherwise if we're copying a shorter name over a longer name, we'll be relying on the nul character rather substantially to cause us to correctly stop reading the records and making sense of it. That seems like it'll be harder to make sense of when someone needs to debug what's on disk.

Actually, looking later on, this seems like it needs to be a bcopy of the maximum value size as when we're in key_add_or_modify, we're not assuming that we have null terminated arrays here.

usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Either this function is right or all the others are wrong, but this is using value_size as a uint32_t, but the others are treating it as an int32_t. We should probably make them consistent?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
recodrd seems like a typo
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
It feels like we should memset the other bytes here, otherwise they're just random stack garbage and could leak out other information persistently.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
If value_size is an int, probably need to make sure this isn't negative.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
This just seems messed up. We're treating the input key size and value size as how much data we should copy but that has no relation on the actual length of the data stored. Consider the key stored as: "foo.bar". If we look for the key "foo" with key_size 3, this will definitely find it "foo.bar" if it comes before a "foo" in the daemon.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
Shouldn't this have key_size/value_size checks?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Not checking for memory allocation failure.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Not checking for malloc failure. The problem is that we're using NULL as a sentinel for there being no link found. Therefore we probably should actually probably error in this case.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
What if strlcat would overflow the buffer that we have? Shouldn't we do something about that case?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Maybe we should use routing sockets to not have to shell out awkwardly to commands? Maybe route get will be a simpler way of getting this?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
If this and others overflow, shouldn't we actually not try and execute the command?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
This deserves comments as to what it's trying to actually accomplish. It's hard to evaluate its correctness.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
At least the buffer size should be sizeof (tmp) and the next bit.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
 
 
What if these strl* things end up overflowing meaning we haven't actually properly copied things over.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Is this explicitly trying to do interface name globbing? The comments suggest it's not, this means that we should probably actually not use strncmp here. If that's right, then this should just be a normal strcmp(), otherwise this'll possibly erroneously match link names.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
It feels like this would be much clearer if we just did the simple thing to find where the last one / zero is set rather than the complicated bit manipulations.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
 
Other functions are returning the errno. This is returning the EAI keys.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Please don't use raw strcpy on buffers. This is a daemon, it should probably be more rigorous.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
len needs to be an ssize_t to properly match pread.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Erm, should we really continue executing in the fact of unknown options? This is going to make it really hard to add new options at all.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
It feels like this should also be dropping privileges at some point. Though that doesn't have to happen now. But there are other things beyond just calling daemon() that a daemon should do.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
Unchecked malloc.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
What does pread have to do with anything here?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
Erm, do we really want to use an errno to signify this? It doesn't seem consistent with other uses of exit here. I don't think we should be mixing exit with an errno and exit with EXIT_FAILURE (which would confusingly be an errno). Also in this case, there are other errnos that could occur that we're glomming onto EIO.
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
Is there anything we should assume about the kernel side when we receive a malformed request?
usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
 
 
 
 
 
 
 
 
So, depending on the error condition, how does this not turn into an infinite loop?
Do we really want to assume that the default path is sane for this? Should we either set path or use absolute paths to ipadm and grep?
Should we actually try to match the exact string 'dhcp' rather than just any substring? This seems brittle in the face of future changes to the types of values.

Also if the only thing we care about is the dhcp information, shouldn't we pass -q to grep?
usr/src/cmd/hyperv/scripts/hv_get_dns_info.sh (Diff revision 1)
 
 
 
 
 
Why do we only care about a single nameserver?
Minor, but hdrcheck presumably complains about this.
usr/src/uts/intel/io/hyperv/netvsc/hn_gld.c (Diff revision 1)
 
 
 
All of these snprintf declarations definitely should be doing overflow checking.
Please don't assert this. A driver should properly check it.
I don't think we want to warn every time we have an unsupported property. There are lots of these.
If you're using ddi_strtol, you need to check the return value and you should check the end pointer. This is true for all of the ones here.
Elsewhere we say that the minimum is HN_MTU_MIN. Shouldn't we use that and not zero?
If we're rerturning an error, I'm not sure we should set val.
If we're returning an error we probably shouldn't set val.
Why aren't we looking at an error here?
We should probably actually use structure prefixes. Just plain old 'm' is a pretty unsuporting member name and hard to find in the file.
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
This probably shouldn't be a static thing we use for RSS. In almost every other driver we usually generate random data for an RSS key size. Does Hyper-V require that we have to use that constant?
This comment as to how we set the leader could probably be improved. It's not clear why we're setting anything in particular for this.
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
If the link status hasn't changed you shouldn't call mac_link_update().
Is there some reason this doesn't need to be locked?
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
So this all seems a bit weird. Why are we setting the link up or down? mac only expects us to update things if we've actually changed. If the system hasn't changed, we shouldn't tell MAC that things are down.

Also, is there a reason that this is OK to manipulate our main link state structures without holding the softstate lock?
It seems like this should really be addressed at some point? Why is it important to reuse the DMA handle at this point? No other high-performance networking driver has these same concerns. Because this is paravirt, I suspect that we're not actually doing true device-level DMA, but this seems rateher suspicious. At the very least, we should open a follow up to address this.
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
 
 
signed/unsigned arithmetic confusion. This is complicated by the fact that hn_pkt_length is weirdly a signed value.
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
 
This should probably allocate two extra bytes so we can offset the packet by two bytes such that the IP header is four byte aligned (the MAC header is a number of 4-byte units). I'll make a note to add this to mac(9E) or other pages so this is clearer.
Should really be unsigned.
Do we not need to do any host trusting on IPv6/
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
If these are wrong, then the packet probably shouldn't be trusted. So I think it's probably reasonable to have them.
Shouldn't this actually be DDI_DEVICE_ATTR_V1?
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
 
I guess I don't understand why this is required? Why don't we just do what normal NIC drivers do and honor the life time? If we don't want DMA memory because this is a hypervisor and it can use other things, then that's fine, we can use the KPM segment, but it seems and the original authors agreed, that this isn't a great idea.
Is this ever actually defined?
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
Shouldn't we wait a little bit for a pending TX before we try and tear things down and purposefully leak?
Are we actually linking the channel if we're closing it?
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
See note, for most other drivers this is random data so that way someone can't figure out the RSS mappings in play.
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
 
When we wait for hardware, we have an upper bound on how long we'll wait. Should we do something similar here? Note by specifying this in clock ticks you're going to get variable rates depending on what the current tick rate is. I suspect you really want to use drv_usectohz() to get a consistent time regardless of the setting of hz.
See earlier tick note.
It seems this can fail. If it does, what should we do? Wouldn't this mean that rx basically won't work?
If it's possible to further explain this XXX or why this is an XXX that'd be appreciated.
usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
 
 
 
 
It would be great if it could explain why this is fatal.
Can we please comment on why we need to disable pre-emption? We're not disabling interrupts, so it feels like it can still race. Is this a major win against a much simpler mutex?
Are we guaranteed that only a single instance of the storsvc will ever exist and therefore it makes sense for this to be global and not part of the logical device state?
This is definitely incorrect to use here. We do not initialize FM support or ever call ddi_fm_dma_err_get(). This should probably just be set to zero.
Why are we sizing this on the maximum number of CPUs that the kernel supports? Do we want to actually have more sub-channels than CPUs?
same_init(9F) says the name should be set to NULL.
It's not clear to me why failing to send a request should result in us panicking the system here. Could we at least get a comment that explains why that's the case? It's a bit weird when the test function returns a value, but this doesn't.
Another case of sema_init not needing a string.
So... if we hit this case on non-debug we'll return with ret equals zero, suggesting success. Given that this has an assert 0 on debug, this seems bad to drive on.
If this is the case, why are we warning about it but not doing anything about it per se? What action should the user or driver be taking in this case?
sema_init name.
Do we have any idea of what the answer to this was? It's hard to imagine a world where that would result in consistency if we are making dependent, non-idempotent changes to the disk unless there's other hashing going on.
Why is this assertion safe? What if it happens on non-debug? We don't seem to handle it.
Is there a reason we're not acting on these XXXs?
snprintf!
Isn't it the case that we should be able to abort commands even when we have guarantees that they'll complete? If this command has taken a long time, say a command that's going to complete in an hour, we still may want to abort it because it's not worth waiting that long.
Why are we saying that we've set this, but ignored the value entirely? If this is intentional because of some aspect of the virtual hardware, surely we should have a comment that says why that makes sense.
The manual pages says that for capabilities that we recognize, then we should return 0 if we can't set it, but for ones we don't, we should return -1. Shouldn't we check these like the ones above?
Why are we setting this same value twice?
What is this XXX about?
We're relying on the upper levels for winindex, which if invalid would cause this all to fail. Shouldn't we check the return value here?
Don't we actually need to do a DMA sync here if we have DMA memory? Maybe scsi_sync_cache_pkt?
usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
 
 
 
 
 
 
 
 
It's also fine to have this entry point set to NULL if we don't support this entry point.
I realize that there isn't going to be any desire to simplify this, but if we can actually do device discovery of what LUNs exist via hyper-v like is able to be done with virtio and some others, we should consider future work to move this to SCSAv3, which will simplify a bunch of the device coming and going related logic. Especially if this ever ends up supporting device hotplug at the other end of the bug.
usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
If we've not implemented abort and we hit a timeout here, is it really safe to just return the packet and call the completion? Won't hyper-v still have the memory here that's related to this? I'm not sure how this can work without us doing something to the virtual controller. Comments that explain the safety would be greatly appreciated.
usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
A comment about this would really be useful.
usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
If we timed out, won't we have already called the completion callback, but we won't have set STATE_GOT_STATUS. That's normally done by callers before they call storvsc_complete_command(). So I'm not sure how this is intended to work.
Are there more details of what this is about and what goes wrong at all? It'd be great to know what exactly this is about.
Why is this an XXX?
Conventionally we open the brace on the case line.
usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
 
 
 
 
 
 
 
 
How are command aborted when we don't support sending aborts?
usr/src/uts/intel/io/hyperv/utilities/hv_kvp.h (Diff revision 1)
 
 
 
 
 
 
 
 
 
Are these supposed to be utf-16 values and that's why they're labeled as uint16_t values here? I'm just confused as to what they're supposed to represent when I look at all the casting in the utf8 to utf 16 utilities. If the latter is supposed to work on uint16_t represented values, it's pretty unclear then why it's casting back and forth.

Though looking further, it's not clear if these are supposed to also be thought of as normal characters as we call strlen on these array fields!
usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
 
 
 
 
 
 
 
The fact that we're using the same uint16_t based type to represent both utf16 and utf8 strings is really confusing. It makes it very hard to know when we're in one state or the next. The casting back and forth here should make that all kind of clear. Though I'm not sure why we need the casts in the uint16_t cases. They should already be the right type due to C's willingness to take arrays as pointer arguments.
So, I get that you're trying to match devices that have a ddi networking name that are probably like hv_netsvc0, etc. but this strcmp method means you can match anything, including things beyond that. Perhaps we should do this a different way to figure out that it's the right thing. One way would be to use the driver name.
Is there any chance this can overflow and in which case we should error rather than actually give it back to the caller?
Huh, utf16 to utf8 isn't a simple divide by two. UTF-16 is a variable width encoding as some data may be in multiple uint16_t units. Shouldn't we be using the actual size that we get for the value? This is true in all of the places in this function we do something similar.
Strictly speaking an uint32_t is just %u or PRIu32. The cast and all is fine, just seems a bit weird that we're going to a larger type, but it's not incorrect.
Strictly speaking an uint64_t is just %lu in the kernel or PRIu64 if we want to handle different data models. The cast and all is fine, just seems a bit weird that we're going to a larger type, but it's not incorrect.
Why is this assert safe?
usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
 
 
 
 
 
 
 
 
open(9E) makes no guarantee that this can't happen in parallel.

This should really check oflags, devtype, and the credentials.
usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
This should use locking of the sc state so it's consistent with respect to open(9E).

Also, why isn't the daemon_task cleared out? Shouldn't we set curproc to something else since it's closed the handle?
So, does it really make sense to allow a user process to perform a read that's less than the size of a single message? Shouldn't we make sure there's enough space in the buffer for an entire message? Otherwise, it seems like we need to actually honor uio_offset (should be uio_loffset) and taken into account that we may have partial reads for a given request.
If we're unconditionally setting the offset of the uio to zero here, shouldn't we also be doing so in read? Also, shouldn't we use uio_loffset as we're 64-bit aware?
I'm a little surprised that we're not checking the uio_resid here before we do the uiomove. If the user hasn't given us enough data to form a full request, we don't seem to care in the current form and will just copy zeros. Is that intentional? It seemed like the user and daemon always exchanged a full hv_kvp_msg in each direction.
usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
 
 
 
 
 
 
 
The use of daemon busy here doesn't make a lot of sense to me. The reason I'm confused is that open() seems to clear daemon_busy. If the userland daemon crashes handling an event, shouldn't it be able to start back up and resume processing that? I guess this is why it's a bit weird to me that this is the flag that indicates the poll state and not something else. Probably a theory statement would help.
usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
 
 
 
 
 
 
Names for these should all be NULL per the manual page.
usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
 
 
 
 
 
So, this doesn't make sense to me for a number of reasons. First, for detach to have been called, all of your open handles will have been had to been closed. Secondly, since it's been closed, we can't assume that the proc_t is still valid (in general, it's not good form to try and keep a hold on it). Is there something that this is trying to work around?

I think we could just get rid of this and tracking daemon_task entirely.
Shouldn't we do a ddi_taskq_wait before destroying it here? What makes it impossible for something to have been added right as vmbus_ic_detach was being called and still be processing?
__packed comes from sys/ccompile.h
Why are full non-static inline functions defined in a header file?
Any particular reason we don't care about this failing?
s/Fine/Find?
Note, prfind does honor zone context. I don't believe it's possible to run in zone context here, but just wanted to be aware in case it makes more sense to explicitly use prfind_zone?
Note, it's possible that init may crash and be restarting. In that case, it's not clear if we'll find initpp.
You need to take a hold such as p_lock, if you're going to drop pidlock, otherwise once pidlock is dropped, the process can disappear.
I would have found the code a bit easier to follow if we actually just passed in and out the cpuid_regs we wanted in places so that way we could remember the ordering and match it up with specifications. Not something that has to change at all, but just wanted to mention it.
usr/src/uts/intel/io/hyperv/vmbus/hyperv.c (Diff revision 1)
 
 
 
 
For what it's worth, I would generally suggest zeroing the other registers or the structure so that way you have known values in the other cpuid leaves. Intel and others have started using values in other registers to indicate a subleaf. While I doubt this'll come up in practice, seems like a useful bit of sanity (though I'll admit we're inconsistent about it in cpuid.c).
I don't see any FM support being enabled here or an equivalent being wired up through the use of the bus ops. I don't believe we should be using DDI_DMA_FLAGERR here then.
I think this is actually a V1 structure.
usr/src/uts/intel/io/hyperv/vmbus/hyperv_busdma.c (Diff revision 1)
 
 
 
 
 
Shouldn't it be fine if the real size is grater than the actual size?
We no longer need a 32-bit copy as there's only a 64-bit x86 kernel.
I don't think the inline needs leading underscores.
usr/src/uts/intel/io/hyperv/vmbus/vmbus.c (Diff revision 1)
 
 
 
 
 
 
 
Does the hypervisor give us enough information about transitive versus non-transitive failures so we can know if we're just repeating something that'll always fail?
I think using the MIILISEC and MICROSEC bits for this is confusing. Naively, when you see the first assignment, you say, oh, it's starting off as delay 1ms (as it's 1000 in us). However, when you see this, you think it's trying to see if the value is actually 2us, where as it's really treating the maximum as 2000000 which will end up being 2 seconds if I have the math right there.
usr/src/uts/intel/io/hyperv/vmbus/vmbus.c (Diff revision 1)
 
 
 
 
 
 
 
If this is a hot path for storage and networking above, do we really want to block processing on this? It may not be a problem, but this may be a place where using task queues with pre-allocated elements may be useful. Though they aren't part of the DDI.
usr/src/uts/intel/io/hyperv/vmbus/vmbus.c (Diff revision 1)
 
 
 
 
 
It seems like the pre-emption disable here is important so that we're grabbing and doing MSRs on the current CPU, is that right? It might help to expand that comment or have a bit of a theory statement to that effect.
Feels like a comment describing why a level one interrupt was chosen for this would be useful.
I realize that this is probably the simplest way to get an interrupt vector for this context since there is nothing else, but it may make sense in the fullness of time to add a new type of interrupt so we can just get an arbitrary interrupt that can be programmed without it needing to be one of the special ipi vectors (those are a limited resource) and so then this could use the normal DDI programming interface and not be adding and making calls to this directly.
Why do we need strncmp here? Haven't we converted the hypervisor data to something we know is a null terminated string?

Lock name not needed

What is this else case for ILP32? Presuambly we want to be explicit about that.
usr/src/uts/intel/io/hyperv/vmbus/vmbus_chan.c (Diff revision 1)
 
 
 
 
 
 
 
Any idea what are cases that leads to this?
Should we include the status byte here to help folks understand what went wrong when they report an error? That would only make sense if it's meaningful.
Why is pre-emption being disabled here? What are we trying to race against? This doesn't stop interrupts from occurring.
While DDI_SLEEP is defined to be zero, probably easier for readers if we use the actual constant.
usr/src/uts/intel/io/hyperv/vmbus/vmbus_chan.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
What happens if we set something a cpu that someone comes and offlines? For example, we disable a CPU after boot to disable HT.
usr/src/uts/intel/io/hyperv/vmbus/vmbus_var.h (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
It'd be great if we could add the why to the do not change these comments.
That page alignment is enough for what?
usr/src/uts/intel/io/hyperv/vmbus/vmbus_xact.c (Diff revision 1)
 
 
 
 
Maybe a brief comment about why this is a situation to panic in would be useful?
Loading...