11220 Want support for hyper-v

Review Request #1990 — Created June 8, 2019 and updated

citrus
illumos-gate
master
11220
2c869fb...
general

Author: George Wilson george.wilson@delphix.com
Portions Contributed by: Brad Lewis brad.lewis@delphix.com
Portions Contributed by: Pavel Zakharov pavel.zakharov@delphix.com

For reference, this wad includes:

From Delphix:

  • DLPX-52519 DxOS needs support for Hyper-V drivers
  • DLPX-52831 guid is leaked in storvsc_config_one
  • AZ-307 rem_drv of hv_storvsc driver hangs
  • AZ-308 Storvsc driver needs polling capability.
  • ATA reads timeout after Azure update
  • Fix missing mutex_exit() calls.

And a couple of very minor updates from OmniOS to cater for changes in gate:

  • Adjust hyperv for new C99 Makefile macros
  • Remove duplicate definition of __packed
  • Do not package 32bit kernel drivers

Caveats:

This batch represents the initial code drop from Delphix. It works generally well, but it has the following known issues:

  • Panics if booted in a VM with a single virtual CPU
  • Panics if booted with low memory (< 1GiB)
  • Occasionally panics on first boot
  • Time-sync does not function and error messages are logged
  • Console output is too verbose

These are fixed in OmniOS and will be addressed in follow-up issues/RTIs.

In addition to being used to run Delphix in Azure, These drivers have been running in various OmniOS development VMs under Hyper-V on Windows 10 since they were integrated.

  • 53
  • 0
  • 48
  • 10
  • 111
Description From Last Updated
This and the unlock should probably be written to just use flock(3C). rm rm
Feels like this should be an unsigned value. rm rm
Are we guaranteed that there will never be gaps in the service space and that this won't miss things? rm rm
Shouldn't we actually use a bcopy here or make sure that the rest of the bytes are set to a ... rm rm
Either this function is right or all the others are wrong, but this is using value_size as a uint32_t, but ... rm rm
It feels like we should memset the other bytes here, otherwise they're just random stack garbage and could leak out ... rm rm
This just seems messed up. We're treating the input key size and value size as how much data we should ... rm rm
Shouldn't this have key_size/value_size checks? rm rm
Not checking for memory allocation failure. rm rm
Not checking for malloc failure. The problem is that we're using NULL as a sentinel for there being no link ... rm rm
What if strlcat would overflow the buffer that we have? Shouldn't we do something about that case? rm rm
Maybe we should use routing sockets to not have to shell out awkwardly to commands? Maybe route get will be ... rm rm
If this and others overflow, shouldn't we actually not try and execute the command? rm rm
This deserves comments as to what it's trying to actually accomplish. It's hard to evaluate its correctness. rm rm
At least the buffer size should be sizeof (tmp) and the next bit. rm rm
What if these strl* things end up overflowing meaning we haven't actually properly copied things over. rm rm
Is this explicitly trying to do interface name globbing? The comments suggest it's not, this means that we should probably ... rm rm
It feels like this would be much clearer if we just did the simple thing to find where the last ... rm rm
Other functions are returning the errno. This is returning the EAI keys. rm rm
Please don't use raw strcpy on buffers. This is a daemon, it should probably be more rigorous. rm rm
len needs to be an ssize_t to properly match pread. rm rm
Erm, should we really continue executing in the fact of unknown options? This is going to make it really hard ... rm rm
Unchecked malloc. rm rm
What does pread have to do with anything here? rm rm
Erm, do we really want to use an errno to signify this? It doesn't seem consistent with other uses of ... rm rm
Is there anything we should assume about the kernel side when we receive a malformed request? rm rm
So, depending on the error condition, how does this not turn into an infinite loop? rm rm
Why do we only care about a single nameserver? rm rm
It seems like this should really be addressed at some point? Why is it important to reuse the DMA handle ... rm rm
Shouldn't we wait a little bit for a pending TX before we try and tear things down and purposefully leak? rm rm
It seems this can fail. If it does, what should we do? Wouldn't this mean that rx basically won't work? rm rm
Why is this assertion safe? What if it happens on non-debug? We don't seem to handle it. rm rm
Is there a reason we're not acting on these XXXs? rm rm
Why are we saying that we've set this, but ignored the value entirely? If this is intentional because of some ... rm rm
Don't we actually need to do a DMA sync here if we have DMA memory? Maybe scsi_sync_cache_pkt? rm rm
If we've not implemented abort and we hit a timeout here, is it really safe to just return the packet ... rm rm
A comment about this would really be useful. rm rm
If we timed out, won't we have already called the completion callback, but we won't have set STATE_GOT_STATUS. That's normally ... rm rm
Are these supposed to be utf-16 values and that's why they're labeled as uint16_t values here? I'm just confused as ... rm rm
The fact that we're using the same uint16_t based type to represent both utf16 and utf8 strings is really confusing. ... rm rm
So, I get that you're trying to match devices that have a ddi networking name that are probably like hv_netsvc0, ... rm rm
Is there any chance this can overflow and in which case we should error rather than actually give it back ... rm rm
Huh, utf16 to utf8 isn't a simple divide by two. UTF-16 is a variable width encoding as some data may ... rm rm
Why is this assert safe? rm rm
open(9E) makes no guarantee that this can't happen in parallel. This should really check oflags, devtype, and the credentials. rm rm
This should use locking of the sc state so it's consistent with respect to open(9E). Also, why isn't the daemon_task ... rm rm
So, does it really make sense to allow a user process to perform a read that's less than the size ... rm rm
If we're unconditionally setting the offset of the uio to zero here, shouldn't we also be doing so in read? ... rm rm
I'm a little surprised that we're not checking the uio_resid here before we do the uiomove. If the user hasn't ... rm rm
The use of daemon busy here doesn't make a lot of sense to me. The reason I'm confused is that ... rm rm
So, this doesn't make sense to me for a number of reasons. First, for detach to have been called, all ... rm rm
Shouldn't we do a ddi_taskq_wait before destroying it here? What makes it impossible for something to have been added right ... rm rm
Why are full non-static inline functions defined in a header file? rm rm
xvoid
  1. 
      
  2. exception_lists/copyright (Diff revision 1)
     
     

    I don't see why this (and other THIRDPARTYLICENSE stuff) is requiring an exception.

    1. So that pbchk and friends don't report that the file does not have a current copyright.

  3. 
      
xvoid
  1. 
      
  2. usr/src/cmd/Makefile (Diff revision 1)
     
     

    h comes before n.

  3. usr/src/cmd/hyperv/scripts/Makefile (Diff revision 1)
     
     

    Why this line is needed?

  4. Isn't $(i386_ONLY) already assumed by the arch definition above?

    1. Yes, but it doesn't hurt and seems to be what other packages do.

  5. This manifest seems to be missing the third party licenses.

  6. This manifest seems to be missing the third party licenses.

  7. usr/src/uts/intel/Makefile.rules (Diff revision 1)
     
     

    What is $(SUBARCH_DIR)?

  8. 
      
xvoid
  1. 
      
  2. usr/src/uts/intel/hv_heartbeat/Makefile (Diff revision 1)
     
     

    Seems like extra line.

  3. Here and below -- these aren't helping anything as they aren't properly expanded, remove?

  4. 
      
xvoid
  1. 
      
  2. usr/src/uts/intel/Makefile.rules (Diff revision 1)
     
     

    OK, looks like it is for amd64/ia32, but we likely don't need the ia32 source now at all?

  3. 
      
xvoid
  1. General question -- why BSD licensed source has the CDDL text added? It would only make it impossible for "upstream" to take the fixes.

  2. 
      
rm
  1. First, thanks for putting this together. I've left what's probably a lot of comments here. I've tried to focus on things that are specific to the integration with illumos and things that aren't code style per se. I've tried, though I'm sure I've messed it up some to try and create a difference in importance by signaling with the 'open an issue' note here about the criticality of things. I know there's a lot of things that come from the fact that these were ported.
    
    I've not really gone through all of the hyper-v specs and made sure that all of the constants and means of how we're talking to the hypervisor make sense, I've tried to focus on the integration with different pieces in illumos.
  2. usr/src/cmd/hyperv/kvp/Makefile (Diff revision 1)
     
     
    Any reason we can't clean this up for new code? Is it smatch clean?
    1. Will do. Yes, it's all smatch clean.

  3. usr/src/cmd/hyperv/kvp/Makefile (Diff revision 1)
     
     
     
    Presumably we can delete this now?
  4. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     

    Is this something that we're generally planning on syncing with Microsoft or are we planning on making it basically our own thing? I'm trying to figure out how much we should make things be a bit more cleaned up to match some of our general trends or not. I've left out some comments that might be in the more style trend as a result. I've left some of those as comments that are explicitly marked as not issues to try and distinguish them.

  5. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Really feels like this should be unsigned. It doesn't seem like we should have a negative number of blocks. Same seems true for num_records as num_records gets passed directly to fwrite which takes an unsigned value and if it is negative, that'll be rather wrong.
  6. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    This and the unlock should probably be written to just use flock(3C).
  7. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    This isn't a good way to write an update to a file. We should be writing it out to a temporary file and then renaming it. That way we know the file is on stable storage and also that way it can't result in a partial file. It's not clear to me yet if we won't reuse the pool files. If we're open to it, I can bring librename in which simplifies this process a little bit, or at least gets it (hopefully) right once in a library.
  8. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    cstyle prefers explicit comparisons.
  9. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
    Is it worth saving and restoring the error that occurred so we can give someone more information about why the daemon couldn't write to the pool?
  10. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Feels like this should be an unsigned value.
  11. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    This seems like a good use for recallocarray(3C). I guess the theory is that we don't need to initialize the new memory because we'll be reading into it shortly?
  12. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Maybe we should VERIFY0 rather than ignore?
  13. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
    This seems like the kind of return value we should just check from a sanity point of view. Especially if someone ever changes this path.
  14. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
    Are we guaranteed that there will never be gaps in the service space and that this won't miss things?
  15. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    kvp_update_mem_state also has this exact same logic. Maybe we should centralize it so we don't have to duplicate the loop?
  16. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    It would be really nice if key was const and key_size was unsigned.
  17. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
    Note, this shows a slight discrepancy in how we use num_records. Sometimes we're using it to indicate the number of valid records; however, we're also sometimes using it to represent the amount of memory of the read records array. Because we're always reallocating memory every time, this works, but just leaves a bit of an odd smell. Not sure anything should be changed, but it's a bit weird.
  18. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     

    Shouldn't we actually use a bcopy here or make sure that the rest of the bytes are set to a '\0' here? Otherwise if we're copying a shorter name over a longer name, we'll be relying on the nul character rather substantially to cause us to correctly stop reading the records and making sense of it. That seems like it'll be harder to make sense of when someone needs to debug what's on disk.

    Actually, looking later on, this seems like it needs to be a bcopy of the maximum value size as when we're in key_add_or_modify, we're not assuming that we have null terminated arrays here.

  19. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Either this function is right or all the others are wrong, but this is using value_size as a uint32_t, but the others are treating it as an int32_t. We should probably make them consistent?
  20. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    recodrd seems like a typo
  21. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
    It feels like we should memset the other bytes here, otherwise they're just random stack garbage and could leak out other information persistently.
  22. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
    If value_size is an int, probably need to make sure this isn't negative.
  23. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
    This just seems messed up. We're treating the input key size and value size as how much data we should copy but that has no relation on the actual length of the data stored. Consider the key stored as: "foo.bar". If we look for the key "foo" with key_size 3, this will definitely find it "foo.bar" if it comes before a "foo" in the daemon.
  24. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
    Shouldn't this have key_size/value_size checks?
  25. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Not checking for memory allocation failure.
  26. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Not checking for malloc failure. The problem is that we're using NULL as a sentinel for there being no link found. Therefore we probably should actually probably error in this case.
  27. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
    What if strlcat would overflow the buffer that we have? Shouldn't we do something about that case?
  28. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Maybe we should use routing sockets to not have to shell out awkwardly to commands? Maybe route get will be a simpler way of getting this?
  29. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
    If this and others overflow, shouldn't we actually not try and execute the command?
  30. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
    This deserves comments as to what it's trying to actually accomplish. It's hard to evaluate its correctness.
  31. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    At least the buffer size should be sizeof (tmp) and the next bit.
  32. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
     
     
    What if these strl* things end up overflowing meaning we haven't actually properly copied things over.
  33. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Is this explicitly trying to do interface name globbing? The comments suggest it's not, this means that we should probably actually not use strncmp here. If that's right, then this should just be a normal strcmp(), otherwise this'll possibly erroneously match link names.
  34. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
    It feels like this would be much clearer if we just did the simple thing to find where the last one / zero is set rather than the complicated bit manipulations.
  35. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
     
    Other functions are returning the errno. This is returning the EAI keys.
  36. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Please don't use raw strcpy on buffers. This is a daemon, it should probably be more rigorous.
  37. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    len needs to be an ssize_t to properly match pread.
  38. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Erm, should we really continue executing in the fact of unknown options? This is going to make it really hard to add new options at all.
  39. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
    It feels like this should also be dropping privileges at some point. Though that doesn't have to happen now. But there are other things beyond just calling daemon() that a daemon should do.
  40. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
    Unchecked malloc.
  41. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    What does pread have to do with anything here?
  42. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
    Erm, do we really want to use an errno to signify this? It doesn't seem consistent with other uses of exit here. I don't think we should be mixing exit with an errno and exit with EXIT_FAILURE (which would confusingly be an errno). Also in this case, there are other errnos that could occur that we're glomming onto EIO.
  43. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
    Is there anything we should assume about the kernel side when we receive a malformed request?
  44. usr/src/cmd/hyperv/kvp/hv_kvp_daemon.c (Diff revision 1)
     
     
     
     
     
     
     
     
    So, depending on the error condition, how does this not turn into an infinite loop?
  45. Do we really want to assume that the default path is sane for this? Should we either set path or use absolute paths to ipadm and grep?
  46. Should we actually try to match the exact string 'dhcp' rather than just any substring? This seems brittle in the face of future changes to the types of values.
    
    Also if the only thing we care about is the dhcp information, shouldn't we pass -q to grep?
  47. usr/src/cmd/hyperv/scripts/hv_get_dns_info.sh (Diff revision 1)
     
     
     
     
     
    Why do we only care about a single nameserver?
  48. Minor, but hdrcheck presumably complains about this.
    1. hdrchk seems fine with it, but fixed.

  49. usr/src/uts/intel/io/hyperv/netvsc/hn_gld.c (Diff revision 1)
     
     
     
    All of these snprintf declarations definitely should be doing overflow checking.
  50. Please don't assert this. A driver should properly check it.
  51. I don't think we want to warn every time we have an unsupported property. There are lots of these.
  52. If you're using ddi_strtol, you need to check the return value and you should check the end pointer. This is true for all of the ones here.
  53. Elsewhere we say that the minimum is HN_MTU_MIN. Shouldn't we use that and not zero?
  54. If we're rerturning an error, I'm not sure we should set val.
  55. If we're returning an error we probably shouldn't set val.
  56. Why aren't we looking at an error here?
  57. We should probably actually use structure prefixes. Just plain old 'm' is a pretty unsuporting member name and hard to find in the file.
    1. I've added a prefix to m but I'd like to keep this close to the freebsd driver source to aid future merges, and the other structure members are not as unsporting.

  58. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
    This probably shouldn't be a static thing we use for RSS. In almost every other driver we usually generate random data for an RSS key size. Does Hyper-V require that we have to use that constant?
    1. The hyperv drivers in FreeBSD and Linux both use this constant as the default key. I can't find any documentation that states this as a requirement but I'd like to investigate this as a follow-up rather than now.

  59. This comment as to how we set the leader could probably be improved. It's not clear why we're setting anything in particular for this.
  60. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    If the link status hasn't changed you shouldn't call mac_link_update().
  61. Is there some reason this doesn't need to be locked?
  62. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    So this all seems a bit weird. Why are we setting the link up or down? mac only expects us to update things if we've actually changed. If the system hasn't changed, we shouldn't tell MAC that things are down.
    
    Also, is there a reason that this is OK to manipulate our main link state structures without holding the softstate lock?
  63. It seems like this should really be addressed at some point? Why is it important to reuse the DMA handle at this point? No other high-performance networking driver has these same concerns. Because this is paravirt, I suspect that we're not actually doing true device-level DMA, but this seems rateher suspicious. At the very least, we should open a follow up to address this.
    1. It doesn't seem to be necessary to use the same DMA handle, that's just something that is done.
      Will come back to it..

  64. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
     
     
    signed/unsigned arithmetic confusion. This is complicated by the fact that hn_pkt_length is weirdly a signed value.
  65. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
    This should probably allocate two extra bytes so we can offset the packet by two bytes such that the IP header is four byte aligned (the MAC header is a number of 4-byte units). I'll make a note to add this to mac(9E) or other pages so this is clearer.
  66. Should really be unsigned.
  67. Do we not need to do any host trusting on IPv6/
  68. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
    If these are wrong, then the packet probably shouldn't be trusted. So I think it's probably reasonable to have them.
  69. Shouldn't this actually be DDI_DEVICE_ATTR_V1?
  70. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
     
    I guess I don't understand why this is required? Why don't we just do what normal NIC drivers do and honor the life time? If we don't want DMA memory because this is a hypervisor and it can use other things, then that's fine, we can use the KPM segment, but it seems and the original authors agreed, that this isn't a great idea.
    1. This is linked to the other hack around closing DMA handles.. needs more investigation.

  71. Is this ever actually defined?
  72. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
    Shouldn't we wait a little bit for a pending TX before we try and tear things down and purposefully leak?
  73. Are we actually linking the channel if we're closing it?
  74. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
    See note, for most other drivers this is random data so that way someone can't figure out the RSS mappings in play.
  75. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
     
    When we wait for hardware, we have an upper bound on how long we'll wait. Should we do something similar here? Note by specifying this in clock ticks you're going to get variable rates depending on what the current tick rate is. I suspect you really want to use drv_usectohz() to get a consistent time regardless of the setting of hz.
  76. See earlier tick note.
  77. It seems this can fail. If it does, what should we do? Wouldn't this mean that rx basically won't work?
  78. If it's possible to further explain this XXX or why this is an XXX that'd be appreciated.
  79. usr/src/uts/intel/io/hyperv/netvsc/if_hn.c (Diff revision 1)
     
     
     
     
    It would be great if it could explain why this is fatal.
  80. Can we please comment on why we need to disable pre-emption? We're not disabling interrupts, so it feels like it can still race. Is this a major win against a much simpler mutex?
    1. This is a straight import of multi-producer, {single, multi}-consumer lock-less ring buffer from FreeBSD
      https://www.freebsd.org/cgi/man.cgi?buf_ring
      I don't claim to understand it entirely but the intention is that it is lock-less.

  81. Are we guaranteed that only a single instance of the storsvc will ever exist and therefore it makes sense for this to be global and not part of the logical device state?
    1. The variable is set based on the hypervisor version so it should be fine even across multiple instances.

  82. This is definitely incorrect to use here. We do not initialize FM support or ever call ddi_fm_dma_err_get(). This should probably just be set to zero.
  83. Why are we sizing this on the maximum number of CPUs that the kernel supports? Do we want to actually have more sub-channels than CPUs?
  84. same_init(9F) says the name should be set to NULL.
  85. It's not clear to me why failing to send a request should result in us panicking the system here. Could we at least get a comment that explains why that's the case? It's a bit weird when the test function returns a value, but this doesn't.
  86. Another case of sema_init not needing a string.
  87. So... if we hit this case on non-debug we'll return with ret equals zero, suggesting success. Given that this has an assert 0 on debug, this seems bad to drive on.
  88. If this is the case, why are we warning about it but not doing anything about it per se? What action should the user or driver be taking in this case?
  89. Do we have any idea of what the answer to this was? It's hard to imagine a world where that would result in consistency if we are making dependent, non-idempotent changes to the disk unless there's other hashing going on.
  90. Why is this assertion safe? What if it happens on non-debug? We don't seem to handle it.
  91. Is there a reason we're not acting on these XXXs?
  92. Isn't it the case that we should be able to abort commands even when we have guarantees that they'll complete? If this command has taken a long time, say a command that's going to complete in an hour, we still may want to abort it because it's not worth waiting that long.
  93. Why are we saying that we've set this, but ignored the value entirely? If this is intentional because of some aspect of the virtual hardware, surely we should have a comment that says why that makes sense.
  94. The manual pages says that for capabilities that we recognize, then we should return 0 if we can't set it, but for ones we don't, we should return -1. Shouldn't we check these like the ones above?
  95. Why are we setting this same value twice?
  96. What is this XXX about?
  97. We're relying on the upper levels for winindex, which if invalid would cause this all to fail. Shouldn't we check the return value here?
  98. Don't we actually need to do a DMA sync here if we have DMA memory? Maybe scsi_sync_cache_pkt?
  99. usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
     
     
     
     
     
     
     
     
    It's also fine to have this entry point set to NULL if we don't support this entry point.
  100. I realize that there isn't going to be any desire to simplify this, but if we can actually do device discovery of what LUNs exist via hyper-v like is able to be done with virtio and some others, we should consider future work to move this to SCSAv3, which will simplify a bunch of the device coming and going related logic. Especially if this ever ends up supporting device hotplug at the other end of the bug.
  101. usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    If we've not implemented abort and we hit a timeout here, is it really safe to just return the packet and call the completion? Won't hyper-v still have the memory here that's related to this? I'm not sure how this can work without us doing something to the virtual controller. Comments that explain the safety would be greatly appreciated.
  102. usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
    A comment about this would really be useful.
  103. usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
    If we timed out, won't we have already called the completion callback, but we won't have set STATE_GOT_STATUS. That's normally done by callers before they call storvsc_complete_command(). So I'm not sure how this is intended to work.
  104. Are there more details of what this is about and what goes wrong at all? It'd be great to know what exactly this is about.
  105. Why is this an XXX?
  106. Conventionally we open the brace on the case line.
  107. usr/src/uts/intel/io/hyperv/storvsc/hv_storvsc_drv_illumos.c (Diff revision 1)
     
     
     
     
     
     
     
     
    How are command aborted when we don't support sending aborts?
  108. usr/src/uts/intel/io/hyperv/utilities/hv_kvp.h (Diff revision 1)
     
     
     
     
     
     
     
     
     
    Are these supposed to be utf-16 values and that's why they're labeled as uint16_t values here? I'm just confused as to what they're supposed to represent when I look at all the casting in the utf8 to utf 16 utilities. If the latter is supposed to work on uint16_t represented values, it's pretty unclear then why it's casting back and forth.
    
    Though looking further, it's not clear if these are supposed to also be thought of as normal characters as we call strlen on these array fields!
  109. usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
     
     
     
     
     
     
     
    The fact that we're using the same uint16_t based type to represent both utf16 and utf8 strings is really confusing. It makes it very hard to know when we're in one state or the next. The casting back and forth here should make that all kind of clear. Though I'm not sure why we need the casts in the uint16_t cases. They should already be the right type due to C's willingness to take arrays as pointer arguments.
  110. So, I get that you're trying to match devices that have a ddi networking name that are probably like hv_netsvc0, etc. but this strcmp method means you can match anything, including things beyond that. Perhaps we should do this a different way to figure out that it's the right thing. One way would be to use the driver name.
  111. Is there any chance this can overflow and in which case we should error rather than actually give it back to the caller?
  112. Huh, utf16 to utf8 isn't a simple divide by two. UTF-16 is a variable width encoding as some data may be in multiple uint16_t units. Shouldn't we be using the actual size that we get for the value? This is true in all of the places in this function we do something similar.
  113. Strictly speaking an uint32_t is just %u or PRIu32. The cast and all is fine, just seems a bit weird that we're going to a larger type, but it's not incorrect.
  114. Strictly speaking an uint64_t is just %lu in the kernel or PRIu64 if we want to handle different data models. The cast and all is fine, just seems a bit weird that we're going to a larger type, but it's not incorrect.
  115. Why is this assert safe?
  116. usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
     
     
     
     
     
     
     
     
    open(9E) makes no guarantee that this can't happen in parallel.
    
    This should really check oflags, devtype, and the credentials.
  117. usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
    This should use locking of the sc state so it's consistent with respect to open(9E).
    
    Also, why isn't the daemon_task cleared out? Shouldn't we set curproc to something else since it's closed the handle?
  118. So, does it really make sense to allow a user process to perform a read that's less than the size of a single message? Shouldn't we make sure there's enough space in the buffer for an entire message? Otherwise, it seems like we need to actually honor uio_offset (should be uio_loffset) and taken into account that we may have partial reads for a given request.
  119. If we're unconditionally setting the offset of the uio to zero here, shouldn't we also be doing so in read? Also, shouldn't we use uio_loffset as we're 64-bit aware?
  120. I'm a little surprised that we're not checking the uio_resid here before we do the uiomove. If the user hasn't given us enough data to form a full request, we don't seem to care in the current form and will just copy zeros. Is that intentional? It seemed like the user and daemon always exchanged a full hv_kvp_msg in each direction.
  121. usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
     
     
     
     
     
     
     
    The use of daemon busy here doesn't make a lot of sense to me. The reason I'm confused is that open() seems to clear daemon_busy. If the userland daemon crashes handling an event, shouldn't it be able to start back up and resume processing that? I guess this is why it's a bit weird to me that this is the flag that indicates the poll state and not something else. Probably a theory statement would help.
  122. usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
     
     
     
     
     
     
    Names for these should all be NULL per the manual page.
  123. usr/src/uts/intel/io/hyperv/utilities/hv_kvp.c (Diff revision 1)
     
     
     
     
     
    So, this doesn't make sense to me for a number of reasons. First, for detach to have been called, all of your open handles will have been had to been closed. Secondly, since it's been closed, we can't assume that the proc_t is still valid (in general, it's not good form to try and keep a hold on it). Is there something that this is trying to work around?
    
    I think we could just get rid of this and tracking daemon_task entirely.
  124. Shouldn't we do a ddi_taskq_wait before destroying it here? What makes it impossible for something to have been added right as vmbus_ic_detach was being called and still be processing?
  125. __packed comes from sys/ccompile.h
  126. Why are full non-static inline functions defined in a header file?
  127. Any particular reason we don't care about this failing?
    1. If it fails, vmbus_ic_sendresp() will have already issued a warning to the console. This is a heartbeat response to tell the hypervisor that we are alive. If hearbeats are missed for 1 minute, hyper-v will reset the VM.

  128. s/Fine/Find?
  129. Note, prfind does honor zone context. I don't believe it's possible to run in zone context here, but just wanted to be aware in case it makes more sense to explicitly use prfind_zone?
    1. Being explicit makes sense, thanks.

  130. Note, it's possible that init may crash and be restarting. In that case, it's not clear if we'll find initpp.
    1. In which case I don't think there is any harm in calling halt()

  131. You need to take a hold such as p_lock, if you're going to drop pidlock, otherwise once pidlock is dropped, the process can disappear.
    1. This:

                      mutex_enter(&pidlock);
                      initpp = prfind_zone(P_INITPID, ALL_ZONES);
                      mutex_exit(&pidlock);
                      if (initpp)
                          psignal(initpp, SIGPWR);
      

      seems to be a common construct in the kernel code - I suspect this hyperv code was copied from the xen driver. I can't take p_lock since psignal() takes that at the start.
      It seems to be a really small window where the process can go away and would result in a panic during shutdown, and it's init so it should be there.

      Grateful for any suggestions, but dropping this for now.

    2. Fixed by moving psignal() inside the pidlock mutex hold. This is something that is done in other parts of the kernel.

  132. I would have found the code a bit easier to follow if we actually just passed in and out the cpuid_regs we wanted in places so that way we could remember the ordering and match it up with specifications. Not something that has to change at all, but just wanted to mention it.
    1. I agree, but this is one place I don't want to diverge from the FreeBSD implementation unecessarily, even to define constants for the indices to regs

    2. I changed my mind, I've re-implemented this.

  133. usr/src/uts/intel/io/hyperv/vmbus/hyperv.c (Diff revision 1)
     
     
     
     
    For what it's worth, I would generally suggest zeroing the other registers or the structure so that way you have known values in the other cpuid leaves. Intel and others have started using values in other registers to indicate a subleaf. While I doubt this'll come up in practice, seems like a useful bit of sanity (though I'll admit we're inconsistent about it in cpuid.c).
  134. I don't see any FM support being enabled here or an equivalent being wired up through the use of the bus ops. I don't believe we should be using DDI_DMA_FLAGERR here then.
  135. I think this is actually a V1 structure.
  136. usr/src/uts/intel/io/hyperv/vmbus/hyperv_busdma.c (Diff revision 1)
     
     
     
     
     
    Shouldn't it be fine if the real size is grater than the actual size?
    1. Indeed - I don't think we need a console message about it.

  137. We no longer need a 32-bit copy as there's only a 64-bit x86 kernel.
  138. I don't think the inline needs leading underscores.
  139. usr/src/uts/intel/io/hyperv/vmbus/vmbus.c (Diff revision 1)
     
     
     
     
     
     
     
    Does the hypervisor give us enough information about transitive versus non-transitive failures so we can know if we're just repeating something that'll always fail?
  140. I think using the MIILISEC and MICROSEC bits for this is confusing. Naively, when you see the first assignment, you say, oh, it's starting off as delay 1ms (as it's 1000 in us). However, when you see this, you think it's trying to see if the value is actually 2us, where as it's really treating the maximum as 2000000 which will end up being 2 seconds if I have the math right there.
    1. Added a comment to clarify.

  141. usr/src/uts/intel/io/hyperv/vmbus/vmbus.c (Diff revision 1)
     
     
     
     
     
     
     
    If this is a hot path for storage and networking above, do we really want to block processing on this? It may not be a problem, but this may be a place where using task queues with pre-allocated elements may be useful. Though they aren't part of the DDI.
  142. usr/src/uts/intel/io/hyperv/vmbus/vmbus.c (Diff revision 1)
     
     
     
     
     
    It seems like the pre-emption disable here is important so that we're grabbing and doing MSRs on the current CPU, is that right? It might help to expand that comment or have a bit of a theory statement to that effect.
  143. Feels like a comment describing why a level one interrupt was chosen for this would be useful.
  144. I realize that this is probably the simplest way to get an interrupt vector for this context since there is nothing else, but it may make sense in the fullness of time to add a new type of interrupt so we can just get an arbitrary interrupt that can be programmed without it needing to be one of the special ipi vectors (those are a limited resource) and so then this could use the normal DDI programming interface and not be adding and making calls to this directly.
    1. With help, I'll file a follow-up for this.

  145. Why do we need strncmp here? Haven't we converted the hypervisor data to something we know is a null terminated string?
  146. Lock name not needed

  147. What is this else case for ILP32? Presuambly we want to be explicit about that.
    1. Actually, since bits are only being set and cleared in a single struct member, we don't need full emulation of the FreeBSD test_and_{set,clear}_bit() functions. Just using atomic_{set,clear}_long_excl() is sufficient.

  148. usr/src/uts/intel/io/hyperv/vmbus/vmbus_chan.c (Diff revision 1)
     
     
     
     
     
     
     
    Any idea what are cases that leads to this?
    1. From reading the code, it happens when the hypervisor response to a channel disconnect message with an error code and, upon checking, the channel is still present. There is no additional information in the FreeBSD commit nor in the review https://reviews.freebsd.org/D8569

  149. Should we include the status byte here to help folks understand what went wrong when they report an error? That would only make sense if it's meaningful.
  150. Why is pre-emption being disabled here? What are we trying to race against? This doesn't stop interrupts from occurring.
    1. I don't know, it doesn't seem necessary.
      The FreeBSD code does not disable preemption here.

  151. While DDI_SLEEP is defined to be zero, probably easier for readers if we use the actual constant.
  152. usr/src/uts/intel/io/hyperv/vmbus/vmbus_chan.c (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
    What happens if we set something a cpu that someone comes and offlines? For example, we disable a CPU after boot to disable HT.
    1. I'll test but as far as I can see, nothing.
      The driver allocates per-CPU task queues and DMA buffers.
      For the task queues, there is no binding to a specific CPU (unlike the FreeBSD driver which binds the task queue to a specific CPU).
      The relevant DMA buffer is used depending on the CPU on which an interrupt is received. If a CPU goes offline, nothing will break.

      This is perhaps something that could be cleaned up or improved later, although I think it is useful to keep the code fairly close to the upstream source.

  153. usr/src/uts/intel/io/hyperv/vmbus/vmbus_var.h (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    It'd be great if we could add the why to the do not change these comments.
    1. Comment updated. I still don't know, the commit history and hyper-v documentation does not indicate a preferred synthetic interrupt source for these, just that they should be non-zero and different (to avoid event timers being deferred by messages).
      Everything is still bound to the same interrupt vector, although there is a TODO comment to change this.

  154. That page alignment is enough for what?
    1. For any hypervisor message sent using the transaction API. It certainly should be a valid assumption - almost all uses of this cast the DMA memory to a struct.

  155. usr/src/uts/intel/io/hyperv/vmbus/vmbus_xact.c (Diff revision 1)
     
     
     
     
    Maybe a brief comment about why this is a situation to panic in would be useful?
    1. It's a "should never happen" case. Perhaps an assert would be better?

  156. 
      
citrus
  1. 
      
  2. usr/src/cmd/hyperv/kvp/Makefile (Diff revision 1)
     
     
     

    Done.

  3. 
      
Loading...