11329 improved Virtio framework

Review Request #2068 - Created July 8, 2019 and updated

Information
Joshua Clulow
illumos-gate
up/10902
7366, 10012, 11329
bae945e...
Reviewers
general

11329 improved Virtio framework
10012 vioblk should not accept an all-zero serial number
7366 vioif happily creates rx descriptors until it consumes all memory

NOTE: I recommend starting with the block comment in virtio.h.


  

Issues

  • 26
  • 29
  • 5
  • 60
Description From Last Updated
This is a DDI_DEVICE_ATTR_V1 structure (V0 does not have .devacc_attr_access, or at least that field is ignored with V0). It ... Andy Fiddaman Andy Fiddaman
Should we file a future RFE to cover enabling / toggling the write cache? Robert Mustacchi Robert Mustacchi
QEMU 0.14.x further says it was supposed to be modeled off of the ATA IDENTIFY command and was considered deprecated ... Robert Mustacchi Robert Mustacchi
It'd be worth a comment about how you're always using the dma descriptor that's part of the chain to cover ... Robert Mustacchi Robert Mustacchi
If we're making copies can it be const? Robert Mustacchi Robert Mustacchi
Is this just a sanity check that we're doing? If so do we want to think about overflow? Robert Mustacchi Robert Mustacchi
ENOTTY and ENXIO seem really weird here, but I guess that's what was already done. Robert Mustacchi Robert Mustacchi
Is seg_max explicitly just for data and it's safe for us to add these? Robert Mustacchi Robert Mustacchi
Why is this kstat persistent? Robert Mustacchi Robert Mustacchi
I see that while we get the RO flag here and we pass it to blkdev in the media info, ... Robert Mustacchi Robert Mustacchi
Normally these are non-sleeping allocations for NIC drivers as they are being done in the mc_start() entry point. I guess ... Robert Mustacchi Robert Mustacchi
You comment elsewhere that the length value isn't always correct or safe to use. Perhaps it's worth noting why it's ... Robert Mustacchi Robert Mustacchi
I was a little surprised to see that we don't allow the host to checksum anything for us. Maybe we ... Robert Mustacchi Robert Mustacchi
To protect against future refactoring, should we make sure that this conditional also includes !vif_tx_drain? Robert Mustacchi Robert Mustacchi
Is there some point at which we should time this out and move to reset the device? Robert Mustacchi Robert Mustacchi
Do we want to do this every tx or only when we've crossed some threshold? Robert Mustacchi Robert Mustacchi
Any particular reason that you dropped support for VIRTIO_NET_F_STATUS? Robert Mustacchi Robert Mustacchi
This should be an unsigned long to match the actual types of the txcopy/rxcopy thresholds. While I know you constrain ... Robert Mustacchi Robert Mustacchi
Since you're not constructing the buffer to know that it fits, you need to check the return value of snprintf ... Robert Mustacchi Robert Mustacchi
ether_sprintf() seems highly dubious and not safe! It uses a single shared global kernel buffer. This seems like something we ... Robert Mustacchi Robert Mustacchi
Shouldn't we destroy the mutex and list we just created before freeing the memory? Robert Mustacchi Robert Mustacchi
Any reason not to check for the overflow here? Robert Mustacchi Robert Mustacchi
It's not clear to me what this is referring to here, fwiw. I think it's referring to how we're basically ... Robert Mustacchi Robert Mustacchi
If we need to take a mutex to return this information, how does the caller manage to use it correctly ... Robert Mustacchi Robert Mustacchi
If this comment is accurate, then shouldn't this be placed after we actually read the index at line 845? Otherwise, ... Robert Mustacchi Robert Mustacchi
If this is just returning DDI_SUCCESS and failure maybe we should just use a boolean_t rather than further overload the ... Robert Mustacchi Robert Mustacchi
Joshua Clulow
Review request changed

Description:

   

11329 improved Virtio framework

    10012 vioblk should not accept an all-zero serial number
    7366 vioif happily creates rx descriptors until it consumes all memory

  +
  +

NOTE: I recommend starting with the block comment in virtio.h.

Andy Fiddaman

   
usr/src/uts/common/io/vioblk/vioblk.h (Diff revision 1)
 
 

It seems wrong to me to use an enum for things that are actually flag bits. Does this cause any problems with compiler checks? If two bits are set, the value of a vioblk_req_status_t does not match any of the enum values.

  1. The reason I'm using an enum for this is so that MDB is able to render the field in terms of the constant symbols (using the CTF data) rather than just as a number. As far as I can tell, the compiler doesn't (at least currently) have a problem with this. I'm not using any switch statements with this value, which is where I'd expect to run into trouble.

  2. Interesting, thanks for the explanation!

usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 

VERIFY0? I'm not sure what cstyle says about using ! for non-boolean operations.

  1. In this case, I'm using vbr_status as a set of flags so though the result isn't strictly 1 or 0, it is (in the C sense) more boolean than int/bitfield. I thought it was clearer to write this in the boolean style, as that's the intent of the comparison here; i.e., "Verify that the request is not COMPLETE".

This is a DDI_DEVICE_ATTR_V1 structure
(V0 does not have .devacc_attr_access, or at least that field is ignored with V0). It seems that our man page needs a fix.

  1. Ah, thanks! I'll file a follow-up ticket to fix the manual page.

Robert Mustacchi
Thanks for putting this together. The new virtio framework looks like this has greatly simplified things and will make it easier to add newer virtio devices, which is pretty great.
usr/src/uts/common/io/vioblk/vioblk.h (Diff revision 1)
 
 
Shouldn't this be BLK and not net?
usr/src/uts/common/io/vioblk/vioblk.h (Diff revision 1)
 
 
 
 
 
 
 
Should we file a future RFE to cover enabling / toggling the write cache?
usr/src/uts/common/io/vioblk/vioblk.h (Diff revision 1)
 
 
Any particular reason the data array isn't here like in the spec? I guess it's not necessary.
usr/src/uts/common/io/vioblk/vioblk.h (Diff revision 1)
 
 
 
 
 
QEMU 0.14.x further says it was supposed to be modeled off of the ATA IDENTIFY command and was considered deprecated during its era.
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
It'd be worth a comment about how you're always using the dma descriptor that's part of the chain to cover the status and header information. Also talking about how that DMA memory is allocated, etc. I think that and a little bit about the chain management and other things might be helpful.
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
 
 
 
 
 
If we're making copies can it be const?
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
VIOBLK_REQSTAT_ALLOCATED seems a bit weird to me right now. It's always set and never appears to be cleared. Did you mean to clear it before putting it back on the list in vioblk_req_free()?
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
 
 
I find this comment a little confusing. I get why for the asynchoronous approach it has to be freed there, but it leaves the reader a question of what happens in the polled case which is a bit of an adventure to determine.
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
Perhaps this should assert that the vib_mutex is held like in the vioblk_rquest path?
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
Is this just a sanity check that we're doing? If so do we want to think about overflow?
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
 
 
 
 
 
ENOTTY and ENXIO seem really weird here, but I guess that's what was already done.
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
Shouldn't this be BLK and not NET? Though it is correctly queue zero.
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
Is seg_max explicitly just for data and it's safe for us to add these?
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
Why is this kstat persistent?
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
I see that while we get the RO flag here and we pass it to blkdev in the media info, that doesn't stop us from issuing write commands. Does that make sense?
usr/src/uts/common/io/vioblk/vioblk.c (Diff revision 1)
 
 
Should we check for PCI_EINVAL32?
usr/src/uts/common/io/vioif/vioif.h (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
So... what you actually want is that the IP header is 4 byte aligned. You don't want the MAC header to be 4 byte aligned, instead you want it on a 2-byte phase.
usr/src/uts/common/io/vioif/vioif.h (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
FWIW, I try to avoid bitfields in structs and instead just use flags for things like this. Doesn't matter either way.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
 
 
 
If they're going to be NULL, I don't think you ned to specify them in as explicitly initialized.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
Why don't we care if this fails to add a buffer? Might be worth a comment. I have a few guesses, but I'm not sure which, if any, is right.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 

FWIW, I find 'mems' an odd term.

usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
Normally these are non-sleeping allocations for NIC drivers as they are being done in the mc_start() entry point. I guess in this case you're doing it from attach, so it should be fine, but it's just a little different from other NIC drivers.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
 
 
See earlier notes, but this is the place you want to make sure that the ethernet header is actually being started with at a 2-byte phase offset so that IP is 4 byte aligned. As in the addr % 4 == 2.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
You comment elsewhere that the length value isn't always correct or safe to use. Perhaps it's worth noting why it's ok for us to do so here?
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
I was a little surprised to see that we don't allow the host to checksum anything for us. Maybe we should file an RFE to cover doing so in the future?
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
 
 
 
So there's no reason that you need to do this outside of the lock, that I know of. I'm not sure either way if we need to consider the implications and possibilities for a race in updates and saying it's corked or not.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
To protect against future refactoring, should we make sure that this conditional also includes !vif_tx_drain?
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
Is there some point at which we should time this out and move to reset the device?
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
 
 
 
 
This goto fail ends up returning B_TRUE to suggest that we've consumed the mblk_t to vioif_send. However, the mblk_t hasn't actually been consumed. The actual failure paths of the vioif_tx_inline and vioif_tx_external will always consume the mblk_t. I think we need to either do a freemsg() here before the goto fail or we need to do something else.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
 
 
 
 
 
 
Do we want to do this every tx or only when we've crossed some threshold?
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
Any particular reason that you dropped support for VIRTIO_NET_F_STATUS?
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
Is there any particular reason we don't care about the number of rx buffers that we added here?
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
This should be an unsigned long to match the actual types of the txcopy/rxcopy thresholds. While I know you constrain it in the setprop to things that'll always fit in a uint_t, I don't think it makes sense to encode that assumption in other parts. If it does, then the types should be changed.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
Since you're not constructing the buffer to know that it fits, you need to check the return value of snprintf and return EOVERFLOW if it doesn't fit.

Also, vif_tx_copy_thresh and vif_rxcopy_thresh are unsigned long values, the percent format needs a %l (once the truncation to value is fixed).
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
ether_sprintf() seems highly dubious and not safe! It uses a single shared global kernel buffer. This seems like something we shouldn't use!
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
FYI, this doesn't have to be kept around after mac_register succeeds.
usr/src/uts/common/io/vioif/vioif.c (Diff revision 1)
 
 
There's no need to keep the registration around. It can be freed immediately after you call mac_register(), fwiw.
usr/src/uts/common/io/virtio/virtio.h (Diff revision 1)
 
 
Can you add a comment about endianness here somewhere? Mostly that everything is in host endianness for legacy devices which is why folks don't see anything related to endianness and that if this needs to be adapted for 1.x it will need to be done? It may also be useful to indicate that no one really supports 1.x which is why this only is legacy.

It might also be useful to talk a bit somewhere (maybe virito_main.c) about how you chose to place membars and when they are needed and not needed.

Similarly, there isn't much talking about the lock ordering between vio_mutex and the queues. This probably would be useful to include.
usr/src/uts/common/io/virtio/virtio.h (Diff revision 1)
 
 
How does one reset it?
usr/src/uts/common/io/virtio/virtio.h (Diff revision 1)
 
 
This is a bit weird since descriptor chains and submitting them or receiving them haven't been covered at all at this point. It may make sense to move this after the descriptor chain section.
usr/src/uts/common/io/virtio/virtio.h (Diff revision 1)
 
 

Maybe mention an example of what would be considered synchronous or that it doesn't apply?

init_early() threw me off for a bit. It made me think that this should only be used early in boot or something similar. It seems like the point of it is to only allocate the handle so it can be used for binding cases.
usr/src/uts/common/io/virtio/virtio_dma.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
I'm not sure that it's required, but often after allocating DMA memory I do a bzero to scrub it to make sure that it's in a sane state as I'm not sure the framework makes any guarantees about its contents and I'd rather not leak data if we could avoid it.
There's nothing to change per se, but it took me a moment to realize that the explanation was below the bit. But I know sometimes that style makes more sense.
Is there a reason that these and the other attributes aren't const? It seems that we're passing them directly to a number of different routines, which can result in them changing the actual DMA attributes down the line. We probably want to actually copy them to a per-usage value.
Shouldn't this be DDI_DEVICE_ATTR_V1? Especially if you're specifying .devacc_attr_access.
usr/src/uts/common/io/virtio/virtio_main.c (Diff revision 1)
 
 
 
Currently all the consumers of this cast it's return value away. This only seems to return success. Is there any reason we bother with a return value then?
I find having a macro for the register number in the implementation header with a comment is useful. YMMV.
Shouldn't we destroy the mutex and list we just created before freeing the memory?
usr/src/uts/common/io/virtio/virtio_main.c (Diff revision 1)
 
 
 

It's not clear to me from the spec that we've ever promissed a 64-bit read to necessairily be plausible, fwiw.

Any reason not to check for the overflow here?
Can't this fail?
It's not clear to me what this is referring to here, fwiw. I think it's referring to how we're basically doing a single large DMA allocation and parcelling it up between the driver and device parts.
Should we pre-emptively zero the rest of this so that way if we ever use other fields we're not using stack garbage in the search?
usr/src/uts/common/io/virtio/virtio_main.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
If we need to take a mutex to return this information, how does the caller manage to use it correctly as they won't be holding the queue lock and it can change out from under them?
usr/src/uts/common/io/virtio/virtio_main.c (Diff revision 1)
 
 
 
 
 
 
 
If this comment is accurate, then shouldn't this be placed after we actually read the index at line 845? Otherwise, it's not clear to me what we're trying to protect in this case.
I realize the != NULL check here is a bit irrelevant for the vic_indirect_dma cases because the current label can only be reached if other stuff is bad. Still, I wonder if we should then be putting a check in virtio_dma_fini() to allow for a NULL argument in case something like this ever gets refactored.
If this is just returning DDI_SUCCESS and failure maybe we should just use a boolean_t rather than further overload the values? Though I can also see how it's being used directly for users and in that case you'd rather use the DDI values.
usr/src/uts/common/io/virtio/virtio_main.c (Diff revision 1)
 
 
 
I think this comment will be clearer if you make it clear the writes to the descriptors have been done in other threads possibly and that the gotcha is making sure those have all posted before it reads and sees the following change. I'm still trying to convince myself we don't need another membar after we set the index, but I'm not sure.
usr/src/uts/common/io/virtio/virtio_main.c (Diff revision 1)
 
 
 

This comment threw me off because the code has the length in bytes, which is right. Maybe clarify in the comment it's the length in bytes? I thought it was just the number of entries based on the comment.

How often do we fall back to FIXED interrupts? Is there ever a case where we should just used a shared MSI-X rather than a shared legacy interrupt?

This appears unused?

usr/src/uts/common/io/virtio/virtio_main.c (Diff revision 1)
 
 
 
 
 
 
 
 
 
 
 
 
 
When using INTX interrupts, none of the queues will have viq_handler_added. It doesn't appear we'll remove the interrupt handler otherwise which I think we likely still need to do in that case before we free them below.
Is it worth logging an error if this fails?
usr/src/uts/common/io/virtio/virtio_main.c (Diff revision 1)
 
 
 
 
 
 
 
How do we distinguish between MSI zero being successfully programmed and it ignoring the write?
Loading...