7865 Allow i40e to use multiple rings

Review Request #365 — Created Feb. 16, 2017 and submitted

pwinder
illumos-gate
general

Added RSS and multi-ring support

Used iperf on 40Gb and 10Gb to verify throughput.
Used dtrace to confirm all rings are in use.

  • 0
  • 0
  • 6
  • 1
  • 7
Description From Last Updated
rm
  1. In general this looks good. Is it possible to rebase this on top of the changes for 25 GbE support, which give a more consistent update to the common code?
  2. usr/src/uts/common/io/i40e/i40e_main.c (Diff revision 1)
     
     
    We should call i40e_check_acc_handle here to make sure that we didn't end up with an invalid PCI read and get a bogus number of interrupts that we request. While we generally check this at the end of various processes, it's worthwhile to do it here before we attempt to request something ridiculous from the OS.
  3. usr/src/uts/common/io/i40e/i40e_main.c (Diff revision 1)
     
     
    Is there any particular reason that you decided to limit this to the number of allocated interrupts as opposed to ixgbe which round robins the rings across multiple interrupts? I don't think there's anything we need to change at this phase, just curious to understand why.
    1. It just didn't cross my mind!

  4. usr/src/uts/common/io/i40e/i40e_main.c (Diff revision 1)
     
     
    There's nothing that guarantees that this value is a power of 2. Shouldn't we be rounding this up to the nearest power of 2 before we try and find the largest set bit. For example, if we have 17 queues, ddi_fls() will only return 16, which would be an incorrect number.
    
    If we have to enable traffic classes, then this also suggests that we need to actually put a cap on the number of interrupts or transmit queue pairs that we end up using to make sure that we are within the power of two and don't exceed 64 (the maximum value that can be put in this register).
    1. If we had 17 queues the ddi_fls would return 5 (last bit set is 5 counting from 1). And from Section 10.2.2.19.3, the value in the register is represented as 2^^n where n is tc_queues in this case. So, 2^^5 is 32. If we were to round up in this case it would be 2^^6 which is 64.
      I will put a cap of 64 (based on the mask) for the number of traffic queues.

  5. usr/src/uts/common/io/i40e/i40e_main.c (Diff revision 1)
     
     
    I really don't want to follow the model of the other drivers and have an ifdef for this support. This is also the only occurrence of this. Either we know this will work for the 722 and we shouldn't ifdef it or we should remove it all.
    
    Since we're going to have to do other work for the 722, I think we should remove this for now and add a comment that the 722 hardware needs to be set up differently.
    1. I based it on the model in i40e_common.c
      I will remove all X722 defines I have added.

  6. usr/src/uts/common/io/i40e/i40e_main.c (Diff revision 1)
     
     
    Why should we use the existing value in the register here to determine what the hashing is? Given that it's initialized to zero, if for some reason it was non-zero, what would have caused that and why would we want to use that?
    1. No reason. I will removed the read and just assigned the PCTYPES

  7. usr/src/uts/common/io/i40e/i40e_main.c (Diff revision 1)
     
     
    Please do not use a blocking allocation here, please use KM_NOSLEEP and handle failure. This is happening in the context of the GLDv3 mc_start(9E) entry point, which can be at an arbitrary point in the kernel's lifetime. The surrounding callers all use KM_NOSLEEP.
    1. I did consider make this an array allocated on the stack - but I wasn't keen, it has a max size of 512. How do feel about that?
      Otherwise I'll make it a KM_NOSLEEP.

  8. usr/src/uts/common/io/i40e/i40e_main.c (Diff revision 1)
     
     
    See earlier comment on the X722.
  9. 
      
pwinder
daleg
  1. One thing I'm worried about is the departure from the FreeBSD source of the core files this incurs. On a purely technical level, I'm not bothered by the changes to files under i40e/core, but the one thing I am worried about is these external-to-FreeBSD changes getting lost or mangled in the future when any updates pulled in from FreeBSD.

    Maybe leave a breadcrumb of some sort, or create something akin to a #ifdef ILLUMOS and wrap these changes in it? Is this even something worth worrying about?

    1. The changes to i40e_common.c were taken verbatim from FreeBSD, so were the changes to i40e_prototype.h and i40e_adminq_cmd.h (with the exception of one additional #define). From what I can tell we have already picked and chosen which bits of the common code to pull from FreeBSD. Robert did the initial version, I'd like to read his opinion. If we add the #ifdef now, it would only apply to the changes here and not the driver as a whole, so would be somewhat inconsistent.

    2. Oh, alright. That's all I was worried about... if the changes under i40e/core are indeed just newer revs of those files being pulled in from FreeBSD, then there's no problem.

    3. We don't pick and choose what we pull from FreeBSD. I update the common code from Intel 100% when we do updates. But since I'll be following up with the XXV710 support shortly, there's no need for Paul to double that effort. If we need to add information specific to us, then it should never go in the i40e/core files. What's currently here is fine.
  2. 
      
pwinder
  1. 
      
  2. usr/src/uts/common/io/i40e/i40e_main.c (Diff revision 2)
     
     

    There is a typo here. It should say 2^^6 = 64

  3. 
      
rm
  1. Excepting the typo you already mentioned, the follow ups all look good here. Thanks Paul!
  2. 
      
pwinder
Review request changed

Status: Closed (submitted)

Loading...