xnf calls ddi_dma_nextcookie too many times and panics the system

Review Request #2553 — Created May 9, 2020 and discarded

andy_js
illumos-gate
12712
general

I can't provide a stack trace unfortunately as I've only observed this issue in Amazon's EC2 but it's pretty obvious what's happening from looking at the code: it's calling ddi_dma_nextcookie at the end of a loop in xnf_mblk_map. The call on the final iteration tickles the check in ddi_dma_nextcookie causing the system to panic.

With the patch applied the system no longer panics and xnf is usable.

  • 1
  • 0
  • 0
  • 0
  • 1
Description From Last Updated
Where this has been fixed elsewhere, these loops were changed into something like: const ddi_dma_cookie_t *c; for (c = ddi_dma_cookie_iter(dma_handle, ... citrus citrus
citrus
  1. 
      
  2. usr/src/uts/common/xen/io/xnf.c (Diff revision 1)
     
     
     
     

    Where this has been fixed elsewhere, these loops were changed into something like:

    const ddi_dma_cookie_t *c;
    
    for (c = ddi_dma_cookie_iter(dma_handle, NULL);
    c != NULL; c = ddi_dma_cookie_iter(dma_handle, c)) {
    

    I think that's cleaner and easier to follow.

    1. Perhaps... but then I have to define 'i' outside the loop and then increment it inside. I'd rather not.

    2. Moved to Gerrit: https://code.illumos.org/c/illumos-gate/+/676

    3. As we've noted in ddi_dma_cookie_iter(9F), the ddi_dma_nextcookie() API is to be avoided at all costs:

           The ddi_dma_nextcookie() function was the historical function that was
           associated with obtaining DMA cookies.  It should not be used due to
           several flaws.  The ddi_dma_nextcookie() function mutates the underlying
           DMA handle meaning that a driver cannot obtain a cookie a second time and
           thus a device driver using this interface must either manually keep storage
           of the cookie around wasting space or rebind the handle, wasting time.  In
           addition, there is no way for the function to indicate that a driver has
           consumed all of its cookies.  If for some reason a device driver calls the
           ddi_dma_nextcookie() function more times than there are cookies, the
           results are undefined.  In short, this function should not be used for any
           purpose.  Use the ddi_dma_cookie_iter(), ddi_dma_cookie_get(), or
           ddi_dma_cookie_one() functions instead.
      

      Looking at the code, it seems like the only reason we're using i at all is to:

      • Do something different on the first cookie
      • Keep track of how many cookies there are

      Indeed, in the new API approach, you should actually not be passing a cookie or cookie count pointer to ddi_dma_addr_bind_handle(9F) at all:

             cookiep
                          A pointer to the first ddi_dma_cookie(9S) structure.  This
                          should be left as NULL in new callers.
      
      
             ccountp
                          Upon a successful return,  ccountp points to a value
                          representing the number of cookies for this DMA object.
                          This can be left as NULL in new callers.  The cookie count
                          can be obtained by calling ddi_dma_ncookies(9F).
      

      Note in particular that for ddi_dma_cookie_iter() you must have initialised the prior cookie pointer to NULL, which you can use to discriminate between the first cookie and the subsequent cookies.

    4. OK. I will see what I can do.

  3. 
      
andy_js
Review request changed

Status: Discarded

Loading...