deadman panic due to mutex lock contention inside lookuppnatcred()

Review Request #664 - Created Sept. 21, 2017 and updated

Information
Youzhong Yang
illumos-gate
8642
Reviewers
general

User application doing fstatat() on entries in the same folder can induce a deadman panic. The fix is quite simple - if there's startvp, release the process lock and then do VN_HOLD().

Using the test program attached to the bug report, the panic was not reproducible. In addition, we also ran our in-house load testing, which caused server crashes frequently.

C Fraire
LGTM
Jerry Jelinek

As I mentioned in email to illumos-developer and in my comments on the bug report, this fix is incorrect and should be integrated. -1

  1. I don't think thread fffffea35f491b00 was stuck. mutex_adaptive_tryenter+0x10 is the address right after "lock cmpxchgq %rdx,(%rdi)", i.e. the thread succeeded in getting the lock. That's why "fffffea358790740::mutex" tells the mutex is held by the thread itself. Why would you think the thread is stuck there?

    fffffea35f491b00::findstack -v
    stack pointer for thread fffffea35f491b00: fffffe4229785700
    fffffe42297858f0 mutex_adaptive_tryenter+0x10()
    fffffe4229785920 vn_rele+0x24(fffffea358790740)
    fffffe4229785b90 lookuppnvp+0xcd3(fffffe4229785c70, 0, 1, 0, fffffe4229785e68, fffffea327222840, fffffea358790740, fffffea472018648)
    fffffe4229785c30 lookuppnatcred+0x176(fffffe4229785c70, 0, 1, 0, fffffe4229785e68, fffffea358790740, fffffea472018648)
    fffffe4229785d40 lookupnameatcred+0xdd(fffffd7fffdff642, 0, 1, 0, fffffe4229785e68, fffffea358790740, fffffea472018648)
    fffffe4229785d90 lookupnameat+0x39(fffffd7fffdff642, 0, 1, 0, fffffe4229785e68, fffffea358790740)
    fffffe4229785e30 cstatat_getvp+0x10f(3, fffffd7fffdff642, 1, fffffe4229785e68, fffffe4229785e60)
    fffffe4229785ed0 cstatat+0x6f(3, fffffd7fffdff642, fffffd7fffdffa70, 0, 10)
    fffffe4229785f00 fstatat+0x42(3, fffffd7fffdff642, fffffd7fffdffa70, 0)
    fffffe4229785f10 sys_syscall+0x19f()
    fffffea358790740::mutex
    ADDR TYPE HELD MINSPL OLDSPL WAITERS
    fffffea358790740 adapt fffffea35f491b00 - - yes

  2. How could the thread not be stuck here? We see in the dump that for this mutex, the thread owns the mutex (so the cmpxchgq instruction worked as expected) and the code in the function is not blocking in any way, so why is the thread stuck here until the deadman fires? To be clear, the lock prefix on the cmpxchgq shouldn't block. According to the Intel developers guide, lock does:

    Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.
    

    So, it appears that somehow this CPU is stuck here.

  3. I take back my previous reply, because the dump shows _m_owner = 0xfffffea35f491b01, so what I wrote does not stand.

    There's one thing I don't understand:

    > fffffea35f491b00::findstack -v
    stack pointer for thread fffffea35f491b00: fffffe4229785700
      fffffe42297858f0 mutex_adaptive_tryenter+0x10()
      fffffe4229785920 vn_rele+0x24(fffffea358790740)
      fffffe4229785b90 lookuppnvp+0xcd3(fffffe4229785c70, 0, 1, 0, fffffe4229785e68, fffffea327222840, fffffea358790740, fffffea472018648)
      fffffe4229785c30 lookuppnatcred+0x176(fffffe4229785c70, 0, 1, 0, fffffe4229785e68, fffffea358790740, fffffea472018648)
      fffffe4229785d40 lookupnameatcred+0xdd(fffffd7fffdff642, 0, 1, 0, fffffe4229785e68, fffffea358790740, fffffea472018648)
      fffffe4229785d90 lookupnameat+0x39(fffffd7fffdff642, 0, 1, 0, fffffe4229785e68, fffffea358790740)
      fffffe4229785e30 cstatat_getvp+0x10f(3, fffffd7fffdff642, 1, fffffe4229785e68, fffffe4229785e60)
      fffffe4229785ed0 cstatat+0x6f(3, fffffd7fffdff642, fffffd7fffdffa70, 0, 10)
      fffffe4229785f00 fstatat+0x42(3, fffffd7fffdff642, fffffd7fffdffa70, 0)
      fffffe4229785f10 sys_syscall+0x19f()
    

    stack pointer for thread is fffffe4229785700, but frame pointer of mutex_adaptive_tryenter() is fffffe42297858f0, so why such a big gap (fffffe42297858f0 - fffffe4229785700 = 1F0)? maybe the thread was interrupted? if that's the case, then most likely this 'mutex_adaptive_tryenter+0x10' cannot be trusted, for the reason that ::findstack results can sometimes mislead.

  4. 0x1f0 doesn't seem like a big gap. It's 496 bytes, which isn't very much for this stack. However, I do agree that this dump is odd and I can't explain why this thread is not making progress. It appears that this thread is the key thing blocking the system and triggering the deadman. It looks like we're exactly where the thread is releasing the vnode hold, it has acquired the mutex to do that, but then it doesn't continue to make progress. I cannot explain why that is, based on what we see in this dump and the fact that mutex_adaptive_tryenter is a non-blocking call and the fact that this thread has acquired the mutex. All I can think of is some kind of HW issue (especially given the hyperthreading problems on this box), but I don't know what the actual problem might be.

  5. I too am -1 on this change for the time being. While it's not clear what the root cause is, changing the order of lock manipulation should not change the outcome in this specific instance.

  6. It remains mysterious that disabling hyper-threading works well.

    Well, the change does help in our environment. I know in SmartOS 'deadman' is disabled by default, but we have it enabled. Without the change, we had one OS panic per day accross a bunch of servers(with variety of Intel CPUs). Since we deployed the change, zero panic. What happens in the real world matters, right? and I don't see what kind of harm this change can cause.

    I will try to generate another crash dump on a server with different CPU and update the bug report.

    By the way, I think it is a fact that "::findstack -v" sometimes produce funky results.

Loading...