8955 bound+reserved ports can be leaked (when NFS client reboots too quickly)

Review Request #810 — Created Jan. 9, 2018 and updated


If an NFS client comes and goes too quickly, the result is "zombie" cm_entries: without the DEAD flag set, such entries are never cleaned up.

We've been using this change in production for at least 2 years. Originally, we identified how these cm_entries were leaked (and with them the bound+reserved ports), and verified that this change allowed the cm_entries to be cleaned up (releasing the bound+reserved ports).
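The leak described above can be illustrated with a minimal user-space sketch. It assumes a cleanup pass that frees only entries that are both flagged dead and unreferenced. The `conn_entry` structure and the `add_entry` and `reap_dead` helpers are hypothetical stand-ins, not the kernel's actual cm_entry code in clnt_cots.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a connection-manager entry. */
struct conn_entry {
	struct conn_entry *next;
	int port;	/* bound+reserved port held by this entry */
	int refcnt;	/* number of active users */
	bool dead;	/* models the DEAD flag */
};

/* Push a new live entry holding the given port onto the list. */
static struct conn_entry *
add_entry(struct conn_entry **headp, int port)
{
	struct conn_entry *e = calloc(1, sizeof (*e));

	if (e == NULL)
		return (NULL);
	e->port = port;
	e->next = *headp;
	*headp = e;
	return (e);
}

/*
 * Free every entry that is dead and unreferenced; return how many
 * were freed. An entry that never gets its dead flag set is skipped
 * on every pass, so its port is never released.
 */
static int
reap_dead(struct conn_entry **headp)
{
	struct conn_entry **pp = headp;
	int freed = 0;

	while (*pp != NULL) {
		struct conn_entry *e = *pp;

		if (e->dead && e->refcnt == 0) {
			*pp = e->next;
			free(e);
			freed++;
		} else {
			pp = &e->next;
		}
	}
	return (freed);
}
```

In this model, an entry whose connection has failed but whose `dead` flag was never set survives every call to `reap_dead`, which is the "zombie" behavior the description refers to.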

  1. If I understand the bug report correctly, the problem is seen on the NFS server side, but the fix goes into the client RPC code. Could you please elaborate on the scenario we are facing here? Thanks.

    1. The problem happens when the NFS server tries to connect back to the client's NLM service. Because the client rebooted in the meantime, its NLM port changed, so the NFS server cannot connect to the client's original NLM port and the connection is effectively dead. Due to this bug, the connection is then leaked with its port left in the bound state.

  2. usr/src/uts/common/rpc/clnt_cots.c (Diff revision 1)

    Isn't it possible to run into the same problem when connected == FALSE here?

    1. Sure - so x_dead can be handled identically to x_needdis? I.e. something like "cm_entry->x_dead = cm_entry->x_needdis = (cm_entry->x_connected == FALSE);"?
      Is this also true for the proposed setting of x_dead on line 2402?

    2. Maybe. Definitely, this code smells and it calls for some refactoring.
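The one-liner suggested in this thread can be sketched in isolation as follows. This is only an illustration under assumptions: `cm_entry_model` and `mark_after_connect` are hypothetical names, and plain `bool` fields stand in for the `x_connected`, `x_needdis`, and `x_dead` flags quoted above rather than reproducing the kernel structure.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the flags discussed above. */
struct cm_entry_model {
	bool x_connected;
	bool x_needdis;
	bool x_dead;
};

/*
 * Record the outcome of a connect attempt. A failed connect marks the
 * entry both as needing a disconnect and as dead, so a later cleanup
 * pass can release it; a successful connect clears both flags.
 */
static void
mark_after_connect(struct cm_entry_model *e, bool connected)
{
	e->x_connected = connected;
	e->x_dead = e->x_needdis = (e->x_connected == false);
}
```

Treating the two flags identically keeps the failure path to a single statement, which matches the reviewer's goal of keeping the change short. Whether the flags should really share the same value at every call site is the open question in the thread.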

  3. usr/src/uts/common/rpc/clnt_cots.c (Diff revision 1)

    I think this comment is not true and it should be removed.

    1. Should the same comment also be removed from ~2407?

    2. Yes, sure. But since this code is poor (see above), you may leave the comment here to keep your change as short as possible.

  1. Ship It!
  1. Ship It!