9881 smbd terminated by SIGABRT after smb_account_free()

Review Request #1229 — Created Oct. 10, 2018 and submitted

vgusev
illumos-gate
master
9881
39a1b92...
general
gwr, jbk

Fix double free when lsa_lookup_sid() failed

Double free can occur if lsa_lookup_sid() returns error with polluted @info argument.

Vulnerable are lsa_LookupSids and lsa_LookupSids2 calls.

Before fix:

~# rpcclient -U ""%"" -c "lookupsids S-1-5" 192.168.1.18
result was NT_STATUS_IO_TIMEOUT

And core files at smb server side (/core.smbd.1538999930) .

After fix:

~# rpcclient -U ""%"" -c "lookupsids S-1-5" 192.168.1.18
S-1-5 unknown*unknown* (8)

No core files at server side.

  • 0
  • 0
  • 0
  • 6
  • 6
Description From Last Updated
vgusev
vgusev
jbk
  1. 
      
    1. I hope that I answer for all questions. If I drop important note, please fill free to re-open it.

  2. Couldn't you just have smb_account_free() reset/clear all the values after it free's them?

      1. It is additional operations whereas is the most cases zeroing is not required, in short this lead to waste CPU cycles consumption.
      2. Also it s not encouraged (especially in Linux developement practices), because:
        a. Promotes to use nnn_free() any time even it is not required (just in advance).
        b. Clears information needed for debugging.

      3. This could lead to bad desinging code. For example, consider code like this that avoids deadlock:

        if (!mutex_locked(m))
        lock_mutex(m)

      4. This could lead to this one:

        smb_account_free(p)
        ...
        smb_account_free(p)
        ...
        smb_account_free(p)

      And this would work, but who will care and catch this bad code if it works ?

      In short, if most of calls smb_account_free() needed this fixing, it would be proper solution to move bzero to smb_account_free(). But currently it is just some of them.

  3. 
      
gwr
  1. 
      
  2. That's the same suggestion I made in a comment on the issue.
    There are quite a few places in this file that potentially have the same problem. Having smb_account_free clear out the fields it frees would solve that problem for all those call sites.

    1. I went back and looked at the other call sites for smb_account_free and they look OK.
      Given that, I'm OK with adding a couple bzero calls to lsar_svc.c

  3. 
      
gwr
  1. 
      
  2. I don't see why we need another smb_account_free call here.
    Just adding the bzero is enough (but let's put it at the end of the loop, just after the existing call to smb_account_free.

    1. smb_account_free is required because account is valid and filled in case of result == NT_STATUS_SUCCESS.

      If we just call bzero() without freeing before, it will lead to memory leaks.

    2. I think your evaluation is incomplete. That's not the problem.

      I tried your test case with a debugger stop in there.
      After fixing the two missing bzero calls at the end of each loop,
      I observe that when lsa_lookup_sid returns an error, it leaves
      garbage in the smb_account_t. I don't think it should do that.

      Here's what I'd suggest for fixing that problem:

      diff --git a/usr/src/lib/smbsrv/libmlsvc/common/lsalib.c b/usr/src/lib/smbsrv/li
      bmlsvc/common/lsalib.c
      index e995cf1c68..6eadf8ab58 100644
      --- a/usr/src/lib/smbsrv/libmlsvc/common/lsalib.c
      +++ b/usr/src/lib/smbsrv/libmlsvc/common/lsalib.c
      @@ -333,6 +333,7 @@ lsa_lookup_name_builtin(char *domain, char *name, smb_accoun
      t_t *info)
      
              if (!smb_account_validate(info)) {
                      smb_account_free(info);
      +               bzero(info, sizeof (smb_account_t));
                      return (NT_STATUS_NO_MEMORY);
              }
      
      @@ -528,6 +529,7 @@ lsa_lookup_sid_builtin(smb_sid_t *sid, smb_account_t *ainfo)
      
              if (!smb_account_validate(ainfo)) {
                      smb_account_free(ainfo);
      +               bzero(ainfo, sizeof (smb_account_t));
                      return (NT_STATUS_NO_MEMORY);
              }
      

      With those (now four) places clearing memory after smb_account_free, the test goes OK.

    3. BTW, the comment about "evaluation incomplete" might look derogatory, but that was not my intent.
      The key thing that was not obvious after reading the bug and the changes was the fact that
      we're getting junk from lsa_lookup_sid when it returns an error. That's all I meant.

    4. I beleive it was complete, because in the review header I placed:

      "Double free can occur if lsa_lookup_sid() returns error with polluted @info argument".

      Anyway I don't vote for proposed fix, because this still missing another functions that can pollute @info argument, such as:

      lsar_lookup_sids(), smb_sam_lookup_sid()

      and maybe other.

      In other words, it has potential possibility now or in the future, that bug returns. In the future, I mean if somebody add changing to lsa_lookup_sid() or its sub-calls.

    5. Additionally note, the proposed fix doesn't care for case when @account is freed in the loop at lsarpc_s_LookupSids() in the line 847.

      Consider case, if you apply your fix and:
      1-st loop: @account is freed at line 847
      2-nd loop: lsa_lookup_sid() is failed with NT_STATUS_INVALID_SID, i.e. during check smb_sid_isvalid(sid).

      You will get double free at the line 847 or line 865.

    6. If the account is freed at line 847, the account into would be cleared out by a bzero on line 848 (inserted in the fix).
      Once that hapens, I don't think there remains a double-free opportunity in lsarpc_s_LookupSids. Am I missing something?

      I would prefer a fix with these properties:
      a: There should be no cleanup to do by the caller of a "get" function that returns an error.
      b: Cleanup of successful "get" returned data should happen exactly once or each get.

      I think just inserting the four bzero calls I suggested would do that, no?

      Just to make sure we're talking about the same proposed change, I put the
      four insertions I'm talking about here:
      https://www.illumos.org/attachments/download/1725/il-9881-gwr.patch

    7. I considered just diff you put here,i.e. without bzero in lsarpc_s_LookupSids. But once again. Copy-paste previous my comment:

      I don't vote for proposed fix, because this still missing another functions that can pollute @info argument, such as:

      lsar_lookup_sids(), smb_sam_lookup_sid()

      and maybe other.

      In other words, it has potential possibility now or in the future, that bug returns. In the future, I mean if somebody add changing to lsa_lookup_sid() or its sub-calls.

    8. You considered "diff you put here, i.e. without bzero in lsarpc_s_LookupSids"? That's not what I proposed.
      I would not call this fix "complete" unless it includes making lsa_lookup_sid stop returning garbage on error.

      Furthermore, once you do that, all you need is two bzero calls added in lsarpc_s_LookupSids.
      The additional cleanup you want to add to the success paths is then harmless, but unnecessary.

    9. After that I looked patch you add to the ticket again this is not fully fix and again:
      ...
      because this still missing another functions that can pollute @info argument, such as:

      lsar_lookup_sids(), smb_sam_lookup_sid()

      and maybe other.

      In other words, it has potential possibility now or in the future, that bug returns. In the future, I mean if somebody add changing to lsa_lookup_sid() or its sub-calls.


      Let me copy-paste comment from from the ticket about convention:

      I would prefer a fix with these properties:
      a: There should be no cleanup to do by the caller of a "get" function that returns an error.
      b: Cleanup of successful "get" returned data should happen exactly once or each get.

      As I wrote, there should be calling convention for lsa_lookup_sid(). And I was thinking about this during preparing fix to achieve (a) and (b) as well. But I also saw that all another SMB functions that work with smb_account_t argument do not cleanup argument during returning error, in other words pollute it. For instance: lsar_lookup_sids() and smb_sam_lookup_sid() and many-many others.

      And that brought a dualism, because either all those smb functions should be modified or neither of them.

      Also there was a question why passed argument was left polluted in many-many functions and maybe it was done intentionally ? Maybe it is important do not zeroing it? And possible answer is: to keep information when function returns error for analyzing in dtrace func:return probes. If we bzero() before return, this information is cleared and brings impossibility to find out for what account error is returned.

  3. I'd suggest adding the bzero here.

    1. Sorry, it maybe RB bug, but I don't see your comment when looking at diff. So let me assume that you suggest to add bzero at the end of cycle after existing smb_account_free(&account) (line 847 original or line 851 in patched).

      Just note that problem is not only in going to the next loop, but also in exitiing from loop due to error: "goto lookup_sid_failed". Therefore we can't postpone cleanup because "goto lookup_sid_failed" can occur and we will get doulbe free again.

    2. Like this (I think) is minimal and sufficient. no?

      diff --git a/usr/src/lib/smbsrv/libmlsvc/common/lsar_svc.c b/usr/src/lib/smbsrv/libmlsvc/common/lsar_svc.c
      index 28d5e73948..0e6048942f 100644
      --- a/usr/src/lib/smbsrv/libmlsvc/common/lsar_svc.c
      +++ b/usr/src/lib/smbsrv/libmlsvc/common/lsar_svc.c
      @@ -845,6 +845,7 @@ lsarpc_s_LookupSids(void *arg, ndr_xa_t *mxa)
                  goto lookup_sid_failed;
      
              smb_account_free(&account);
      +       bzero(&account, sizeof (smb_account_t));
          }
      
          param->domain_table = domain_table;
      @@ -997,6 +998,7 @@ lsarpc_s_LookupSids2(void *arg, ndr_xa_t *mxa)
                  goto lookup_sid_failed;
      
              smb_account_free(&account);
      +       bzero(&account, sizeof (smb_account_t));
          }
      
          param->domain_table = domain_table;
      
    3. No, because of "goto lookup_sid_failed" before.

      Please look again at the bug. The real problem is that smb_account_free() is called when lsa_lookup_sid() returns error. Another code that calls lsa_lookup_free() doesn't do anything in case of error, but two functions lsarpc_s_LookupSids() and lsarpc_s_LookupSids2() consider complex cases with error and without error and this can lead to double free:
      1. Double free in smb_account_free(&account) at the end of loop (line 847)
      2. Double free after lookup_sid_failed label.

      In another words, lsa_lookup_sid() can returns error with polluted @info argument.

      Let me explain another fix idea. Another fix could be place just bzero() directly in lsa_lookup_sid() and sub-calls. In this case no additional smb_account_free() is required. You can vote any idea.

      Yes, this is not trivial fix :)

    4. I did read the bug report. Anyway, yes, there's more going on here than that loop calling smb_account_free when it should.

  4. 
      
vgusev
gwr
  1. Ship It!
  2. 
      
jbk
  1. Ship It!
  2. 
      
vgusev
Review request changed

Status: Closed (submitted)

Loading...