8868 /usr/xpg4/bin/grep dumps core in find_nl()

Review Request #785 — Created Nov. 30, 2017 and submitted

mbarden
illumos-gate
8868
general

8868 /usr/xpg4/bin/grep dumps core in find_nl()

New tests were added to the grep_xpg4 test suite for these issues. The rest of the test suite was run for regression testing.

  • 3
  • 0
  • 1
  • 1
  • 5
Description From Last Updated
On the first loop through here, data_len will be zero and conptr will be NULL. However, if the context flag ... rm rm
On our first iteration, before we've assigned conptrend to a non-sentinel value at line 1300, do we really mean for ... rm rm
This part confused me a little bit. So I tried to put together an example here to try and make ... rm rm
tsoome
  1. 
      
  2. usr/src/cmd/grep_xpg4/grep.c (Diff revision 1)
     
     

    space between -1

  3. 
      
mbarden
tsoome
  1. Ship It!
  2. 
      
rm
  1. 
      
  2. usr/src/cmd/grep_xpg4/grep.c (Diff revision 2)
     
     
     
     
     

    On the first loop through here, data_len will be zero and conptr will be NULL. However, if the context flag isn't set, we won't allocate conbuf at line line 943. This'll mean that conptr will be NULL and conptrend will underflow to ULONG_MAX. Should setting this be conditional on conflag? If not, can we have a comment explaining this case please?

    Actually, in general, how is it valid to point to conptr-1? Wouldn't we be storing to illegal memory if we then executed line 1001 on the same iteration?

    1. conptr and conptrend shouldn't ever be dereferenced if conflag is not set. I can add a "conflag != 0 &&" here if it would help make that clearer.

      The point about line 1001 is valid; Although it (for some reason) didn't cause a crash in my test, it still shouldn't be done, and I've added a (conptrend >= conptr) check on the dereference.

  3. usr/src/cmd/grep_xpg4/grep.c (Diff revision 2)
     
     

    On our first iteration, before we've assigned conptrend to a non-sentinel value at line 1300, do we really mean for this value to result in -1 (if it gets cast to a signed value). As if conbuf was at say 0x1000, won't conptrend at this point be 0xfff? So the subtraction will result in -1? It feels like this won't result in the correct expectations around whether or not we have enough space in the buffer when we're at the initial size and for a very specific value of line_len.

    1. This is a good point. The problem I'm trying to solve is thus: We need a way to represent that the context buffer currently represents no data. Previously, we assumed that the buffer contained no data when conptr == conptrend, because this was our initial condition. However, when a line is empty ("\n"), conptr == conptrend, and so the assumption doesn't hold. My fix for this was to set conptrend to (conptr - 1), because it obviated this need. Clearly I missed some consequences for this. I believe these consequences are localized to the area between the top of the loop and the memcpy(conptrend...).

      I've made it so that when conptrend < conptr, this comparison is to conbuflen alone (as that condition means the context is currently empty).

  4. usr/src/cmd/grep_xpg4/grep.c (Diff revision 2)
     
     
     
     
     

    This part confused me a little bit. So I tried to put together an
    example here to try and make sense of where we are. So let's use this
    kind of dummy example.

    conbuflen: 0x2001 (The default value)
    conptrend: 0x0f
    conptr: 0x10
    line_len: 0x04

    o This says we memcpy 4 bytes at 0x10.
    o We add five to conptrend, so it goes to 0x14
    o We clobber the last byte we copied to have a new line

    Does this make sense? I guess the assumption is that we'll always have a
    newline character of some kind in the last slot otherwise we wouldn't
    have broken out of the growing loop for reading conext, right?

    Maybe we can write down in a comment somewhere what the invariants or
    expectations for conptrend and conptr are and what they are supposed to
    point to with respect to being valid or invalid characters?

    1. Those 4 bytes will occupy bytes 0x10, 11, 12, and 13. Byte 14 will be empty, and we overwrite it with a newline. line_len explicitly excludes the newline character, so the line "a\n" will have a line_len of 1. You are correct, however, that we won't leave the read_input loop until we either have a newline in the prntbuf, or we've completely run out off input data (in which case the newline-finding part before L_start_process doesn't happen).

      I'll try and find an appropriate place to document how conptr/conptrend work. In short, in general, conptr points to the beginning of the context, and conptrend points to the final newline in the context, except when we have no context; in that case, conptrend instead points to the (probably invalid) character before the context, and should not be used prior to the initialization of that context.

  5. usr/src/cmd/grep_xpg4/grep.c (Diff revision 2)
     
     
     

    I'm a bit confused here on what conptrend is always supposed to point to here. Line 1304

    1. Odd, this duplicated a comment at some point. Ignore this one, it turned into the one down below.
  6. 
      
mbarden
rm
  1. Thanks for the follow ups and fixing this.

  2. 
      
alp
  1. Sorry, guys, I have new core dumps. Run ggrep test suite.

    There are more SKIPs and less PASSes. Also I got 2 15-MB core files.

    core '/var/cores/grep.26655' of 26655: grep -i ()\1 in
    fee8dce2 uselocale (0, 0, 0, 0, 0, 0) + f
    fee8e643 mbrtowc (7648064, 7648178, 3, 7648068, fee69b37, 8341c00) + 1e
    feea14fd wgetnext (8046000, e, 76480e8, fef52000, 8046000, 7648179) + 56
    feea15d6 p_b_symbol (8046000, e, 7648108, fef52000, 8046000, 764817b) + 9e
    feea1a95 p_b_term (8046000, 83417e8, 7648160, fef52000, 8046000, fef52000) + 1c5
    feea200f p_bracket (8046000, b5, 7648160, fef52000, b5, 764825b) + 15e
    feea1dda bothcases (8046000, b5, 76481d8, feea1cb9, 83417a8, ff) + 79
    feea1e3f ordinary (8046000, b5, 7648240, fef52000, 8046000, fef52000) + 4d
    feea20ba p_bracket (8046000, b5, 7648240, fef52000, b5, 764833b) + 209
    feea1dda bothcases (8046000, b5, 76482b8, feea1cb9, 8341768, ff) + 79
    feea1e3f ordinary (8046000, b5, 7648320, fef52000, 8046000, fef52000) + 4d
    feea20ba p_bracket (8046000, b5, 7648320, fef52000, b5, 764841b) + 209
    feea1dda bothcases (8046000, b5, 7648398, feea1cb9, 8341728, ff) + 79
    feea1e3f ordinary (8046000, b5, 7648400, fef52000, 8046000, fef52000) + 4d
    .....

    1. The following GNU grep test allows to reproduce this core dump - http://git.savannah.gnu.org/cgit/grep.git/tree/tests/case-fold-titlecase

  2. 
      
mbarden
Review request changed

Status: Closed (submitted)

Loading...