9531 Want netstat -u to show PIDs associated with sockets

Review Request #2311 — Created Sept. 18, 2019 and submitted

citrus
illumos-gate
master
9531, 11707, 11708
32b7e5d...
general

9531 Want netstat -u to show PIDs associated with sockets
11707 provide 64-bit libdhcpagent
11708 netstat should be smatch and CERRWARN clean

Also available to view/review in github at https://github.com/citrus-it/illumos-omnios/commits/ig_9531_netstat-u

  • 0
  • 0
  • 43
  • 8
  • 51
Description From Last Updated
citrus
citrus
rm
  1. Thanks for putting this together. Would you mind updating the main ticket (9531) with the approach that you ended up taking, example output in the verbose and non-verbose modes? It's probably useful to include information about why the other approach that you reference wasn't used because of the contention issues you saw.

  2. Why do we need Makefile.cmd.64 here now? I don't see this discussed in any of the bugs. It'd help to understand if this is related to adding -u support or if this was done as just a separate thing while working on netstat.
    1. Netstat needs to be 64-bit so that it can Pgrab() 64-bit and 32-bit processes in order to extract the socket information.

    2. OK, we should make that clear somewhere. I think that's a pretty easy nuance to miss. I guess we should do this assuming $(BUILD64) is set?

    3. I've added a comment.
      I don't see anything else under usr/src/cmd/ that builds only a 64-bit binary predicating on $(BUILD64). It seems to be used for dual-arch components.
      I can add it but then a 32-bit netstat binary would be built if with BUILD64=# and it would only show socket information for 32-bit processes.

      Let me know your preference.

    4. My preference would be to do this if BUILD64 was set to #. If BUILD64 was set to # then there wouldn't be 64-bit components, so it wouldn't be a problem.
    5. Ok.

  3. Can you use the sys/ccompile.h \_\_NORETURN macro? Also is there a reason this isn't static? Now that netstat is only in a single file, not sure why it'd need to be external. I assume it's not worth trying to unify fail and fatal?
    1. No reason, I've changed it to use that. I'm leaving both functions in place for now.

      It is not static because there is a fail() prototype in usr/src/cmd/stat/common/statcommon.h which is included by this, and that prototype is not static.

    2. Wait, if it's coming from another header, then why are we redeclaring it here at all?
    3. I added it so that I could mark it as noreturn to satisfy gcc/smatch once I removed the warning suppressions.
      I'll see if it can be done in the header instead.

  4. Maybe write a mini-theory statement about what it's for, why it exists, how it works, etc.? For example, if you had tried an AVL tree here but found it too slow, that'd be useful to record as that might be a natural question someone has (note this isn't suggesting that the hashing should be rewritten with just an AVL tree).
    1. I've added some notes.

  5. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revision 2)
     
     
     
     

    I'd recommend commenting where these values came from. Even if you just picked them arbitrarily.

  6. It'd be helpful to have a comment that explains why you're checking for the ph_type and ph_family to be zero here.
    1. I don't actually have to since I'm checking that it's a character device I can just go straight to checking the major number.
      ph_type and ph_family will always be zero for a TLI socket.

      It was just belt and braces - now removed.

  7. Mind putting parens around this multi-line if expression? Same at the one a few below.
  8. I think it's clearer if you have an unknown sentinel that's actually outside of function scope, as in it's its own top-level value. I think that makes it clearer about its purpose and why it exists.
  9. Please put a void in a function that takes no arguments to make it clear it takes no arguments. Otherwise, sometimes a compiler will think it is a K&R style declaration.
  10. Since you've opted into c99, you can scope these in the for loop variable declarations and limit their visibility. (I would suggest limiting local variables to their minimum required scope in general)
  11. When using indexes elsewhere you use unsigned integers. Given that this is iterating all of the indexes, probably it should be unsigned to be consistent (also there can never be a negative index).
  12. In all of the definitions I have, pid is always a signed type. So I think you need %ld or %d depending on bitness.
  13. I would make these arguments const if you can. Though I know the proc handle probalby can't.
  14. Should ip or arp be in the list? I don't recall if they're TLI or not.
    1. I haven't seen any, and this list is copied from pfiles.c
      https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/ptools/pfiles/pfiles.c#L289

  15. Is there a reason this memory allocation failure isn't considered fatal?
    1. I think because I consider the whole exercise of trying to find process information best effort, given it's a merge of two separately taken snapshots etc.
      For this to fail though, a fatal error is probably in order.

    2. OK, I guess that's a reasonable perspective. We should probably make sure it's clear in the manual page it's best effort then.

    3. Tweaked the manual a tad.

    4. I was thinking about this a bit more and now there's no way for me as a user to know this occurred. I would consider issuing a warning when this occurs, still printing everything that was gathered, but exiting non-zero to make it clear that an internal error occurred.
    5. I made it fatal in the last revision. If we're that short of memory, exiting seems like a valid option.

  16. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revision 2)
     
     
     
     
     
     
    I assume we don't care about truncation at all here for users?
    1. No, I don't think we care. The code allows up to 128 characters (I couldn't find a definitive limit for usernames anywhere).

  17. Why is this in units of longs? Can you expand the comment there? Does a sockaddr_storage not do the trick?
    1. This was copied verbatim from pfiles.c
      https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/ptools/pfiles/pfiles.c#L766

      To be honest, we only need sa_family which is the first element of struct sockaddr, so I could just use a plain struct sockaddr - I'll give it a try.

  18. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revision 2)
     
     
     
     

    Please put parens around multi-line continuation for this if statement.

    1. I meant iff (https://en.wikipedia.org/wiki/If_and_only_if ) but I can change or expand it if it is not commonly understood.

    2. I do know the abbreviation, but wasn't sure. I would expand it as otherwise to a lot of folks I think it'll look like a typo.

    3. Done.

  19. Any particular reason we're ignoring the return value here? If this fails, that probably indicates something has gone rather wrong, right?
    1. Just the best-effort thing again. It can only fail here with EINTR.

  20. See earlier notes about signedness of i and scoping of ph/ph2.
  21. See earlier notes about signedness of i and scoping of ph/ph2.
  22. If you're cleaning these up, the casts shouldn't be necessary, fwiw.
    1. These (reverse argument order) were to satisfy smatch. I'll drop the casts too.

  23. I would add a comment above each of these blocks explaining why they're done this way and not the old way so someone doesn't later come along and do the opposite of what you've done.
    1. Done - you can imagine how many inconsistencies there were in the format strings before I did this...

  24. Isn't a major always a 32-bit value? I'm not sure we should be using PRIx64 for it here.
  25. Aren't sie_flags defined as a uint64_t and therefore %x is incorrect?
  26. Based on this, I assume you're trying to make sure that transport_count can't overflow. However, signed overflow is undefined behavior. Therefore you probably want to make this an unsigned value or make sure when you're adding to it it doesn't overflow before you actually do the arithmetic.
    1. It's actually copied straight from the extant gather_attrs().
      It should be just checking that it is not 0. I'll fix gather_attrs() too.

    2. I was looking at the new version and there is still an overflow potential, but I guess it's not the most realistic? Not sure if it's worth worrying about or not.
    3. It's probably not realistic, but I added overflow detection for transport_count and switched calloc() to recallocarray().

  27. Why are we comparing mib_id here to EXPER_SOCK_INFO and not to mib_id like when we counted the number of elements? If these are different, can't we end up overflowing?
    1. It's because the data is a mixture of tables. We look through for the tables relating to the protocol that is being looked at and count up how many connections there are before allocating the info array, then look through the list again for the extra information tables.

      The data from the kernel will never cause an overflow here. The EXPER_SOCK_INFO tables contain socket information indexed by the position of the corresponding connection in the mib_id table.

      However, this looks like a good place for an assertion; I'll add one.

  28. I suspect you're tyring to make sure that the length and offsets don't overflow the amount of data we have right? In that case the arithmetic should be done on uintptr_t's I think.
    1. It's a pattern from the extant gather_attrs() again but yes, it looks like it should be uintptr_t.

  29. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revision 2)
     
     
     
     
     
    In this section and looping we currently seem to have an assumption that the number of attrs and the number of socket info structures will be identical. Why is that? I can see that there are cases in the kernel where we'll not generate it if we can't allocate memory. How do we ensure that these are always the same. In particular, cases like at 5532, 5542. What makes sure they're actually in sync?
    1. The attrs and socket info structures are additional information for each TCP connection. Both attrs and info are indexed with the TCP connection offset in the MIB so they are the same size as each other and the same size as the number of connections. However the information may not be present at all, or not present for a particular connection.

    2. OK. I would strongly suggest improving the comment to make it clear about the design there. It was kind of non-obvious to me from the kernel code.

    3. Done.

  30. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revision 2)
     
     
     
     
     
    Why are the unsigned gauges being converted to signed values? Doesn't this risk truncation?
  31. Just to make sure the Xflag really is just an internal debugging flag that you use to dump internal state and that's why it's been added all over the place, right?
    1. Yes - Xflag was pre-existing, I just cleaned up the existing output and added more.

  32. This has the same question as the TCP bits do on iptr/aptr.
  33. How did you determine that this was sufficient size? Is it ok for us to truncate this if we're wrong? Any reason not to just use asprintf so we don't have to guess?
  34. Same attrs questions as in TCP.
  35. Cast shouldn't be necessary here.
  36. Looks like signed/unsigned mismatch. I think ks_ndata is unsigned, but you're comparing to a signed value (i).
  37. usr/src/lib/libdhcpagent/common/dhcpagent_ipc.h (Diff revision 2)
     
     
     
     
     
     
     
     
     
     
    Isn't a time_t signed. If so, shouldn't this be an int32_t? Also, since this is private, we could put it in now as an int64_t so that way 64-bit stuff isn't being truncated. We shouldn't assume a library will always be 64-bit.
    1. You're right, I'll fix and move to int64_t. I didn't assume the library would always be 32-bit, but I did assume that the cleanup would happen once somebody moves dhcpagent et al. to 64-bit. netstat is currently the only 64-bit consumer.

  38. usr/src/man/man1m/netstat.1m (Diff revision 2)
     
     
    Should this mention that they might be truncated? Can we talk about the behavior a user should expect for multiple processes that have the same fd open. As in they should expect it to be an exhaustive list (given that it's a moving snapshot) or not.
  39. usr/src/uts/common/fs/sockfs/socksubr.c (Diff revision 2)
     
     
    Comment needs to explain _why_ it must be done before.
  40. usr/src/uts/common/fs/sockfs/socksubr.c (Diff revision 2)
     
     
     
     
     
     
    Since we're rewriting these should we verify that we don't overflow the buffers with snprintf now instead?
  41. usr/src/uts/common/inet/ip/ipclassifier.c (Diff revision 2)
     
     
     
    Since this is only ever used in loops where we're appending them to mib2 data, rather than kmem_alloc/freeing it every iteration, should we just ask callers to pass in the struct and we'll fill it out. That way a single stack allocation can be done and we don't have to keep going back and forth in kmem? Doesn't matter either way per se.
    1. Yes, better, fixed.

  42. usr/src/uts/common/inet/ip/ipclassifier.c (Diff revision 2)
     
     
     
    Presumably we need to actually hold the conn_lock while checking this. Are callers required to have done so? If so, should we be asserting that?
    1. Callers don't have the lock, but they do hold a reference on connp, acquired through ipcl_get_next_conn()
      I don't think we need to hold a lock - the calling function reads values from connp without a lock (tcp_snmp_get() for example).

    2. Yes, but it's not doing any upcalls. Can you make sure that as long as you have a hold, CONN_CLOSING can't be set and therefore it's safe to perform the upcall due to having the hold (or CONN_CLOSING being set isn't a requirement for making the upcall)?
    3. You're right, it needs to acquire the mutex, fixed.
      Thanks for catching this.

  43. You probably want KM_NOSLEEP | KM_NORMALPRI.
  44. usr/src/uts/common/inet/tcp/tcp_stats.c (Diff revision 2)
     
     
    Conventionally the != is on the previous line.
  45. usr/src/uts/common/sys/socketvar.h (Diff revision 2)
     
     
    I would make sure to double check 32-bit / 64-bit alignment on the updated version of the structure.
  46. usr/src/uts/common/sys/strsubr.h (Diff revision 2)
     
     
     
     
     
     

    This comment would be more helpful if it explained why it was needed. In this case looking at spec_clone() doesn't really make it clearer as to why this member is useful.

  47. 
      
citrus
citrus
citrus
hadfl
  1. Ship It!
  2. 
      
rm
  1. Thanks for the follow ups here and working on this! I have a few follow ups on the changed bits.

  2. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revisions 2 - 5)
     
     
    I see you've moved the fatal in place of the fail macro. Should this also have the __NORETURN attribute applied?
    1. Well not originally, because fatal() has an early return if format is NULL, it is not a noreturn function. You spotted this in another comment. I've fixed and added the __NORETURN tag.

  3. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revisions 2 - 5)
     
     
    Because we've changed SENDQ and RECVQ to an int64_t to deal with overflow, this should be `"%6" PRIu64` or similar, I believe. This is also true of the RECVQ bit.
  4. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revisions 2 - 5)
     
     
     
     
     
    Mind putting braces around this new multi-line if?
  5. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revisions 2 - 5)
     
     
     
    I'm not sure if this is from the original netstat (it shows as new between a diff of two things), but surely if format is null I imagine we should just call exit and not continue?
    1. It is from the original, but I agree and have fixed it (as well as marking fatal() as __NORETURN now that it is)

  6. usr/src/man/man1m/netstat.1m (Diff revisions 2 - 5)
     
     
     
    When I see snapshot, I think that we're going to be able to gather this information atomically. How about the following:
    
    'While the system gathers this information, the processes associated with a given endpoint may change. If such a change occurs, it may not be reflected in the output.'
    1. Much better, taken verbatim, thanks.

  7. usr/src/uts/common/inet/ip/ipclassifier.c (Diff revisions 2 - 5)
     
     
    Why is it OK to drop the mutex here? What ensures that this data, like the upcalls, can't be changed out from under you? Should we be putting a reference on the conn_t or does our caller make sure that it has one? I think that'll make sure that if someone is calling ip_quiesce_conn() that we won't have our upcalls and other data torn out from under us.
    
    It's not clear to me if this is ultimately being called from a ipcl_walk() call or not.
    1. I don't have a complete answer to this (help would be appreciated).

      We do have a reference on the conn_t as this function is always being called from a loop that uses ipcl_get_next_conn()

      The reason for the CONN_CLOSING check, according to the original netstat change notes, is to handle sockets that pop off the sockmod. In this case, the sonode associated with the connection can be freed but a dangling reference left in the conn_t upper_handle. I checked other instances where socket upcalls are used, and none of them seem to check for this or worry about a race between checking and making the upcall. Some upcalls are made while holding the mutex, others are not.

      It might be possible to hold the conn lock while making the upcall - I will investigate.

    2. OK. So, in this case, based on my reading, the upcalls are supposed to be valid while we hold a reference. So I don't think it can go away until after we finish the call to ipcl_get_next_conn(). So I don't think it makes sense to hold the conn lock during the upcall in this case. I guess what's there is probably fine. I would probably add a comment about all of this above there so someone else doesn't get confused in the future.

  8. usr/src/uts/common/inet/udp/udp_stats.c (Diff revisions 2 - 5)
     
     
     
    Can we join these lines?
    1. Indeed.. no idea how that happened!

  9. 
      
citrus
citrus
citrus
gwr
  1. 
      
    1. BTW, some may find this link helpful:
      https://github.com/illumos/illumos-gate/compare/master...citrus-it:ig_9531_netstat-u

  2. usr/src/uts/common/sys/socketvar.h (Diff revision 8)
     
     

    Why not uint64_t here too? (as you did some other places)
    for all of:
    char si_son_straddr[ADRSTRLEN];
    char si_lvn_straddr[ADRSTRLEN];
    char si_fvn_straddr[ADRSTRLEN];

    1. This is part of the change that survives from Mohamed's review request a few years ago.

      Before this change, there was a struct k_sockinfo that wrapped struct sockinfo and added these additional strings to data returned from the kstat request.
      Since struct sockinfo is declared as a private interface, Mohamed just moved them as-is directly into struct sockinfo before adding the new fields needed for netstat -u:

      -#define    ADRSTRLEN (2 * sizeof (void *) + 1)
      -/*
      - * kernel structure for passing the sockinfo data back up to the user.
      - * the strings array allows us to convert AF_UNIX addresses into strings
      - * with a common method regardless of which n-bit kernel we're running.
      - */
      -struct k_sockinfo {
      -   struct sockinfo ks_si;
      -   char        ks_straddr[3][ADRSTRLEN];
      -};
      

      So I haven't changed any types here, which bit could uint64_t be used for?

    2. Maybe I'm confused. I thought those were kernel memeory addresses. Are they network addresses? AF_UNIX?

    3. No, you aren't confused, I was. They're string representations of 64-bit kernel memory addresses, built using snprintf("%p")
      Since netstat seems to be the only in-gate consumer of the sockfs:sock_unix_list kstat, and it's a private interface that I'm nodifying anyway, yes I could change these to uint64_t.

    4. I think it's fine to keep them as strings for the ISA independence that it provides for now as per the original comment, fwiw.

    5. It also allows easier testing with the previous netstat binary which I kept around for the purpose.

  3. 
      
rm
  1. In general, everything looks good. Thanks for all the work here.

  2. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revisions 5 - 8)
     
     
    Shouldn't this kind of warning go to stderr?
  3. usr/src/cmd/cmd-inet/usr.bin/netstat/netstat.c (Diff revisions 5 - 8)
     
     
    Same question about stderr.
  4. 
      
citrus
rm
  1. Ship It!
  2. 
      
citrus
Review request changed

Status: Closed (submitted)

Loading...