libtopo: add support for directed graph based topologies

Review Request #2529 — Created March 13, 2020 and submitted


Libtopo currently only supports building topologies based on a tree structure. The problem is that not all topologies map naturally to a tree structure.

These changes extend libtopo to support building topologies that are based on directed graph structures.

Specifically, this ticket will cover adding new APIs to libtopo for the following:
- building digraphs
- iterating over graph vertices
- iterating over vertice edges
- finding all paths between an arbitrary pair of vertices
- serializing a graph-based topology to XML
- deserialzing a graph-based topology from XML

See the testing notes in the ticket:

  • 0
  • 0
  • 62
  • 13
  • 75
Description From Last Updated
  1. Thanks for putting this together, Rob. I'd be very curious to see how this fit together with the sas topo changes on top.

    1. Sorry, apparently this bit didn't get published with my original follow up review. Sorry.

  2. usr/src/lib/fm/topo/libtopo/common/topo_builtin.c (Diff revision 1)
    Is there a reason that this can't use topo_mod_enumerate()? One difference between the two right now is that there's a topo_node_hold() done on the node before we enumerate it and a rele afterwards. Do we need to do a hold/rele on tdg->tdg_rootnode around this in addition to the mod enter/exit? A comment here might be helpful if it should be different.
    1. To be honest, I don't remember anymore why I made these different between the tree and digraph cases. It's something that goes back to the original prototyping several months ago. I went and change the code to use topo_mod_enumerate() and retested things and it worked fine so...I guess it's fixed :)

  3. cstyle should probably point out that the '{' should be joined to the prior line.
    1. For some reason cstyle didn't flag these, but I moved the opening brace up to the previous line for all these struct defs.

  4. ... API is can ...
    I think there may be an extra word in that bit of the sentence.
  5. Was 'adjecently' maybe adjacency?
  6. usr/src/lib/fm/topo/libtopo/common/topo_digraph.c (Diff revision 1)
    So, in this bit, I think that we're defining a path as a series of nodes that one is traversing is that right? It might help to add a sentence in here that says, each entry between the '/' is the next node on the path.
    I guess the idea is that each node can be used to define the route, which is why you've called out cables here and that if it's important to call out a specific route between two entries, then all of the nodes are listed. Is that right?
    Maybe we can add a little bit more here about when it's good to use this or not. For example, would it make sense to use this to represent two logical services that can't talk to one another over a given TCP port, say? Maybe a better way to phrase it is is this OK to use if we don't have every step of the way between two entities?
    1. Yeah, I reworked this comment to hopefully explain this better.

  7. Can you describe the locking details in a bit more detail here? I can see that we're locking the top-level digraph when inserting edges and verticies for doing various operations, but it's not clear to me what it is supposed to protect as the actual verticies and edges themselves don't do any locking.
    My memory of the topo API is that while a given module will only be invoked serially for a given bit of topo, it wasn't clear to me that internally you couldn't have two modules iterating and enumerating something both of which might try to add an edge or a vertex at the same time or remove a vertex that's being iterated by someone else.
    1. Any libtopo code that exclusively executes during snapshot enumeration can assume a single-threaded environment.

      However, any topo code that can called after the snapshot is created should assume it's in an MT environment.

      Most topo operations don't modify the state of the snapshot. The cases that I can think of are:

      1) topo prop get operations for properties that are backed by topo methods
      - they may modify node property state

      2) topo_prop_set operations

      3) and obviously calling topo_snap_release() or topo_close()

      Cases #1 and #2 use per-node locks to enforce synchronization.

      For case #3, the issue is there could be a situation where one or more threads were to access the snapshot (they've got a cached tnode_t ptr or are walking the snapshot) while another thread is calling topo_snap_release() or topo_close().

      Libtopo doesn't really have a story for case #3. It's really left up to the library consumer to handle that case - for example, fmd implements its own reference counting mechanism around topo snapshots so that individual fmd modules can do a hold/rele on a snapshot to ensure it doesn't get freed out from underneath them.

      Getting back to the topo_digraph case, we're not exposing any APIs that modify the state of the graph structure once the snapshot is created. The only state that can change is the state of the underlyng topo nodes for each vertex, which are already protected by the per-node locks. The places in topo_digraph.c where I'm currently taking the lock are all during snapshot enumeration (which is single threaded, so I'm wondering why I did this). And the only place where the digraph state can be modified post-enumeration is when we teardown the snapshot. And libtopo prevents concurrent calls to topo_snap_release() or topo_close() via the topo_hdl lock. So maybe we don't need this lock at all?

    2. Do you have any thoughts on this? Right now I'm leaning towards removing tdg_lock, because I'm not convinced we even need it.

    3. Thanks for expanding on this a lot more. I think I defer to you on whether we'll need the lock in the end or not, but would you mind writing that up in the block comment based on whatever you do. For example, if we don't need the lock, the reasons why or if we do, the reasons why.

    4. If you don't think we need it, then I guess it should be fine to get rid of it.

    5. I've removed the lock and added an explanation of the locking strategy (or lack thereof) to the comment block at the top of the file.

  8. Just to make sure I have the right understanding, a given scheme should only ever have a single digraph right? The same way that a scheme only ever has a single tree and root?
  9. Do we need to take a topo_mod_hold() for this? Or would that screw with how the reference counting works for modules?
  10. I would duplicate the comment from +232 here.
  11. It may be worth considering an error checking mutex here. Otherwise the pthreads APIs won't error about unlocking an unlocked mutex or one you don't own. Though it may be better to do that in one larger pass over libtopo and to add the a pthread version of mutex_enter()/mutex_exit().
    1. See earlier comment regarding locking. I'm thinking maybe this lock can go away.

  12. Would it be possible to see how the consumer in the sas topo change would use all this? I would have expected that this function would go through and clean up any vertices and edges that had been associated with a digraph. Doesn't the digraph basically own the vertices and edges once it's been added? Or is the idea that this should always be separate.
    Maybe you could add a bit about the ownership and destruction path to the theory statement?
    1. Yeah, I was thinking more that libtopo itself would be the primary consumer of topo_digraph_destroy() and so I modelled it to behave similarly to topo_tree_destroy(). If you look at topo_close() it first calls topo_snap_destroy(), which iterates through the trees and destroys all the nodes. Then it iterates through the trees again, destroying the topo tree structure via topo_tree_destroy(). I basically duplicated that how logic for the digraphs.

    2. I see. Now that I understand a bit more about how this fits in, I think that makes sense. Mind leaving a comment for someone there in the future?
    3. Yes, I've expanded the comment block above topo_digraph_destroy() to explain all this better.

  13. I was a bit surprised to not see a topo_mod_rele() here and just a straight up topo_mod_free() of the root node. Do we not need to call topo_node_destroy() to clean up things here? If this is the way to do things because the root is special (though I'm not sure how things like the children hash list are expected to be cleaned up work), then a comment here would be useful.
    1. The "root" node on the topo_digraph_t is not a fully-formed node so topo_node_destroy would attempt to free some things that aren't set up on it. I did add code to unregister all the methods that are attached to the root node - which I was leaking. Currently the code is findleaks clean when examing process cores taken after running the tests and also after running SAS enumeration.

  14. This error path doesn't set a topo error. It also doesn't log an error via topo_mod_dprintf(). Should we send this to the err path for that?
    1. I added a topo_mod_dprintf() here.

  15. Should we check somewhere that we're not going to add a vertex that causes us to overflow nverticies?
  16. I'm not sur ethat topo_mod_free() correclty handles freeing a NULL pointer. Do we need to check that these are non-NULL?
    1. Ugh - you're right. Fixed.

  17. I've noticed that all the iteration functions here have a boolean_t for this being the last entry. That's not something I'm used to seeing in other iteration functions. Does this help consumers out in some way?
    1. This is an artifact of the original JSON serialization code (before we switched to XML). In the case the JSON serialization code, it was useful in the callbacks to know if it was the last element in an array (and hence didn't need a trailing comma when serialized). The XML serialization code doesn;t use it and neither does the sas enumeration code that's coming in illumos#12331, but I left it in there as I thought it may be generally useful for some future use case.

    2. OK, seems reasonable to keep it. Thanks for the additional background.

    3. No problem!

  18. cstyle: I don't believe case statements should be in parens.
  19. FALLTHRU shouldn't be needed if there's no code here.
  20. Shouldn't this set a topo error?
  21. usr/src/lib/fm/topo/libtopo/common/topo_digraph.c (Diff revision 1)
    Should we worry about these counts overflowing?
  22. Should this name reflect that it only does outgoing edges? Is it possible we'll want to later introduce one that has incoming support?
    1. I had thought about the possibility of adding support for iterating over incoming edges in the future should the need arise. My thinking was that if we added that we'd do it by adding a flag param to topo_edge_iter() that could be used to specify which edges we wanted to iterate over. I think this would minimize code duplications, as opposed to having separate functions for iterating over incoming vs outgoing edges.

    2. OK, sounds reasonable.

  23. Please remove the parens from these statements.
  24. This shouldn't be needed if there's no code between the case statements.
  25. cstyle: This should be joined to the previous line.
  26. It might be useful if this said semantically what it was trying to do. The fact that it's a helper function is fairly obvious. The set of paths it's trying to accumulate less so.
    1. Agree - I've added a much more descriptove comment block for this function.

  27. It's not clear to me how this stops executin in the face of graph with a loop or if you go down paths that'll never reach to. Maybe the latter is fine because we'll just recursively walk everything, not find it and eventually hit an end. But in the face of loops, it seems like this'll maybe infinite loop?
    1. Yep, there was some brokenness here that I didn't see during testing since none of the actual SAS topologies we tested with have loops. I updated digraph-test-in.xml to have a loop and it immediately exposed the problem. I've updated the code to properly handle cycles (hopefully!)

  28. Since curr_path is only used in a constant form can we make it const?
  29. usr/src/lib/fm/topo/libtopo/common/topo_digraph.c (Diff revision 1)
    Don't we need to check whether or not this succeeded?
  30. Isn't this missing a free of pathstr?
  31. So there's a signed/unsigned integer mismatch here. This function returns an int which is the number of paths. However, we're storing this along the way as an uint_t. This means that we'll end up doing implicit conversion and if we have 2^31 paths, we'll start thinking this failed. I think it'll be clearer to just have this function return 0/-1 and have the number of paths be a uint_t that it takes out.
    1. Yes, that makes sense. Done!

  32. Shouldn't this be moved up to +551? If you're going to directly return here, don't we need to free pathcomp? It'll probably be easier if everything goes out the err path?
    1. Yeah, I restructured this code some to clean that up.

  33. Missing free of curr_path in this error path.
  34. Why does this function return an ssize_t if it never returns a negative number and the consumers of it always pass it back to something that takes a size_t?
    1. Now that I've added code to check if snprintf returns -1, this function can return -1.

  35. Do we need to worry about snprintf returning -1?
  36. Surely doesn't like the '0' and the || with no spaces between them?
    1. pbhck didn't complain for some reason. I fixed it.

  37. Do we need to worry about snprintf returning -1?
  38. What does this +1 represent, a null terminator? Maybe it's better to start with bufsz equal to 1 in case someone else changes things around later?
    1. Yeah, it's for the NUL. I changed the code to initialize bufsz to 1.

  39. Do we need to worry about snprintf returning -1?
  40. Do we need to worry about snprintf returning -1?
  41. Why is this EMOND_NOMEM here but EMOD_FMRI_NVL when adding things at +719. Shouldn't they both be the same?
    1. The EMOD_FMRI_NVL was more for the scheme_vers != FM_PATH_SCHEME_VERSION case. I broke this code up into two seperate if statements so we can set EMOD_NOMEM for the nvlist_add failure case.

  42. Any reason not to include the NUL character in this?
    1. Fixed per comment below.

  43. So, this construct is a little worrisome. This does copy all of the bytes, but explicitly will not copy over the NUL terminator because n is set to the string length which doesn't include it. This means that by default strncpy() won't insert one and we're relying on the fact that we did a topo_mod_zalloc. Since we're doing the logical equivalent of the strdup, I'd just do a topo_mod_alloc() and then explicitly do a bcopy() of fmrilen+1 bytes to make sure that everything is exactly the same. Otherwise, someone looking at this in the future or someone who incorrectly changes the topo_mod_zalloc() to something else might not realize that it was load bearing for ensuring the NUL terminator.
    1. I reworked this code per your suggestion above.

  44. usr/src/lib/fm/topo/libtopo/common/topo_digraph.c (Diff revision 1)
    If npairs is zero should we still continue?
    1. No. I added a check for this.

  45. I presume something was meant to go here?
    1. Ooops - yeah, fixed.

  46. Should there be consumers of this in libtopo or should this be wired up to something? Or is this coming in a subsequent change?
    1. Beside the automated tests, the next consumer will be a CLI tool called sastopo. That will be delivered in illumos#12331.

  47. If we're going to ignore every single fprintf error code should we at least check ferror() somewhere in the whole thing to make sure that we can note that writing this out might have failed?
    1. Yes, I added a check at the end of topo_digraph_serialize()

  48. Why does the uint64 use hex where as all the other unsigned types use base 10? Should they be consistent so it's easier to tell or does it not matter?
    1. This mirrors the behavior of the existing behavior of the XML serialization/deserialization for tree topologies. At least for the sas enumeration use-case, where we use uint64_t properties to represent SAS addresses, the hex form is much nicer and easier to correlate with the output of other tools.

  49. Should this really be being written to the file? Is the goal to make sure that the file is invalid xml?
    1. I went and changed all the error messages in the serialization code to use topo_dprintf

  50. Here and elsewhere I assume we don't care about propagating the topo error that caused us to fail?
    1. I reworked all the topo_digraph_serialize code paths so that topo errno is alwways set.

  51. usr/src/lib/fm/topo/libtopo/common/topo_digraph_xml.c (Diff revision 1)
    Maybe it's better to have a common function for this since this looks the same as the DATA_TYPE_NVLIST handling.
    1. I refactored most of the duplicated code between serialize_property() and serialize_nvpair() into separate functions.

  52. For this and all the other array entries, can we leverage the existing nvlist array handling to try and reduce the duplication of all of the ways to write this out with a shared common function?
    1. I refactored most of the duplicated code between serialize_property() and serialize_nvpair() into separate functions.

  53. Unchecked return values?
    1. I added code to check the retval on the uname() and time() calls.

  54. It feels like things could change that cause the sizeof (tstamp) to not be large enough. It's probably worth checking that this is correct and/or also giving tstamp more slack.
    1. I increased the buffer size and also added code to check the retval from strftime.

  55. Should we check ferror() here before returning for the user? Or is it up to the caller to do that?
    1. Yes, I added a check at the end of the function.

  56. Since these are coming from the xml document and are just pointers into it, shouldn't we make these const?
  57. Why are we doing this here? We need to do this after all the items are added which is you also have a copy of this at +999. Seems like the only reason to do this is to catch an early duplicate, but I think that might be more confusing than not.
    1. I have no idea why I did that :) As you state, it's clearly unecessary. Removed.

  58. I guess the idea here is to allow others to add children that topo doesn't understand to the xml file and still deal with it?
    1. The general idea was to make the XML schema for digraphs and the associated parser as generic as possible such that new schemes (beyond sas) could be added without having to adjust the parser logic or DTD.

  59. For this and all the other integer properties why do we just truncate the integer? If it's out of range, shouldn't we actually care about that and error on that rather than adding a truncated version?
    1. For what it's worth, this is same behavior as the current XML parsing code for tree topologies.

      But I did go ahead and some code to make sure the converted uint64_t val won't exceed the number of bits available in whatever we're downcasting to. It's not full-proof in that it wont catch, for example, the case where we have something like:

      <nvpair name='property-value' type='int8' value='255' />

      cause it fits in 8 bits, so it'll happily just store -1 in the int8_t

    2. I see, OK. thanks for taking a look at that.

  60. usr/src/lib/fm/topo/libtopo/common/topo_digraph_xml.c (Diff revision 1)
    If there are more children than the property indicated we'll happily go off of the end of nvlarr. Shouldn't we check here that 'i < nelems' and fail if we hit a case where we have more than specified?
    1. I decided to alter the XML schema so that the nelem attribute is no longer created/used for array types. Instead we just count the number of child nvpair or nvlist elements and then size the appropriate arrays based on what's actually in the document. I think this make the XML schema simpler and eliminates the issue you of walking off the end of the array.

      Similarly, I removed the nelem attribute from the "vertices" and "outgoing-edges" elements as they weren't even being used by the deserialization code.

  61. Do we want to check that i actually is equal to nelem to make sure that we found everything that was asked for?
  62. All of the array entries have the same issues with trusting the number of entries and walking off of the end of the array that I describe in the TDG_XML_NVLIST_ARR case. We should fix them all.
    1. Agree, see the above comment.

  63. Maybe it's worth adding a topo_hdl_calloc and a free version of the array format to make all of these easier?
    1. Possibly. I think I'm going to punt on that for now.

  64. Using uint32_t here and int32_t in the free path. Probably should be uint32_t.
  65. Here you're using uint64_t, but everywhere else in this function you're freeing with int64_t. I presume this should have been an int64_t?
  66. In the failure path, vtx is not freed.
  67. Where does nobase.xml come from? Is that a well known URL in libxml?
    1. I copied this from an example I found online for using xmlReadMemory() :) Some of the examples use "noname.xml" and others use "nobase.xml" when the XML being read does not contain URIs that heirachially related to some base XML document. I don't think either of those names have any special meaning and I couldn't find any refernce to them when I grepped through the libxml source code. NEar as I can tell, I think the API just doesn't like NULL being passed in for the base URL so the convention seems to be to pass a non-existent filename when there is no base URL.

    2. Mind adding a comment to that effect? Maybe we should try with an empty string, e.g. "".
    3. Yes - I checked and it also works with an empty string. I changed the code to use that.

  68. I assume that root doesn't need to be freed unlike other libxml2 artifacts like the properties?
    1. Correct. Unlike the xmlGetProp(), it doesn't do any allocations.

      * xmlDocGetRootElement:
      * @doc: the document

      * Get the root element of the document (doc->children is a list
      * containing possibly comments, PIs, etc ...).

      * Returns the #xmlNodePtr for the root or NULL
      xmlDocGetRootElement(xmlDocPtr doc) {
      xmlNodePtr ret;

      if (doc == NULL) return(NULL);
      ret = doc->children;
      while (ret != NULL) {
          if (ret->type == XML_ELEMENT_NODE)
          ret = ret->next;


  69. usr/src/lib/fm/topo/modules/common/ses/ses.c (Diff revision 1)
    The fact that these three are no inconsistent confuses me. It appears that we use these as the same thing as the topo instance, so why would the maximum be a type that's smaller than the others?
    I get that we are using sec_maxinstance and trying to use -1 as a sentinel that it's not set, but we're also using it as the largest value that we'll pass to topo functions, so having it be a different type seems wrong. Maybe we should use a different sentinal like UINT64_MAX or some topo max instance value?
    1. I changed the code to so that sec_maxinstance is a uint64_t and used UINT64_MAX as the "not set" constant.

  70. Conventionally a new line between this and the copyright?
  71. Why do we need a timestamp in all this?
    1. We probably don't need them. I just copied the logging functions from another test I wrote (the ddi_ufm tests). I don;t think it hurts to have them.

  72. Worth cleaning up the whitespace? Did 'git pbchk' not mention this or the bit on the next line?
  73. usr/src/test/os-tests/tests/libtopo/digraph-test.c (Diff revision 1)
    Shouldn't we be checking if what we've serialized we can read back in and is the same as what we serialized out?
    1. I restructured the tests from:
      - deserialize hard-coded xml
      - check paths
      - serialize back to xml

      - deserialize hard-coded xml
      - serialize back to xml
      - deserialize from serialized output
      - check paths

      Which shoud verify that we can reach the serialize data back in and it's sane.

  74. usr/src/test/os-tests/tests/libtopo/digraph-test.c (Diff revision 1)
    Why does the test require root access? Shouldn't all of the topo bits here work without root access?
    1. It doesn't - old habit from writing other topo CLIs where root is needed. I've removed the check.

  75. Feels like we should add some tests that cover doing things like parsing invalid xml files, etc. where we have things that aren't checked for in the dtd like the number of elements in an array mismatch the number found for the different types, etc.
    1. All the instance of "nelem" attributes are gone. But I added a handful of test cases for other types of invalid XML.

  1. In general, this looks great. Thanks for putting this together.

  2. usr/src/lib/fm/topo/libtopo/common/topo_builtin.c (Diff revisions 1 - 3)
    Should this really not have the topo_mod_enter() and topo_mod_exit() around this any more? Don't we need that for serialization purposes?
    1. No. I had the topo_mod_enter() in the original change because I was calling the tmp_enum entry point directly instead of calling topo_mod_enumerate(). Now that it's calling topo_mod_enumerate() for both the tree and digraph cases we don't need it anymore because topo_mod_enumerate() does the topo_mod-enter/exit for us.

    2. Ah, okay. Thanks for clarifying that.

  1. Thanks, this looks pretty good!

  1. Ship It!
Review request changed

Status: Closed (submitted)