Re: avltree / genalloc

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Thu, 12 Apr 2018 20:04:22 +0000

>- When using avltree_iter() we need to pass a avliterfunc_t_ref, which
> takes 3 arguments. If I understand correctly, first is the uint32_t
> index, then the tree's "internal" (unsigned int) index, and my
> pointer.

  "The uint32_t index" is confusing terminology, so to clarify, I call
this number "the data contained in the node".
  An avlnode only stores an uint32_t of data, it doesn't know anything
else; so that uint32_t is "the data". If you want to use the tree
to index anything else than a set of uint32_t, you need to store your
objects somewhere, for instance in a gensetdyn, and use the tree to
store the indices of your objects into the gensetdyn. That is typically
what the "external" field of the avltree is for: to store a pointer
to your storage structure, so you can access the objects to extract
keys from them (in dtok) and compare keys (in kcmp).

  But as far as avlnode/avltree is concerned, the uint32_t is "the data".
  So, the first argument is the data. The second argument is not the
internal index, which user functions should basically never see because
it's an implementation detail and totally irrelevant to them; the second
argument is the height of the current node (root is 0, direct child of
the root is 1, etc.) The third argument is the external pointer.


> So if I wanted, from there, to remove an item from the avltree, I
> could either use avltree_deletenode(tree, ARG2) or
> avltree_delete(tree, KEY) where KEY is the one from dtok(ARG1,p) --
> would all that be correct, or did I miss/misunderstood something?

  avltree_iter isn't meant to perform operations that modify the
structure
of the tree. It *may* work, but there's no guarantee it will - and
typically, if you modify the tree, later values for node height will
certainly be incorrect. I wrote avltree_iter as a shortcut to perform
simple operations such as printing the tree or calling a function on
the data contained in every node; it should not be used with a
tree-modifying function.

  If you need to iterate over a tree-modifying function, I think you
should write the loop yourself, because the complexity of writing the
loop is much lesser than the complexity required to handle tree
structure changes in the middle of a map.


>- So it means avltree_deletenode() needs the avltree's own/internal
> index, whereas gensetdyn_delete() needs "our" uint32_t index,
> correct?

  gensetdyn is totally irrelevant to the tree. (I mean, avltree uses
a gensetdyn internally, to store the avlnodes, but it is an
implementation detail you should not care about.)
  If you are using a gensetdyn to store your objects, then you will
want to use the gensetdyn's indices as data in the tree.

  avltree_deletenode() is a bad API: it uses the internal index to
the node, but the user should basically have no way to interact with
that internal index. I wrote that "just in case", initially thinking
it should, logically, be faster than avltree_delete() since it has
internal information, but it's 1. a YAGNI violation, and 2. wrong
anyway, because to delete a node you need to know the path from the root
to that
node, and just knowing the internal index doesn't give you that - you
need to go on foot and make your key comparisons from the root down,
which is exactly what avltree_delete() does.

  So: forget about avltree_deletenode() - thanks for the fix, I will
apply
the patch, but I will probably remove that macro altogether in a future
major release - and use avltree_delete(), which is the right API. And
the argument to avltree_delete() is the data in the node, i.e. your
uint32_t (that is probably the index of your object in your gensetdyn).

  My bad for creating confusion with a function that should not exist.


> I.e. if I have that uint32_t and want to remove things from both the
> avltree & gensetdyn (I'm using the later to store the actual data
> indexed via the former, btw) then for avltree I need to get the key
> (e.g. via t->dtok(u,p))

  First, use avltree_delete(), with that uint32_t, to remove the node.
Then, use gensetdyn_delete() to remove your object from your gensetdyn.


> (I ended up using avltree_delete directly, guessing you usually do
> that as well maybe?)

  Yes, that is the correct entry point for deletion.


>- And on a completely unrelated topic, is there really no way
> (macro/function) in skalibs to remove an item from a genalloc? I get
> it for stralloc, it's not really thought of as an array but a string,
> you rarely want to remove one char from somehere in the middle,
>whatever.
>
> But with a genalloc, I'd expect one to (often) be wanting to remove an
> item from the array, and not always/only the last one, yet I can't see
> how that's supposed to be done? (Again, as in via a macro or
> something, of course one can - as I do - handle the memmove oneself)

  Well, genalloc is really "stralloc for larger objects", and I try
not to implement for genalloc what I would not implement for stralloc.
If you often need to remove items from a genalloc that are not the last
one, then genalloc may not be the appropriate data structure, and you
may want a gensetdyn instead.

  Really. It has happened to me several times that I wanted to move
around objects in a genalloc, then realized a bit later that no, I
could not do that because I needed the indices to be stable, and
gensetdyn was the proper tool. Or that I had a reasonable maximum
number of objects, and a simple array was a better choice.

  I may end up applying your patch eventually, but to me it feels just
as odd to move objects in a genalloc as it would be to move chars in
a stralloc.

--
  Laurent
Received on Thu Apr 12 2018 - 20:04:22 UTC

This archive was generated by hypermail 2.3.0 : Sun May 09 2021 - 19:38:49 UTC