It's not as worse as it sounds - bounded containers abundantly leaking are still better than bounded containers unboundedly leaking, so there's hope! And there are also unbounded containers that can at times be as good as to not even leak, it would seem. But to start from the start, let's go through some basics first, as to what these containers even are and how did it even get to such investigations into what they might leak or not leak, when, where and to what extent.
Containers in Ada are these useful data structures for holding other data, generally in an organized manner that may be quite a good fit for various uses: vectors, doubly linked lists, hashmaps, sets and so on. According to the Ada Reference Manual (ARM), the standard library provides two1 implementations for each type of container, with a different approach to handling the overall capacity: bounded (fixed, unchanging capacity) and unbounded (variable capacity changing as needed in use).
Before measuring anything, my own view of these two types of containers was quite simple:
- Bounded containers: having a fixed capacity known upfront means that bounded containers don't require dynamic memory allocation. Therefore, they are also the safer option to use whenever possible since there is no danger of memory leaks.
- Unbounded containers: having a variable capacity that changes transparently while in use means that unbounded containers rely on dynamic memory allocation and while absolutely needed at times, they are best used sparringly since they add to the complexity of the code and they don't provide any way to control or even query with respect to their actual memory allocation, deallocation and use at any given time. Moreover, in GNAT at least, they rely silently and entirely on the automatic mechanism for memory deallocation, the Finalization, which is great in theory indeed but turned out quite a no-go at times, in practice.
Such was the view based on what things are described as, combined with some limited experience using them. But then this whole got put to the test in the usual manner - views are to be acted upon if they are to be anything, not just "expressed" or "held". Meaning that I took the above to inform my implementation of Ada code at all times. As a result, I had quite soon enough concrete and even stress-testing use of code with bounded containers to start noticing some unexpected inconsistencies: far from being entirely predictable in terms of memory use, code that relied entirely on bounded containers and never touched in any way any unbounded ones, was quite clearly ever-expanding in its memory use and needs.
Inconsistencies being the cleavage that they are between expectations and reality, one can certainly stare for quite a long while at them but that doesn't heal nor solve much at all. So I set out instead to test and actually measure as much as possible what was really going on with each type of containers and with that pesky Finalization to boot.
First of all, it turned out that the GNAT bug previously found had less to do with either Iterators or Finalization specifically and more to do with the subset of restrictions that are enabled (or not) at compile time. In fewer words: the same code compiles and runs perfectly fine when there are no restrictions but as soon as restrictions are added, it's anyone's guess what the result might be - some specific subset or the way in which restrictions combine result in the shown bug.
Good luck tracking down the exact subset of restrictions or concrete problem when a GNAT bug pops out at times depending not even on a single restriction but on a subset of such and how they end up interacting towards the final result. The only practical way forwards with this is to note that restrictions turn out in practice more of a bug than a feature if one wants to use the full set of the standard library code otherwise. So the reality is that restrictions may be of use only in very tightly controlled cases but otherwise in general one is best served to compile without any restrictions and leave it at that.
Second, and compiling therefore everything without any restrictions, here are the results of calling in a loop 10`000 times a procedure that creates in turn 1000 local containers with a minimum of one element of their own, when using standard library unbounded and bounded hashmaps, with and without a further hashmap of hashmaps, too2:
- bounded hashmaps, with hashmap of hashmaps: LEAKING memory: Virtual3 use from 6788kB to 9916kB, Real4 use from 720kB to 4088kB.
- unbounded hashmaps, with hashmap of hashmaps: LEAKING memory: Virtual use from 6704kB to 6836kB, Real use from 748kB to 1084kB.
- unbounded hashmaps, with hashmap of hashmaps, with clear explicitly called at the end5: LEAKING memory: Virtual use from 6804kB to 6936kB, Real use from 780kB to 1108kB; PEAK virtual use from 6804kB to 6936kB.
- bounded hashmaps, without hashmap of maps: LEAKING memory: Virtual use from 6788kB to 9912kB, Real use from 720kB to 3936kB; PEAK virtual use from 6788kB to 9912kB.
- unbounded hashmaps, without hashmap of maps: NOT leaking memory: Virtual use from 6804kB to 6804kB, Real use from 772kB to 896kB; PEAK virtual use from 6804kB to 6804kB.
At least the inconsistencies were consistent, as it happens - as surprising as this might be, unbounded containers are in fact more reliable than bounded ones, in all cases. And on top of this, if one actually uses any container, whether bounded or unbounded, with elements that are themselves containers, memory will leak anyway - it's only a matter of how big a leak, not of no leak, with bounded containers abundantly (or even unboundedly?) leaking, while unbounded containers at least leaking way less.
In practical terms: restrictions are best unused and then6 unbounded containers are better than bounded ones in all cases, offering no leaks when elements are plain enough and at least minimal leaks where elements are themselves containers.
For completeness, the test code used above: memleaks_libcontainers.tar.gz
Keksum: 87d831b1e8d2615252b7f7f8f06e3f72e293c613b980d1634814b39ca2fc008ca010d8bd850bf9fefcf49b165e5a669d5c62650becc68c35f9163d4ac8d697dd
If one looks at the type of data that can be stored in such a container, there is a third type, the "indefinite", which allows in principle more flexibility with respect to data types it can handle. These are however not relevant for the discussion here since they are quite entirely separate from both bounded and unbounded containers otherwise. ↩
Memory use as obtained from parsing /proc/self/status ↩
VmSize ↩
VmRSS ↩
This was done with the small hope that perhaps an explicit clear will help - it doesn't. ↩
Only then, though. With restrictions in, the unbounded containers are quite useless. ↩
Comments feed: RSS 2.0