Ossa Sepia

May 4, 2019

Eulora's Client Core - Basic Docs

Filed under: Coding, Eulora — Diana Coman @ 8:56 p.m.

As I'm doing anyway the work required for a saner Eulora client, I'm stuck as well with writing a minimal documentation for it, since on one hand there's no one else who can quite write it anyway and on the other hand there is no such thing at all1 for the old piece of inherited gnarl so the slate is blank, the space infinite and the need impossible to know in advance. So I'll just jot it down here for future reference and if you have anything to say about it, the sooner you write it down in the comments section below the better for you.

As general structure, the current plan for the new client is to have essentially 2 main parts: one is the actual Core, written in Ada and - hopefully2 - provided as a standalone library, blissfully unaware of ANY of the C/CPP legacy tangle; the other is an updated version of the current C/CPP tangle of a "client" that will use the Core for pretty much everything except the actual GUI. The GUI remains provided via CrystalSpace and cal3d, as it currently is. In other words, the Core offers an implementation of a basic Euloran client, to be used from/by any UI, graphical or otherwise as you care to make for yourself. Note that the interface is NOT the only thing you can build on top of the client Core. Essentially any client logic, bots and anything of the kind would still be the sort of thing built on top of / around the Core. A quick diagram of the whole thing looks like this:
2019_05_04_clientcore_struct_narrow

The Core itself is currently made of 3 conceptual layers, in order from bottom to top: encryption utilities (EuCrypt), SMG communications (SMG_Comms), client data cache. The encryption utilities are those of EuCrypt: CRC32 calculation, Serpent encryption/decryption, Keccak encryption/decryption, RNG interface using Fuckgoats, RSA key generation, RSA encryption/decryption using TMSR OAEP schema. The SMG communication part provides mainly: UDP with fixed-size but configurable payload, in/out message queues and attached "consumer" tasks, config file reader, packing/unpacking (i.e. raw, bottom-most layer of SMG communication protocol), read/write of messages conforming to SMG communication protocol specification, thin layer on top for encapsulating a "communication link" i.e. required addresses, ports and keys for both ends. The client data cache keeps effectively the whole world of Eulora as the client knows it at some specific time and provides thread-safe getters and setters for anything and everything that it can possibly hold.

Note that the client's knowledge of the world is by definition a cache as it flows always from the server and entirely in response to client's own requests - there is by design no "push" of information or anything else from server to client at any time. So the client may request from the server anything and everything it wants to3, having however to decide for itself what to make of the answers it receives and even what "obsolete" means in various contexts. At the moment this cache is still in the works and for this reason the least fully-fledged but at any rate, it has to grow around the structuring of data that is discussed in the SMG protocol specification: a potentially infinite hierarchical structure of "objects"4 having each a subset of an overall set of potential "properties"5 with their corresponding "values"6. Due to this trinity of objects-properties-values with its corresponding trinity of ID spaces, I took to calling the whole thing a tricache but the name as the design are still in the works so they'll have to enjoy their own detailed description at a later date. Until then, what have *you* been working on and how is that going?


  1. Perhaps you count this spaghetti piece as documentation but even so, just note who extracted it in the first place, anyway. 

  2. I'm holding quite tightly to this idea as I certainly do NOT want to have it depend on any of the current C/CPP mess. This being said, I can't know in advance what I might still uncover so for now and until it's all done, the qualifier remains "hopefully" rather than surely. 

  3. This in practice is of course quite severely restricted by what it *knows*, unsurprisingly. 

  4. Where "world", "map", "actor" or "pot" are equally objects and not even the only objects. Essentially an object is anything that is present in the game's world. 

  5. A property is a pre-defined and public structure that specifies the *type* of values that make up the property and their actual meaning. 

  6. The values may be given directly (e.g. "3" or "somename") or by reference aka as an ID. This ID may be itself either the ID of a suitable object or the ID of a value that is essentially too big to pass around every time - in the protocol's specification this sort of ID is currently called for historical reasons "filename" but it's worth noting that there is no imposition for it to be stored as a file at any point. 

April 17, 2019

Reading Notes on FFA Ch1 - Ch4

Filed under: Coding — Diana Coman @ 7:57 p.m.

While Stan says - and I believe him - that each chapter of FFA is meant to take one somewhere between 1 and 2 hours to grasp, I can confidently say by now that either I don't do grasping at all1 or I'm a very, very slow grasper/learner (quite possible at that). Sure, I could always *read* one FFA chapter in 1-2 hours and even with pen and paper but it took way longer to get to the point where I'm happy to sign it. And yes, it's also perfectly true that *now* it takes me about 1 hour give or take to go *again* through one of the FFA chapters I have already studied but that's the sort of "oh, 3 hours only; after how many years?" timing. While part of this "speed-up" is inevitably down to familiarity with the material, there is also a large component of pitfalls that I can avoid now, after having been in and out of them before. And while I tend to at least not fall for the same mistake twice, this is not to say that having the obtained notes written down in one place is useless - at the very least, they will serve me as a quicker way of getting back into FFA at a later stage, a bit of quicker-loading of sorts.

The structure of FFA itself is quite logical but I still kept stumbling over / forgetting bits and pieces as I went. So I made myself a map of the FFA packages, updated with every chapter and kept handy for easier reference. I found this map especially useful as a way of making sure I don't miss anything since otherwise the FFA series is (by design, I think) quite strictly presented from the bottom up and I tend to struggle to keep in memory all moving parts unless I can link them together even in the loosest of ways to some bigger picture2. So here are my FFA3 maps so far with whatever observations, comments and various nitpicks of all sorts I still have attached to each chapter.

Chapter 1

This chapter introduces the very foundation of FFA: what is a FFA number (FZ by its name), what is it made out of ("Word" components), how can one do - in constant time, of course - basic operations at Word level and very simple arithmetic operations (+ and -) with whole FZs. Crucially, this chapter introduces FFA's approach to implementing "if" without actually branching: the Mux selector. Notes and nitpicks:

  • Packages are Ada's units for structuring code, hence the units in the diagram below are exactly that: Ada packages. The ".ads" only packages happen to not have a body - this means simply that they group together a set of definitions (types, constants, variables or even imported procedures/functions) but do not require/have anything else.
  • In Words.ads I puzzled at first over why not use for Byteness both Iron.MachineBitness and Iron.ByteBits rather than using one constant from Iron and one local to the Words package, especially since the local value is defined precisely equal to the one in Iron and I still get confused over simple names like that. I suppose the idea is though that this is not necessarily true if one wants to have a Word that doesn't directly match the underlying iron (or rather the other way around: if one runs the program on whatever non-standard iron). Alternatively I suppose one can simply define "MachineByteness" in Iron as well and then simply define Words.Bytness = MachineByteness. It's just the mix of the two that pokes me in the eye even though it's just such a tiny detail.
  • In FZ_Type my inner maniac keeps asking: why Word_Index and not FZWord_Index, why Word_Count and not FZWord_Count given FZBit_Index? Sure, there is no other word_index than the one in a FZ (while there are bits in a Word too for instance) but each tiny non-standardised name adds to the number of things I need to remember and my memory for this sort of stuff is pretty much entirely absent. So for all the additional, annoying task of having to write FZWord_Index every time instead of the shorter Word_Index, I'd still rather do it and standardise the names: if it relates to a FZ, then it starts with FZ (or any other standard really, as long as it doesn't have exceptions).
  • The W_Mux is a FUNDAMENTAL mechanism of FFA: effectively it is "if" the FFA way (i.e. constant-time and without actual branching). So it's worth studying this in more detail if you are not familiar with the way a mux works.
  • The WBool type can be at times a bit ambiguous in use since it's both a true/false sort of thing by definition (1 and 0 are the only values it can take) but also the direct result of arithmetic operations. The key to this is of course the fact that the arithmetic operations in question have a 1-bit long result (or look at only 1 bit of a larger result) and so their possible range of values is restricted to 0 and 1.
  • In W_Shifts the Amount parameter is everywhere of Natural type, presumably to fit the import requirements. However, given the intended use as part of FFA, I think this should logically be one of the Index types defined and most specifically WBit_Index (from Words.ads) since those shifts are done at word level (hence any shift by more than WBit_Index'Last is just the same as a shift by WBit_Index'Last anyway).
  • A crucial observation regarding indices and counts: a FFA number (i.e. a FZ) can have by definition any range of indices for its component words (e.g. an FZ could be composed very well of 3 Words on positions 4, 5 and 6 in principle) BUT it has to have at any given time AT LEAST 1 word. This of course is perfectly sane since there is no such thing as a number of 0 length (0 itself has length!) but it's worth noting it since it comes into such wonderful contrast with previous sad "bignum" implementations that confuse 0 with "doesn't exist".

ffa_ch1_struct1

Chapter 2

This chapter adds logical operations (i.e. "predicates") at Word level and new operations with FFA numbers (FZs), most notably logical operations, bit operations and comparisons. A crucial part to get here correctly is the convention used throughout FFA for the order of words and bits. While "endianness doesn't matter", nevertheless "set_head", "get_head", "shift_left" and "shift_right" do imply a direction and messing with *that* turns out to be for me a lot of pain. After quite a few times tripping over this in various places, I found I'd rather NOT use Stan's convention that "most junior bits on the left" as that just doesn't work for me: the "FZ_Get_Head" returns "lowest, bottom-most word" wtf head is bottom!!; the directions for shift_right and shift_left are inversed (since their names fit the "most senior bits first" representation) and moreover one is sort-of having to think of the whole number in reverse representation all the time. The ONLY reason I can see for this is so that indices increase from left to right but honestly, it's way, way easier for me to count from right to left (or in decreasing order!) and keep everything else in place than to keep the counting direction in place at the cost of having to remember repeatedly that the "direction" of all sorts is the opposite to what is written down. So I go with the following approach instead:

  • FZ for me are represented with most senior words and bits on the left. If you prefer, call it big-endian. I'll just call it human-convention. The machine doesn't care either way but it helps me tremendously to understand what goes on (see below).
  • Given the above, all bit-level and word-level operations actually fit the direction given in the name: a shift_left moves numbers to the left and therefore to higher positions, increasing the value of the FZ (i.e. it's a multiplication by 2^Amount); a right_shift moves to the right, to more junior/lower positions, decreasing the value (a division by 2^Amount).
  • The index (if it starts at 0 anyway) neatly matches in fact the power of the base corresponding to that position if one was to convert to denary. So it's no additional load for me at least to look at it as increasing from the right to the left.
  • I think Get_Head/Set_Head would be better named at the very least Get_Bottom/Set_Bottom or, even better, Get_Junior/Set_Junior or similar. I don't want to... turn the left-right convention on its head just to have in there also the head/bottom convention. Instead, I think it's best to stick ideally to the *role* of the Word/bit retrieved (i.e. most junior) rather than marrying it to the position in any given representation and if this ideal is too much to ask for then at least as the next-best keep it within the already imposed convention of left-right and NOT add yet another convention to the mix4.

ffa_ch2_struct

  • Most of the procedures involving 2 FZ have as precondition that the lengths of the 2 are the same since otherwise comparisons and copies hardly make sense. While this makes sense in itself each and every time, it seems to me that FZ_Mux for instance *also* relies on the FZ having the same *range* of indices. While I'm sure that the later FFA use of FZ_Mux does not end up feeding it FZ with different ranges, I wonder whether there is any significant penalty to making FZ_Mux NOT rely on the FZ having the same range? Or is there some reason that I'm missing for requiring the same range?
  • What is exactly the need for FZ_Get_Head and FZ_Set_Head anyway? I could see the idea that it's a way to ensure that everything remains as it is even if one decides to move the head somewhere else (to the end) but for one thing I find them confusing (perhaps my fault, sure) and for the other FZ_And_W for instance still uses N(N'First) rather than FZ_Get_Head(N)/FZ_Set_Head so changing the position of the head would still require some changes in other code.
  • Why is fz_cmp a separate package? As far as I can see, the contents of this one are still FZ predicates so I'd put them totally fine into FZ_Pred. I'm not sure that the distinction unary (fz_pred) vs binary (fz_cmp) predicates really requires a different package or that FZ_Pred is growing way too big if it swallows up FZ_Cmp.
  • Perhaps make all the packages related to Words start with W_ so that there isn't a Word_Ops but a W_Pred and W_Shifts?

Chapter 3
This chapter adds the shifts for whole FZs. Once I remembered to stick to "most senior words/bits to the left", the rest is quite straightforward to understand. And I don't even have any nitpicks left for it, can you believe that?
ffa_ch3_struct

Chapter 4
This chapter focuses on introducing FFACalc and as a result the changes/additions to FFA lib itself are minimal: a new subtype "Nibble" in words.ads; FZ_NZeroP in fz_pred; WBool_To_FZ in fz_basic; a new package for enforcing/checking valid bitness, fz_lim. The difficulty of this chapter is mainly in getting used to FFACalc. As with the rest of the FFA series, the best way I found was simply to sit down and try and write myself the code that would do one or another thing - it turns out that there aren't all that many significantly different approaches given the existing building blocks and moreover, any differences are informative, usually of one's own mistakes/current lack of understanding. So without further nitpicks, the updated diagram looks like this:
ffa_ch4_struct

Update 18 April 2019 - the diagrams above are made with Dia so here are the .dia files if you want to tinker/change them for your own use: ffa_ch1_struct.dia ; ch2.dia; ch3.dia; ch4.dia.


  1. Perhaps I do only bulldog-style vicious hold-and-never-let-go. It goes with attention to detail and various other traits in my inventory. 

  2. A bottom-up approach has its advantages and it's not in itself a problem at all, of course, except that one needs to keep going for quite a while without really knowing where they are going. It's not even a matter of trust - I certainly trust Stan that he knows where he's going and he even drops clear hints on the way. It's simply that I find it easier to follow and retain the detail if I can also place it even vaguely in some bigger picture. On the bright side, I can say I have *plenty* of practice with the strictly bottom-up approach since my Romanian education followed precisely this approach and nothing else to the point that it took more than 10 years of school for some things to finally click as belonging *somewhere*. So yes, by comparison to that, FFA is far, far less demanding of the learner's trust and patience. But my overall learning experience has always shown that it helps - me at least - to have a more balanced approach. 

  3. I focus on the FFA lib itself so the Demo/Peh are so far not included. 

  4. I'm aware that there are and can be as many conventions as there are mouths to emit them. Still, every new convention added to the mix has a cost and I'd much rather keep this down as much as possible. First of all for myself, yes. 

March 15, 2019

EuCrypt Chapter 16: Bytestream Input/Output Keccak

Filed under: EuCrypt — Diana Coman @ 4:30 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Keccak1 suffers from a bit/byte issue: while internally the actual transformations work at byte-level, the input is taken bit by bit and expected to be in LSB order2 while the output is offered again bit by bit but coming out in MSB3 order if taken byte by byte. Moreover, the padding applied to any input to bring it to a convenient length is again defined - and even applied - at bit level rather than byte level. While originally I discussed this issue in more detail and systematised the options available for Keccak to get some clarity and make a decision on either bit-level or byte-level, the actual implementation followed as closely as possible the original specification retaining bitstream (i.e. bit by bit) input and output while doing internally the transformations at byte level, as confusing as that was. As soon as this implementation was put to actual use though, it became clear that the bitstream part really has to go because it causes huge waste (8x stack-allocated space for any input, quite correctly described as exploding) and trouble in the form of overflowing the stack even for relatively small inputs. So this is the promised .vpatch that updates Keccak to work on bytestream input and produce bytestream output, getting rid of the x8 waste and effectively choosing options 1.1, 2.2 and 3.2.

The obvious change is to convert all the bit* to byte*. This includes constants such as the Keccak rate that is now expressed in number of octets, types such as bitstream that becomes bytestream and functions such as BitsToWord that becomes BytesToWordLE. Note that the internals of the Keccak sponge (i.e. the transformations) are unchanged since they weren't working at bit-level anyway. The less obvious change is the addition of bit-reversing (via lookup table since it's fastest) - this is needed to ensure that Keccak receives and produces at octet-level the same values as it did at bit-level. Specifically, this means that input values on Big Endian iron will be bit-reversed for input (since input is expected LSB) and obtained output values on Little Endian iron will be bit-reversed (since output is extracted MSB). It's ugly but so far it's the only option that doesn't change the Keccak hash essentially.

The .vpatch contains those main changes:

  • Bit reversing: there is a lookup table with the corresponding bit-reversed values for any byte (i.e. 0-255 values mapped to corresponding values with the other bit-order convention). So the bit-reversing of a byte is simply a matter of reading the value from the table at that index.
  • Input: bytestream instead of bistream so effectively an array of octets. Because of Keccak's LSB expectation re input bits, the BitsToWord function became BytesToWordLE meaning that it will reverse the bits on Big Endian Iron. The Padding is also expressed at byte level in LSB format (so the last byte is 0x80 rather that 0x01.
  • Output: bytestream instead of bitstream. Because of Keccak spitting out MSB when output is extracted at byte level, the WordToBits function became WordToBytesBE meaning that it will flip the bits on little endian so that the iron sees the same value as a big endian iron would.
  • Tests: there is an additional test that checks the values in the bit-reversing table for correctness, effectively calculating each of them and comparing to the constant; other than this, the same tests are simply updated to use bytestream/bytes as input and output; as in the previous version, there are also tests for calculating a hash or using Keccak on a String, quite unchanged.

The .vpatch for the above can be found on my Reference Code Shelf and is linked here too, for your convenience.

As I don't have any Big Endian iron around, I couldn't test the above on anything other than Little Endian so if you are looking for something easy to help with, you've found it: kindly test and report here in the comments or on your blog (a pingback will be enough or otherwise comment here with a link).


  1. As originally specified by Bertoni et al. 

  2. Least significant bit. 

  3. Most significant bit. 

March 11, 2019

Compiling Crystal Space with SJLJ

Filed under: Coding — Diana Coman @ 12:42 p.m.

Miserable failures and unwanted problems make great blog fodder1 and so here I am, writing yet another set of compilation notes. This time it's all about how to ditch the non-standard ZCX run-time for its well-behaved and otherwise similarly fast ancestor, SJLJ.

My project to compile is fairly complex as it links dynamically with a bunch of libraries and it includes code in C, CPP and Ada all together. The first naive attempt at simply setting the PATH environment variable to point to the ZCX GNAT (with dynamic linking) and then recompiling the server code itself (i.e. without recompiling the libraries) failed at linking stage:

Error relocating ./euserver: _Unwind_SjLj_RaiseException: symbol not found
Error relocating ./euserver: _Unwind_SjLj_Unregister: symbol not found
Error relocating ./euserver: _Unwind_SjLj_Register: symbol not found
Error relocating ./euserver: __gxx_personality_sj0: symbol not found
Error relocating ./euserver: _Unwind_SjLj_Resume: symbol not found
Error relocating ./euserver: _Unwind_SjLj_ForcedUnwind: symbol not found

The good thing about those errors are that they confirm at least that the SjLj run-time is indeed to be used, rather than ZCX. So the correct intentions are there but the result is still miserable failure. A quick report in the forum yielded immediate help as Stan kindly provided the very useful observation that such errors at the final linking stage can only mean that it is still trying to link with a non-SJLJ gcc standard lib. Outrageous and unpermitted and how does it dare2!! A poke of the executable euserver by means of ldd revealed that indeed, it is apparently linking the ZCX libstdc++.so.6 and libgcc_s.so.1 and all of that because of the pesky Crystal Space (CS) library! After a bit of initial doom when I thought I might need to recompile the whole bloody toolchain manually with --enable-sjlj-exceptions or such something, some rummaging around in the SJLJ-istic GNAT's directory revealed that it actually has already everything that's needed - if only they'd be linked instead of the ZCX-isting versions! So it came as a relief really that it's "only" CS3 that needs to be recompiled too4. Onwards!

A quick look through CS's makefile made it plenty clear that the naive approach of a simple .gpr file is not going to work: basically CS suffers deeply from ifdefism since it caters to all-and-everyone it can think of. So at the moment no option other than going at it with simply the PATH set properly to the SJLJ GNAT and then running: make clean; ./configure --without-java --without-perl --without-python --without-3ds --with-cal3d=$HOME/eulora/cal3d . And then all of a sudden, the configure reports that it couldn't find zlib although obviously, zlib did not run away anywhere. Moreover, a closer inspection of the full output of configure reveals that it *does* actually find zlib (so it's not a matter of "don't know where it is") but it reports it as unusable as it can't find apparently the ...headers. After various attempts at feeding it the path to the headers in various ways5, I got a bigger hammer and simply ran (making sure that it is PRECISELY the same version I had otherwise in the ZCX-istic toolchain):

curl -sL "http://www.zlib.net/zlib-1.2.11.tar.gz" -O
tar -xzvf zlib-1.2.11.tar.gz
./configure --prefix=$HOME/eulora/zlib
make
make install

Back in the CS dir, I ran again:

jam distclean
./configure --with-z=$HOME/eulora/zlib --without-java --without-perl --without-python --without-3ds --with-cal3d=$HOME/eulora/cal3d
jam -aq libs plugins cs-config

The configure above worked fine and it reported that it did find zlib this time and it's usable and all is good. And then the jam went ahead and compiled all sorts, some of them even *with* #include < zlib.h > so obviously all was well, right? Except:

In file included from ./include/csutil/archive.h:36:0,
                 from cs/plugins/filesys/vfs/vfs.cpp:39:
./include/csutil/zip.h:32:18: fatal error: zlib.h: No such file or directory
 #include 
                  ^
compilation terminated

So VFS6 is brighter than all the rest and it can't find zlib. Moreover, a manual run of the very same compilation line with added -I$HOME/eulora/zlib/include completed absolutely fine and then the jam -q libs plugins cs-config went through without any trouble at all. The rest was simply waiting for the rather slow and bloated build of CS. So at this point, one would need to chase about in the config and jam files of CS and vfs where is the place to add the missing include flag. Some searches through the whole autoconf+configure+makefile+jamfile revelead that the ZLIB.CFLAGS and ZLIB.LFLAGS are set correctly for Jam to use so the problem seems to be simply that the vfs plugin is not aware it needs to use them. Nevertheless, I am not very keen on spending even more time essentially learning more on all those tools that are set to be discarded as at some point everything will have to move to GNAT and gprbuild anyway. So for now there it stands, ugly as it is: manually re-run that compilation line for vfs and be done with it7

Once CS was compiled with ZCX, the rest went perfectly fine, server and tests and everything else. Do remember to fully clean your previous compilation though before recompiling for changing from ZCX to SJLJ as otherwise all sorts of weird errors will appear. With gprbuild that's as simple and fast as running in your project's directory where the .gpr file is:

gprclean -r
gprbuild

And that's all! I happily have the server using SJLJ and so I can finally go up rather than down a few levels, up this pit of unexpected troubles and back to what I was actually doing - migrating both server and client to Eulora's new communication protocol so that there can finally be a text client too and everything one wants!


  1. Not that I am really lacking either problems or blog fodder but at this rate I'll end up with a compilation notes category all by itself. Other problems, failures and assorted troubles want a place in the sun too! 

  2. Especially since all the .gpr files of the server code specifically include the option --nostdlib to make sure the standard lib is not included and then a direct specification of the musl, ZCX rpath and dynamic-linker to use. 

  3. I recompiled also Cal3d since CS depends on Cal3d but that was simply a matter of cleaning properly the previous compilation, configure and recompile. Cal3d compiles even with a basic .gpr file so by comparison to CS it's absolutely wonderful. 

  4. In hindsight of course it has to be recompiled since CS contains all but the kitchen sink and specifically for this part of interest all sorts of threads and thread management "utilities" that pull in all sorts.  

  5. A run of ./configure --help said that one can use --with-z=path-to-zlib to directly tell configure where zlib is; very nice, except it still wasn't enough. 

  6. Virtual File System; inside a supposedly graphics engine; you have no idea. 

  7. CS is really to be compiled only once per system and that's it. If it becomes something I need to compile more often then it will have to be trimmed and ported to gprbuild anyway. 

March 4, 2019

GNAT Compilation Notes

Filed under: Coding — Diana Coman @ 4:55 p.m.

Ada is *the* republican programming language1 and GNAT is its compiler but the compiler of GNAT itself is... another GNAT! While this would mean at first glance a self-contained environment and the end of headaches once GNAT + Ada are fully digested, the recent surprises with GNAT's default disdain for the Ada standard show that some parts are likely to be simply indigestible. Such parts - ZCX run-time especially - will probably end up at least sidelined if not directly and explicitly discarded by an emerging republican standard. Still, as part of investigating those unwanted surprises and especially the actual differences between ZCX and SJLJ, I gained a lot more experience with compiling GNAT itself in all sorts of configurations and on various environments. And since I promised to publish my GNAT compilation notes, it makes sense to publish at least the complete summary to have it in one place for my own future reference2.

GNAT Versions

  • The initial GNAT that made it into TMSR and served so far rather well is Adacore's 2016 GNAT GPL, mirrored previously as part of my work on EuCrypt. The mirrored version is compiled with both ZCX and SJLJ run-times, allowing one to easily switch between them with a simple flag, namely: --RTS=sjlj given to gprbuild3. ZCX is the default so compiling without the --RTS flag simply means that ZCX is used.
  • Ave1's GNAT version 2018-01-17: this uses Adacore's 2016 GNAT only to bootstrap the compilation of a republican GNAT linked against the MUSL C library. Note that the shitgnomes have already messed up some of the links in the scripts there - according to my notes the current scripts fail with a "not found" error at fetchextract http://ftp.gnu.org/gnu/gcc/gcc-4.9.adacore2016/ gcc-4.9.adacore2016 .tar.bz2 . Obviously, the pill to this is to have everything already downloaded and only then run the script (i.e. no fetch, only extract).
  • Ave1's GNAT version 2018-04-30 adding support for ARM architectures.
  • Ave1's GNAT version 2018-05-15 adding parallel build support that effectively cuts significantly the time required for the full build4
  • Ave1's GNAT version 2018-05-29 moving towards supporting only static linking (but not quite there yet, see next version).
  • Ave1's GNAT version 2018-06-01 ditching dynamic linking and therefore building only statically linked programs.
  • Ave1's GNAT version 2018-09-24 fixing an issue with a hard-coded path.

As it can be easily seen above, there are essentially only three main versions: the original Adacore GNAT (glibc-based), Ave1's musl-based GNAT supporting dynamic+static builds (i.e. version 2018-05-15) and Ave1's musl-based GNAT supporting only static builds (i.e. latest version, 2018-09-24). Ideally one would be able to ditch dynamic linking entirely and simply work only with Ave1's latest version but so far I still have on my hands all sorts of code that is far from ideal.

Compilation Options

  • To make use of your computer's cores and speed up the build time: export MAKEOPTS="-j8" (obviously, adjust the value depending on your number of cores).
  • To build effectively offline5: download all the needed tarballs and unpack into the tarballs directory. Since I did this, I went one step further and also simply deleted the scripts download-adacore*.sh as well as their calls from the build-*.sh scripts but this is entirely up to you of course. NOTE: currently the 2018-09-24 seems to have a problem with all the tarballs already in place - my latest run failed complaining that the bin dir doesn't exist inside the created bootstrap, specifically: ../../extra/../build.sh: line 148: cd: /home/eu-test/adabuilds/build510/bootstrap/bin: No such file or directory. This requires deeper digging and Ave1 will perhaps help me out through the maze of scripts in there.
  • To build with SJLJ run-time (NOTE: you'll obtain ONLY SJLJ so no switching with --RTS):
    1. Add to extraconfig.sh the following 2 lines:

    GCC_BOOTSTRAP_CONFFLAGS="--enable-sjlj-exceptions"
    GCC_CONFFLAGS="--enable-sjlj-exceptions"
    

    2. Change the setting of ZCX_BY_DEFAULT from "True" to "False". Ideally this would be changed via a Makefile but so far I wasn't able to find the Makefile that actually works for this. So I hacked it for now: unpack the gcc-4.9.adacore2016.tar.bz2 archive from tarballs/ ; the files of interest are in gcc-4.9.adacore2016/gcc/ada namely system-linux-* (one for each architecture - I modified all of them): change in those ZCX_BY_DEFAULT to False; pack the tar.bz2 archive back (e.g. tar -cjvSf gcc-4.9.adacore2016.tar.bz2 gcc-4.9.adacore2016); run the ./build-ada.sh script and that's it.

So far I was able to compile the following:

  • 2018-09-24 (aka latest, static ONLY): ZCX version using existing Adacore's GNAT; ZCX version using previously compiled 2018-09-24 ZCX version (i.e. itself!)
  • 2018-05-15 (aka older, supporting both dynamic and static linking): SJLJ and ZCX versions using existing Adacore's GNAT; SJLJ and ZCX versions using previously compiled 2018-05-15 on same machine (i.e. without running into the paths issue).


  1. For a summary of "why Ada", read Mocky's Log Reference on it. 

  2. Or so I hope - that in the future the reference will save me from gaining even *more* experience of the same sort that I gained here. Ever the optimist that I am, I know. 

  3. gprbuild --RTS=sjlj 

  4. I could build GNAT in a bit over 1 hour on 8 cores. 

  5. I recommend this for at least 3 reasons: you know what you are building with, since you put those tarballs there - hopefully you checked them; you avoid any trouble with broken links and similar shitgnommeries; it's faster. 

March 3, 2019

Inconsiderate Considerations or a Tiny Euloran Dataset on Blueprints

Filed under: Eulora — Diana Coman @ 8:57 p.m.

Blueprints in Eulora are currently the product of expensive clicks: millions go in to pay only for the most inconsequential of tokens, thousands come out and few of the blueprints obtained are what one wanted to get anyway. In typical euloran fashion however, this exercise in the frustrations swamps is anyway on one hand unavoidable and on the other hand potentially, eventually, greatly rewarding1 - with a bit or a bit more of luck and other assorted arcana. That being so, I let my luck do whatever it does and otherwise I set about to collect in one place a tiny bit of data on what comes out of where. About 200mn ECu later, here's what I've got when clicking them with a noob2:

Consideration Type3 Consideration q4 Bundle q Output q Blueprints obtained5
10 x Apprentice Tinker Considerations 179 37006 34 LTF, Slag, CT, CC, PPB, IO, DUH, SCS, BCT
10 x Apprentice McGuyver Considerations 47 32877 17 QF, POC, Rockpile, CSW, GT, PC, CB, CDS, ETD, RH
7 x Neophyte Gumbo Considerations 152 7528 14 NP, TT, FT, TF, WOA, BBB, ACG, BNG, CP, CF
10 x Neophyte McGuyver Considerations 2670 18389 91 SFH, IT, PS, MK, TM, BH, RH, CB, POC, HG

Conspicuously missing from the above, the very blueprint that I really wanted to see, namely BMS10. The BMS would be in the McGuyver line but 10 Neophyte clicks and 10 Apprentice clicks failed to produce even 1 single bp. Rather sadly, the Apprentice McGuyver clicks still produced the useless Caveman Bedroll blueprints and other such shit (Reed Hauberk!) that one has too much of, even just from Neophyte clicks anyway. It's totally unclear why exactly would some blueprints be SO common even though they are not necessarily the cheapest ones: take for instance the ton of LTF (4070 bv) or CT (1404 bv) blueprints obtained (0.1mn of each) and compare that with the 13 total BCT (540 bv) blueprints! If anything, it would be the cheaper blueprints that are harder to get but then again, I got tons of "Somewhat Fashionable Hat" and "Hoof Gloves" useless blueprints from the McGuyver line and those are the cheapest in that line (656 bv and 622 bv respectively).

Assuming that those rare bps aren't simply unobtainable at this moment for whatever reason, the obvious conclusion is that those considerations are rather inconsiderate of the ~200mn base value sunk into them and won't reveal in return even the very basic of *what* blueprints should one expect from where. Then again, it's not *consideration* you want from a good game, is it?


  1. Different players might find different things rewarding but a basic reward would be the rare serious "popping" i.e. obtaining for once millions out of thousands rather than the other way around.  

  2. the ~only way I have to actually get a wider spectrum of bps since clicks with Foxy end up high quality output and 1 type of bp in the usual cases; in the rare case, Foxy can in principle get more bps too but here I wanted to see precisely WHAT bps one gets from where so I suppose it's a gamble on the types more than the overall value itself. 

  3. Each crafting line in Eulora has its own Considerations line. Each line has several levels of which so far there are three seen: neophyte, apprentice and journeyman. 

  4. Quality of the Consideration blueprint used for this click. 

  5. They are ordered based on quantity obtained, from high to low. 

  6. ~=6.8mn base value. 

  7. ~=6.05mn base value 

  8. ~=1mn 

  9. ~=2.83mn base value 

  10. In the missing and becoming ever so rare range there would also be the CFT bp from Tinkering but at least I did not do a Neophyte Tinkering click this time... 

February 28, 2019

ZCX vs SJLJ - Data Set

Filed under: Coding — Diana Coman @ 5:37 p.m.

This all started with a misbehaving Ada program: the one that keeps "killing" its tasks with blind bullets that do precisely and exactly nothing. The horror of having code that refuses to actually abort its own spawned tasks is amply discussed in the logs linked above but I think it's worth pointing out the thick layers of horrid at play: it's not only (as if that wasn't enough by itself) that it won't abort its tasks but also that it does this on the quiet as it were - the abort code is effectively hijacked to do nothing for as long as one uses the default ZCX1. Now why exactly is ZCX the default given that it's not providing what the Ada standard says it should? And what is anyway its rationale2 for chopping away bits of the standard? No idea really, the docs are silent on this: it's apparently enough bother that they admit at least that indeed "the ZCX run-time does not support asynchronous abort of tasks(abort and select-then-abort constructs)" and otherwise throw in there an unsubstantiated statement that "most programs should experience a substantial improvement by being compiled with a ZCX run-time." Why would they bother with rationale or with justifying decisions - nobody ever asks for or indeed reads such things anyway as they are way too complex, aren't they?

Once the culprit was identified as ZCX, a quick test3 showed that the older SJLJ4 is indeed perfectly capable of killing spawned tasks. And so the investigation moved on to the possible cost of using SJLJ vs ZCX. Supposedly ZCX is faster, SJLJ is slower but suppositions are one thing and hard data quite another. To figure out if that's indeed the case, one needs simply to design an experiment for the exact sort of thing investigated and then set to work implementing and running it. Given the docs' claim that SJLJ comes with a signficant penalty for any exception handlers present, the experiment included also runs with and without exception handlers to try and gauge if those add indeed signficantly to the running time. Specifically, there were two main programs used (linked below if you want to reproduce this experiment):
1. Nested procedure calls (with and without exception handlers): there are three procedures A, B, C that increment each a global counter and pseudo-randomly5 call one or both of the other two procedures for as long as the counter is below a certain value. Note that you'll likely need to increase the size of the stack for this experiment since otherwise it will quickly overflow.
2. Nested for loops with Serpent encryption (NO exception handlers): there are between 1 and 22 for loops counting either from 1 to 10 or from 1 to 100 and going to the next level on a subset of values (those divisible by 2, by 3 or by 4 - those are essentially knobs to adjust for different runs); the inner-most loop performs a Serpent encryption.

I ran this experiment on two different computers:

  • S.MG Test server: this is an x86_64 AMD Opteron(tm) Processor 6376 (8 cores, 2.3GHz).
  • A torture-room desktop code-named D1: this is an Intel i5, 4GB RAM (2 cores, 3.2GHz).

The S.MG Test server runs 3 different incarnations of Ave1's GNAT: the ZCX variant which is obtained as a straight compilation using Ave1's scripts from the 15th of July 2018; the SJLJ-compilation of the same (i.e. supporting both dynamic and static linking); the SJLJ compilation of the latest, static-only version from September 2018 .

The D1 machine runs Adacore's GNAT 2016 and simply switches between ZCX and SJLJ via the --RTS parameter given to gprbuild.

The full set of data obtained is this:

S.MG Test server
Ave1's GNAT, build 301 (dynamic linking) and build 202 (static only) for SJLJ, older Ave1's GNAT for ZCX

Procedure Calls testing  with MT as PRNG;
Max = 16777216; 3 procs, mod 3 calls both, one or the other of the 2;
NO serpent calls; X final gives the total number of calls;
Seeding of MT: 16#111#, 16#222#, 16#333#, 16#444#  -> X = 22368144.
NB: as MT is used, X final is repeatable between runs (same seed).

Handlers | ZCX (s)  | SJLJ (s) (build 301) | SJLJ (s) (build 202 - static only)
_______________________________________________________________________________
0        | 1.119    | 1.211                | 1.165
0        | 1.174    | 1.210                | 1.161
1        | 1.299    | 4.106                | 3.850
1        | 1.302    | 3.777                | 3.755
2        | 1.599    | 3.929                | 3.301
3        | 1.862    | 4.125                | 4.193

 Nested Loops testing 
loops 1 to 100; if mod 2
Loops (runs)    | ZCX	(s)    |	Serpent Timing             | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1 (a)  (1k runs)| 0.000168893  | 0.000168893 / (50^1) = 3.377e-6   |   0.000168199   | 0.000168727
2 (b)  (1k runs)| 0.007213758  | 0.007213758 / (50^2) = 2.8855e-6  |   0.007055130   | 0.007084592
3 (c)  (1k runs)| 0.351611073  | 0.351611073 / (50^3) = 2.81e-6    |   0.352684722   | 0.351471743
4 (d)  (1 run)  | 17.740324000 | 17.740324000 / (50^4) = 2.83e-6   |  17.580107000   | 17.73699000
5 (e)  (1 run)  | 879.95117100 | 879.951171000 / (50^5) = 2.81e-6  | 881.139986000   | 879.7342540

loops 1 to 10; if mod 4
Loops (runs=1)  | ZCX (s)      | Time per Serpent (ZCX)            | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1               | 0.000008     | 0.000008 / (2^1) = 4e-6           |  0.000010       |  0.000009
2               | 0.000017     | 0.000017 / (2^2) = 4.25e-6        |  0.000016       |  0.000017
3               | 0.000031     | 0.000031 / (2^3) = 3.875e-6       |  0.000030       |  0.000024
4               | 0.000046     | 0.000046 / (2^4) = 2.875e-6       |  0.000057       |  0.000056
5               | 0.000150     | 0.000150 / (2^5) = 4.6875e-6      |  0.000110       |  0.000089
10              | 0.027650     | 0.027650 / (2^10)= 2.7e-5         |  0.002937       |  0.002915
22              | 11.98272     | 11.98272 / (2^22)= 2.8569e-6      | 12.045471       | 12.011244

loops 1 to 10; if mod 3
Loops (runs=1)  | ZCX (s)      | Time per Serpent (ZCX)            | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1               | 0.000012000  | 0.000012000 / (3^1) = 4e-6        |   0.000013000   |   0.000010000
2               | 0.000034000  | 0.000034000 / (3^2) = 3.778e-6    |   0.000027000   |   0.000033000
3               | 0.000076000  | 0.000076000 / (3^3) = 2.815e-6    |   0.000111000   |   0.000075000
4               | 0.000219000  | 0.000219000 / (3^4) = 2.704e-6    |   0.000221000   |   0.000220000
5               | 0.000654000  | 0.000654000 / (3^5) = 2.691e-6    |   0.000823000   |   0.000673000
10              | 0.167347000  | 0.167347000 / (3^10)= 2.834e-6    |   0.167906000   |   0.167250000
15              | 40.69105200  | 40.69105200 / (3^15)= 2.836e-6    |  40.743956000   |  40.716526000
16              | 121.9171600  | 121.9171600 / (3^16)= 2.832e-6    | 123.760958000   | 121.816518000

D1 computer 
Adacore's GNAT 2016, switching with --RTS between ZCX and SJLJ

Procedure Calls testing  with MT as PRNG;
Max = 16777216; 3 procs, mod 3 calls both, one or the other of the 2;
NO serpent calls; X final gives the total number of calls;
Seeding of MT: 16#111#, 16#222#, 16#333#, 16#444#  -> X = 22368144.
NB: as MT is used, X final is repeatable between runs (same seed).

Handlers | ZCX (s)  | SJLJ (s)
______________________________
0        | 0.896    | 0.882
0        | 0.905    | 0.906
1        | 1.058    | 184.516
1        | 1.064    | 265.339
2        | 1.215    | 372.574
3        | 1.329    | 446.821

 Nested Loops testing 
loops 1 to 100; if mod 2
Loops           | ZCX (s)      | Time per Serpent                  | SJLJ (s)
___________________________________________________________________________________
1 (a)  (1k runs)|   0.000095269|  0.000095269 / (50^1) = 1.905e-6  |   0.000098855
2 (b)  (1k runs)|   0.004609351|  0.004609351 / (50^2) = 1.844e-6  |   0.004598492
3 (c)  (1k runs)|   0.231582664|  0.231582664 / (50^3) = 1.853e-6  |   0.230719036
4 (d)  (1 run)  |  11.548194   | 11.548194 / (50^4)    = 1.848e-6  |  11.579138
5 (e)  (1 run)  | 580.261706   |580.261706 / (50^5)    = 1.857e-6  | 586.996228

loops 1 to 10; if mod 4
Loops (runs=1)  | ZCX (s)      | Time per Serpent                  | SJLJ (s)
_______________________________________________________________________________
1               | 0.000009     | 0.000009 / (2^1) = 4.5e-6         | 0.000013
2               | 0.000023     | 0.000023 / (2^2) = 5.75e-6        | 0.000023
3               | 0.000036     | 0.000036 / (2^3) = 4.5e-6         | 0.000043
4               | 0.000083     | 0.000083 / (2^4) = 5.18e-6        | 0.000100
5               | 0.000167     | 0.000167 / (2^5) = 5.218e-6       | 0.000167
10              | 0.005537     | 0.005537 / (2^10)= 5.4e-6         | 0.007399
22              | 8.007915     | 8.007915 / (2^22)= 1.909e-6       | 7.944913
22              | 7.971711     | 7.971711 / (2^22)= 1.9006e-6      | 7.966864         

loops 1 to 10; if mod 3
Loops (runs=1)  | ZCX (s)      | Time per Serpent                  | SJLJ (s)
_______________________________________________________________________________
1               |  0.000008    |  0.000008 / (3^1)  = 2.6e-6       |  0.000009
2               |  0.000017    |  0.000017 / (3^2)  = 1.8e-6       |  0.000019
3               |  0.000142    |  0.000142 / (3^3)  = 1.75e-6      |  0.000143
4               |  0.000427    |  0.000427 / (3^4)  = 5.27e-6      |  0.000150
5               |  0.001267    |  0.001267 / (3^5)  = 5.21e-6      |  0.001259
10              |  0.107943    |  0.107943 / (3^10) = 1.82e-6      |  0.113296
15              | 27.163182    | 27.163182 / (3^15) = 1.89e-6      | 27.167316
16              | 81.724382    | 81.724382 / (3^16) = 1.89e-6      | 81.951616        

Based on the data above, my conclusion so far is that there is rather precious little to justify the use of ZCX: when no exception handlers are present in the code, the running times are similar under ZCX and SJLJ; when exception handlers are present in the code, there is indeed a penalty for using SJLJ but this penalty is not all that big on some irons. This being said, there is little reason that I can see for having lots of exception handlers in any sane code anyway so I'm really not all that concerned about this honest cost of SJLJ especially when compared with the "you can't kill your own tasks" cost of the so-called ZCX.

Separate from the above but still relevant, there are still quite a few issues remaining with SJLJ including the fact that it's apparently broken on ARM. But every pain in its own time and so for now simply look at the data above and let me know: do you see any real reason why one would *not* simply go with SJLJ?


  1. "Zero-Cost Exceptions" being the full name of this great idea of reducing costs by not doing the work - meaning here by simply choosing to not implement some parts of the Ada standard, namely the asynchronous abort.  

  2. Or its excuse at the very least since rationale might be too much to ask for under those circumstances. 

  3. You can try this yourself: the code is in test_tasks_ada.zip, download, read it, run it and see if the tasks actually abort or not. Try it also with SJLJ (gprbuild --RTS=sjlj) and spot the difference. 

  4. "Setjmp / longjmp" by its name, the older exception handling model that actually implements the Ada standard so it allows asynchronous abort. 

  5. Using the Mersenne-Twister pseudo-random generator

February 7, 2019

Seppuku Job Market: Minimal Dynamic Tasking in Ada

Filed under: Coding — Diana Coman @ 10:44 p.m.

Eulora's server needs a reliable and robust way of performing - preferably in parallel whenever possible - various jobs for all the players that might connect to the game at any given time. Given the parallel requirement, there isn't really any way around the fact that multi-threading is needed. Nevertheless, since multi-threading is by its nature complex enough to give subtle errors and heavy headaches at any time, I'd really much rather make sure any implementation that deals with multiple threads of execution is as small, clear, plain and easy to follow as possible. In other words, if it has to be multi-threaded then it should better be minimal, self-healing, self-adjusting and ruthlessly functional with all and any bells and whistles chucked as far away from it as possible. To drive this point home and keep it in mind at all times1, I'll call this self-reliant unit of the server the Seppuku Job Market or SJM for short.

The list of requirements for the SJM is this:
1. Accept Jobs from all and sundry in a thread-safe manner and execute them in order of their priorities.
2. Generate and kill Worker tasks2 *dynamically* and on an *as-needed basis* to perform jobs as soon as possible but remaining at all times within a pre-set maximum number of Workers3.
3. Creation and destruction of Workers should be reliable and robust: in particular, SJM should run for ever unless explicitly stopped and it should re-spawn Workers as needed, even if they get killed from outside the code (cosmic-ray event or not).
4. Aim to perform jobs in order of their specified priority but taking into account that at most ONE job per player is actually executed at any given time. In other words: do NOT allow a player to hog the whole thing and run as many jobs as they want; this is parallelism aimed to increase the number of players served not merely the number of jobs performed!

Point 1 of requirements hints at the nature of SJM: as it needs to accept jobs from many, unknown sources, it is effectively a "server" of sorts and moreover it is essentially a resource from the point of view of job producers. The best Ada construct that readily fits this description is a protected unit (aka a passive entity that guarantees thread-safe access to the data it encapsulates - in this case to the queue of jobs waiting to be performed). One significant benefit of an Ada protected entity is the fact that it is specifically not a task itself nor is there a task associated with it. Instead, the mutually exclusive access to services provided by a protected unit is ensured by the run-time system and therefore the whole thing has at least one less headache to think of: while Worker tasks may get killed, the SJM itself at least cannot get killed unless the whole program (i.e the main thread of execution of the server itself) gets killed.

Point 2 of requirements (dynamic, self-adjusting number of tasks) means that I'll need to actually create and dispose of tasks programmatically - there is no way to have only statically allocated tasks. In turn, this means that a few restrictions have to go away: No_Allocators, No_Finalization, No_Task_Allocators, No_Tasking, No_Unchecked_Deallocation. The need to drop the No_Finalization and No_Unchecked_Deallocation restrictions comes from the way in which Ada handles memory allocated dynamically even when on the stack. Essentially, dynamically allocated tasks receive memory from a "pool". Once allocated, memory from a pool is reclaimed ONLY when the whole pool goes out of scope or in other words when it can be guaranteed that there is no piece of code left that can actually attempt to access that bit of memory. This is very robust and quite useful of course but in the case of dynamically allocated tasks it means that tasks that finish will STILL effectively occupy memory unless specifically deallocated (with unchecked_deallocation as that's the only way to do it as far as I can tell). In turn, this creates the undesirable but very real and quite horrible possibility that the code will run just fine *until* the pool in which tasks are created runs out of memory because of all previous tasks that finished long time ago but whose space was never reclaimed. To avoid this, the code has to keep track of terminated tasks and explicitly deallocate the memory they occupy before chucking away their pointer and/or re-spawning a replacement Worker (as there is no way to "restart" a task).

Point 3 means that Workers need to be effectively managed based on the evolution of the number of available jobs and the number of Workers themselves. One approach would be of course to have a Supervisor task but the problem then is twofold: first, the Supervisor needs to be aware of changes to the jobs queue as they happen; second, having a Supervisor task creates the potential problem of who supervises the supervisor (esp. with respect to recovery from killed thread since in this case the Supervisor itself might die unexpectedly). Given however that the SJM protected unit effectively guards precisely the jobs queue, it's also in the best position to react promptly to an increase or decrease in jobs and so it follows that it should in fact manage the Workers too. After all, it can do a bit more on receiving a job than merely chucking it into the queue: ideally it would in fact pass it on to a Worker immediately.

While at first sight "take job, spawn Worker, pass it on and let him do it" sounds precisely fine, in practice it's really not fine at all and not least because of the requirement at Point 4: passing a job on to a Worker requires some ordering of jobs (by priority) and even a sort of guarded access to a player since a new job cannot be accepted (and especially cannot be passed on to a Worker for execution) while an existing Worker may still be toiling away on a previous job for the same player. So the SJM needs to find out when a job is finished in order to accept again jobs for that specific player. As always, there are only a few ways to know when something finished: either look for it4 as one rather has to do when Workers are just passive executors of jobs or otherwise expect a signal to be sent back by a more active type of Worker task when it finished the job it had.

This distinction between active and passive Workers (or tasks in general) is quite significant. As passive entities, Workers can at most simply wait to be handed a job or any other signal. Typically, a Worker would be created and handed a job, they would do it and then they would quietly die keeping out of the way of everyone else. This can be a great fit in various cases but I can see several problems with this for Eulora's server: first, Workers cannot be reused even when jobs are available so there is a rather inefficient kill/create overhead5 precisely at busy time when one wants it even less than at any other time; second, the only way for the SJM to find out when a job finished is by a sort of polling i.e. going through the whole set of workers and checking which one is in a terminated state - note that it is not at all clear just *when* should this be done or how would it be triggered (sure, one can use a sort of scheduled event e.g. check it every 3 seconds or some such but it's more of a workaround than addressing the problem); third, the SJM needs to do both Worker creation and Job allocation (i.e. priority ordering + only one job per player at any given time) at the same time and while keeping a job creator waiting.

The first of the above issues (no reuse of Workers) is easily addressed by making Workers active rather than passive: they get created and then they actively ask for a job; once they got a job, they do it and then they go back and report it done, after which they queue again to get another job or perhaps the boot if there are no jobs to be had. And since such active Workers do not finish by default when a task is finished, they need to have rather suicidal tendencies and ask not merely for a job but rather for either a job or permission to seppuku (hopefully in a clean manner though!).

Making Workers active (if suicidal) neatly separates Worker creation from Job allocation: when jobs start pouring in, the SJM can simply create a bunch of Workers and release the job creators before it makes time to actually hand the jobs out to queuing workers. When the jobs keep pouring in, Workers keep working and there's no need to kill them now to only create them a few milliseconds later. Moreover, finished jobs are simply reported and marked as such without any need to poll. In the (hopefully rare) case when a Worker dies unexpectedly before sending the signal that it finished its job, they will be anyway observed sooner or later when the state of Workers is assessed to decide if more or fewer Workers are needed. Essentially the only trouble this approach brings is the added responsibility on the SJM: it controls access to the Job queue for job creators AND for Workers while ALSO effectively managing and keeping track of all Worker-related aspects. But then it's not a Seppuku Job Market for no reason: if it needs to do it, it will have to do it and do it well.

As a proof of concept of the above, I have implemented the SJM precisely as described: as a protected unit that encapsulates a Job queue and manages active Worker tasks, creating and destroying them as needed while also de-allocating memory of any terminated Workers, ensuring that only one Job per player is accepted at any given time and allowing a graceful stop that does not block any job producers that may come at a later time and does not leave dangling Worker tasks either. Jobs are simply record types with a discriminant that specifies their type and therefore the exact form a variable part of the record takes (since each Job type is likely to have specific data structures it requires). Note that I specifically avoided the Object-Oriented option (i.e. tagged type in Ada) with a hierarchy of Job types and reliance on polymorphism for "Complete" to do the right thing depending on the exact type of Job. The reason for this avoidance is mainly that there really isn't much to gain from it as far as I can see at the moment. Similarly, I prefer to not rely on generic containers (for the Job Queue for instance) unless they become clearly and absolutely needed. Finally, I am quite aware of Ada's relevant annexes such as Real-Time Systems and I know that it provides a whole infrastructure of worker pools and jobs with futures even (i.e. a way to provide results at a later time) but they are quite at odds with the significant aim of keeping it all as short6 and clear and easy to follow as possible (not to mention potential issues with the way in which some parts might be implemented using a secondary stack for instance which I specifically do not want to have).

The public part of the EuJobs package is this:

with Interfaces; use Interfaces;
with Data_Structs;
with Ada.Finalization;
with Ada.Unchecked_Deallocation; -- to clean up worker tasks if needed.

package EuJobs is

  pragma Elaborate_Body;

  -- knobs and constants
  Max_Workers : constant Natural := 64;
  Max_Idle_W  : constant Natural := Max_Workers;
  -- max jobs
  Max_Jobs    : constant Natural := Max_Workers * Max_Workers;

  -----------------------WORKERS--------------------
  -- Generic Eulora Workers type: simply perform given Jobs
  subtype Worker_Index is Natural range 1..Max_Workers;
  -- Those are to be FULLY managed (including created/ended) by the Job Market
  -- ACTIVE but suicidal elements:
  --   a worker will keep requesting jobs/permission to seppuku
  --     until allowed to terminate
  -- Pos is a token identifying Worker with the Job Market
  -- NB: ALL workers WILL use this Job Market
  -- NB: do NOT create workers from outside the Job Market!
  task type Worker( Pos: Worker_Index );

  -- needed to dynamically generate Workers
  type Worker_Address is access Worker;
  procedure Free is new Ada.Unchecked_Deallocation(Worker, Worker_Address);

  -- ALL the info that the Job Market holds on workers to manage them
  type Worker_Rec is
    record
      Assigned  : Boolean := False;
      Player_Id : Interfaces.Unsigned_64;
      WA        : Worker_Address;  -- actual pointer to worker
    end record;  

  -- for storing pointers to generated workers including if assigned and id
  type Worker_Array is array( Worker_Index'First ..
                              Worker_Index'Last) of Worker_Rec;

  -- limited controlled type that ensures no dangling workers at Finalize time
  type Controlled_Workers is new Ada.Finalization.Limited_Controlled with
    record
      Workers: Worker_Array;
    end record;

  overriding
  procedure Finalize( S: in out Controlled_Workers );
  overriding
  procedure Initialize( S: in out Controlled_Workers );

  -------------------------------JOBS-------------------
  -- Job types; NB: do NOT map (nor have to) directly on message types!
  type Job_Types is ( Do_Nothing,
                      Print_Job,
                      Create_Acct );

  -- Data structure with relevant information for each type of job
  type Job_Data ( T: Job_Types := Do_Nothing ) is
    record
      -- common information relating to the one requesting this job
      Player_ID: Interfaces.Unsigned_64 := 0;
      Source_IP: Interfaces.Unsigned_32 := 0;
      --NB: this is SOURCE port - reply WILL be sent here, whether RSA or S!
      Source_P : Interfaces.Unsigned_16 := 0;
      -- Message counter, as received
      Counter  : Interfaces.Unsigned_16 := 0;
      Priority : Natural := 0; --lowest possible priority
      case T is
        when Create_Acct =>
          Acct_Info: Data_Structs.Player_RSA;
        when Do_Nothing =>
          null;
        when Print_Job =>
          null;
      end case;
    end record;

  procedure Complete( JD   : in Job_Data );

  subtype Job_Count is Natural range 0..Max_Jobs;
  type Job_Array is array( 1..Max_Jobs ) of Job_Data;
  type Jobs_List is
    record
      Len  : Job_Count := 0;
      JA   : Job_Array;
    end record;

  ---------------------------Job_Market--------------------

  -- FULLY self-managed Job Market for euloran jobs:
  --   -- accepts jobs to do
  --   -- spawns, kills and managed workers that complete the jobs
  -- NB: Job_Market will DISCARD a new job when:
  --    -- it is FULL (i.e. can't handle anymore)
  --    -- it is stopping
  --    -- it already has a job for the same player
  -- Jobs are performed according to specific criteria (not strictly fifo):
  --   - FIFO but ensuring no more than 1 job per player served at any time
  --   - ALSO: there might be other priorities (e.g. type of job)
  protected Job_Market is
    -- adding a new job that needs to be done
    -- this can be ANY derivated type of Job_Data
    -- NB: Added will be true if J was indeed accepted and False otherwise
    entry Add_Job( J     : in Job_Data;
                   Added : out Boolean );

    -- workers request jobs when they are out of work
    -- workers need to provide their token (Pos)
    -- they can get to do: either a job OR seppuku signal.
    procedure Get_Job( Pos    : in Worker_Index;
                       J      : out Job_Data;
                       Seppuku: out Boolean );

    -- workers have to report back when a job is done
    -- (or they get sweeped up eventually if/when they abort).
    procedure Done_Job( Pos: in Worker_Index );

    -- sets in motion the process to stop gracefully:
    --   -- no more jobs received, existing discarded
    --   -- all workers will be given Seppuku signal
    -- NB: NO reverse for this.
    procedure Stop;

    -- for any external supervisors
    -- returns TRUE if it is NOT stopping
    -- returns False if it is stopping
    function Operating(  Waiting_Jobs: out Natural;
                         Idle_Workers: out Natural;
                         Active_Workers: out Natural;
                         Terminated_Workers: out Natural;
                         Is_Full: out Boolean)
      return Boolean;

  private

    -- internal storage of jobs and mgm of workers
    Board  : Jobs_List;

    -- NB: Workers are in the BODY of the package
    --   because they HAVE to be after the body of Finalize

    -- when stopping:
    -- discard new jobs; give out stop on get/done; empty jobs map
    Stopping : Boolean := False;
    Fullboard: Boolean := False;

    -- Retrieves next available job from the Board and returns it in JD
    -- Sets Found to True if an available job was found (i.e. JD is valid)
    -- Sets Found to False (and JD is undefined) if NO available job was found.
    -- NB: this DOES remove the element from the board!
    procedure Get_Available( JD    : out Job_Data;
                             Found : out Boolean );

    -- checks if the given player_id IS currently served by any worker
    function Is_Assigned( Player_ID: in Interfaces.Unsigned_64 )
             return Boolean;

    -- Checks in Board list ONLY if there is a job for this player
    -- Returns True if a job was found (i.e. a job waiting for a worker)
    -- Returns False otherwise.
    -- NB: Player might STILL have a job in progress (assigned to a worker)
    function Has_Waiting_Job( Player_ID: in Interfaces.Unsigned_64 )
             return Boolean;

    -- releases any player_id that might be stuck with aborted workers
    -- *creates* new workers if needed (specific conditions met)
    procedure Manage_Workers;

  end Job_Market;

private

  -- create new Worker with identification token (position) P
  function Create_Worker(P: in Worker_Index)
             return Worker_Address;

end EuJobs;

Workers are very simple tasks with an ID received at creation time to identify them within the Job_Market (very simply by position in the array of Worker addresses). They run a loop in which they request tasks or permission to Seppuku and when they receive either of them they proceed to do as instructed. Perhaps you noticed above that the array of Worker pointers is wrapped inside Controlled_Workers, which is a controlled type. A controlled type in Ada guarantees that the provided Initialize and Finalize routines are run precisely at the stages that their names suggest to enable the type to start off cleanly and to end up cleaning after itself. In the case of Controlled_Workers, the Initialize simply makes sure that the array has all pointers marked as null and moreover as not assigned any tasks while the Finalize goes one more time through the array and finishes off (with abort) any workers that are not null already. Note that the scope of Worker tasks is in fact the package level since the Worker_Address type is declared at this level (and that's how the scope is defined for such types in Ada). You might have noticed also that there is no concrete array of Workers defined anyhere so far: indeed, the array of workers is defined inside the package body for two main reasons: first, it should NOT be accessed by anyone from outside (not even potential children packages at a later time); second, it has to be defined after the bodies of Initialize and Finalize since otherwise it can't be created.

Jobs are barely sketched for now as Job_Data structures with a discriminant to distinguish different types and a variable part for specific data that each type of job needs. The Complete procedure then simply does different things for each type of job in a straightforward manner (at the moment it does something for the print job only for basic testing purposes).

The Job_Market itself is a protected object that offers a handfull of services (aka public entries, procedures or functions): entry Add_Job for job producers to provide their new jobs; procedure Get_Job for Workers who are looking for something to do; procedure Done_Job for Workers who report they finished their previously allocated job; procedure Stop for any higher-level caller who is in a position to turn off the whole Job_Market; function Operating that simply provides information on the current state (i.e. operating or stopping) and status (e.g. number of jobs and workers) of the Job_Market. Note that there are important differences between functions, procedures and entries: functions can only *read* protected data so they are effectively banned from modifying anything, hence Operating being exactly a function as it provides a snapshot of current state and metrics for the Job_Market; procedures can modify data but a call to them is unconditional meaning it gets accepted as soon as the protected object is available and the caller is first in queue for it, without any further restrictions - hence Stop, Done_Job and Get_Job are procedures since there is no constraint on them being called at any time; finally, entries can also modify data but they have entry barriers meaning they accept a call only when certain conditions are met - in this case Get_Job has the simple but necessary condition that either the Job_Market is stopping (in which case callers should not be blocked since it's pointless to wait anyway) or the Job queue is not full since it makes little sense to allow a job producer in just to discard their job for lack of space anyway. Note however that this is merely for completeness here since in practice there will be several other levels of measures taken so that the job queue does NOT become full since that is clearly not a sane way to have the server running.

In addition to the above public services, the Job_Market also has a private part where it keeps the job queue (as a basic array for now - this can easily change at a later time if there is a good reason for the change), a flag to know if it's stopping and one to register if/when the board is full as well as a few helper procedures and functions for its own use. The Get_Available procedure effectively implements the strategy of picking next Job to execute: it's here that priorities are considered really and it's here that there is another check to make sure that no two jobs of the same player are ever executed at the same time. The Is_Assigned procedure checks the set of Workers to see if any of them is performing a job for the specified player. The Has_Waiting_Job on the other hand checks the job queue to see if there is any job from the specified player waiting in the queue. Arguably the most important of those is "Manage_Workers" that does precisely what the name says: it does a headcount of Workers in various states, cleans up any aborted/unexpectedly dead ones, reclaims memory for terminated ones and then, if required, creates new Workers to match the current Job provision. Note that there really are only 64 workers in total (and at any rate this is unlikely to become a huge number) so this headcount of workers is not really terribly costly.

The overall package further has a private function that dynamically creates a new Worker task with the given ID, returning its address. This is more for convenience than anything else since one could easily call new directly so perhaps it will even go away at the next round of trimming the code.

The implementation in eujobs.adb starts with the Initialize and Finalize procedures, declares the Controlled_Workers object and then proceed with the internals of the Job_Market itself:

with Ada.Text_IO; use Ada.Text_IO;

package body EuJobs is

  procedure Finalize( S: in out Controlled_Workers ) is
    -- ALL this needs to do is to make SURE no worker is still running!
  begin
    for I in S.Workers'First .. S.Workers'Last loop
      if S.Workers(I).WA /= null then
        abort S.Workers(I).WA.all;
        S.Workers(I).WA := null;
        S.Workers(I).Assigned := False;
      end if;
    end loop;
  end Finalize;

  procedure Initialize( S: in out Controlled_Workers ) is
  begin
    for I in S.Workers'First .. S.Workers'Last loop
      S.Workers(I).WA := null;
      S.Workers(I).Assigned := False;
    end loop;
  end Initialize;

  -- actual workers slots; workers are managed internally here
  -- this type is needed though, to Finalize properly
  CW: Controlled_Workers;

  protected body Job_Market is
    -- adding a new job that needs to be done
    -- this can be ANY derivated type of Job_Data
    entry Add_Job( J     : in Job_Data;
                   Added : out Boolean )
      when Stopping or    --to unblock producers
           (not Fullboard) is
    begin
      -- if stopping, discard job -- allows callers to finish too...
      -- check Player_ID and add job ONLY if none exist for this player
      if (not Stopping) and
         (not Is_Assigned(J.Player_ID)) and
         (not Has_Waiting_Job(J.Player_ID)) then
        -- board is known to have space, so add to it
        Board.JA(Board.JA'First + Board.Len) := J;
        Board.Len := Board.Len + 1;

        -- job added may mean full board
        FullBoard := Board.Len >= Board.JA'Last;

        -- Quick worker management to adjust if needed
        Manage_Workers;
        -- Let caller know that job was indeed added
        Added := True;
      else
        Added := False; --not added, aka discarded
      end if;
    end Add_Job;

    -- workers request jobs or seppuku when they are out of work
    procedure Get_Job( Pos    : in Worker_Index;
                   J      : out Job_Data;
                   Seppuku: out Boolean ) is
      Found : Boolean;
    begin
      if Stopping then
        -- when stopping: all seppuku
        Seppuku := True;
      else
        -- try first to get some job that should be done
        Get_Available(J, Found);
        if (not Found) then
          Seppuku := True; --since no job is available..
        else
          -- have a job so no seppuku for now
          Seppuku := False;
          -- update Worker record to mark player as being served etc.
          CW.Workers(Pos).Assigned := True;
          CW.Workers(Pos).Player_ID := J.Player_ID;
          -- this SURELY means board is NOT full!
          Fullboard := False;
        end if;
      end if;
      -- LAST: manage workers in ANY CASE!
      Manage_Workers;
    end Get_Job;

    -- workers have to report back when a job is done
    procedure Done_Job( Pos: in Worker_Index ) is
    begin
      -- update record for this worker and let him go
      CW.Workers(Pos).Assigned := False;
    end Done_Job;

    -- aim to stop gracefully:
    --   -- no new jobs stored, existing discarded, workers killed.
    -- NB: NO reverse for this.
    procedure Stop is
    begin
      Stopping := True; -- NO need for anything else, really
    end Stop;

    function Operating(  Waiting_Jobs: out Natural;
                         Idle_Workers: out Natural;
                         Active_Workers: out Natural;
                         Terminated_Workers: out Natural;
                         Is_Full: out Boolean)
      return Boolean is
    begin
      Waiting_Jobs := Natural( Board.Len );
      Is_Full := Fullboard;
      Idle_Workers := 0;
      Active_Workers := 0;
      Terminated_Workers := 0;

      for I in CW.Workers'Range loop
        if CW.Workers(I).WA /= null then
          if CW.Workers(I).WA'Terminated then
            Terminated_Workers := Terminated_Workers+1;
          elsif CW.Workers(I).Assigned then
            Active_Workers := Active_Workers + 1;
          else
            Idle_Workers := Idle_Workers + 1;
          end if;
        end if;
      end loop;
      return (not Stopping);
    end Operating;

    -- anything needed for external load checking (?)

--private stuff

    procedure Get_Available( JD    : out Job_Data;
                             Found : out Boolean ) is
      Pos   : Job_Count;
      P     : Natural := 0; --priority of job found so far
    begin
      Found := False;
      -- ALWAYS walk the FULL set: higher priority might have come in later
      for I in 1 .. Board.Len loop
        if ( (not Found) or (Board.JA(I).Priority > P) ) and
           (not Is_Assigned(Board.JA(I).Player_ID) ) then
          Found := True;
          Pos   := I;
          P     := Board.JA(I).Priority;
          -- but don't copy just yet, as there might be higher priority further
        end if;
      end loop;
      -- retrieve the found job data but ONLY if found!
      if Found then
        JD := Board.JA(Pos);
        -- if not last job, shift to avoid gaps in the array
        if Pos < Board.Len then
          Board.JA(Pos..Board.Len-1) :=
              Board.JA(Pos + 1 .. Board.Len);
        end if;
        -- update count of jobs in the array
        Board.Len := Board.Len -1;
      end if;
    end Get_Available;

    function Is_Assigned( Player_ID: in Interfaces.Unsigned_64 )
             return Boolean is
      Found: Boolean := False;
    begin
      -- walk the array of workers and check
      for I in CW.Workers'Range loop
        if CW.Workers(I).WA /= null and
           CW.Workers(I).Assigned and
-- Will have to rely on .assigned being SET properly by the manager!
--  (not CW.Workers(I).WA'Terminated) and
           CW.Workers(I).Player_ID = Player_ID then
          -- found it!
          Found := True;
          exit;
        end if;
      end loop;
      return Found;
    end Is_Assigned;

    function Has_Waiting_Job( Player_ID: in Interfaces.Unsigned_64 )
             return Boolean is
      Found: Boolean := False;
    begin
      for I in Board.JA'First .. Board.JA'First + Board.Len loop
        if Board.JA(I).Player_ID = Player_ID then
          Found := True;
          exit;
        end if;
      end loop;
      return Found;
    end Has_Waiting_Job;

    procedure Manage_Workers is
      Active_W: Natural := 0;
      Idle_W  : Natural := 0;
      Total_W : Natural := 0;
      To_Create: Natural:= 0;
    begin
      -- release player ids if workers terminated
      -- count also precisely how many are active
      for I in CW.Workers'Range loop
        if CW.Workers(I).WA /= null then
          if CW.Workers(I).WA'Terminated then
            -- this terminated abnormally -> LOG?
            CW.Workers(I).Assigned := False;
            -- claim this space to restart a worker here if needed
            --CW.Workers(I).WA := null;
            -- deallocate it too as otherwise memory space slowly gets lost
            -- NB: Free proc sets it to null anyway
            Free(CW.Workers(I).WA);

          --if NOT null and NOT terminated-> idle or active
          elsif CW.Workers(I).Assigned then
              -- this is an active worker, count it
              Active_W := Active_W + 1;
          else
            -- this is an idle worker, count it
            Idle_W := Idle_W + 1;
          end if;
          -- null workers are simply empty spaces, no need to count them
        end if;
      end loop;
      -- calculate total workers
      Total_W := Active_W + Idle_W;

      if (not Stopping) and
         (Board.Len > Total_W) and
         (Total_W < Max_Workers ) and
         (Idle_W = 0) then
        -- need (perhaps) to create workers: how many?
        To_Create := Board.Len - Total_W;

        -- create them for as long as there is ANY space..
        -- NB: MORE workers MIGHT have terminated meanwhile,
        -- but they won't be null!
        for I in CW.Workers'Range loop
          if CW.Workers(I).WA = null then
            -- found a place, so create a worker
            CW.Workers(I).Assigned := False;
            CW.Workers(I).WA := Create_Worker(I);
            To_Create := To_Create - 1;
            Total_W := Total_W + 1;

            if To_Create <= 0 or Total_W >= Max_Workers then
              exit;
            end if;
          end if;
        end loop;
      end if;
     end Manage_Workers;

  end Job_Market;

  -- Worker body
  task body Worker is
    JD      : Job_Data;
    Seppuku : Boolean := False;
  begin
    -- main Loop: get a job or die, work and repeat.
    Work_Loop:
    loop
      -- ask the Job Market for a job or permission to seppuku
      Job_Market.Get_Job( Pos, JD, Seppuku );

      if Seppuku then
        exit Work_Loop;
      else
        -- do the job
        EuJobs.Complete( JD );
        -- report job done
        Job_Market.Done_Job( Pos );
      end if;
    end loop Work_Loop;
    -- worker is done and will die gracefully!
  end Worker;

  -- Jobs themselves
  procedure Complete( JD   : in Job_Data ) is
    Stop: Boolean;
  begin
     -- do different things for different types of jobs...
    case JD.T is
        when Create_Acct =>
          --Acct_Info: Data_Structs.Player_RSA;
          Stop := False;
        when Set_SKeys =>
          -- SKes: Data_Structs.Serpent_Keyset;
          Stop := False;
        when Mgm_SKeys =>
          --SMgm: Data_Structs.Keys_Mgm;
          Stop := False;
        when Print_Job =>
          Put_Line("Completing: job counter " &
                   Interfaces.Unsigned_16'Image(JD.Counter) &
                   " priority " & Natural'Image(JD.Priority) &
                   " for player " &
                   Interfaces.Unsigned_64'Image(JD.Player_ID) &
                   " from IP:P " & Interfaces.Unsigned_32'Image(JD.Source_IP) &
                   ":" & Interfaces.Unsigned_16'Image(JD.Source_P));
        when others =>
          -- no job or dubious at best, better stop.
          Stop := True;
    end case;
  end Complete;

  function Create_Worker(P: in Worker_Index)
             return Worker_Address is
  begin
    return new Worker(P);
  end;

end EuJobs;

Your thoughts, observations and critiques on the above are welcome below in the comments section. If there is a problem with the above approach or with the code itself I really want to hear of it sooner rather than later since it's of course easier to do something about it now - this is after all the whole reason why I'm publishing this proof of concept so go ahead and point out any faults you see.


  1. Also to reflect some suicidal tendencies of my Workers but that becomes clearer later. 

  2. "Threads" if you prefer non-Ada terminology. 

  3. There isn't much point in having more Workers than your underlying iron can actually support 

  4. blocking until it's done or checking at some intervals 

  5. Ada's documentation claims that dynamic creation of a task has a big overhead anyway so it's best avoided whenever possible but I can't say I have any idea just what "big overhead" means here. 

  6. The full .ads + .adb code+comments shown below is 500 lines, it uses no secondary stack, no heap and no containers or other similar external packages. Even the "use Ada.Text_IO" will go away as it's in there now strictly to allow the Print job to be seen as it completes for testing purposes. 

January 12, 2019

Compiling Ada Library for Use with Non-Ada Main

Filed under: Coding — Diana Coman @ 5:40 p.m.

Following a rather bumpy road of compilation troubles trying to link an Ada lib into a CPP main program, I found first a working solution to the task at hand and then a headache trying to disentangle the confusion of what is exactly a "standalone encapsulated dynamic" library, why is it needed and how exactly does it differ on the initialization front from a boring static library. Fortunately it turns out that my headache was due mainly to all that bumping into walls combined with the rather confusing terms used in .gpr files - there was at least nothing that a good dose of hands-on experimentation and several re-reading of the GNAT docs couldn't cure! Still, as I'd rather not repeat the whole process next time I need to mix Ada with others, I'll summarise here my notes on the options I found for compiling an Ada library1 so that it can be safely used from a non-Ada main program.

To use Ada code from a non-Ada main program, one needs to find a way to actually start the Ada run-time environment *before* any calls to Ada code. The Ada run-time does the crucial task of elaboration of Ada code (i.e. getting everything ready for executing code, so broadly speaking it takes care of initializing variables and constants as well as running any code found in the main body of packages that are used). Since elaboration is a concept entirely specific to Ada, there is no way to rely on the non-Ada main code (C, C++ or whatever it might be) to take care of this. Instead, the solution is to make sure that the Ada library itself contains and exposes an initialization procedure that does exactly this: starts the Ada run-time and performs the required elaboration for the library code. Once this exists, the non-Ada code simply has to make sure it calls this initialization procedure *before* calling *any* Ada code from that library and that's all2. This much was clear from the beginning - it's from here on that the headache and confusion started since not ALL Ada libraries actually contain/expose such an initialization routine. Essentially, in addition to the usual classification of libraries into static or dynamic, Ada has another parallel classification: standalone or not! And asking gprbuild to produce a standalone library is NOT done by using the "Library_Standlone" option but by defining an... interface for the library via the "Library_Interface" option in the .gpr file. Specifically, from the beginning:

  • To use an Ada library from a non-Ada main program, one needs to compile the library as "standalone". The standalone type of Ada library is the only one that contains and exposes for outside use an initialization routine that will start the Ada run-time and perform all elaboration tasks required for the library itself. NB: the initialization routine will be called libnameinit so if the library is called "eunet" then the routine will be "eunetinit".
  • To create a standalone Ada library with gprbuild, the corresponding .gpr file has to include the option "Library_Interface" that lists the packages that are actually exposed for use from outside the library. This option is enough by itself to obtain a standlone library and therefore to have the initialization routine! NB: you can build a standalone library as static or dynamic, as you want, simply specifying the kind, via "Library_Kind" - in both cases, the resulting .a or .so file will contain the initialization routine. For example:
      for Object_Dir use "obj";
      for Library_Dir use "lib";
      for Library_Name use "eunet";
      for Library_Kind use "static";
      for Library_Interface use ("Eunet", "Raw_Types");
    
  • Standalone libraries have subtypes too and it is actually the subtype that is specified via the option "Library_Standalone" in a .gpr file! According to GNAT's user guide, the Library_Standalone can take 3 values: standard (default), no, encapsulated.
    • The "standard" is the option used if your .gpr file does not even mention "Library_Standalone" (but DOES mention "Library_Interface"!) and it means that the initialization routine is contained and exposed.
    • The "encapsulated" option means in addition that the library will depend only on static libraries except for system libraries - so this option will effectively pull in everything the library needs, including the GNAT run-time. This makes for a significantly *easier* use and linkage further downstream BUT it forces the Library_Kind to... "dynamic". I could NOT find out any clear explanation as to WHY this is so but if I'm to guess I'd say it's probably a way of "protecting" users so that they don't encapsulate the Ada run-time in 10 separate libraries and then use all of them in the same program or something.
    • Finally, the "no" option means - surprisingly! - what you'd expect: the library is NOT to be a standalone library after all (and "Library_Interface" be damned)!
  • Summarizing the messy interplay between Library_Interface and Library_Standalone above: you can have a static or dynamic standalone library as long as you leave "Library_Standalone" option alone; you can have only a dynamic standalone library if you actually want to include the GNAT run-time. This item is called "encapsulated standalone library" and means that "Library_Standalone" is set to "encapsulated". You can - unclear why/when is it useful - specify explicitly that you do NOT want a standalone library by setting "Library_Standalone" to "no". Essentially the Library_Standlone chooses between "types" of standalone that include standard, encapsulated or... not standalone at all. I still get slightly nauseaous.
  • Assumming you did go for one sort or another of standalone library, there is a further option to ask the library to "automatically" run its initialization. This is done via "Library_Auto_Init" option being set to "true" (the default value). However, this is the sort of gun that can easily explode in your face since the actual behaviour is platform dependent so you can't rely on it for anything. As a result, I'd say this is best set to "false" clearly and explicitly so that one is not lulled into the idea that someone else will do the initialization auto-magically.
  • If you build a static standalone library, note that its linking into the main program requires also the linking of GNAT runtime as a minimum. The exact things you need depend on what your library really uses but things can get quite gnarly. For instance3 the line for a basic main.cpp test that does ~nothing but it does it with the whole smg_comms + some glue for handling net stuff of Eulora's client:

    gcc main.cpp -o main.o lib/libeunet.a
    -Wl,-z,origin,-rpath,/home/eu-test/eulora/eunet/c_wrappers/bin/:/home/eu-test/eulora/eunet
    /rsa/bin/:/home/eu-test/eulora/eunet/mpi/bin/
    -ldl -lpthread -lrt -L/home/eu-test/eulora/eunet/c_wrappers/bin/ -lC_Wrappers
    -L/home/eu-test/eulora/eunet/rsa/bin/ -lRSA -L/home/eu-test/eulora/eunet/mpi/bin/
    -lMPI
    /home/eu-test/x86_64-linux-musl-native/lib/gcc/x86_64-linux-musl/4.9.4/adalib/libgnarl.a
    /home/eu-test/x86_64-linux-musl-native/lib/gcc/x86_64-linux-musl/4.9.4/adalib/libgnat.a
    

On the bright side, the investigation that resulted in the above notes means that I'm now satisfied that I can in fact link an eunet Ada library both with Eulora's client (that links mainly with dynamic libs so possibly easier as encapsulated standalone) and with code that runs on GNAT with static libs only. In addition, I certainly got also a much better understanding of Ada's elaboration and elaboration order and how it should be handled for safe use of tasks from within a library. But a set of notes for that might be the topic for another time!


  1. even a rather complex one with tasks that start at elaboration time among other things 

  2. In principle there is also a symmetric finalization procedure that is to be called at the very end by the non-Ada code: this one shuts down the Ada run-time but in practice it doesn't seem to be required all that often. 

  3. Isn't "libgnarl.a" such a great name? 

December 17, 2018

SMG Comms Chapter 13: Sender and Receiver

Filed under: Coding, SMG_Comms — Diana Coman @ 1:39 p.m.

~ This is a work in progress towards an Ada implementation of Eulora's communication protocol. Start with Chapter 1.~

This chapter adds to SMG Comms a thin wrapper package that effectively rescues the queue of UDP messages from the IP stack (where it's relatively small) into memory (where it can be, by comparison, large). Once the decision has been clearly made as to what the sender/receiver should do and moreover I finally seem to have gotten my head around using Ada's threads1, the implementation is deliciously straightforward. Who could have predicted this ?! Let's see directly the new Snd_Rcv package as it's very easy to read indeed:

 --Sender and Receiver task types for Eulora's Communication Protocol
 --This is a THIN layer on top of UDP lib, mainly to move messages out
 -- of the small queue of the IP stack onto a bigger, in-memory queue.
 --There is NO processing of messages here: just read/write from/to UDP.
 --S.MG, 2018

with Interfaces;
with Msg_Queue;
with UDP;

generic
  -- exact length of payload aka whether RSA or Serpent
  Len: in Positive;

package Snd_Rcv is
  -- queue package with specified payload length
  package M_Q is new Msg_Queue( Payload_Len => Len);

  -- outbound and inbound messages queues
  -- those are meant to be accessed from outside the package too!
  out_q : M_Q.Queue;
  in_q  : M_Q.Queue;

  -- sender type of task: takes msgs out of out_q and sends them via UDP
  task type Sender( Port: Interfaces.Unsigned_16);

  -- receiver type of tasks: reads incoming msgs from UDP and puts them in in_q
  task type Receiver( Port: Interfaces.Unsigned_16);

private
  -- udp lib package with specified payload length
  package M_UDP is new UDP( Payload_Size => Len);

end Snd_Rcv;

As it can be seen above, the package simply packs in one place an outbound message queue (out_q), an inbound message queue (in_q) and the definitions of two types of tasks: the Sender and the Receiver. The two queues act effectively as mailboxes: all and any tasks from anywhere else are invited to just drop their outbound packages into out_q and /or get incoming packages from in_q. Note that both those queues are thread-safe so there is no concern here over who tries to read/write and when - at most, a task may end up blocked waiting on an empty queue (when trying to read a message) or on a full queue (when trying to write a message).

If the two out_q and in_q are mailboxes, then the two types of tasks, Sender and Receiver, are postmen. They share the same underlying UDP package that is private here (only postmen are allowed to use the UDP post van!) and has a fixed size of messages. Note that this fixed size is given as a parameter to the Snd_Rcv package itself and is then used both for the queues and for the UDP package. Essentially the snd_rcv package is a postal service that handles just one pre-defined length of messages. An application may use of course as many different lengths of message it requires - all it needs to do is to create a different snd_rcv package (i.e. "postal service") for each of those lengths. Note also that the actual ports used by the Sender and Receiver are given as parameters - an application can create as many Sender/Receiver tasks as it wants and even bind them to different ports or to the same port, as desired. This gives maximum flexibility: an application can listen for messages on one port and send them out on a different port, while still having everything in one single queue; or it can listen and send through the same port via any number of Sender/Receiver tasks. Each Sender and Receiver task will simply bind its own local socket and then enter an endless loop in which the Sender picks up messages from the out_q and sends them through its socket via UDP lib, while the Receiver picks up messages from its socket via the UDP lib and writes them into in_q. The corresponding code is short (and it's made slightly longer by my choice of having each Sender/Receiver use its own local socket):

 -- S.MG, 2018

package body snd_rcv is
  -- sender
  task body Sender is
    E       : M_UDP.Endpoint;
    S       : M_UDP.Socket;
    Payload : M_Q.Payload_Type;
    Dest    : M_UDP.Endpoint;
  begin
    -- open the socket on local interface, specified port
    E.Address := M_UDP.INADDR_ANY;
    E.Port := Port;
    M_UDP.Open_Socket( S, E );

    -- infinite loop reading from out queue and sending via udp
    -- caller will have to call abort to stop this!
    loop
      out_q.Get( Payload, Dest.Address, Dest.Port);
      M_UDP.Transmit( S, Dest, Payload);
    end loop;
  end Sender;

  -- receiver
  task body Receiver is
    E      : M_UDP.Endpoint;
    Source : M_UDP.Endpoint;
    S      : M_UDP.Socket;
    Payload: M_Q.Payload_Type;
    Valid  : Boolean;
  begin
    -- open the socket on local interface, specified port
    E.Address := M_UDP.INADDR_ANY;
    E.Port := Port;
    M_UDP.Open_Socket( S, E );

    -- infinite loop reading from out udp and writing to inbound queue
    -- caller will have to call abort to stop this!
    loop
      M_UDP.Receive( S, Source, Payload, Valid);
      -- store ONLY if valid, otherwise discard
      if Valid then
        in_q.Put( Payload, Source.Address, Source.Port);
      end if;
    end loop;

  end Receiver;

end snd_rcv;

An alternative approach to the above (and one that I have implemented at first) was to have a single task Snd_Rcv that bound one single socket and then started on it its own sub-tasks for the actual sender and receiver. However, I find such an approach needlessly complicated and inflexible: it creates an additional layer in the hierarchy of tasks for no clear benefit (perhaps it would make sense if one added some sort of additional management of the sender/receiver tasks in there but at the moment it's unclear that any such thing is actually needed or needed here of all places); it is harder to read with the single and so far unconvincing benefit of a shared socket (so no repeated binding code); it forces some choices on any application using this package: the sender/receiver are forced as a package so there is no more option of just listening on a port and/or just sending on it; there is also no option of listening on one port and sending on another or indeed of creating - if needed - more senders than receivers or the other way around. Sure, it can be argued that several senders and receivers are anyway not likely to be required or that binding too many is likely to just increase packet loss or any other trouble. This is however up to higher levels of the application rather than the concern of this thin sender/receiver and since this implementation offers both highest flexibility AND highest clarity, I think it's the best option so far. As usual, feel free to let me know in the comments your reasons for disagreeing with this and your better solution for implementing a sender/receiver layer.

The above tiny amount of code would be all for this chapter if it weren't for 3 things: the need to relax yet another few restrictions; an example/test of using the above sender/receiver package; my decision to include the UDP lib as just another package of SMG comms rather than keeping it as a separate lib. This last part concerning the UDP lib accounts for most lines in the .vpatch and is essentially some noise at this level (since vdiff is not yet bright enough to figure out a move of files). The reason for it is mainly the fact that the UDP code is really meant to be used from this snd_rcv package and from nowhere else so I can't quite see the justification in keeping it entirely separate, with a .gpr file and everything else of its own and moreover - perhaps more importantly from a practical point of view - unable to directly use the basic types of smg_comms in raw_types. Note that this move does *not* make it in any significant way more difficult to replace this UDP code with another at a later time if that becomes available - it's still one package and those pesky C files, nothing else.

Going back to the need to relax a few restrictions - those are mainly restrictions related to the use of tasks. As both Sender and Receiver work in infinite loops2, the caller has to ruthlessly abort them when it needs them to finish (in Ada a task has to wait for all its sub-tasks to finish before it can finish itself). So the "No_Abort_Statements" restriction needs to go. The use of Abort is illustrated in the test/example code I wrote aka test_client and test_server. Similarly, because of the queues that use protected objects, the "No_Local_Protected_Objects" restriction had to go too. Here I must say that I am not sure I fully grasp why would it be better to have protected objects only as global rather than local? They are of course meant to be accessed from many places and therefore in "global" but this doesn't mean that they don't still belong somewhere and/ or that "access from several places" has to mean "access from ALL places". Finally, the restriction "No_Nested_Finalization" also had to go to allow the testing code to create the snd_rcv packages with different length of messages.

The testing code itself provides more of an example of using the snd_rcv package rather than a test as such since UDP communications are unreliable and therefore one can't really say in advance what one should get on the other side of the connection. At any rate, the test_server package provides an example of a basic "echo server" end of the connection: there are 2 Sender and 2 Receiver tasks working with Serpent-length and RSA-length packages on 2 different ports, respectively; there is also a "consumer" task for each type of package, simply taking it out of the inbound queue, printing it at the console and then echoing it back to the source aka writing it into the outbound queue for the Sender to send. The example awaits for a pre-defined total number of packages so it may remain waiting if the other end sends fewer packages or fewer packages make it all the way. At any rate, once all the expected messages are received, the whole application (aka the main task) simply aborts all the tasks it created and then finishes itself:

 -- S.MG, 2018
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;
with Snd_Rcv;
with Raw_Types;

procedure Test_Server is
  PortRSA: Interfaces.Unsigned_16 := 44340;
  PortS  : Interfaces.Unsigned_16 := 44341;
  N_S    : Interfaces.Unsigned_8 := 105;
  N_RSA  : Interfaces.Unsigned_8 := 82;
  package Snd_Rcv_RSA is new Snd_Rcv(Raw_Types.RSA_Pkt'Length);
  package Snd_Rcv_S is new Snd_Rcv(Raw_Types.Serpent_Pkt'Length);

  -- sender/receiver tasks --
  -- sender RSA and Serpent
  Sender_RSA: Snd_Rcv_RSA.Sender( PortRSA );
  Sender_S  : Snd_Rcv_S.Sender( PortS );
  -- receiver RSA and Serpent
  Receiver_RSA: Snd_Rcv_RSA.Receiver( PortRSA );
  Receiver_S: Snd_Rcv_S.Receiver( PortS );

  -- Serpent Consumer
  task s_cons is
    Entry Finish;
  end s_cons;
  task body s_cons is
    Payload: Raw_Types.Serpent_Pkt;
    A: Interfaces.Unsigned_32;
    P: Interfaces.Unsigned_16;
  begin
    for I in 1..N_S loop
      -- consume one message and echo it back
      Snd_Rcv_S.in_q.Get(Payload, A, P);
      Put_Line("S msg " &
               Interfaces.Unsigned_8'Image(Payload(Payload'First)) &
               " from " & Interfaces.Unsigned_32'Image(A) &
               ":" & Interfaces.Unsigned_16'Image(P));
      -- echo it back
      Snd_Rcv_S.out_q.Put(Payload, A, P);
    end loop;

    accept Finish;
    Put_Line("S Cons got the finish.");
  end s_cons;

  -- RSA Consumer
  task rsa_cons is
    Entry Finish;
  end rsa_cons;
  task body rsa_cons is
    Payload: Raw_Types.RSA_Pkt;
    A: Interfaces.Unsigned_32;
    P: Interfaces.Unsigned_16;
  begin
    for I in 1..N_RSA loop
      -- consume one message and echo it back
      Snd_Rcv_RSA.in_q.Get(Payload, A, P);
      Put_Line("RSA msg " &
               Interfaces.Unsigned_8'Image(Payload(Payload'First)) &
               " from " & Interfaces.Unsigned_32'Image(A) &
               ":" & Interfaces.Unsigned_16'Image(P));
      -- echo it back
      Snd_Rcv_RSA.out_q.Put(Payload, A, P);
    end loop;

    accept Finish;
    Put_Line("RSA Cons got the finish.");
  end rsa_cons;

begin
  Put_Line("Test server");
  -- wait for consumers to finish
  rsa_cons.Finish;
  s_cons.Finish;

  -- abort the sender & receiver to be able to finish
  abort Sender_S, Receiver_S, Sender_RSA, Receiver_RSA;
end Test_Server;

Similarly to the server example code above, an example client sends both RSA and Serpent packages and has consumer and producer tasks for both:

 -- S.MG, 2018
with Snd_Rcv;
with Interfaces;
with Ada.Text_IO; use Ada.Text_IO;
with Raw_Types;
with UDP;

procedure Test_Client is
  PortRSA   : Interfaces.Unsigned_16 := 34340;
  PortS     : Interfaces.Unsigned_16 := 34341;
  N_S       : Interfaces.Unsigned_8 := 105;
  N_RSA     : Interfaces.Unsigned_8 := 82;

  Server    : String := "127.0.0.1";
  package test_udp is new UDP(10);
  ServerA   : Interfaces.Unsigned_32 :=
                test_udp.IP_From_String(Server);
  ServerRSA : Interfaces.Unsigned_16 := 44340;
  ServerS   : Interfaces.Unsigned_16 := 44341;
  package Snd_Rcv_RSA is new Snd_Rcv(Raw_Types.RSA_Pkt'Length);
  package Snd_Rcv_S is new Snd_Rcv(Raw_Types.Serpent_Pkt'Length);
  -- sender RSA and Serpent
  Sender_RSA: Snd_Rcv_RSA.Sender( PortRSA );
  Sender_S  : Snd_Rcv_S.Sender( PortS );
  -- receiver RSA and Serpent
  Receiver_RSA: Snd_Rcv_RSA.Receiver( PortRSA );
  Receiver_S: Snd_Rcv_S.Receiver( PortS );

  -- producer of serpent messages
  task s_prod is
    entry Finish;
  end s_prod;
  task body s_prod is
    Payload : Raw_Types.Serpent_Pkt := (others => 10);
  begin
    Put_Line("S Producer with " &
             Interfaces.Unsigned_8'Image(N_S) & "messages.");
    -- send the messages with first octet the number
    for I in 1..N_S loop
      Payload(Payload'First) := I;
      Snd_Rcv_S.out_q.Put( Payload, ServerA, ServerS);
      Put_Line("Sent S message " &
                Interfaces.Unsigned_8'Image(I));
    end loop;

    -- signal it's done
    accept Finish;
    Put_Line("S prod got the finish.");

  end s_prod;

  -- producer of RSA messages
  task rsa_prod is
    Entry Finish;
  end rsa_prod;
  task body rsa_prod is
    Payload : Raw_Types.RSA_Pkt := (others => 20);
  begin
    Put_Line("RSA Producer with " &
             Interfaces.Unsigned_8'Image(N_RSA) & "messages.");

    -- send the messages with first octet the number
    for I in 1..N_RSA loop
      Payload(Payload'First) := I;
      Snd_Rcv_RSA.out_q.Put( Payload, ServerA, ServerRSA);
      Put_Line("Sent RSA message " &
                Interfaces.Unsigned_8'Image(I));
    end loop;

    -- signal it's done
    accept Finish;
    Put_Line("RSA prod got the finish.");

  end rsa_prod;

  -- Serpent Consumer
  task s_cons is
    Entry Finish;
  end s_cons;
  task body s_cons is
    Payload: Raw_Types.Serpent_Pkt;
    A: Interfaces.Unsigned_32;
    P: Interfaces.Unsigned_16;
  begin
    for I in 1..N_S loop
      -- consume one message
      Snd_Rcv_S.in_q.Get(Payload, A, P);
      Put_Line("S msg " &
               Interfaces.Unsigned_8'Image(Payload(Payload'First)) &
               " from " & Interfaces.Unsigned_32'Image(A) &
               ":" & Interfaces.Unsigned_16'Image(P));
      -- do NOT echo it back
    end loop;

    accept Finish;
    Put_Line("S Cons got the finish.");
  end s_cons;

  -- RSA Consumer
  task rsa_cons is
    Entry Finish;
  end rsa_cons;
  task body rsa_cons is
    Payload: Raw_Types.RSA_Pkt;
    A: Interfaces.Unsigned_32;
    P: Interfaces.Unsigned_16;
  begin
    for I in 1..N_RSA loop
      -- consume one message
      Snd_Rcv_RSA.in_q.Get(Payload, A, P);
      Put_Line("RSA msg " &
               Interfaces.Unsigned_8'Image(Payload(Payload'First)) &
               " from " & Interfaces.Unsigned_32'Image(A) &
               ":" & Interfaces.Unsigned_16'Image(P));
      -- do NOT echo back
    end loop;

    accept Finish;
    Put_Line("RSA Cons got the finish.");
  end rsa_cons;
begin
  Put_Line("Test client");
  -- wait for producers/consumers to finish
  rsa_prod.Finish;
  s_prod.Finish;
  rsa_cons.Finish;
  s_cons.Finish;

  -- abort the sender & receiver to be able to finish
  abort Sender_S, Receiver_S, Sender_RSA, Receiver_RSA;
end Test_Client;

One important issue to note here is the way in which exceptions (hence: potential issues) will be handled in this specific implementation of the Snd_Rcv package: since the Sender and Receiver are tasks and don't handle any exceptions themselves, it means that an UDP "eggog" aka exception3 will have as effect the silent death of the Sender/Receiver in which it happens4. I did consider ways of handling such exceptions rather than letting them kill the task silently but so far at least I don't quite see what the task itself can do other than re-trying whatever it was trying to do when it failed. While this could perhaps be considered a better option than not handling exceptions at all, it's been pointed to me that UDP errors mean almost always some hardware failure and as such a re-try is not going to help at all. Moreover, re-trying means also that the failure remains hidden from the calling task since there is no way in which one would be able to tell whether a task is just stuck re-trying or actually proceeding with its work just fine. Considering all this, I decided to leave it for now to the higher level task to monitor its subtasks if/when desired and take action accordingly (e.g. check perhaps periodically if a Sender/Receiver is Terminated or in other abnormal state). This doesn't mean of course that the code can't be changed at a later date to provide a different approach to handling this - all it means is that currently this is the best decision I can see given what I know so far.

With this chapter, the SMG Comms code provides everything that is needed to build on top of it a basic client for Eulora that is compliant with the published communication protocol. So I'd suggest to anyone interested in this to give it a go since starting now means that they would have some time to tinker with it before everything else is in place! At any rate, the SMG Comms series takes at least a break for now (at a later stage there should be a few bits and pieces to add) as I'll focus for a while more on the server-side moving parts that need to be done before Eulora can finally fully work on a sane protocol. The full .vpatch for this chapter and my signature for it:


  1. It's a pleasure to use Ada's implementation of threads aka tasks. But it still can take me a while to be able to say I get something to some reasonable degree - even when the something in question is actually a good implementation of threads, what can I do. 

  2. And note that even if I'd implement this with a select to allow some sort of "Shutdown" option, the task could still be stuck waiting on a queue or on UDP since the calls to those are blocking when there is nothing to retrieve yet/no place to write to; so the caller would STILL have to do an abort or at least to be prepared to do abort if the shutdown is not obeyed within some interval. In a nutshell, a shutdown option would therefore still not work at *all times* and as a result I can't quite see why bother with it really. 

  3. There are currently 7 of those with clear names: UDP_Invalid_Text_IP, UDP_Failed_Open, UDP_Failed_SetOpt, UDP_Bind, UDP_Failed_Transmit, UDP_Truncated_Send, UDP_Failed_Receive. 

  4. In Ada the exceptions are not propagated to the parent task since they would be potentially too disruptive and I can see that quite clearly. 

« Newer PostsOlder Posts »

Theme and content by Diana Coman