Ossa Sepia

June 18, 2019

Euloran Blu's and Boos

Filed under: Coding, Open Sores — Diana Coman @ 9:24 a.m.

The legacy code base on the client side can best be described as a pile of tangled pointerisms and assorted CPP mess under the name of Planeshift (PS) on top of a smaller ball of ifdefism1 and XMLade called CrystalSpace (CS), on top of a comparatively saner and at any rate smaller ball of CPP called Cal3d, the whole construction bursting with a lot of good intentions and precious little useful structuring or maintenance consideration. While sanity seems to quickly decrease from bottom (cal3d) up (to CS and then various parts of PS), what strikes one even more is that the whole thing is afflicted with a grave case of XML addiction that starts from CS and then simply comes to its ugly fruition in PS. It has obviously been written at the very peak of the XML hype - everything, and I do mean *everything*, everywhere, is best described in XML! It could be argued that the authors simply wanted to have a lot of XML of their own but since they couldn't do just that2, they had to also do a graphics engine and a sort of game on the side, almost as an afterthought really. Come to think of it, why not call it then at least XMLSpace and XMLShift?

While CS itself has at least some clear initial design and therefore some structure (if a very clunky one at the best of times, mainly because the author really had too many good intentions), its main underlying assumptions are precisely the opposite of mine: besides the total focus on xml, there is also this obsession with event-driven design (which makes it very hard to actually follow anything from one end to the other), the generic plugins (which again means that doing the simplest thing involves a huge scaffolding upfront) and the fact that the whole thing tries rather desperately to make itself "easy to use" by those who don't understand (nor even look into) the code, at the expense of course of those who do, since there is no possible way to have one without the other. Leaving aside for a bit how one can never really do the work of others for themselves, the sad result is that tracking down anything in full becomes extremely difficult. The documentation helps here but only up to a point since it is not exactly up to date and, more importantly, it doesn't really focus on helping the reader understand the code - it focuses instead, with all its good intentions, on helping the reader use the library without really understanding it, as a sort of black or at best graying box. And moreover, pervasive everywhere, there is the wrong sort of "simplification": while the weight of it all becomes obvious to the author too (it's quite usual for various constructors to have at least 10 parameters and they are already several layers deep so they anyway call parent constructors with yet another set of parameters), he doesn't see or choose a better abstraction that would help rather than hinder the effort. Instead, he arbitrarily fixes some bits and pieces: for instance, the terrain2 plugin simply expects the heightmap to be in a file called sectorName_heightmap in exactly the current path of the VFS3. In other, fewer words, weaning the legacy code base off its unsavoury assumptions and xml-addiction so that it simply loads art as it becomes available is extremely frustrating and time-consuming, making the overall situation quite foggy blue with cornered cases and hardly anything useful in sight:
cube_eulora

The art4 that you just admired above might not look like much but it's a big achievement on Euloran soil: for the first time in Euloran history, a Testy character (still sadly xml-based himself) made it outside the pre-defined sphere of the XML-Island and into the great unknown of the non-predefined, non-xml, fully code-loaded and discovered-as-you-go world! It's true that the world is for now only a cube with rather sharp and unyielding corners and no landscape at all, it's true that the very three suns that gave reasonable light on the Island look foggishly blue in here and it's sadly true that the character itself is still trapped in the XML bubble. But it's a working start and moreover it's a seed that has just taken root in the hostile soil of the legacy code so that it can grow out of it, hopefully discarding as it grows more and more of the unhelpful mess!


  1. This is a term of art, the logs will help you. 

  2. Why? Why can't you do *just that*, whatever it is you actually want, did you ever ask yourself? 

  3. Virtual File System inside CS.... 

  4. In computer graphics EVERYTHING is art, ok? There is no such thing as mere fitted drawings and clumsy, clunky approximations, no, no. All the games have nothing but art in them! Hell, they *are* nothing but art. Well, of snorts, yes. 

June 11, 2019

Eulora Client Hierarchy of Data Draft

Filed under: Eulora — Diana Coman @ 10:29 p.m.

This is a very first draft to help with the current work on enabling Eulora's GUI client to make use of new information as it becomes available from the server rather than expecting everything to be fully known upfront (as it currently does, laugh you not!).

Eulora's data is one big hierarchy with three main types of information: structure, description and representation. A minimal sketch of what one node in this hierarchy might look like follows the outline below.

  • 1. Structure (i.e. what entities are inside what other entities)
    Structure is always a tree of entities where each entity is given through ID (unsigned 32 bits) + Position (x,y,z,rx,ry,rz):

    The Root of the world will ALWAYS contain:

    • an *abstract object* with meta-info:
      • ID of "self" (SelfID)
      • ID of current location (aka "sector" or room or world or whatever else it is) (LocID)
      • List of Skills: list of (skillID, skillName, skillCategory, skillRank, skillXP, skillKnowledge) (?)
      • Game Date and Time (?) --- is this a meta-matter or is it a location-property?
    • the *top-most ID* (aka topmost entity); NB: this is not always the same as LocID since the character may move into a location inside another location, presumably.
    • from top-most ID down, EACH node will have zero or more contained entities (hence, child nodes given through their unique ID and concrete position within their parent).
    • NB: each node can also have in addition to its structure-children (hence entities), any relevant number of description-children (see 2 below) or representation-children (see 3 below).

    • 2. Description of what entities themselves "are" through their relevant properties

      ANY structure-node may further have, alongside its children, one or several of the following properties (each given as a (property, value) tuple):

      • name
      • description (??)
      • (- equipment slots (list of (slotID, entityID) ?) -- are those any different from "contained"? I can't see a truly significant difference.)
      • stack count
      • quality
      • HP (hitpoints)
      • BP (bloodpoints)
      • SP (stamina points)
      • MP (mana points)
      • SpP (spirit points)
      • MSP (mana spirit points)
      • weight
      • bulk
      • weight limit (max weight)
      • bulk limit (max bulk)
    • 3. Available representation of entities (graphics, sounds, effects, maps etc)

      NB: in principle there can easily be a structure + properties split at this level too since a graphical representation may have its own structure (e.g. map defined as a hierarchy of contained terrain + geometry meshes that have in turn different submeshes with corresponding textures/materials). This seems rather unseemly though and it's unclear if there really is a need for a hierarchy of graphical detail that is anything OTHER than hierarchy of world entities to start with. So perhaps this can be flattened so that representation of entities means just that: a list of relevant graphical characteristics of ONE entity.

      ANY structure node may further have, alongside its entity-children and its properties, one or several of the following representations:

      • Mod2D (aka 2D model; at its most basic, the filename of icon to show for this; tbd)
      • Mod3D (aka 3D model; at its most basic, spec for mesh including material/texture and sockets if any; tbd) ; NB: sockets allow graphical representation of contained (specifically "equipped" or "component") entities and therefore they should probably be identified by "position" so that the entity "child" on that position will then give the actual representation shown to the user (?).
      • ModAnimations tbd
      • ModMap (aka map for this entity; at its most basic heightmap + size (or is this fixed?); possibly also a list of materials to use + corresponding density)
      • ModWeather (unclear yet if this is any different from sounds+effects really)
      • ModSounds tbd
      • ModEffects tbd (unclear yet if this is substantially different from animations+sounds)
      • ModSkin tbd (stuff like window borders and button colours)

      ANYWAY: GUI may request whatever it wants from local cache, regardless of cache's own internal representation (just as cache may store data in any form, regardless of smg's own description of hierarchy).

    TO DO: flesh out in more detail each possible representation, for GUI to ask & work with + get it to actually work outside of/inside the intricate mess of ps paws, widgets, managers and whatnots maggots.
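
For future reference, here is roughly how such a node could be jotted down in Ada - purely my own illustrative sketch with made-up names (Entity_Node, Property and so on), not anything lifted from actual client code; contained entities are given through their IDs only (to keep the sketch free of access types) and property names/values are fixed-size strings just to keep it short:

with Interfaces; use Interfaces;

package Draft_Hierarchy is

   -- an entity is identified through a 32-bit unsigned ID, as per the draft above
   type Entity_ID is new Unsigned_32;

   -- position of an entity within its parent: x, y, z + rotations
   type Position is record
      X, Y, Z    : Float;
      RX, RY, RZ : Float;
   end record;

   -- a description-child: a (property, value) tuple; plain fixed strings for brevity
   type Property is record
      Name  : String (1 .. 32);
      Value : String (1 .. 32);
   end record;

   type Property_List is array (Positive range <>) of Property;
   type Child_List    is array (Positive range <>) of Entity_ID;

   -- one structure-node: its own ID and position plus zero or more contained
   -- entities (given here through their IDs) and zero or more properties;
   -- representation-children would hang off it in the same manner
   type Entity_Node (N_Children, N_Properties : Natural) is record
      ID         : Entity_ID;
      Pos        : Position;
      Children   : Child_List (1 .. N_Children);
      Properties : Property_List (1 .. N_Properties);
   end record;

end Draft_Hierarchy;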

May 27, 2019

Eulora's Client Core: The Dedicated Requester

Filed under: Coding, Eulora — Diana Coman @ 9:38 p.m.

A crucial requirement of Eulora's new client is to actively request from the server ANY data that it may find itself missing at any point in time. At first glance, this seemed to me simply a matter of providing request services1 from Eulora's new Ada-based core and then adjusting the existing C/CPP code of the legacy client to make use of those services. This rather optimistic idea is of course plain wrong: "adjusting the existing C/CPP code" in this context is similar to saying that one "adjusts" a sheep to use the library - while it can certainly be done for various definitions of "done" and the sheep may indeed use the library one way or another, it's at best a huge waste (of time, of resources, even possibly of steak) for everyone involved and no matter how one looks at it.

Even leaving aside for a moment the trouble with "adjusting" the legacy tangle in any direction, the more important issue here is that this requirement is not as much a functional requirement as a non-functional, quality of service requirement: whichever part of the client provides data services, it had better be dedicated to the task and do whatever it takes to get it done instead of "providing" it only fair-weather style - if it's easy, there you are and if it's not easy then it's your problem really. In other words and in marked contrast to the very democratic "best"-effort-you-can-do-anything-anytime existing C/CPP code2, the new code should have a clearly defined task and then either complete it or die trying over and over again, taking full responsibility for the process involved, not just for some specific detail conveniently chosen nor - as an excuse for not delivering - for the outcomes that are not fully under its control3.

Considering therefore "active and dedicated request" as a quality of data service on Eulora's client side, it follows that its place is rather close to the data cache mentioned previously and at any rate inside the new Client Core since it's certainly not some additional part of client logic nor some bit of user interface. However, I'm reluctant to make it the responsibility of the cache itself since the cache is a passive structure that focuses on *storing* data and *providing access* to it. Mixing passive data storage with active data acquisition doesn't make much sense to me and even seems ill-advised for Eulora's client given the competing requirements: on one hand passive, immediate-response local data storage and on the other hand active, possibly-delayed and world-facing (i.e. communicating with the server) data acquisition. So I'd rather avoid this passive-active construction and have instead the two as separate entities: a EuCache dedicated to storing and retrieving on demand *any* data; a Requester dedicated to acquiring *any* data that is demanded of it. Note that the definition of "acquiring" here has nothing to do with the means through which the Requester actually gets this data (specifically nothing to do with the exact messages sent/received/exchanged with the server). Acquiring some data means simply that the required piece of data becomes available in the local cache aka EuCache. So the Requester will keep requesting this data from the server through whatever means it knows until either the data arrives and becomes available from EuCache or otherwise the whole client kills itself for lack of server4 and therefore of any possibility of playing the game.

Specifically, the Requester will be implemented in EuCore (and therefore in Ada) as a protected object exposing only a few procedures that are essentially notifications: some are notifications of demands for some specific piece of data (either an Object or a File really since those are the only two overall types of game-data that one can request from the server); the others are notifications of data being received or of the timeout interval having elapsed (in other words a notification of failure to receive data). Note that the demand for an "Object" effectively means a demand of its properties and those might be indeed anything at all but each and every one of them will be either directly a value or otherwise an ID of another Object or of a File. All notifications (including those demanding data) are always accepted by the Requester but not necessarily immediately acted upon. Clients of the Requester do NOT control the actual requests to the server or messages exchanged and are not even concerned with those at all - the production of actual requests, their content and their timing are entirely the job of the Requester and under its sole control. Implementation-wise, the Requester will simply keep queues of requested Objects/Files and will then proceed as soon as it can to pack a request for as many of them as possible; this request will then be posted to the server and the Requester will set a timer to the timeout value so that in the worst case it is at least notified that nothing happened; when/if any data is received or when this timer expires, the Requester will check in EuCache to see what items (if any) of those requested are now available; any items that have become available will be discarded from the watchlist of the Requester (i.e. the demands for them are considered completed) and a new request may be created and posted to the server for any items that are still in demand but not yet available. Note that even in the event of a timeout, a "repeated" request to the server may not be identical to the previous request since the list of demanded data possibly changed in the interval.
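
To make the shape of this more concrete, here's a minimal compilable sketch of such a protected Requester, with procedure names and the trivial bodies entirely of my own making for illustration (the actual EuCore unit is not published here and surely differs in the details):

with Interfaces;

procedure Requester_Sketch is

   type Item_ID is new Interfaces.Unsigned_32;

   protected Requester is
      -- notifications of demand: some part of the client needs this Object / File
      procedure Demand_Object (ID : in Item_ID);
      procedure Demand_File   (ID : in Item_ID);
      -- notification from a Consumer: new data arrived and is now in EuCache
      procedure Notify_Data_Received;
      -- notification from the timer: the timeout interval elapsed with nothing received
      procedure Notify_Timeout;
   private
      Pending : Natural := 0;  -- placeholder for the real queues of demanded Objects/Files
   end Requester;

   protected body Requester is
      procedure Demand_Object (ID : in Item_ID) is
      begin
         Pending := Pending + 1;  -- the real thing would queue ID for the next packed request
      end Demand_Object;

      procedure Demand_File (ID : in Item_ID) is
      begin
         Pending := Pending + 1;
      end Demand_File;

      procedure Notify_Data_Received is
      begin
         null;  -- the real thing would check EuCache and drop any now-available items
      end Notify_Data_Received;

      procedure Notify_Timeout is
      begin
         null;  -- the real thing would re-pack and re-post a request for still-missing items
      end Notify_Timeout;
   end Requester;

begin
   Requester.Demand_Object (42);  -- anyone may demand; what happens next is the Requester's business
end Requester_Sketch;

The point of this shape is simply that callers only ever notify: the packing, posting and timing of the actual requests to the server stay entirely inside the protected body.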

One potentially iffy point for the Requester is its need to be notified of any incoming data. At the moment I don't see any real way around this, short of making the Requester poll the EuCache at set times and check whether any data of interest has meanwhile arrived. I don't really like this polling approach here because it's rather wasteful without good reason: any incoming data is indeed received and processed by another unit that is also on the same level as the Requester, namely the Consumer (the part that processes messages from the inbound queue). So the Consumer will have to notify the Requester when new data is received. While several Consumers may be active at the same time (at least one for Serpent and one for RSA messages) this is not a problem at all since the Requester is anyway a protected object i.e. thread-safe. Note also that even if (some of) the consumers fail to notify the Requester of some incoming data, the whole thing will still work, only slower than it could: the timeout timer will wake up the Requester and the check of data will happen there at any rate. In other words, the Requester is capable of reacting to events if notified of them but not dependent on those notifications to do its job correctly.

Given its rather complex task, the Requester is currently on the top conceptual layer of EuCore, making use of quite a lot of other units from lower levels. Currently, the main relevant units on this top level are the following:

  • Data Cache aka EuCache - this is a passive, thread-safe entity responsible for storing any and all data given to it and for retrieving or deleting it on demand. As such, it *owns* the specific format in which data is stored5 and it simply exposes functions and procedures for storing, retrieving, deleting and checking for data (a rough interface sketch follows this list).
  • Communication Link aka Comm_Link - this is a passive, thread-safe entity responsible for persistent storage and updating of communication details, most notably RSA and Serpent keys for inbound and outbound communications as well as a message counter. This is effectively a specialized cache - where EuCache is for game data, Comm_Link is for communication protocol data. The requirements (including use contexts) and specifics of the two seem to me sufficiently different to keep them separate at least for now.
  • Consumers of incoming messages (RSA and Serpent) - those are separate, active tasks, responsible for processing incoming messages. Consumers own and get to define what "processing" means exactly but their role is to extract the data contained in the messages received and make it available for use by Eulora's client. In practice this means currently that Consumers will pass any data received on to EuCache for storage and will notify the Requester of any data receipt.
  • Requester - this is an active, thread-safe entity responsible for acquiring data that is in demand. It owns the process of data acquisition from the server and it accepts any demands of data specified by some identifier. While it guarantees that all demands will be served at some point in time as long as the whole program is running, it does not (and cannot) guarantee when or even whether the corresponding data becomes available. It can't even guarantee that there IS any corresponding data, so ALL it can do is guarantee that it will try to acquire each and every bit of data with the same dedication. Anyone demanding data can of course pester the Requester with more demands or give up or decide for themselves for how long they can or will wait.
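
Since EuCache's interface is meant to be just as thin as described, here's a rough sketch of its store/retrieve/delete/check shape too - again with names entirely of my own choosing and a string-keyed map standing in for the real storage, so that the sketch compiles on its own:

with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Strings.Hash;

procedure EuCache_Sketch is

   -- game data keyed by a string ID, purely for illustration
   package Data_Maps is new Ada.Containers.Indefinite_Hashed_Maps
     (Key_Type        => String,
      Element_Type    => String,
      Hash            => Ada.Strings.Hash,
      Equivalent_Keys => "=");

   protected EuCache is
      procedure Store  (Key : in String; Value : in String);
      procedure Delete (Key : in String);
      function Has      (Key : String) return Boolean;
      function Retrieve (Key : String) return String;
   private
      Data : Data_Maps.Map;
   end EuCache;

   protected body EuCache is
      procedure Store (Key : in String; Value : in String) is
      begin
         Data.Include (Key, Value);  -- add or overwrite
      end Store;

      procedure Delete (Key : in String) is
      begin
         Data.Exclude (Key);         -- remove if present
      end Delete;

      function Has (Key : String) return Boolean is
      begin
         return Data.Contains (Key);
      end Has;

      function Retrieve (Key : String) return String is
      begin
         return Data.Element (Key);  -- raises if not present, so check Has first
      end Retrieve;
   end EuCache;

begin
   EuCache.Store ("object:42:name", "shiny tool");
   if EuCache.Has ("object:42:name") then
      EuCache.Delete ("object:42:name");
   end if;
end EuCache_Sketch;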

And now that the main idea and overall design of this whole thing is at least quite clear to me, there remains of course the small bit of actually implementing it in full (rather than the current skeleton I already made) and sorting out the various implementation-level troubles that will appear as they always do, in all sorts of details. Relatively easy work compared to what follows it inevitably, namely teaching that C/CPP sheep to use the EuCore library...


  1. Mainly picking, packing and encrypting everything in the right format + sending it through to the correct place. 

  2. Seriously, think of it: existing client code is this event-driven thing where *anyone* can subscribe to any event and then "do" anything at any time provided of course that the any and anything are in fact very narrowly defined and set in stone even to the level of names of various art files (it has to be a zoneinfo.xml inside this and that and put exactly there, sort of thing). If this narrowing of "do" was not a price high enough to pay for such code "liberty", there is also the added reality of a huge tangle that does precious little indeed since everyone ends up calling anything from anywhere and no piece of code is really and truly responsible for anything bigger than a few variables here and there. And at the end of the day how could any code even be responsible for anything since it can't *own* any process by itself (shared! event-driven!) and it has to be passive, mainly reacting to some events or at most... signalling through events of its own but never able to rely on anyone giving a fig about its signalling! So there it is, code - like anything you do - is more of a mirror than you might think. And "teaching people to code" has way more layers than "teach them Java" and certainly more issues than current "courses" even seem to be able to imagine. 

  3. And now that it's clearly stated like this, tell me first just how many people you know who actually handle their own work like that? And just what bit of "programming language" you think one needs to teach so that you get programmers to actually design their code this way? 

  4. This "lack of server" is defined in practice as a number of successive timeouts on requests sent to the server, where the specific threshold value is chosen by the user via a config file. 

  5. Currently game objects are stored in memory in Hashmaps aka associative arrays while art/files are stored directly on disk. Note however that any of this can be changed without touching anything outside EuCache as it's nobody else's business. 

May 4, 2019

Eulora's Client Core - Basic Docs

Filed under: Coding, Eulora — Diana Coman @ 8:56 p.m.

As I'm doing anyway the work required for a saner Eulora client, I'm stuck as well with writing some minimal documentation for it, since on one hand there's no one else who can quite write it anyway and on the other hand there is no such thing at all1 for the old piece of inherited gnarl so the slate is blank, the space infinite and the need impossible to know in advance. So I'll just jot it down here for future reference and if you have anything to say about it, the sooner you write it down in the comments section below the better for you.

As general structure, the current plan for the new client is to have essentially 2 main parts: one is the actual Core, written in Ada and - hopefully2 - provided as a standalone library, blissfully unaware of ANY of the C/CPP legacy tangle; the other is an updated version of the current C/CPP tangle of a "client" that will use the Core for pretty much everything except the actual GUI. The GUI remains provided via CrystalSpace and cal3d, as it currently is. In other words, the Core offers an implementation of a basic Euloran client, to be used from/by any UI, graphical or otherwise as you care to make for yourself. Note that the interface is NOT the only thing you can build on top of the client Core. Essentially any client logic, bots and anything of the kind would still be the sort of thing built on top of / around the Core. A quick diagram of the whole thing looks like this:
2019_05_04_clientcore_struct_narrow

The Core itself is currently made of 3 conceptual layers, in order from bottom to top: encryption utilities (EuCrypt), SMG communications (SMG_Comms), client data cache. The encryption utilities are those of EuCrypt: CRC32 calculation, Serpent encryption/decryption, Keccak encryption/decryption, RNG interface using Fuckgoats, RSA key generation, RSA encryption/decryption using TMSR OAEP schema. The SMG communication part provides mainly: UDP with fixed-size but configurable payload, in/out message queues and attached "consumer" tasks, config file reader, packing/unpacking (i.e. raw, bottom-most layer of SMG communication protocol), read/write of messages conforming to SMG communication protocol specification, thin layer on top for encapsulating a "communication link" i.e. required addresses, ports and keys for both ends. The client data cache keeps effectively the whole world of Eulora as the client knows it at some specific time and provides thread-safe getters and setters for anything and everything that it can possibly hold.

Note that the client's knowledge of the world is by definition a cache as it flows always from the server and entirely in response to client's own requests - there is by design no "push" of information or anything else from server to client at any time. So the client may request from the server anything and everything it wants to3, having however to decide for itself what to make of the answers it receives and even what "obsolete" means in various contexts. At the moment this cache is still in the works and for this reason the least fully-fledged but at any rate, it has to grow around the structuring of data that is discussed in the SMG protocol specification: a potentially infinite hierarchical structure of "objects"4 having each a subset of an overall set of potential "properties"5 with their corresponding "values"6. Due to this trinity of objects-properties-values with its corresponding trinity of ID spaces, I took to calling the whole thing a tricache but the name, like the design, is still in the works so they'll both have to enjoy their own detailed description at a later date. Until then, what have *you* been working on and how is that going?


  1. Perhaps you count this spaghetti piece as documentation but even so, just note who extracted it in the first place, anyway. 

  2. I'm holding quite tightly to this idea as I certainly do NOT want to have it depend on any of the current C/CPP mess. This being said, I can't know in advance what I might still uncover so for now and until it's all done, the qualifier remains "hopefully" rather than surely. 

  3. This in practice is of course quite severely restricted by what it *knows*, unsurprisingly. 

  4. Where "world", "map", "actor" or "pot" are equally objects and not even the only objects. Essentially an object is anything that is present in the game's world. 

  5. A property is a pre-defined and public structure that specifies the *type* of values that make up the property and their actual meaning. 

  6. The values may be given directly (e.g. "3" or "somename") or by reference aka as an ID. This ID may be itself either the ID of a suitable object or the ID of a value that is essentially too big to pass around every time - in the protocol's specification this sort of ID is currently called for historical reasons "filename" but it's worth noting that there is no imposition for it to be stored as a file at any point. 

April 17, 2019

Reading Notes on FFA Ch1 - Ch4

Filed under: Coding — Diana Coman @ 7:57 p.m.

While Stan says - and I believe him - that each chapter of FFA is meant to take one somewhere between 1 and 2 hours to grasp, I can confidently say by now that either I don't do grasping at all1 or I'm a very, very slow grasper/learner (quite possible at that). Sure, I could always *read* one FFA chapter in 1-2 hours and even with pen and paper but it took way longer to get to the point where I'm happy to sign it. And yes, it's also perfectly true that *now* it takes me about 1 hour give or take to go *again* through one of the FFA chapters I have already studied but that's the sort of "oh, 3 hours only; after how many years?" timing. While part of this "speed-up" is inevitably down to familiarity with the material, there is also a large component of pitfalls that I can avoid now, after having been in and out of them before. And while I tend to at least not fall for the same mistake twice, this is not to say that having the obtained notes written down in one place is useless - at the very least, they will serve me as a quicker way of getting back into FFA at a later stage, a bit of quicker-loading of sorts.

The structure of FFA itself is quite logical but I still kept stumbling over / forgetting bits and pieces as I went. So I made myself a map of the FFA packages, updated with every chapter and kept handy for easier reference. I found this map especially useful as a way of making sure I don't miss anything since otherwise the FFA series is (by design, I think) quite strictly presented from the bottom up and I tend to struggle to keep in memory all moving parts unless I can link them together even in the loosest of ways to some bigger picture2. So here are my FFA3 maps so far with whatever observations, comments and various nitpicks of all sorts I still have attached to each chapter.

Chapter 1

This chapter introduces the very foundation of FFA: what a FFA number is (FZ by its name), what it is made of ("Word" components) and how one can do - in constant time, of course - basic operations at Word level and very simple arithmetic operations (+ and -) with whole FZs. Crucially, this chapter introduces FFA's approach to implementing "if" without actually branching: the Mux selector. Notes and nitpicks:

  • Packages are Ada's units for structuring code, hence the units in the diagram below are exactly that: Ada packages. The ".ads" only packages happen to not have a body - this means simply that they group together a set of definitions (types, constants, variables or even imported procedures/functions) but do not require/have anything else.
  • In Words.ads I puzzled at first over why not use for Byteness both Iron.MachineBitness and Iron.ByteBits rather than using one constant from Iron and one local to the Words package, especially since the local value is defined precisely equal to the one in Iron and I still get confused over simple names like that. I suppose the idea is though that this is not necessarily true if one wants to have a Word that doesn't directly match the underlying iron (or rather the other way around: if one runs the program on whatever non-standard iron). Alternatively I suppose one can simply define "MachineByteness" in Iron as well and then simply define Words.Byteness = MachineByteness. It's just the mix of the two that pokes me in the eye even though it's just such a tiny detail.
  • In FZ_Type my inner maniac keeps asking: why Word_Index and not FZWord_Index, why Word_Count and not FZWord_Count given FZBit_Index? Sure, there is no other word_index than the one in a FZ (while there are bits in a Word too for instance) but each tiny non-standardised name adds to the number of things I need to remember and my memory for this sort of stuff is pretty much entirely absent. So for all the additional, annoying task of having to write FZWord_Index every time instead of the shorter Word_Index, I'd still rather do it and standardise the names: if it relates to a FZ, then it starts with FZ (or any other standard really, as long as it doesn't have exceptions).
  • The W_Mux is a FUNDAMENTAL mechanism of FFA: effectively it is "if" the FFA way (i.e. constant-time and without actual branching). So it's worth studying this in more detail if you are not familiar with the way a mux works - there's a small illustrative sketch of the idea right after this list, too.
  • The WBool type can be at times a bit ambiguous in use since it's both a true/false sort of thing by definition (1 and 0 are the only values it can take) and the direct result of arithmetic operations. The key to this is of course the fact that the arithmetic operations in question have a 1-bit long result (or look at only 1 bit of a larger result) and so their possible range of values is restricted to 0 and 1.
  • In W_Shifts the Amount parameter is everywhere of Natural type, presumably to fit the import requirements. However, given the intended use as part of FFA, I think this should logically be one of the Index types defined and most specifically WBit_Index (from Words.ads) since those shifts are done at word level (hence any shift by more than WBit_Index'Last is just the same as a shift by WBit_Index'Last anyway).
  • A crucial observation regarding indices and counts: a FFA number (i.e. a FZ) can have by definition any range of indices for its component words (e.g. an FZ could be composed very well of 3 Words on positions 4, 5 and 6 in principle) BUT it has to have at any given time AT LEAST 1 word. This of course is perfectly sane since there is no such thing as a number of 0 length (0 itself has length!) but it's worth noting it since it comes into such wonderful contrast with previous sad "bignum" implementations that confuse 0 with "doesn't exist".
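
Since the mux really is this important, here's a small self-contained toy of the branch-free selection idea, written on plain machine words rather than full FZs; note that this is my own illustration of the principle and not necessarily letter-for-letter the expression in FFA's W_Mux, so read the actual Chapter 1 code for that:

with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

procedure Mux_Demo is

   subtype Word  is Unsigned_64;
   subtype WBool is Word range 0 .. 1;

   -- Sel = 0 selects A, Sel = 1 selects B; (Sel - 1) wraps around to all-ones
   -- or stays all-zeroes, so the selection happens with no "if" anywhere.
   function Mux (A, B : in Word; Sel : in WBool) return Word is
   begin
      return B xor ((Sel - 1) and (A xor B));
   end Mux;

begin
   Put_Line (Word'Image (Mux (111, 222, 0)));  -- prints 111
   Put_Line (Word'Image (Mux (111, 222, 1)));  -- prints 222
end Mux_Demo;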

ffa_ch1_struct1

Chapter 2

This chapter adds logical operations (i.e. "predicates") at Word level and new operations with FFA numbers (FZs), most notably logical operations, bit operations and comparisons. A crucial part to get here correctly is the convention used throughout FFA for the order of words and bits. While "endianness doesn't matter", nevertheless "set_head", "get_head", "shift_left" and "shift_right" do imply a direction and messing with *that* turns out to be for me a lot of pain. After quite a few times tripping over this in various places, I found I'd rather NOT use Stan's convention that "most junior bits on the left" as that just doesn't work for me: the "FZ_Get_Head" returns the "lowest, bottom-most word" - wtf, head is bottom!!; the directions for shift_right and shift_left are inverted (since their names fit the "most senior bits first" representation) and moreover one is sort-of having to think of the whole number in reverse representation all the time. The ONLY reason I can see for this is so that indices increase from left to right but honestly, it's way, way easier for me to count from right to left (or in decreasing order!) and keep everything else in place than to keep the counting direction in place at the cost of having to remember repeatedly that the "direction" of all sorts is the opposite to what is written down. So I go with the following approach instead:

  • FZ for me are represented with most senior words and bits on the left. If you prefer, call it big-endian. I'll just call it human-convention. The machine doesn't care either way but it helps me tremendously to understand what goes on (see below).
  • Given the above, all bit-level and word-level operations actually fit the direction given in the name: a shift_left moves numbers to the left and therefore to higher positions, increasing the value of the FZ (i.e. it's a multiplication by 2^Amount); a right_shift moves to the right, to more junior/lower positions, decreasing the value (a division by 2^Amount). There's a small demo of exactly this right after this list.
  • The index (if it starts at 0 anyway) neatly matches in fact the power of the base corresponding to that position if one was to convert to denary. So it's no additional load for me at least to look at it as increasing from the right to the left.
  • I think Get_Head/Set_Head would be better named at the very least Get_Bottom/Set_Bottom or, even better, Get_Junior/Set_Junior or similar. I don't want to... turn the left-right convention on its head just to have in there also the head/bottom convention. Instead, I think it's best to stick ideally to the *role* of the Word/bit retrieved (i.e. most junior) rather than marrying it to the position in any given representation and if this ideal is too much to ask for then at least as the next-best keep it within the already imposed convention of left-right and NOT add yet another convention to the mix4.
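
To see the direction point above on a concrete number, here's a toy on a single machine word (not a real FZ, but the arithmetic is the same): with the senior bits written on the left, shifting left multiplies and shifting right divides by 2^Amount, exactly as the names say.

with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

procedure Shift_Demo is
   X : constant Unsigned_64 := 384;  -- 2#1_1000_0000#, most senior bits on the left
begin
   Put_Line (Unsigned_64'Image (Shift_Left  (X, 3)));  -- 3072 = 384 * 2**3
   Put_Line (Unsigned_64'Image (Shift_Right (X, 3)));  --   48 = 384 / 2**3
end Shift_Demo;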

ffa_ch2_struct

  • Most of the procedures involving 2 FZ have as precondition that the lengths of the 2 are the same since otherwise comparisons and copies hardly make sense. While this makes sense in itself each and every time, it seems to me that FZ_Mux for instance *also* relies on the FZ having the same *range* of indices. While I'm sure that the later FFA use of FZ_Mux does not end up feeding it FZ with different ranges, I wonder whether there is any significant penalty to making FZ_Mux NOT rely on the FZ having the same range? Or is there some reason that I'm missing for requiring the same range?
  • What exactly is the need for FZ_Get_Head and FZ_Set_Head anyway? I could see the idea that it's a way to ensure that everything remains as it is even if one decides to move the head somewhere else (to the end) but for one thing I find them confusing (perhaps my fault, sure) and for another, FZ_And_W for instance still uses N(N'First) rather than FZ_Get_Head(N)/FZ_Set_Head so changing the position of the head would still require some changes in other code.
  • Why is fz_cmp a separate package? As far as I can see, the contents of this one are still FZ predicates so I'd put them totally fine into FZ_Pred. I'm not sure that the distinction unary (fz_pred) vs binary (fz_cmp) predicates really requires a different package or that FZ_Pred is growing way too big if it swallows up FZ_Cmp.
  • Perhaps make all the packages related to Words start with W_ so that there isn't a Word_Ops but a W_Pred and W_Shifts?

Chapter 3
This chapter adds the shifts for whole FZs. Once I remembered to stick to "most senior words/bits to the left", the rest is quite straightforward to understand. And I don't even have any nitpicks left for it, can you believe that?
ffa_ch3_struct

Chapter 4
This chapter focuses on introducing FFACalc and as a result the changes/additions to FFA lib itself are minimal: a new subtype "Nibble" in words.ads; FZ_NZeroP in fz_pred; WBool_To_FZ in fz_basic; a new package for enforcing/checking valid bitness, fz_lim. The difficulty of this chapter is mainly in getting used to FFACalc. As with the rest of the FFA series, the best way I found was simply to sit down and try and write myself the code that would do one or another thing - it turns out that there aren't all that many significantly different approaches given the existing building blocks and moreover, any differences are informative, usually of one's own mistakes/current lack of understanding. So without further nitpicks, the updated diagram looks like this:
ffa_ch4_struct

Update 18 April 2019 - the diagrams above are made with Dia so here are the .dia files if you want to tinker/change them for your own use: ffa_ch1_struct.dia ; ch2.dia; ch3.dia; ch4.dia.


  1. Perhaps I do only bulldog-style vicious hold-and-never-let-go. It goes with attention to detail and various other traits in my inventory. 

  2. A bottom-up approach has its advantages and it's not in itself a problem at all, of course, except that one needs to keep going for quite a while without really knowing where they are going. It's not even a matter of trust - I certainly trust Stan that he knows where he's going and he even drops clear hints on the way. It's simply that I find it easier to follow and retain the detail if I can also place it even vaguely in some bigger picture. On the bright side, I can say I have *plenty* of practice with the strictly bottom-up approach since my Romanian education followed precisely this approach and nothing else to the point that it took more than 10 years of school for some things to finally click as belonging *somewhere*. So yes, by comparison to that, FFA is far, far less demanding of the learner's trust and patience. But my overall learning experience has always shown that it helps - me at least - to have a more balanced approach. 

  3. I focus on the FFA lib itself so the Demo/Peh are so far not included. 

  4. I'm aware that there are and can be as many conventions as there are mouths to emit them. Still, every new convention added to the mix has a cost and I'd much rather keep this down as much as possible. First of all for myself, yes. 

March 15, 2019

EuCrypt Chapter 16: Bytestream Input/Output Keccak

Filed under: EuCrypt — Diana Coman @ 4:30 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Keccak1 suffers from a bit/byte issue: while internally the actual transformations work at byte-level, the input is taken bit by bit and expected to be in LSB order2 while the output is offered again bit by bit but coming out in MSB3 order if taken byte by byte. Moreover, the padding applied to any input to bring it to a convenient length is again defined - and even applied - at bit level rather than byte level. While originally I discussed this issue in more detail and systematised the options available for Keccak to get some clarity and make a decision on either bit-level or byte-level, the actual implementation followed as closely as possible the original specification retaining bitstream (i.e. bit by bit) input and output while doing internally the transformations at byte level, as confusing as that was. As soon as this implementation was put to actual use though, it became clear that the bitstream part really has to go because it causes huge waste (8x stack-allocated space for any input, quite correctly described as exploding) and trouble in the form of overflowing the stack even for relatively small inputs. So this is the promised .vpatch that updates Keccak to work on bytestream input and produce bytestream output, getting rid of the x8 waste and effectively choosing options 1.1, 2.2 and 3.2.

The obvious change is to convert all the bit* to byte*. This includes constants such as the Keccak rate that is now expressed in number of octets, types such as bitstream that becomes bytestream and functions such as BitsToWord that becomes BytesToWordLE. Note that the internals of the Keccak sponge (i.e. the transformations) are unchanged since they weren't working at bit-level anyway. The less obvious change is the addition of bit-reversing (via lookup table since it's fastest) - this is needed to ensure that Keccak receives and produces at octet-level the same values as it did at bit-level. Specifically, this means that input values on Big Endian iron will be bit-reversed for input (since input is expected LSB) and obtained output values on Little Endian iron will be bit-reversed (since output is extracted MSB). It's ugly but so far it's the only option that doesn't change the Keccak hash essentially.

The .vpatch contains these main changes:

  • Bit reversing: there is a lookup table with the corresponding bit-reversed values for any byte (i.e. 0-255 values mapped to corresponding values with the other bit-order convention). So the bit-reversing of a byte is simply a matter of reading the value from the table at that index (a small sketch of the underlying computation follows this list).
  • Input: bytestream instead of bitstream so effectively an array of octets. Because of Keccak's LSB expectation re input bits, the BitsToWord function became BytesToWordLE meaning that it will reverse the bits on Big Endian Iron. The Padding is also expressed at byte level in LSB format (so the last byte is 0x80 rather than 0x01).
  • Output: bytestream instead of bitstream. Because of Keccak spitting out MSB when output is extracted at byte level, the WordToBits function became WordToBytesBE meaning that it will flip the bits on little endian so that the iron sees the same value as a big endian iron would.
  • Tests: there is an additional test that checks the values in the bit-reversing table for correctness, effectively calculating each of them and comparing to the constant; other than this, the same tests are simply updated to use bytestream/bytes as input and output; as in the previous version, there are also tests for calculating a hash or using Keccak on a String, quite unchanged.
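
For the curious, the bit-reversing itself is just this sort of computation - the .vpatch uses the precomputed table for speed and the new test recomputes each entry in much this manner; the names below are mine, not the ones in the code:

with Interfaces;  use Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

procedure Reverse_Demo is

   subtype Octet is Unsigned_8;

   -- reverse the order of the 8 bits in B: bit 0 becomes bit 7 and so on
   function Reverse_Bits (B : in Octet) return Octet is
      R : Octet := 0;
   begin
      for I in 0 .. 7 loop
         if (Shift_Right (B, I) and 1) = 1 then
            R := R or Shift_Left (1, 7 - I);
         end if;
      end loop;
      return R;
   end Reverse_Bits;

begin
   -- the 0x01 bit-level padding end marker becomes 0x80 at byte level,
   -- which is exactly the change to the padding noted above
   Put_Line (Octet'Image (Reverse_Bits (16#01#)));  -- prints 128 (i.e. 16#80#)
end Reverse_Demo;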

The .vpatch for the above can be found on my Reference Code Shelf and is linked here too, for your convenience.

As I don't have any Big Endian iron around, I couldn't test the above on anything other than Little Endian so if you are looking for something easy to help with, you've found it: kindly test and report here in the comments or on your blog (a pingback will be enough or otherwise comment here with a link).


  1. As originally specified by Bertoni et al. 

  2. Least significant bit. 

  3. Most significant bit. 

March 11, 2019

Compiling Crystal Space with SJLJ

Filed under: Coding — Diana Coman @ 12:42 p.m.

Miserable failures and unwanted problems make great blog fodder1 and so here I am, writing yet another set of compilation notes. This time it's all about how to ditch the non-standard ZCX run-time for its well-behaved and otherwise similarly fast ancestor, SJLJ.

My project to compile is fairly complex as it links dynamically with a bunch of libraries and it includes code in C, CPP and Ada all together. The first naive attempt at simply setting the PATH environment variable to point to the SJLJ GNAT (with dynamic linking) and then recompiling the server code itself (i.e. without recompiling the libraries) failed at the linking stage:

Error relocating ./euserver: _Unwind_SjLj_RaiseException: symbol not found
Error relocating ./euserver: _Unwind_SjLj_Unregister: symbol not found
Error relocating ./euserver: _Unwind_SjLj_Register: symbol not found
Error relocating ./euserver: __gxx_personality_sj0: symbol not found
Error relocating ./euserver: _Unwind_SjLj_Resume: symbol not found
Error relocating ./euserver: _Unwind_SjLj_ForcedUnwind: symbol not found

The good thing about those errors is that they confirm at least that the SjLj run-time is indeed to be used, rather than ZCX. So the correct intentions are there but the result is still miserable failure. A quick report in the forum yielded immediate help as Stan kindly provided the very useful observation that such errors at the final linking stage can only mean that it is still trying to link with a non-SJLJ gcc standard lib. Outrageous and unpermitted and how does it dare2!! A poke of the executable euserver by means of ldd revealed that indeed, it is apparently linking the ZCX libstdc++.so.6 and libgcc_s.so.1 and all of that because of the pesky Crystal Space (CS) library! After a bit of initial doom when I thought I might need to recompile the whole bloody toolchain manually with --enable-sjlj-exceptions or some such, some rummaging around in the SJLJ-istic GNAT's directory revealed that it actually has already everything that's needed - if only they'd be linked instead of the ZCX-isting versions! So it came as a relief really that it's "only" CS3 that needs to be recompiled too4. Onwards!

A quick look through CS's makefile made it plenty clear that the naive approach of a simple .gpr file is not going to work: basically CS suffers deeply from ifdefism since it caters to all-and-everyone it can think of. So at the moment there was no option other than going at it with simply the PATH set properly to the SJLJ GNAT and then running: make clean; ./configure --without-java --without-perl --without-python --without-3ds --with-cal3d=$HOME/eulora/cal3d . And then all of a sudden, the configure reports that it couldn't find zlib although obviously, zlib did not run away anywhere. Moreover, a closer inspection of the full output of configure reveals that it *does* actually find zlib (so it's not a matter of "don't know where it is") but it reports it as unusable since it apparently can't find the... headers. After various attempts at feeding it the path to the headers in various ways5, I got a bigger hammer and simply ran (making sure that it was PRECISELY the same version I had otherwise in the ZCX-istic toolchain):

curl -sL "http://www.zlib.net/zlib-1.2.11.tar.gz" -O
tar -xzvf zlib-1.2.11.tar.gz
cd zlib-1.2.11
./configure --prefix=$HOME/eulora/zlib
make
make install

Back in the CS dir, I ran again:

jam distclean
./configure --with-z=$HOME/eulora/zlib --without-java --without-perl --without-python --without-3ds --with-cal3d=$HOME/eulora/cal3d
jam -aq libs plugins cs-config

The configure above worked fine and it reported that it did find zlib this time and it's usable and all is good. And then the jam went ahead and compiled all sorts, some of them even *with* #include <zlib.h> so obviously all was well, right? Except:

In file included from ./include/csutil/archive.h:36:0,
                 from cs/plugins/filesys/vfs/vfs.cpp:39:
./include/csutil/zip.h:32:18: fatal error: zlib.h: No such file or directory
 #include <zlib.h>
                  ^
compilation terminated

So VFS6 is brighter than all the rest and it can't find zlib. Moreover, a manual run of the very same compilation line with added -I$HOME/eulora/zlib/include completed absolutely fine and then the jam -q libs plugins cs-config went through without any trouble at all. The rest was simply waiting for the rather slow and bloated build of CS. So at this point, one would need to chase about in the config and jam files of CS and vfs to find where exactly to add the missing include flag. Some searches through the whole autoconf+configure+makefile+jamfile revealed that the ZLIB.CFLAGS and ZLIB.LFLAGS are set correctly for Jam to use so the problem seems to be simply that the vfs plugin is not aware it needs to use them. Nevertheless, I am not very keen on spending even more time essentially learning more about all those tools that are set to be discarded as at some point everything will have to move to GNAT and gprbuild anyway. So for now there it stands, ugly as it is: manually re-run that compilation line for vfs and be done with it7.

Once CS was compiled with SJLJ, the rest went perfectly fine, server and tests and everything else. Do remember to fully clean your previous compilation though before recompiling when changing from ZCX to SJLJ as otherwise all sorts of weird errors will appear. With gprbuild that's as simple and fast as running in your project's directory where the .gpr file is:

gprclean -r
gprbuild

And that's all! I happily have the server using SJLJ and so I can finally go up rather than down a few levels, up this pit of unexpected troubles and back to what I was actually doing - migrating both server and client to Eulora's new communication protocol so that there can finally be a text client too and everything one wants!


  1. Not that I am really lacking either problems or blog fodder but at this rate I'll end up with a compilation notes category all by itself. Other problems, failures and assorted troubles want a place in the sun too! 

  2. Especially since all the .gpr files of the server code specifically include the option --nostdlib to make sure the standard lib is not included and then a direct specification of the musl, ZCX rpath and dynamic-linker to use. 

  3. I recompiled also Cal3d since CS depends on Cal3d but that was simply a matter of cleaning properly the previous compilation, configure and recompile. Cal3d compiles even with a basic .gpr file so by comparison to CS it's absolutely wonderful. 

  4. In hindsight of course it has to be recompiled since CS contains all but the kitchen sink and specifically for this part of interest all sorts of threads and thread management "utilities" that pull in all sorts.  

  5. A run of ./configure --help said that one can use --with-z=path-to-zlib to directly tell configure where zlib is; very nice, except it still wasn't enough. 

  6. Virtual File System; inside a supposedly graphics engine; you have no idea. 

  7. CS is really to be compiled only once per system and that's it. If it becomes something I need to compile more often then it will have to be trimmed and ported to gprbuild anyway. 

March 4, 2019

GNAT Compilation Notes

Filed under: Coding — Diana Coman @ 4:55 p.m.

Ada is *the* republican programming language1 and GNAT is its compiler but the compiler of GNAT itself is... another GNAT! While this would mean at first glance a self-contained environment and the end of headaches once GNAT + Ada are fully digested, the recent surprises with GNAT's default disdain for the Ada standard show that some parts are likely to be simply indigestible. Such parts - ZCX run-time especially - will probably end up at least sidelined if not directly and explicitly discarded by an emerging republican standard. Still, as part of investigating those unwanted surprises and especially the actual differences between ZCX and SJLJ, I gained a lot more experience with compiling GNAT itself in all sorts of configurations and on various environments. And since I promised to publish my GNAT compilation notes, it makes sense to publish at least the complete summary to have it in one place for my own future reference2.

GNAT Versions

  • The initial GNAT that made it into TMSR and served so far rather well is Adacore's 2016 GNAT GPL, mirrored previously as part of my work on EuCrypt. The mirrored version is compiled with both ZCX and SJLJ run-times, allowing one to easily switch between them with a simple flag, namely: --RTS=sjlj given to gprbuild3. ZCX is the default so compiling without the --RTS flag simply means that ZCX is used.
  • Ave1's GNAT version 2018-01-17: this uses Adacore's 2016 GNAT only to bootstrap the compilation of a republican GNAT linked against the MUSL C library. Note that the shitgnomes have already messed up some of the links in the scripts there - according to my notes the current scripts fail with a "not found" error at fetchextract http://ftp.gnu.org/gnu/gcc/gcc-4.9.adacore2016/ gcc-4.9.adacore2016 .tar.bz2 . Obviously, the pill to this is to have everything already downloaded and only then run the script (i.e. no fetch, only extract).
  • Ave1's GNAT version 2018-04-30 adding support for ARM architectures.
  • Ave1's GNAT version 2018-05-15 adding parallel build support that significantly cuts the time required for the full build4.
  • Ave1's GNAT version 2018-05-29 moving towards supporting only static linking (but not quite there yet, see next version).
  • Ave1's GNAT version 2018-06-01 ditching dynamic linking and therefore building only statically linked programs.
  • Ave1's GNAT version 2018-09-24 fixing an issue with a hard-coded path.

As can be easily seen above, there are essentially only three main versions: the original Adacore GNAT (glibc-based), Ave1's musl-based GNAT supporting dynamic+static builds (i.e. version 2018-05-15) and Ave1's musl-based GNAT supporting only static builds (i.e. latest version, 2018-09-24). Ideally one would be able to ditch dynamic linking entirely and simply work only with Ave1's latest version but so far I still have on my hands all sorts of code that is far from ideal.

Compilation Options

  • To make use of your computer's cores and speed up the build time: export MAKEOPTS="-j8" (obviously, adjust the value depending on your number of cores).
  • To build effectively offline5: download all the needed tarballs and unpack into the tarballs directory. Since I did this, I went one step further and also simply deleted the scripts download-adacore*.sh as well as their calls from the build-*.sh scripts but this is entirely up to you of course. NOTE: currently the 2018-09-24 seems to have a problem with all the tarballs already in place - my latest run failed complaining that the bin dir doesn't exist inside the created bootstrap, specifically: ../../extra/../build.sh: line 148: cd: /home/eu-test/adabuilds/build510/bootstrap/bin: No such file or directory. This requires deeper digging and Ave1 will perhaps help me out through the maze of scripts in there.
  • To build with SJLJ run-time (NOTE: you'll obtain ONLY SJLJ so no switching with --RTS):
    1. Add to extraconfig.sh the following 2 lines:

    GCC_BOOTSTRAP_CONFFLAGS="--enable-sjlj-exceptions"
    GCC_CONFFLAGS="--enable-sjlj-exceptions"
    

    2. Change the setting of ZCX_BY_DEFAULT from "True" to "False". Ideally this would be changed via a Makefile but so far I wasn't able to find the Makefile that actually works for this. So I hacked it for now: unpack the gcc-4.9.adacore2016.tar.bz2 archive from tarballs/ ; the files of interest are in gcc-4.9.adacore2016/gcc/ada namely system-linux-* (one for each architecture - I modified all of them): change in those ZCX_BY_DEFAULT to False; pack the tar.bz2 archive back (e.g. tar -cjvSf gcc-4.9.adacore2016.tar.bz2 gcc-4.9.adacore2016); run the ./build-ada.sh script and that's it.

So far I was able to compile the following:

  • 2018-09-24 (aka latest, static ONLY): ZCX version using existing Adacore's GNAT; ZCX version using previously compiled 2018-09-24 ZCX version (i.e. itself!)
  • 2018-05-15 (aka older, supporting both dynamic and static linking): SJLJ and ZCX versions using existing Adacore's GNAT; SJLJ and ZCX versions using previously compiled 2018-05-15 on same machine (i.e. without running into the paths issue).


  1. For a summary of "why Ada", read Mocky's Log Reference on it. 

  2. Or so I hope - that in the future the reference will save me from gaining even *more* experience of the same sort that I gained here. Ever the optimist that I am, I know. 

  3. gprbuild --RTS=sjlj 

  4. I could build GNAT in a bit over 1 hour on 8 cores. 

  5. I recommend this for at least 3 reasons: you know what you are building with, since you put those tarballs there - hopefully you checked them; you avoid any trouble with broken links and similar shitgnommeries; it's faster. 

March 3, 2019

Inconsiderate Considerations or a Tiny Euloran Dataset on Blueprints

Filed under: Eulora — Diana Coman @ 8:57 p.m.

Blueprints in Eulora are currently the product of expensive clicks: millions go in to pay only for the most inconsequential of tokens, thousands come out and few of the blueprints obtained are what one wanted to get anyway. In typical euloran fashion however, this exercise in the frustration swamps is anyway on one hand unavoidable and on the other hand potentially, eventually, greatly rewarding1 - with a bit or a bit more of luck and other assorted arcana. That being so, I let my luck do whatever it does and otherwise I set about to collect in one place a tiny bit of data on what comes out of where. About 200mn ECu later, here's what I've got when clicking them with a noob2:

Consideration Type³                     | Consideration q⁴ | Bundle q | Output q | Blueprints obtained⁵
_______________________________________________________________________________________________________
10 x Apprentice Tinker Considerations   | 179              | 3700⁶    | 34       | LTF, Slag, CT, CC, PPB, IO, DUH, SCS, BCT
10 x Apprentice McGuyver Considerations | 47               | 3287⁷    | 17       | QF, POC, Rockpile, CSW, GT, PC, CB, CDS, ETD, RH
7 x Neophyte Gumbo Considerations       | 152              | 752⁸     | 14       | NP, TT, FT, TF, WOA, BBB, ACG, BNG, CP, CF
10 x Neophyte McGuyver Considerations   | 2670             | 1838⁹    | 91       | SFH, IT, PS, MK, TM, BH, RH, CB, POC, HG

Conspicuously missing from the above is the very blueprint that I really wanted to see, namely BMS10. The BMS would be in the McGuyver line but 10 Neophyte clicks and 10 Apprentice clicks failed to produce even 1 single bp. Rather sadly, the Apprentice McGuyver clicks still produced the useless Caveman Bedroll blueprints and other such shit (Reed Hauberk!) that one has too much of even just from Neophyte clicks anyway. It's totally unclear why exactly some blueprints would be SO common even though they are not necessarily the cheapest ones: take for instance the ton of LTF (4070 bv) or CT (1404 bv) blueprints obtained (0.1mn of each) and compare that with the 13 total BCT (540 bv) blueprints! If anything, it would seem that the cheaper blueprints are the harder ones to get, but then again I got tons of useless "Somewhat Fashionable Hat" and "Hoof Gloves" blueprints from the McGuyver line and those are the cheapest in that line (656 bv and 622 bv respectively).

Assuming that those rare bps aren't simply unobtainable at this moment for whatever reason, the obvious conclusion is that those considerations are rather inconsiderate of the ~200mn base value sunk into them and won't reveal in return even the very basics of *what* blueprints one should expect from where. Then again, it's not *consideration* you want from a good game, is it?


  1. Different players might find different things rewarding but a basic reward would be the rare serious "popping" i.e. obtaining for once millions out of thousands rather than the other way around.  

  2. the ~only way I have to actually get a wider spectrum of bps, since clicks with Foxy end up in the usual case with high quality output and 1 type of bp; in the rare case Foxy can in principle get more bp types too, but here I wanted to see precisely WHAT bps one gets from where, so I suppose it's a gamble on the types more than on the overall value itself. 

  3. Each crafting line in Eulora has its own Considerations line. Each line has several levels, of which three have been seen so far: neophyte, apprentice and journeyman. 

  4. Quality of the Consideration blueprint used for this click. 

  5. They are ordered based on quantity obtained, from high to low. 

  6. ~=6.8mn base value. 

  7. ~=6.05mn base value 

  8. ~=1mn 

  9. ~=2.83mn base value 

  10. In the missing and becoming ever so rare range there would also be the CFT bp from Tinkering but at least I did not do a Neophyte Tinkering click this time... 

February 28, 2019

ZCX vs SJLJ - Data Set

Filed under: Coding — Diana Coman @ 5:37 p.m.

This all started with a misbehaving Ada program: the one that keeps "killing" its tasks with blind bullets that do precisely and exactly nothing. The horror of having code that refuses to actually abort its own spawned tasks is amply discussed in the logs linked above but I think it's worth pointing out the thick layers of horrid at play: it's not only (as if that wasn't enough by itself) that it won't abort its tasks but also that it does this on the quiet as it were - the abort code is effectively hijacked to do nothing for as long as one uses the default ZCX1. Now why exactly is ZCX the default given that it's not providing what the Ada standard says it should? And what is anyway its rationale2 for chopping away bits of the standard? No idea really - the docs are silent on this: it's apparently enough bother that they admit at least that indeed "the ZCX run-time does not support asynchronous abort of tasks (abort and select-then-abort constructs)" and otherwise throw in there an unsubstantiated statement that "most programs should experience a substantial improvement by being compiled with a ZCX run-time." Why would they bother with rationale or with justifying decisions - nobody ever asks for or indeed reads such things anyway as they are way too complex, aren't they?
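
To make the problem concrete, here is a minimal sketch of my own (NOT the code from test_tasks_ada.zip mentioned in the footnotes, just an illustration of the same issue): a task spinning in a pure computation loop, so that stopping it requires precisely the asynchronous abort that ZCX does not provide. Built with --RTS=sjlj the program terminates shortly after the abort; built with the default ZCX the task survives and the program just keeps running.

    with Ada.Text_IO; use Ada.Text_IO;

    procedure Abort_Demo is

       task Worker;

       task body Worker is
          Counter : Long_Long_Integer := 0;
       begin
          loop                       -- pure computation, no synchronization
             Counter := Counter + 1; -- points: stopping this loop requires
          end loop;                  -- asynchronous abort
       end Worker;

    begin
       delay 2.0;
       Put_Line ("issuing abort...");
       abort Worker;  -- with SJLJ the program exits soon after this;
                      -- with ZCX the task survives and the program hangs
       Put_Line ("abort issued; waiting for the task to die.");
    end Abort_Demo;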

Once the culprit was identified as ZCX, a quick test3 showed that the older SJLJ4 is indeed perfectly capable of killing spawned tasks. And so the investigation moved on to the possible cost of using SJLJ vs ZCX. Supposedly ZCX is faster and SJLJ slower, but suppositions are one thing and hard data quite another. To figure out if that's indeed the case, one needs simply to design an experiment for the exact sort of thing investigated and then set to work implementing and running it. Given the docs' claim that SJLJ comes with a significant penalty for any exception handlers present, the experiment also included runs with and without exception handlers, to gauge whether those indeed add significantly to the running time. Specifically, there were two main programs used (linked below if you want to reproduce this experiment):
1. Nested procedure calls (with and without exception handlers): there are three procedures A, B and C that each increment a global counter and pseudo-randomly5 call one or both of the other two procedures for as long as the counter is below a certain value; a minimal sketch of this structure is given right after this list. Note that you'll likely need to increase the size of the stack for this experiment since otherwise it will quickly overflow.
2. Nested for loops with Serpent encryption (NO exception handlers): there are between 1 and 22 for loops counting either from 1 to 10 or from 1 to 100 and going to the next level only on a subset of values (those divisible by 2, by 3 or by 4 - these are essentially knobs to adjust for different runs); the inner-most loop performs a Serpent encryption. Note that this fixes the number of Serpent calls for a given configuration: for instance, counting from 1 to 100 and descending on values divisible by 2 gives 50 iterations per level, hence 50^n Serpent calls for n nested loops - these are the divisors used in the tables below to obtain the time per Serpent call.
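
For reference, here is roughly the shape of the first program; it is a sketch only, under my own naming - the real code uses the Mersenne Twister rather than the toy generator below and varies the number of handlers from 0 to 3, whereas here only A carries one:

    procedure Calls_Sketch is

       Max_Count : constant := 16_777_216;
       Counter   : Long_Long_Integer := 0;

       type Word is mod 2**32;    -- toy LCG standing in for the MT of the
       Seed : Word := 16#111#;    -- real experiment (Word wraps, no overflow)

       procedure A;
       procedure B;
       procedure C;

       function Next return Word is
       begin
          Seed := Seed * 1_103_515_245 + 12_345;
          return Seed;
       end Next;

       procedure A is
       begin
          Counter := Counter + 1;
          if Counter >= Max_Count then
             return;
          end if;
          case Next mod 3 is
             when 0      => B; C;   -- call both
             when 1      => B;      -- call one
             when others => C;      -- or the other
          end case;
       exception
          when Constraint_Error =>  -- handlers like this one are what the
             null;                  -- "Handlers" column counts in the tables
       end A;

       procedure B is               -- B and C mirror A, with handlers added
       begin                        -- or removed depending on the run
          Counter := Counter + 1;
          if Counter < Max_Count then
             case Next mod 3 is
                when 0      => A; C;
                when 1      => A;
                when others => C;
             end case;
          end if;
       end B;

       procedure C is
       begin
          Counter := Counter + 1;
          if Counter < Max_Count then
             case Next mod 3 is
                when 0      => A; B;
                when 1      => A;
                when others => B;
             end case;
          end if;
       end C;

    begin
       A;  -- as noted above, increase the stack size or this will overflow it
    end Calls_Sketch;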

I ran this experiment on two different computers:

  • S.MG Test server: this is an x86_64 AMD Opteron(tm) Processor 6376 (8 cores, 2.3GHz).
  • A torture-room desktop code-named D1: this is an Intel i5, 4GB RAM (2 cores, 3.2GHz).

The S.MG Test server runs 3 different incarnations of Ave1's GNAT: the ZCX variant obtained as a straight compilation using Ave1's scripts from the 15th of July 2018; the SJLJ compilation of the same (i.e. supporting both dynamic and static linking); and the SJLJ compilation of the latest, static-only version from September 2018.

The D1 machine runs Adacore's GNAT 2016 and simply switches between ZCX and SJLJ via the --RTS parameter given to gprbuild.

The full set of data obtained is this:

S.MG Test server
Ave1's GNAT, build 301 (dynamic linking) and build 202 (static only) for SJLJ, older Ave1's GNAT for ZCX

Procedure Calls testing  with MT as PRNG;
Max = 16777216; 3 procs, mod 3 calls both, one or the other of the 2;
NO serpent calls; X final gives the total number of calls;
Seeding of MT: 16#111#, 16#222#, 16#333#, 16#444#  -> X = 22368144.
NB: as MT is used, X final is repeatable between runs (same seed).

Handlers | ZCX (s)  | SJLJ (s) (build 301) | SJLJ (s) (build 202 - static only)
_______________________________________________________________________________
0        | 1.119    | 1.211                | 1.165
0        | 1.174    | 1.210                | 1.161
1        | 1.299    | 4.106                | 3.850
1        | 1.302    | 3.777                | 3.755
2        | 1.599    | 3.929                | 3.301
3        | 1.862    | 4.125                | 4.193

 Nested Loops testing 
loops 1 to 100; if mod 2
Loops (runs)    | ZCX (s)      | Time per Serpent (ZCX)            | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1 (a)  (1k runs)| 0.000168893  | 0.000168893 / (50^1) = 3.377e-6   |   0.000168199   | 0.000168727
2 (b)  (1k runs)| 0.007213758  | 0.007213758 / (50^2) = 2.8855e-6  |   0.007055130   | 0.007084592
3 (c)  (1k runs)| 0.351611073  | 0.351611073 / (50^3) = 2.81e-6    |   0.352684722   | 0.351471743
4 (d)  (1 run)  | 17.740324000 | 17.740324000 / (50^4) = 2.83e-6   |  17.580107000   | 17.73699000
5 (e)  (1 run)  | 879.95117100 | 879.951171000 / (50^5) = 2.81e-6  | 881.139986000   | 879.7342540

loops 1 to 10; if mod 4
Loops (runs=1)  | ZCX (s)      | Time per Serpent (ZCX)            | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1               | 0.000008     | 0.000008 / (2^1) = 4e-6           |  0.000010       |  0.000009
2               | 0.000017     | 0.000017 / (2^2) = 4.25e-6        |  0.000016       |  0.000017
3               | 0.000031     | 0.000031 / (2^3) = 3.875e-6       |  0.000030       |  0.000024
4               | 0.000046     | 0.000046 / (2^4) = 2.875e-6       |  0.000057       |  0.000056
5               | 0.000150     | 0.000150 / (2^5) = 4.6875e-6      |  0.000110       |  0.000089
10              | 0.027650     | 0.027650 / (2^10)= 2.7e-5         |  0.002937       |  0.002915
22              | 11.98272     | 11.98272 / (2^22)= 2.8569e-6      | 12.045471       | 12.011244

loops 1 to 10; if mod 3
Loops (runs=1)  | ZCX (s)      | Time per Serpent (ZCX)            | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1               | 0.000012000  | 0.000012000 / (3^1) = 4e-6        |   0.000013000   |   0.000010000
2               | 0.000034000  | 0.000034000 / (3^2) = 3.778e-6    |   0.000027000   |   0.000033000
3               | 0.000076000  | 0.000076000 / (3^3) = 2.815e-6    |   0.000111000   |   0.000075000
4               | 0.000219000  | 0.000219000 / (3^4) = 2.704e-6    |   0.000221000   |   0.000220000
5               | 0.000654000  | 0.000654000 / (3^5) = 2.691e-6    |   0.000823000   |   0.000673000
10              | 0.167347000  | 0.167347000 / (3^10)= 2.834e-6    |   0.167906000   |   0.167250000
15              | 40.69105200  | 40.69105200 / (3^15)= 2.836e-6    |  40.743956000   |  40.716526000
16              | 121.9171600  | 121.9171600 / (3^16)= 2.832e-6    | 123.760958000   | 121.816518000

D1 computer 
Adacore's GNAT 2016, switching with --RTS between ZCX and SJLJ

Procedure Calls testing  with MT as PRNG;
Max = 16777216; 3 procs, mod 3 calls both, one or the other of the 2;
NO serpent calls; X final gives the total number of calls;
Seeding of MT: 16#111#, 16#222#, 16#333#, 16#444#  -> X = 22368144.
NB: as MT is used, X final is repeatable between runs (same seed).

Handlers | ZCX (s)  | SJLJ (s)
______________________________
0        | 0.896    | 0.882
0        | 0.905    | 0.906
1        | 1.058    | 184.516
1        | 1.064    | 265.339
2        | 1.215    | 372.574
3        | 1.329    | 446.821

 Nested Loops testing 
loops 1 to 100; if mod 2
Loops           | ZCX (s)      | Time per Serpent                  | SJLJ (s)
___________________________________________________________________________________
1 (a)  (1k runs)|   0.000095269|  0.000095269 / (50^1) = 1.905e-6  |   0.000098855
2 (b)  (1k runs)|   0.004609351|  0.004609351 / (50^2) = 1.844e-6  |   0.004598492
3 (c)  (1k runs)|   0.231582664|  0.231582664 / (50^3) = 1.853e-6  |   0.230719036
4 (d)  (1 run)  |  11.548194   | 11.548194 / (50^4)    = 1.848e-6  |  11.579138
5 (e)  (1 run)  | 580.261706   |580.261706 / (50^5)    = 1.857e-6  | 586.996228

loops 1 to 10; if mod 4
Loops (runs=1)  | ZCX (s)      | Time per Serpent                  | SJLJ (s)
_______________________________________________________________________________
1               | 0.000009     | 0.000009 / (2^1) = 4.5e-6         | 0.000013
2               | 0.000023     | 0.000023 / (2^2) = 5.75e-6        | 0.000023
3               | 0.000036     | 0.000036 / (2^3) = 4.5e-6         | 0.000043
4               | 0.000083     | 0.000083 / (2^4) = 5.18e-6        | 0.000100
5               | 0.000167     | 0.000167 / (2^5) = 5.218e-6       | 0.000167
10              | 0.005537     | 0.005537 / (2^10)= 5.4e-6         | 0.007399
22              | 8.007915     | 8.007915 / (2^22)= 1.909e-6       | 7.944913
22              | 7.971711     | 7.971711 / (2^22)= 1.9006e-6      | 7.966864         

loops 1 to 10; if mod 3
Loops (runs=1)  | ZCX (s)      | Time per Serpent                  | SJLJ (s)
_______________________________________________________________________________
1               |  0.000008    |  0.000008 / (3^1)  = 2.6e-6       |  0.000009
2               |  0.000017    |  0.000017 / (3^2)  = 1.8e-6       |  0.000019
3               |  0.000142    |  0.000142 / (3^3)  = 1.75e-6      |  0.000143
4               |  0.000427    |  0.000427 / (3^4)  = 5.27e-6      |  0.000150
5               |  0.001267    |  0.001267 / (3^5)  = 5.21e-6      |  0.001259
10              |  0.107943    |  0.107943 / (3^10) = 1.82e-6      |  0.113296
15              | 27.163182    | 27.163182 / (3^15) = 1.89e-6      | 27.167316
16              | 81.724382    | 81.724382 / (3^16) = 1.89e-6      | 81.951616        

Based on the data above, my conclusion so far is that there is rather precious little to justify the use of ZCX: when no exception handlers are present in the code, the running times are similar under ZCX and SJLJ; when exception handlers are present, there is indeed a penalty for using SJLJ, but how big that penalty is depends heavily on the iron and the GNAT at hand (compare the few seconds lost on the S.MG Test server with the hundreds of seconds lost on D1). This being said, there is little reason that I can see for having lots of exception handlers in any sane code anyway, so I'm really not all that concerned about this honest cost of SJLJ, especially when compared with the "you can't kill your own tasks" cost of the so-called ZCX.

Separate from the above but still relevant, there are still quite a few issues remaining with SJLJ including the fact that it's apparently broken on ARM. But every pain in its own time and so for now simply look at the data above and let me know: do you see any real reason why one would *not* simply go with SJLJ?


  1. "Zero-Cost Exceptions" being the full name of this great idea of reducing costs by not doing the work - meaning here by simply choosing to not implement some parts of the Ada standard, namely the asynchronous abort.  

  2. Or its excuse at the very least since rationale might be too much to ask for under those circumstances. 

  3. You can try this yourself: the code is in test_tasks_ada.zip, download, read it, run it and see if the tasks actually abort or not. Try it also with SJLJ (gprbuild --RTS=sjlj) and spot the difference. 

  4. "Setjmp / longjmp" by its name, the older exception handling model that actually implements the Ada standard so it allows asynchronous abort. 

  5. Using the Mersenne-Twister pseudo-random generator. 
