October 18, 2018

SMG Comms Chapter 3: Packing Serpent

Filed under: Coding, SMG_Comms — Diana Coman @ 10:12 a.m.

~ This is a work in progress towards an Ada implementation of Eulora's communication protocol. Start with Chapter 1.~

This chapter uses the raw types of the protocol as defined in the previous chapter and adds two methods that are still at layer 0 of the protocol: one method for packing Serpent messages into corresponding Serpent packets and one method for unpacking such Serpent packets to extract their contained Serpent message. It is a single small step forward but the corresponding .vpatch is still quite large as it contains the Serpent code in addition to the packing/unpacking methods. The packing is effectively an encryption with a given Serpent key while the unpacking is the corresponding decryption. Nevertheless, there are a few bits that are specific to this implementation as they reflect the requirements of Eulora's protocol:

  • Packing receives as input a Serpent_Msg and produces as output a Serpent_Pkt. Symmetrically, unpacking receives as input a Serpent_Pkt and produces a Serpent_Msg. Both Serpent_Pkt and Serpent_Msg are arrays of octets but of fixed, pre-defined size: 1472 octets and nothing else.
  • Both packing and unpacking will split their input into blocks of the size that Serpent can handle, encrypt/decrypt them and then glue together the results to produce the output. So there are effectively 92[1] encrypting / decrypting operations with the same, given Serpent key, for one single pack / unpack call.
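
The split / encrypt / glue flow described in the two points above can be sketched in a few lines. This is only a Python illustration, not the Ada code of this chapter: `toy_encrypt_block` and `toy_decrypt_block` are hypothetical stand-ins (a keyed XOR, which is NOT Serpent and gives no security at all) so that the example runs on its own, and the 16-octet toy key is likewise an assumption. The sizes 1472 and 16 are the ones from the text.

```python
SERPENT_OCTETS = 1472                    # fixed message/packet size, from the text
BLOCK_LEN = 16                           # one Serpent block: 128 bits = 16 octets
S_BLOCKS = SERPENT_OCTETS // BLOCK_LEN   # 92 blocks per message/packet


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """HYPOTHETICAL stand-in for Serpent on one block: keyed XOR, not secure."""
    return bytes(b ^ k for b, k in zip(block, key))


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt_block (XOR undoes itself)."""
    return bytes(b ^ k for b, k in zip(block, key))


def _process(data: bytes, key: bytes, block_op) -> bytes:
    """Split into 16-octet blocks, apply block_op to each, glue the results."""
    if len(data) != SERPENT_OCTETS:
        raise ValueError("expected exactly %d octets" % SERPENT_OCTETS)
    if len(key) != BLOCK_LEN:
        raise ValueError("toy key must be %d octets" % BLOCK_LEN)
    out = bytearray()
    for i in range(S_BLOCKS):
        block = data[i * BLOCK_LEN:(i + 1) * BLOCK_LEN]
        out += block_op(key, block)
    return bytes(out)


def pack(msg: bytes, key: bytes) -> bytes:
    """Message -> packet: encrypt each of the 92 blocks with the same key."""
    return _process(msg, key, toy_encrypt_block)


def unpack(pkt: bytes, key: bytes) -> bytes:
    """Packet -> message: decrypt each of the 92 blocks with the same key."""
    return _process(pkt, key, toy_decrypt_block)
```

With any 1472-octet `msg` and a 16-octet toy `key`, `unpack(pack(msg, key), key)` gives back `msg`, and any other input length is rejected, mirroring the fixed sizes of Serpent_Msg and Serpent_Pkt.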

I've adapted the Serpent implementation that I previously published as part of EuCrypt, effectively integrating it into SMG Comms and stripping away anything that isn't directly needed by SMG Comms:

  • Serpent is now simply a package like all the others rather than a stand-alone library. While it is true that any changes to the original will have to be manually ported to this one as well, that was always going to be the case anyway. So I don't really see much point in carrying about all the glue and additional files to make a library out of only 2 files. Hence, Serpent in SMG Comms has 2 files and nothing more: serpent.ads and serpent.adb. Short and clear.
  • Since this is production use already, testing parts such as the "Selftest" method don't really have any business in the code itself. I've moved this method where it belongs, namely with the tests for all the code, in the tests directory (test_serpent.ads/.adb)
  • Since Serpent here becomes part of SMG Comms it follows that it should also use the raw types of the protocol - it will anyway be called to use variables of those types rather than anything else. There is no point in forcing back and forth conversions between SMG Comms' "Octets" and Serpent's "Bytes" types that are both arrays of octets anyway. So I've changed the definition of the "Bytes" type in Serpent so that it is here simply a subtype of the "Octets" type. This has the advantage that it allows smooth calls to Serpent from SMG Comms while being a small, easily-reversible change that also maintains otherwise the clarity of Serpent's code as it is. Basically SMG Comms gets to call Serpent without having to do explicit conversions between types that are anyway the same thing and Serpent gets to keep calling arrays of octets Bytes internally as it does in its stand-alone lib version.

In addition to the above, the .vpatch for this chapter also adds tests for the packing/unpacking methods[2]. I've also made a small change to raw_types.ads so that there is now only one constant for Serpent length: SERPENT_OCTETS. This reflects better the fact that there really is only one length for Serpent and it still allows keeping the code clear by having the two array types Serpent_Pkt and Serpent_Msg - they just use the same length. Clarity of code is a tricky choice, what more can I say. Here's the updated code in raw_types:

  -- constants from SMG.COMMS standard specification
    -- size of a serpent-encrypted packet and message, in octets
    -- note that this corresponds to 1472/16 = 92 Serpent blocks
    -- NB: lengths are the same!
  SERPENT_OCTETS : constant Positive := 1472;

    -- size of a RSA-encrypted packet and message in octets and bits
  RSA_PKT_OCTETS     : constant Positive := 1470;
  RSA_MSG_OCTETS     : constant Positive := 234;
  RSA_MSG_BITS       : constant Positive := RSA_MSG_OCTETS * 8; --1872

  -- raw, low-level types
  -- all messages and packets are simply arrays of octets at low level/raw
  type Octets is array( Natural range <> ) of Interfaces.Unsigned_8;

  -- raw representations of basic types (with fixed, well-defined sizes)
  subtype Octets_1 is Octets( 1 .. 1 );
  subtype Octets_2 is Octets( 1 .. 2 );
  subtype Octets_4 is Octets( 1 .. 4 );
  subtype Octets_8 is Octets( 1 .. 8 );

  -- RSA packets and contained raw messages
  subtype RSA_Pkt is Octets( 1 .. RSA_PKT_OCTETS );
  subtype RSA_Msg is Octets( 1 .. RSA_MSG_OCTETS );

  -- Serpent packets and contained raw messages
  -- NB: length is the same but the distinction makes the code clearer
  subtype Serpent_Pkt is Octets( 1 .. SERPENT_OCTETS );
  subtype Serpent_Msg is Octets( 1 .. SERPENT_OCTETS );

And the new code in packing.ads:

  -- Packing/unpacking for Eulora's communication protocol:
  -- Serpent Message to/from Serpent Packet
  -- RSA Message to/from RSA Packet
  -- S.MG, 2018

with Raw_Types;
with Serpent;

package Packing is
  -- no side effects or internal state
  Pragma Pure(Packing);

  -- Packing a Serpent message into a Serpent packet, using the given key
  function Pack( Msg : in Raw_Types.Serpent_Msg;
                 K   : in Serpent.Key )
               return Raw_Types.Serpent_Pkt;

  -- Unpacking a Serpent packet into contained message, using the given key
  function Unpack( Pkt : in Raw_Types.Serpent_Pkt;
                   K   : in Serpent.Key)
                 return Raw_Types.Serpent_Msg;

  -- internals of this package, NOT for outside use
  -- length of 1 Serpent block
  Block_Len: constant Natural := Serpent.Block'Length;

  -- number of Serpent blocks in one single Serpent message/packet
  S_Blocks : constant Natural := Raw_Types.SERPENT_OCTETS / Block_Len;

end Packing;

The new code in packing.adb:

  -- Packing/unpacking for Eulora's communication protocol:
  -- Serpent Message to/from Serpent Packet
  -- RSA Message to/from RSA Packet
  -- S.MG, 2018

package body Packing is

  -- Packing a Serpent message into a Serpent packet, using the given key
  function Pack( Msg : in Raw_Types.Serpent_Msg;
                 K   : in Serpent.Key )
               return Raw_Types.Serpent_Pkt is

    -- single Serpent blocks containing plain / encrypted data
    Plain    : Serpent.Block;
    Encr     : Serpent.Block;

    -- Serpent Key Schedule - needed for direct encr/decr calls
    KS       : Serpent.Key_Schedule;

    -- final resulting Serpent packet
    Pkt      : Raw_Types.Serpent_Pkt := (others => 0);
  begin
    -- prepare the Serpent key schedule based on given key
    Serpent.Prepare_Key( K, KS );

    -- encrypt message block by block and copy result in packet
    for I in 1 .. S_Blocks loop
      -- get current block to encrypt
      Plain := Msg( Msg'First + (I-1) * Block_Len ..
                    Msg'First +  I    * Block_Len - 1 );
      -- encrypt with Serpent
      Serpent.Encrypt( KS, Plain, Encr );
      -- copy result to output packet
      Pkt( Pkt'First + (I-1) * Block_Len ..
           Pkt'First +  I    * Block_Len - 1 )
         := Encr;
    end loop;

    -- return result
    return Pkt;
  end Pack;

  -- Unpacking a Serpent packet into contained message, using the given key
  function Unpack( Pkt : in Raw_Types.Serpent_Pkt;
                   K   : in Serpent.Key)
                 return Raw_Types.Serpent_Msg is
    -- single Serpent blocks containing plain / encrypted data
    Plain    : Serpent.Block;
    Encr     : Serpent.Block;

    -- Serpent Key Schedule - needed for direct encr/decr calls
    KS       : Serpent.Key_Schedule;

    -- the message extracted from the given packet
    Msg : Raw_Types.Serpent_Msg := (others => 0);
  begin
    -- prepare the Serpent key for use
    Serpent.Prepare_Key( K, KS );

    -- decrypt the Serpent packet block by block
    for I in 1 .. S_Blocks loop
      -- get current block from input and decrypt
      Encr := Pkt( Pkt'First + (I-1) * Block_Len ..
                   Pkt'First +  I    * Block_Len - 1 );
      Serpent.Decrypt( KS, Encr, Plain );

      -- copy result to its correct position in final output
      Msg( Msg'First + (I-1) * Block_Len ..
           Msg'First +  I    * Block_Len - 1 )
         := Plain;
    end loop;

    -- return the result - the message content of given packet
    return Msg;
  end Unpack;

end Packing;

The .vpatch and my signature for it are as usual on my Reference Code Shelf as well as linked here for your convenience:

  1. 1472 / 16 = 92 

  2. And needed they were too for they actually caught an error that had survived somehow several re-readings of the code to the point that I was totally surprised when the tests first...failed. Never underestimate your own capacity of introducing idiotic errors in the simplest of things! 

October 16, 2018

SMG Comms Chapter 2: Raw Types

Filed under: Coding, SMG_Comms — Diana Coman @ 11:20 a.m.

~ This is a work in progress towards an Ada implementation of Eulora's communication protocol. Start with Chapter 1.~

My previous Chapter 1 with its assorted warts and lamentations sparked a rather productive discussion in the forum that ended up with a significant revision of the protocol specification. The main change has to do with packet sizes and types: ALL packets will be either 1470 octets long and RSA encrypted OR 1472 octets long and Serpent encrypted. Those sizes are chosen to ensure that there is no fragmenting of UDP frames sent on the wire since such fragmenting increases significantly the chance of packet loss. A test with those UDP packet sizes sent back and forth between UK and UY did not uncover any trouble - if anything, the results were actually better than expected. In any case, increased chances of reliable communications aside, the added benefit of those fixed, unique sizes is a clearer protocol altogether. So after a bit of digesting the new specification, I can finally say that I have some sort of plan for its implementation rather than grasping at it from any thread that I could perceive at all (as it was pretty much the case in Chapter 1). Specifically, my current implementation plan is made of 3 layers that build on one another and give me a structure to rely on:

  1. Layer 0 - Raw types
  2. Layer 1 - Sender/Receiver
  3. Layer 2 - Data structures

1. Layer 0 - Raw types

The raw types for Eulora's communication protocol are essentially two: octet (the basic unit, effectively 8 bits) and arrays of octets (various fixed sizes matching the basic types in the protocol's specification). However, for the arrays of octets, there is a useful distinction to be made between packets (i.e. the received array of octets of fixed length, either 1470 or 1472 octets) and messages (i.e. the content of a packet). Essentially messages are obtained from packets in Eulora's communication protocol through decryption using either Serpent (for 1472 octets long packets) or RSA (for 1470 octets long packet). Symmetrically, packets are obtained by encrypting messages with either Serpent (for 1472 octets long messages) or RSA (for 234 octets long messages). Consequently, this layer 0 offers the definitions of types that cover the full set of possible messages and packets together with the relevant conversion methods between packets and messages as well as between raw octets and int/unsigned/float data types. Note that a "message" here is simply the decrypted content of a packet, nothing more. The interpretation of the octets in a message and/or their use to fill some data structure that the protocol defines happens at a higher layer, not here.

2. Layer 1 - Sender/Receiver

Layer 1 is concerned with sending and receiving packets using the types and conversion methods provided by layer 0 together with the UDP lib for actual network communication. So far this seems to me at the moment a very thin layer really since it's little more than a basic receiver and sender: the receiver simply uses UDP lib to endlessly read all incoming packets and store them in a queue for later processing by consumer(s) external to this layer; the sender may have perhaps an input queue from which it keeps reading packets together with their intended destination and then sends them along using the UDP lib. This layer comes right after the raw types because it needs and uses those raw types but at the moment it's unclear to me whether it belongs as part of a protocol implementation since I can see the case for it being left to the user application of the protocol. At any rate, the protocol specification is not really concerned with this part and this description of the sender/receiver is not a given - it may change as required by each application.
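
To make the shape of this thin layer a bit more concrete, here is a rough Python sketch of a receiver that stores incoming packets in a queue and a sender that drains a queue of (destination, packet) pairs. It is only an illustration under assumptions: it relies on the `recvfrom` / `sendto` calls of a Python datagram socket rather than the Ada UDP lib, the function names are made up for this example, and the receiver stops after `count` packets (instead of running endlessly) so that the example terminates.

```python
import queue

MAX_PKT = 1472   # largest packet size in the protocol (Serpent packets)


def receive_packets(sock, out_queue, count):
    """Read `count` datagrams from `sock`; queue each with its source address.

    A real receiver would loop forever; `count` only keeps this sketch finite.
    """
    for _ in range(count):
        data, addr = sock.recvfrom(MAX_PKT)
        out_queue.put((addr, data))


def send_packets(sock, in_queue):
    """Send every (destination, packet) pair currently waiting in `in_queue`.

    Returns the number of packets handed to the socket.
    """
    sent = 0
    while True:
        try:
            dest, data = in_queue.get_nowait()
        except queue.Empty:
            return sent
        sock.sendto(data, dest)
        sent += 1
```

Here `sock` would be a datagram socket, for instance one created with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and bound to the chosen port. The consumers of `out_queue` and the producers of `in_queue` live outside this layer, as described above.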

3. Layer 2 - Data structures

This layer is on top of Layer 0 - it provides the data structures for the different types of messages in Eulora's communication protocol. While any message in Eulora's protocol is just an array of octets at layer 0, the information contained in any message has to be structured according to the protocol's rules described in sections 4, 5 and 7 of the specification. This layer provides the definitions of those data structures as well as the conversion methods from messages to data structures and back.

Note that those 3 layers do not include protocol mechanics since those are really part of the server or client applications. Perhaps a demo / test application implementing the specified protocol mechanics would be useful to include as part of this implementation but I'd rather not specify it as such at this point. For now I'm fine with those 3 layers but if you see any holes in them or problems with them, let me know in the comments below!


To start it all off, here's the first part of layer 0 - raw types, including the definitions of types and the most basic conversions via Ada.Unchecked_Conversion:

 -- raw types for the communication protocol
 -- these are used throughout at the lowest level of the protocol
 -- essentially they are the units of packets and of messages
 -- SMG.Comms has only 2 types of packets: RSA and Serpent
 -- a message is the decrypted content of a packet
 -- S.MG, 2018

with Interfaces; use Interfaces; -- Unsigned_n and Integer_n
with Ada.Unchecked_Conversion;

package Raw_Types is

  -- constants from SMG.COMMS standard specification
    -- size of a serpent-encrypted packet and message, in octets
    -- note that this corresponds to 1472/16 = 92 Serpent blocks
    -- NB: lengths are the same but the distinction makes the code clearer
  SERPENT_PKT_OCTETS : constant Positive := 1472;
  SERPENT_MSG_OCTETS : constant Positive := 1472;

    -- size of a RSA-encrypted packet and message in octets and bits
  RSA_PKT_OCTETS     : constant Positive := 1470;
  RSA_MSG_OCTETS     : constant Positive := 234;
  RSA_MSG_BITS       : constant Positive := RSA_MSG_OCTETS * 8; --1872

  -- raw, low-level types
  -- all messages and packets are simply arrays of octets at low level/raw
  type Octets is array( Natural range <> ) of Interfaces.Unsigned_8;

  -- raw representations of basic types (with fixed, well-defined sizes)
  subtype Octets_1 is Octets( 1 .. 1 );
  subtype Octets_2 is Octets( 1 .. 2 );
  subtype Octets_4 is Octets( 1 .. 4 );
  subtype Octets_8 is Octets( 1 .. 8 );

  -- RSA packets and contained raw messages
  subtype RSA_Pkt is Octets( 1 .. RSA_PKT_OCTETS );
  subtype RSA_Msg is Octets( 1 .. RSA_MSG_OCTETS );

  -- Serpent packets and contained raw messages
  -- NB: length is the same but the distinction makes the code clearer
  subtype Serpent_Pkt is Octets( 1 .. SERPENT_PKT_OCTETS );
  subtype Serpent_Msg is Octets( 1 .. SERPENT_MSG_OCTETS );

  -- blind, unchecked casts ( memcpy style )
  function Cast is new Ada.Unchecked_Conversion( Integer_8  , Octets_1 );
  function Cast is new Ada.Unchecked_Conversion( Octets_1   , Integer_8 );
  function Cast is new Ada.Unchecked_Conversion( Unsigned_8 , Octets_1 );
  function Cast is new Ada.Unchecked_Conversion( Octets_1   , Unsigned_8 );

  function Cast is new Ada.Unchecked_Conversion( Integer_16 , Octets_2 );
  function Cast is new Ada.Unchecked_Conversion( Octets_2   , Integer_16 );
  function Cast is new Ada.Unchecked_Conversion( Unsigned_16, Octets_2 );
  function Cast is new Ada.Unchecked_Conversion( Octets_2   , Unsigned_16 );

  function Cast is new Ada.Unchecked_Conversion( Integer_32 , Octets_4 );
  function Cast is new Ada.Unchecked_Conversion( Octets_4   , Integer_32 );
  function Cast is new Ada.Unchecked_Conversion( Unsigned_32, Octets_4 );
  function Cast is new Ada.Unchecked_Conversion( Octets_4   , Unsigned_32 );

  -- Gnat's Float has 32 bits but this might be different with other compilers
  function Cast is new Ada.Unchecked_Conversion( Float, Octets_4 );
  function Cast is new Ada.Unchecked_Conversion( Octets_4, Float );

  function Cast is new Ada.Unchecked_Conversion( Integer_64, Octets_8 );
  function Cast is new Ada.Unchecked_Conversion( Octets_8, Integer_64 );
  function Cast is new Ada.Unchecked_Conversion( Unsigned_64, Octets_8 );
  function Cast is new Ada.Unchecked_Conversion( Octets_8, Unsigned_64 );

end Raw_Types;
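
For readers less familiar with Ada, the effect of those blind casts can be approximated in Python with the standard `struct` module. This is a sketch of the idea only, not a port: the function names are hypothetical, and the `=` prefix asks `struct` for native byte order with standard sizes, so (like a memcpy-style cast) the resulting octets depend on the endianness of the machine running the code.

```python
import struct

# "=" means: native byte order, standard sizes, no padding.
# I/i = unsigned/signed 32 bits, q = signed 64 bits, f = 32-bit IEEE 754 float.


def cast_u32_to_octets(value: int) -> bytes:
    """Unsigned_32 -> 4 octets."""
    return struct.pack("=I", value)


def cast_octets_to_u32(octets: bytes) -> int:
    """4 octets -> Unsigned_32."""
    return struct.unpack("=I", octets)[0]


def cast_i64_to_octets(value: int) -> bytes:
    """Integer_64 -> 8 octets."""
    return struct.pack("=q", value)


def cast_octets_to_i64(octets: bytes) -> int:
    """8 octets -> Integer_64."""
    return struct.unpack("=q", octets)[0]


def cast_float_to_octets(value: float) -> bytes:
    """32-bit float -> 4 octets (precision beyond 32 bits is lost)."""
    return struct.pack("=f", value)


def cast_octets_to_float(octets: bytes) -> float:
    """4 octets -> 32-bit float."""
    return struct.unpack("=f", octets)[0]
```

A value that goes through a cast and back comes out unchanged, as long as it fits the target size: for example `cast_octets_to_u32(cast_u32_to_octets(0xDEADBEEF))` gives `0xDEADBEEF` again.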

I packed the above code as a .vpatch on top of the previous genesis .vpatch mainly for keeping my original promise of showing this being built as it is with detours and corrections on the way. Otherwise this could equally well be a genesis in itself given how radically it changes pretty much everything in there. Nevertheless, at least for now - meaning until a final version is achieved and a regrind makes perhaps sense - I'll keep growing the smg_comms tree. I'll link all smg_comms .vpatches and my signatures for them from the Code Shelf page as usual, while linking the current ones here as well:

The next chapter will add to the raw types layer, providing conversions from packets to messages and back. That will bring in parts of EuCrypt of course, namely the Serpent and RSA implementations. For this reason, I aim to do this perhaps in two steps (first Serpent and only then RSA) in order to keep patches as clear and easy to follow as possible. And this is of course a very good opportunity to re-read the EuCrypt implementation, to re-write it on one level or another as I bring it over to SMG_Comms and to see first-hand how it is to use it!

October 13, 2018

Results of Testing UDP - Take 2

Filed under: Coding, UDP — Diana Coman @ 8:32 p.m.

As the first test revealed that most UDP packets make it safely at least when sent 1 second apart, I decided that the next week-long test should run with a substantially smaller delay: 0.01 seconds between each 2 packets. Moreover, I've changed the sender to simply send 10000 packets at each run, alternating between 1472 and 1470 octets in size. The focus on those 2 sizes reflects the fact that ALL packets are either 1472[1] or 1470[2] octets long in the newest version of Eulora's communication protocol. Here's the summary of the data I collected:

Sent:                 2140000[3]          2140000
Received:             2134569 (99.74%)    2139788 (99.99%)
Errors received:      0                   0
1470-size sent:       1070000             1070000
1470-size received:   1067322 (99.75%)    1069901 (99.99%)
1472-size sent:       1070000             1070000
1472-size received:   1067247 (99.74%)    1069887 (99.99%)

From the above, it would seem that even 0.01 delay between any two packets being sent is not causing any problem at all - at least not for the sizes considered[4]. If anything, the % of received packets increased slightly compared to the previous test. Of the packets that made it to the other side, none were mangled, hence no errors received[5]. Unsurprisingly perhaps, the 2 octets difference in size between the 2 types of packets does not seem to have the slightest impact on % of packets that make it to the other side - the route itself is likely to be more important in this respect.
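
As background for why those two sizes are not expected to get fragmented, here is a small worked calculation in Python. Note the assumptions, none of which are stated in this post: a path MTU of 1500 octets (the usual Ethernet value), IPv4 headers of 20 octets without options and the fixed 8-octet UDP header. The function names are made up for the example.

```python
ETHERNET_MTU = 1500   # ASSUMPTION: usual Ethernet MTU, not taken from the post
IPV4_HEADER = 20      # ASSUMPTION: IPv4 header without options
UDP_HEADER = 8        # fixed size of the UDP header


def max_unfragmented_payload(mtu=ETHERNET_MTU):
    """Largest UDP payload that still fits in a single IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def fragments_needed(payload_len, mtu=ETHERNET_MTU):
    """Number of IPv4 fragments needed for a UDP payload of payload_len octets."""
    ip_payload = payload_len + UDP_HEADER
    # every fragment except the last carries a multiple of 8 octets of data
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return -(-ip_payload // per_fragment)   # ceiling division
```

Under those assumptions `max_unfragmented_payload()` is 1472, so both 1470 and 1472 octets travel as a single fragment, while a payload of 1473 octets or more (for instance the 2048 maximum of the first test) would need two.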

The timings remain iffier to investigate as I did not change the tester specifically for this. Applying the same correction as in the previous test (i.e. calculating first the difference between the logged local times and then simply adjusting so that the minimum difference is 0) yields again the same stats on both sides (if slightly different values than during the last test):

Data       Min   1st Q   Median   Mean    3rd Q   Max
UK -> UY   0     7       13       13.09   20      27
UY -> UK   0     7       13       13.19   20      27

As before, I would not suggest taking the above timings for anything really. The only thing that can perhaps be said is that packets don't seem to take all that much to arrive at their destination at least on those routes. This is of course no guarantee at all, just a limited observation of this particular data set.

  1. Serpent encrypted packets that are meant for most communications in Eulora. 

  2. RSA encrypted packets that are meant only for new accounts in Eulora. 

  3. I ran this for almost 9 days, so longer than the 1 week of the first test. 

  4. Arguably larger sizes that would get fragmented on the way might have higher chances of becoming lost. 

  5. There was however one spam packet received by the node in UY from an entirely unrelated IP but I removed that before making the summary above since it has nothing to do with the test itself. Also, the fact that one will receive spam is not exactly news otherwise so there isn't much to say about it at this moment. 

October 10, 2018

EuCrypt Chapter 14: CRC32 Implementation with Lookup Table

Filed under: Coding, EuCrypt — Diana Coman @ 1:51 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

The communication protocol for Eulora uses CRC32 as checksum for its packets and so I found myself looking in disbelief at the fact that GNAT imposes the use of Streams for something as simple as calculating the checksum of anything at all, no matter how small, no matter what use, no matter what you might need or want or indeed find useful, no matter what. No matter! As usual, the forum quickly pointed me in the right direction - thank you ave1! - namely looking under the hood of course, in this case GNAT's own hood, the System.CRC32 package. Still, this package makes a whole dance of eating one single character at a time since it is written precisely to support the stream monstrosity on top rather than to support the user with what they might indeed need. Happily though, CRC32 is a very simple thing and absolutely easy to lift and package into 52 lines of code in the .adb file + 130 in the .ads file so 182 all in total[1], comments and two types of input (string or raw array of octets) included. And since a CRC32 implementation is anyway likely to be useful outside of Eulora's communication protocol too, I'm adding it as a .vpatch on the EuCrypt tree where it seems to fit best at the moment. It's a lib on its own as "CRC32" when compiled via its own crc32.gpr or otherwise part of the aggregate EuCrypt project if you'd rather compile all of EuCrypt via its eucrypt.gpr.

My CRC32 lib uses the 0x04C11DB7 generator polynomial and a lookup table with pre-calculated values for faster execution. As Stanislav points out, implementing the actual division and living with the slow-down incurred is not a huge problem but at least for now I did not bother with it. The CRC32 lib provides as output 32 bits of checksum for either a string or an array of octets. At the moment at least I really don't see the need for anything more complicated than this - even the string-input method is more for the convenience of other/later uses than anything else. For my own current work on Eulora's protocol I expect to use CRC32 on arrays of octets directly.
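
For illustration, a table-driven CRC32 with that generator polynomial can be written in Python as below. This is a sketch of the common (zlib-style) variant and not a transcription of the Ada lib: the bit-reflected processing, the 0xFFFFFFFF initial value and the final XOR are assumptions about the conventions used, and the table is computed at start-up here rather than written out as pre-calculated constants.

```python
POLY_REFLECTED = 0xEDB88320   # bit-reversed form of the generator 0x04C11DB7


def make_table():
    """Build the 256-entry lookup table, one entry per possible octet value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # shift out one bit; if it was set, "subtract" (XOR) the polynomial
            c = (c >> 1) ^ POLY_REFLECTED if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = make_table()


def crc32(data: bytes) -> int:
    """CRC32 of an array of octets, one table lookup per octet."""
    crc = 0xFFFFFFFF                                   # ASSUMED initial value
    for octet in data:
        crc = CRC_TABLE[(crc ^ octet) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF                            # ASSUMED final XOR


def crc32_of_string(text: str) -> int:
    """Convenience wrapper for string input (ASCII assumed for this sketch)."""
    return crc32(text.encode("ascii"))
```

With those conventions the function agrees with `zlib.crc32` from Python's standard library, and the usual check string "123456789" gives 0xCBF43926.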

The .vpatch adding CRC32 to EuCrypt and my signature for it are as usual on my Reference Code Shelf with additional links here for your convenience:

  1. Specifically: 52 lines is the count of crc32.adb that does the work. The .ads file brings in another 130 lines that are mostly the lookup table with pre-calculated values. The .gpr file has another 61 lines and the restrict.adc another 80 lines. 

October 4, 2018

Results of Testing UDP - Take 1

Filed under: Coding, UDP — Diana Coman @ 6:09 p.m.

For one full week, between 26 September 2018 and 3 October 2018, my UDP Tester ran on 2 computers, one in the UK and one in Uruguay (UY), sending and receiving UDP messages in both directions. On each side, the receiver ran continuously, logging all UDP messages that it received during the whole interval. By contrast, the sender ran on both sides hourly but at different times so that the communications did not overlap. I don’t expect it would have been any trouble even if they did overlap but this was meant to be a test of UDP under best conditions and for this reason I set the times so that the sender on one end always finished a full run before the one on the other end started. For the same reason, the messages were sent at a rate of at most 1 message per second[1].

At each run, the sender sent exactly 2043 UDP messages with lengths between 6 and 2048, each message having a different length. The order of messages was pseudo-random, relying on the Mersenne Twister prng using as seed the local time at the start of the run (in unix format). The sender kept a log of all messages it sent, including destination IP and port, seed used for MT, local time when message was sent and size of message. The receiver also logged basic information about each message: source IP and port, local time and message size as observed by receiver as well as those contained in the message’s own header, number of observed incorrect bits in the message’s payload as well as the expected and actual values of incorrect octets.
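
The essentials of one such run (which sizes, in what order, with what payload, and how a receiver can spot damage) can be sketched in Python as below. This is an illustration and not the UDP Tester itself. Python's `random` module is also a Mersenne Twister, but it is seeded differently from the reference implementation, so the orders it produces will in general not match those of the actual tester. Treating `Pos` as 1-based and ignoring the message header are assumptions made for this sketch, and all function names are made up.

```python
import random

MIN_LEN, MAX_LEN = 6, 2048   # message lengths used in the test, from the text


def message_sizes(seed: int) -> list:
    """All 2043 lengths from 6 to 2048, each exactly once, in seeded MT order."""
    sizes = list(range(MIN_LEN, MAX_LEN + 1))
    random.Random(seed).shuffle(sizes)
    return sizes


def build_payload(length: int) -> bytes:
    """Fill a message with Pos mod 256 (Pos ASSUMED 1-based, header ignored)."""
    return bytes(pos % 256 for pos in range(1, length + 1))


def bad_octets(received: bytes) -> list:
    """(position, expected, actual) for every octet that differs from the fill."""
    expected = build_payload(len(received))
    return [(pos, exp, act)
            for pos, (exp, act) in enumerate(zip(expected, received), start=1)
            if exp != act]
```

The same seed always gives the same order, so a log that records the seed (as the sender's log does) is enough to reconstruct the order in which the sizes were sent.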

A first look at the week-long data yielded a bit of a surprise in that the UY receiver had actually received *more* messages than were sent from the UK! At a closer look, it turned out that 4933 UDP messages arriving at the UY node were actually sent by its own local switch! And moreover, they were all, without exception, recorded as corrupted since neither size nor payload matched the expected values[2]. At the moment those switch-generated messages are a bit of a mystery - it’s unclear what they are exactly or why and how they appeared. Working hypotheses would be that they are either local dhcp messages (although the port number would be a weird choice for those) or stray frags of bigger UDP messages. My one single attempt to replicate this behaviour while simultaneously capturing everything with tcpdump has so far failed - there were no such unexpected messages at all over several hours of UK sender at work. I might perhaps try again at a later date after I’m done with the more pressing tests that I need for SMG comms or simply process the existing error log and reconstitute the already observed weird messages from there. Anyway, for now I put those anomalous messages to the side and focus instead on the rest of the messages (which were sent as expected either by the node in the UK or by the one in UY). Here’s a summary of the data thus cleaned:

                      UK node       UY node
Total sent:           345928[3]     345267[4]
Total received:       344717[5]     345183[6]
% received:           99.84%[7]     99.78%[8]
Errors received[9]:   0             0

Arguably the lost messages are of most interest in all the above: can one say perhaps that the largest messages[10] get lost more often? Not really or at least not based on this little set of data. Compare the summary stats for three groups of messages: all messages sent from the UK (reflecting as expected the sizes sent and the fact that the same number of messages of each size are sent), all messages lost at UY (i.e. did not make it on the way from UK to UY) and all messages lost at UK (i.e. did not make it on the way from UY to UK):

Data                  Min   1st Q   Median   Mean   3rd Q   Max
All sent from UK      6     516     1027     1027   1538    2048
Lost at UY (UK->UY)   13    513     1049     1051   1602    2045
Lost at UK (UY->UK)   16    553.5   1072     1061   1576    2047

While the data set of lost messages is quite small (550 messages lost at UK and 745 at UY), note that this is mainly due to the fact that there are relatively few losses overall: less than 0.4% of messages sent got lost on the way. So it would seem that at least under the conditions and on the routes considered[11], UDP is not all that unreliable anyway. In any case, those summaries above seem to me remarkably close to one another - meaning that there isn’t any visible evidence that some sizes would get lost more than others, at least not for the set of sizes considered. Arguably sizes of up to 2048 octets of message are quite fine for communications over UDP - or at any rate, just as fine as smaller sizes.
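
The loss counts and percentages quoted above follow directly from the totals in the summary table. As a quick check in Python, using only the numbers given in this post (the variable names are of course made up):

```python
# totals from the summary table
sent_by_uk, sent_by_uy = 345928, 345267
received_at_uk, received_at_uy = 344717, 345183

# what the UK node receives was sent by the UY node and the other way around
lost_at_uk = sent_by_uy - received_at_uk
lost_at_uy = sent_by_uk - received_at_uy

pct_received_at_uk = round(100 * received_at_uk / sent_by_uy, 2)
pct_received_at_uy = round(100 * received_at_uy / sent_by_uk, 2)

# share of all messages sent (both directions) that never arrived
overall_loss_pct = 100 * (lost_at_uk + lost_at_uy) / (sent_by_uk + sent_by_uy)

print(lost_at_uk, lost_at_uy)                   # 550 745
print(pct_received_at_uk, pct_received_at_uy)   # 99.84 99.78
print(overall_loss_pct < 0.4)                   # True
```

The overall loss across both directions comes out at roughly 0.19%, well within the "less than 0.4%" stated above.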

In terms of order of received messages, the UY node received ALL messages precisely in the order in which they were sent but the UK node reported 66 messages in total that arrived out of order. Although this is a tiny number, it is perhaps reasonable to assume that it might increase in worse conditions (e.g. significantly less than 1 second between sending messages).

The actual timings are a bit iffier to investigate since the precision of UDP Tester turns out to be less than what would be needed for such task. Moreover, there is something weird going on with the way I recorded the time because the difference between the two nodes should be of ~34 seconds (UY node local time = UK node local time + 34) but this doesn't quite square with all the data especially at the UY receiver end[12]. On the more positive side though, at least the measurement bias there is constant for all the data and it doesn't introduce any weird effects so I can still attempt to infer something considering also that observed behaviour suggests that most UDP messages really make it to the other end within 1 second. Consequently, I calculated the delta on both sides as TR - TS at first and then I added (on UK side) respectively subtracted (on UY side) the quantity needed to make the lowest delta 0. So at the UY receiver, delta = TR - TS - 11 while at the UK receiver, delta = TR - TS + 32. With this correction, the summary stats for the delta on both sides turn out to be remarkably similar:

Data                 Min   1st Q   Median   Mean    3rd Q   Max
Deltas at UK node:   0     5       11       10.60   16      21
Deltas at UY node:   0     5       11       10.62   16      21

Note that I do *not* recommend taking the above delta values for anything really, as the tester's precision in recording time is just not enough for this.
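
For anyone who wants to redo this correction on the dataset, the adjustment described above (compute TR - TS for every message, then shift everything so that the smallest value becomes 0) is only a couple of lines. A minimal Python sketch, with a made-up function name and to be fed with the logged times:

```python
def normalized_deltas(times_sent, times_received):
    """Return (deltas, offset): TR - TS per message, shifted so the minimum is 0.

    `offset` is the smallest raw TR - TS, i.e. the quantity that gets subtracted.
    """
    raw = [tr - ts for ts, tr in zip(times_sent, times_received)]
    offset = min(raw)
    return [d - offset for d in raw], offset
```

With the data described above the offset would come out as 11 at the UY receiver (delta = TR - TS - 11) and as -32 at the UK receiver (delta = TR - TS + 32).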

You are of course warmly invited to run your own tests and to play with this dataset in any way you find fit. So here's the data from both nodes, including the additional 4933 messages that the UY node received from its own switch:

  • udp_test_take1_data.zip (~10MB)
  • SHA512SUM: 963b8a1467630eea35532122ab7c2d25cb8741001808841f7cf02b34abb6ad5300adcb1d667dd902b4278dd2b373dc46427b0b0bbc918ee52f326456535a4114 udp_test_take1_data.zip

Have fun!

  1. Specifically: the sender had a delay of 1 second between any two consecutive messages. 

  2. The UDP tester simply fills the message up to any length with values calculated as Pos mod 256 where Pos is the position of the respective octet in the full message. 

  3. This is precisely 345928/2043=169 runs. 

  4. This includes a partial 170th run since I stopped the whole test while the UY sender was running already its 170th run. 

  5. 550 messages lost in total. 

  6. 745 messages lost in total. 

  7. 344717 / 345267 * 100 

  8. 345183 / 345928 * 100 

  9. This refers to messages received but with payloads that don’t match the expected values. 

  10. Note that this test capped the messages at 2048 so “largest” here means strictly < 2049 octets. 

  11. The UK node is a “consumer” node i.e. behind a router and on a residential connection; the UY node is S.MG’s test server with Pizarro. 

  12. Considering TR as TimeReceived and TS as TimeSent, at UY receiver the delta should be calculated as TR - (TS + 34) = TR - TS - 34; however, there are entries with TR-TS as low as 11 so basically it would seem that messages arrived before they were even sent. 

September 27, 2018

Tester for UDP Communications

Filed under: Coding, UDP — Diana Coman @ 8:33 p.m.

This code builds on Stanislav Datskovskiy's minimal UDP lib to provide a convenient way of gathering data to evaluate UDP communications between any desired two nodes. The initial specification was to provide a reliable way to "send a soup of all packets lengths from 1 to 65536 bytes each hour back and forth" but this was refined further down the line[1] to reduce the maximum length of the payload to 2048 and to include a Mersenne-Twister (MT) pseudorandom number generator (prng) for scrambling the various message sizes. To accomplish that, there are two main parts that I'm adding to the original UDP library:

  1. The MT lib that is simply an Ada implementation of the MT prng algorithm.
  2. The UDP_Tester package that uses a slightly adapted UDP lib and the MT lib above to provide a UDP sender and a corresponding UDP receiver that implement the testing specification and log the relevant data.

MT lib
The MT lib is a standalone Ada library implementing the well-known Mersenne Twister algorithm for pseudorandom number generation[2]. I am not aware of any other Ada implementation of MT and since I really don't want to add C code to my plate unless I absolutely have to, I simply ported to Ada the reference C implementation provided by the original authors of MT. As such, there isn't much to discuss about the implementation itself. I'll point out, however, the changes I made with respect to the original implementation, namely:

  • In my Ada version of MT, there is no default seeding of the MT generator. The C version allows the caller to ask the generator for numbers without having seeded it - in this case, the C implementation seeds itself with a magic default value that is hardcoded at the checking spot and then just proceeds as if nothing was wrong. I find this approach abhorrent because it effectively hides an error (calling the generator without seeding it first) rather than complaining about it and forcing its correction. Consequently, my MT lib will simply abort if the generator is asked for numbers without having been seeded first.[3]
  • Because Ada is most pointedly not a sort of C, there was no need for the various types of hacks added to the C version to ensure that it still worked correctly on 64-bit machines and not only on the 32-bit machines for which the code was initially written. In Ada it's enough to specify one's types correctly to be exactly and guaranteed 32-bits and then proceed to work with them as such, regardless of whether the processor running the code has 32-bit or 64-bit or x-bit registers.
  • My MT lib provides just one type of pseudorandom numbers, namely unsigned integers on 32 bits. The original C implementation had several wrappers around this main function to map the 32 bits pseudorandom numbers to the interval (0,1) for instance and similar. All the mappings were trivial and they can be easily done by any caller that requires them - precisely in the way they require them. At this stage at least I don't quite see the need for those as part of MT lib and so I left them out - there is only the core offering of 32 pseudorandom bits as it were and the caller can then use or interpret them as they see fit.
  • To check that my implementation results in precisely the same sequence of numbers as the original C MT, I added an automated test (still in Ada) that uses the reference seed and checks the output of MT lib (number by number) against the reference output from the original C MT, reporting any mismatches.
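As a rough illustration of two of the points above, here is a minimal sketch showing that a 32-bit modular type wraps at 2**32 without any masking, and how a caller might map the raw 32 bits onto the interval [0, 1) or onto a range of sizes. The names U32_Demo, To_Unit and To_Range are hypothetical and not part of MT lib; the sketch works on fixed values so that it does not need the generator at all.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure U32_Demo is
  subtype U32 is Interfaces.Unsigned_32;

  -- map 32 pseudorandom bits to a real number in [0, 1)
  function To_Unit (X : U32) return Long_Float is
  begin
    return Long_Float (X) / 4294967296.0;  -- divide by 2**32
  end To_Unit;

  -- map 32 pseudorandom bits to Low .. High
  -- (plain modulo, so with a slight bias; fine for a sketch)
  function To_Range (X : U32; Low, High : Natural) return Natural is
    Span : constant U32 := U32 (High - Low) + 1;
  begin
    return Low + Natural (X mod Span);
  end To_Range;

  Top : constant U32 := U32'Last;
begin
  -- modular type: arithmetic wraps at 2**32 on any machine
  Put_Line (U32'Image (Top + 1));
  Put_Line (Long_Float'Image (To_Unit (16#8000_0000#)));
  Put_Line (Natural'Image (To_Range (Top, 6, 2048)));
end U32_Demo;
```

A caller that needs a different interval or a different distribution can write its own mapping in the same way, which is the reason for leaving such wrappers out of the library.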

The code of MT lib is self-contained with declarations in mt.ads:

 -- Ada implementation of the Mersenne Twister Pseudo-Random number generator
 -- S.MG, 2018

with Interfaces; use Interfaces;

package MT is
  -- Interfaces.Unsigned_32 in GNAT is mod 2**32 and has bitwise shifts defined
  subtype U32 is Interfaces.Unsigned_32;

  -- period parameters
  N          : constant := 624;
  M          : constant := 397;
  MATRIX_MASK: constant U32 := 16#9908_b0df#;
  UPPER_MASK : constant U32 := 16#8000_0000#;
  LOWER_MASK : constant U32 := 16#7fff_ffff#;

  -- array type for storing the state vector of the generator
  type State_Type is Array( 0 .. N-1 ) of U32;

  -- array type for initialization by array - change key len here if needed
  KEY_LEN    : constant := 4;
  type Init_Array_Type is Array( 0 .. KEY_LEN - 1 ) of U32;

  -- exception raised by a call to generator before initializing it
  No_Init_Exception : exception;

  -- initialize the generator with a seed (number)
  procedure Init_Genrand(Seed : in U32);

  -- initialize the generator with array of 4-octets (32 bits) elements
  procedure Init_Genrand(Seed : in Init_Array_Type);

  -- generate the next pseudo-random 32 bits number in the sequence
  function Gen_U32 return U32;

  -- for testing
  function Get_State return State_Type;

  -- internals of the generator, NOT for direct access
  -- actual state of the generator
  State      : State_Type;

  -- flag for generator routine
  Mti_Flag : U32 := N + 1;  -- default value -> state(N) is not initialised

end MT;

And implementation in mt.adb:

 -- Ada implementation of the Mersenne Twister Pseudo-Random number generator
 -- S.MG, 2018

package body MT is

  procedure Init_Genrand(Seed : in U32) is
  begin
    State(0) := Seed;
    for I in State'First + 1 .. State'Last loop
      State(I) := U32(1812433253) *
                  ( State(I - 1) xor
                    ( Shift_Right(State(I - 1), 30) )
                  ) + U32(I) ;
    end loop;
    Mti_Flag := N;
  end Init_Genrand;

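  procedure Init_Genrand(Seed : in Init_Array_Type) is
    Default_Seed: constant U32 := U32(19650218); -- magic value!
    I, J, K : Integer;
  begin
    -- start from the state given by the default seed, as the reference C code does
    Init_Genrand(Default_Seed);
    I := 1;
    J := 0;
    -- K is the larger of N and the key length
    if N > Seed'Length then
      K := N;
    else
      K := Seed'Length;
    end if;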

    while K > 0 loop
      State(I) := (State(I) xor
                  ( (State(I-1) xor
                     Shift_Right(State(I-1), 30)
                    ) * U32(1664525)
                  )) + Seed(J) + U32(J);
      I := I + 1;
      J := J + 1;
      if I >= N then
        State(0) := State(N-1);
        I := 1;
      end if;
      if J >= Seed'Length then
        J := 0;
      end if;
      K := K - 1;
    end loop;

    K := N -1;
    while K > 0 loop
      State(I) := (State(I) xor
                  ( (State(I-1) xor
                     Shift_Right(State(I-1), 30)
                    ) * U32(1566083941)
                  )) - U32(I);
      I := I + 1;
      if I >= N then
        State(0) := State(N-1);
        I := 1;
      end if;
      K := K - 1;
    end loop;
    State(0) := 16#8000_0000#; -- MSB is 1 to ensure non-zero initial state
  end Init_Genrand;

  function Gen_U32 return U32 is
    Y     : U32;
    MASK1 : constant U32 := U32(1);
    Mag01 : Array ( 0 .. 1 ) of U32;
    -- Mag01[x] is x * Matrix_A of the algorithm for x 0 or 1
  begin
    Mag01(0) := U32(0);
    Mag01(1) := MATRIX_MASK;

    -- if no numbers available, generate another set of N words
    if Mti_Flag >= N then

      -- check it's not a non-initialised generator
      if Mti_Flag = (N + 1) then
         -- Generator was NOT initialised!
         -- Original C code initialises with default seed 5489
         -- This code will simply raise exception and abort
         raise No_Init_Exception;
      end if;

      for K in 0 .. N - M - 1 loop
        Y := ( State(K)   and UPPER_MASK ) or
             ( State(K+1) and LOWER_MASK );
        State(K) := State(K+M) xor
                      Shift_Right(Y, 1) xor
                        Mag01(Integer(Y and MASK1));
      end loop;
      for K in N-M .. N - 2 loop
        Y := ( State(K)   and UPPER_MASK  ) or
             ( State(K+1) and LOWER_MASK);
        State(K) := State(K + M - N) xor
                      Shift_Right(Y, 1) xor
                        Mag01(Integer(Y and MASK1));
      end loop;
      Y := (State(N-1) and UPPER_MASK ) or
             (State(0) and LOWER_MASK );
      State(N - 1) := State(M-1) xor
                        Shift_Right(Y, 1) xor
                          Mag01(Integer(Y and MASK1));
      Mti_Flag := 0;
    end if;

    -- retrieve next available number
    Y        := State(Integer(Mti_Flag));
    Mti_Flag := Mti_Flag + 1;

    -- tempering
    Y := Y xor Shift_Right(Y, 11);
    Y := Y xor (Shift_Left(Y, 7) and 16#9d2c_5680#);
    Y := Y xor (Shift_Left(Y, 15) and 16#efc6_0000#);
    Y := Y xor Shift_Right(Y, 18);

    -- return tempered number
    return Y;
  end Gen_U32;

  function Get_State return State_Type is
  begin
    return State;
  end Get_State;

end MT;
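For illustration, a caller of the package above might look like the sketch below. It assumes that mt.ads and mt.adb are part of the same build; the procedure name MT_Usage and the seed value are arbitrary choices made for this example (5489 is simply the default seed that the C version would have used silently).

```ada
with Ada.Text_IO; use Ada.Text_IO;
with MT;

procedure MT_Usage is
  X       : MT.U32;
  Refused : Boolean := False;
begin
  -- asking for numbers before seeding is an error, not a silent default
  begin
    X := MT.Gen_U32;
  exception
    when MT.No_Init_Exception =>
      Refused := True;
  end;
  Put_Line ("unseeded call refused: " & Boolean'Image (Refused));

  -- seed with a single number, then draw a few values
  MT.Init_Genrand (MT.U32 (5489));
  for I in 1 .. 3 loop
    X := MT.Gen_U32;
    Put_Line (MT.U32'Image (X));
  end loop;
end MT_Usage;
```

The first call is wrapped in its own block only to show the exception; a real caller would simply seed the generator before the first request.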

The test for the above MT lib is in its own testmt/test_mt.adb, including the reference output (that really is the largest part of the whole file):

  --S.MG, 2018

with Ada.Text_IO; use Ada.Text_IO;
with Interfaces; use Interfaces;
with MT;

procedure Tests_MT is
  Seeds : MT.Init_Array_Type;
  X     : MT.U32;
  No    : constant Integer := 1000;
  Result: Array(0..No-1) of MT.U32 := (
    1067595299,  955945823,  477289528, 4107218783, 4228976476,
    3344332714, 3355579695,  227628506,  810200273, 2591290167,
    2560260675, 3242736208,  646746669, 1479517882, 4245472273,
    1143372638, 3863670494, 3221021970, 1773610557, 1138697238,
    1421897700, 1269916527, 2859934041, 1764463362, 3874892047,
    3965319921,   72549643, 2383988930, 2600218693, 3237492380,
    2792901476,  725331109,  605841842,  271258942,  715137098,
    3297999536, 1322965544, 4229579109, 1395091102, 3735697720,
    2101727825, 3730287744, 2950434330, 1661921839, 2895579582,
    2370511479, 1004092106, 2247096681, 2111242379, 3237345263,
    4082424759,  219785033, 2454039889, 3709582971,  835606218,
    2411949883, 2735205030,  756421180, 2175209704, 1873865952,
    2762534237, 4161807854, 3351099340,  181129879, 3269891896,
     776029799, 2218161979, 3001745796, 1866825872, 2133627728,
      34862734, 1191934573, 3102311354, 2916517763, 1012402762,
    2184831317, 4257399449, 2899497138, 3818095062, 3030756734,
    1282161629,  420003642, 2326421477, 2741455717, 1278020671,
    3744179621,  271777016, 2626330018, 2560563991, 3055977700,
    4233527566, 1228397661, 3595579322, 1077915006, 2395931898,
    1851927286, 3013683506, 1999971931, 3006888962, 1049781534,
    1488758959, 3491776230,  104418065, 2448267297, 3075614115,
    3872332600,  891912190, 3936547759, 2269180963, 2633455084,
    1047636807, 2604612377, 2709305729, 1952216715,  207593580,
    2849898034,  670771757, 2210471108,  467711165,  263046873,
    3569667915, 1042291111, 3863517079, 1464270005, 2758321352,
    3790799816, 2301278724, 3106281430,    7974801, 2792461636,
     555991332,  621766759, 1322453093,  853629228,  686962251,
    1455120532,  957753161, 1802033300, 1021534190, 3486047311,
    1902128914, 3701138056, 4176424663, 1795608698,  560858864,
    3737752754, 3141170998, 1553553385, 3367807274,  711546358,
    2475125503,  262969859,  251416325, 2980076994, 1806565895,
     969527843, 3529327173, 2736343040, 2987196734, 1649016367,
    2206175811, 3048174801, 3662503553, 3138851612, 2660143804,
    1663017612, 1816683231,  411916003, 3887461314, 2347044079,
    1015311755, 1203592432, 2170947766, 2569420716,  813872093,
    1105387678, 1431142475,  220570551, 4243632715, 4179591855,
    2607469131, 3090613241,  282341803, 1734241730, 1391822177,
    1001254810,  827927915, 1886687171, 3935097347, 2631788714,
    3905163266,  110554195, 2447955646, 3717202975, 3304793075,
    3739614479, 3059127468,  953919171, 2590123714, 1132511021,
    3795593679, 2788030429,  982155079, 3472349556,  859942552,
    2681007391, 2299624053,  647443547,  233600422,  608168955,
    3689327453, 1849778220, 1608438222, 3968158357, 2692977776,
    2851872572,  246750393, 3582818628, 3329652309, 4036366910,
    1012970930,  950780808, 3959768744, 2538550045,  191422718,
    2658142375, 3276369011, 2927737484, 1234200027, 1920815603,
    3536074689, 1535612501, 2184142071, 3276955054,  428488088,
    2378411984, 4059769550, 3913744741, 2732139246,   64369859,
    3755670074,  842839565, 2819894466, 2414718973, 1010060670,
    1839715346, 2410311136,  152774329, 3485009480, 4102101512,
    2852724304,  879944024, 1785007662, 2748284463, 1354768064,
    3267784736, 2269127717, 3001240761, 3179796763,  895723219,
     865924942, 4291570937,   89355264, 1471026971, 4114180745,
    3201939751, 2867476999, 2460866060, 3603874571, 2238880432,
    3308416168, 2072246611, 2755653839, 3773737248, 1709066580,
    4282731467, 2746170170, 2832568330,  433439009, 3175778732,
      26248366, 2551382801,  183214346, 3893339516, 1928168445,
    1337157619, 3429096554, 3275170900, 1782047316, 4264403756,
    1876594403, 4289659572, 3223834894, 1728705513, 4068244734,
    2867840287, 1147798696,  302879820, 1730407747, 1923824407,
    1180597908, 1569786639,  198796327,  560793173, 2107345620,
    2705990316, 3448772106, 3678374155,  758635715,  884524671,
     486356516, 1774865603, 3881226226, 2635213607, 1181121587,
    1508809820, 3178988241, 1594193633, 1235154121,  326117244,
    2304031425,  937054774, 2687415945, 3192389340, 2003740439,
    1823766188, 2759543402,   10067710, 1533252662, 4132494984,
      82378136,  420615890, 3467563163,  541562091, 3535949864,
    2277319197, 3330822853, 3215654174, 4113831979, 4204996991,
    2162248333, 3255093522, 2219088909, 2978279037,  255818579,
    2859348628, 3097280311, 2569721123, 1861951120, 2907080079,
    2719467166,  998319094, 2521935127, 2404125338,  259456032,
    2086860995, 1839848496, 1893547357, 2527997525, 1489393124,
    2860855349,   76448234, 2264934035,  744914583, 2586791259,
    1385380501,   66529922, 1819103258, 1899300332, 2098173828,
    1793831094,  276463159,  360132945, 4178212058,  595015228,
     177071838, 2800080290, 1573557746, 1548998935,  378454223,
    1460534296, 1116274283, 3112385063, 3709761796,  827999348,
    3580042847, 1913901014,  614021289, 4278528023, 1905177404,
      45407939, 3298183234, 1184848810, 3644926330, 3923635459,
    1627046213, 3677876759,  969772772, 1160524753, 1522441192,
     452369933, 1527502551,  832490847, 1003299676, 1071381111,
    2891255476,  973747308, 4086897108, 1847554542, 3895651598,
    2227820339, 1621250941, 2881344691, 3583565821, 3510404498,
     849362119,  862871471,  797858058, 2867774932, 2821282612,
    3272403146, 3997979905,  209178708, 1805135652,    6783381,
    2823361423,  792580494, 4263749770,  776439581, 3798193823,
    2853444094, 2729507474, 1071873341, 1329010206, 1289336450,
    3327680758, 2011491779,   80157208,  922428856, 1158943220,
    1667230961, 2461022820, 2608845159,  387516115, 3345351910,
    1495629111, 4098154157, 3156649613, 3525698599, 4134908037,
     446713264, 2137537399, 3617403512,  813966752, 1157943946,
    3734692965, 1680301658, 3180398473, 3509854711, 2228114612,
    1008102291,  486805123,  863791847, 3189125290, 1050308116,
    3777341526, 4291726501,  844061465, 1347461791, 2826481581,
     745465012, 2055805750, 4260209475, 2386693097, 2980646741,
     447229436, 2077782664, 1232942813, 4023002732, 1399011509,
    3140569849, 2579909222, 3794857471,  900758066, 2887199683,
    1720257997, 3367494931, 2668921229,  955539029, 3818726432,
    1105704962, 3889207255, 2277369307, 2746484505, 1761846513,
    2413916784, 2685127085, 4240257943, 1166726899, 4215215715,
    3082092067, 3960461946, 1663304043, 2087473241, 4162589986,
    2507310778, 1579665506,  767234210,  970676017,  492207530,
    1441679602, 1314785090, 3262202570, 3417091742, 1561989210,
    3011406780, 1146609202, 3262321040, 1374872171, 1634688712,
    1280458888, 2230023982,  419323804, 3262899800,   39783310,
    1641619040, 1700368658, 2207946628, 2571300939, 2424079766,
     780290914, 2715195096, 3390957695,  163151474, 2309534542,
    1860018424,  555755123,  280320104, 1604831083, 2713022383,
    1728987441, 3639955502,  623065489, 3828630947, 4275479050,
    3516347383, 2343951195, 2430677756,  635534992, 3868699749,
     808442435, 3070644069, 4282166003, 2093181383, 2023555632,
    1568662086, 3422372620, 4134522350, 3016979543, 3259320234,
    2888030729, 3185253876, 4258779643, 1267304371, 1022517473,
     815943045,  929020012, 2995251018, 3371283296, 3608029049,
    2018485115,  122123397, 2810669150, 1411365618, 1238391329,
    1186786476, 3155969091, 2242941310, 1765554882,  279121160,
    4279838515, 1641578514, 3796324015,   13351065,  103516986,
    1609694427,  551411743, 2493771609, 1316337047, 3932650856,
    4189700203,  463397996, 2937735066, 1855616529, 2626847990,
      55091862, 3823351211,  753448970, 4045045500, 1274127772,
    1124182256,   92039808, 2126345552,  425973257,  386287896,
    2589870191, 1987762798, 4084826973, 2172456685, 3366583455,
    3602966653, 2378803535, 2901764433, 3716929006, 3710159000,
    2653449155, 3469742630, 3096444476, 3932564653, 2595257433,
     318974657, 3146202484,  853571438,  144400272, 3768408841,
     782634401, 2161109003,  570039522, 1886241521,   14249488,
    2230804228, 1604941699, 3928713335, 3921942509, 2155806892,
     134366254,  430507376, 1924011722,  276713377,  196481886,
    3614810992, 1610021185, 1785757066,  851346168, 3761148643,
    2918835642, 3364422385, 3012284466, 3735958851, 2643153892,
    3778608231, 1164289832,  205853021, 2876112231, 3503398282,
    3078397001, 3472037921, 1748894853, 2740861475,  316056182,
    1660426908,  168885906,  956005527, 3984354789,  566521563,
    1001109523, 1216710575, 2952284757, 3834433081, 3842608301,
    2467352408, 3974441264, 3256601745, 1409353924, 1329904859,
    2307560293, 3125217879, 3622920184, 3832785684, 3882365951,
    2308537115, 2659155028, 1450441945, 3532257603, 3186324194,
    1225603425, 1124246549,  175808705, 3009142319, 2796710159,
    3651990107,  160762750, 1902254979, 1698648476, 1134980669,
     497144426, 3302689335, 4057485630, 3603530763, 4087252587,
     427812652,  286876201,  823134128, 1627554964, 3745564327,
    2589226092, 4202024494,   62878473, 3275585894, 3987124064,
    2791777159, 1916869511, 2585861905, 1375038919, 1403421920,
      60249114, 3811870450, 3021498009, 2612993202,  528933105,
    2757361321, 3341402964, 2621861700,  273128190, 4015252178,
    3094781002, 1621621288, 2337611177, 1796718448, 1258965619,
    4241913140, 2138560392, 3022190223, 4174180924,  450094611,
    3274724580,  617150026, 2704660665, 1469700689, 1341616587,
     356715071, 1188789960, 2278869135, 1766569160, 2795896635,
      57824704, 2893496380, 1235723989, 1630694347, 3927960522,
     428891364, 1814070806, 2287999787, 4125941184, 3968103889,
    3548724050, 1025597707, 1404281500, 2002212197,   92429143,
    2313943944, 2403086080, 3006180634, 3561981764, 1671860914,
    1768520622, 1803542985,  844848113, 3006139921, 1410888995,
    1157749833, 2125704913, 1789979528, 1799263423,  741157179,
    2405862309,  767040434, 2655241390, 3663420179, 2172009096,
    2511931187, 1680542666,  231857466, 1154981000,  157168255,
    1454112128, 3505872099, 1929775046, 2309422350, 2143329496,
    2960716902,  407610648, 2938108129, 2581749599,  538837155,
    2342628867,  430543915,  740188568, 1937713272, 3315215132,
    2085587024, 4030765687,  766054429, 3517641839,  689721775,
    1294158986, 1753287754, 4202601348, 1974852792,   33459103,
    3568087535, 3144677435, 1686130825, 4134943013, 3005738435,
    3599293386,  426570142,  754104406, 3660892564, 1964545167,
     829466833,  821587464, 1746693036, 1006492428, 1595312919,
    1256599985, 1024482560, 1897312280, 2902903201,  691790057,
    1037515867, 3176831208, 1968401055, 2173506824, 1089055278,
    1748401123, 2941380082,  968412354, 1818753861, 2973200866,
    3875951774, 1119354008, 3988604139, 1647155589, 2232450826,
    3486058011, 3655784043, 3759258462,  847163678, 1082052057,
     989516446, 2871541755, 3196311070, 3929963078,  658187585,
    3664944641, 2175149170, 2203709147, 2756014689, 2456473919,
    3890267390, 1293787864, 2830347984, 3059280931, 4158802520,
    1561677400, 2586570938,  783570352, 1355506163,   31495586,
    3789437343, 3340549429, 2092501630,  896419368,  671715824,
    3530450081, 3603554138, 1055991716, 3442308219, 1499434728,
    3130288473, 3639507000,   17769680, 2259741420,  487032199,
    4227143402, 3693771256, 1880482820, 3924810796,  381462353,
    4017855991, 2452034943, 2736680833, 2209866385, 2128986379,
     437874044,  595759426,  641721026, 1636065708, 3899136933,
     629879088, 3591174506,  351984326, 2638783544, 2348444281,
    2341604660, 2123933692,  143443325, 1525942256,  364660499,
     599149312,  939093251, 1523003209,  106601097,  376589484,
    1346282236, 1297387043,  764598052, 3741218111,  933457002,
    1886424424, 3219631016,  525405256, 3014235619,  323149677,
    2038881721, 4100129043, 2851715101, 2984028078, 1888574695,
    2014194741, 3515193880, 4180573530, 3461824363, 2641995497,
    3179230245, 2902294983, 2217320456, 4040852155, 1784656905,
    3311906931,   87498458, 2752971818, 2635474297, 2831215366,
    3682231106, 2920043893, 3772929704, 2816374944,  309949752,
    2383758854,  154870719,  385111597, 1191604312, 1840700563,
     872191186, 2925548701, 1310412747, 2102066999, 1504727249,
    3574298750, 1191230036, 3330575266, 3180292097, 3539347721,
     681369118, 3305125752, 3648233597,  950049240, 4173257693,
    1760124957,  512151405,  681175196,  580563018, 1169662867,
    4015033554, 2687781101,  699691603, 2673494188, 1137221356,
     123599888,  472658308, 1053598179, 1012713758, 3481064843,
    3759461013, 3981457956, 3830587662, 1877191791, 3650996736,
     988064871, 3515461600, 4089077232, 2225147448, 1249609188,
    2643151863, 3896204135, 2416995901, 1397735321, 3460025646);

begin
  Seeds(0) := MT.U32(16#123#);
  Seeds(1) := MT.U32(16#234#);
  Seeds(2) := MT.U32(16#345#);
  Seeds(3) := MT.U32(16#456#);

  -- seed the generator with the reference key array
  MT.Init_Genrand(Seeds);

  Put_Line("Generating numbers...");
  for I in 0 .. No-1 loop
    X := MT.Gen_U32;
    if X /= Result(I) then
      Put_Line("ERROR at position " & Integer'Image(I) &
               "; expected " & MT.U32'Image(Result(I)) &
               " but result is " & MT.U32'Image(X));
    end if;
  end loop;
end Tests_MT;

The UDP_Tester has two main components: the sender and the receiver. For convenience, there are simple wrappers for those components - udp_sender.adb and udp_receiver.adb respectively - so that a build of UDP_Tester will directly provide two executables - sender and receiver. The IP and port numbers are knobs that can be easily changed from those wrappers directly since they are passed as parameters to the UDP_Tester methods themselves.

  -- S.MG, 2018

with UDP_Tester;

procedure UDP_Sender is
  -- Sender and Receiver addresses + ports
  Receiver_IP   : constant String   := ""; -- smg.test machine
  Receiver_Port : constant          := 7000;

  Sender_Port   : constant          := 5000;

begin
  UDP_Tester.Sender( Receiver_IP, Receiver_Port, Sender_Port) ;

end UDP_Sender;


  -- S.MG, 2018

with UDP_Tester;

procedure UDP_Receiver is
  -- Port to listen on
  Receiver_Port : constant          := 7000;

begin
  UDP_Tester.Receiver( Receiver_Port );

end UDP_Receiver;

The UDP_Tester.Receiver method simply runs an endless loop in which it listens for UDP packages on a given port number and logs basic information about each package it receives. The UDP_Tester.Sender method assembles and sends, in a pseudorandom order, all packages with length between 6 (the size of its own header) and the maximum length. Note that the sender will send each size of package *only once* and it will simply finish once it has sent one package of each size. Consequently, for data collection purposes, the sender will have to be started repeatedly from outside - most conveniently through a cron task[4]. Because the workings of both sender and receiver are quite basic, I won't cover them here in more detail - the code and comments should make it clear enough and if that's not the case, you can always ask your questions here in the comments. However, it's worth noting a few important points about the tester as a whole and the changes it imposes on the UDP lib:

  • To enable sending of packets with various sizes, the UDP lib had to be made generic. In Ada, this means that the calling code can then instantiate the UDP package with its desired parameters (in this case the size of the UDP payload). Specifically, from the UDP_Tester.Sender code, this reads:
            K : constant Positive := Packet_Size;
            -- sender will have *current* size of UDP packet
            package UDP_Sdr is new UDP( Payload_Size => K );

    Essentially the sender runs a loop and at each iteration it selects one of the remaining packet sizes, it instantiates UDP_Sdr as a parametrized UDP package and only then proceeds to use it to send the actual data.

    Note that this change is *not* required in production - it's simply needed for testing purposes. It's for this reason that it's part of the .vpatch for UDP_Tester rather than a .vpatch in itself on the UDP lib directly.

  • The sender will wait 1 second between any 2 packages it sends. This is mainly because a first version without this delay proved to result in significant packet loss that was clearly due to the burst mode (i.e. more than 2000 packages sent in 1 second) rather than anything else. Since the point of the test was to gather some data on UDP communication in relaxed rather than stress conditions, the delay of 1 second was introduced. However, should you wish to stress-test for any reason, you can easily remove this delay - simply comment out the "delay 1.0" line in the sender and that's all.
  • The UDP_Tester currently logs some information on both sides. Both sender and receiver will first check for a fixed-name file and if it doesn't exist they will create it, writing the header to it as well. After that, all new data will simply be appended to the file. The sender logs the size, local time, destination IP, destination port and seed used for the MT prng - this seed is the local time (unix style) when the sender is started. The receiver logs the size (both as received and as claimed in the packet's own header), the time (local time when sent and local time when received), the source IP, the source port and the number of octets that are different from expected. In addition, the receiver will further log in a separate file the actual octets that are found to contain errors, if any (together with information to easily link them back to the exact package they were part of).
  • There is no attempt to match directly the times at the sender and at the receiver - any difference between the two local times should be noted when the tests are run and then provided together with the data so that the analysis can be done in a meaningful way. Specifically, the local times used by sender and receiver are simply retrieved with Ada.Calendar.Clock and then converted to unix style by subtracting from them the "epoch" (0 on 1/1/1970).
  • The use of generic packages and of Ada.Calendar in UDP_Tester imposed a relaxation of restrictions upstream i.e. in the UDP and MT libs. Since the relaxation of those restrictions is again (just like the generics change) required for testing *only*, they are simply commented out as part of the changes included in the UDP_Tester .vpatch.
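The unix-style conversion mentioned in the list above can be sketched as below. This is not the tester's own code, only a minimal illustration under the assumption that the "epoch" is built with Ada.Calendar.Time_Of; the names Unix_Time_Demo, Epoch and To_Unix are made up for the example.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;
with Interfaces;

procedure Unix_Time_Demo is
  -- the "epoch" expressed as an Ada.Calendar time value
  Epoch : constant Time := Time_Of (Year => 1970, Month => 1, Day => 1);

  -- seconds between the epoch and T, rounded to a whole number
  function To_Unix (T : Time) return Interfaces.Integer_64 is
  begin
    return Interfaces.Integer_64 (T - Epoch);
  end To_Unix;

  Now : constant Interfaces.Integer_64 := To_Unix (Clock);
begin
  Put_Line ("unix-style local time:" & Interfaces.Integer_64'Image (Now));
end Unix_Time_Demo;
```

Since both the clock reading and the epoch are expressed in the machine's local time, two nodes in different time zones (or with drifting clocks) will report different values for the same instant, which is why the offset between them has to be noted separately when the tests are run.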

To obtain a working UDP_Tester, simply download the relevant set of .vpatch and .sig files and press with your favourite V[5]. The patches are available as usual on my Reference Shelf and linked from here as well, for your convenience:

At the moment I am running the above tester in both directions between a node in the UK and one in Uruguay so hopefully I'll soon have some data to look at!

  1. Follow the discussion in the logs as there is no better way to get the details than to actually read through the log. 

  2. M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator", ACM Transactions on Modelling and Computer Simulation Vol. 8, No. 1, January, pp.3-30, 1998. 

  3. If one really wants a default value, this could be perhaps added as a constant in the package and then used as such by the caller but I must say that I don't quite see the point of this. 

  4. It is of course possible to change the code so that the sender also runs in a loop and simply wakes up at given intervals and resends packages but I don't quite see any real benefit to doing it this way - on the contrary, I see benefit to *not* doing it this way, since cron tasks are quite perfect for this and even more robust (no idle time and guaranteed recovery even in more extreme cases such as computer reset). 

  5. NB: Using Keccak hashes, as per current republican standard for V-trees. 

September 14, 2018

SMG.Comms Implementation - Chapter 1

Filed under: Coding, SMG_Comms — Diana Coman @ 11:02 a.m.

~ This is a work in progress towards an Ada implementation of the communication protocol discussed on Trilema four months ago. ~

As far as I know, there isn't any more recent discussion of the above specification, nor any other attempt at all at any sort of implementation. Consequently, this is the first ever attempt - a prototype at this stage rather than a reference implementation. Moreover, it's also a sort of double first, since it clearly requires a deeper knowledge of Ada than I ever needed before. My approach to this pile of unknown here is to simply start working on it and expose the path travelled from this very first attempt to the final product, mistakes and detours and pitfalls to come included. You are welcome to follow along, to help if you can, to question if you don't understand, to extract perhaps in doing so some benefit for yourself.

The first decision I had to make before I could even really attempt any sort of prototype implementation at all concerned the library to use for the actual network communication. Since GNAT is the de facto republican Ada compiler[1], the logical decision is to simply use GNAT's own library, GNAT.Sockets and avoid otherwise as much as possible introducing additional, external dependencies - I really can't see any reason to add even more code even indirectly. So then GNAT.Sockets it is and hooray for that - except that there doesn't seem to be much documentation about it other than the comments in g-socket.ads and g-socket.adb! Still, the .ads file has a rather detailed introduction of the package with some commented examples in there so definitely worth reading as a starting point especially since... there isn't any other starting point really.

The g-socket files reveal essentially that any data to be sent or received through GNAT.Sockets will have to be stored at one point or another in an entity of Ada.Streams.Stream_Element_Array type. The examples in the file (and most everything else I could find on the topic) focus almost exclusively[2] on TCP connections. I suspect the assumption is that one "should prefer" TCP over UDP just like that, to the point where UDP is not even discussed anymore in what scant docs there are. Nevertheless, I don't prefer it: Eulora's protocol is rather specifically designed to be stateless and to keep communications as simple and clear as possible - for as long as UDP is enough for the job, I'd rather use it!

Rummaging a bit in the g-socket files reveals happily that UDP with GNAT.Sockets is in fact quite straightforward: one simply needs to specify "Socket_Datagram" as mode for the socket when calling Create_Socket and then use directly the Send_Socket( socket, data, last, to_address) and Receive_Socket(socket, data, last, from_address) methods. As one would expect, there is no reliable connection established, no "stream" of send and receive but only sockets that allow independent "send" and "receive" calls - each of those could in principle use a different address even. A quick example - keep reading for a full, working example - of a server and client working with a UDP socket looks like this:

 -- on server:
     -- create UDP socket
    Create_Socket( Sock, Family_Inet, Socket_Datagram );

    -- set options on UDP socket
    Set_Socket_Option( Sock, Socket_Level, (Reuse_Address, True));

    -- set address and bind
    Address.Addr := Any_Inet_Addr;
    Address.Port := Port_No; -- the port number that server is listening on
    Bind_Socket( Sock, Address );
    Receive_Socket( Sock, Data, Last, From );

 -- on client:
    Address.Port := Port_No; -- server's port number
    Address.Addr := Inet_Addr(""); -- this should be server's address here!
    Create_Socket(Sock, Family_Inet, Socket_Datagram);

    Send_Socket(Sock, Data, Last, Address);

With the very basics of the sockets part at least in place, the next part to decide on is how to send and receive over those sockets the actual data types defined by the protocol. After reading quite a bit on the Systems Programming part of Ada (an annex of the Ada standard and therefore compiler-dependent, relevant for data representation) and on streams and on representation clauses and pragmas supported by GNAT and records and discriminants for records and everything else from the Ada reference book[3] that I thought could help with this, I came to the conclusion that I'll keep it simple and clear, especially for now, while I'm just starting to figure it all out! So I'll simply use GNAT's fixed width types defined in the Interfaces package and I'll otherwise copy the raw octets from and to those types using Ada.Unchecked_Conversion that works, as far as I understand it, precisely as a raw copy from memory. Once the data is simply obtained as a vector of raw octets, I can implement two simple functions to copy them into the Stream_Element_Array structure that can be sent directly through the UDP socket. Moreover, at this stage - and at this stage *only* - I'll worry about potentially different endianness of the local machine and the network: if the local environment is little endian, the methods converting to and from network format will simply read the octets in reverse order. The relevant basic types and conversion methods are defined in the smg_comms_types.ads file:

 -- S.MG, 2018
 -- prototype implementation of S.MG communication protocol

with Ada.Streams; use Ada.Streams;
with Interfaces; use Interfaces; -- Integer_n and Unsigned_n
with Ada.Unchecked_Conversion; -- converting int/uint to array of octets

package SMG_comms_types is
  -- basic types with guaranteed lengths
  type Octet_Array is array(Natural range <>) of Unsigned_8;

  subtype Octets_1 is Octet_Array( 1 .. 1 );
  subtype Octets_2 is Octet_Array( 1 .. 2 );
  subtype Octets_4 is Octet_Array( 1 .. 4 );
  subtype Octets_8 is Octet_Array( 1 .. 8 );

  subtype Message is Octet_Array( 1 .. 512 );
  subtype RSAMessage is Octet_Array( 1 .. 245 );

  -- blind, unchecked casts ( memcpy style )
  function Cast is new Ada.Unchecked_Conversion( Integer_8, Octets_1 );
  function Cast is new Ada.Unchecked_Conversion( Octets_1, Integer_8 );
  function Cast is new Ada.Unchecked_Conversion( Integer_16, Octets_2 );
  function Cast is new Ada.Unchecked_Conversion( Octets_2, Integer_16 );

  function Cast is new Ada.Unchecked_Conversion( Integer_32, Octets_4 );
  function Cast is new Ada.Unchecked_Conversion( Octets_4, Integer_32 );

  function Cast is new Ada.Unchecked_Conversion( Integer_64, Octets_8 );
  function Cast is new Ada.Unchecked_Conversion( Octets_8, Integer_64 );

  -- to and from streams for network communications - general
  procedure ToNetworkFormat(
      Item   : in Octet_Array;
      Buffer : out Stream_Element_Array);

  procedure FromNetworkFormat(
      Buffer : in Stream_Element_Array;
      Item   : out Octet_Array);

  -- specific, convenience methods for the basic types
    -- Integer_8
  procedure ToNetworkFormat(
      Item   : in Integer_8;
      Buffer : out Stream_Element_Array);

  procedure FromNetworkFormat(
      Buffer : in Stream_Element_Array;
      Item   : out Integer_8);

end SMG_comms_types;

As you can easily notice, the above does not yet cover fully even the 3.0 "Basic types" part of the protocol specification. It's all right for now though - there is quite enough there for the first basic tests and once those are fine, I'll add gradually the rest of types too. There is little point in spending the time now to implement them all before I even got the chance to change my mind regarding *how* to implement them! So if you have a better implementation solution than the above, speak up in the comments section below and save me some time and a lot of headache! Note however that simplicity, clarity and hard guarantees are rather important here.
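To make the byte-order handling visible on something wider than one octet, here is a minimal, self-contained sketch for a 16-bit value. It is not part of smg_comms_types: the procedure name Int16_Order_Demo and the helpers To_Network and From_Network are made up for illustration, and it assumes, as the package itself does, that System.Default_Bit_Order on GNAT reflects the endianness of the host.

```ada
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces;               use Interfaces;
with System;                   use System;

procedure Int16_Order_Demo is
   type Octet_Array is array (Natural range <>) of Unsigned_8;
   subtype Octets_2 is Octet_Array (1 .. 2);

   -- blind, unchecked casts, same style as in smg_comms_types
   function Cast is new Ada.Unchecked_Conversion (Integer_16, Octets_2);
   function Cast is new Ada.Unchecked_Conversion (Octets_2, Integer_16);

   -- hypothetical helper: octets of N in network (big endian) order
   function To_Network (N : Integer_16) return Octets_2 is
      Raw : constant Octets_2 := Cast (N);
   begin
      if Default_Bit_Order = Low_Order_First then
         return (Raw (2), Raw (1));   -- little endian host: reverse
      else
         return Raw;                  -- big endian host: already in order
      end if;
   end To_Network;

   -- hypothetical helper: the inverse of To_Network
   function From_Network (O : Octets_2) return Integer_16 is
   begin
      if Default_Bit_Order = Low_Order_First then
         return Cast (Octets_2'(O (2), O (1)));
      else
         return Cast (O);
      end if;
   end From_Network;

   N    : constant Integer_16 := 258;   -- 16#0102#
   Wire : constant Octets_2   := To_Network (N);
begin
   -- most significant octet first, then the least significant one
   Put_Line (Unsigned_8'Image (Wire (1)) & Unsigned_8'Image (Wire (2)));
   -- round trip back to the original value
   Put_Line (Integer_16'Image (From_Network (Wire)));
end Int16_Order_Demo;
```

Under those assumptions the first line printed should be " 1 2" on either kind of host and the second one " 258", since the reversal happens only where the in-memory order differs from network order.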

One small point on which I'm already rather undecided is whether to continue implementing the convenience methods for different types so that one can simply call ToNetworkFormat and FromNetworkFormat for anything or to leave only the generic methods at least for basic types. At the moment I incline towards providing all those methods (i.e. adding to the single pair of methods for the Integer_8 type defined above) because more complex types will likely need such methods anyway and moreover, this approach helps to keep all those casts (unchecked_conversion) in one place rather than scattered all through the rest of the code. However, it does add to the LOC count of this package. Anyway, moving further for now, the corresponding .adb file with the implementation of smg_comms_types:

  -- S.MG, 2018
  -- prototype implementation of S.MG communication protocol

with SMG_comms_types; use SMG_comms_types;
with System; use System; -- endianness
with Ada.Exceptions;
with Ada.Streams; use Ada.Streams;

package body SMG_comms_types is

  -- to and from network format (i.e. big endian, stream_element_array)
  procedure ToNetworkFormat(
      Item   : in Octet_Array;
      Buffer : out Stream_Element_Array) is
  begin
    if Item'Length /= Buffer'Length then
      raise Constraint_Error with "Item and Buffer lengths do NOT match!";
    end if;

    if Default_Bit_Order = Low_Order_First then
      -- little endian host: copy the octets in reverse order
      for I in 0 .. Item'Length - 1 loop
        Buffer( Buffer'Last - Stream_Element_Offset(I) ) := Stream_Element(Item(Item'First + I));
      end loop;
    else
      -- big endian host: copy the octets in the same order
      for I in 0 .. Item'Length - 1 loop
        Buffer( Buffer'First + Stream_Element_Offset(I) ) := Stream_Element(Item(Item'First + I));
      end loop;
    end if;
  end ToNetworkFormat;

  procedure FromNetworkFormat(
      Buffer : in Stream_Element_Array;
      Item   : out Octet_Array) is
  begin
    if Item'Length /= Buffer'Length then
      raise Constraint_Error with "Buffer and Item length do NOT match!";
    end if;

    if Default_Bit_Order = Low_Order_First then
      -- little endian host: fill Item from its last octet backwards
      for I in 0 .. Buffer'Length - 1 loop
        Item( Item'Last - I ) :=
          Unsigned_8( Buffer( Buffer'First + Stream_Element_Offset( I ) ) );
      end loop;
    else
      -- big endian host: fill Item in the same order as Buffer
      for I in 0 .. Buffer'Length - 1 loop
        Item( Item'First + I ) :=
          Unsigned_8( Buffer( Buffer'First + Stream_Element_Offset( I ) ) );
      end loop;
    end if;
  end FromNetworkFormat;

  -- Integer_8
  procedure ToNetworkFormat(
      Item   : in Integer_8;
      Buffer : out Stream_Element_Array) is
  begin
    ToNetworkFormat( Cast( Item ), Buffer );
  end ToNetworkFormat;

  procedure FromNetworkFormat(
      Buffer : in Stream_Element_Array;
      Item   : out Integer_8) is
    octets: Octets_1;
  begin
    FromNetworkFormat(Buffer, octets);
    Item := Cast( octets );
  end FromNetworkFormat;

end SMG_comms_types;

Something that irks me every time I look at the above: those for loops that convert octet by octet to Stream_Element. This is not only ugly but also rather inefficient, especially given that it's potentially the *second* time when those octets are read one by one (the first time being at the conversion from a protocol type - especially one of the more complex types - to an array of octets). However, I have no idea how to do that conversion from array of octets to array of Stream_Element in one single move! Do you know a better way to do this? A direct assign fails because the cast from one type to another would be on the array type rather than individual elements type. And I'm rather reluctant to work directly with Stream_Element as the basic type because this type is implementation dependent and outside my direct control - so I can't actually really know *what* it is.
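One possible answer to the question above, offered as a sketch rather than as the way to do it: Ada.Unchecked_Conversion can also be instantiated between two *constrained* array subtypes of the same total size, which moves the whole array in one go. This leans on the same GNAT-specific fact discussed next, namely that a Stream_Element is 8 bits, and it copies the octets in memory order only, so any byte-order reversal would still have to be done separately. The names Bulk_Cast_Demo, Stream_4, To_Stream and To_Octets are hypothetical.

```ada
with Ada.Streams;              use Ada.Streams;
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces;               use Interfaces;

procedure Bulk_Cast_Demo is
   type Octet_Array is array (Natural range <>) of Unsigned_8;

   -- both subtypes are constrained and, on GNAT, hold exactly 4 x 8 bits
   subtype Octets_4 is Octet_Array (1 .. 4);
   subtype Stream_4 is Stream_Element_Array (1 .. 4);

   -- whole-array casts, no element by element loop
   function To_Stream is new Ada.Unchecked_Conversion (Octets_4, Stream_4);
   function To_Octets is new Ada.Unchecked_Conversion (Stream_4, Octets_4);

   Item   : constant Octets_4 := (16#DE#, 16#AD#, 16#BE#, 16#EF#);
   Buffer : constant Stream_4 := To_Stream (Item);
   Back   : constant Octets_4 := To_Octets (Buffer);
begin
   -- print the stream elements in index order
   for I in Buffer'Range loop
      Put (Stream_Element'Image (Buffer (I)));
   end loop;
   New_Line;

   -- converting back should give the original octets
   if Back = Item then
      Put_Line ("round trip ok");
   else
      Put_Line ("round trip FAILED");
   end if;
end Bulk_Cast_Demo;
```

The price is one pair of instantiations per array length, so this fits the fixed-size protocol types better than the general unconstrained Octet_Array, and whether it is actually cleaner than the loops is a judgment call.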

The above being said regarding Stream_Element, it is important to note that the code above (and almost all the code for a network communication protocol) is anyway, strictly speaking, rather tightly linked to GNAT since it relies on Stream_Element being exactly 8 bits long. And to make this clear, here is the full trail I followed to make sure that I can indeed simply convert an octet (i.e. 8 bits) to a Stream_Element and the other way around while preserving exactly the number of bits specified in the protocol for each type: first, GNAT implements Stream_Element in a-stream.ads as follows:

type Stream_Element is mod 2 ** Standard'Storage_Unit;

Then, according to the GNAT reference manual, Standard'Storage_Unit always has the same value as System.Storage_Unit. In turn, System.Storage_Unit is indeed declared in GNAT's system.ads as a constant with value 8, so a Stream_Element in GNAT will indeed have exactly 8 bits:

Storage_Unit : constant := 8;
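Since everything here hinges on that 8, the assumption can also be made to fail loudly at compile time instead of living only in prose. A minimal sketch, assuming GNAT: pragma Compile_Time_Error is a GNAT-specific pragma rather than standard Ada, and the procedure name Check_Stream_Element is made up for illustration.

```ada
with Ada.Streams;
with Ada.Text_IO; use Ada.Text_IO;
with System;

procedure Check_Stream_Element is
   -- GNAT-specific: compilation is rejected if the condition is True
   pragma Compile_Time_Error
     (Ada.Streams.Stream_Element'Size /= 8,
      "Stream_Element is not 8 bits wide");
   pragma Compile_Time_Error
     (System.Storage_Unit /= 8,
      "Storage_Unit is not 8 bits");
begin
   -- report the values actually found by the compiler
   Put_Line ("Stream_Element'Size ="
             & Integer'Image (Ada.Streams.Stream_Element'Size));
   Put_Line ("System.Storage_Unit ="
             & Integer'Image (System.Storage_Unit));
end Check_Stream_Element;
```

The same two pragmas could presumably sit directly in smg_comms_types.ads, so that the package refuses to build on any compiler or target where an octet and a Stream_Element are not the same size.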

Moving on, the next step is to write the client-server part even if only a test version for now. Initially I took the easy way out here and simply wrote the separate server and client, each listening/sending one single packet. Running those on different machines worked perfectly fine. This is both fine and needed for a reasonable basic test of the whole thing, sure. However, at the moment as I'm just starting on this I'd rather *not* faff about with 2 machines each and every time I change or add something to this prototype protocol implementation. Moreover, this server/client part is perfect to experiment as well with threads[4] in Ada - a topic that I'm still struggling to learn so perfect to practice! Therefore, in the basic test, server and client are implemented as different tasks (Ada's "threads") of the same main program - obviously, it follows that client and server will send data to one another on the same, local machine. This is of course not ideal nor sufficient as test in the long term but it'll do nicely for now and until the basic layer of the protocol at least is more fleshed out. The code in test_comms.adb:

 -- S.MG, 2018
 -- prototype implementation of S.MG communication protocol

with GNAT.Sockets; use GNAT.Sockets;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Streams; use Ada.Streams;
with Interfaces; use Interfaces;

with SMG_comms_types; use SMG_comms_types;

procedure test_comms is
  Port_No : constant := 2222;

  task type Client is
    entry Send;
  end Client;

  task type Server is
    entry Listen;
    entry Ready;
  end Server;

  task body Client is
    Sock: Socket_Type;
    Address: Sock_Addr_Type;
    Data: Ada.Streams.Stream_Element_Array(1..10) := (others => 42);
    Last: Ada.Streams.Stream_Element_Offset;
    N   : Integer_8 := -36;
  begin
    accept Send; -- task WILL block here until asked to send
    Address.Port := Port_No;
    Address.Addr := Inet_Addr("127.0.0.1"); -- loopback: server runs on this same machine in the test
    Create_Socket(Sock, Family_Inet, Socket_Datagram);

    ToNetworkFormat( N, Data(1..1));
    Send_Socket(Sock, Data, Last, Address);
    Put_Line("Client sent data " & "last: " & Last'Img);
  end Client;

  task body Server is
    Sock: Socket_Type;
    Address, From: Sock_Addr_Type;
    Data: Ada.Streams.Stream_Element_Array(1..512);
    Last: Ada.Streams.Stream_Element_Offset;
    N : Integer_8;
  begin
    accept Listen; -- wait to be started!
    Put_Line("Server started!");
    -- create UDP socket
    Create_Socket( Sock, Family_Inet, Socket_Datagram );

    -- set options on UDP socket
    Set_Socket_Option( Sock, Socket_Level, (Reuse_Address, True));
    Set_Socket_Option( Sock, Socket_Level, (Receive_Timeout, Timeout => 10.0));

    -- set address and bind
    Address.Addr := Any_Inet_Addr;
    Address.Port := Port_No;
    Bind_Socket( Sock, Address );

    accept Ready; -- server IS ready, when here
    -- receive on socket
    begin
      Receive_Socket( Sock, Data, Last, From );
      Put_Line("last: " & Last'Img);
      Put_Line("from: " & Image(From.Addr));
      Put_Line("data is:");
      for I in Data'First .. Last loop
        FromNetworkFormat(Data(I..I), N);
      end loop;
    exception
      when Socket_Error =>
        Put_Line("Socket error! (timeout?)");
    end;  -- end of receive

  end Server;

  S: Server;
  C: Client;
begin
  S.Listen; -- release the server from its first wait
  S.Ready; -- WAIT for server to be ready!
  C.Send;  -- client is started only after server!
end test_comms;

Tasks in Ada can be defined as types as above. This is not mandatory - one can equally well simply define the tasks and they'll run in parallel with the main begin-end block. However, Task types are effectively needed if one wants to be able to explicitly and potentially dynamically create several workers of the same type. Since I'll need this later for sure, I might as well practice it at any occasion, so there they are, the Client Task type and Server Task type. To make sure that the server is ready *before* the client sends anything, I'm using the rendez-vous communication method that Ada provides: each "entry" of a task is basically a rendez-vous point between the task itself that "accepts" that entry at some specified point in its body and a caller task that "calls" that entry from outside. The main body of test_comms illustrates this: S and C start in parallel with the main body of test_comms but they both stop almost immediately as they wait on their first rendez-vous points (accept Listen for the Server type and accept Send for the Client type). The main body first calls S.Listen, effectively releasing the S (Server type) task from its wait. Immediately after that, the S.Ready entry is called so that now the main body is actually waiting for S to get to its "Ready" entry - basically to finish setting up the socket. Once S has got to its Ready entry, the main body can go further and the next statement calls C.Send that releases the client from its wait. The client will therefore send its data and promptly finish, while the server (that was running in parallel all this time) will finally receive the data, process it and finish as well. Go ahead, give it a go and let me know if there's anything funny going on!
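The same ordering guarantee can be seen in isolation, without any sockets in the way. A minimal sketch, with made-up names (Rendezvous_Demo, Worker, Setup_Done): the worker stands in for the server, and the flag it sets before accepting Ready stands in for the socket being created and bound.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Rendezvous_Demo is
   -- set by the worker once its "setup" is finished
   Setup_Done : Boolean := False;
   pragma Atomic (Setup_Done);

   task type Worker is
      entry Start;
      entry Ready;
   end Worker;

   task body Worker is
   begin
      accept Start;          -- blocks until a caller calls W.Start
      Setup_Done := True;    -- stands in for creating and binding a socket
      accept Ready;          -- a caller of W.Ready is held until this point
      Put_Line ("worker: past Ready");
   end Worker;

   W : Worker;               -- activated when the main body's begin is reached
begin
   W.Start;                  -- release the worker from its first wait
   W.Ready;                  -- returns only once the worker accepted Ready
   Put_Line ("main: setup done = " & Boolean'Image (Setup_Done));
end Rendezvous_Demo;
```

Whatever the scheduling, the main body cannot get past W.Ready before the worker has finished its setup, so the flag reads TRUE there; the relative order of the two Put_Line calls after the rendez-vous, on the other hand, is not fixed.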

To compile all the above with one neat gprbuild call, there is of course a .gpr file:

-- S.MG, 2018
 -- prototype implementation of S.MG communication protocol
 -- http://trilema.com/2018/euloras-communication-protocol-restated/

project SMG_comms is
  for Languages use ("Ada");

  for Source_Dirs use ("src");
  for Ignore_Source_Sub_Dirs use (".svn", ".git", "@*");

  for Object_Dir use "obj";
  for Exec_Dir use ".";

  for Main use ("test_comms.adb");

  package Builder is
    for Executable ("test_comms.adb") use "test_comms";
  end Builder;

  package Compiler is
    for Default_Switches ("Ada") use ("-O2");
  end Compiler;

end SMG_comms;

An example of running the test_comms executable produced by gprbuild:

smg_comms$ ./test_comms
Server started!
Client sent data last:  10
last:  10
data is:

On a slightly different note, I had a bit of an internal debate on whether it's appropriate or not to still release this as a V tree given that it's work in progress and absolutely nowhere near a reference object of any sort or even a fully working item. I came to view V however as simply a versioning system rather than a "set in stone only the end products" sort of thing - for that matter what end products anyway, it's at most, in the happiest of situations, seed products rather than end products. So all the above code is the genesis of smg_comms and it's simply a tiny, far from perfect seed that will hopefully grow into something useful while preserving in its v-tree the whole history of that growth:

  1. To a large extent due to ave1's work on forcing it into some useful shape so that it builds on a sane, musl-based system.

  2. The Socket_Datagram mode to be set for UDP communications is at least mentioned in the examples, although the rest of the differences in effectively communicating something are left to be discovered.

  3. My Ada reference book is Barnes, John, "Programming in Ada 2012", Cambridge University Press, 2016, ISBN 978-1-107-42481-4. 

  4. On-demand, resilient threads are crucial for Eulora's server and I'm not yet confident at all that I really fully grasp Ada's mechanisms so I'm pushing it to the front, reading on it and otherwise banging my head on it at any and all occasions - how else to figure it out faster? 

August 4, 2018

A Collection of Pearls as Well as Ever Sadder Epitaphs

Filed under: Open Sores — Diana Coman @ 11:39 p.m.

Motto: Gura pacatosului, adevar graieste. (The sinner's mouth speaks the truth.)

Deep inside the muck that passes for code among the open source tribes, there are pearls left not before the swine but rather plainly after them[1]. Like any pearls, they are the fruit of pain avoided and postponed, the signs of the beginning of defeat, the ever sadder epitaphs:

//@@@ Dirty, hack, this might be in a diffrent thread
//@@@ need to syncronize this in some way. But at least
//@@@ by moving this here the client code dosn't call
//@@@ SendOut anymore.

Oh, how truthful it starts, for it is indeed dirty and a hack and it "might be" something else "in some way" - that way being of course if only someone else wrote it! "But at least" it's written there and then never read or acted upon again of course, for how else would code be muck and shit accumulate at such tremendous speeds to make all those half a million lines of code?

 // REVERTED 3/8/03 Until I can figure out why this is causing client conencts to fail - Andrew Mann (Rhad)
// If we don't know who this connection is, don't handle ACKs
//if (!connection)
// return false;

It's 2018 and I'm sure he has figured it all out by now, right? The best way to figure things out is by avoiding them. Only for now, only a bit, only in "some way", only temporarily, of course. OF COURSE. And the best way to clean your house is by sweeping the dust under the carpet, don't you know?

// XXX: This is hacky, but we need to send the message to the client
// here and now! Because in the next moment he'll be deleted

"Where there is a need there is a right" or how did it go? The fact that you *need* to send a message in an ill-suited place at the wrong time to an idiot client and so on are ALL signs of problems to be sorted out not reasons for piling the shit higher and higher!

 // TODO: Hack, for now just get the y value and sector from the first point.

There, the great discovery of this year's lazy bums just as lazy as last year's lazy asses: don't do today what you can postpone indefinitely! And if it's on a todo list, then it doesn't need to actually be done! If I write that I'll do it then it's just as good as if I did it and way less work so it's win-win-me, isn't it? Especially since nobody will dare laugh at me for it since that's discrimination and a big no-no and I'll get all upset and they'll hurt my feelings.


This one truly reminds me of a children's story: a granny was once asked how many people were there at the party she had just attended. And she truthfully replied: only 1 person; for there was an old dirty barrel blocking the entrance and everyone got around it except for this last man who actually moved the barrel out of the way and only then got in.

What can I tell you, they used to teach this stuff in kindergarten!

 // Hack to work around the weird sector stuff we do.

This is the very way hacks reproduce until there is nothing left but a big pile of hacks precariously stacked on one another all the way to the moon. The hack because of the hack because of the hack because of the... Just stop hacking already and clean the darned shit from your house, you know? Why do you keep shitting where you eat?

// Special case for string, because we have to specify the namespace
// std::string, which doesn't play nicely with our FLAG__namespace hackery.

Hack and... counter-hack! If hacking breeding more hacking and shit growing up to your eyeballs wasn't already clearly and plainly the only result of this insanity. What does it take for you to stop this stupid approach with the special case for the extraordinary situation of the most unique item on the singularity of shit?

/* hack: if we included OpenSSL's ssl.h, we know about SSL_CTX
* this will of course break if we're included before OpenSSL headers...

You know it's broken but you ... do it anyway? The once inconceivable, the un-human, is that the modern human? Why, just why would anyone do such a thing knowingly?

/** @@@ Hack: please someone tell me how to do this better? */

Perhaps one of the saddest of them all: he is asking, you know? Perhaps in the wrong place, perhaps poorly stated, perhaps at the wrong moment, perhaps even only half-hearted but nevertheless... he asks for some guidance away from the swamps! But who answers, who is there to even read his comments at least if not his code? Of all the thousands of pairs of eyes, none even blinks in recognition, not even once and the question remains there unanswered, the hack stays in place, the cry for help goes unheard and unnoticed for there is nobody to hear it. The thousands of pairs of eyes stare without seeing anything for they are all little more than painted eyes on painted masks on painted cardboard in the painted world. Pure insanity this brave new world, the very stuff nightmares are made out of.

There truly is no worse loneliness than the loneliness of those trying to be people while lost among the crowds. For crowds are never, ever, people. And more importantly perhaps, out of necessity, crowds will always and invariably choose to mend and make do, just as reflected in all those comments above, never to clear away the rot that is left therefore to accumulate and fester and stink and drown everything. So next time *you* make your choice between the expensive exposure and fix of the rot on one hand and the mend and make do on the other, think perhaps also of that loneliness of the person in the crowd and of what it truly means to be - to become! - only one of the many, many identical pairs of unseeing eyes in the crowd.


  1. Hey, swine have rights too! What are you getting offended about? 

July 14, 2018

Cutting MySQL into Musl Shape

Filed under: Coding — Diana Coman @ 3:40 p.m.

As I'm apparently the adventurous type or in fewer words an ice-breaker, I got to see all the ways in which MySQL fails to compile on a musltronic[1] gentoo[2]. Note that there aren't that many versions of MySQL that portage knows about in the first place (exactly 5 versions: 5.5.60, 5.6.38, 5.6.39, 5.6.40, 5.7.22), mainly because of the nonsense proposition that "oh, we moved on to something new and shiny on which we got to stick another label and another name, be it mariadb". To quote specifically from mariadb itself when built[3]:

* Starting with version 10.1.8, MariaDB includes an improved systemd unit named mariadb.service
* You should prefer that unit over this package's mysqld.service.

Do you hear that? I won't even touch on the systemd nonsense but just read over and over again that part that says *you should prefer that unit* and nothing more. And tell me how can one write such a thing in the configuration files of a piece of software and then still have claim to ever be given the time of the day, not to mention to not being laughed out of town. And note that being laughed out of town is the best case there because frankly I find that offensive like hell rather than funny in the least.

Anyway, discarding with much pleasure the thing that "everyone" says I should prefer, here are the ways in which the various MySQL versions fail to emerge:

5.5.60: this version of MySQL pulls in openssl no matter what; even when libressl flag is added; even when -openssl is specifically added too; I just couldn't get it to leave openssl alone and so it's totally dead in the water. Not to mention that openssl then conflicts with libressl, so it's totally out of the question. RIP.

5.6.38: this was the preferred version so it was tried first, with a mask on anything greater than this aka >dev-db/mysql-5.6.38 written in /etc/portage/package.mask; it fails to compile because deep inside mysql code the stacktrace.c file relies on some thd_lib_detected variable and THD_LIB_LT constant to find out what implementation of threads it is running with. It was the first time I even saw those 2 things, a grep in MySQL code did NOT turn them up anywhere else and then a quick dive into docs and everything else failed to find them anywhere *else* than in MySQL code. So I'm not even sure where they are meant to be set exactly, but given how the code compiles fine on a glibc system and fails miserably on the Musl system, I suspect it's some glibc-specific approach that got baked into this tiny corner of MySQL. Big mistake for the tiny corner, since the solution is to cut it away, of course. Anyway, the fail:

/var/tmp/portage/dev-db/mysql-5.6.38/work/mysql/mysys/stacktrace.c: In function 'my_print_stacktrace':
error: 'thd_lib_detected' undeclared (first use in this function)
sigreturn_frame_count = thd_lib_detected == THD_LIB_LT ? 2 : 1;
note: each undeclared identifier is reported only once for each function it appears in
error: 'THD_LIB_LT' undeclared (first use in this function)
sigreturn_frame_count = thd_lib_detected == THD_LIB_LT ? 2 : 1;

5.6.39: this is the version pulled by portage without any masking or keywords; it fails just like 5.6.38 with loud complaints about the unknown thd_lib constant & variable:

error: 'thd_lib_detected' undeclared (first use in this function)
sigreturn_frame_count = thd_lib_detected == THD_LIB_LT ? 2 : 1;
note: each undeclared identifier is reported only once for each function it appears in
error: 'THD_LIB_LT' undeclared (first use in this function)
sigreturn_frame_count = thd_lib_detected == THD_LIB_LT ? 2 : 1;

5.6.40: this fails quite the same as the 5.6.38 and 5.6.39 above:

/var/tmp/portage/dev-db/mysql-5.6.40/work/mysql/mysys/stacktrace.c: In function 'my_print_stacktrace':
error: 'thd_lib_detected' undeclared (first use in this function)
sigreturn_frame_count = thd_lib_detected == THD_LIB_LT ? 2 : 1;
note: each undeclared identifier is reported only once for each function it appears in
error: 'THD_LIB_LT' undeclared (first use in this function)
sigreturn_frame_count = thd_lib_detected == THD_LIB_LT ? 2 : 1;

5.7.22: this version requires the ~amd64 keywords on my proto-cuntoo so it's supposedly "at your own risk" grade. It fails at configuration stage with CMake barfing about a lack of mysys timer:

CMake Error at configure.cmake:573 (MESSAGE):
No mysys timer support detected!
Call Stack (most recent call first):
CMakeLists.txt:451 (INCLUDE)

And then I threw my hands up in despair at it all, took the plunge into MySQL code and Gentoo's very useful ebuild command[4], cut away the misbehaving part and had the pleasant surprise that it was actually enough to make the whole thing work! After which I spent of course another few hours just checking the whole thing through because by now I am very, very suspicious when it's relatively easy to fix something.

The good news is that MySQL can be made to work on a musltronic system with just a small snip of the "safe printing" of a stacktrace - as I'm not terribly concerned at this time about MySQL's own stacktrace printing, I didn't really spend the time to *fix* the code although I'm sure it can be done with a bit of study of threads in Musl. For now though I am content to just delete that code and yes, have a MySQL that won't print its stacktrace. Specifically, here are the steps to get MySQL going on a proto-cuntoo system:

1. Get the source code by instructing ebuild to "fetch":

ebuild /usr/portage/dev-db/mysql/mysql-5.6.38.ebuild fetch

2. Unpack it:

ebuild /usr/portage/dev-db/mysql/mysql-5.6.38.ebuild unpack

3. Simply delete the contents of method my_print_stacktrace(uchar* stack_bottom, ulong thread_stack) in /var/tmp/portage/dev-db/mysql-5.6.38/work/mysql/mysys/stacktrace.c and save the file. Leave the empty shell of the method there since it IS called (yes, I checked). Ideally one would fix the code I guess, but atm I really don't see this worth the time since the moment mysql fails that badly I seriously doubt I'll want to spend the time investigating its stack trace - it's more likely I'll throw it out the window entirely. At any rate, as this is not a "fix" by any measure, I'm not making it into a patch so you'll have to run through those steps rather than just emerge.

4. With the code thus modified, ask ebuild to go ahead and compile, install and then merge the result into the live file system, so three different commands to be run one at a time:

ebuild /usr/portage/dev-db/mysql/mysql-5.6.38.ebuild compile
ebuild /usr/portage/dev-db/mysql/mysql-5.6.38.ebuild install
ebuild /usr/portage/dev-db/mysql/mysql-5.6.38.ebuild qmerge

5. It's done; run your usual emerge --config =dev-db/mysql-5.6.38 and set the root password, create databases and users, as you need.

And there it is, I can happily report that MySQL 5.6.38 is now compiling and running quite happily on a fully musltronic system!

  1. Aka Musl-based rather than glibc-inflated. 

  2. Also known as proto-cuntoo since that's exactly what it is: an alpha version of trinque's cuntoo.  

  3. Yes, I built this one too and it builds fine. No, I won't use it though. 

  4. Man page for ebuild most recommended read! 

July 10, 2018

Some Branching Troubles on Existing V Trees

Filed under: Coding — Diana Coman @ 10:03 p.m.

In theory - when there IS a theory! - everything is neat, simple and takes almost no time[1]. Today's everything started similarly neat and simple with a clear goal that was not meant to take much time: finding a better home for my Keccak implementation.

To clarify the issue here: currently Keccak is part of EuCrypt mainly because at the time of its implementation there simply was no better place for it. But meanwhile phf integrated Keccak into his vtools, esthlos is actively searching for the best way to use Keccak hashes as part of his own V implementation and Arjen (ave1) has released a zero foot print (ZFP) library that is delightfully effective at eliminating bloat. And on top of all this, V itself is moving forward with more clarity and better practices: there shall be a manifest file, there shall be Keccak hashes instead of sha512. Combined, all those developments really point and prod at the simple fact: Keccak both needs and can get a better home now - namely a home as a branch off ave1's zfp tree!

Branching off an existing V tree is meant to be a straightforward task: press the original tree to the desired leaf, add SMG.Keccak, produce new patch, test that it sits indeed at desired place in the tree and it presses fine, write it all up, release and enjoy. But it didn't take long to get into trouble:

  1. To start with, it turned out that current Keccak can not yet use ZFP because its OAEP wrapper requires Interfaces.C and ZFP does not yet support it. Nevertheless, this is noted and added to the README next to the code but it's not in itself a game stopper: the runtime library to be used is anyway specified with the --RTS switch when invoking gprbuild so there is no need to change anything at a later time when ZFP supports Interfaces.C; moreover, the OAEP wrapper is really just a convenience wrapper so it can be removed if the user desires a stripped down pure Keccak in Ada.
  2. Then, vtools and v.pl seemed to not want to play nice together and they sputtered all sorts - this was the easy part to solve though as it turned out I had forgotten to chop the vtools branches properly and keep only one set of patches. Thanks to spyked and phf (and via the logs of course), the needed jolt to my memory was provided, the previous log discussion found and linked, the problem solved: as a result, I had again (as I even remembered I used to have at some other point) the spiffy[2] vtools working smoothly and producing the desired Keccak-hashed vpatch.
  3. And then it turned out that I can't quite attach my shiny new vpatch to ave1's tree because my vpatch uses Keccak hashes while his other vpatches are still using sha512 and the combination is of course rather unsanitary.
  4. Adding to the above, I also have quite some trouble in finding a way to properly attach my Keccak patch to the current leaf in ave1's simple tree. Since there isn't a manifest file, I can't take the easy route of not even worrying about this issue and simply add the corresponding line for my patch to this file, knowing it'll come neatly after the current leaf. And it turns out that the difficult solution of making sure I change some file changed by the current leaf is also rather dubious because zfp_2_noc.vpatch changes only a few code files that Keccak really has NO business whatsoever in touching *at all*.
  5. A relatively minor issue by comparison with the ones above but still promising to give headaches in the future is the fact that current zfp doesn't actually have its own top directory. So where exactly should I plonk the smg_keccak folder in there?

What can I do?

  1. There's a reason many people like theories, especially at a safe distance from practice. 

  2. It's Stanislav's term but it fits best here! 


Theme and content by Diana Coman