Ossasepia

September 14, 2018

SMG.Comms Implementation - Chapter 1

Filed under: Coding, SMG_Comms — Diana Coman @ 11:02 a.m.

~ This is a work in progress towards an Ada implementation of the communication protocol discussed on Trilema four months ago. ~

As far as I know, there isn't any more recent discussion of the above specification, nor any other attempt at any sort of implementation. Consequently, this is the first ever attempt - a prototype at this stage rather than a reference implementation. Moreover, it's also a sort of double first, since it clearly requires a deeper knowledge of Ada than I ever needed before. My approach to this pile of unknowns is to simply start working on it and expose the path travelled from this very first attempt to the final product, including the mistakes, detours and pitfalls to come. You are welcome to follow along, to help if you can, to question if you don't understand and perhaps to extract in doing so some benefit for yourself.

The first decision I had to make before I could even attempt any sort of prototype implementation concerned the library to use for the actual network communication. Since GNAT is the de-facto republican Ada compiler [1], the logical decision is to simply use GNAT's own library, GNAT.Sockets, and otherwise avoid as much as possible introducing additional, external dependencies - I really can't see any reason to add even more code, even indirectly. So then GNAT.Sockets it is and hooray for that - except that there doesn't seem to be much documentation about it other than the comments in g-socket.ads and g-socket.adb! Still, the .ads file has a rather detailed introduction of the package with some commented examples, so it's definitely worth reading as a starting point, especially since... there isn't any other starting point really.

The g-socket files reveal essentially that any data to be sent or received through GNAT.Sockets will have to be stored at one point or another in an entity of Ada.Streams.Stream_Element_Array type. The examples in the file (and most everything else I could find on the topic) focus almost exclusively [2] on TCP connections - I suspect the assumption is that one "should prefer" TCP over UDP just like that, so that UDP is not even worth discussing in whatever scant docs there are. Nevertheless, I don't prefer it: Eulora's protocol is rather specifically designed to be stateless and to keep communications as simple and clear as possible - for as long as UDP is enough for the job, I'd rather use it!

Rummaging a bit in the g-socket files happily reveals that UDP with GNAT.Sockets is in fact quite straightforward: one simply needs to specify "Socket_Datagram" as the mode for the socket when calling Create_Socket and then directly use the Send_Socket( socket, data, last, to_address ) and Receive_Socket( socket, data, last, from_address ) procedures. As one would expect, there is no reliable connection established and no "stream" of sends and receives, only sockets that allow independent "send" and "receive" calls - each of those could in principle even use a different address. A quick example of a server and client working with a UDP socket (keep reading for a full, working example) looks like this:

 -- on server:
    -- create UDP socket
    Create_Socket( Sock, Family_Inet, Socket_Datagram );

    -- set options on UDP socket
    Set_Socket_Option( Sock, Socket_Level, (Reuse_Address, True));

    -- set address and bind
    Address.Addr := Any_Inet_Addr;
    Address.Port := Port_No; -- the port number that server is listening on
    Bind_Socket( Sock, Address );
    Receive_Socket( Sock, Data, Last, From );

 -- on client:
    Address.Port := Port_No; -- server's port number
    Address.Addr := Inet_Addr("127.0.0.1"); -- this should be server's address here!
    Create_Socket(Sock, Family_Inet, Socket_Datagram);

    Send_Socket(Sock, Data, Last, Address);
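
Since each send and receive call on a UDP socket carries its own address, a server can answer straight to whatever address Receive_Socket reported. Here is a minimal sketch of that idea, with both ends living in a single program on the loopback interface. The procedure name, the port 2223, the 5 second timeouts and the payload are all assumptions made up for this illustration and not part of the protocol or of the prototype.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Streams;  use Ada.Streams;
with GNAT.Sockets; use GNAT.Sockets;

procedure UDP_Reply_Sketch is
   Port_No : constant := 2223;  -- hypothetical test port, illustration only

   Server, Client : Socket_Type;
   Server_Addr    : Sock_Addr_Type;
   From           : Sock_Addr_Type;
   Request        : constant Stream_Element_Array (1 .. 3) := (1, 2, 3);
   Buffer         : Stream_Element_Array (1 .. 512);
   Received, Sent : Stream_Element_Offset;
begin
   -- "server" end: UDP socket bound on the loopback interface
   Create_Socket (Server, Family_Inet, Socket_Datagram);
   Set_Socket_Option (Server, Socket_Level, (Reuse_Address, True));
   Set_Socket_Option (Server, Socket_Level, (Receive_Timeout, Timeout => 5.0));
   Server_Addr.Addr := Inet_Addr ("127.0.0.1");
   Server_Addr.Port := Port_No;
   Bind_Socket (Server, Server_Addr);

   -- "client" end: no bind needed, the system picks a local port on first send
   Create_Socket (Client, Family_Inet, Socket_Datagram);
   Set_Socket_Option (Client, Socket_Level, (Receive_Timeout, Timeout => 5.0));
   Send_Socket (Client, Request, Sent, Server_Addr);

   -- server receives and replies to whoever sent the datagram
   Receive_Socket (Server, Buffer, Received, From);
   Put_Line ("server got" & Stream_Element_Offset'Image (Received)
             & " octets from " & Image (From));
   Send_Socket (Server, Buffer (1 .. Received), Sent, From);

   -- client reads the reply; From now holds the server's address
   Receive_Socket (Client, Buffer, Received, From);
   Put_Line ("client got" & Stream_Element_Offset'Image (Received)
             & " octets back from " & Image (From));

   Close_Socket (Client);
   Close_Socket (Server);
end UDP_Reply_Sketch;
```

A single thread of control is enough here only because the datagram sits in the local socket's buffer between the send and the receive. Over a real network either datagram may get lost, in which case the receive timeout raises Socket_Error.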

With the very basics of the sockets part in place, the next part to decide on is how to send and receive over those sockets the actual data types defined by the protocol. I read quite a bit on the parts of Ada that are relevant for data representation: the Systems Programming annex (an annex of the Ada standard and therefore compiler-dependent), streams, the representation clauses and pragmas supported by GNAT, records and their discriminants and everything else from the Ada reference book [3] that I thought could help with this. The conclusion I got to is that I'll keep it simple and clear, especially for now, while I'm just starting to figure it all out! So I'll simply use the fixed width types defined in the Interfaces package and I'll otherwise copy the raw octets from and to those types using Ada.Unchecked_Conversion, which works, as far as I understand it, precisely as a raw copy from memory. Once the data is obtained as a vector of raw octets, I can implement two simple procedures to copy the octets to and from the Stream_Element_Array structure that can be sent directly through the UDP socket. Moreover, at this stage - and at this stage *only* - I'll worry about potentially different endianness of the local machine and the network: if the local environment is little endian, the procedures converting to and from network format will simply read the octets in reverse order. The relevant basic types and conversion procedures are defined in the smg_comms_types.ads file:

 -- S.MG, 2018
 -- prototype implementation of S.MG communication protocol

with Ada.Streams; use Ada.Streams;
with Interfaces; use Interfaces; -- Integer_n and Unsigned_n
with Ada.Unchecked_Conversion; -- converting int/uint to array of octets

package SMG_comms_types is
  -- basic types with guaranteed lengths
  type Octet_Array is array(Natural range <>) of Unsigned_8;

  subtype Octets_1 is Octet_Array( 1 .. 1 );
  subtype Octets_2 is Octet_Array( 1 .. 2 );
  subtype Octets_4 is Octet_Array( 1 .. 4 );
  subtype Octets_8 is Octet_Array( 1 .. 8 );

  subtype Message is Octet_Array( 1 .. 512 );
  subtype RSAMessage is Octet_Array( 1 .. 245 );

  -- blind, unchecked casts ( memcpy style )
  function Cast is new Ada.Unchecked_Conversion( Integer_8, Octets_1 );
  function Cast is new Ada.Unchecked_Conversion( Octets_1, Integer_8 );
  function Cast is new Ada.Unchecked_Conversion( Integer_16, Octets_2 );
  function Cast is new Ada.Unchecked_Conversion( Octets_2, Integer_16 );

  function Cast is new Ada.Unchecked_Conversion( Integer_32, Octets_4 );
  function Cast is new Ada.Unchecked_Conversion( Octets_4, Integer_32 );

  function Cast is new Ada.Unchecked_Conversion( Integer_64, Octets_8 );
  function Cast is new Ada.Unchecked_Conversion( Octets_8, Integer_64 );

  -- to and from streams for network communications - general
  procedure ToNetworkFormat(
      Item   : in Octet_Array;
      Buffer : out Stream_Element_Array);

  procedure FromNetworkFormat(
      Buffer : in Stream_Element_Array;
      Item   : out Octet_Array);

  -- specific, convenience methods for the basic types
    -- Integer_8
  procedure ToNetworkFormat(
      Item   : in Integer_8;
      Buffer : out Stream_Element_Array);

  procedure FromNetworkFormat(
      Buffer : in Stream_Element_Array;
      Item   : out Integer_8);

end SMG_comms_types;

As you can easily notice, the above does not yet fully cover even the 3.0 "Basic types" part of the protocol specification. It's all right for now though - there is quite enough there for the first basic tests and once those are fine, I'll gradually add the rest of the types too. There is little point in spending the time now to implement them all before I even got the chance to change my mind regarding *how* to implement them! So if you have a better implementation solution than the above, speak up in the comments section below and save me some time and a lot of headache! Note however that simplicity, clarity and hard guarantees are rather important here.
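
To see what the blind casts in the spec actually hand back, here is a small self-contained sketch that redeclares a 2-octet version of the types locally (so it doesn't depend on the package) and prints the octets of a 16-bit value in memory order. The procedure name and the test value 16#0102# are picked purely for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;
with System;      use System;
with Ada.Unchecked_Conversion;

procedure Cast_Sketch is
   -- local copies of the types from the spec, to keep the sketch standalone
   type Octet_Array is array (Natural range <>) of Unsigned_8;
   subtype Octets_2 is Octet_Array (1 .. 2);

   function Cast is new Ada.Unchecked_Conversion (Integer_16, Octets_2);
   function Cast is new Ada.Unchecked_Conversion (Octets_2, Integer_16);

   N   : constant Integer_16 := 16#0102#;
   Raw : constant Octets_2   := Cast (N);  -- octets exactly as stored in memory
begin
   Put_Line ("first octet in memory: " & Unsigned_8'Image (Raw (1)));
   Put_Line ("second octet in memory:" & Unsigned_8'Image (Raw (2)));

   -- same check that the package body uses to decide on reversing
   if Default_Bit_Order = Low_Order_First then
      Put_Line ("little endian target: octets need reversing for the network");
   else
      Put_Line ("big endian target: octets are already in network order");
   end if;

   -- casting back must give the original value, whatever the endianness
   if Cast (Raw) = N then
      Put_Line ("round trip ok");
   end if;
end Cast_Sketch;
```

On a little endian machine such as x86 the first octet in memory is the low one (2 here), which is precisely why ToNetworkFormat has to reverse the order before sending.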

One small point on which I'm already rather undecided is whether to continue implementing the convenience methods for different types, so that one can simply call ToNetworkFormat and FromNetworkFormat for anything, or to leave only the generic methods, at least for basic types. At the moment I incline towards providing all those methods (i.e. adding to the single pair of methods for the Integer_8 type defined above) because more complex types will likely need such methods anyway and moreover, this approach helps to keep all those casts (Unchecked_Conversion) in one place rather than scattered all through the rest of the code. However, it does add to the LOC count of this package. Anyway, moving further for now, the corresponding .adb file with the implementation of smg_comms_types:

  -- S.MG, 2018
  -- prototype implementation of S.MG communication protocol

-- Ada.Streams and Interfaces are already visible here through the
-- context clauses of the package spec
with System; use System; -- Default_Bit_Order, used as endianness check

package body SMG_comms_types is

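  -- to and from network format (i.e. big endian, stream_element_array)
  -- NB: Default_Bit_Order is formally about bit numbering; the code below
  -- relies on GNAT setting it to match the byte order of the target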
  procedure ToNetworkFormat(
      Item   : in Octet_Array;
      Buffer : out Stream_Element_Array) is
  begin
    if Item'Length /= Buffer'Length then
      raise Constraint_Error with "Item and Buffer lengths do NOT match!";
    end if;

    if Default_Bit_Order = Low_Order_First then
      for I in 0 .. Item'Length - 1 loop
        Buffer( Buffer'Last - Stream_Element_Offset(I) ) := Stream_Element(Item(Item'First + I));
      end loop;
    else
      for I in 0 .. Item'Length - 1 loop
        Buffer( Buffer'First + Stream_Element_Offset(I) ) := Stream_Element(Item(Item'First + I));
      end loop;
    end if;
  end ToNetworkFormat;

  procedure FromNetworkFormat(
      Buffer : in Stream_Element_Array;
      Item   : out Octet_Array) is
  begin
    if Item'Length /= Buffer'Length then
      raise Constraint_Error with "Buffer and Item length do NOT match!";
    end if;

    if Default_Bit_Order = Low_Order_First then
      for I in 0 .. Buffer'Length - 1 loop
        Item( Item'Last - I ) :=
          Unsigned_8( Buffer( Buffer'First + Stream_Element_Offset( I ) ) );
      end loop;
    else
      for I in 0 .. Buffer'Length - 1 loop
        Item( Item'First + I ) :=
          Unsigned_8( Buffer( Buffer'First + Stream_Element_Offset( I ) ) );
      end loop;
    end if;
  end FromNetworkFormat;

  -- Integer_8
  procedure ToNetworkFormat(
      Item   : in Integer_8;
      Buffer : out Stream_Element_Array) is
  begin
    ToNetworkFormat( Cast( Item ), Buffer );
  end ToNetworkFormat;
  procedure FromNetworkFormat(
      Buffer : in Stream_Element_Array;
      Item   : out Integer_8) is
    octets: Octets_1;
  begin
    FromNetworkFormat(Buffer, octets);
    Item := Cast( octets );
  end FromNetworkFormat;

end SMG_comms_types;

Something that irks me every time I look at the above: those for loops that convert octet by octet to Stream_Element. This is not only ugly but also rather inefficient, especially given that it's potentially the *second* time that those octets are read one by one (the first time being at the conversion from a protocol type - especially one of the more complex types - to an array of octets). However, I have no idea how to do that conversion from array of octets to array of Stream_Element in one single move! Do you know a better way to do this? A direct assignment fails because the type conversion would have to apply to the array type rather than to the type of the individual elements. And I'm rather reluctant to work directly with Stream_Element as the basic type because this type is implementation dependent and outside my direct control - so I can't actually know *what* it is.
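
One possible way around the element by element loop, at least for fixed lengths and for the case where no reversing is needed, is to apply Ada.Unchecked_Conversion to whole constrained array subtypes rather than to single elements. This is only a sketch under the assumption that Stream_Element is exactly 8 bits wide (so that both subtypes have the same size); it is not what the prototype does, and the procedure and function names are made up for the example.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Streams;  use Ada.Streams;
with Interfaces;   use Interfaces;
with Ada.Unchecked_Conversion;

procedure Bulk_Copy_Sketch is
   type Octet_Array is array (Natural range <>) of Unsigned_8;

   -- both subtypes are constrained with the same number of 8-bit elements,
   -- so source and target of the conversion have the same size
   subtype Octets_4 is Octet_Array (1 .. 4);
   subtype Stream_4 is Stream_Element_Array (1 .. 4);

   function To_Stream is new Ada.Unchecked_Conversion (Octets_4, Stream_4);
   function To_Octets is new Ada.Unchecked_Conversion (Stream_4, Octets_4);

   Item   : constant Octets_4 := (16#DE#, 16#AD#, 16#BE#, 16#EF#);
   Buffer : Stream_4;
begin
   Buffer := To_Stream (Item);  -- whole array in one go, order preserved

   for I in Buffer'Range loop
      Put (Stream_Element'Image (Buffer (I)));
   end loop;
   New_Line;

   -- and back again, also in one go
   if To_Octets (Buffer) = Item then
      Put_Line ("round trip ok");
   end if;
end Bulk_Copy_Sketch;
```

The price is one pair of instantiations per length and no help at all with reversing the octets on a little endian machine, so whether this is actually any cleaner than the loop is debatable.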

The above being said regarding Stream_Element, it is important to note that the code above (and almost all the code for a network communication protocol) is anyway, strictly speaking, rather tightly linked to GNAT since it relies on Stream_Element being exactly 8 bits long. To make this clear, here is the full trail I followed to make sure that I can indeed simply convert an octet (i.e. 8 bits) to a Stream_Element and the other way around while preserving exactly the number of bits specified in the protocol for each type. First, GNAT implements Stream_Element in a-stream.ads as follows:

type Stream_Element is mod 2 ** Standard'Storage_Unit;

Then, according to the GNAT reference manual, Standard'Storage_Unit always has the same value as System.Storage_Unit. In turn, System.Storage_Unit is indeed declared in GNAT's system.ads as a constant with value 8, so a Stream_Element in GNAT will indeed have exactly 8 bits:

Storage_Unit : constant := 8;
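
Rather than trusting the trail above forever, the 8 bits assumption can also be stated in the code itself, so that the compiler refuses to build on a platform where it doesn't hold. A minimal sketch, using the GNAT-specific pragma Compile_Time_Error (so it ties the code to GNAT even more, which is arguably fine given the above); the procedure name and the messages are illustrative only.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Streams;
with System;

procedure Storage_Unit_Check is
   -- GNAT-specific: compilation fails with the given message
   -- if the condition is known to be True at compile time
   pragma Compile_Time_Error
     (Ada.Streams.Stream_Element'Size /= 8,
      "Stream_Element is not exactly 8 bits");
   pragma Compile_Time_Error
     (System.Storage_Unit /= 8,
      "Storage_Unit is not 8");
begin
   -- only reached if both checks passed at compile time
   Put_Line ("Stream_Element'Size ="
             & Integer'Image (Ada.Streams.Stream_Element'Size));
end Storage_Unit_Check;
```

In a real package the two pragmas would presumably sit in the declarative part of the types package, next to the declarations that depend on them.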

Moving on, the next step is to write the client-server part, even if only a test version for now. Initially I took the easy way out here and simply wrote a separate server and client, each listening for/sending one single packet. Running those on different machines worked perfectly fine. This is both fine and needed for a reasonable basic test of the whole thing, sure. However, at the moment, as I'm just starting on this, I'd rather *not* faff about with 2 machines each and every time I change or add something to this prototype protocol implementation. Moreover, this server/client part is perfect for experimenting with threads [4] in Ada as well - a topic that I'm still struggling to learn, so it's perfect practice! Therefore, in the basic test, server and client are implemented as different tasks (Ada's "threads") of the same main program - obviously, it follows that client and server will send data to one another on the same, local machine. This is of course neither ideal nor sufficient as a test in the long term but it'll do nicely for now and until the basic layer of the protocol at least is more fleshed out. The code in test_comms.adb:

 -- S.MG, 2018
 -- prototype implementation of S.MG communication protocol

with GNAT.Sockets; use GNAT.Sockets;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Streams; use Ada.Streams;
with Interfaces; use Interfaces;

with SMG_comms_types; use SMG_comms_types;

procedure test_comms is
  Port_No : constant := 2222;

  task type Client is
    entry Send;
  end Client;

  task type Server is
    entry Listen;
    entry Ready;
  end Server;

  task body Client is
    Sock: Socket_Type;
    Address: Sock_Addr_Type;
    Data: Ada.Streams.Stream_Element_Array(1..10) := (others => 42);
    Last: Ada.Streams.Stream_Element_Offset;
    N   : Integer_8 := -36;
  begin
    accept Send; -- task WILL block here until asked to send
    Address.Port := Port_No;
    Address.Addr := Inet_Addr("127.0.0.1");
    Create_Socket(Sock, Family_Inet, Socket_Datagram);

    ToNetworkFormat( N, Data(1..1));
    Send_Socket(Sock, Data, Last, Address);
    Put_Line("Client sent data " & "last: " & Last'Img);
    Close_Socket(Sock); -- release the socket once done
  end Client;

  task body Server is
    Sock: Socket_Type;
    Address, From: Sock_Addr_Type;
    Data: Ada.Streams.Stream_Element_Array(1..512);
    Last: Ada.Streams.Stream_Element_Offset;
    N : Integer_8;
  begin
    accept Listen; -- wait to be started!
    Put_Line("Server started!");
    -- create UDP socket
    Create_Socket( Sock, Family_Inet, Socket_Datagram );

    -- set options on UDP socket
    Set_Socket_Option( Sock, Socket_Level, (Reuse_Address, True));
    Set_Socket_Option( Sock, Socket_Level, (Receive_Timeout, Timeout => 10.0));

    -- set address and bind
    Address.Addr := Any_Inet_Addr;
    Address.Port := Port_No;
    Bind_Socket( Sock, Address );

    accept Ready; -- server IS ready, when here
    -- receive on socket
    begin
      Receive_Socket( Sock, Data, Last, From );
      Put_Line("last: " & Last'Img);
      Put_Line("from: " & Image(From.Addr));
      Put_Line("data is:");
      for I in Data'First .. Last loop
        FromNetworkFormat(Data(I..I), N);
        Put_Line(N'Image);
      end loop;
    exception
      when Socket_Error =>
        Put_Line("Socket error! (timeout?)");
    end;  -- end of receive

    Close_Socket( Sock ); -- release the socket once done

  end Server;

  S: Server;
  C: Client;
begin
  S.Listen;
  S.Ready; -- WAIT for server to be ready!
  C.Send;  -- client is started only after server!
end test_comms;

Tasks in Ada can be defined as types as above. This is not mandatory - one can equally well simply define the tasks and they'll run in parallel with the main begin-end block. However, task types are effectively needed if one wants to be able to explicitly and potentially dynamically create several workers of the same type. Since I'll need this later for sure, I might as well practice it at any occasion, so there they are, the Client task type and Server task type. To make sure that the server is ready *before* the client sends anything, I'm using the rendez-vous communication method that Ada provides: each "entry" of a task is basically a rendez-vous point between the task itself, which "accepts" that entry at some specified point in its body, and a caller task that "calls" that entry from outside. The main body of test_comms illustrates this: S and C start in parallel with the main body of test_comms but they both stop almost immediately as they wait on their first rendez-vous points (accept Listen for the Server type and accept Send for the Client type). The main body first calls S.Listen, effectively releasing the S (Server type) task from its wait. Immediately after that, the S.Ready entry is called so that now the main body is actually waiting for S to get to its "Ready" entry - basically to finish setting up the socket. Once S gets to its Ready entry, the main body can go further and the next statement calls C.Send, which releases the client from its wait. The client will therefore send its data and promptly finish, while the server (that was running in parallel all this time) will finally receive the data, process it and finish as well. Go ahead, give it a go and let me know if there's anything funny going on!
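
The "several workers of the same type, created dynamically" part mentioned above can be sketched without any sockets at all. In the example below, the Worker task type, its two entries, the access type and the pool of 3 are all invented for this illustration; the only point is to show a task type, allocation with new and the rendez-vous on entries.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Workers_Sketch is
   task type Worker is
      entry Start (Id : in Positive);
      entry Done;
   end Worker;

   task body Worker is
   begin
      -- blocks here until somebody calls Start; the caller in turn
      -- stays blocked until the accept body below has finished
      accept Start (Id : in Positive) do
         Put_Line ("worker" & Positive'Image (Id) & " started");
      end Start;
      accept Done;  -- second rendez-vous, no data exchanged
   end Worker;

   type Worker_Access is access Worker;
   Pool : array (1 .. 3) of Worker_Access;
begin
   for I in Pool'Range loop
      Pool (I) := new Worker;  -- the task starts running as soon as it is allocated
      Pool (I).Start (I);
   end loop;

   for I in Pool'Range loop
      Pool (I).Done;           -- wait for each worker to reach its last accept
   end loop;
   Put_Line ("all workers done");
end Workers_Sketch;
```

Printing from inside the accept body keeps the output in order, since the caller is held for the whole duration of the rendez-vous and so no two tasks write to the console at the same time.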

To compile all the above with one neat gprbuild call, there is of course a .gpr file:

 -- S.MG, 2018
 -- prototype implementation of S.MG communication protocol
 -- http://trilema.com/2018/euloras-communication-protocol-restated/

project SMG_comms is
  for Languages use ("Ada");

  for Source_Dirs use ("src");
  for Ignore_Source_Sub_Dirs use (".svn", ".git", "@*");

  for Object_Dir use "obj";
  for Exec_Dir use ".";

  for Main use ("test_comms.adb");

  package Builder is
    for Executable ("test_comms.adb") use "test_comms";
  end Builder;

  package Compiler is
    for Default_Switches ("Ada") use ("-O2");
  end Compiler;

end SMG_comms;

An example of running the test_comms executable produced by gprbuild:

smg_comms$ ./test_comms
Server started!
Client sent data last:  10
last:  10
from: 127.0.0.1
data is:
-36
 42
 42
 42
 42
 42
 42
 42
 42
 42
smg_comms$

On a slightly different note, I had a bit of an internal debate on whether it's appropriate or not to still release this as a V tree given that it's work in progress and absolutely nowhere near a reference object of any sort or even a fully working item. I came to view V however as simply a versioning system rather than a "set in stone only the end products" sort of thing - for that matter what end products anyway, it's at most, in the happiest of situations, seed products rather than end products. So all the above code is the genesis of smg_comms and it's simply a tiny, far from perfect seed that will hopefully grow into something useful while preserving in its v-tree the whole history of that growth:


  1. Due to a large extent to ave1's work on forcing it into some useful shape so that it builds on a sane, musl-based system. 

  2. The Socket_Datagram mode to be set for UDP communications is at least mentioned in the examples, although the rest of the differences in effectively communicating something are left to be discovered. 

  3. My Ada reference book is Barnes, John, "Programming in Ada 2012", Cambridge University Press, 2016, ISBN 978-1-107-42481-4. 

  4. On-demand, resilient threads are crucial for Eulora's server and I'm not yet confident at all that I really fully grasp Ada's mechanisms so I'm pushing it to the front, reading on it and otherwise banging my head on it at any and all occasions - how else to figure it out faster? 

17 Comments »

  1. I am getting "file not found" for the link to smg_comms_genesis.vpatch.sig

    Comment by PeterL — September 14, 2018 @ 3:44 p.m.

  2. Fixed (it was missing my name in it).

    Comment by Diana Coman — September 14, 2018 @ 4:40 p.m.

  3. Does Ada's type system prevent something in the style of C's classic "networkInt32 = (platformInt32[3]

    Comment by DangerNorm — September 15, 2018 @ 12:07 a.m.

  4. Does something about Ada's types and casting prevent using something like the C-style "networkInt32 = (uint32)((uint8) platformInt32[3])

    Comment by DangerNorm — September 15, 2018 @ 12:50 a.m.

  5. Oh, there they are. But it looks like I have to use HTML escapes. The code is supposed to be:

    networkInt32 = (uint32)((uint8) platformInt32[3]) << 0 | (uint32)((uint8) platformInt32[2]) << 8 | (uint32)((uint8) platformInt32[1]) << 16 | (uint32)((uint8) platformInt32[0]) << 24;

    {alert("Sanitize your inputs.");}

    Comment by DangerNorm — September 15, 2018 @ 12:54 a.m.

  6. DangerNorm Meanwhile log discussions cut the flipping bits knot more directly anyway: http://btcbase.org/log/2018-09-14#1850461

    Re conversions, Ada has the unchecked.conversion (as used in the code here already) which does exactly what it says. So yes, I could use this but very carefully as it were and in particular: http://btcbase.org/log/2018-09-14#1850412

    The problem as usual is more of "what is a *correct* way to do this" rather than strictly how can I hack this.

    Comment by Diana Coman — September 17, 2018 @ 10:02 a.m.

  7. Might just be my time in C talking, but that sort of platform-endianness-independent line does feel to me more "correct" and less hacky than manually checking endianness and then looping, in that it leaves platform endianness management to the compiler where it belongs, especially in light of bi-endian architectures. Checking it manually in the style of glibc's implementations of htons etc is the micro-optimizing route, in that it sets you up for being able to skip the conversion entirely if compiling to a platform whose native endianness happens to match the protocol.

    Comment by DangerNorm — September 17, 2018 @ 1:34 p.m.

  8. It seems to me that you are focusing a bit too narrowly on one detail and missing the bigger idea. The point was that it's even better to not go through all the dance little endian -> big endian and then back at all (at least not for data; ip+port are a different story but there's no need for any loop or byte by byte copy there anyway). The issue to solve was a bit wider in scope than "how to flip the bytes" - more like "how to handle this potential issue of endianness".

    Other than that, there is no argument: the loop in here is marked already as "this will have to go, it won't do", yes.

    Comment by Diana Coman — September 18, 2018 @ 12:56 p.m.

  9. Yes, but that's what I'm getting at. My answer to the problem of endianness is to always program as if platform endianness isn't even a thing. Except in the case of implementing a compiler, this should always be possible, even at the border of serial protocols. You can pick whichever endianness you want for your own protocol, but you can't escape serialization if you plan to support both kinds of platform. (Even for non-networked applications; files are also inherently serial.)

    But, depending on if your compiler is "sufficiently smart" about this, it might be doing this "in one step" (or zero steps) automatically: https://godbolt.org/z/5QY-Ay

    Comment by DangerNorm — September 18, 2018 @ 5:55 p.m.

  10. DangerNorm why not join #trilema and the discussion there?

    To the point: indeed, that's the adopted solution, yes. Specifically: http://www.loper-os.org/?p=2557

    Comment by Diana Coman — September 20, 2018 @ 8:24 a.m.

