Ossasepia

November 22, 2018

SMG Comms Chapter 9: Files (Transfers and Requests)

Filed under: Coding, SMG_Comms — Diana Coman @ 9:42 p.m.

~ This is a work in progress towards an Ada implementation of Eulora's communication protocol. Start with Chapter 1.~

One significant goal of Eulora's communication protocol is to allow the client to request and obtain from the server absolutely *any* file that it might be missing so that there is effectively no need for an "update" of the client anymore. Two of the protocol's current messages target precisely this goal: File Transfer and File Request. The File Transfer message carries chunks of a file while the File Request carries a request from the client for files specified by their names (separated with ';'). And that means of course that one needs to do at least some sort of parsing to extract file names from that list while also handling at any time a set of unknown size containing strings of unknown length. In short: a mess.

My initial stab at the previously-mentioned mess was a rather ugly parametrized record (number of filenames) containing an array of ...parametrized records (each filename having its own length as a parameter). As you might imagine, this is not exactly the sort of thing that "fits in head" - at least not without a bigger hammer. And unsurprisingly, working with such a structure ended up in errors and trouble quite quickly (not to mention it felt about as pleasant as stabbing one's toe at every step). So I threw that first attempt to the bin and decided to store instead the filenames as one single long string (i.e. glued together into one string), alongside an array containing the starting positions of each separate filename in this long string. Basically instead of having separators in the string itself and mixed with the content, there is the content in one place and then neatly to the side the way to access directly any desired filename from the set. This keeps the protocol formatting (separators and the like) here in smg.comms where it belongs rather than pushing it higher up and at the same time it reduces the number of parameters to precisely 2: the number of filenames in there (hence the length of the array of start positions) and the total length of the resulting string (i.e. all filenames lengths added up together). The result is certainly an improvement over the first attempt but I can't say I am terribly fond of it as it is so if you have a better solution to this, go ahead and describe it in the comments below, including why and how it really is the better option - I'll gladly read it.

Using this approach of single string + start positions solved the issue of multiple variable lengths for strings. However, the File Request message still has all sorts of potential troublesome cases, including the case when the given set of filenames ends up longer than one message can carry (at write time) or there are multiple consecutive separators (at read time). My overall approach for such troubles with messages is to check whenever possible and effectively reject a message as invalid if a check fails. Anyway, working with an unknown bunch of unknown strings still remains at all times a sort of ugly spot, no matter what, so those read/write methods still look to me hairier than others so far and there isn't at the moment something specific that I can see to drastically improve them (short of fixing the size of filenames perhaps but I think that's more likely to just push the issue somewhere else as filenames will get padded/trimmed to size). Anyway, the new data structures for this chapter are defined in data_structs.ads:

  -- length of a text field (i.e. 16 bits, strictly > 0)
  subtype Text_Len is Positive range 1..2**16-1;

  -- A set of file names glued into a single string
  -- NB: there IS at least ONE filename and one character
  -- Upper limit for Text_Len is due to protocol's spec of text basic type
  type U16_Array is array (Text_Len range <> ) of Interfaces.Unsigned_16;
  type Filenames( F_No: Text_Len := 1; Sz: Text_Len := 1 ) is
    record
      -- filenames glued together into 1 single string
      S : String( 1 .. Sz ) := (others => '0');
      -- indices in S, at which each filename starts
      Starts: U16_Array( 1 .. F_No ) := (others => 1);
    end record;  

  -- A chunk of a file (for file transfer)
  type File_Chunk( Len     : Text_Len := 1;
                   Count   : Interfaces.Unsigned_16 := 1;
                   Name_Len: Text_len := 1) is
    record
      Filename: String(1..Name_Len);
      Content : Raw_Types.Octets(1..Len);
    end record;

In addition to the data structures above, this chapter adds the following:

  • Read/Write from/to Serpent Message for the File_Chunk structure (i.e. File Transfer message).
  • Read/Write from/to Serpent Message for the Filenames structure (i.e. File Request message).
  • Conversion methods from String to Octets and back. Those are private methods in the Messages packages and are meant for internal use only since messages use raw octets, while filenames and the like are meant as text/strings.
  • Read/Write from/to Octets for a 16 bits unsigned value. Similar to the conversion methods above, these are private methods in the Messages package and meant for internal use only. The reason for them to exist in the first place is that 16 bits values are relatively frequent in the protocol (counters and sizes) and their read/write requires an additional step to address potential endianness issues. Since this step is always the same and otherwise easy to forget + hard to debug if/when forgotten, it makes much more sense to have it packed together in one single procedure that can be called wherever needed.
  • Refactoring to replace all read/writes of 16-bit values by the new Read_U16/Write_U16 methods.
  • Tests for read/write of File Request and File Transfer messages.
  • Tests for converters from/to string/octets.
  • Small change to the test of pack/unpack RSA messages to ensure that "mangled" message is indeed always different from a valid package1.

The new read/write methods for File Transfer and File Request, described in messages.ads:

  ----------------- File Transfer ----------------------

  -- Writes the given File Chunk to a File Transfer type of message
  -- Chunk - chunk of file to write; contains counter, filename, content
  procedure Write_File_Transfer( Chunk   : in File_Chunk;
                                 Msg     : out Raw_Types.Serpent_Msg);

  -- The opposite of Write_File_Transfer method above.
  -- Chunk will contain the counter, filename and content
  procedure Read_File_Transfer( Msg     : in Raw_Types.Serpent_Msg;
                                Chunk   : out File_Chunk);

  ----------------- File Request  ----------------------
  -- Writes a message to request the files specified through their names
  -- Written parameter will hold the number of filenames actually written
  -- NB: this can be less than the number of filenames provided!
  -- When Written < FR.F_No, the FIRST Written filenames were written; the rest
  -- did not fit into the message and it's up to caller to decide what to do.
  procedure Write_File_Request( FR      : in Filenames;
                                Counter : in Interfaces.Unsigned_16;
                                Msg     : out Raw_Types.Serpent_Msg;
                                Written : out Natural);

  -- Reads a request for files; the opposite of Write_File_Request above
  -- Raises Invalid_Msg exception if the provided message fails checks.
  procedure Read_File_Request( Msg      : in Raw_Types.Serpent_Msg;
                               Counter  : out Interfaces.Unsigned_16;
                               FR       : out Filenames);

The implementation of those methods in messages.adb:

  ------ File Transfer ------
  procedure Write_File_Transfer( Chunk   : in File_Chunk;
                                 Msg     : out Raw_Types.Serpent_Msg) is
    Pos: Integer := Msg'First;
    U16: Interfaces.Unsigned_16;
  begin
    -- write type ID
    Msg(Pos) := File_Transfer_S_Type;
    Pos := Pos + 1;

    -- write filename as text field (size+2, text)
    -- check against overflows
    if Chunk.Name_Len > Text_Len'Last - 2 or
       Pos + Integer(Chunk.Name_Len) + 2 > Msg'Last then
      raise Invalid_Msg;
    end if;

    -- write total size: filename size + 2
    U16 := Interfaces.Unsigned_16( Chunk.Name_Len + 2 );
    Write_U16( Msg, Pos, U16 );

    -- write filename
    String_To_Octets( Chunk.Filename,
                      Msg(Pos..Pos+Integer(Chunk.Name_Len)-1) );
    Pos := Pos + Integer(Chunk.Name_Len);

    --write content
    -- check against overflow, including the 2 octets for counter at the end
    if Chunk.Len > Text_Len'Last - 2 or
       Pos + Integer(Chunk.Len) + 4 > Msg'Last then
      raise Invalid_Msg;
    end if;

    -- write total size for this text field
    U16 := Interfaces.Unsigned_16( Chunk.Len + 2 );
    Write_U16( Msg, Pos, U16 );

    -- write actual content
    Msg(Pos..Pos+Chunk.Content'Length-1) := Chunk.Content;
    Pos := Pos + Chunk.Content'Length;

    -- write counter
    Write_U16( Msg, Pos, Chunk.Count );

    -- write padding if needed
    if Pos <= Msg'Last then
      RNG.Get_Octets( Msg(Pos..Msg'Last) );
    end if;

  end Write_File_Transfer;

  -- The opposite of Write_File_Transfer method above.
  -- Counter will contain the message counter
  -- Chunk will contain the chunk counter, filename and content
  procedure Read_File_Transfer( Msg     : in Raw_Types.Serpent_Msg;
                                Chunk   : out File_Chunk) is
    Pos: Integer := Msg'First;
    U16: Interfaces.Unsigned_16;
    S_Name, E_Name: Integer; --start/end for filename in Msg
    S_Len: Text_Len; -- length of filename (needed as Text_Len anyway)
    S_Content, E_Content: Integer; --start/end for content in Msg
    Content_Len: text_Len; -- length of content (needed as Text_Len anyway)
  begin
    -- read and check type ID
    if Msg(Pos) /= File_Transfer_S_Type then
      raise Invalid_Msg;
    end if;
    Pos := Pos + 1;

    -- read filename size
    Read_U16( Msg, Pos, U16 );

    -- check for overflow and underflow; filename size >= 1
    if Pos + Integer(U16) - 2 > Msg'Last or
       U16 < 3 then
      raise Invalid_Msg;
    end if;
    U16 := U16 - 2;
    S_Len := Text_Len(U16);

    -- set start + end for reading filename later, when ready
    S_Name := Pos;
    E_Name := Pos + Integer(U16)-1;
    Pos := Pos + S_Len;

    -- read size of content
    Read_U16( Msg, Pos, U16 );
    -- check for overflow and underflow; content >=1; counter =2 octets
    if Pos + Integer(U16) - 1 > Msg'Last or
       U16 < 3 then
      raise Invalid_msg;
    end if;
    U16 := U16 - 2;
    Content_Len := Text_Len(U16);
    -- set start and end for reading content later, when ready
    S_Content := Pos;
    E_Content := Pos + Integer(U16) - 1;
    Pos := Pos + Content_Len;

    -- read counter
    Read_U16( Msg, Pos, U16 );
    -- check chunking validity i.e. if counter>0 then no padding
    if U16 /= 0 and Pos /= Msg'Last then
      raise Invalid_Msg;
    end if;

    -- create File_Chunk structure and fill it with data from Msg
    declare
      FC : File_Chunk( Len      => Content_Len,
                       Count    => U16,
                       Name_Len => S_Len);
    begin
      -- read from Msg
      FC.Content  := Msg( S_Content..E_Content );
      Octets_To_String( Msg( S_Name..E_Name ), FC.Filename);
      -- copy to output var
      Chunk := FC;
    end;

  end Read_File_Transfer;

  ---- File Requests ----
  procedure Write_File_Request( FR      : in Filenames;
                                Counter : in Interfaces.Unsigned_16;
                                Msg     : out Raw_Types.Serpent_Msg;
                                Written : out Natural) is
    Pos    : Integer := Msg'First;
    Max_Pos: Integer := Msg'Last - 2; -- 2 octets at end for counter
    Text_Sz: Integer;
    Max_Sz : Integer;
  begin
    -- write ID for File Request type
    Msg( Pos ) := File_Req_S_Type;
    Pos := Pos + 1;

    -- write Text size: filenames + separators
    -- consider fewer filenames if they don't ALL fit
    -- 2 octets are taken by size itself
    Max_Sz := Max_Pos - Pos - 1;
    Text_Sz := FR.Sz + FR.F_No - 1;
    if Text_Sz > Max_Sz then
      -- walk the array of filenames backwards and stop when they fit
      Written := FR.F_No - 1;
      -- calculate actual size written based on start of first discarded
        -- filename and (Written -1) octets for needed separators
      Text_Sz := Integer(FR.Starts(Written+1)) - FR.Starts'First +
                   (Written - 1);

      -- loop until either fits or nothing left
      while Written > 0 and Text_Sz > Max_Sz loop
        Written := Written - 1;
        Text_Sz := Integer(FR.Starts(Written+1))- FR.Starts'First +
                     (Written - 1);
      end loop;
      -- check that there is what to write, since nothing -> invalid message
      if Written = 0 then
        raise Invalid_Msg;
      end if;

    else --from if Text_Sz > Max_Sz
      -- ALL are written
      Written := FR.F_No;
    end if;

    -- write Text_Sz + 2 (i.e. TOTAL size)
    if Text_Sz + 2 > Integer(Interfaces.Unsigned_16'Last) then
      raise Invalid_Msg;
    end if;

    Write_U16( Msg, Pos, Interfaces.Unsigned_16(Text_Sz+2) );

    -- write filenames separated by Sep
    for I in 1..Written loop
      declare
        Start_Pos : Positive;
        End_Pos   : Positive;
        Len       : Positive;
      begin
        -- current start pos in FR.S
        Start_Pos := Positive( FR.Starts( FR.Starts'First + I - 1));

        -- calculate end based on start of next name or last
        if I < FR.F_No then
          End_Pos := Positive( FR.Starts( FR.Starts'First + I)) - 1;
        else
          End_Pos := FR.S'Last;
        end if;

        -- NB: this WILL fail if starting positions are not in order!
        Len := End_Pos - Start_Pos + 1;
        if Len <= 0 then
          raise Invalid_Msg;
        end if;

        --write the actual filename
        String_To_Octets( FR.S( Start_Pos..End_Pos ), Msg(Pos..Pos+Len-1) );
        Pos := Pos + Len;

        --if it's not the last one, write a separator
        if I < Written then
          Msg(Pos) := Sep;
          Pos := Pos + 1;
        end if;
      end;
    end loop;

    -- write the message counter in little endian at all times
    Write_U16( Msg, Pos, Counter );

    -- write padding if needed
    if Pos <= Msg'Last then
      Rng.Get_Octets( Msg(Pos..Msg'Last) );
    end if;
  end Write_File_Request;

  -- Reads a request for files; the opposite of Write_File_Request above
  procedure Read_File_Request( Msg      : in Raw_Types.Serpent_Msg;
                               Counter  : out Interfaces.Unsigned_16;
                               FR       : out Filenames) is
    Pos       : Integer := Msg'First;
    Max_Pos   : Integer := Msg'Last - 2; --at least 2 reserved for counter
    Text_Sz   : Integer;
    Max_Sz    : Integer := Max_Pos - Pos - 1; --text only i.e. w.o. size itself
    F_No      : Integer;
    U16       : Interfaces.Unsigned_16;
  begin
    -- read type ID and check
    if Msg(Pos) /= File_Req_S_Type then
      raise Invalid_Msg;
    end if;
    Pos := Pos + 1;

    -- read total size of filenames+separators
    Read_U16( Msg, Pos, U16 );
    Text_Sz := Integer(U16);
    -- take away the 2 octets for size itself
    Text_Sz := Text_Sz - 2;

    -- check that Text_Sz is not overflowing/underflowing
    if Text_Sz < 1 or Text_Sz > Max_Sz then
      raise Invalid_Msg;
    end if;

    -- count first the separators to know how many filenames
    -- NB: there is always at least 1 filename as Text_Sz > 0
    F_No := 1;
    for I in Pos .. Pos + Text_Sz - 1 loop
      if Msg(I) = Sep then
        F_No := F_No + 1;
      end if;
    end loop;

    -- create the output structure and discard separators
    -- text without separators should be Text_Sz - F_No + 1
    -- (because ONLY one separator between 2 filenames allowed)
    -- if it's not that => Invalid_Msg
    -- F_No and Text_Sz are not overflow (earlier check + calc)
    declare
      F     : Filenames(Text_Len(F_No), Text_Len(Text_Sz-F_No+1));
      S_Pos : Positive;
      Index : Positive;
    begin
      S_Pos := F.S'First;
      Index := F.Starts'First;
      F.Starts(Index) := Interfaces.Unsigned_16(S_Pos);

      for I in Pos .. Pos + Text_Sz - 1 loop
        -- copy over to F.S anything that is not separator
        if Msg(I) /= Sep then
          F.S( S_Pos ) := Character'Val(Msg(I));
          S_Pos := S_Pos + 1;
        else
          -- if it's separator, check and if ok, add next as start
          if I = Pos + Text_Sz or -- separator as last character is error
               Msg(I+1) = Sep or  -- 2 consecutive separators is error
               Index >= F.Starts'Last then -- too many separators is error
            raise Invalid_Msg;
          else
            Index := Index + 1;
            F.Starts( Index ) := Interfaces.Unsigned_16(S_Pos);
          end if;
        end if;
      end loop;

      -- copy the whole structure to output variable
      FR := F;
    end;

    -- read message counter now
    Pos := Pos + Text_Sz;
    Read_U16( Msg, Pos, Counter );

  end Read_File_Request;

The converters between String and Octets (messages.adb):

  -- String to Octets conversion
  procedure String_To_Octets(Str: in String; O: out Raw_Types.Octets) is
  begin
    Assert( Str'Length = O'Length );
    for I in 1..Str'Length loop
      O( O'First+I-1 ) := Character'Pos(Str(Str'First + I - 1 ));
    end loop;
  end String_To_Octets;

  -- Octets to string conversion
  -- NB: Str'Length has to be EQUAL to Octets'Length!
  procedure Octets_To_String(O: in Raw_Types.Octets; Str: out String) is
  begin
    Assert( O'Length = Str'Length );
    for I in 1..O'Length loop
      Str( Str'First+I-1 ) := Character'Val(O(O'First + I - 1 ));
    end loop;
  end Octets_To_String;

The read/write utilities for values on 16 bits (messages.adb):

  -- Write a 16 bits value to Octets at Pos; Pos increases by 2.
  procedure Write_U16( Msg: in out Raw_Types.Octets;
                       Pos: in out Natural;
                       U16: in Interfaces.Unsigned_16) is
  begin
    Msg(Pos..Pos+1) := Raw_Types.Cast(U16);
    Cast_LE(Msg(Pos..Pos+1));
    Pos := Pos + 2;
  end Write_U16;

  -- Read a 16-bits values from Octets from Pos; Pos increases by 2.
  procedure Read_U16( Msg: in Raw_Types.Octets;
                      Pos: in out Natural;
                      U16: out Interfaces.Unsigned_16) is
    O2  : Raw_Types.Octets_2;
  begin
    O2  := Msg(Pos..Pos+1);
    Cast_LE(O2);
    U16 := Raw_Types.Cast(O2);
    Pos := Pos + 2;
  end Read_U16;

The .vpatch and my signature for it can be found on my Reference Code Shelf as usual or through those links:


  1. Yes, I managed to fall during one of various testing sessions upon the one case where it was not, such are some of my talents, what can I tell you. 

2 Comments »

  1. ';' is a valid character in Linux filenames, along with all other possible octets with the exception of 0x00 and 0x2F, or '/'. If you're going to use a separator character, seems you might as well use null, being that it's already reserved. Otherwise, you're imposing an implicit constraint above and beyond those of the OS that you'll need to keep track of and check for. Maybe you plan to just never use ';' in a filename ever, but why even worry about it, when there's already a value that simply cannot appear in any valid filename?

    However, you say "protocol's current messages", so is this a live protocol that needs to be supported going forward? If so, then one does what one must, but it's an exception to document.

    Comment by DangerNorm — November 23, 2018 @ 1:16 a.m.

  2. The filenames will be what the server decides, not at all what linux considers "valid" or not. Not to mention that unix filenames are already a mess anyway (see the logs for full discussion on this) so not exactly something to look up to.

    The above being said, I am rather for no separators at all + fixed size => loads of problems simply don't exist anymore. However, it's not yet fully clear if this will fit the game's needs and it's not yet burning, so it's still being considered essentially.

    Comment by Diana Coman — November 23, 2018 @ 2:14 p.m.

RSS feed for comments on this post. TrackBack URL

Leave a comment

Theme and content by Diana Coman