More Direct than Direct_IO or Choosing Ada over GNAT



April 27th, 2023 by Diana Coman

Back when I started programming in Ada, I trusted the GNAT implementation of the Ada standard, mostly because I had no choice[1]. A few years and a lot of experience with said GNAT implementation later, I know for a fact that GNAT's I/O packages such as Direct_IO and Sequential_IO are broken and obfuscate far more than they help in any significant way.

By the time I discovered in 2021 just how broken and uselessly complex these packages truly were, there were already several places where the game's client relied on them. An unpleasant discovery for sure, but there was plenty more pressing work to do and I had at least managed to restrict the use of such packages enough that their bugs didn't get a chance to manifest in my code at all. So I left the existing code as it was and simply avoided using these packages in code written *from that point on*. After all, the client is open source anyway and thus easily changed by anyone who cares enough about it - if anything, this sort of known and well understood issue makes exactly for a low hanging fruit to pick, something that can even help newcomers get involved more easily and more satisfyingly.

Recently though, as other parts finally get integrated and the client gets ever closer to release, I reviewed this matter again, especially as VaMP came into the picture as well. As it turns out, there still is a place where I can't quite leave this exact sort of fruit hanging, namely reading and writing cryptographic keys to the disk. First, this is quite clearly a sensitive operation and as such one where fewer dependencies and greater clarity are better at all times. Second, while this is indeed integrated into the client, it's part of a base layer that is not really client-specific and as such I would really keep it as clear of any unwanted GNAT dependencies as I can[2]. So I took the time to review all the I/O code again and then to sort it out in the most straightforward way possible.

The solution turns out to be both short and sweet - all the sweeter for being in fact so obviously well supported by Ada as a language and by all the rest of the code that I have implemented so far. All it took was one short new package that takes advantage of the very thin wrappers I wrote for the read/write C library functions (over which GNAT wraps layers upon layers of obfuscation more than usefulness) and otherwise uses direct memory mapping, meaning an array of octets placed at the very same address as the value itself, to convert between any given type and its raw representation.

Such direct memory mapping for composite types relies in turn on a very tight specification of how such a type should be represented in memory but this was absolutely no trouble at all for me since... I had already taken the time when I defined the needed types to fully specify their representation as well! So there it was, my own work done well in the past quite directly and obviously paying its dividends in the present and compounding the dividends for the future, too.
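As a minimal sketch of what such a mapping looks like in isolation, the stand-alone program below overlays an octet array on a record whose layout is fully specified by a representation clause. The Key_Header type, its fields and the Octet_Array type are made up for illustration and are not part of eucore; the package shown further down uses Octets_Buffer_Pkg.Elem_Array instead.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;
with System;

procedure Overlay_Demo is
   -- Hypothetical type, for illustration only (not from eucore).
   type Key_Header is record
      Version : Unsigned_8;
      Flags   : Unsigned_8;
      Length  : Unsigned_16;
   end record;

   -- Full representation: each field gets an explicit octet offset
   -- and bit range, and the total size is fixed as well.
   for Key_Header use record
      Version at 0 range 0 .. 7;
      Flags   at 1 range 0 .. 7;
      Length  at 2 range 0 .. 15;
   end record;
   for Key_Header'Size use 32;

   type Octet_Array is array (Natural range <>) of Unsigned_8;

   -- Size of the type in octets (storage units).
   Sz : constant Natural := Key_Header'Size / System.Storage_Unit;

   Hdr : aliased Key_Header :=
     (Version => 1, Flags => 16#A5#, Length => 512);

   -- The overlay: Raw is a second view of the very same memory as Hdr.
   Raw : Octet_Array (0 .. Sz - 1);
   for Raw'Address use Hdr'Address;
begin
   -- Print the raw octets of the record, one by one.
   for I in Raw'Range loop
      Ada.Text_IO.Put (Unsigned_8'Image (Raw (I)));
   end loop;
   Ada.Text_IO.New_Line;

   -- Writing through the octet view changes the record and vice versa.
   Raw (0) := 2;
   if Hdr.Version /= 2 then
      raise Program_Error with "overlay not in effect";
   end if;
end Overlay_Demo;
```

Note that the representation clause fixes where each field sits but not the order of octets inside a multi-octet field such as Length, which remains that of the machine. Files written this way are therefore only directly readable on machines with the same endianness.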

As for the relevant code, here's the new mmap_io package, helpfully formatted already by my own VaMP[3] for publishing on the blog:

eucore/src/base_layer/src/mmap_io.adb

	-- DC, 2023

with Ada.Directories; use Ada.Directories;
with Interfaces.C; use Interfaces.C;
with System;

with Raw_Types; use Raw_Types;
with Raw_IO; use Raw_IO;

package body MMap_IO is
	sz: constant size_t := T'Size / System.Storage_Unit;
	procedure Write(tval: in T; fname: in String; append: in Boolean) is
		local: aliased T := tval;
		--for local'Size use T'Size; -- this would be static as required only for scalar types
		data: aliased Octets_Buffer_Pkg.Elem_Array(0..sz-1);
		for data'Address use local'Address;
	begin
		Write_Octets(fname, data, append);
	end Write;

	function Read(fname: in String) return T is
		local: aliased T;
		data: aliased Octets_Buffer_Pkg.Elem_Array(0..sz-1);
		for data'Address use local'Address;
		ptr: Octets_Buffer_Pkg.Elem_Array_Pointer;
		fsz: size_t;
	begin
		if not Exists(fname) then
			raise Failed_IO with "inexistent/inaccessible file " & fname;
		end if;
		fsz:= size_t(Size(fname));
		if fsz < sz then
			raise Failed_IO with "file " & fname & " too short (" & fsz'Image & " vs expected " & sz'Image & ")";
		end if;
		ptr:= new Octets_Buffer_Pkg.Elem_Array(0..sz-1);
		Read_Octets(fname, ptr);
		data := ptr.all;
		Octets_Buffer_Pkg.Free(ptr);
		return local;
	end Read;

end MMap_IO;

eucore/src/base_layer/src/mmap_io.ads

	-- DC, 2023
	-- memory-mapped I/O for types that are explicitly and fully specified with representation clause given that they are read/written as raw octets directly memory-mapped.

generic
	-- any type but the user of this package is responsible to ensure that a direct memory mapping works correctly and reliably for T both ways and at all times
	type T is private;
package MMap_IO is
	-- write the value tval of type T to file fname either appending or overwriting (append=false)
	-- this creates the file if/when needed
	-- raises exception on error
	procedure Write(tval: in T; fname: in String; append: in Boolean);
	-- reads one value of type T from given file fname
	-- raises exception on error (eg inaccessible file, wrong size, failed mapping etc)
	function Read(fname: in String) return T;

	-- reads the idx-th value of type T from file fname
	-- function Read(fname: in String; idx: in Natural) return T;
end MMap_IO;
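Since Raw_IO and Raw_Types are not shown here, the following self-contained round trip may help to see the overlay idea and a thin C binding working together. It is a sketch under stated assumptions rather than the code eucore uses: it binds C's stdio functions fopen/fwrite/fread/fclose instead of the read/write functions that Raw_IO wraps, and the Pair type, the pair.bin file name and the local Failed_IO exception are all invented for the example.

```ada
with Interfaces;   use Interfaces;
with Interfaces.C; use Interfaces.C;
with System;       use type System.Address;

procedure Round_Trip_Demo is
   -- Thin bindings to C stdio. This is an assumption made for the sketch:
   -- the article's Raw_IO wraps the C read/write functions instead.
   function fopen (Path, Mode : char_array) return System.Address
     with Import, Convention => C, External_Name => "fopen";
   function fwrite (Buf : System.Address; Size, Count : size_t;
                    File : System.Address) return size_t
     with Import, Convention => C, External_Name => "fwrite";
   function fread (Buf : System.Address; Size, Count : size_t;
                   File : System.Address) return size_t
     with Import, Convention => C, External_Name => "fread";
   function fclose (File : System.Address) return int
     with Import, Convention => C, External_Name => "fclose";

   -- Hypothetical type with a fully specified representation.
   type Pair is record
      A : Unsigned_32;
      B : Unsigned_32;
   end record;
   for Pair use record
      A at 0 range 0 .. 31;
      B at 4 range 0 .. 31;
   end record;
   for Pair'Size use 64;

   -- Size of one value in octets.
   Sz : constant size_t := Pair'Size / System.Storage_Unit;

   Name      : constant char_array := To_C ("pair.bin");
   Failed_IO : exception;  -- local to this sketch

   Src : aliased Pair := (A => 16#DEADBEEF#, B => 42);
   Dst : aliased Pair := (A => 0, B => 0);
   F   : System.Address;
begin
   -- Write the raw octets of Src straight from its own memory.
   F := fopen (Name, To_C ("wb"));
   if F = System.Null_Address then
      raise Failed_IO with "cannot open pair.bin for writing";
   end if;
   if fwrite (Src'Address, 1, Sz, F) /= Sz then
      raise Failed_IO with "short write";
   end if;
   if fclose (F) /= 0 then
      raise Failed_IO with "close after write failed";
   end if;

   -- Read the octets back straight into the memory of Dst.
   F := fopen (Name, To_C ("rb"));
   if F = System.Null_Address then
      raise Failed_IO with "cannot open pair.bin for reading";
   end if;
   if fread (Dst'Address, 1, Sz, F) /= Sz then
      raise Failed_IO with "short read";
   end if;
   if fclose (F) /= 0 then
      raise Failed_IO with "close after read failed";
   end if;

   -- The value must survive the trip through the file unchanged.
   if Dst /= Src then
      raise Failed_IO with "round trip mismatch";
   end if;
end Round_Trip_Demo;
```

The program leaves pair.bin behind in the current directory and finishes silently when the value read back equals the value written. The MMap_IO package above adds what this sketch skips, namely the check that the file exists and holds at least as many octets as one value of the type.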

With the above done, the writing and reading of RSA keys turned out *simpler* than it was when relying on Direct_IO and the like. As a result, the updated code is less than a third of its previous length - the relevant keys_io.adb implementation went from 471 lines when using Direct_IO to 152 lines when using my own MMap_IO, and this doesn't even take into account the dropped dependencies and how much additional code they brought in.

For anyone interested in using the client whether now or at any future date, the good news from the above is that the previously low hanging fruit of switching client i/o to a more straightforward non-gnat implementation is now even lower hanging, seeing how there is the MMap_IO package to rely on and/or to use as example for the case where something slightly different might serve better. To further help in this vein, note that the client still makes use of Sequential_IO mostly for the 'torrents' part, meaning the read/write of files obtained at run time from the server - essentially game assets of all sorts.

Possibly a slightly different package would serve better to replace that use for game assets since the MMap_IO package isn't aimed specifically at sequential I/O - one might perhaps make a generic package receiving both the file and the type desired, reading the whole in memory and operating from there. It would certainly be a faster and more reliable solution than GNAT's imagined 'control' over multiple simultaneous accesses to the same file on disk. I'll leave this for another day though and possibly, even preferably, for another person too, so feel free to pick it up if it's of any interest to you.

If you set to work and have something to show or to ask, I'll be happy to hear of it as well, so use the comments below confidently, there's nothing to lose for it for sure.


  1. Perhaps it's truly very naive to trust this way but at times it is also unavoidable - what you don't know yet, you have to take on trust, there is no third way available. At best, you get perhaps the choice to trust a person based on what you know *of them* but at worst, as in this case, all you 'get' is to trust an unknown entity, pretty much, the 'code itself' aka all 360M of it or an 'organization' aka Adacore with all its changing history and multitude of people that you can't ever know or get to know in any meaningful way anyway. So you trust, if and when you must but hopefully you then work as well towards reducing your own need for such blind trust. As you learn more, it's worth remembering to revisit, reevaluate and correct or adjust misplaced trust, where necessary - this is exactly what this article describes and documents, my reevaluation and correction of previously assigned trust, as I got to know better what it was exactly that I ended up relying on. 

  2. Especially since on review, I realised that even the Direct_IO and Sequential_IO packages from GNAT *still* rely on the ugly and gnarly "streams", even though they avoid the broken 'finalization' and although there is neither any need for such reliance nor any use for it. The very code in question states clearly in places that the streams abstraction is meaningless for the task at hand but nevertheless it's brought in and adhered to because of the need to fit a predefined form, a need for conformance and uniformity across the board basically, that's it. For instance, directly from s-direio.adb:

    -- The following is the required overriding for Stream.Read, which is
    -- not used, since we do not do Stream operations on Direct_IO files.

    In other words, it's most likely a deeper issue at core - the GNAT implementation values uniformity and ultimately fungibility above everything else as it sets purposefully to shield the user from the requirement of deeper understanding and actual grasp. It's a perspective entirely at odds with my own and so it's no surprise its fruits are unpalatable to me.

     

  3. Compounding dividends yet again, see? 
