Ossasepia

February 8, 2018

EuCrypt Chapter 9: Byte Order and Bit Disorder in Keccak

Filed under: EuCrypt — Diana Coman @ 3:00 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

The potential bit disorder trouble with Keccak highlighted at the end of the previous chapter calls for some decision to be made since a hash function won't be of much help if bits come out in different orders from different implementations. So now and here is your time to have your say on this matter - I'll first describe the changes made for this chapter, since they are directly linked to the issue anyway and then I'll lay out the current options as I see them. You are warmly invited to tell me what's wrong with them, come up with better ones, compare them or shred them to bits or... shut up and then live with them.

Strictly speaking, this chapter simply brings in a few modifications and additions to the previous "word-level" version of the Keccak sponge, introduced in Chapter 7. The goal is to make this relatively fast1 Keccak (compared to the bit-level version from Chapter 8) run with the same results on Big Endian as well as Little Endian machines. As I don't actually have a Big Endian machine at the moment and the virtualisation option with Qemu proved a HUGE headache, I'll be grateful for any feedback from someone who can run the code and tests provided (or any other tests they choose) on a Big Endian platform.

Before digging into the code itself, it's quite useful to review a bit the Endianness issue. In itself, big/little endian refers to the order in which octets (bytes) are stored in memory: either most significant octet first (big endian) or least significant octet first (little endian). So in principle as long as one doesn't address different octets directly, there is no problem. However, the compounded trouble with Keccak is that the darned specification of the Keccak state and its transformations in fact has unstated assumptions about both octet ordering and bit ordering! So on top of octet (byte) order, one needs to make sure that bit order assumptions also hold at all times and to say that this gives me a headache is an understatement. Essentially, Keccak's specification relies (implicitly, not explicitly...) on Big Endian octet (byte) ordering when describing the transformations but it uses Little Endian octet ordering and LSB (least significant *bit*) ordering for the Keccak state so that 0x80 for instances is represented as 0000 0001. Oh, the joys of consistent, crystal-clear and totally-not-confusing specifications!

Just about the only bit ray of light in this whole mess is that most of Keccak's transformations themselves are actually immune to the specific byte/bit order in use (as long as it's consistently used, of course). Nevertheless, there is potentially big trouble whenever the Z dimension of the Keccak state is directly involved and especially for the likes of the Iota transformation that uses 64-bit pre-defined values (the round constants). To address those issues, the current approach is as follows:

Reflecting the above, the .vpatch for this chapter adds a method for flipping octets of the input stream and checks for local bit order whenever changing from bits to words or the other way around. Moreover, the conversion methods from bits to words and the other way around are updated to match the LSB bit-order that Keccak supposes. The relevant parts are in eucrypt/smg_keccak/smg_keccak.adb:

  -- convert from a bitstream of ZWord size to an actual ZWord number
  function BitsToWord( BWord: in Bitword ) return ZWord is
    W    : ZWord;
    Bits: Bitword;
  begin
    -- just copy octets if machine is little endian
    -- flip octets if machine is big endian
    if Default_Bit_Order = Low_Order_First then
      Bits := BWord;
    else
      Bits := FlipOctets( BWord );
    end if;
    -- actual bits to word conversion
    W := 0;
    -- LSB bit order (inside octet) as per Keccak spec
    for I in reverse Bitword'Range loop
      W := Shift_Left( W, 1 ) + ZWord( Bits( I ) );
    end loop;

    return W;
  end BitsToWord;

  -- convert from a ZWord (lane of state) to a bitstream of ZWord size
  function WordToBits( Word: in ZWord ) return Bitword is
    Bits: Bitword := (others => 0);
    W: ZWord;
  begin
    W := Word;
    for I in Bitword'Range loop
      Bits( I ) := Bit( W mod 2 );
      W := Shift_Right( W, 1 );
    end loop;

    -- flip octets if machine is big endian
    if Default_Bit_Order = High_Order_First then
      Bits := FlipOctets( Bits );
    end if;

    return Bits;
  end WordToBits;

  -- flip given octets (i.e. groups of 8 bits)
  function FlipOctets( BWord : in Bitword ) return Bitword is
    Bits : Bitword;
  begin
    -- copy groups of 8 octets changing their order in the array
    -- i.e. 1st octet in BWord becomes last octet in Bits and so on
    for I in 0 .. ( Bitword'Length / 8 - 1 ) loop
      Bits ( Bits'First  + I * 8     .. Bits'First + I * 8 + 7 ) :=
      BWord( BWord'Last  - I * 8 - 7 .. BWord'Last - I * 8);
    end loop;
    return Bits;
  end FlipOctets;

As you might notice in the above, there are a few other small changes to the conversion functions, mainly replacing direct divisions/multiplications by powers of 2 with corresponding shifts - since this implementation uses anyway rotation of bits, there really is no reason not to take advantage of bit shifting too, where possible.

In addition to the code itself, there are of course new tests as well, mainly for the bit flipping:

  procedure test_flip is
    B: constant Bitword := (1, 0, 1, 1, 1, 1, 0, 0,
                            1, 1, 1, 0, 0, 0, 0, 1,
                            0, 1, 1, 0, 0, 0, 1, 0,
                            1, 1, 1, 1, 1, 1, 1, 1,
                            1, 1, 0, 1, 1, 0, 0, 1,
                            0, 0, 0, 0, 0, 0, 0, 0,
                            0, 0, 1, 1, 0, 0, 0, 1,
                            0, 0, 0, 1, 1, 0, 0, 0);
    Expected: Bitword :=   (0, 0, 0, 1, 1, 0, 0, 0,
                            0, 0, 1, 1, 0, 0, 0, 1,
                            0, 0, 0, 0, 0, 0, 0, 0,
                            1, 1, 0, 1, 1, 0, 0, 1,
                            1, 1, 1, 1, 1, 1, 1, 1,
                            0, 1, 1, 0, 0, 0, 1, 0,
                            1, 1, 1, 0, 0, 0, 0, 1,
                            1, 0, 1, 1, 1, 1, 0, 0);
    Output : Bitword;
  begin
    Output := FlipOctets( B );
    if Output /= Expected then
      Put_Line( "FAILED: flip octets" );
      Put_Line( "Expected: " );
      for I of Expected loop
        Put(Bit'Image(I));
      end loop;
      new_line(1);
      Put_Line( "Output: " );
      for I of Output loop
        Put(Bit'Image(I));
      end loop;
      new_line(1);
    else
      Put_Line( "PASSED: flip octets" );
    end if;
  end test_flip;

In addition to the new test for flipping, the older tests for the sponge are updated to be directly comparable with those from the bit-level version of Keccak so that one can easily check that the output of the two implementations is indeed the same. So the current test_sponge function is the following:

  procedure test_sponge is
    Bitrate   : constant Keccak_Rate := 1344;
    Input     : Bitstream(1..5) := (1, 1, 0, 0, 1);
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    C         : Natural;
    HexPos    : Natural;
    Error     : Natural;
    Pos       : Natural;
    ExpHex    : constant String :=
              "CB7FFB7CE7572A06C537858A0090FC2888C3C6BA9A3ADAB4"&
              "FE7C9AB4EFE7A1E619B834C843A5A79E23F3F7E314AA597D"&
              "9DAD376E8413A005984D00CF954F62F59EF30B050C99EA64"&
              "E958335DAE684195D439B6E6DFD0E402518B5E7A227C48CF"&
              "239CEA1C391241D7605733A9F4B8F3FFBE74EE45A40730ED"&
              "1E2FDEFCCA941F518708CBB5B6D5A69C30263267B97D7B29"&
              "AC87043880AE43033B1017EFB75C33248E2962892CE69DA8"&
              "BAF1DF4C0902B16C64A1ADD42FF458C94C4D3B0B32711BBA"&
              "22104989982543D1EF1661AFAF2573687D588C81113ED7FA"&
              "F7DDF912021FC03D0E98ACC0200A9F7A0E9629DBA33BA0A3"&
              "C03CCA5A7D3560A6DB589422AC64882EF14A62AD9807B353"&
              "8DEE1548194DBD456F92B568CE76827F41E0FB3C7F25F3A4"&
              "C707AD825B289730FEBDFD22A3E742C6FB7125DE0E38B130"&
              "F3059450CA6185156A7EEE2AB7C8E4709956DC6D5E9F99D5"&
              "0A19473EA7D737AC934815D68C0710235483DB8551FD8756"&
              "45692B4E5E16BB9B1142AE300F5F69F43F0091D534F372E1"&
              "FFC2E522E71003E4D27EF6ACCD36B2756FB5FF02DBF0C96B"&
              "CAE68E7D6427810582F87051590F6FB65D7B948A9C9D6C93"&
              "AF4562367A0AD79109D6F3087C775FE6D60D66B74F8D29FB"&
              "4BA80D0168693A748812EA0CD3CA23854CC84D4E716F4C1A"&
              "A3B340B1DED2F304DFDBACC1D792C8AC9A1426913E3F67DB"&
              "790FD5CFB77DAA29";
    Output    : Bitstream( 1 .. ExpHex'Length * 4 );
    HexString : String( 1 .. ExpHex'Length );
  begin
    Put_Line("---sponge test---");
    Sponge(Input, Bitrate, Output);
    Put_Line("Input is:");
    for I of Input loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output'Length/4 loop
      Pos := Output'First + (I-1)*4;
      C := Natural( Output( Pos ) ) +
           Natural( Output( Pos + 1 ) ) * 2 +
           Natural( Output( Pos + 2 ) ) * 4 +
           Natural( Output( Pos + 3 ) ) * 8;
      HexPos := I + 2 * ( I mod 2 ) - 1;
      Hexstring(HexPos) := Hex( C );
      if Hexstring(HexPos) /= ExpHex(HexPos) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex);
    Put_Line("Obtained: ");
    Put_Line(Hexstring);
    Put_Line("Errors found: " & Natural'Image(Error));

  end test_sponge;

The .vpatch for all the above and its signature can be found as usual on my Reference Code Shelf as well as directly from the links below:

To make it perfectly clear before I even get into any options regarding the bit & byte mess: the whole trouble arises essentially because Keccak's specification can't stick to one thing and one thing only i.e. either "this is a bit-level futzing" or "this is a byte level fizzing." So instead it ends up as a bit-byte-bit level-hopping mad tzizzing: theoretically it works at bit-level since the state and transformations really are defined at bit level; in practice and *at the same time* the whole thing is however specified and discussed at byte (octet) level with the Z dimension of the state pretty much ignored (since that is the bit-level basically) and algorithms duly given at... word level even (speed! implementation concern! Yes, yes, but why, WHY can't you keep the two well separated and labeled as such? WHY?). Moreover, note that the padding rule for instance is specified at bit-level: "10*1" so 2 bits of 1 at the ends with as many 0s as needed in between them so that the whole input is brought to a length that is a multiple of the selected block (Keccak rate) length. But in test vectors this ends up however as 0x80 because of the unstated assumptions of bit order in the Keccak state - basically, why not, we have 2 bit orders so why not use them, right? MOREOVER, the whole thing is anyway meant to work as a hashing of messages hence of characters hence of... octets (bytes)!2 So on one hand there are test vectors with all sort of number of bits as input, on the other hand you'll likely want it to handle *consistently* inputs that are inescapably in multiples of 8 bits and nothing else. "Oh, but this is solved already" you'll say - sure, it is of sorts, basically by NIST having on one hand their own whole damned Appendix for SHA-3 just to sort-of, ass-backwards explain the absorb into a state because it's so unclear due to bit order and on the other hand by Keccak having one padding rule while SHA-3 has yet another padding rule (hey, we put in a "delimiter" which is 01 for some reason). What, isn't that clear? NO? But how surprising!

To bring back some sanity, let's see a few options:

1. Input/Output size & padding:
1.1 Fix both input and output at octet (byte) level, reflecting the intended reality of 8 bits/ character anyway. In this case:

  • valid input for Keccak is a multiple of 8 bits;
  • valid output from Keccak is a multiple of 8 bits;
  • Keccak rate (block size i.e. maximum number of bits absorb/squeezed between two scramblings of the Keccak state) is a multiple of 8 bits;
  • input is padded with 10*1 up to closest multiple of block length (Keccak rate). Consequently, the length of padding will be a minimum of 8 bits and a maximum of block length bits.

1.2 Keep input and output at bit level, reflecting the bit-level nature of the Keccak permutation. In this case:

  • valid input for Keccak is any number of bits > 0
  • valid output from Keccak is any number of bits > 0
  • Keccak rate (block size) is any number of bits < length of Keccak state in bits
  • input is padded with 10*1 up to closest multiple of block length (Keccak rate). Consequently, the length of padding will be a minimum of 2 bits and a maximum of block length bits. In this case however the conversion from characters to bit streams needs to be specified clearly too: does one still consider the whole octet anyway or does one discard any trailing 0s for instance?

2. Input bit and byte (octet) order:
2.1 Input is first padded with the rule 10*1 and *the result* considered to have Little Endian byte order (hence octets will be flipped on Big Endian iron) and LSB bit order (hence absorbed bit by bit in order into the Keccak state).
2.2 The padding itself is flipped first to LSB (so the end octet if it's one octet full of padding will be 0x80 rather than 0x01) and then attached to the input that is considered as a result to have Little Endian byte order (hence octets will be flipped on Big Endian iron) and LSB bit order (hence absorbed bit by bit in order into the Keccak state).

3. Output bit and byte (octet) order:
3.1 Output is extracted bit by bit from the state, meaning that the output will have LSB bit order. For example, Keccak will output 1101 0011 that is LSB for the corresponding 1100 1011 MSB (or 0xCB). Moreover, if requested output is NOT a multiple of 8 bits, the "stray" bits at the end will be the least significant from the state's relevant octet. This is what the current SMG implementations do. Essentially this makes the absorb/squeeze quite straightforward and it leaves the interpretation of bits up to the user of Keccak, so outside the permutations themselves.
3.2 Output is extracted octet by octet from the state, meaning that the output will have MSB bit order. For example, Keccak will output 1100 1011 in the case above at 3.1 (0xCB). Note that in this case an output that is NOT a multiple of 8 bits will end in the most significant bits of the relevant octet in the state. As a concrete example: 4 bits of output in the same 0xCB case used before will be "1100" with this option but "1101" with the previous option from 3.1.

There are of course further options that go deeper into the internals of Keccak. For instance, it is certainly possible to use MSB bit ordering in the Keccak state and change accordingly the relevant transformations. However, such approaches are more time-consuming at this stage and I don't see the clear benefit that would justify that. Nevertheless, if you see such benefit, kindly argue your case in the comments section below. Once again, now it's the time to provide any input on this because once it's fixed that's how it stays and we'll have to live with it as it is.


  1. This is not even by far the fastest implementation possible. At the very least, a faster implementation would unroll the loops and try to parallelize tasks whenever possible. 

  2. And NO, I shall not even entertain the additional madness of Unicode and other Unicorns, thank you. 

February 1, 2018

EuCrypt Chapter 8: Bit-Level Keccak Sponge

Filed under: EuCrypt — Diana Coman @ 9:45 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Implementing the Keccak Sponge at bit-level turns out to be a more enjoyable experience than the previous contortions for a "word"-level (64 bits to be precise) version of the sponge. The implementation itself is more straightforward and the resulting code really is way clearer and easier to follow, especially as it takes advantage of Ada's very convenient modular types. And as a bonus side effect, working at bit level means also that there really is no need anymore for importing those gnat-specific methods that I never wanted in the first place. As to the "but it's going to be slooooooow!" worries, I'll leave those for later: at this stage the goal is to have a reference implementation that is first of all clear and easy to understand. Once this is available, I can look into the speed issue and evaluate if needed the degree to which it really is an issue for Eulora's needs at any rate. Not to mention the fact that a bit-level Keccak does not in any way mean that a word-level Keccak cannot be made as well.

Without further delay, let's see what this patch adds: first of all a new component for EuCrypt, namely smg_bit_keccak. I decided to keep this bit-level implementation separate from the word-level one because I see this implementation as an alternative rather than a replacement necessarily. As a result, this chapter's vpatch will fork directly from EuCrypt's genesis yet again and to make this clear, I edited the eucrypt/README file first, adding the full list of current components of EuCyrpt:

Components:
1. mpi
  Arbitrary length integers and operations.
  Implemented in C.

2. smg_bit_keccak
  Bit-level implementation of the Keccak sponge according to The Keccak Reference v 3.0.
  Implemented in Ada.

3. smg_keccak
  Word (64 bits) level implementation of the Keccak sponge according to The Keccak Reference v 3.0.
  Implemented in Ada.

4. smg_serpent
  Serpent hash method.
  Implemented in Ada.

5. smg_rsa
  RSA implementation using TMSR specification.
  Implemented in C.

6. smg_comm
  Communications for Eulora (server <-> client). Relies on all the other components.

The description of the SMG_Bit_Keccak package is quite similar to that of the previous SMG_Keccak. The main difference is that the lanes of a Keccak state (i.e. the Z dimension) are now arrays of bits rather than values modulo 2^Z_Length. Reflecting this, the Bitword type is an array of Bit with index of ZCoord type specifically. The Sponge procedure itself has the very same signature since it still receives a Bitstream and a Keccak_Rate as input, while spitting a different Bitstream as output. There is no "Plane" type anymore because it is not actually needed when working at bit-level. The resulting public part of the SMG_Bit_Keccak package definition in eucrypt/smg_bit_keccak/smg_bit_keccak.ads:

 -- S.MG bit-level implementation of Keccak-f permutations
 -- (Based on The Keccak Reference, Version 3.0, January 14, 2011, by
 --   Guido Bertoni, Joan Daemen, Michael Peeters and Gilles Van Assche)

 -- S.MG, 2018

package SMG_Bit_Keccak is
	pragma Pure(SMG_Bit_Keccak);  --stateless, no side effects -> can cache calls

  --knobs (can change as per keccak design but fixed here for S.MG purposes)--
  Keccak_L: constant := 6;  --gives keccak z dimension of 2^6=64 bits and
                            --therefore keccak function 1600 with current
                            --constants (5*5*2^6)

  --constants: dimensions of keccak state and number of rounds
  XY_Length: constant := 5;
  Z_Length: constant := 2 ** Keccak_L;
  Width: constant := XY_Length * XY_Length * Z_Length;
  N_Rounds: constant := 12 + 2 * Keccak_L;

  --types
  type XYCoord is mod XY_Length;
  type ZCoord is mod Z_Length;
  type Round_Index is mod N_Rounds;

  type Bit is mod 2;
  type Bitstream is array( Natural range <> ) of Bit; -- any length; message
  type Bitword is array( ZCoord ) of Bit; -- a keccak "word" of bits

  type State is array( XYCoord, XYCoord ) of Bitword; -- the full keccak state

  type Round_Constants is array(Round_Index) of Bitword; --magic keccak values

  -- rate can be chosen by caller at each call, between 1 and width of state
  -- higher rate means sponge "eats" more bits at a time but has fewer bits in
  --   the "secret" part of the state (i.e. lower capacity)
  subtype Keccak_Rate is Positive range 1..Width;  -- capacity = width - rate

  -- public function, the sponge itself
  -- Keccak sponge structure using Keccak_Function, Pad and a given bitrate;
  -- Input - the stream of bits to hash (the message)
  -- Block_Len - the bitrate to use; this is effectively the block length
  --             for splitting Input AND squeezing output between scrambles
  -- Output - a bitstream of desired size for holding output
  procedure Sponge(Input      : in Bitstream;
                   Block_Len  : in Keccak_Rate;
                   Output     : out Bitstream);

In the private part of SMG_Bit_Keccak, there are 3 new methods, namely Next_Pos, First_Pos and BWRotate_Left. As you might guess, BWRotate_Left rotates a given Bitword to the left by the specified number of bits. This effectively replaces the previous gnat-specific Rotate_Left method and is used by one of the Keccak transformations of state. The First_Pos effectively sets the X, Y, Z coordinates to point to the first bit of the Keccak state. It's implemented as a method on its own because it is used in several places and moreover because the "first" position in the cuboid is at the end of the day a matter of convention. Similarly, Next_Pos receives a set of 3 values (X, Y, Z) and changes those to point to the *next* bit in the Keccak state. Once again, what constitutes "next" is a matter of convention - basically it depends on the direction in which one moves along the Z, Y and X dimensions.

private
  -- these are internals of the keccak implementation, not meant to be directly
  --  accessed/used
  -- moving one bit forwards in Keccak state
  procedure Next_Pos( X : in out XYCoord;
                      Y : in out XYCoord;
                      Z : in out ZCoord
                    );
  -- set coordinates to first bit of Keccak state
  procedure First_Pos( X : out XYCoord;
                       Y : out XYCoord;
                       Z : out ZCoord
                     );

  -- operations with Bitwords
  function BWRotate_Left( Input: in Bitword;
                          Count: in Natural)
                          return Bitword;

The rest of the SMG_Bit_Keccak package contains the SqueezeBlock and AbsorbBlock helper methods, the 5 Keccak transformations of state (Theta, Rho, Pi, Chi and Iota) and the Keccak function that does a full scramble of state by calling all the transformations together in the correct order and with the corresponding constants for each round. The only difference with respect to the word-level implementation is the way in which the round constants are given: here they are directly given as Bitword so arrays of bits. Moreover, the order of the bits in the array corresponds to the convention adopted for the Z dimension in this implementation:

  -- this will squeeze Block'Length bits out of state S
  -- NO scramble of state in here!
  -- NB: make SURE that Block'Length is the correct bitrate for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure SqueezeBlock( Block: out Bitstream; S: in State);

  -- This absorbs into sponge the given block, modifying the state accordingly
  -- NO scramble of state in here so make sure the whole Block fits in state!
  -- NB: make SURE that Block'Length is *the correct bitrate* for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure AbsorbBlock( Block: in Bitstream; S: in out State );

  -- Keccak magic bitwords
  RC : constant Round_Constants :=
    (
--   16#0000_0000_0000_0001#, round 0
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,1),
--   16#0000_0000_0000_8082#, round 1
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,1,0),
--   16#8000_0000_0000_808A#, round 2
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,0),

--   16#8000_0000_8000_8000#, round 3
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0),

--   16#0000_0000_0000_808B#, round 4
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,1),

--   16#0000_0000_8000_0001#, round 5
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,1),

--   16#8000_0000_8000_8081#, round 6
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,1),

--   16#8000_0000_0000_8009#, round 7
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,0,1),

--   16#0000_0000_0000_008A#, round 8
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,0),

--   16#0000_0000_0000_0088#, round 9
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,0,0),

--   16#0000_0000_8000_8009#, round 10
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,0,1),

--   16#0000_0000_8000_000A#, round 11
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,1,0),

--   16#0000_0000_8000_808B#, round 12
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,1),

--   16#8000_0000_0000_008B#, round 13
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,1,1),

--   16#8000_0000_0000_8089#, round 14
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 1,0,0,1),

--   16#8000_0000_0000_8003#, round 15
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,1,1),

--   16#8000_0000_0000_8002#, round 16
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,1,0),

--   16#8000_0000_0000_0080#, round 17
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,0),

--   16#0000_0000_0000_800A#, round 18
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,1,0),

--   16#8000_0000_8000_000A#, round 19
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,1,0),

--   16#8000_0000_8000_8081#, round 20
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,1),

--   16#8000_0000_0000_8080#, round 21
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 1,0,0,0, 0,0,0,0),

--   16#0000_0000_8000_0001#, round 22
     (0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,1),

--   16#8000_0000_8000_8008#, round 23
     (1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
      1,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,0,0)
    );

  -- Keccak transformations of the internal state
  function Theta ( Input       : in State ) return State;
  function Rho   ( Input       : in State ) return State;
  function Pi    ( Input       : in State ) return State;
  function Chi   ( Input       : in State ) return State;
  function Iota  ( Round_Const : in Bitword; Input : in State ) return State;

  -- Keccak function with block width currently 1600 (Width constant above)
  -- It simply applies *all* keccak transformations in the correct order, using
  -- the keccak magic numbers (round constants) as per keccak reference
  function Keccak_Function(Input: in State) return State;

end SMG_Bit_Keccak;

The actual implementation of the bit-level Keccak is in eucrypt/smg_bit_keccak/smg_bit_keccak.adb:

 -- S.MG, 2018

package body SMG_Bit_Keccak is

  -- public function, sponge
  procedure Sponge( Input      : in Bitstream;
                    Block_Len  : in Keccak_Rate;
                    Output     : out Bitstream) is
    Internal  : State := (others => (others => (others => 0)));
  begin
    --absorb input into sponge in a loop on available blocks, including padding
    declare
      -- number of input blocks after padding (between 2 and block_len bits pad)
      Padded_Blocks : constant Positive := 1 + (Input'Length + 1) / Block_Len;
      Padded        : Bitstream ( 1 .. Padded_Blocks * Block_Len );
      Block         : Bitstream ( 1 .. Block_Len );
    begin
      -- initialise Padded with 0 everywhere
      Padded := ( others => 0 );
      -- copy and pad input with rule 10*1
      Padded( Padded'First .. Padded'First + Input'Length - 1 ) := Input;
      Padded( Padded'First + Input'Length )                     := 1;
      Padded( Padded'Last )                                     := 1;

      -- loop through padded input and absorb block by block into sponge
      -- padded input IS a multiple of blocks, so no stray bits left
      for B in 0 .. Padded_Blocks - 1 loop
        -- first get the current block to absorb
        Block   := Padded( Padded'First + B * Block_Len ..
                           Padded'First + (B+1) * Block_Len - 1 );
        AbsorbBlock( Block, Internal );
        -- scramble state with Keccak function
        Internal := Keccak_Function( Internal );

      end loop; -- end absorb loop for blocks
    end; -- end absorb stage

    --squeeze required bits from sponge in a loop as needed
    declare
      -- full blocks per output
      BPO     : constant Natural := Output'Length / Block_Len;
      -- stray bits per output
      SPO     : constant Natural := Output'Length mod Block_Len;
      Block   : Bitstream( 1 .. Block_Len );
    begin
      -- squeeze block by block (if at least one full block is needed)
      for I in 0 .. BPO - 1 loop
        SqueezeBlock( Block, Internal );
        Output( Output'First + I * Block_Len ..
                Output'First + (I + 1) * Block_Len -1) := Block;

        -- scramble state
        Internal := Keccak_Function( Internal );
      end loop;  -- end squeezing full blocks

      -- squeeze any partial block needed (stray bits)
      if SPO > 0 then
        SqueezeBlock( Block, Internal );
        Output( Output'Last - SPO + 1 .. Output'Last ) :=
                Block( Block'First .. Block'First + SPO - 1 );
      end if; -- end squeezing partial last block (stray bits)

    end; -- end squeeze stage

  end Sponge;

  -- helper procedures for sponge absorb/squeeze

  -- NO scramble here, this will absorb ALL given block, make sure it fits!
  procedure AbsorbBlock( Block: in Bitstream; S: in out State ) is
    X, Y                  : XYCoord;
    Z                     : ZCoord;
  begin
    -- xor current block, bit by bit, into first Block'Length bits of state
    First_Pos( X, Y, Z);
    for B of Block loop
      -- xor this bit into the state
      S( X, Y )( Z ) := S( X, Y )( Z ) + B;
      -- move to next bit of the state
      Next_Pos( X, Y, Z );
    end loop;
  end AbsorbBlock;

  -- NO scramble here, this will squeeze Block'Length bits out of *same* state S
  procedure SqueezeBlock( Block: out Bitstream; S: in State) is
    X, Y    : XYCoord;
    Z       : ZCoord;
  begin
    -- start with first position of the state
    First_Pos( X, Y, Z );
    -- squeeze bit by bit, as many bits as needed to fill Block
    for I in Block'Range loop
      -- squeeze current bit from state
      Block( I ) := S( X, Y )( Z );
      -- advance to next bit of state
      Next_Pos( X, Y, Z);
    end loop;
  end SqueezeBlock;

  -- moving one bit forwards in Keccak state
  procedure Next_Pos( X : in out XYCoord;
                      Y : in out XYCoord;
                      Z : in out ZCoord
                    ) is
  begin
    Z := Z - 1;
    if Z = ZCoord'Last then
      X := X + 1;
      if X = XYCoord'First then
        Y := Y + 1;
      end if;
    end if;
  end Next_Pos;

  -- position of first bit in Keccak state
  procedure First_Pos( X : out XYCoord;
                       Y : out XYCoord;
                       Z : out ZCoord
                     ) is
  begin
    X := XYCoord'First;
    Y := XYCoord'First;
    Z := ZCoord'Last;
  end First_Pos;

  -- operations with Bitwords
  function BWRotate_Left( Input: in Bitword;
                          Count: in Natural)
                          return Bitword is
    Output  : Bitword;
    Advance : constant ZCoord := ZCoord( Count mod Z_Length );
  begin
    for I in ZCoord loop
      Output( I ) := Input( I + Advance );
    end loop;
    return Output;
  end BWRotate_Left;

  -- Keccak transformations of the internal state
  function Theta ( Input       : in State) return State is
    Output : State;
    S1, S2 : Bit;
  begin
    for X in XYCoord loop
      for Y in XYCoord loop
        for Z in ZCoord loop
          S1 := 0;
          S2 := 0;
          for Y1 in XYCoord loop
            S1 := S1 + Input( X - 1, Y1 )( Z );
            -- Z direction is opposite to the one assumed in the ref so Z + 1
            S2 := S2 + Input( X + 1, Y1 )( Z + 1 );
          end loop;
          Output( X, Y )(Z) := Input( X, Y )( Z ) + S1 + S2;
        end loop;
      end loop;
    end loop;

    return Output;
  end Theta;

  function Rho   ( Input       : in State) return State is
    Output      : State;
    X, Y, Old_Y : XYCoord;
  begin
    Output( 0, 0) := Input( 0, 0);
    X := 1;
    Y := 0;

    for T in 0 .. 23 loop
      Output(X, Y) := BWRotate_Left(Input(X,Y), (T+1)*(T+2)/2);
      Old_Y := Y;
      Y := 2 * X + 3 * Y;
      X := Old_Y;
    end loop;
    return Output;
  end Rho;

  function Pi    ( Input       : in State) return State is
    Output : State;
  begin
    for X in XYCoord loop
      for Y in XYCoord loop
        Output( Y, 2 * X + 3 * Y ) := Input( X, Y );
      end loop;
    end loop;

    return Output;
  end Pi;

  function Chi   ( Input       : in State) return State is
    Output : State;
  begin
    for Y in XYCoord loop
      for X in XYCoord loop
        for Z in ZCoord loop
          Output(X, Y)(Z) :=   Input( X, Y )( Z ) +
                             ( Input( X + 1, Y )( Z ) + 1 ) *
                             ( Input( X + 2, Y )( Z )     );
        end loop;
      end loop;
    end loop;

    return Output;
  end Chi;

  function Iota  ( Round_Const : in Bitword; Input : in State) return State is
    Output : State;
  begin
    Output := Input;
    for Z in ZCoord loop
      Output( 0, 0 )(Z) := Input( 0, 0 )( Z ) + Round_Const( Z );
    end loop;
    return Output;
  end Iota;

  function Keccak_Function(Input: in State) return State is
    Output: State;
  begin
    Output := Input;
    for I in Round_Index loop
      Output := Iota(RC(I), Chi(Pi(Rho(Theta(Output)))));
    end loop;

    return Output;
  end Keccak_Function;

end SMG_Bit_Keccak;

Note in the above that the Sponge, AbsorbBlock and SqueezeBlock methods are rather simpler than they used to be. Absorbing/squeezing one block is now simply a matter of reading bit by bit from the Keccak state from the starting position and up to the block length. While one could arguably read Bitword by Bitword rather than bit by bit, I kept it bit by bit for clarity for now. Moreover, when absorbing a block, the xor would still need to be performed bit by bit essentially since the Z dimension of the state is defined as an array of bits rather than a value.

The Next_Pos procedure above takes advantage of the fact that all coordinates in a Keccak sponge are modular types so they will automatically wrap-around as needed. Consequently, to advance one position further, it's enough to decrease the Z coordinate (by convention movement is in the negative direction of the Z axis here) and then, if needed, to increase X in order to move on to a different Bitword and/or possibly Y as well in order to move on to a different plane of the cuboid too.

The First_Pos procedure simply sets X, Y and Z to match the convention that movement in a Keccak state happens in the positive direction of the X and Y axes but in the negative direction of the Z axis.

The Theta transformation of the state is a direct implementation of Theta's definition, working directly bit by bit. By contrast, the implementation from the previous chapter used the algorithm given in the reference paper, which took advantage of the fact that lanes were represented as values. However, this bit-level implementation does not gain anything from using that algorithm (on the contrary, it would end up with even more operations) so the direct implementation of Theta's definition is preferred. Note that the "Z-1" from Theta's definition effectively means "the bit before Z", which translates to "Z+1" with the current convention for movement on the Z axis. The rest of the Keccak transformations remain quite similar to the previous implementation.

As usual, there are some automated tests using existing test vectors for the transformations as well as for the sponge itself. The long of it is in eucrypt/smg_bit_keccak/tests/smg_bit_keccak-test.adb:

with SMG_Bit_Keccak; use SMG_Bit_Keccak;
with Ada.Exceptions; use Ada.Exceptions;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Interfaces; use Interfaces;

procedure SMG_Bit_Keccak.Test is
  --types
  type Keccak_Perms is ( None, Theta, Rho, Pi, Chi, Iota );
  type Test_Vector  is array( Keccak_Perms ) of State;
  type Test_Round   is array( Round_Index ) of Test_Vector;
  subtype Hexstring is String( 1 .. Z_Length / 4 ); --word as hex string
  subtype Bitstring is String( 1 .. Z_Length ); -- word as binary string
  type Bithex       is array( 0 .. 3 ) of Bit;

  -- helper methods
  procedure HexCharToBit( H : in Character; B: out Bithex) is
  begin
    case H is
      when '0' => B := (0, 0, 0, 0);
      when '1' => B := (0, 0, 0, 1);
      when '2' => B := (0, 0, 1, 0);
      when '3' => B := (0, 0, 1, 1);
      when '4' => B := (0, 1, 0, 0);
      when '5' => B := (0, 1, 0, 1);
      when '6' => B := (0, 1, 1, 0);
      when '7' => B := (0, 1, 1, 1);
      when '8' => B := (1, 0, 0, 0);
      when '9' => B := (1, 0, 0, 1);
      when 'A' => B := (1, 0, 1, 0);
      when 'B' => B := (1, 0, 1, 1);
      when 'C' => B := (1, 1, 0, 0);
      when 'D' => B := (1, 1, 0, 1);
      when 'E' => B := (1, 1, 1, 0);
      when 'F' => B := (1, 1, 1, 1);
      when others => null;
    end case;
  end HexCharToBit;

  function HexToBitword( H: in Hexstring ) return Bitword is
    BW         : Bitword;
    B1, B2     : Bithex;
    PosH, PosB : Natural;
  begin
    -- read the hexstring octet by octet
    for I in 1 .. Z_Length / 8 loop
      PosH := Integer(H'First) + (I - 1) * 2;
      HexCharToBit( H(PosH), B1 );
      HexCharToBit( H(PosH + 1), B2 );

      PosB := Integer(BW'First) + (I - 1) * 8;
      for J in 0 .. 3 loop
        BW ( ZCoord(PosB + J) ) := B1(J);
        BW ( ZCoord(PosB + 4 + J) ) := B2(J);
      end loop;
    end loop;
    return BW;
  end HexToBitword;

  -- prints one bitword as an array of bits
  procedure print_bitword( B: in Bitword ) is
    bstr: Bitstring;
  begin
    for I in ZCoord loop
      if B( I ) > 0 then
        bstr( Bitstring'First + Integer(I) ) := '1';
      else
        bstr( Bitstring'First + Integer(I) ) := '0';
      end if;
    end loop;
    Put(bstr);
  end print_bitword;

  -- prints a keccak state, bitword by bitword
  procedure print_state( S: in State; Title: in String) is
  begin
    Put_Line("---------" & Title & "---------");
    for Y in XYCoord loop
      for X in XYCoord loop
        Put( "S(" & XYCoord'Image(X) & ", " & XYCoord'Image(Y) & ")= ");
        print_bitword( S( X, Y ) );
        new_line(1);
      end loop;
    end loop;
  end print_state;

  function read_state(File: in FILE_TYPE; Oct: Positive :=8) return State is
    S: State;
    Line1: String := "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000";
    StartPos, EndPos: Positive;
    Len: Positive := Oct*2;
    HStr: Hexstring;
  begin
    for Y in XYCoord loop
      Line1 := Get_Line(File);
      StartPos := Line1'First;
      EndPos := StartPos + Len-1;

      for X in XYCoord loop
        HStr := Line1( StartPos .. EndPos );
        S( X, Y ) := HexToBitword(HStr);
        StartPos := EndPos + 2;	--one space to skip
        EndPos := StartPos + Len - 1;
      end loop;
    end loop;
    return S;
  end read_state;

  --reads a full test round from specified file (pre-defined format)
  function read_from_file (filename : in String;
                           T        : out Test_Round)
                           return Boolean is
    file: FILE_TYPE;
    InputMarker: String := "lanes as 64-bit words:";
    octets: Positive := 8;
    RoundNo: Round_Index;
  begin
    -- try to open the input file
    begin
      open(file, In_File, filename);
    exception
      when others =>
        Put_Line(Standard_Error,
                 "Can not open the file '" & filename & "'. Does it exist?");
        return False;
    end;

  -- find & read input state first
    RoundNo := -1;
    loop
      declare
        Line: String := Get_Line(file);
      begin
        --check if this is test data of any known kind
        if index(Line, InputMarker, 1) > 0 then
          T(0)(None) := read_state(file, octets);
          print_state(T(0)(None), "Read Input State");
        elsif index(Line, "Round ", 1) > 0 then
          RoundNo := RoundNo +1;
        elsif index(Line, "theta", 1) > 0 then
          T(RoundNo)(Theta) := read_state(file, octets);
          if (RoundNo > 0) then
            T(RoundNo)(None) := T(RoundNo-1)(Iota);  -- previous state as input
          end if;
        elsif index(Line, "rho", 1) > 0 then
          T(RoundNo)(Rho) := read_state(file, octets);
        elsif index(Line, "pi", 1) > 0 then
          T(RoundNo)(Pi) := read_state(file, octets);
        elsif index(Line, "chi", 1) > 0 then
          T(RoundNo)(Chi) := read_state(file, octets);
        elsif index(Line, "iota", 1) > 0 then
          T(RoundNo)(Iota) := read_state(file, octets);
        end if;
        exit when End_Of_File(file);
      end;
    end loop;
    Close(file);
    return True;
  end read_from_file;

  -- performs one single round of Keccak, step by step
  -- each permutation is tested separately
  -- test fails with exception raised at first output not matching expected
  procedure test_one_round(T: Test_Vector; Round: Round_Index) is
    Input: State;
    Expected: State;
    Output: State;
    Test_One_Round_Fail: Exception;
  begin
    Input := T(None);
    for I in Keccak_Perms range Theta..Iota loop
      Expected := T(I);
      case I is
        when Theta  => Output := SMG_Bit_Keccak.Theta(Input);
        when Rho    => Output := SMG_Bit_Keccak.Rho(Input);
        when Pi     => Output := SMG_Bit_Keccak.Pi(Input);
        when Chi    => Output := SMG_Bit_Keccak.Chi(Input);
        when Iota   => Output := SMG_Bit_Keccak.Iota(RC(Round), Input);
        when others => null;
      end case;

      if (Output /= Expected) then
        print_state(Output, "----------real output-------");
        print_state(Expected, "----------expected output--------");
        raise Test_One_Round_Fail;
      else
        Put_Line("PASSED: " & Keccak_Perms'Image(I));
      end if;
      -- get ready for next permutation
      Input := Expected;
    end loop;
  end test_one_round;

  procedure test_bwrotate_left( Input    : in Bitword;
                                N        : Positive;
                                Expected : in Bitword) is
    Output: Bitword;
  begin
    Output := BWRotate_Left( Input, N );
    if Output /= Expected then
      Put_Line("FAIL: test bitword rotate left");
      Put_Line("Output:");
      print_bitword( Output );
      Put_Line("Expected:");
      print_bitword( Expected );
    else
      Put_Line("PASS: test bitword rotate left");
    end if;
  end test_bwrotate_left;

  procedure test_keccak_function(T: in Test_Round) is
    S: State;
  begin
    Put_Line("---Full Keccak Function test---");
    S := Keccak_Function(T(Round_Index'First)(None));
    if S /= T(Round_Index'Last)(Iota) then
      Put_Line("FAILED: full keccak function test");
    else
      Put_Line("PASSED: full keccak function test");
    end if;
  end test_keccak_function;

  procedure test_sponge is
    Bitrate   : constant Keccak_Rate := 1344;
    Input1    : Bitstream( 1 .. 5 ) := (1, 1, 0, 0, 1);
    Input2    : Bitstream( 1 .. 30) := (1, 1, 0, 0,
                                        1, 0, 1, 0,
                                        0, 0, 0, 1,
                                        1, 0, 1, 0,
                                        1, 1, 0, 1,
                                        1, 1, 1, 0,
                                        1, 0, 0, 1,
                                        1, 0);
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    C         : Natural;
    ExpHex1   : constant String :=
              "CB7FFB7CE7572A06C537858A0090FC2888C3C6BA9A3ADAB4"&
              "FE7C9AB4EFE7A1E619B834C843A5A79E23F3F7E314AA597D"&
              "9DAD376E8413A005984D00CF954F62F59EF30B050C99EA64"&
              "E958335DAE684195D439B6E6DFD0E402518B5E7A227C48CF"&
              "239CEA1C391241D7605733A9F4B8F3FFBE74EE45A40730ED"&
              "1E2FDEFCCA941F518708CBB5B6D5A69C30263267B97D7B29"&
              "AC87043880AE43033B1017EFB75C33248E2962892CE69DA8"&
              "BAF1DF4C0902B16C64A1ADD42FF458C94C4D3B0B32711BBA"&
              "22104989982543D1EF1661AFAF2573687D588C81113ED7FA"&
              "F7DDF912021FC03D0E98ACC0200A9F7A0E9629DBA33BA0A3"&
              "C03CCA5A7D3560A6DB589422AC64882EF14A62AD9807B353"&
              "8DEE1548194DBD456F92B568CE76827F41E0FB3C7F25F3A4"&
              "C707AD825B289730FEBDFD22A3E742C6FB7125DE0E38B130"&
              "F3059450CA6185156A7EEE2AB7C8E4709956DC6D5E9F99D5"&
              "0A19473EA7D737AC934815D68C0710235483DB8551FD8756"&
              "45692B4E5E16BB9B1142AE300F5F69F43F0091D534F372E1"&
              "FFC2E522E71003E4D27EF6ACCD36B2756FB5FF02DBF0C96B"&
              "CAE68E7D6427810582F87051590F6FB65D7B948A9C9D6C93"&
              "AF4562367A0AD79109D6F3087C775FE6D60D66B74F8D29FB"&
              "4BA80D0168693A748812EA0CD3CA23854CC84D4E716F4C1A"&
              "A3B340B1DED2F304DFDBACC1D792C8AC9A1426913E3F67DB"&
              "790FD5CFB77DAA29";
    ExpHex2   : constant String :=
              "35F4FBA9D29E833B1DB17CA2077C11B3348C8AF2A29344AE"&
              "6AAA1F63FC4536CE795C54F0359953B97CEA27491691E93E"&
              "E4829EAB388211E6E8BD3EDA74366D0947DFA3D65D127593"&
              "0AFC42884B7324717DCB003D7B3B5C2E92B84F478CC8DBB5"&
              "174EB4BAC6207BD22E56FCC6E5FB11BC598FDBE6208913CE"&
              "34BC03837FDBFCDFF9407D948531B5FC7FFE7029F30E7EDC"&
              "F9282F0A630FA99839776F5EEA485449F62E421552AF9571";
    HexStr1   : String( 1 .. ExpHex1'Length );
    Output1   : Bitstream( 1 .. ExpHex1'Length * 4 );
    HexStr2   : String( 1 .. ExpHex2'Length );
    Output2   : Bitstream( 1 .. ExpHex2'Length * 4 );
    Error     : Natural;
    Pos       : Natural;
    HexPos    : Natural;
  begin

  -- test 1
    Put_Line("---sponge test 1---");
    Sponge(Input1, Bitrate, Output1);
    Put_Line("Input is:");
    for I of Input1 loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output1 loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output1'Length/4 loop
      Pos := Output1'First + (I-1)*4;
      C := Natural( Output1( Pos ) ) +
           Natural( Output1( Pos + 1 ) ) * 2 +
           Natural( Output1( Pos + 2 ) ) * 4 +
           Natural( Output1( Pos + 3 ) ) * 8;
      HexPos := I + 2 * ( I mod 2 ) - 1;
			Hexstr1( HexPos ) := Hex(C);
      if Hexstr1( HexPos ) /= ExpHex1( HexPos ) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex1);
    Put_Line("Obtained: ");
    Put_Line(Hexstr1);
    Put_Line("Errors found: " & Natural'Image(Error));

  -- test 2
    Put_Line("---sponge test 2---");
    Sponge(Input2, Bitrate, Output2);
    Put_Line("Input is:");
    for I of Input2 loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output2 loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output2'Length/4 loop
      Pos := Output2'First + (I-1)*4;
      C := Natural( Output2( Pos ) ) +
           Natural( Output2( Pos + 1 ) ) * 2 +
           Natural( Output2( Pos + 2 ) ) * 4 +
           Natural( Output2( Pos + 3 ) ) * 8;
      HexPos := I + 2 * ( I mod 2 ) - 1;
			Hexstr2( HexPos ) := Hex(C);
      if Hexstr2( HexPos ) /= ExpHex2( HexPos ) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex2);
    Put_Line("Obtained: ");
    Put_Line(Hexstr2);
    Put_Line("Errors found: " & Natural'Image(Error));

  end test_sponge;

  -- end of helper methods

	--variables
  T     : Test_Round;
  BW, E : Bitword;
begin
  BW:=(0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
       0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,1,0);
  E:=(0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
       0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 1,0,0,0);
  test_bwrotate_left(BW, 2, E);

  Put_Line("-----Testing with zero state as input------");
  if (not read_from_file("testvectorszero.txt", T)) then
    return;
  end if;

  for I in Round_Index loop
    Put_Line("---round " & Round_Index'Image(I) & "---");
    test_one_round(T(I), I);
  end loop;

  -- test also Keccak_Function as a whole --
  test_keccak_function(T);

  Put_Line("-----Testing with non-zero state as input------");
  if (not read_from_file("testvectorsnonzero.txt", T)) then
    return;
  end if;

  for I in Round_Index loop
    Put_Line("---round " & Round_Index'Image(I) & "---");
    test_one_round(T(I), I);
  end loop;

  -- test also Keccak_Function as a whole --
  test_keccak_function(T);

  -- test Sponge construction
  test_sponge;

end SMG_Bit_Keccak.Test;

To test both Keccak transformations and the sponge construction itself, I used this time only data from the existing test vectors for Keccak. While the testing of the Keccak transformations themselves did not provide any surprises, the testing of the sponge did provide a bit of a headache due to the fact that existing test data is really meant at octet (byte) level rather than bit level - both as input and as output of the sponge. While the "input bits" are given explicitly, the output is specified only in hexadecimal and it turns out that the squeezing from Keccak is *also* meant to be one octet at a time rather than 1 bit at a time. This wouldn't be a problem, of course, if it weren't for the different order in which the bits end up in the output stream depending on how you squeeze them from the state. To give an example: the "octet" 0011 0101 (or 35 in hex) is squeezed as 10101100, basically back to front.

At the moment this slight issue of bit order is "handled" outside of Keccak itself based on the reasoning that octet-level interpretation is outside of Keccak itself and therefore entirely up to the user of Keccak - they can use any rule they want regarding the value represented by 8 (or any other number) of squeezed bits. As long as the output bits from Keccak are correct and in consistent order, the implementation is correct from my point of view. However, there are of course a few other potential approaches to this and I'm still considering them. For now though, here is the .vpatch with the full bit-level Keccak transformations+sponge as described above, together with the corresponding signature:

January 25, 2018

EuCrypt Chapter 7: Keccak Sponge

Filed under: EuCrypt — Diana Coman @ 4:40 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Using the Keccak transformations from Chapter 6, I can now finally implement the actual Keccak sponge1 that is useful for EuCrypt mainly as a hashing function. To start with, I should say that "sponge" really doesn't strike me as a very useful name or sort of metaphor to use - the whole thing is still essentially a Rubik's cuboid of bits that can be loaded with input bits (through a xor operation), scrambled (by means of the set of transformations discussed in the previous chapter) and read as needed. I get it that Rubik's is trademarked and one wants a "new" name/metaphor and what not. I also totally agree that an actual sponge-organism absorbing/sucking and spitting bits would make at least a livelier sort of hash function but Keccak is what it is and that's a very cold mechanism of scrambling bits (in a 3D sort of way at least, granted) repeatedly. Anyway, to preserve history and make it easy to follow the match between implementation and reference document, I'll use the authors' terminology to denote the thing although I will keep using the Rubik's cube analogy as needed, to explain how it actually works.

The Keccak sponge takes as input a stream of bits of arbitrary length and spits as output another stream of bits of any specific length that is requested. For this reason, the ideal Keccak sponge implementation does not care at all about endianness: bits are absorbed as they come, whether most significant or least significant first. This approach has the significant advantage of clarity: one can easily follow the stream of bits as they make it into the sponge's structure and then as they are squeezed out. Moreover, there is no need to cater explicitly for big endian or little endian machines as such. The choice of Ada as programming language for this implementation makes it also particularly straightforward to model precisely what is described i.e. a transformation at bit level rather than octet level, hence unconcerned with endianness. However, it turns out that my current implementation is not yet there because at the moment it relies on the round constants (for the iota transformation in the previous chapter) as numbers (hence with bit-level representation dependent on endianness) rather than bit streams. Consequently, I'll have to go back over this in next chapter and change everything to bit-level operation. For now though, let's see what the current sponge looks like anyway.

The Keccak sponge itself comes with an important parameter, namely bitrate: this represents the number of bits that the sponge can absorb or squeeze in one single iteration, meaning without needing to scramble the state. The way the sponge works is that it first pads the input stream with 10*1 (so a minimum of 2 bits set to 1 with as many or as few 0 in between as needed) so that the length of the input stream is a multiple of the requested bitrate. Then it splits the padded input stream into chunks of bitrate length and it proceeds to "absorb" those chunks one by one (i.e. xor in the first part of its state), scrambling the state after each chunk. When all the input stream has been absorbed, the sponge moves onto the "squeezing" stage where it simply reads bits from the first part of its state, scrambling the state again after each bitrate bits read. Essentially, the bitrate parameter stands for the number of bits of internal state that are used directly for absorbing and squeezing between two consecutive scramblings of the Keccak state.

The bitrate parameter decides indirectly a sponge's "capacity". This is the remaining number of bits of the state and it can be calculated as width - bitrate. The capacity of a sponge is the number of bits that are never directly read at a squeeze nor directly xored with input bits when absorbing input. The capacity bits contribute otherwise to the scrambling of state at each iteration of the sponge and they represent therefore the "secret" part of a sponge's internal state. Consequently, there is a tradeoff between a higher bitrate that would make the Keccak hashing faster in principle and a higher capacity that increases the "secret" part of the sponge. Note that a sponge's bitrate is a matter of choice at each and every use of the sponge - there is no reason really for this to be fixed necessarily by the Keccak implementation itself. Consequently, my current approach is to have it as a parameter of the Sponge procedure itself, to be decided on by the calling code, at each call. Because this bitrate is effectively used to split the input stream into blocks of equal length, I'm calling the parameter Block_Len, specifying however that it has to be a value of the Keccak_Rate type that models the constraints of a valid bitrate (0 < bitrate < width).

To model the Keccak sponge with its specific input/output requirements, I first define a few new types and subtypes, in eucrypt/smg_keccak/smg_keccak.ads:

  -- rate can be chosen by caller at each call, between 1 and width of state
  -- higher rate means sponge "eats" more bits at a time but has fewer bits in
  --   the "secret" part of the state (i.e. lower capacity)
  subtype Keccak_Rate is Positive range 1..Width;  -- capacity = width - rate

  type Bit is mod 2;
  type Bitstream is array( Natural range <> ) of Bit; -- any length; message
  subtype Bitword is Bitstream( 0..Z_Length - 1 ); -- bits of one state "word"

Note in the above the Bitword subtype of Bitstream: this is a stream of bits of precisely same length as the Z dimension of the Keccak cuboid (EuCrypt uses 64 bits as the value of the Z_Length constant in smg_keccak code). In other words, a Bitword is the bistream equivalent of the numerical "ZWord" value stored at any (X,Y) position of the Keccak sponge. Consequently, there is a need for some ways to convert between the two (Bitword to ZWord and back):

  -- type conversions
  function BitsToWord( Bits: in Bitword ) return ZWord;
  function WordToBits( Word: in ZWord ) return Bitword;

The type conversions are provided as above in the public part of the SMG_Keccak package for now, since they might conceivably be useful to users of Keccak too, at some point. However, as I'll change the Keccak implementation to get rid of multi-octet numbers and work simply at bit level *everywhere*, it's quite likely that those methods will simply be discarded as they are not needed anymore. For the time being though, once those conversions are in place, there is only the sponge function itself to define immediately afterwards:

  -- public function, the sponge itself
  -- Keccak sponge structure using Keccak_Function, Pad and a given bitrate;
  -- Input - the stream of bits to hash (the message)
  -- Block_Len - the bitrate to use; this is effectively the block length
  --             for splitting Input AND squeezing output between scrambles
  -- Output - a bitstream of desired size for holding output
  procedure Sponge(Input      : in Bitstream;
                   Block_Len  : in Keccak_Rate;
                   Output     : out Bitstream);

Finally, in the same eucrypt/smg_keccak/smg_keccak.ads, I added to the private part of the SMG_Keccak package two new procedures, SqueezeBlock and AbsorbBlock that are used by the Sponge function. Note that these two procedures are not really meant for external use, mainly because they represent intermediate steps in the sponge operation and therefore are not of much use by themselves.

  -- this will squeeze Block'Length bits out of state S
  -- NO scramble of state in here!
  -- NB: make SURE that Block'Length is the correct bitrate for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure SqueezeBlock( Block: out Bitstream; S: in State);

  -- This absorbs into sponge the given block, modifying the state accordingly
  -- NO scramble of state in here so make sure the whole Block fits in state!
  -- NB: make SURE that Block'Length is *the correct bitrate* for this sponge
  -- in particular, Block'Length should be a correct bitrate aka LESS than Width
  procedure AbsorbBlock( Block: in Bitstream; S: in out State );

The detailed implementations of all the above new procedures and functions are in eucrypt/smg_keccak/smg_keccak.adb:

-- public function, sponge
  procedure Sponge( Input      : in Bitstream;
                    Block_Len  : in Keccak_Rate;
                    Output     : out Bitstream) is
    Internal  : State := (others => (others => 0));
  begin
    --absorb input into sponge in a loop on available blocks, including padding
    declare
      -- number of input blocks after padding (between 2 and block_len bits pad)
      Padded_Blocks : constant Positive := 1 + (Input'Length + 1) / Block_Len;
      Padded        : Bitstream ( 1 .. Padded_Blocks * Block_Len );
      Block         : Bitstream ( 1 .. Block_Len );
    begin
      -- initialise Padded with 0 everywhere
      Padded := ( others => 0 );
      -- copy and pad input with rule 10*1
      Padded( Padded'First .. Padded'First + Input'Length - 1 ) := Input;
      Padded( Padded'First + Input'Length )                     := 1;
      Padded( Padded'Last )                                     := 1;

      -- loop through padded input and absorb block by block into sponge
      -- padded input IS a multiple of blocks, so no stray bits left
      for B in 0 .. Padded_Blocks - 1 loop
        -- first get the current block to absorb
        Block   := Padded( Padded'First + B * Block_Len ..
                           Padded'First + (B+1) * Block_Len - 1 );
        AbsorbBlock( Block, Internal );
        -- scramble state with Keccak function
        Internal := Keccak_Function( Internal );

      end loop; -- end absorb loop for blocks
    end; -- end absorb stage

    --squeeze required bits from sponge in a loop as needed
    declare
      -- full blocks per output
      BPO     : constant Natural := Output'Length / Block_Len;
      -- stray bits per output
      SPO     : constant Natural := Output'Length mod Block_Len;
      Block   : Bitstream( 1 .. Block_Len );
    begin
      -- squeeze block by block (if at least one full block is needed)
      for I in 0 .. BPO - 1 loop
        SqueezeBlock( Block, Internal );
        Output( Output'First + I * Block_Len ..
                Output'First + (I + 1) * Block_Len -1) := Block;

        -- scramble state
        Internal := Keccak_Function( Internal );
      end loop;  -- end squeezing full blocks

      -- squeeze any partial block needed (stray bits)
      if SPO > 0 then
        SqueezeBlock( Block, Internal );
        Output( Output'Last - SPO + 1 .. Output'Last ) :=
                Block( Block'First .. Block'First + SPO - 1 );
      end if; -- end squeezing partial last block (stray bits)

    end; -- end squeeze stage
  end Sponge;

  -- convert from a bitstream of ZWord size to an actual ZWord number
  -- first bit of bitstream will be most significant bit of ZWord
  function BitsToWord( Bits: in Bitword ) return ZWord is
    W: ZWord;
    P: Natural;
  begin
    W := 0;
    P := 0;
    for I in reverse Bitword'Range loop
      W := W + ZWord( Bits(I) ) * ( 2**P );
      P := P + 1;
    end loop;
    return W;
  end BitsToWord;

  -- convert from a ZWord (lane of state) to a bitstream of ZWord size
  -- most significant bit of ZWord will be left most bit of bitstream
  function WordToBits( Word: in ZWord ) return Bitword is
    Bits: Bitword := (others => 0);
    W: ZWord;
  begin
    W := Word;
    for I in reverse Bitword'Range loop
      Bits( I ) := Bit( W mod 2 );
      W := W / 2;
    end loop;
    return Bits;
  end WordToBits;

-- helper procedures for sponge absorb/squeeze
  -- NO scramble here, this will absorb ALL given block, make sure it fits!
  procedure AbsorbBlock( Block: in Bitstream; S: in out State ) is
    WPB: constant Natural := Block'Length / Z_Length;   -- words per block
    SBB: constant Natural := Block'Length mod Z_Length; -- stray bits
    FromPos, ToPos        : Natural;
    X, Y                  : XYCoord;
    Word                  : ZWord;
    BWord                 : Bitword;
  begin
    -- xor current block into first Block'Length bits of state
    -- a block can consist in more than one word
    X := 0;
    Y := 0;
    for I in 0..WPB-1 loop
      FromPos := Block'First + I * Z_Length;
      ToPos   := FromPos + Z_Length - 1;
      Word := BitsToWord( Block( FromPos .. ToPos ) );
      S( X, Y ) := S( X, Y ) xor Word;
      -- move on to next word in state
      X := X + 1;
      if X = 0 then
        Y := Y + 1;
      end if;
    end loop;
    -- absorb also any remaining bits from block
    if SBB > 0 then
      ToPos := Block'Last;
      FromPos := ToPos - SBB + 1;
      BWord := (others => 0);
      BWord(Bitword'First .. Bitword'First + SBB - 1) := Block(ToPos..FromPos);
      Word := BitsToWord( BWord );
      S( X, Y ) := S( X, Y ) xor Word;
    end if;
  end AbsorbBlock;

  -- NO scramble here, this will squeeze Block'Length bits out of *same* state S
  procedure SqueezeBlock( Block: out Bitstream; S: in State) is
    X, Y    : XYCoord;
    BWord   : Bitword;
    FromPos : Natural;
    Len     : Natural;
  begin
    X := 0;
    Y := 0;
    FromPos := Block'First;

    while FromPos <= Block'Last loop
      BWord := WordToBits( S(X, Y) );

      X := X + 1;
      if X = 0 then
        Y := Y + 1;
      end if;

      -- copy full word if it fits or
      --   only as many bits as are still needed to fill the block
      Len := Block'Last - FromPos + 1;
      if Len > Z_Length then
        Len := Z_Length;
      end if;

      Block(FromPos..FromPos+Len-1) := BWord(BWord'First..BWord'First+Len-1);
      FromPos := FromPos + Len;
    end loop;
  end SqueezeBlock;

As usual, there are a few tests added to eucrypt/smg_keccak/tests/smg_keccak-test.adb. First, there is a simple test of the Keccak function itself (the one that puts together the transformations for a full scramble of the state). Second, there are tests of the two conversion functions from Bitword to ZWord and the other way around. Finally, there are tests for the sponge itself. The test data for the sponge was obtained by running the transformations separately according to the reference paper. The additional methods in the test file are the following:

 procedure print_bitstream(B: in Bitstream; Title: in String) is
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    HexString : String(1..B'Length/4);
    C         : Natural;
    Pos       : Natural;
  begin
    for I in 1..B'Length/4 loop
      Pos := (I-1)*4 + B'First;
      C := Natural( B(Pos) ) * 8 +
           Natural( B(Pos + 1) ) * 4 +
           Natural( B(Pos + 2) ) * 2 +
           Natural( B(Pos + 3) );
			HexString(I) := Hex(C);
    end loop;
    Put_Line("---" & Title & "---");
    Put_Line(HexString);
  end print_bitstream;

  procedure test_bits_to_word_conversion is
    bits: Bitword := (others => 0);
    obtained_bits: Bitword := (others => 0);
    expected: ZWord;
    obtained: ZWord;
  begin
    expected := 16#E7DDE140798F25F1#;
    bits := (1,1,1,0, 0,1,1,1, 1,1,0,1, 1,1,0,1, 1,1,1,0, 0,0,0,1, 0,1,0,0,
             0,0,0,0, 0,1,1,1, 1,0,0,1, 1,0,0,0, 1,1,1,1, 0,0,1,0, 0,1,0,1,
             1,1,1,1, 0,0,0,1);
    obtained := BitsToWord(bits);
    obtained_bits := WordToBits(expected);

    if obtained /= expected then
      Put_Line("FAIL: bits to word");
      Put_Line("Expected: " & ZWord'Image(expected));
      Put_Line("Obtained: " & ZWord'Image(obtained));
    else
      Put_Line("PASSED: bits to word");
    end if;

    if obtained_bits /= bits then
      Put_Line("FAIL: word to bits");
      Put("Expected: ");
      for I in Bitword'Range loop
        Put(Bit'Image(bits(I)));
      end loop;
      Put_Line("");
      Put_Line("Obtained: ");
      for I in Bitword'Range loop
        Put(Bit'Image(obtained_bits(I)));
      end loop;
      Put_Line("");
    else
      Put_Line("PASSED: word to bits");
    end if;
  end test_bits_to_word_conversion;

  procedure test_sponge is
    Bitrate   : constant Keccak_Rate := 1344;
    Input     : Bitstream(1..5) := (1, 1, 0, 0, 1);
    Output    : Bitstream(1..Bitrate*2);
    Hex       : array(0..15) of Character := ("0123456789ABCDEF");
    HexString : String(1..Bitrate/2);
    C         : Natural;
    ExpHex    : String(1..Bitrate/2);
    Error     : Natural;
    Pos       : Natural;
  begin
    ExpHex := "B57B7DAED6330F79BA5783C5D45EABFFA1461FAC6CEA09BD"&
              "AAC114F17E23E5B349EECBC907E07FA36ECF8374079811E8"&
              "5E49243D04182C389E68C733BE698468423DB9891D3A7B10"&
              "320E0356AB4AB916F68C0EA20365A1D4DBA48218CA89CBB8"&
              "6D08A34E04544D4100FFE9CB138EADC2D3FC0E8CC2BC15A7"&
              "5B950776970BFC310F33BF609630D73CAD918CF54657589E"&
              "42CF7CBF20DE677D2AB7E49389F6F6C3B3FE2992905325CE"&
              "60931C1515043595ADC1619CB7E034EF52BDC485D03B7FDD"&
              "7345E849FFB4C4426195C8D88C1E7BF9ADA41B92E006C3DA"&
              "F1ED0FD63ADD9408A3FC815F727457692727637687C1F79D"&
              "837DE20798E64C878181C02DF56A533F684459E8A03C8EF6"&
              "234854531110E6CD9BDEFEA85E35C802B1ACDDF29C9332E2"&
              "53C0FA72F3ED1ABA274838CFE6EF8BD572E89E1C2135F6A7"&
              "5BC5D6EA4F85C9A757E68E561A56AC0FC19F1F086C43272F";

    Put_Line("---sponge test---");
    Sponge(Input, Bitrate, Output);
    Put_Line("Input is:");
    for I of Input loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Put_Line("Output is:");
    for I of Output loop
      Put(Bit'Image(I));
    end loop;
    new_line(1);

    Error := 0;
    for I in 1..Output'Length/4 loop
      Pos := Output'First + (I-1)*4;
      C := Natural( Output( Pos ) ) * 8 +
           Natural( Output( Pos + 1 ) ) * 4 +
           Natural( Output( Pos + 2 ) ) * 2 +
           Natural( Output( Pos + 3 ) );
			Hexstring(I) := Hex(C);
      if Hexstring(I) /= ExpHex(I) then
        Error := Error + 1;
      end if;
    end loop;
    Put_Line("Expected: ");
    Put_Line(ExpHex);
    Put_Line("Obtained: ");
    Put_Line(Hexstring);
    Put_Line("Errors found: " & Natural'Image(Error));

  end test_sponge;

  procedure test_keccak_function(T: in Test_Round) is
    S: State;
  begin
    Put_Line("---Full Keccak Function test---");
    S := Keccak_Function(T(Round_Index'First)(None));
    if S /= T(Round_Index'Last)(Iota) then
      Put_Line("FAILED: full keccak function test");
    else
      Put_Line("PASSED: full keccak function test");
    end if;
  end test_keccak_function;

The .vpatch for the above, together with its signature are as usual on my Reference Code Shelf and linked for your convenience below:

In the next chapter I'll get rid of the endianness trouble by changing everything in the Keccak implementation (constants' representation included) so that all of it works at bit level, as it should! Feel free to play around with this current version anyway but be aware that it's not the one that will be used in EuCrypt. For the proper, bit-level implementation of Keccak, see Chapter 8 of this series (to be published next week).


  1. As described in The Keccak Reference v. 3.0, Bertoni, G., Daemen, J., Peeters, M. and Van Assche, G., 2011. 

January 18, 2018

EuCrypt Chapter 6: Keccak Transformations

Filed under: EuCrypt — Diana Coman @ 8:46 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

EuCrypt will use Keccak for all its hashing and RSA padding needs, as per TMSR-RSA specification. In this age of ever-mutating labels on top of labels though, I have to clearly state that EuCrypt will actually use Keccak itself and not SHA-3 or whatever other thing the original Keccak currently morphed into. More specifically and true to its name, EuCrypt's Keccak is a direct implementation of The Keccak Reference, Version 3.01. This means of course that I have to do the implementation from scratch since history is apparently re-written on the webs rather than preserved: current keccak website morphed at some point2 to the "new keccak" aka SHA-3, simply wiping the previous code from sight. Why would one assume their own past, why would one even need or want or try to actually follow the evolution of anything at all, right? There is apparently only "now" on the wide webs outside TMSR and everything is just re-written and replaced with no traceability to speak of.3

On the bright side, this Keccak implementation does not need to rely on the rather unreliable MPI or any other existing parts for that matter. Moreover, it is meant to work perfectly fine as part of EuCrypt itself but also as a standalone component. Consequently, I'm in the happy position of being able to gladly discard C/C++ as programming language and proceed in a much saner and altogether more pleasant to use language: Ada. The discussion of this choice and of Ada itself is outside the scope of this series but the interested reader can find quite a lot on this topic in the logs. This choice of a new programming language comes with its own challenge of course, as Ada is quite new to me, but this is the sort of challenge that you should actively search for and run after rather than the sort to run away from and turn down when it appears. So I'll write it in Ada - nothing better than a pointed need to actually learn something anyway.

Still on this pleasantly bright side, the task of implementing Keccak in Ada is relatively straightforward to start with, mainly due to the clear description of the Keccak permutations in the reference paper. It also helped to be able to play around a bit in the beginning with a previous attempt at Keccak in Ada, by Peter Lambert. Although I discovered a few problems with that initial attempt, it was nevertheless quite useful as a stepping stone to better understand what Keccak is about and what sort of troubles one might have when implementing it in Ada. Once that initial needed exploration was done I proceeded however to implement Keccak separately, from scratch and this new version of mine is the one I will focus on here. In other words, if there are any errors in this code they are entirely mine.

According to the reference paper, Keccak "is a family of sponge functions that use as a building block a permutation from a set of 7 permutations"4. If that doesn't clarify the issue, it's mainly because hash functions are essentially voodoo or in other words, *what* they really do is not all that clearly (as in mathematically) proven anywhere. That aside, *how* this unknown actual effect is achieved is quite clearly defined. In a nutshell, the working is this: a stream of input bits (the original text) are loaded into a 3D structure (essentially a cuboid made of bits, think of a larger, non-standard Rubik's cube where instead of colours you have bits in each cell and the range of permitted movements is not limited by physical considerations); this Keccak cuboid (called "state") is then scrambled by means of 5 transformations applied in a pre-defined order; the same sequence of 5 transformations is then repeated several times with different constants affecting each repetition (each full set of transformations is called a round); the resulting scrambled bits can be extracted then back into a stream that would represent the hash of the original text. Note that the "7 permutations" mentioned in the definition are not the 5 transformations of the state. Instead, the idea is that Keccak itself, as a whole, acts as a permutation over a number of bits b, where b can take 7 distinct values (hence the 7 permutations). Essentially there is only one mechanism but it can work on 7 different sizes of a bitstream.

The first step (the part that this chapter covers) is to implement therefore the Keccak permutation, meaning in more detail precisely this "state" structure and the 5 transformations that can work with it. I'll start by defining the needed knobs, constants and types, all of them part of the SMG_Keccak package in a new file eucrypt/smg_keccak/smg_keccak.ads:

 -- S.MG implementation of Keccak-f permutations

 -- (Based on The Keccak Reference, Version 3.0, January 14, 2011, by
 --   Guido Bertoni, Joan Daemen, Michael Peeters and Gilles Van Assche)

 -- S.MG, 2018

package SMG_Keccak is
  pragma Pure(SMG_Keccak);  --stateless, no side effects -> can cache calls

  --knobs (can change as per keccak design but fixed here for S.MG purposes)--
  Keccak_L: constant := 6;  --gives keccak z (word) dimension of 2^6=64 and
                            --therefore keccak function 1600 with current
                            --constants (5*5*2^6)

  --constants: dimensions of keccak state and number of rounds
  XY_Length: constant := 5;
  Z_Length: constant := 2**Keccak_L;
  Width: constant := XY_Length * XY_Length * Z_Length;
  N_Rounds: constant := 12 + 2*Keccak_L;

  --types
  type XYCoord is mod XY_Length;
  type ZCoord is mod Z_Length;
  type Round_Index is mod N_Rounds;

  type ZWord is mod 2**Z_Length;	--"lane" in keccak ref
  type Plane is array(XYCoord) of ZWord; --a "horizontal slice" of keccak state
  type State is array(XYCoord, XYCoord) of ZWord; --the full keccak state

  type Round_Constants is array(Round_Index) of ZWord;  --magic keccak constants

The "pragma Pure" line at the beginning of the SMG_Keccak package indicates the fact that this implementation of Keccak is made on purpose to be *stateless*. This means that none of the procedures and functions in the package affect any global variables or state, indeed that there are no such global variables or state(s) in the first place. If this strikes you as odd given that Keccak itself has effectively states (through the distinct rounds for instance and further deep down the different transformations that have to be applied in a pre-defined order) note that those are at most *internal* states of Keccak rather than external and there is no reason whatsoever for those "states" to be visible from outside or indeed to be actually stored as such. Each procedure and function in SMG_Keccak really operates on a *given* (as opposed to stored) state (and round constant for the iota function) producing another state, without any need to rely on anything else. In other words, there are no side effects of calling SMG_Keccak functions/procedures. As I am indeed very happy with *not* having any side effects if I can help it at all, that pragma stays exactly where it is.

As you can notice above further in the code after the pragma, there is a single knob for the user to play with, namely Keccak_L or length. The value of this knob however is used to effectively choose one of the "7 permutations" meaning in practice to calculate the number of Keccak rounds (i.e. how many times the full set of 5 transformations are applied), the Z dimension of the Keccak cuboid (Z_Length) and consequently the total width (i.e. how many bits can fit at any one given time). By adjusting this knob, the user can obtain a wider or narrower Keccak cuboid, trading to some extent width for speed (since there are fewer bits and also fewer rounds for a smaller width). However, the other 2 dimensions (X and Y, named for convenience) are fixed at 5, as per Keccak reference documentation. Similarly, the number of rounds takes the length knob into account but it is nevertheless at least 13, as an absolute minimum.

The types defined in the code above take advantage of one of Ada's very useful approaches: each type really spans the set of values that are valid for the intended use and nothing else. For instance, XY_Coord is defined as a modular type, based on XY_Length. This means that valid values of XY_Coord type are only 0 to XY_Coord -1 and moreover, any calculations with XY_Coord type will be considered modulo XY_Length. No need for further code to check on this, no headaches having to check again and again explicitly at all times5 that only this subset of values are valid X, Y coordinates: it's enough to define the type properly here and then simply use it throughout as intended!

Similarly to XY_Coord, there is Z_Coord as modular type with only difference that this is modulo Z_Length, since the Z dimension is not fixed and potentially different from X/Y dimensions. Using those, the Keccak cuboid is defined as "State": a matrix of ZWords, where each ZWord is of length 2^Z_Length (i.e. contains Z_Length bits). The additional type Plane represents a horizontal "slice" of the cuboid and is defined for convenience since it comes in very handy for some of the permutations later on. Note that the reference documentation defines vertical slices as well but I did not find (at least not yet) any actual need for them, so I did not include them as separate types.

The next part of the same file contains the definition of the internal constants and methods of Keccak:

private
  -- these are internals of the keccak implementation, not meant to be directly
  --  accessed/used

  --Keccak magic numbers
  RC : constant Round_Constants :=
    (
     16#0000_0000_0000_0001#,
     16#0000_0000_0000_8082#,
     16#8000_0000_0000_808A#,
     16#8000_0000_8000_8000#,
     16#0000_0000_0000_808B#,
     16#0000_0000_8000_0001#,
     16#8000_0000_8000_8081#,
     16#8000_0000_0000_8009#,
     16#0000_0000_0000_008A#,
     16#0000_0000_0000_0088#,
     16#0000_0000_8000_8009#,
     16#0000_0000_8000_000A#,
     16#0000_0000_8000_808B#,
     16#8000_0000_0000_008B#,
     16#8000_0000_0000_8089#,
     16#8000_0000_0000_8003#,
     16#8000_0000_0000_8002#,
     16#8000_0000_0000_0080#,
     16#0000_0000_0000_800A#,
     16#8000_0000_8000_000A#,
     16#8000_0000_8000_8081#,
     16#8000_0000_0000_8080#,
     16#0000_0000_8000_0001#,
     16#8000_0000_8000_8008#
    );

  --gnat-specific methods to have bit-ops for modular types
  function Rotate_Left( Value  : ZWord;
	                      Amount : Natural)
	                      return ZWord;
  pragma Import(Intrinsic, Rotate_Left);

  function Shift_Right( Value  : ZWord;
                        Amount : Natural)
                        return ZWord;
  pragma Import(Intrinsic, Shift_Right);

  --Keccak permutations
  function Theta ( Input       : in State) return State;
  function Rho   ( Input       : in State) return State;
  function Pi    ( Input       : in State) return State;
  function Chi   ( Input       : in State) return State;
  function Iota  ( Round_Const : in ZWord; Input : in State) return State;

  --Keccak full function with block width currently 1600 (Width constant above)
  --this simply applies *all* keccak permutations in the correct order and using
  -- the keccak magic numbers (round constants) as per keccak reference
  function Keccak_Function(Input: in State) return State;

end SMG_Keccak;

In the internals (private part) of the SMG_Keccak package, there are first the actual values of the constants that essentially differentiate each round from the others. Those are for all intents and purposes magic numbers, no way around it, so they get called in the code precisely that: Keccak magic numbers. After those, there are two gnat-specific methods imported for bit rotation and bit shifting of modular types. While I still don't like these imports, I don't have a good alternative for now, so there they are. Finally, the actual Keccak transformations follow, each of them taking as input a Keccak State (and in the case of the last transformation, iota, also a round constant) and providing as output another Keccak State. The Keccak_Function that follows applies the 5 transformations (theta, rho, pi, chi, iota) in the correct order and moreover iteratively and with the correct constants as required by the pre-established (for this particular Keccak permutation) number of rounds.

The implementation of all the above Keccak transformations and function can be found in eucrypt/smg_keccak/smg_keccak.adb. The code should be relatively easy to follow as it adheres quite closely to the pseudo-code given in the Keccak reference:

 -- S.MG, 2018

package body SMG_Keccak is

  function Theta(Input : in State) return State is
    Output : State;
    C      : Plane;
    W      : ZWord;
  begin
    for X in XYCoord loop
      C(X) := Input(X, 0);
      for Y in 1..XYCoord'Last loop
        C(X) := C(X) xor Input(X, Y);
      end loop;
    end loop;

    for X in XYCoord loop
      W := C(X-1) xor Rotate_Left(C(X+1), 1);
      for Y in XYCoord loop
        Output(X,Y) := Input(X,Y) xor W;
      end loop;
    end loop;

    return Output;
  end Theta;

  function Rho(Input : in State) return State is
    Output      : State;
    X, Y, Old_Y : XYCoord;
  begin
    Output(0,0) := Input(0,0);
    X           := 1;
    Y           := 0;

    for T in 0..23 loop
      Output(X, Y) := Rotate_Left(Input(X,Y), ( (T+1)*(T+2)/2) mod Z_Length);
      Old_Y := Y;
      Y := 2*X + 3*Y;
      X := Old_Y;
    end loop;
    return Output;
  end rho;

  function Pi(Input : in State) return State is
    Output: State;
  begin
    for X in XYCoord loop
      for Y in XYCoord loop
        Output(Y, 2*X + 3*Y) := Input(X, Y);
      end loop;
    end loop;
    return Output;
  end pi;

  function Chi(Input : in State) return State is
    Output: State;
  begin
    for Y in XYCoord loop
      for X in XYCoord loop
        Output(X, Y) := Input(X, Y) xor
                        ( (not Input(X + 1, Y)) and Input(X + 2, Y) );
      end loop;
    end loop;
    return Output;
  end chi;

  function Iota(Round_Const : in ZWord; Input : in State) return State is
    Output: State;
  begin
    Output := Input;
    Output(0,0) := Input(0,0) xor Round_Const;
    return Output;
  end iota;

  function Keccak_Function(Input: in State) return State is
    Output: State;
  begin
    Output := Input;
    for I in Round_Index loop
      Output := Iota(RC(I), Chi(Pi(Rho(Theta(Output)))));
    end loop;

    return Output;
  end Keccak_Function;

end SMG_Keccak;

To round this off, all we need is a way to compile everything. If you are using gnatmake, it's quite straightforward as it's only a file for now. However, for the future and in the interest of making choices explicit, I've wrote a .gpr file as well (eucrypt/smg_keccak/smg_keccak.gpr), for use with gprbuild:

 -- S.MG, 2018
project SMG_Keccak is
  for Languages use ("Ada");
  for Library_Name use "SMG_Keccak";
  for Library_Kind use "static";

  for Source_Dirs use (".");
  for Object_Dir use "obj";
  for Library_Dir use "lib";
end SMG_Keccak;

As usual, an implementation cannot really be published without any tests at all, so there is a tests folder too, containing two text files with one distinct test case each (taken from keccak archives that I managed to find) and the corresponding Ada testing code that reads those text files, runs the Keccak implementation and reports at each step if the expected and actual outputs of each transformation match or not. As usual again, this is of course more code than in the implementation that it tests, apparently I can't escape this. Moreover, a lot of it is the faffing about with parsing the input since the original "format" of the test vectors does not strike me at all as particularly friendly for automated tests. The whole testing code is quite strict on having only one single test case per file as well as some specific markers in the text itself to be able to identify correctly each round and state. Nevertheless, strict and long as it is, you can find it all in eucrypt/smg_keccak/tests/smg_keccak.adb:

with SMG_Keccak; use SMG_Keccak;
with Ada.Exceptions; use Ada.Exceptions;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Interfaces; use Interfaces;

procedure SMG_Keccak.Test is
  --types
  type Keccak_Perms is (None, Theta, Rho, Pi, Chi, Iota);
  type Test_Vector is array(Keccak_Perms) of State;
  type Test_Round is array(Round_Index) of Test_Vector;

  --helper methods

  procedure print_state(S: in State; Title: in String) is
    Hex: array(0..15) of Character := ("0123456789ABCDEF");
    Len: constant Natural := Z_Length / 4;
    HexString: String(1..Len);
    W: ZWord;
  begin
    Put_Line("---------" & Title & "---------");
    for Y in XYCoord loop
      for X in XYCoord loop
        W := S(X,Y);
        for Z in 0..Len-1 loop
          HexString(Natural(Len-Z)) := Hex(Natural(W mod 16));
          W := W / 16;
        end loop;
        Put(HexString & " ");
      end loop;
      Put_Line("");
    end loop;
  end;

  function read_state(File: in FILE_TYPE; Oct: Positive :=8) return State is
    S: State;
    Line1: String := "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000 " &
                     "0000000000000000";
    StartPos, EndPos: Positive;
    Len: Positive := Oct*2;
  begin
    for Y in XYCoord loop
      Line1 := Get_Line(File);
      StartPos := Line1'First;
      EndPos := StartPos + Len-1;

      for X in XYCoord loop
        S(X,Y) := ZWord'value("16#" & Line1(StartPos..EndPos) & "#");
        StartPos := EndPos + 2;	--one space to skip
        EndPos := StartPos + Len - 1;
      end loop;
    end loop;
    return S;
  end read_state;

  --reads a full test round from specified file (pre-defined format)
  function read_from_file (filename : in String;
                           T        : out Test_Round)
                           return Boolean is
    file: FILE_TYPE;
    InputMarker: String := "lanes as 64-bit words:";
    octets: Positive := 8;
    RoundNo: Round_Index;
  begin
    -- try to open the input file
    begin
      open(file, In_File, filename);
    exception
      when others =>
        Put_Line(Standard_Error,
                 "Can not open the file '" & filename & "'. Does it exist?");
        return False;
    end;

  -- find & read input state first
    RoundNo := -1;
    loop
      declare
        Line: String := Get_Line(file);
      begin
        --check if this is test data of any known kind
        if index(Line, InputMarker, 1) > 0 then
          T(0)(None) := read_state(file, octets);
          print_state(T(0)(None), "Read Input State");
        elsif index(Line, "Round ", 1) > 0 then
          RoundNo := RoundNo +1;
        elsif index(Line, "theta", 1) > 0 then
          T(RoundNo)(Theta) := read_state(file, octets);
          if (RoundNo > 0) then
            T(RoundNo)(None) := T(RoundNo-1)(Iota);  -- previous state as input
          end if;
        elsif index(Line, "rho", 1) > 0 then
          T(RoundNo)(Rho) := read_state(file, octets);
        elsif index(Line, "pi", 1) > 0 then
          T(RoundNo)(Pi) := read_state(file, octets);
        elsif index(Line, "chi", 1) > 0 then
          T(RoundNo)(Chi) := read_state(file, octets);
        elsif index(Line, "iota", 1) > 0 then
          T(RoundNo)(Iota) := read_state(file, octets);
        end if;
        exit when End_Of_File(file);
      end;
    end loop;
    Close(file);
    return True;
  end read_from_file;

  -- performs one single round of Keccak, step by step
  -- each permutation is tested separately
  -- test fails with exception raised at first output not matching expected
  procedure test_one_round(T: Test_Vector; Round: Round_Index) is
    Input: State;
    Expected: State;
    Output: State;
    Test_One_Round_Fail: Exception;
  begin
    Input := T(None);
    for I in Keccak_Perms range Theta..Iota loop
      Expected := T(I);
      case I is
        when Theta => Output := SMG_Keccak.Theta(Input);
        when Rho   => Output := SMG_Keccak.Rho(Input);
        when Pi    => Output := SMG_Keccak.Pi(Input);
        when Chi => Output := SMG_Keccak.Chi(Input);
        when Iota => Output := SMG_Keccak.Iota(RC(Round), Input);
        when others => null;
      end case;

      if (Output /= Expected) then
        print_state(Output, "----------real output-------");
        print_state(Expected, "----------expected output--------");
        raise Test_One_Round_Fail;
      else
        Put_Line("PASSED: " & Keccak_Perms'Image(I));
      end if;
      -- get ready for next permutation
      Input := Expected;
    end loop;
  end test_one_round;
  -- end of helper methods

	--variables
  T: Test_Round;
begin
  Put_Line("-----Testing with zero state as input------");
  if (not read_from_file("testvectorszero.txt", T)) then
    return;
  end if;

  for I in Round_Index loop
    Put_Line("---round " & Round_Index'Image(I) & "---");
    test_one_round(T(I), I);
  end loop;

  Put_Line("-----Testing with non-zero state as input------");
  if (not read_from_file("testvectorsnonzero.txt", T)) then
    return;
  end if;

  for I in Round_Index loop
    Put_Line("---round " & Round_Index'Image(I) & "---");
    test_one_round(T(I), I);
  end loop;

end SMG_Keccak.Test;

The .gpr file for building the test suite that can then be simply executed:

 -- Tests for SMG_Keccak (part of EuCrypt)
 -- S.MG, 2018

project SMG_Keccak_Test is
  for Source_Dirs use (".", "../");
  for Object_Dir use "obj";
  for Exec_Dir use ".";

  for Main use ("smg_keccak-test.adb");
end SMG_Keccak_Test;

The .vpatch and its signature for this chapter are made quite on purpose to have only the genesis of EuCrypt as ascendant. This reflects the fact that smg_keccak itself does not depend on mpi or smg_rsa. It also allows any users of smg_keccak to potentially take just the smg_keccak tree if that's all they need out of EuCrypt. So the next chapters will build on this one and essentially further develop the smg_keccak branch of the big EuCrypt tree. When the whole smg_keccak is ready, a unifying .vpatch will bring everything back together into a common trunk, as everything gets used together as intended. Until then, here's this first keccak .vpatch and its signature:

In the next chapter I'll further expand this Keccak implementation so stay tuned!


  1. Bertoni, G., Daemen, J., Peeters, M. and Van Assche, G., 2011. The Keccak Reference. Version 3.0 

  2. As the man says: holly shit the original keccak www is gone. 

  3. Why exactly is this so? Try and tell yourself why. From where I am, I can only really see a whole army of underlings (not even employees for they do this dirty work unpaid even, for the saddest part of it) at the Ministry of Truth busily at work, what is there more to say about it. 

  4. Bertoni, G., Daemen, J., Peeters, M. and Van Assche, G., 2011. The Keccak Reference, Version 3.0, p. 7 

  5. of course one still keeps that in mind, but it doesn't have to be at the forefront at all times since there's no need to re-implement the check each time coordinates are used. 

January 11, 2018

EuCrypt Chapter 5: C = M ^ e mod n

Filed under: EuCrypt — Diana Coman @ 4:21 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

Having a true random number generator (trng) and on top of it a true random prime number generator (trpng) from previous chapters, I can now finally touch on RSA1 itself: this chapter adds a way to generate RSA keys and to actually use them directly to encrypt/decrypt a chunk of octets as they are given2. Future chapters will build further on this by adding for instance slicing of messages into adequate blocks, padding and hashing. For now though, I'll focus strictly on RSA itself, meaning on obtaining the components of a working RSA key pair and then using those for the two main RSA operations of encryption and decryption. Specifically:

  • RSA public key: n, e, where n is called the modulus and e is called the public exponent. Most importantly, n is obtained as a product of 2 secret primes (usually called p and q) that are randomly chosen.
  • RSA private key: n, d, where n is the same one as above in the public key and d is essentially an inverse of e that depends not only on e itself but also on the hidden p and q.
  • encryption: C = M ^ e mod n, where C is the resulting encrypted message (cipher), M is the message in plain-text, e is the public exponent of an RSA key pair and n is the public modulus of the same key pair.
  • decryption: M = C ^ d mod n, where M, C and n are the very same as in the encryption formula above and d is the secret exponent of the same key pair as above (the inverse of e, essentially).

Note that mathematically encryption and decryption are really the very same operation: an exponentiation modulo n (even same n). However, in practice the two operations need to be treated quite separately because they involve knowledge of different parts and that is the whole point from a cryptographic perspective: encryption means using someone's public key, hence the assumed knowledge is of e and n, nothing else; decryption however means using one's own key, hence there is full knowledge of not only e and n but also the hidden parts, p and q. This difference of knowledge translates at implementation time in a different route taken to perform those same exponentiations modulo n: while encryption has to proceed directly as it is, decryption can take a faster route using the Chinese Remainder Theorem (CRT).

It's worth mentioning that the use of CRT at this stage is considered acceptable for Eulora's needs. However, you need to make your own decision on this. Note that a truly non-leaking RSA requires effectively constant-time operations at all levels, so you'll need to throw away more than CRT itself if that's what you are aiming for - you'll probably want to have a look at FFA in such case.

Generating a RSA key pair in EuCrypt consists in the following steps:

  • use the true random prime number generator from Chapter 4 to obtain 2 random primes, p and q, of 2048 bits (256 octets) each. This value is precisely half of the intended key length, as per current TMSR RSA specification. Note that both primes will have top 2 bits set to 1 precisely to ensure that their product is indeed 4096 bits in length in all cases. See discussion in Chapter 4 for more details on the working of the generator itself.
  • Compare p and q and swap them if needed so that p is always less than q. Note that this is NOT a requirement of RSA itself but it is needed basically as helper for the use of the Chinese Remainder Theorem (CRT) to speed-up decryption, see next step and then the decryption part itself.
  • Calculate u so that u * p = 1 (mod q). This is effectively the inverse of p, mod q. Due to the previous step, p here is always less than q. As with the previous step, this is not required by RSA itself - it's simply a value that is used at decryption to speed up the calculations by means of using CRT.
  • Calculate the modulus n of the key pair, as n = p * q.
  • Calculate the Euler totient3 of n, phi = (p - 1) * (q - 1)
  • Choose another random prime between 3 and phi. This is e (3 < e < phi), the public exponent of the newly generated key pair. A few comments here:

    • The public exponent is not a fixed value and there really is no defensible reason why it should be fixed at the level of an RSA implementation meant to really serve the user4 rather than anyone else. So EuCrypt does not fix the public exponent to any value. By contrast, GnuPG fixes the public exponent to 65537 as it really knows better than you what you need and even what you could possibly want, what want and what choice, why should you even think of any such things? You as user of EuCrypt can of course fix anything you want, public exponent included, IF you want to.
    • Mathematically, RSA requires e to be merely co-prime with (p-1)*(q-1), NOT necessarily prime in itself. However, EuCrypt chooses here the stronger constraint of e strictly prime, as per previous discussion of the TMSR spec.
    • The chosen size of e is the same as that of p and q. This means that the public exponent will likely be large but at the same time the corresponding private exponent will not become tiny either. For more details see the same previous discussion of the TMSR spec, as linked above.
    • Because e is obtained the same way as any other prime in EuCrypt, it will also always have top 2 bits and bottom 1 bit (so 3 bits in total) all set to 1. Combined with the fact that minimum length accepted by the prime generator is 1 octet (8 bits), it follows that, as long as the current generator of primes is used, the e in EuCrypt will be in fact at all times at least 193 (1+64+128) even if the length of e is lowered (with current length of 256 octets, that minimum is significantly higher). Nevertheless, this higher limit is imposed by the current generator, not by the RSA algorithm itself and for this reason the check here is more lenient as it aims to reflect requirements of RSA rather than other characteristics of its current surroundings - basically the generator can be changed at a later time without having to touch the RSA code itself.
    • The search for a suitable e is iterative: if a prime happens to fail the boundary checks (so it's either too small or too large) then it is discarded and another prime is generated.
  • calculate the private exponent, d, such that e * d = 1 mod phi (the inverse of e, mod phi).

The implementation of the above relies on two new structures that hold the various parts of a public and private RSA key, respectively. Those two new structures are added by the .vpatch for this chapter to eucrypt/smg_rsa/include/smg_rsa.h:

typedef struct {
    MPI n;      /* modulus */
    MPI e;      /* public exponent */
} RSA_public_key;

typedef struct {
    MPI n;      /* public modulus */
    MPI e;      /* public exponent */
    MPI d;      /* private exponent: e*d=1 mod phi */
    MPI p;      /* prime  p */
    MPI q;      /* prime  q */
    MPI u;      /* inverse of p mod q */
} RSA_secret_key;

In the same file eucrypt/smg_rsa/include/smg_rsa.h, there is a further addition of the signature of the method that generates a RSA key pair:

/*********rsa.c*********/
/*
 * Generates a pair of public+private RSA keys using directly the entropy source
 * specified in eucrypt/smg_rsa/include/knobs.h
 *
 * ALL RSA keys are 4096 bits out of 2 2048 bits primes, as per TMSR spec.
 *
 * @param sk a fully-allocated structure to hold the generated keypair (secret
key structure holds all the elements anyway, public key is a subset of this)
 *
 * NB: this procedure does NOT allocate memory for components in sk!
 *     caller should ALLOCATE enough memory for all the MPIs in sk
 * Precondition:
 * MPIs in sk have known allocated memory for the nlimbs fitting their TMSR size
 */
void gen_keypair( RSA_secret_key *sk );

It's worth noting what the comments shout at you there as loud as they can: as in previous code, I avoid separating allocation of memory from de-allocation and for this reason, it's the caller's responsibility to allocate memory for the things they want to use (in this case the MPIs that make up the key pair).

If you wonder why the gen_keypair method takes only one argument rather than two (i.e. only a secret key structure rather than a secret key AND a public key structure) scroll back and look again at the two structures: the public key is really but a subset of the secret key, so the secret key structure effectively holds the whole "pair" of keys by itself.

The actual body of the gen_keypair method lives in a new file, eucrypt/smg_rsa/rsa.c:

*
 * An implementation of TMSR RSA
 * S.MG, 2018
 */

#include "smg_rsa.h"
#include 

void gen_keypair( RSA_secret_key *sk ) {
  /* precondition: sk is not null */
  assert(sk != NULL);

  /* precondition: enough memory allocated, corresponding to key size */
  int noctets_pq = KEY_LENGTH_OCTETS / 2;
  unsigned int nlimbs_pq = mpi_nlimb_hint_from_nbytes( noctets_pq);
  unsigned int nlimbs_n = mpi_nlimb_hint_from_nbytes( KEY_LENGTH_OCTETS);
  assert( mpi_get_alloced( sk->n) >= nlimbs_n);
  assert( mpi_get_alloced( sk->p) >= nlimbs_pq);
  assert( mpi_get_alloced( sk->q) >= nlimbs_pq);

  /* helper variables for calculating Euler's totient phi=(p-1)*(q-1) */
  MPI p_minus1 = mpi_alloc(nlimbs_pq);
  MPI q_minus1 = mpi_alloc(nlimbs_pq);
  MPI phi = mpi_alloc(nlimbs_n);

  /* generate 2 random primes, p and q*/
  /* gen_random_prime sets top 2 bits to 1 so p*q will have KEY_LENGTH bits */
  /* in the extremely unlikely case that p = q, discard and generate again */
  do {
    gen_random_prime( noctets_pq, sk->p);
    gen_random_prime( noctets_pq, sk->q);
  } while ( mpi_cmp( sk->p, sk->q) == 0);

  /* swap if needed, to ensure p < q for calculating u */
  if ( mpi_cmp( sk->p, sk->q) > 0)
    mpi_swap( sk->p, sk->q);

  /* calculate helper for Chinese Remainder Theorem:
      u = p ^ -1 ( mod q )
     this is used to speed-up decryption.
  */
  mpi_invm( sk->u, sk->p, sk->q);

  /* calculate modulus n = p * q */
  mpi_mul( sk->n, sk->p, sk->q);

  /* calculate Euler totient: phi = (p-1)*(q-1) */
  mpi_sub_ui( p_minus1, sk->p, 1);
  mpi_sub_ui( q_minus1, sk->q, 1);
  mpi_mul( phi, p_minus1, q_minus1);

  /* choose random prime e, public exponent, with 3 < e < phi */
  /* because e is prime, gcd(e, phi) is always 1 so no need to check it */
  do {
    gen_random_prime( noctets_pq, sk->e);
  } while ( (mpi_cmp_ui(sk->e, 3) < 0) || (mpi_cmp(sk->e, phi) > 0));

  /* calculate private exponent d, 1 < d < phi, where e * d = 1 mod phi */
  mpi_invm( sk->d, sk->e, phi);

  /*  tidy up: free locally allocated memory for helper variables */
  mpi_free(phi);
  mpi_free(p_minus1);
  mpi_free(q_minus1);
}

As usual, the method checks as well as it can its own preconditions that reduce in this case to some checks on the known size of allocated memory for some of the MPIs. However, these checks don't change the fact that it still is the caller's responsibility to allocate memory for *all* MPIs in the structure and to allocate *enough* such memory for each of them, too. The unfortunate souls who have some knowledge of the mpi lib might interject at this point with the observation that mpi methods tend to re-allocate memory whenever they deem it necessary without any concern whatsoever about whether it is their role to do so. This is true unfortunately and it's not something I'm going to sink time into fixing at this moment. Still, it's not in itself a valid excuse or otherwise a "reason" to fail to allocate the correct amount of memory from the start. Don't add to existing garbage lest you'll get eaten by rats later and don't get lazy just because there's nobody looking right now.

Encryption with a previously generated public RSA key is a simple exponentiation modulo n. However, as usual, digging through the parts of the mpi lib that are needed for this reveals that the mpi_powm method that does this exponentiation is not only rather gnarly but also unable to handle properly the corner case when the MPIs for storing the input (the message to encrypt so the MPI raised to power e) and the output (the result of the encryption, hence the result of the exponentiation) are the same. The "solution" of GnuPG on this is -again, as usual- to paper it over and force the RSA layer to take care to avoid this case by allocating memory and using basically a temporary copy of the original input MPI. As you can probably tell by now, I don't like this approach at all because it neither solves the problem nor flags it as such, clearly5. Instead, it avoids the problem but it does so in the wrong place (why should the encryption operation allocate new memory?) and with the unfortunate effect of hiding it from anything that builds on top of the RSA layer.

Since fixing this wobble of the mpi method promises to be more time-consuming than it's currently worth it, EuCrypt takes the second-best option on it: the encryption method clearly states as its precondition that input and output have to be two different MPIs; the reason for this is given too, so that the problem is documented clearly, not hidden; the method then checks this precondition and it aborts execution if the check fails; whenever the precondition is met, the method simply does precisely what it promised (i.e. the exponentiation) and nothing more. The signature of this method is in eucrypt/smg_rsa/include/smg_rsa.h:

/****************
 * Public key operation. Encrypt input with pk and store result into output.
 *
 *  output = input^e mod n , where e,n are elements of pkey.
 * NB: caller should allocate *sufficient* memory for output to hold the result.
 * NB: NO checks are made on input!
 *
 * @param output MPI with enough allocated memory to hold result of encryption
 * @param input MPI containing content to encrypt; it *has to be* different from
output!
 * @param pk the public key that will be used to encrypt input
 *
 * Precondition:
 *  output != input
 * Output and input have to be two distinct MPIs because of the sorry state of
the underlying mpi lib that can't handle properly the case when those are the
same.
 */
void public_rsa( MPI output, MPI input, RSA_public_key *pk );

The implementation of public_rsa is in eucrypt/smg_rsa/include/rsa.c and it barely has 3 lines in total, comment line included:

void public_rsa( MPI output, MPI input, RSA_public_key *pk ) {

  /* mpi_powm can't handle output and input being same */
  assert (output != input);

  mpi_powm( output, input, pk->e, pk->n );
}

Decrypting with a previously generated RSA private key is mathematically the same operation as encrypting. However, given the additional information available about the modulus (namely its factors, p and q), the implementation can take advantage of CRT to perform the same calculation faster (~4 times faster). This is implemented in EuCrypt in the method secret_rsa with signature in eucrypt/smg_rsa/include/smg_rsa.h:

/****************
 * Secret key operation. Decrypt input with sk and store result in output.
 *
 *  output = input^d mod n , where d, n are elements of skey.
 *
 * This implementation uses the Chinese Remainder Theorem (CRT):
 *
 *      out1   = input ^ (d mod (p-1)) mod p
 *      out2   = input ^ (d mod (q-1)) mod q
 *      h      = u * (out2 - out1) mod q
 *      output = out1 + h * p
 *
 * where out1, out2 and h are intermediate values, d,n,p,q,u are elements of
skey. By using CRT, encryption is *faster*. Decide for yourself if this fits
your needs though!
 * NB: it is the caller's responsibility to allocate memory for output!
 * NB: NO checks are made on input!
 *
 * @param output MPI with enough allocated memory to hold result of decryption
 * @param input MPI containing content to decrypt
 * @param sk the secret key that will be used to decrypt input
 */
void secret_rsa( MPI output, MPI input, RSA_secret_key *sk );

And the implementation of secret_rsa, in eucrypt/smg_rsa/rsa.c, with comments literally at every step:

void secret_rsa( MPI output, MPI input, RSA_secret_key *skey ) {
  /* at its simplest, this would be input ^ d (mod n), hence:
   *    mpi_powm( output, input, skey->d, skey->n );
   * for faster decryption though, we'll use CRT and Garner's algorithm, hence:
   *        u = p ^ (-1) (mod q) , already calculated and stored in skey
   *       dp = d mod (p-1)
   *       dq = d mod (q-1)
   *       m1 = input ^ dp (mod p)
   *       m2 = input ^ dq (mod q)
   *        h = u * (m2 - m1) mod q
   *   output = m1 + h * p
   * Note that same CRT speed up isn't available for encryption because at
encryption time not enough information is available (only e and n are known).
   */
  /* allocate memory for all local, helper MPIs */
  MPI p_minus1 = mpi_alloc( mpi_get_nlimbs( skey->p) );
  MPI q_minus1 = mpi_alloc( mpi_get_nlimbs( skey->q) );
  int nlimbs   = mpi_get_nlimbs( skey->n ) + 1;
  MPI dp       = mpi_alloc( nlimbs );
  MPI dq       = mpi_alloc( nlimbs );
  MPI m1       = mpi_alloc( nlimbs );
  MPI m2       = mpi_alloc( nlimbs );
  MPI h        = mpi_alloc( nlimbs );

  /* p_minus1 = p - 1 */
  mpi_sub_ui( p_minus1, skey->p, 1 );

  /* dp = d mod (p - 1) aka remainder of d / (p - 1) */
  mpi_fdiv_r( dp, skey->d, p_minus1 );

  /* m1 = input ^ dp (mod p) */
  mpi_powm( m1, input, dp, skey->p );

  /* q_minus1 = q - 1 */
  mpi_sub_ui( q_minus1, skey->q, 1 );

  /* dq = d mod (q - 1) aka remainder of d / (q - 1) */
  mpi_fdiv_r( dq, skey->d, q_minus1 );

  /* m2 = input ^ dq (mod q) */
  mpi_powm( m2, input, dq, skey->q );

  /* h = u * ( m2 - m1 ) mod q */
  mpi_sub( h, m2, m1 );
  if ( mpi_is_neg( h ) )
    mpi_add ( h, h, skey->q );
  mpi_mulm( h, skey->u, h, skey->q );

  /* output = m1 + h * p */
  mpi_mul ( h, h, skey->p );
  mpi_add ( output, m1, h );

  /* tidy up */
  mpi_free ( p_minus1 );
  mpi_free ( q_minus1 );
  mpi_free ( dp );
  mpi_free ( dq );
  mpi_free ( m1 );
  mpi_free ( m2 );
  mpi_free ( h );

}

And now that we have everything in place to actually generate RSA key pairs and proceed to encrypt and decrypt, a few new tests and timing methods are needed in eucrypt/smg_rsa/tests/tests.c:

/* Test encryption+decryption on noctets of random data, using sk
 * Output is written to file.
 */
void test_rsa_keys( RSA_secret_key *sk, unsigned int noctets, FILE *file ) {
  RSA_public_key pk;
  MPI test = mpi_alloc ( mpi_nlimb_hint_from_nbytes (noctets) );
  MPI out1 = mpi_alloc ( mpi_nlimb_hint_from_nbytes (noctets) );
  MPI out2 = mpi_alloc ( mpi_nlimb_hint_from_nbytes (noctets) );

  pk.n = mpi_copy(sk->n);
  pk.e = mpi_copy(sk->e);
  unsigned char *p;
  p = xmalloc(noctets);

  fprintf(file, "TEST encrypt/decrypt on %d octets of random datan", noctets);
  fflush(file);
  if (get_random_octets( noctets, p) == noctets) {
    mpi_set_buffer( test, p, noctets, 0 );

    fprintf(file, "TEST data:n");
    mpi_print(file, test, 1);
    fprintf(file, "n");
    fflush(file);

    public_rsa( out1, test, &pk );
    secret_rsa( out2, out1, sk );

    fprintf(file, "ENCRYPTED with PUBLIC key data:n");
    mpi_print(file, out1, 1);
    fprintf(file, "n");
    fflush(file);

    fprintf(file, "DECRYPTED with SECRET key:n");
    mpi_print(file, out2, 1);
    fprintf(file, "n");
    fflush(file);

    if( mpi_cmp( test, out2 ) )
      fprintf(file, "FAILED: RSA operation: public(secret) failedn");
    else
      fprintf(file, "PASSED: RSA operation: public(secret) passedn");
    fflush(file);

    secret_rsa( out1, test, sk );
    public_rsa( out2, out1, &pk );
    if( mpi_cmp( test, out2 ) )
      fprintf(file, "FAILED: RSA operation: secret(public) failedn");
    else
      fprintf(file, "PASSED: RSA operation: secret(public) passedn");
  }
  else
    fprintf(file, "FAILED: not enough bits returned from entropy sourcen");

  fflush(file);
  xfree(p);
  mpi_free( pk.n);
  mpi_free( pk.e);

  mpi_free( test );
  mpi_free( out1 );
  mpi_free( out2 );
}

void test_rsa( int nruns, FILE *fkeys, FILE *fout) {
  RSA_secret_key sk;
  int noctets = KEY_LENGTH_OCTETS;
  int noctets_pq = noctets / 2;
  int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);
  int nlimbs_pq = mpi_nlimb_hint_from_nbytes(noctets_pq);
  int i;

  sk.n = mpi_alloc(nlimbs);
  sk.e = mpi_alloc(nlimbs);
  sk.d = mpi_alloc(nlimbs);
  sk.p = mpi_alloc(nlimbs_pq);
  sk.q = mpi_alloc(nlimbs_pq);
  sk.u = mpi_alloc(nlimbs_pq);

  printf("TEST RSA key generation and use with %d runsn", nruns);
  fflush(stdout);

  for (i = 0;i < nruns; i++) {
    gen_keypair(&sk);
    printf(".");
    fflush(stdout);

    mpi_print(fkeys, sk.n, 1);
    fwrite("n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.e, 1);
    fwrite("n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.d, 1);
    fwrite("n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.p, 1);
    fwrite("n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.q, 1);
    fwrite("n", sizeof(char), 1, fkeys);

    mpi_print(fkeys, sk.u, 1);
    fwrite("n", sizeof(char), 1, fkeys);

    test_rsa_keys(&sk, noctets_pq, fout);
    printf("*");
    fflush(stdout);
  }

  mpi_free(sk.n);
  mpi_free(sk.e);
  mpi_free(sk.d);
  mpi_free(sk.p);
  mpi_free(sk.q);
  mpi_free(sk.u);

}

void test_rsa_exp() {
  MPI msg = mpi_alloc(0);
  MPI expected = mpi_alloc(0);
  MPI result;

  RSA_public_key pk;
  pk.n = mpi_alloc(0);
  pk.e = mpi_alloc(0);

  printf("TEST verify of rsa exponentiation on input data: n");

  mpi_fromstr(msg, "0x
5B6A8A0ACF4F4DB3F82EAC2D20255E4DF3E4B7C799603210766F26EF87C8980E737579
EC08E6505A51D19654C26D806BAF1B62F9C032E0B13D02AF99F7313BFCFD68DA46836E
CA529D7360948550F982C6476C054A97FD01635AB44BFBDBE2A90BE06F7984AC8534C3
8613747F340C18176E6D5F0C10246A2FCE3A668EACB6165C2052497CA2EE483F4FD8D0
6A9911BD97E9B6720521D872BD08FF8DA11A1B8DB147F252E4E69AE6201D3B374B171D
F445EF2BF509D468FD57CEB5840349B14C6E2AAA194D9531D238B85B8F0DD352D1E596
71539B429849E5D965E438BF9EFFC338DF9AADF304C4130D5A05E006ED855F37A06242
28097EF92F6E78CAE0CB97");

  mpi_fromstr(expected, "0x
1FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF003051300
D0609608648016503040203050004406255509399A3AF322C486C770C5F7F6E05E18FC
3E2219A03CA56C7501426A597187468B2F71B4A198C807171B73D0E7DBC3EEF6EA6AFF
693DE58E18FF84395BE");
  result = mpi_alloc( mpi_get_nlimbs(expected) );

  mpi_fromstr(pk.n, "0x
CDD49A674BAF76D3B73E25BC6DF66EF3ABEDDCA461D3CCB6416793E3437C7806562694
73C2212D5FD5EED17AA067FEC001D8E76EC901EDEDF960304F891BD3CAD7F9A335D1A2
EC37EABEFF3FBE6D3C726DC68E599EBFE5456EF19813398CD7D548D746A30AA47D4293
968BFBAFCBF65A90DFFC87816FEE2A01E1DC699F4DDABB84965514C0D909D54FDA7062
A2037B50B771C153D5429BA4BA335EAB840F9551E9CD9DF8BB4A6DC3ED1318FF3969F7
B99D9FB90CAB968813F8AD4F9A069C9639A74D70A659C69C29692567CE863B88E191CC
9535B91B417D0AF14BE09C78B53AF9C5F494BCF2C60349FFA93C81E817AC682F0055A6
07BB56D6A281C1A04CEFE1");

  mpi_fromstr( pk.e, "0x10001");

  mpi_print( stdout, msg, 1);
  printf("n");

  public_rsa( result, msg, &pk);
  if ( mpi_cmp( result, expected) != 0 )
    printf( "FAILEDn");
  else
    printf( "PASSEDn");

  printf("Expected:n");
  mpi_print( stdout, expected, 1);
  printf("n");

  printf("Obtained:n");
  mpi_print( stdout, result, 1);
  printf("n");

  mpi_free( pk.n );
  mpi_free( pk.e );
  mpi_free( msg );
  mpi_free( expected );
  mpi_free( result );
}

void time_rsa_gen( int nruns ) {
  struct timespec tstart, tend;
  long int diff;
  int i;

  RSA_secret_key sk;
  int noctets = KEY_LENGTH_OCTETS;
  int noctets_pq = noctets / 2;
  int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);
  int nlimbs_pq = mpi_nlimb_hint_from_nbytes(noctets_pq);
  sk.n = mpi_alloc(nlimbs);
  sk.e = mpi_alloc(nlimbs);
  sk.d = mpi_alloc(nlimbs);
  sk.p = mpi_alloc(nlimbs_pq);
  sk.q = mpi_alloc(nlimbs_pq);
  sk.u = mpi_alloc(nlimbs_pq);

  clock_gettime(CLOCK_MONOTONIC, &tstart);
  for (i = 0;i < nruns; i++) {
    gen_keypair(&sk);
  }
  clock_gettime(CLOCK_MONOTONIC, &tend);

  diff = tend.tv_sec-tstart.tv_sec;
  printf("TOTAL: %ld seconds for generating %d key pairsn", diff, nruns);
  printf("Average (%d runs): %f seconds per TMSR RSA key pair.n",
        nruns, diff / (1.0*nruns));
  mpi_free(sk.n);
  mpi_free(sk.e);
  mpi_free(sk.d);
  mpi_free(sk.p);
  mpi_free(sk.q);
  mpi_free(sk.u);
}

The testing suite grows as usual much more than the code itself. To put those new tests to use, there are a few more cases in the main function in tests.c:

    case 6:
      fk = fopen("keys.asc", "a");
      if ( fk == NULL )
        err("Failed to open file keys.asc!");
      fout = fopen("check_keys.asc", "a");
      if ( fout == NULL ) {
        fclose(fk);
        err("Failed to open file keys_check.asc!");
      }
      test_rsa(nruns, fk, fout);
      fclose(fk);
      fclose(fout);
      break;
    case 7:
      test_rsa_exp();
      break;
    case 8:
      time_rsa_gen(nruns);
      break;
    default:
      printf("Current test ids:n");
      printf("0 for timing entropy sourcen");
      printf("1 for entropy output testn");
      printf("2 for is_composite (Miller-Rabin) testn");
      printf("3 for timing Miller-Rabinn");
      printf("4 for random prime number generator testn");
      printf("5 for timing random prime number generatorn");
      printf("6 for testing rsa key pair generation and use;
writes to keys.asc and check_keys.ascn");
      printf("7 for testing rsa exponentiation (fixed data)n");
      printf("8 for timing rsa key pair generatorn");
  }

  return 0;
}

You are of course invited to write your own tests and run them to your satisfaction. The tests provided are meant as *minimal* tests, not by any means a full testing suite of any sort. Note that option 7 (test_rsa_exp) effectively performs the verification of asciilifeform's signature for his Chapter 6 of FFA (which you are warmly invited to read for that matter). Take it as a little exercise to change there the data so that you check instead my signature of the .vpatch for EuCrypt's chapters or any other RSA signatures you might have.

Some preliminary timings for this implementation of RSA key generation suggest that a key pair can take on average almost one hour to generate (54 minutes averaged time per key pair on a set of 30 generated pairs). This relatively long wait for a key pair is the unavoidable cost of using truly random primes as opposed to pseudo-random ones (since a non-suitable candidate is discarded rather than massaged into primality for instance) and of increasing the number of iterations that Miller-Rabin performs when checking whether a candidate number is indeed prime. Note that the timings here are not directly comparable in any case with the timings previously reported for a different version of the rsa code, due to differences in both the code itself (most significant: fewer Miller-Rabin iterations, fixed exponent) and in the choice of measurement method (most significant: the time reported here is overall execution time that is ~always more than the actual CPU time in any case).

The .vpatch for all of the above and its signature live from now on on my Reference Code Shelf with links here too, for your convenience:

In the next chapter I'll let the RSA folder alone for once (I'll get back to it later) and move on to the chosen hashing function, namely keccak. So for the next chapter, dust your keccak references and have your Ada manuals at the ready, as keccak implementation will be in Ada, no more C for a while. Hooray!


  1. Rivest, Shamir and Adleman, A Method for Obtaining Digital Signatures and Public-Key Cryptosystems 

  2. It took only 4 chapters or otherwise a whole month not to mention the work done prior to that, just to get to actually generate some RSA keys but hey, it could have been worse! By worse I mean I could have taken the "easy" route, naively using GnuPG itself, trusting it because everybody uses it and it is open source and it's been around for so long and it says it does RSA and Miller-Rabin and random numbers and saying it is surely now just as good as doing it only a whole lot easier, isn't it? 

  3. This gives the number of co-primes between 0 and the given number. As it is a multiplicative function, the totient phi(p*q) = phi(p)*phi(q) and since p and q are chosen to be prime numbers, they will each have all smaller positive numbers as their co-primes. Consequently, phi(n) = phi(p*q) = (p-1)*(q-1). 

  4. Note that the user is the one who reads and understands what they are running, as a bare minimum requirement. Most pointedly, a monkey pressing some/any/all keys is not user of anything, it's still a monkey and nothing else. 

  5. These 2 options are the only acceptable ways of dealing with a problem really: ideally you solve it, meaning you address its root cause and from there up; if the ideal option is not chosen because of any less-than-ideal but very real world constraints of any kind, the honest approach is to flag the issue, making it, if at all possible, even more visible than it was, certainly not less visible! 

January 4, 2018

EuCrypt Chapter 4: Random Prime Number Generator

Filed under: EuCrypt — Diana Coman @ 10:43 p.m.

~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

2018 starts well for EuCrypt as I finally get to put to some use all the building blocks of years past and then simply expand the library further. The aim of this chapter is to provide a random prime number generator (rpng) that actually fits the name. As usual, the code is commented and choices made are discussed in more detail here. Following the sane TMSR design principle of "fits in head", the algorithm is as simple as possible: keep getting random odd numbers of the required length until a prime one is chanced upon. The signature of the procedure doing this can be found in eucrypt/smg_rsa/include/smg_rsa.h:

/**
 * Generates a random number that has passed the Miller-Rabin test for primality (see function is_composite above).
 * NB: top 2 bits and bottom bit are ALWAYS 1! (i.e. a mask 110....01 is applied to the random bits)
 *    a prime of 8*noctets long will have only (8*noctets-3) bits that are randomly chosen!
 * NB: this method does NOT allocate space for the requested MPI; it is the caller's responsibility to allocate it!
 * The source of randomness is ENTROPY_SOURCE in eucrypt/smg_rsa/include/knobs.h
 * The number of witnesses checked by Miller-Rabin is M_R_ITERATIONS in eucrypt/smg_rsa/include/knobs.h
 * Preconditions:
 *      noctets > 0 (at least one octet!)
 *      output has known allocated memory for at least nlimbs(noctets)
 *      successful access to the entropy source
 * @param noctets the length of the desired prime number, in octets
 * @param output the result: an MPI with sufficient memory allocated for a number that is noctets long
 */
void gen_random_prime( unsigned int noctets, MPI output);

Before even going to the implementation, note here two design choices:

  1. The length of the desired random prime number is expected by gen_random_prime in *octets* (bytes if you must but the point is that this is 8 bits and nothing else) and not in bits! The conversion between bits and octets is of course straightforward but the choice is made here to use octets because this reflects more clearly the underlying reality: while individual bits can certainly be altered or otherwise worked with, the data type in use at all further levels down (unsigned char) is still an octet long rather than a bit long.Obviously, nothing stops you, the user, from making a different choice and changing this to use number of bits if that fits your needs better but do make sure you follow through all the code that is used and make any other changes that might be required to get exactly what you want.
  2. The result is returned via the second parameter to gen_random_prime, an MPI called output. Moreover, gen_random_prime does *not* allocate memory for this MPI! It is instead the caller's duty to make sure that the MPI they provide as argument when calling gen_random_prime has indeed enough memory allocated for the desired length. This makes memory allocation and de-allocation neatly happen in the same place (i.e. the caller) rather than splitting them needlessly and messily across function calls. While there is no way around the fact that a pointer has to be passed on here from a piece of code to another, there really is no need to split the responsibility of memory allocation and de-allocation: the one who allocates memory should de-allocate it as well and that is the caller here.Note that gen_random_prime will do however as much as it can to ensure that it doesn't proceed working with unallocated memory. Concretely, gen_random_prime checks as a precondition the known allocated memory for the output MPI and if the check fails, the execution is aborted, *not* papered over with some "fix" where it doesn't belong. Failure as the best sort of feedback that can't be ignored is really the most appropriate response to a mess-up so that's what gen_random_prime aims for. Don't rely on it to do more than it says. Just do your own share of work and call it properly with correctly allocated MPI and it will then gladly do its job, simple as that.

While gen_random_prime is by design flexible to accommodate any number of octets you might wish to give as length of your desired prime MPI, TMSR RSA specification is rather strict on this: RSA keys are meant to be exactly 4096 bits (512 octets) long and made of 2 distinct 2048 bits (256 octets) long primes. Consequently, the RSA key length is defined in EuCrypt as a *constant* and *not* as a knob. You'll find it therefore in the same file as the above, eucrypt/smg_rsa/include/smg_rsa.h:

/*
 * These are constants as per TMSR RSA specification, NOT knobs!
 * TMSR key length is 4096 bits (512 octets); this means 2 primes of 2048 bits (256 octets) each.
 * NB: if you choose here an odd key length in octets you might end up with a smaller actual key, read the code.
 */
static const int KEY_LENGTH_OCTETS = 512;

The actual implementation of gen_random_prime is in eucrypt/smg_rsa/primegen.c and it is quite straightforward:

void gen_random_prime( unsigned int noctets, MPI output )
{
  /* precondition: at least one octet long */
  assert(noctets > 0);

  /* precondition: enough memory allocated for the limbs corresponding to noctets */
  unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);
  assert(mpi_get_alloced(output) >= nlimbs);

  /* precondition: access to the entropy source */
  int entropy_source = open_entropy_source(ENTROPY_SOURCE); /* source of random bits */
  assert(entropy_source >= 0);

  unsigned int nbits = 8*noctets;                           /* length of MPI in bits */

  /*
   * loop until a prime is found: get noctets of random bits, trim and apply 110...01 mask, check if prime
   */
  unsigned char *p = xmalloc( noctets );
  do {
    get_random_octets_from( noctets, p, entropy_source );
    mpi_set_buffer( output, p, noctets, 0); /* convert to MPI representation */
    mpi_set_highbit( output, nbits - 1 );   /* trim at required size and set top bit */
    mpi_set_bit( output, nbits - 2);          /* set second top bit */
    mpi_set_bit( output, 0 );               /* set bottom bit to ensure odd number */
  } while (is_composite(output, M_R_ITERATIONS, entropy_source));

  /* tidy up, a prime was found */
  xfree(p);
  close(entropy_source);
}

As expected, the first things gen_random_prime does is to check its clearly stated preconditions: a minimal length of 1 octet for the requested number (there is no such thing as a number represented on 0 octets, that's nonsense!); enough allocated memory at least as far as one can tell from within this code; accessible entropy source. Note that the mpi_get_alloced function is a new addition to mpi, since there was as far as I could tell no other function to expose this simple information: how much memory is allocated for this here MPI? Sure, there *was* direct access to this information via an MPI's internal structure, meaning calling something of the sort mpi->alloced. However, using this directly is messy because it forces the calling code to *know* the internal structure of an MPI, which is a bad idea and if you can't give at least 2 reasons why then you have no business writing any code yet, go and read, re-read and especially understand some basic books first1. The new mpi_get_alloced function has its signature in eucrypt/mpi/include/mpi.h, which is the header included by any user of the mpi library:

int mpi_get_alloced (MPI a);  /* returns the allocated memory space for this MPI, in number of limbs */

The implementation of mpi_get_alloced is a tiny bit of code in eucrypt/mpi/mpiutil.c where it belongs together with the other utility functions for MPIs:

/*
 * Returns the allocated space for the given MPI, as number of limbs.
 */
int
mpi_get_alloced (MPI a)
{
  return a->alloced;
}

After all three preconditions of gen_random_prime have passed the checks, the next step is simply to loop through random numbers of the required size until a prime one is found. Note that each random number will have only 8*noctets-3 actually random bits. This is because the top 2 bits and the bottom bit are all fixed at 1. Since the goal is to find a large prime number, the bottom bit has to be 1 for basic mathematical reasons (as otherwise the number is even, hence divisible by 2). The top 2 bits are set at 1 however for length-related reasons: both those bits need to be 1 to ensure that in all cases the resulting RSA key is indeed of the desired 4096 bits length (although of those 4096 bits the top and bottom ones will always be 1).

Note that EuCrypt's random prime number generator relies on two main things: a truly random source of bits (the excellent Fuckgoats from S.NSA) and the Miller-Rabin algorithm previously implemented and discussed in Chapter 3 of this series. This is in rather stark contrast with the GnuPG implementation that takes up a lot more space to do essentially the following: use a pseudo-random source of bits, try first all small primes between 3 and 49992, then do a Fermat test, then do the Miller-Rabin test with a fixed base 2 and only then do the Miller-Rabin test with 4 more pseudo-randomly chosen bases. Does that seem "more secure" to you or merely confusing? Because no, piling up more checks without concrete and clear reason is not a way to achieve security, quite on the contrary. And EuCrypt's approach that seems "simple" by contrast to GnuPG's is not an accident nor the result of laziness, quite on the contrary: it takes a lot of work behind the scenes to end up with only what is truly useful for the task at hand. Shocking, right? Read on.

Since GnuPG's code certainly does not discuss this mess of "choices" in any way, the only thing left to do is to guess at some reasons: supposedly all the initial dancing about with small primes and Fermat would be a fast(-er, -ish?) way of discarding composites before reaching the more expensive Miller-Rabin test. However, there's hardly any clear support for such reasoning3, especially for a prime number generator that is specifically aiming to find *large* and *randomly generated* prime numbers: sure, 2 is a witness for many odd composite numbers (and presumably this is the reason for the fixed 2 witness attempt in GnuPG) but it is known also that there are *infinitely* many pseudoprimes to base 2! Moreover, there are also *infinitely* many n for which the smallest witness is greater than ln(n)^(1/(3*ln(ln(ln(n))))). And finally, Miller-Rabin, as discussed previously has very good chances of finding a witness if it really looks for them randomly, seeing how three quarters of the numbers between 1 and n-1 are reliable witnesses anyway. So if your candidate primes are really both large and randomly chosen then there is little point in introducing some pre-Miller-Rabin overhead that is anyway NOT going to work in all cases. Better increase the number of Miller-Rabin iterations and gain something useful4 instead of wasting time to try first less effective approaches just for the sake of the fact that in *some* cases they might be enough.

To see the new rpng in action, I wrote a few tests as well. And in this process I ended up simplifying the configuration of the entropy source because the previous code turned out to be unnecessarily long and rather not as fit-in-head as it could be. So the new set_usb_attribs function in eucrypt/smg_rsa/truerandom.c is this:

int set_usb_attribs(int fd, int speed) {
  struct termios tty;
  if (tcgetattr(fd, &tty) < 0) {
    return -1;
  }

  /* input and output speeds */
  cfsetospeed(&tty, (speed_t)speed);
  cfsetispeed(&tty, (speed_t)speed);

  /* raw */
  tty.c_lflag &= ~(ECHO | ECHOE | ECHOK);
  tty.c_oflag &= ~OPOST;

  /* read at least one octet at a time; BLOCK until at least VMIN octets read */
  tty.c_cc[VMIN] = 1;
  tty.c_cc[VTIME] = 0;

  if (tcsetattr(fd, TCSAFLUSH, &tty) != 0)
    return -1;

  return 0;
}

As with all changes, this got at least a bit of testing too. Testing here meant using the new version of the code to get 390MB of random data on which I ran ent and dieharder. Ent reported:

Entropy = 7.999999 bits per byte.

Optimum compression would reduce the size
of this 390672000 byte file by 0 percent.

Chi square distribution for 390672000 samples is 277.91, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.5026 (127.5 = random).
Monte Carlo value for Pi is 3.141530471 (error 0.00 percent).
Serial correlation coefficient is -0.000012 (totally uncorrelated = 0.0).

Dieharder reported 3 "FAILED" tests, 5 "WEAK" tests and 106 "PASSED" tests. You can download and look at the full dieharder report and ent report for this data. Note however that you should run at least a similar test (preferably on more data, preferably several times, preferably periodically and on all your Fuckgoats) before even thinking of relying on your setup as a source of random bits.

And if the above was not enough, I also ended up writing quite a bit in... eucrypt/smg_rsa/tests/tests.c. Fancy that, I wrote more lines of code for the tests than for the actual code! Anyway, here are the new tests and the new structure of the main function in eucrypt/smg_rsa_tests.tests.c:

void time_mr(int nruns) {
  struct timespec tstart, tend;
  long int diff;
  int i;
  MPI prime;
  unsigned int noctets = KEY_LENGTH_OCTETS / 2;
  unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);

  int entropy_source = open_entropy_source(ENTROPY_SOURCE);
  if (entropy_source <= 0)
    err("can't open entropy source!");

  /* first generate a prime of half key length, to make sure M-R will run max number of iterations */
  printf("Generating a prime number of %d octets length for M-R timing testn", noctets);
  prime = mpi_alloc(nlimbs);
  gen_random_prime(noctets, prime);

  printf("Running timing test for Miller-Rabin with %d repetitions and %d witnesses on prime number ", nruns, M_R_ITERATIONS);
  mpi_print(stdout, prime, 1);
  printf("n");
  /* now do the actual runs and time it all */
  clock_gettime(CLOCK_MONOTONIC, &tstart);
  for (i=0; i < nruns; i++) {
    if (is_composite(prime, M_R_ITERATIONS, entropy_source))
      printf("FAIL");
    else printf(".");
    fflush(stdout);
  }
  clock_gettime(CLOCK_MONOTONIC, &tend);

  diff = tend.tv_sec-tstart.tv_sec;
  printf("nTimings on prime number %d octets long, %d runs of MR with %d iterations (witnesses checked) eachn",
    noctets, nruns, M_R_ITERATIONS);
  printf("Total time: %ld secondsnTime per MR run: %f secondsnTime per MR iteration: %f secondsn",
       diff, diff / (1.0*nruns), diff / (1.0*nruns * M_R_ITERATIONS));

  mpi_free(prime);
  close(entropy_source);
}
void test_rpng(int nruns) {
  unsigned int noctets = KEY_LENGTH_OCTETS / 2;
  unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);
  int entropy_source = open_entropy_source(ENTROPY_SOURCE);
  if (entropy_source <= 0)
    err("can't open entropy source!");

  MPI prime = mpi_alloc(nlimbs);
  int i;

  printf("TEST: random prime number generator with %d runsn", nruns);
  for (i = 0;i < nruns; i++) {
    gen_random_prime(noctets, prime);
    printf("Run %d: ", i+1);
    mpi_print(stdout, prime, 1);
    if (is_composite(prime, M_R_ITERATIONS, entropy_source))
      printf("  **FAIL**n");
    else
      printf("  **PASS**n");
  }

  mpi_free(prime);
  close(entropy_source);
}

void time_rpng(int nruns) {
  struct timespec tstart, tend;
  long int diff;

  unsigned int noctets = KEY_LENGTH_OCTETS / 2;
  unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets);

  int entropy_source = open_entropy_source(ENTROPY_SOURCE);
  if (entropy_source <= 0)
    err("can't open entropy source!");
  MPI prime = mpi_alloc(nlimbs);
  int i;

  printf("TIMING: random prime number generator with %d runsn", nruns);
  clock_gettime(CLOCK_MONOTONIC, &tstart);
  for (i = 0;i < nruns; i++) {
    gen_random_prime(noctets, prime);
  }
  clock_gettime(CLOCK_MONOTONIC, &tend);

  diff = tend.tv_sec-tstart.tv_sec;
  printf("TOTAL: %ld secondsn", diff);
  printf("Average: %f seconds to generate one random prime of %d octets lengthn", diff / (1.0*nruns), noctets);
  mpi_free(prime);
  close(entropy_source);
}

int main(int ac, char **av)
{
  int nruns;
  int id;
  if (ac<2) {
    printf("Usage: %s number_of_runs/octets [testID]n", av[0]);
    return -1;
  }
  nruns = atoi(av[1]);

  if (ac < 3)
    id = -1;
  else
    id = atoi(av[2]);

  switch ( id ) {
    case 0:
      printf("Timing entropy source...n");
      time_entropy_source(nruns, 4096);
      break;
    case 1:
      test_entropy_output(nruns, "entropy_source_output.txt");
      break;
    case 2:
      /* tests on miller-rabin */
      /* a few primes (decimal): 65537, 116447, 411949103, 20943302231 */
      test_is_composite(nruns, "0x10001", 0);
      test_is_composite(nruns, "0x1C6DF", 0);
      test_is_composite(nruns, "0x188DD82F", 0);
      test_is_composite(nruns, "0x4E0516E57", 0);
      /* a few mersenne primes (decimal): 2^13 - 1 = 8191, 2^17 - 1 = 131071, 2^31 - 1 = 2147483647 */
      test_is_composite(nruns, "0x1FFF", 0);
      test_is_composite(nruns, "0x1FFFF", 0);
      test_is_composite(nruns, "0x7FFFFFFF", 0);
      /* a few carmichael numbers, in decimal: 561, 60977817398996785 */
      test_is_composite(nruns, "0x231", 1);
      test_is_composite(nruns, "0xD8A300793EEF31", 1);
      /* an even number */
      test_is_composite(nruns, "0x15A9E672864B1E", 1);
      /* a phuctor-found non-prime public exponent: 170141183460469231731687303715884105731 */
     test_is_composite(nruns, "0x80000000000000000000000000000003", 1);
      break;
    case 3:
      time_mr(nruns);
      break;
    case 4:
      test_rpng(nruns);
      break;
    case 5:
      time_rpng(nruns);
      break;
    default:
      printf("Current test ids:n");
      printf("0 for timing entropy sourcen");
      printf("1 for entropy output testn");
      printf("2 for is_composite (Miller-Rabin) testn");
      printf("3 for timing Miller-Rabinn");
      printf("4 for random prime number generator testn");
      printf("5 for timing random prime number generatorn");
  }

  return 0;
}

There are 3 new tests in there: one for timing Miller-Rabin (id 3), one for the functioning of the rpng itself (id 4) and one for timing the rpng (id 5). The timing test for Miller-Rabin first generates (with the rpng described in this post) a prime number of 256 octets since this is the length that is most relevant to EuCrypt. Then a timer is started and the Miller-Rabin test is ran repeatedly on this known-prime for the requested number of times. At the end, the total time as well as the average time per M-R run and per M-R iteration are reported. As the number is known to be prime, M-R will always run all its 16 iterations and therefore the test can actually calculate the average time per iteration. On 1000 runs, this test reported an average of 9.78 seconds per M-R run, which corresponds to 0.61 seconds per iteration of M-R.

The test of rpng requests gen_random_prime to generate a random prime number of 256 octets and then it runs another time the Miller-Rabin test on it reporting fail or pass depending on the result. This is obviously a very basic test of gen_random_prime - it's meant at this stage more as an example of use rather than anything else. Run it though and see for yourself or modify it/add to it as you see fit, as usual.

The timing test for the rpng starts a timer and then calls gen_random_prime as many times as requested, reporting at the end the total time as well as the resulting average time per prime. A relatively short test run obtained 40 random primes of 2048 bits each in 13274 seconds in total (3.7 hours) meaning on average 331.85 seconds per prime (~6 minutes).

Finally, the .vpatch itself and its corresponding signature (you'll need all the previous patches from the EuCrypt series to be able to press to this one):

On a slightly different note for the end: this "simple" loop rpng of a chapter ended up close to 3000 words in total here on the blog and I didn't even include all the references considered; chapter 3 just before this one was 3500 words; correcting MPI before that was a mere 2000 words; chapter 2 was 1500 words; the introduction to EuCrypt was close-to-but-not-quite 1000 words, as befits an introduction. At this rate I'm clearly writing here some brick of a book and most of it is certainly NOT code, but all the discussion around it. Shock and horror and all that. See you at the next chapter!


  1. The Art of Computer Programming by Donald Knuth is an absolute must. 

  2. WHY 4999? Magic number! 

  3. To be fair, there probably was not much reasoning really, just successive additions of code on top of existing code, always adding, rarely even considering how to prune the resulting mess. They writing the code had no time to write short and "simple" code, so they wrote it long and messy. But they worked hard and had all the good intentions (those can be as long as you like really, they don't cost anything), you know? 

  4. as more M-R iterations lower at least the maximum probability of error, since p(error)< (1/4)^iterations for M-R. 

December 28, 2017

EuCrypt Chapter 3: Miller-Rabin Implementation

Filed under: EuCrypt — Diana Coman @ 9:26 p.m.

~ This is part 4 of the EuCrypt series. Start with Introducing Eucrypt. ~

Primality testing1 is a key part of any implementation of RSA2 and therefore a key part of EuCrypt as well. At first glance, there is a wide choice of primality tests that one can use, from naive direct divisions in search of factors to prime number generators and statistical primality tests. However, many of those are not currently very practical for EuCrypt as they simply take too long to run on the sort of very large numbers that RSA has to use to avoid the simplest brute-force attacks. As a result, the choice narrows down considerably to include mainly probabilistic primality tests: Fermat, Solovay-Strassen and Miller-Rabin. Of those, Miller-Rabin is simply best currently: it has the lowest error probability upper bound among the three and it is at most as expensive computationally as the others. Arguably a deterministic, polynomial-time algorithm such as AKS3 would be even better than Miller-Rabin, but unfortunately AKS is currently still too slow for EuCrypt's needs. Consequently, the chosen algorithm for primality testing in EuCrypt is Miller-Rabin mainly because of a lack of a working better alternative.

If you think that "everyone uses Miller-Rabin anyway", think again. While GnuPG and mostly everyone else using RSA is indeed likely to claim that they are also using Miller-Rabin for primality testing, you'd be well advised to check such claims very, very closely because claims are just cheap labels that stick to anything just the same. If you actually take the time to check those claims and then necessarily find yourself peeling down label after label trying to get to the actual thing that is there as opposed to what the labels claim it is, you might find that every new label further dilutes the original meaning. That's how Koch in his GnuPG sticks the "random" label on pseudo-random at best, that's how 4096 bit randomly-generated keys contain actually at most 4090 pseudo-randomly generated bits and so on until you might even find as I did last time that bits and parts of the implementation do nothing in fact. Don't take my word for it either: go and check for yourself, it's a very healthy habit that might save your very skin some day.

Despite being called a "primality test", Miller-Rabin (like all the other probabilistic primality tests) is more of a compositeness test: the algorithm can prove that a number is composite, but it can not actually prove that a number is indeed prime. Essentially, the given number is suspected of the crime of being composite (as opposed to the desired prime) and witnesses for its compositeness are sought. If one single witness for compositeness is found, then the given number is indeed composite. However, if no witness is found, Miller-Rabin can only reach a relatively weaker conclusion, namely that the given number is likely to be prime. How likely? That depends to a significant degree on the choice of candidate witnesses: how many candidate witnesses the algorithm was asked to investigate and how it actually chose them.

In its search for witnesses, Miller-Rabin relies on the important fact that most numbers between 1 and n-1 are reliable witnesses for n, if n is indeed a composite number. More precisely, at most 1/4 of those numbers are strong liars for n, meaning that at most 1/4 of them will fail to reveal the compositeness of n, when investigated by Miller-Rabin. As a result, the more witnesses investigated, the lower the chances of a composite number to pass for a prime one. Assuming that witnesses are indeed chosen randomly4, the algorithm's error probability is at most (1/4)^t, where t is the number of witnesses investigated. Obviously, each additional witness adds to the cost of running the algorithm and for this reason EuCrypt exposes this as a knob5 for you, the user, to set depending on your own needs. Use it and use it wisely!

The updated eucrypt/smg_rsa/include/knobs.h:

#ifndef SMG_RSA_KNOBS_H
#define SMG_RSA_KNOBS_H

#define ENTROPY_SOURCE "/dev/ttyUSB0"

/*
 * This is the number of witnesses checked by the Miller-Rabin (MR) algorithm for each candidate prime number.
 * The value of M_R_ITERATIONS directly affects the outer bound of MR which is calculated as 4^(-M_R_ITERATIONS)
 * S.MG's choice of 16 here means an outer bound of 4^(-16) = 0.0000000002,
    which is currently considered sufficient for Eulora's needs.
    If your needs are different, change this knob accordingly.
 * NB: if you use this to make keys for some serious use, an outer bound of 1e-10 is really not nearly good enough
    and therefore you'll probably want to *increase* the value of this knob.
 */
#define M_R_ITERATIONS 16

#endif /*SMG_RSA_KNOBS_H*/

The default EuCrypt value for M_R_ITERATIONS is 16 and that means 16 randomly chosen candidate witnesses that are checked. By contrast, GnuPG 1.4.10 at first glance appears to check 5 candidate witnesses (as per cipher/primegen.c call is_prime(ptest, 5, &count2)) and at a deeper investigation it turns out that it checks 1 fixed witness (magic number 2, because why not) and 4 pseudo-randomly chosen ones at best. The label that was 5 but acted more like 4 and the parameter that didn't quite stand for what you'd expect, isn't that precisely the sort of thing you want in your cryptographic tool? No? Then stop using code from the swamps, start using signed code and in any case always read the darned code before you use it because otherwise that's exactly what you will get, each and every time: something other than what it seems, something continuously and rather invisibly to you drifting further away from what you need.

Leaving aside GnuPG for now, let's dive straight in and implement Miller-Rabin using the MPI functions as if they actually worked well6. The algorithm is quite straight forward and the code aims to be as short and clear as possible, with comments to help you follow along. The function is called is_composite, to reflect the fact that Miller-Rabin really checks for compositeness, regardless of the fact that we might prefer it to be otherwise. The n parameter is the actual large number (hence, stored as an MPI) that is suspected of being composite. The nwitnesses parameter is the number of randomly chosen witnesses to check (this is called "security parameter" in some reference books, most notably in Handbook of Applied Cryptography by Menezes, van Oorschot and Vanstone, 1997). You can also think of this nwitnesses as "number of iterations" because each iteration is effectively the check of one candidate witness. Finally, the third parameter, entropy_source, is the handler of an already opened and properly configured source of true random bits (see Chapter 2 for how this is set up in EuCrypt). First, the added function signature in eucrypt/smg_rsa/include/smg_rsa.h:

/*********primegen.c*********/

/*
 * This is an implementation of the Miller-Rabin probabilistic primality test:
 *   checking the specified number of randomly-chosen candidate witnesses
 *    (i.e. with an outer bound of (1/4)^nwitnesses).
 * NB: a 1 result from this test means that the given n is indeed composite (non-prime)
    but a 0 result does not fully guarantee that n is prime!
    If this doesn't make sense to you, read more on probabilistic primality tests.
 * @param n the candidate prime number;
    the function will investigate whether this number is composite or *likely* to be prime.
    How likely? It depends on the number of witnesses checked, see next parameter.
 * @param nwitnesses this is the number of randomly chosen candidate witnesses to the compositeness of n
      that will be checked; the outer bound of the algorithm depends on this.
 * @param entropy_source the source of entropy (ready to read from) that will be used
    to choose candidate witnesses to the compositeness of n.
* @return 1 if at least one witness to the compositeness of n has been found
      (i.e. n is certainly composite);
      0 if no witness to the compositeness of n was found (i.e. it is likely that n is prime)
 * NB: the probability that n is *not* prime although this function returned 0 is
      less than (1/4)^nwitnesses, but it is NOT zero.
 */
int is_composite( MPI n, int nwitnesses, int entropy_source);

And the corresponding implementation, in a new file eucrypt/smg_rsa/primegen.c :

/* primegen.c - prime number generation and checks
 *
 * S.MG, 2017
 *
 */

#include
#include
#include

#include "smg_rsa.h"

/****************
 * is_composite
 * Returns 1 if it finds evidence that n is composite and 0 otherwise.
 * NB: this is a probabilistic test and its strength is directly linked to:
 *  - the number of witnesses AND
 *  - the quality of the entropy source given as arguments!
 */

int is_composite( MPI n, int nwitnesses, int entropy_source) {
  int evidence = 0;
  int nlimbs = mpi_get_nlimbs(n);       /* see MPI representation   */
  unsigned int nbits = mpi_get_nbits(n);        /* used bits        */
  unsigned int noctets = (nbits + 7) / 8; /* source works on octets */
  MPI nminus1 = mpi_alloc(nlimbs);      /* n-1 value is used a LOT  */
  MPI mpi2 = mpi_alloc_set_ui(2);         /* 2 as MPI               */
  MPI a = mpi_alloc(nlimbs);      /* candidate witness              */
  MPI y = mpi_alloc(nlimbs);      /* intermediate values            */
  MPI r = mpi_alloc(nlimbs);      /* n = 1 + 2^s * r                */
  int s;                          /* n = 1 + 2^s * r                */
  int j;                          /* counter for loops              */
  int nread;              /* number of random octets actually read  */

  /* precondition: n > 3 */
  assert( nbits > 2 );

  /* nminus1 = n - 1 as MPI                                         */
  mpi_sub_ui( nminus1, n, 1);

  /*
   * find r odd and s so that n = 1 + 2^s * r
   * n-1 = 2^s * r
   * s is given by the number of trailing zeros of binary n-1
   * r is then obtained as (n-1) / (2^s)
   */
  s = mpi_trailing_zeros( nminus1 );
  mpi_tdiv_q_2exp(r, nminus1, s);

  /*
   * Investigate randomly chosen candidate witnesses.
   * Stop when either:
      * the specified number of witnesses (nwitnesses) have
        been investigated OR
      * a witness for compositeness of n was found
   */
  while (nwitnesses > 0 && evidence == 0) {
    unsigned char *p = xmalloc(noctets);
    do {
      nread = get_random_octets_from(noctets, p, entropy_source);
    } while (nread != noctets);

    mpi_set_buffer(a, p, noctets, 0);
    /* ensure that a < n-1 by making a maximum nbits-1 long:
        * clear all bits above nbits-2 in a
        * keep value of bit nbits-2 in a as it was
    */
    if (mpi_test_bit(a, nbits - 2))
      mpi_set_highbit(a, nbits - 2);
    else
      mpi_clear_highbit(a, nbits - 2);

    /* ensure that 1 < a < n-1; if not, try another random number
     * NB: true random means a CAN end up as 0 or 1 here.
     */

    if (mpi_cmp(a, nminus1) < 0 && mpi_cmp_ui(a, 1) > 0) {
      /* calculate y = a^r mod n */
      mpi_powm(y, a, r, n);
      if (mpi_cmp_ui(y, 1) && mpi_cmp(y, nminus1)) {
        j = 1;
        while ( (j < s) && mpi_cmp(y, nminus1) && (evidence == 0) ) {
          /* calculate y = y^2 mod n */
          mpi_powm(y, y, mpi2, n);
          if (mpi_cmp_ui(y, 1) == 0)
            evidence = 1;
          j = j + 1;
        } /* end while */
        if (mpi_cmp(y, nminus1))
          evidence = 1;
      } /* end if y != 1 and y != n-1 */
      nwitnesses = nwitnesses - 1;
    } /* end if 1 < a < n-1 */
    xfree(p);
  } /* end while for investigating candidate witnesses */

  mpi_free( nminus1 );
  mpi_free( mpi2 );
  mpi_free( a );
  mpi_free( y );
  mpi_free( r );

  return evidence;
}

The variable evidence is initially 0 as Miller-Rabin does not yet have any evidence about n being composite. When and if evidence of compositeness is found for n, this variable will get updated to 1. If the whole algorithm finishes without updating this variable, it means that n is probably prime, with a maximum probability error of (1/4)^nwitnesses, as previously discussed. In any case, this variable holds the result of the Miller-Rabin test at any moment and its value is the one returned when the test finishes.

The nlimbs and nbits are basically measures of how long the MPI n actually is and they are initialised with values returned by the corresponding MPI functions. The nbits value is then converted to number of octets (in noctets) for the very simple reason that the source of randomness in EuCrypt anyways reads at the moment full octets rather than individual bits. This is of course a matter of choice as you could change the setting of the source to read bit by bit, but I can't quite see at the moment any significant advantage to that.

Having obtained this basic length-information on n, the function then goes on to declare and allocate memory for a set of MPI variables that it will need for the Miller-Rabing algorithm itself. Note that there is *no* use of the so-called "secure" memory thing from MPI for the unfortunate reason that the existing implementation of secure is very much an empty label: all it does is to set a flag so that theoretically the memory is not swapped, but there is no guarantee to either that or to the more useful fact that nobody else can read that memory. So given that there is in fact no secure memory implementation no matter how much it would be useful if there was one, EuCrypt takes instead the honest and practical approach of making it clear that it uses plain memory and nothing else. No label if no actual matching object to stick it on, as simple as that.

Once the needed variables are declared and initialised when appropriate, a precondition is checked: assert( nbits > 2); What's this? It's a sanity check essentially because Miller-Rabin is meant for checking large integers, not tiny ones. Moreover, due to the insanity of the underlying MPI which considers in its infinite stupidity that 0 for instance is represented on 0 bits, tiny values of less than 4 (hence, represented on less than 3 bits) will... block the whole thing. Let me point out for now just the very simple fact that the algorithm uses nbits-1 and nbits-2 meaning that nbits should better be at least 3 or otherwise it ends up trying to work with numbers represented on 0 bits and other such nonsense. So instead of risking working with nonsense, EuCrypt uses this assert call to abort the whole thing rather than propagate the nonsense even one instruction further.

Oh, if you wonder by any chance whether GnuPG bothers to even consider such corner cases, that's good for you. I'm sure it's all right if they don't because such cases "should never happen" and "nobody calls Miller-Rabin on small numbers" and all those wonderful castles of trust built on nothing but air. Come to think about it, I even enjoy blowing up such air-supported castles and what not, they make a most-satisfying poufffff! Do you enjoy living in them? POUFFFFF!

The rest of the function closely follows the Miller-Rabin algorithm and the comments in the code hopefully make it easier to understand what each line does even when using the MPI calls. Note that candidate witnesses are chosen indeed randomly by using the specified source and making sure that the call truly returned the requested number of random octets. However, the actual number of random bits in any candidate random number will be by necessity nbits-1 because of the constraint that the random candidate should be less than n-1. This constraint is enforced by simply clearing any bits above nbits-2 (bit numbering starts at 0 so last bit is n-1 rather than n) but keeping at the same time the value of bit on position nbits-2. If that was 1 then mpi_set_highbit(a, nbits-2) is called. If that was 0 then mpi_clear_highbit(a, nbits-2) is called instead.

Note that those 2 mpi functions (mpi_set_highbit and mpi_clear_highbit) are supposedly similar in that they clear all bits above the position indicated and otherwise set to 1 or, respectively, to 0 the bit on the given position. However, the actual code reveals that they are not entirely similar: mpi_set_highbit allocates more memory if the position given is above the current size of the mpi; mpi_clear_highbit doesn't allocate memory in this case. This means effectively that mpi_set_highbit returns an mpi of specified length, while mpi_clear_highbit returns always an mpi of length smaller than the specified bit position. At first glance this might seem to make some sense but the reality is worse than that: mpi_clear_highbit sometimes trims the leading 0 bits of a number and sometimes... doesn't! Possibly for this reason of rather dubious behaviour, GnuPG's Miller-Rabin avoids using mpi_clear_highbit entirely and dances around instead with double calls to mpi_set_highbit instead, on both branches of the if. Since I'm doing coding here rather than voodooing or naked-dances-in-the-code, I fixed instead mpi_clear_highbit to at least reliably trim leading 0s at all times and I'm using it where it is needed. The memory allocation issue is not relevant for my code here anyway because there is already enough memory allocated for the MPI at the beginning of this function.

A core aspect of any Miller-Rabin implementation is the way in which candidate witnesses are chosen. EuCrypt chooses them entirely at random in the interval [2, n-2]. Different options were considered, most notably that of choosing a smaller, potentially more relevant interval, for instance by increasing the lower bound of this interval to at least 256 (2^8). Moreover, according to Eric Bach's paper7, an appropriate upper bound for witnesses would be 2*log(n)*log(n). However, Bach's result relies on the Extended Riemann Hypothesis (ERH) which hasn't been proved so far. So although those bounds are very appealing, EuCrypt sticks for now with using randomly chosen witnesses over the whole interval.

To sum it all up, the main changes brought by today's vpatch are the following:

  1. Addition of primegen.c as part of smg_rsa. This includes the actual implementation of the Miller-Rabin algorithm with the choices discussed above. It uses the MPI implementation introduced in Chapter 1 and corrected along the way.
  2. Changes to MPI, namely further identification and, to the extent needed for current needs of EuCrypt, the killing of additional cockroaches that were identified in the MPI code as a result of investigating the functions that are needed for Miller-Rabin. Most notably: fixing mpi_clear_highbit so that it always trims leading 0 if any, as opposed to current functionality where it sometimes trims them and sometimes not; identifying the fact that MPI currently considers that 0 is represented on 0 bits. Note that this last issue is flagged up and made obvious through updated tests but it is not changed mainly because following all its potential implications through MPI at this stage would eat up so much time as to make it cheaper to just implement from scratch something solid at least. So for the time being at least, the decision made is to honestly admit and clearly highlight this existing fault of MPI.
  3. New tests for MPI highlighting the new issues uncovered.
  4. New tests for smg_rsa focusing on Miller-Rabin: the testing program allows the user to specify the number of runs for each data point and reports a test as failed if at least one run returned a result different from the one expected; test data for Miller-Rabin is chosen to include a few mersenne primes, carmichael composites and a phuctor-found non-prime public exponent of someone's RSA key. Feel free to add to them anything you consider relevant and then run those tests!
  5. A small change to the function that fetches random octets from the FG: the errno is now set to 0 prior to every call that reads from the USB port and its value is then checked in addition to the return value of the read function. This change is needed in order to avoid the unfortunate case when no bits are read but there is apparently no underlying error. Previous version of my function would still abort in such case but current version would instead keep trying as this is a more useful approach for EuCrypt.

The updated get_random_octets_from function in eucrypt/smg_rsa/truerandom.c:

int get_random_octets_from(int noctets, unsigned char *out, int from) {

  int nread;
  int total = 0;

  while (total < noctets) { errno = 0; nread = read(from, out+total, noctets-total); //on interrupt received just try again if (nread == -1 && errno == EINTR) continue; //on error condition abort if (errno != 0 && (nread == -1 || nread == 0)) { printf("Error reading from entropy source %s after %d read: %sn", ENTROPY_SOURCE, total, strerror(errno)); return total; //total read so far } if (nread > 0)
      total = total + nread;
  }
  return total; //return number of octets read
}

The new test_is_composite function in eucrypt/smg_rsa/tests/tests.c:

void test_is_composite(int nruns, char *hex_number, int expected) {
  int i;
  int output;
  int count_ok = 0;
  int source = open_entropy_source(ENTROPY_SOURCE);
  MPI p = mpi_alloc(0);

  mpi_fromstr(p, hex_number);
  printf("TEST is_composite on MPI(hex) ");
  mpi_print(stdout, p, 1);
  for (i=0; i < nruns; i++) {
    printf(".");
    fflush(stdout);
    output = is_composite(p, M_R_ITERATIONS, source);
    if (output == expected)
      count_ok = count_ok + 1;
  }
  printf("done, with %d out of %d correct runs for expected=%d: %sn", count_ok, nruns, expected, count_ok==nruns? "PASS"$
  mpi_free(p);
  close(source);
}

The updated main in eucrypt/smg_rsa/tests/tests.c:

int main(int ac, char **av)
{
  int nruns;
  int id;

  if (ac<2) {
    printf("Usage: %s number_of_runs [testID]n", av[0]);
    return -1;
  }
  nruns = atoi(av[1]);

  if (ac < 3) id = 0; else id = atoi(av[2]); if (id == 0 || id == 1) { printf("Timing entropy source...n"); time_entropy_source(nruns,4096); } if (id == 0 || id == 2) { /* a few primes (decimal): 65537, 116447, 411949103, 20943302231 */ test_is_composite(nruns, "0x10001", 0); test_is_composite(nruns, "0x1C6DF", 0); test_is_composite(nruns, "0x188DD82F", 0); test_is_composite(nruns, "0x4E0516E57", 0); /* a few mersenne primes (decimal): 2^13 - 1 = 8191, 2^17 - 1 = 131071, 2^31 - 1 = 2147483647 */ test_is_composite(nruns, "0x1FFF", 0); test_is_composite(nruns, "0x1FFFF", 0); test_is_composite(nruns, "0x7FFFFFFF", 0); /* a few carmichael numbers, in decimal: 561, 60977817398996785 */ test_is_composite(nruns, "0x231", 1); test_is_composite(nruns, "0xD8A300793EEF31", 1); /* an even number */ test_is_composite(nruns, "0x15A9E672864B1E", 1); /* a phuctor-found non-prime public exponent: 170141183460469231731687303715884105731 */ test_is_composite(nruns, "0x80000000000000000000000000000003", 1); } if (id > 2)
    printf("Current test ids: 0 for all, 1 for entropy source test only, 2 for is_composite test only.");

  return 0;
}

Finally, the .vpatch itself and my signature (you'll need all previous .vpatches of EuCrypt to press this one):

In the next chapter we take one step further towards having RSA, so stay tuned, drink only the good stuff and make sure you won't go... POUFFF!


  1. Primality testing means answering the question: is this number prime or not? 

  2. If this is not clear to you, it's best to just review RSA itself. In a nutshell: the whole working of RSA as cryptographic tool relies on properties of prime numbers; both secret and private RSA keys are essentially made out of prime numbers. Don't eat only this nutshell though - better read the original paper by Rivest, Shamir and Adleman: A Method for Obtaining Digital Signatures and Public-Key Cryptosystems

  3. Agrawal, Kayal and Saxena, Primes is in P, Ann. of Math, Vol. 2, pp. 781-793. 

  4. Which is *not* the case in GnuPG where first witness is actually...fixed and the rest are anyway chosen pseudo-randomly, go and read that code. By contrast, EuCrypt actually chooses them randomly, using a true random number generator, the FG. 

  5. Like all other knobs, this can be found in include/knobs.h 

  6. No, they don't, surprise, surprise. So I'll fix what I use and otherwise at least highlight what other problems I find, as I find them, such is this wonderful world we live in. 

  7. Bach, Eric, 1990. Explicit Bounds for Primality Testing and Related Problems. Mathematics of Computation, 55, pp. 355-380. 

December 21, 2017

EuCrypt: Correcting MPI Implementation

Filed under: EuCrypt — Diana Coman @ 9:53 p.m.

~ An unexpected part of the EuCrypt library series. Start with Introducing EuCrypt. ~

This is a sad but necessary interruption in the EuCrypt series itself: although coming immediately after chapter 2, this is not chapter 3 at all, I'm sorry to say. Instead of adding another useful part of smg-rsa as the actual chapter 3 does, I'm forced by the uncovered reality in the field to publish this correction: a correction of a coding error that has lived for years in the very much used GnuPG 1.4.10 open sore code. As a wonderful and truly epic example of all the great qualities of the open-source approach that thrives on thousands of eyes that surely quickly and accurately find and correct any bug, this error has survived perfectly fine until now, undetected and even papered over and buried deeper when it tried to manifest! It is quite natural after all, not to mention according to the "law": given enough eyeballs, any bug will be quickly plastered over and buried as deep as it can be. There, Mr. Raymond, I fixed that Bazaar description for you. Let it be said again and again and forever that in anything that matters, it's never quantity that matters, but always quality.

The error itself is a tiny thing with a huge impact: a single wrong letter in the wrong place, transforming a piece of code meant to copy over a whole MPI into a piece of code that does precisely... nothing1. Obviously, typing the wrong letter can and does happen from time to time to anyone who types. However, it takes a very sloppy sort of monkey pushing the keys to fail to notice its effect, namely that the resulting code does... nothing. You'd think that anyone would at least read their code once more at a later time to catch such silly mistakes, pretty much like one reads any other text for mere proofreading if nothing more. You'd think that anyone at anytime would at least run their code once to see it doing the thing they wanted it to do, wouldn't you? Well, it clearly shows this is not the way of the great Bazaar, no. So let me show you the resulting code that achieves nothing but still actually eats up some resources whenever called, here you are, straight from mpi/include/mpi-internal.h:

#define MPN_COPY_INCR( d, s, n)
    do {
  mpi_size_t _i;
  for( _i = 0; _i < (n); _i++ )
      (d)[_i] = (d)[_i];
    } while (0)

 

Speaking of eyeballs, take your time a bit and spot the error. Then call your child or your grandpa over, ask them to spot the error too and let me know how this addition of another pair of eyeballs clearly helped.

Once the error is spotted, the reader might ask the obvious question, of course: "why does the caller appear to work?" The answer to this is... layered let's say. The caller does not actually work in fact, unsurprisingly. The caller is a method called mpi_tdiv_q_2exp(MPI w, MPI u, unsigned count). This method supposedly shifts u by count bits to the right and stores the result in w. Except that it doesn't actually *always* do this: in some cases, all it does is to trim from w count bits and nothing more because it relies on the above good-for-nothing code to copy the relevant bits from u to w. Here, let's look at its code that is found in mpi/mpi-div.c:

void
mpi_tdiv_q_2exp( MPI w, MPI u, unsigned count )
{
    mpi_size_t usize, wsize;
    mpi_size_t limb_cnt;

    usize = u->nlimbs;
    limb_cnt = count / BITS_PER_MPI_LIMB;
    wsize = usize - limb_cnt;
    if( limb_cnt >= usize )
  w->nlimbs = 0;
    else {
  mpi_ptr_t wp;
  mpi_ptr_t up;

  RESIZE_IF_NEEDED( w, wsize );
  wp = w->d;
  up = u->d;

  count %= BITS_PER_MPI_LIMB;
  if( count ) {
      mpihelp_rshift( wp, up + limb_cnt, wsize, count );
      wsize -= !wp[wsize - 1];
  }
  else {
      MPN_COPY_INCR( wp, up + limb_cnt, wsize);
  }

  w->nlimbs = wsize;
    }
}

Let's actually read that a bit and see how it works: usize stores the number of "limbs" of an MPI, while limb_cnt stores the number of full limbs that need to be discarded as a result of a shift right by count bits. Those limbs are essentially machine words, groups of bits that the machine can work with directly. Their length in bits is that BITS_PER_MPI_LIMB, hence the integer division of count by it to find out just how many full limbs a shift by count bits implies. Once that is known, the code sets the remaining size (as number of limbs) of the result w and proceeds to copy over from u the bits that remain after the shift (the wp and up are simply pointers to the internal array of limbs of w and u respectively). However, at this point, the path splits by means of that if (count): the shift can further affect another limb by discarding (shifting right) only some of its bits and in this case there is in fact a shift-and-copy operation by means of calling mpihelp_rshift; if this is not the case and the original shift was made by a multiple of BITS_PER_MPI_LIMB, there is only a straightforward copy of limbs to make and that is supposedly achieved by our do-nothing-by-error code, called MPN_COPY_INCR.

Considering the above, it's hopefully no surprise to you when I say that mpi_tdiv_q_2exp fails of course, as soon as it makes use of the buggy code of MPN_COPY_INCR. And you'd think the error would at least be caught at this point if it wasn't caught earlier, wouldn't you? Basic testing of the most obvious kind, meaning both branches of an if, something everyone always does in one way or another, right? Bless your heart and sweet youth and all that but wake up already, as this is open sore we are talking about: no, it's not caught here either, of-course-not, it's just-not-enough-eyeballs-yet. Honestly, I suspect they meant "given enough eyeballs LOST as a result, any bug becomes shallow." We might even live to see this, you know?

Still, there is more! This defective mpi_tdiv_q_2exp is actually used by another method in the original GnuPG so surely the error is exposed there! It has to be, it's by now the third layer and presumably it's really not all that many layers that can be built on code-that-does-nothing + code-that-works-only-sometimes. Oh, but the Bazaar is all mighty, the Bazaar can2, unlike those pesky cathedrals that actually had any foundation. Let me illustrate this to you, with code taken straight from GnuPG 1.4.10, from the primegen.c file, from method is_prime that supposedly does probabilistic primality testing and such little things on which the security of your RSA key depends:

    q = mpi_copy( nminus1 );
    k = mpi_trailing_zeros( q );
    mpi_tdiv_q_2exp(q, q, k);

What's that, you ask? Why, nothing terrible, just a bit of code meant to calculate q as nminus1 / (2^k). Since a division by 2^k is simply a right shift by k bits, that's really the perfect job for mpi_tdiv_q_2exp, isn't it? So then why does this code do first a copy of nminus1 into q and only then a shift right by k bits of resulting q into... same variable q. Why exactly doesn't it do simply a shift right by k bits of nminus1 into q? Like this:

  k = mpi_trailing_zeros( nminus1 );
  mpi_tdiv_q_2exp( q, nminus1, k );

Two method calls instead of three, two lines instead of three, overall much clearer and easier to follow code anyway, since we are dividing indeed nminus1 by 2^k, not some "q" that is meant to be the result, really. So why do it in a complicated way when you can do it in a simple way? Except, you can't, you see, because you don't actually read the code you use and the code you use is broken but nobody actually bothered to check it. And when the error that is by now 3 layers deep manifests itself through unexpected results of the simple and straightforward code that should have been there, you don't spend like me the hours needed to track down the error and actually, finally, mercifully put it out of its misery correcting the code. Oh no, that would be wasted time, wouldn't it? Instead you are being productive and you find a workaround, simply papering the error over and dancing around it with an idiotic extra-copy that makes *this* code more difficult to follow and further *hides* the previous error, pushing it one layer deeper. Oh, now I plainly see the main advantage of open source: since you are not responsible in any way for the code you write, you can get away with such despicable behaviour, can't you?

Well, here is not open source. Here code gets actually read and errors get dissected, documented and corrected when found, not swept under the carpet. So, to correct this error, first thing written is a basic test that runs all the branches of that if in mpi_tdiv_q_2exp. Here it is, together with its helper print function, both to be found from now on in mpi/tests/test_mpi.c:

void print_results(MPI in, MPI out, char * title)
{
  fprintf(stdout, "******** %s ********", title);
  terpri(stdout);

  fprintf(stdout, "input : ");
  mpi_print(stdout, in, 1);
  terpri(stdout);

  fprintf(stdout, "output: ");
  mpi_print(stdout, out, 1);
  terpri(stdout);

  terpri(stdout);
  fflush(stdout);
}

/*
 * Test that will fail on original code and will pass after EuCrypt fix is applied.
 */
void test_rshift()
{
  MPI out, in, copy_in;
  out = mpi_alloc(0);
  in = mpi_alloc(0);
  copy_in = mpi_alloc(0);

  mpi_fromstr(out, "0x20E92FE28E1929");   /* some value */
  mpi_fromstr(in, "0x2000000010000001000000002");
  mpi_fromstr(copy_in, "0x2000000010000001000000002"); /* to make sure the actual input is print, since call can modify in */

  /* print value of BITS_PER_MPI_LIMB */
  fprintf(stdout, "BITS_PER_MPI_LIMB is %dn", BITS_PER_MPI_LIMB);

  /* shift by 0 */
  mpi_tdiv_q_2exp(out, in, 0);
  print_results(copy_in, out, "TEST: right shift by 0");

  /* shift by multiple of BITS_PER_MPI_LIMB */
  mpi_fromstr(in, "0x2000000010000001000000002");

  mpi_tdiv_q_2exp(out, in, BITS_PER_MPI_LIMB);
  print_results(copy_in, out, "TEST: right shift by BITS_PER_MPI_LIMB");

  /* shift by non-multiple of BITS_PER_MPI_LIMB */
  mpi_fromstr(in, "0x2000000010000001000000002");
  mpi_tdiv_q_2exp(out, in, BITS_PER_MPI_LIMB - 3);
  print_results(copy_in, out, "TEST: right shift by BITS_PER_MPI_LIMB - 3");

  mpi_free(copy_in);
  mpi_free(out);
  mpi_free(in);
}

Running this with the old code from mpi (so the code from chapter 2, earlier) will show just how mpi_tdiv_q_2exp "works" so that there is no doubt remaining about that. After this and before doing any change, the next step is to search in the code for *any other* users of either mpi_tdiv_q_2exp or the MPN_COPY_INCR itself, since those users for all I know might even rely by now on the wrong behaviour. Mercifully, this search returned empty and so I can proceed finally to the fix itself, which is of course very easy to do *after all this work* i.e. once it was found and tracked down and isolated to ensure that a change does not break down something else. And after this, the last but mandatory step is of course to run the previously written test again and check its output. The corrected code in mpi/include/mpi-internal.h:

#define MPN_COPY_INCR( d, s, n)
    do {
  mpi_size_t _i;
  for( _i = 0; _i < (n); _i++ )
      (d)[_i] = (s)[_i];
    } while (0)

After all this work that was caused not by the original tiny error itself, not by the wrong letter at the wrong place, but by the total and repeated failure to correct it afterwards, even 3 layers up, tell me again about all those eyeballs and about how productive it is to write a quick workaround instead of searching for the error and eliminate it at source. Then go and run a grep -r "workaround" . in any source directory of any big open source project, just for... fun let's say, what else.

To wrap this up, here is the vpatch with all the above changes, and my signature for it:

Hopefully no other similar trouble will surface until next week, so that I can finally move on to Chapter 3 that is all about the Miller-Rabin algorithm. Stay tuned!


  1. You know, by this point one might even be relieved to find that it does at least nothing. As opposed to doing something worse. 

  2. "Yes, we can", right?  

December 14, 2017

EuCrypt Chapter 2: A Source of Randomness

Filed under: EuCrypt — Diana Coman @ 11:53 p.m.

EuCrypt uses as source of randomness the Fuckgoats auditable TRNG (True Random Number Generator) from S.NSA (No Such lAbs). The choice here was made very easy by a basic combination of facts: on one hand, EuCrypt needs an actual, auditable source of randomness1 as opposed to anything else, pseudo-random generators included; on the other hand, the Fuckgoats (FG) device is the only currently available auditable TRNG. So problem solved and even rather narrowly solved at that. I'm quite grateful that for once there IS actually something matching the requirements and therefore I don't have to cobble it together myself from bits and pieces.

The above being said, you as the user of EuCrypt are *expected* to make your own decisions. Consequently, there really is nothing stopping you from using whatever you want as your own "source of randomness", be it actually random or pseudo-random or straight non-random or anything in between, why would EuCrypt care? Just change existing knobs in EuCrypt (see below) or directly replace the relevant methods (mainly get_random_octets, discussed below) with your own code and that's what will get used. For my own needs however, I'll use FG and moreover, one that I actually bought (even with some additional cost) and tested myself.

To keep this as clear as possible, let's start with the tiniest part, namely a single lonely knob for EuCrypt, giving the name of the entropy source to use and defined in knobs.h, as part of the smg_rsa component of EuCrypt (eucrypt/smg_rsa/include/knobs.h):

#ifndef SMG_RSA_KNOBS_H
#define SMG_RSA_KNOBS_H

#define ENTROPY_SOURCE "/dev/ttyUSB0"

#endif /*SMG_RSA_KNOBS_H*/

As it is quite clear from above, current version of EuCrypt assumes that an FG is connected simply to an USB port that can be accessed via /dev/ttyUSB0. If the device dev path on your machine is different, change the value of this knob accordingly. If your FG is connected to something other than an USB port (such as a serial port for instance), you'll need to change more than this knob here.

Now for the signatures of the functions that will provide access to the specified source of randomness, have a look at smg_rsa.h:

/* smg_rsa.h
 * S.MG, 2017
 */

#ifndef SMG_RSA_H
#define SMG_RSA_H

#include "mpi.h"
#include "knobs.h"

/*********truerandom.c*********/

/*
 * Opens and configures (as per FG requirements) the specified entropy source (e.g. "/dev/ttyUSB0")
 * @param source_name the name of the file to open (e.g. "/dev/ttyUSB0")
 * @return the descriptor of the open file when successful; negative value otherwise
 */
int open_entropy_source(char* source_name);

/*
 * Returns noctets random octets (i.e. 8*noctets bits in total) as obtained from EuCrypt's preferred source.
 * Preferred source is defined in knobs.h as ENTROPY_SOURCE and should be a TRNG (e.g. Fuckgoats).
 * @param nboctets the length of desired random sequence, in octets
 * @param out pointer to allocated memory space for the requested random noctets;
 * NB: this method does NOT allocate space!
 * @return the actual number of octets that were obtained from the currently configured entropy source
 * (this is equal to noctets on successful read of required noctets)
 */
int get_random_octets(int noctets, unsigned char *out);

/* Returns noctets random octets as obtained from the specified "from" source;
 * NB: the "from" source is considered to be the handle of an already opened stream;
 * This method will simply attempt to read from the source as needed!
 *
 * @param noctets the length of desired random sequence, in octets
 * @param out pointer to allocated memory space for the requested random octets;
 * NB: this method does NOT allocate space!
 * @param from handle of an already opened entropy source - this method will just READ from it as needed
 * @return the actual number of octets that were obtained
 */
int get_random_octets_from(int noctets, unsigned char *out, int from);

#endif /*SMG_RSA*/

The difference between the two functions that retrieve a specified number of octets is that one opens and closes the source itself (hence, every time it is called) while the second one simply reads the specified number of bits from an already opened source that is given as argument. Note that the function opening and closing the source itself uses the other one for the actual reading. The reason for providing both functions is simply the fact that opening/closing the source can easily be a significant overhead when reading only a few octets at a time.

To configure and open the source, the function used is open_entropy_source. This function provides a way to obtain a handler of the opened source that can then be passed repeatedly to the function that retrieves octets, as needed.

One important aspect to note above is that the two functions that retrieve random octets do NOT allocate memory for the output. They assume that the caller has allocated enough memory and provided a valid pointer. The reason for this is two-fold: first, as a design principle I prefer to keep allocation and de-allocation of memory in the same function as much as possible without passing responsibility around; second, for RSA purpose, the memory allocation is often done with specific MPI methods and using those (or indeed being aware of them) is outside the scope of this bit of code as it has nothing at all to do with getting random bits.

The actual implementations of the above functions are found in truerandom.c. UPDATED on 4 January 2018: this code has been changed, see Chapter 4 and corresponding .vpatch!

#include < stdio.h>
#include < stdlib.h>
#include < string.h>

#include < fcntl.h>
#include < unistd.h>
#include < termios.h>
#include < errno.h>

#include "smg_rsa.h"

int set_usb_attribs(int fd, int speed) {
	struct termios tty;
	if (tcgetattr(fd, &tty) < 0) {
		return -1;
	}

	//input and output speeds
	cfsetospeed(&tty, (speed_t)speed);
	cfsetispeed(&tty, (speed_t)speed);

	tty.c_cflag |= (CLOCAL | CREAD);	//ignore modem controls
	tty.c_cflag &= ~CSIZE;
	tty.c_cflag |= CS8;			//8 bit characters
	tty.c_cflag &= ~PARENB;	//no parity bit
	tty.c_cflag &= ~CSTOPB;	//only need 1 stop bit
	tty.c_cflag &= ~CRTSCTS;	//no hardware flow control

	//non-canonical mode
	tty.c_cflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
	tty.c_cflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
	tty.c_cflag &= ~OPOST;

	//read at least one octet at a time; timeout 1 tenth of second between octets read
	tty.c_cc[VMIN] = 1;
	tty.c_cc[VTIME] = 1;

	if (tcsetattr(fd, TCSANOW, &tty) != 0)
		return -1;

	return 0;
}

int open_entropy_source(char* source_name) {
	int in, err;

	in = open(source_name, O_RDONLY | O_NOCTTY | O_NDELAY);
	if (in == -1) {
		printf("ERROR: failure to open entropy source %s: %sn", source_name, strerror(errno));
		return in;	//failed to access entropy source
	}

	fcntl(in, F_SETFL, 0);

	err = set_usb_attribs(in, B115200);
	if (err==-1) {
		printf("Error setting attributes on %s: %sn", source_name, strerror(errno));
		return err;
	}

	return in;	//source opened, return its descriptor
}

int get_random_octets_from(int noctets, unsigned char *out, int from) {

	int nread;
	int total = 0;

	while (total < noctets) {
           nread = read(from, out+total, noctets-total);
           //on interrupt received just try again
           if (nread == -1 && errno == EINTR)
             continue;
           //on error condition abort
           if (nread == -1 || nread == 0) {
             printf("Error reading from entropy source %s: %sn", ENTROPY_SOURCE, strerror(errno));
             return total; //total read so far
           }
           if (nread > 0)
		total = total + nread;
	}
	return total;	//return number of octets read
}

int get_random_octets(int noctets, unsigned char *out) {
	int in;
	int nread = 0;

	in = open_entropy_source(ENTROPY_SOURCE);
	if (in > 0) {
		nread = get_random_octets_from(noctets, out, in);
		close(in);
	}
	return nread;
}

As it can be seen above, a significant part of the code is simply for configuring the device. Most importantly, the configuration aims to turn OFF all flow control and to set the baud rate as required by FG. While this should work under most versions of Linux, be aware of the known pl2303 vs pl2303x issue with some connectors on older systems.

Note that an incorrectly configured device will simply block and since the functions above are written to always wait for the full number of bits required, they will *also* block in this case.

Finally, a basic test in tests/test.c:

#include "smg_rsa.h"

#include < stdlib.h>
#include < time.h>

void err(char *msg)
{
  fprintf(stderr, "%sn", msg);
  exit(1);
}

void time_entropy_source(int nruns, int noctets) {
	unsigned char buffer[noctets];
	int read, i;
	struct timespec tstart, tend;
	long int diff;

	clock_gettime(CLOCK_MONOTONIC, &tstart);
	for (i=0; i < nruns; i++) {
		read = get_random_octets(noctets,buffer);
		if (read != noctets)
			err("Failed reading from entropy source!");
	}
	clock_gettime(CLOCK_MONOTONIC, &tend);

	diff = tend.tv_sec-tstart.tv_sec;
	double kbps = (nruns*noctets) / (diff*1000.0);
	printf("ENTROPY source timing: %d kB in %ld seconds, at an average speed of %f kB/s over %d runs of %d octets eachn", nruns*noctets, diff, kbps, nruns, noctets);
}

int main(int ac, char **av)
{
	int nruns;

	if (ac<2) {
		printf("Usage: %s number_of_runsn", av[0]);
		return -1;
	}
	nruns = atoi(av[1]);

	printf("Timing entropy source...n");
	time_entropy_source(nruns,4096);

  return 0;
}

For testing, simply plug into an USB port your (previously audited, hopefully) FG, compile everything and then run the test with as many runs as you want. When it's done (so after a while, depending on how many runs you asked for), it should print on screen the speed at which it obtained the random bits from FG.

Following the sad realisation that I can't currently safely alter folder structure under V, I created a genesis patch for EuCrypt containing the intended structure (more like: writing in stone the intended structure) and it's on top of this that each chapter of EuCrypt adds new content by means of vpatches. Here you have everything you need so far:

Note that the patch in Chapter 1 is NOT needed anymore directly for EuCrypt (it still is valid though in itself and as a further snip on the standalone mpi so I will keep it where it is). The changes that patch makes are already included in the version of mpi that ch1_mpi.vpatch simply bring into EuCrypt.

In the next chapter, since we have already an MPI implementation as well as a way to access true randomness, we can get ever so closer to the actual RSA itself, so stay tuned!


  1. I'll likely expand on this as soon as I get to the actual implementation of RSA so in the next few chapters. 

December 7, 2017

Introducing EuCrypt

Filed under: EuCrypt — Diana Coman @ 6:09 p.m.

EuCrypt is a self-contained library that Eulora server will use for its communication needs. EuCrypt has the following 5 main components:

  1. smg-comm - the implementation of the basic client-server communication protocol. This makes use of all the other components, namely:
  2. smg-serpent - the symmetric cipher that is used by smg-comm for everyday message exchanges between Eulora clients and server.
  3. smg-keccak - the implementation of the keccak function and sponge construction, used by smg-comm mainly as part of the data padding scheme.
  4. smg-rsa - the implementation of the RSA encryption algorithm using a source of true randomness and the sane-mpi implementation of big number arithmetics:
  5. sane-mpi - the implementation of big number arithmetics, as extracted from GnuPG 1.4.10 by Stanislav Datskovskiy.

The above structure is only meant to give you a high-level, top-down idea of what EuCrypt is made of. However, to actually understand EuCrypt at all1, you'll need to read the code and discussion for each of the above components and so I will start from bottom-up with the code + discussion of sane-mpi and then proceed to add parts and pieces until we get the whole library. Note that some of the components above are bigger than others and therefore I will split the discussion of those in several installments, so expect overall more than 5 posts on this. Comments and questions are welcome at any time and on any of the parts, so don't be shy.

Chapter 1: sane-mpi

This component is implemented in C and offers support for storing and working with arbitrary large integers (multi precision integers or MPIs as we'll call them from now on). The code is rather messy but at the moment there isn't really anything better available and as it stands, this sane-mpi is at least readable now - mainly through the efforts of Stanislav Datskovskiy who extracted it from the big ball of mess that is GnuPG 1.4.10. I've made only a small further snip to his version, discarding a set of methods for accessing specific parts of an MPI. While such methods could conceivably be useful at some point, the point is that EuCrypt at the moment does *not* need them and moreover the existing implementation was so ugly as to need a re-write in any case. In short those parts failed to be either useful or at the very least sane, so they are no more.

There are lots of very useful comments all through the code of sane-mpi so the best option is really to actually read each file, at least once. There's no point in repeating those comments in here. I'll focus instead on a few aspects that I think are most relevant to EuCrypt and its purpose.

First issue is related to how and where MPIs are stored, because EuCrypt uses MPIs for storing private RSA keys. In principle, sane-mpi offers methods for allocating secure memory for any given MPI (see secmem.c and memory.c). However, a correct description of the service is that sane-mpi will attempt to allocate secure memory, with results depending on the machine and operating system you are running it on.

Note also that it pays to be mindful when performing operations with MPIs that are a mix of secure and insecure since undesired leaks may happen at intermediate steps. Sane-mpi attempts to avoid this by allocating additional secure memory space for intermediate results when one of the operands is in secure memory (see for instance mpi_mul method in mpi-mul.c). However, this (among others) means also that a simple arithmetical operation involving 2 MPIs will actually differ in execution depending not only on the values of the 2 numbers but also on where they are each stored. Moreover, this sort of side-effect memory allocation happens in fact in more than one place in sane-mpi.

Second issue refers to the way in which MPI operations are implemented in sane-mpi. Essentially following the implementation of one single arithmetical operation is not always very straightforward as it can easily lead one through several files (even when leaving aside the side-effect memory allocation issues and focusing instead only on the actual operations performed). From the point of view of fully and comfortably holding sane-mpi in head, this is rather unfortunate but it is what it is.

Since EuCrypt uses by necessity not only the multiplication and exponentiation but also direct bit setting of MPIs, I'll highlight mpi-bit.c as another file that should be studied in more detail. In particular, the method mpi_set_highbit can be confusing as it does a bit more than the name suggests: it sets the indicated bit but it ALSO clears all bits above the one indicated. Similarly, mpi_clear_highbit clears the indicated bit AND all bits above it. By contrast, mpi_set_bit and mpi_clear_bit are the more straightforward versions, simply setting or clearing, respectively, the indicated bit and nothing more.

To download, build and test sane-mpi as will be used in EuCrypt you will need:

If you need help with V, there are is a gentle introduction courtesy of Ben Vulpes.

In the next chapter I'll proceed to introduce parts of EuCrypt's own RSA that puts all this sane-mpi to some actual use. Stay tuned!

 

 


  1. Note that you will be effectively trusting this library with your data and money whenever you play Eulora so better understand it before playing the game. 

« Newer PostsOlder Posts »

Theme and content by Diana Coman