~ This is part of the EuCrypt series. Start with Introducing EuCrypt. ~

2018 starts well for EuCrypt as I finally get to put to some use all the building blocks of years past and then simply expand the library further. The aim of this chapter is to provide a random prime number generator (rpng) that actually fits the name. As usual, the code is commented and choices made are discussed in more detail here. Following the sane TMSR design principle of "fits in head", the algorithm is as simple as possible: keep getting random odd numbers of the required length until a prime one is chanced upon. The signature of the procedure doing this can be found in eucrypt/smg_rsa/include/smg_rsa.h:

/** * Generates a random number that has passed the Miller-Rabin test for primality (see function is_composite above). * NB: top 2 bits and bottom bit are ALWAYS 1! (i.e. a mask 110....01 is applied to the random bits) * a prime of 8*noctets long will have only (8*noctets-3) bits that are randomly chosen! * NB: this method does NOT allocate space for the requested MPI; it is the caller's responsibility to allocate it! * The source of randomness is ENTROPY_SOURCE in eucrypt/smg_rsa/include/knobs.h * The number of witnesses checked by Miller-Rabin is M_R_ITERATIONS in eucrypt/smg_rsa/include/knobs.h * Preconditions: * noctets > 0 (at least one octet!) * output has known allocated memory for at least nlimbs(noctets) * successful access to the entropy source * @param noctets the length of the desired prime number, in octets * @param output the result: an MPI with sufficient memory allocated for a number that is noctets long */ void gen_random_prime( unsigned int noctets, MPI output);

Before even going to the implementation, note here two design choices:

- The length of the desired random prime number is expected by gen_random_prime in *octets* (bytes if you must but the point is that this is 8 bits and nothing else) and not in bits! The conversion between bits and octets is of course straightforward but the choice is made here to use octets because this reflects more clearly the underlying reality: while individual bits can certainly be altered or otherwise worked with, the data type in use at all further levels down (unsigned char) is still an octet long rather than a bit long.Obviously, nothing stops you, the user, from making a different choice and changing this to use number of bits if that fits your needs better but do make sure you follow through all the code that is used and make any other changes that might be required to get exactly what you want.
- The result is returned via the second parameter to gen_random_prime, an MPI called output. Moreover, gen_random_prime does *not* allocate memory for this MPI! It is instead the caller's duty to make sure that the MPI they provide as argument when calling gen_random_prime has indeed enough memory allocated for the desired length. This makes memory allocation and de-allocation neatly happen in the same place (i.e. the caller) rather than splitting them needlessly and messily across function calls. While there is no way around the fact that a pointer has to be passed on here from a piece of code to another, there really is no need to split the responsibility of memory allocation and de-allocation: the one who allocates memory should de-allocate it as well and that is the caller here.Note that gen_random_prime will do however as much as it can to ensure that it doesn't proceed working with unallocated memory. Concretely, gen_random_prime checks as a precondition the known allocated memory for the output MPI and if the check fails, the execution is aborted, *not* papered over with some "fix" where it doesn't belong. Failure as the best sort of feedback that can't be ignored is really the most appropriate response to a mess-up so that's what gen_random_prime aims for. Don't rely on it to do more than it says. Just do your own share of work and call it properly with correctly allocated MPI and it will then gladly do its job, simple as that.

While gen_random_prime is by design flexible to accommodate any number of octets you might wish to give as length of your desired prime MPI, TMSR RSA specification is rather strict on this: RSA keys are meant to be exactly 4096 bits (512 octets) long and made of 2 distinct 2048 bits (256 octets) long primes. Consequently, the RSA key length is defined in EuCrypt as a *constant* and *not* as a knob. You'll find it therefore in the same file as the above, eucrypt/smg_rsa/include/smg_rsa.h:

/* * These are constants as per TMSR RSA specification, NOT knobs! * TMSR key length is 4096 bits (512 octets); this means 2 primes of 2048 bits (256 octets) each. * NB: if you choose here an odd key length in octets you might end up with a smaller actual key, read the code. */ static const int KEY_LENGTH_OCTETS = 512;

The actual implementation of gen_random_prime is in eucrypt/smg_rsa/primegen.c and it is quite straightforward:

void gen_random_prime( unsigned int noctets, MPI output ) { /* precondition: at least one octet long */ assert(noctets > 0); /* precondition: enough memory allocated for the limbs corresponding to noctets */ unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets); assert(mpi_get_alloced(output) >= nlimbs); /* precondition: access to the entropy source */ int entropy_source = open_entropy_source(ENTROPY_SOURCE); /* source of random bits */ assert(entropy_source >= 0); unsigned int nbits = 8*noctets; /* length of MPI in bits */ /* * loop until a prime is found: get noctets of random bits, trim and apply 110...01 mask, check if prime */ unsigned char *p = xmalloc( noctets ); do { get_random_octets_from( noctets, p, entropy_source ); mpi_set_buffer( output, p, noctets, 0); /* convert to MPI representation */ mpi_set_highbit( output, nbits - 1 ); /* trim at required size and set top bit */ mpi_set_bit( output, nbits - 2); /* set second top bit */ mpi_set_bit( output, 0 ); /* set bottom bit to ensure odd number */ } while (is_composite(output, M_R_ITERATIONS, entropy_source)); /* tidy up, a prime was found */ xfree(p); close(entropy_source); }

As expected, the first things gen_random_prime does is to check its clearly stated preconditions: a minimal length of 1 octet for the requested number (there is no such thing as a number represented on 0 octets, that's nonsense!); enough allocated memory at least as far as one can tell from within this code; accessible entropy source. Note that the mpi_get_alloced function is a new addition to mpi, since there was as far as I could tell no other function to expose this simple information: how much memory is allocated for this here MPI? Sure, there *was* direct access to this information via an MPI's internal structure, meaning calling something of the sort mpi->alloced. However, using this directly is messy because it forces the calling code to *know* the internal structure of an MPI, which is a bad idea and if you can't give at least 2 reasons why then you have no business writing any code yet, go and read, re-read and especially ^{1}. The new mpi_get_alloced function has its signature in eucrypt/mpi/include/mpi.h, which is the header included by any user of the mpi library:

int mpi_get_alloced (MPI a); /* returns the allocated memory space for this MPI, in number of limbs */

The implementation of mpi_get_alloced is a tiny bit of code in eucrypt/mpi/mpiutil.c where it belongs together with the other utility functions for MPIs:

/* * Returns the allocated space for the given MPI, as number of limbs. */ int mpi_get_alloced (MPI a) { return a->alloced; }

After all three preconditions of gen_random_prime have passed the checks, the next step is simply to loop through random numbers of the required size until a prime one is found. Note that each random number will have only 8*noctets-3 actually random bits. This is because the top 2 bits and the bottom bit are all fixed at 1. Since the goal is to find a large prime number, the bottom bit has to be 1 for basic mathematical reasons (as otherwise the number is even, hence divisible by 2). The top 2 bits are set at 1 however for length-related reasons: both those bits need to be 1 to ensure that in all cases the resulting RSA key is indeed of the desired 4096 bits length (although of those 4096 bits the top and bottom ones will always be 1).

Note that EuCrypt's random prime number generator relies on two main things: a truly random source of bits (the excellent Fuckgoats from S.NSA) and the Miller-Rabin algorithm previously implemented and discussed in Chapter 3 of this series. This is in rather stark contrast with the GnuPG implementation that takes up a lot more space to do essentially the following: use a pseudo-random source of bits, try first all small primes between 3 and 4999^{2}, then do a Fermat test, then do the Miller-Rabin test with a fixed base 2 and only then do the Miller-Rabin test with 4 more pseudo-randomly chosen bases. Does that seem "more secure" to you or merely confusing? Because no, piling up more checks without concrete and clear reason is not a way to achieve security, quite on the contrary. And EuCrypt's approach that seems "simple" by contrast to GnuPG's is not an accident nor the result of laziness, quite on the contrary: it takes a lot of work behind the scenes to end up with only what is truly useful for the task at hand. Shocking, right? Read on.

Since GnuPG's code certainly does not discuss this mess of "choices" in any way, the only thing left to do is to guess at some reasons: supposedly all the initial dancing about with small primes and Fermat would be a fast(-er, -ish?) way of discarding composites before reaching the more expensive Miller-Rabin test. However, there's hardly any clear support for such reasoning^{3}, especially for a prime number generator that is specifically aiming to find *large* and *randomly generated* prime numbers: sure, 2 is a witness for many odd composite numbers (and presumably this is the reason for the fixed 2 witness attempt in GnuPG) but it is known also that there are *infinitely* many pseudoprimes to base 2! Moreover, there are also *infinitely* many n for which the smallest witness is greater than ln(n)^(1/(3*ln(ln(ln(n))))). And finally, Miller-Rabin, as discussed previously has very good chances of finding a witness if it really looks for them randomly, seeing how three quarters of the numbers between 1 and n-1 are reliable witnesses anyway. So if your candidate primes are really both large and randomly chosen then there is little point in introducing some pre-Miller-Rabin overhead that is anyway NOT going to work in all cases. Better increase the number of Miller-Rabin iterations and gain something useful^{4} instead of wasting time to try first less effective approaches just for the sake of the fact that in *some* cases they might be enough.

To see the new rpng in action, I wrote a few tests as well. And in this process I ended up simplifying the configuration of the entropy source because the previous code turned out to be unnecessarily long and rather not as fit-in-head as it could be. So the new set_usb_attribs function in eucrypt/smg_rsa/truerandom.c is this:

int set_usb_attribs(int fd, int speed) { struct termios tty; if (tcgetattr(fd, &tty) < 0) { return -1; } /* input and output speeds */ cfsetospeed(&tty, (speed_t)speed); cfsetispeed(&tty, (speed_t)speed); /* raw */ tty.c_lflag &= ~(ECHO | ECHOE | ECHOK); tty.c_oflag &= ~OPOST; /* read at least one octet at a time; BLOCK until at least VMIN octets read */ tty.c_cc[VMIN] = 1; tty.c_cc[VTIME] = 0; if (tcsetattr(fd, TCSAFLUSH, &tty) != 0) return -1; return 0; }

As with all changes, this got at least a bit of testing too. Testing here meant using the new version of the code to get 390MB of random data on which I ran ent and dieharder. Ent reported:

Entropy = 7.999999 bits per byte. Optimum compression would reduce the size of this 390672000 byte file by 0 percent. Chi square distribution for 390672000 samples is 277.91, and randomly would exceed this value 25.00 percent of the times. Arithmetic mean value of data bytes is 127.5026 (127.5 = random). Monte Carlo value for Pi is 3.141530471 (error 0.00 percent). Serial correlation coefficient is -0.000012 (totally uncorrelated = 0.0).

Dieharder reported 3 "FAILED" tests, 5 "WEAK" tests and 106 "PASSED" tests. You can download and look at the full dieharder report and ent report for this data. Note however that you should run at least a similar test (preferably on more data, preferably several times, preferably periodically and on all your Fuckgoats) before even thinking of relying on your setup as a source of random bits.

And if the above was not enough, I also ended up writing quite a bit in... eucrypt/smg_rsa/tests/tests.c. Fancy that, I wrote more lines of code for the tests than for the actual code! Anyway, here are the new tests and the new structure of the main function in eucrypt/smg_rsa_tests.tests.c:

void time_mr(int nruns) { struct timespec tstart, tend; long int diff; int i; MPI prime; unsigned int noctets = KEY_LENGTH_OCTETS / 2; unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets); int entropy_source = open_entropy_source(ENTROPY_SOURCE); if (entropy_source <= 0) err("can't open entropy source!"); /* first generate a prime of half key length, to make sure M-R will run max number of iterations */ printf("Generating a prime number of %d octets length for M-R timing testn", noctets); prime = mpi_alloc(nlimbs); gen_random_prime(noctets, prime); printf("Running timing test for Miller-Rabin with %d repetitions and %d witnesses on prime number ", nruns, M_R_ITERATIONS); mpi_print(stdout, prime, 1); printf("n"); /* now do the actual runs and time it all */ clock_gettime(CLOCK_MONOTONIC, &tstart); for (i=0; i < nruns; i++) { if (is_composite(prime, M_R_ITERATIONS, entropy_source)) printf("FAIL"); else printf("."); fflush(stdout); } clock_gettime(CLOCK_MONOTONIC, &tend); diff = tend.tv_sec-tstart.tv_sec; printf("nTimings on prime number %d octets long, %d runs of MR with %d iterations (witnesses checked) eachn", noctets, nruns, M_R_ITERATIONS); printf("Total time: %ld secondsnTime per MR run: %f secondsnTime per MR iteration: %f secondsn", diff, diff / (1.0*nruns), diff / (1.0*nruns * M_R_ITERATIONS)); mpi_free(prime); close(entropy_source); } void test_rpng(int nruns) { unsigned int noctets = KEY_LENGTH_OCTETS / 2; unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets); int entropy_source = open_entropy_source(ENTROPY_SOURCE); if (entropy_source <= 0) err("can't open entropy source!"); MPI prime = mpi_alloc(nlimbs); int i; printf("TEST: random prime number generator with %d runsn", nruns); for (i = 0;i < nruns; i++) { gen_random_prime(noctets, prime); printf("Run %d: ", i+1); mpi_print(stdout, prime, 1); if (is_composite(prime, M_R_ITERATIONS, entropy_source)) printf(" **FAIL**n"); else printf(" **PASS**n"); } mpi_free(prime); close(entropy_source); } void time_rpng(int nruns) { struct timespec tstart, tend; long int diff; unsigned int noctets = KEY_LENGTH_OCTETS / 2; unsigned int nlimbs = mpi_nlimb_hint_from_nbytes(noctets); int entropy_source = open_entropy_source(ENTROPY_SOURCE); if (entropy_source <= 0) err("can't open entropy source!"); MPI prime = mpi_alloc(nlimbs); int i; printf("TIMING: random prime number generator with %d runsn", nruns); clock_gettime(CLOCK_MONOTONIC, &tstart); for (i = 0;i < nruns; i++) { gen_random_prime(noctets, prime); } clock_gettime(CLOCK_MONOTONIC, &tend); diff = tend.tv_sec-tstart.tv_sec; printf("TOTAL: %ld secondsn", diff); printf("Average: %f seconds to generate one random prime of %d octets lengthn", diff / (1.0*nruns), noctets); mpi_free(prime); close(entropy_source); } int main(int ac, char **av) { int nruns; int id; if (ac<2) { printf("Usage: %s number_of_runs/octets [testID]n", av[0]); return -1; } nruns = atoi(av[1]); if (ac < 3) id = -1; else id = atoi(av[2]); switch ( id ) { case 0: printf("Timing entropy source...n"); time_entropy_source(nruns, 4096); break; case 1: test_entropy_output(nruns, "entropy_source_output.txt"); break; case 2: /* tests on miller-rabin */ /* a few primes (decimal): 65537, 116447, 411949103, 20943302231 */ test_is_composite(nruns, "0x10001", 0); test_is_composite(nruns, "0x1C6DF", 0); test_is_composite(nruns, "0x188DD82F", 0); test_is_composite(nruns, "0x4E0516E57", 0); /* a few mersenne primes (decimal): 2^13 - 1 = 8191, 2^17 - 1 = 131071, 2^31 - 1 = 2147483647 */ test_is_composite(nruns, "0x1FFF", 0); test_is_composite(nruns, "0x1FFFF", 0); test_is_composite(nruns, "0x7FFFFFFF", 0); /* a few carmichael numbers, in decimal: 561, 60977817398996785 */ test_is_composite(nruns, "0x231", 1); test_is_composite(nruns, "0xD8A300793EEF31", 1); /* an even number */ test_is_composite(nruns, "0x15A9E672864B1E", 1); /* a phuctor-found non-prime public exponent: 170141183460469231731687303715884105731 */ test_is_composite(nruns, "0x80000000000000000000000000000003", 1); break; case 3: time_mr(nruns); break; case 4: test_rpng(nruns); break; case 5: time_rpng(nruns); break; default: printf("Current test ids:n"); printf("0 for timing entropy sourcen"); printf("1 for entropy output testn"); printf("2 for is_composite (Miller-Rabin) testn"); printf("3 for timing Miller-Rabinn"); printf("4 for random prime number generator testn"); printf("5 for timing random prime number generatorn"); } return 0; }

There are 3 new tests in there: one for timing Miller-Rabin (id 3), one for the functioning of the rpng itself (id 4) and one for timing the rpng (id 5). The timing test for Miller-Rabin first generates (with the rpng described in this post) a prime number of 256 octets since this is the length that is most relevant to EuCrypt. Then a timer is started and the Miller-Rabin test is ran repeatedly on this known-prime for the requested number of times. At the end, the total time as well as the average time per M-R run and per M-R iteration are reported. As the number is known to be prime, M-R will always run all its 16 iterations and therefore the test can actually calculate the average time per iteration. On 1000 runs, this test reported an average of 9.78 seconds per M-R run, which corresponds to 0.61 seconds per iteration of M-R.

The test of rpng requests gen_random_prime to generate a random prime number of 256 octets and then it runs another time the Miller-Rabin test on it reporting fail or pass depending on the result. This is obviously a very basic test of gen_random_prime - it's meant at this stage more as an example of use rather than anything else. Run it though and see for yourself or modify it/add to it as you see fit, as usual.

The timing test for the rpng starts a timer and then calls gen_random_prime as many times as requested, reporting at the end the total time as well as the resulting average time per prime. A relatively short test run obtained 40 random primes of 2048 bits each in 13274 seconds in total (3.7 hours) meaning on average 331.85 seconds per prime (~6 minutes).

Finally, the .vpatch itself and its corresponding signature (you'll need all the previous patches from the EuCrypt series to be able to press to this one):

On a slightly different note for the end: this "simple" loop rpng of a chapter ended up close to 3000 words in total here on the blog and I didn't even include all the references considered; chapter 3 just before this one was 3500 words; correcting MPI before that was a mere 2000 words; chapter 2 was 1500 words; the introduction to EuCrypt was close-to-but-not-quite 1000 words, as befits an introduction. At this rate I'm clearly writing here some brick of a book and most of it is certainly NOT code, but all the discussion around it. Shock and horror and all that. See you at the next chapter!

The Art of Computer Programming by Donald Knuth is an absolute must. ↩

WHY 4999? Magic number! ↩

To be fair, there probably was not much reasoning really, just successive additions of code on top of existing code, always adding, rarely even considering how to prune the resulting mess. They writing the code had no time to write short and "simple" code, so they wrote it long and messy. But they worked hard and had all the good intentions (those can be as long as you like really, they don't cost anything), you know? ↩

as more M-R iterations lower at least the maximum probability of error, since p(error)< (1/4)^iterations for M-R. ↩

Should be ensure.

"read, re-read

and understandTAOCP" ? or what, add K&R in there ? steele's 2002 thing maybe too ?technically it's incorrect to say "6 bits will always be 1" ; it's the case that the first and the last bit of N are always 1 ; and a further number of bits are subtly affected (ie, biased) by the p, q masks.

Smaller lower bound, isn't it ?

As in, real prime number generator ? Nice lol.

Specifically : moved to raw modem controls ; to /* comment style ; removed 0.1s delay that was there I dunno why.

"Should be ensure. " - indeed, will correct it.

“read, re-read and understand TAOCP” - Knuth's book is a must, yes and for this reason alone I'll add it. But I highly suspect that one who fails to get the issue there is likely to have to go even further back on various basics, not even all of them programming-related even so basically I can't say generically where they need to start. That's really why I did not give a list there.

"technically it’s incorrect to say “6 bits will always be 1” ;" - Right. I had in mind the 2 primes, hence 6 bits that contribute to N rather than 6 bits in N itself. I'll re-phrase that more clearly there, thank you.

"Smaller lower bound, isn’t it ?" - It gets confusing, doesn't it? The probability error for M-R has an upper bound that is (1/4)^nwitnesses. If you increase nwitnesses, this upper bound of the probability gets smaller. All you can ever do is lower the maximum probability of error. So it is correct as it is but a bit of an odd-shape for head, will re-phrase.

"As in, real prime number generator ? Nice lol." - Works. Was "random" in my mind, as written in first para but double entendre is even better, as always (rng is random number generator, rpng is random prime number generator, trng is fine, trpng seems a bit too much).

Ah, re entropy source configuration changes: concrete specifics are really best seen in the patch itself in this case.

By the looks of it and as per log discussion there is at some point a "80 cols format applied on everything" patch looming too.

That's going to be painful in terms of noise, but I guess less painful earlier than later.

Noise pretty much the same at any time, it's not as if the amount of re-formatting increases: new code will have to be 80 cols now since that's the standard. So sort of lowish priority, but it still needs to get done and the sooner it's done the longer the whole code is at least consistent.

Merely wanna state that this is very beneficial , Thanks for taking your time to write this.

I wish I could say the same regarding your comment.

The lines in primegen.c (http://btcbase.org/patches/eucrypt_ch3_miller_rabin#selection-2009.2-1372.220)

/* ensure that a < n-1 by making a maximum nbits-1 long:

* clear all bits above nbits-2 in a

* keep value of bit nbits-2 in a as it was

*/

if (mpi_test_bit(a, nbits - 2))

mpi_set_highbit(a, nbits - 2);

else

mpi_clear_highbit(a, nbits - 2);

Seem to be equivalent to:

mpi_clear_highbit(a, nbits - 1);

Also, the whole sizing / resizing in mpi_clear_highbit / mpi_set_highbit is suspicious. It seems to be implemented to allow for clear/set highbits in limbs before the last limp (and also after the last limb in case of 'mpi_set_highbit'). But, the limbs that are are not cleared. So if you do a clear (or set) on bit 63 of a 128 bit MPI number (i.e. limb 0 of the 2 limbs MPI). The MPI number will now report to have 1 limb and 2 allocated limbs. Then doing a set_highbit on bit 120 of the MPI number will resuse the second limb (which is still filled with whatever that was there). For example:

byte buf[16];

int i = 0;

for(i = 0; i < 16; i ++) {

buf[i] = 0xff;

}

printf("mpi_clear_highbit / mpi_set_highbit test\n");

MPI t = mpi_alloc( 2 );

mpi_set_buffer( t, buf, 16, 0);

mpi_print( stdout, t, 1);

mpi_clear_highbit(t, 62);

`printf("\n");`

mpi_print( stdout, t, 1);

printf("\n");

mpi_set_highbit(t, 120);

mpi_print( stdout, t, 1);

This can be fixed by clearing the resused limbs in mpi_set_highbit.

But, the limbs that are are not cleared.

-->

But, the limbs that are

removedare not cleared.Yes, the highbit/lowbit behaviour is an ugly mess; I briefly touched on it previously: http://www.dianacoman.com/2017/12/07/introducing-eucrypt/#selection-133.195-133.456 and in more detail http://www.dianacoman.com/2017/12/28/eucrypt-chapter-3-miller-rabin-implementation/#selection-169.0-169.1019 .

The change you suggest seems reasonable to me but you'd need to fully comb through all of MPI to make sure that there is no other code that relies on the current idiotic behaviour of those two functions - for this sort of reason I honestly avoided modifying MPI unless I really, really had to (i.e. it was broken in some part I'm using).

[…] functions are found in truerandom.c. UPDATED on 4 January 2018: this code has been changed, see Chapter 4 and corresponding […]