Over the past few days I've taken the time to finally go through all the currently published chapters of FFA (1-19, with some bis and a, b, c versions along the way too) and came out with a clearer idea of the whole thing as it builds up from basic definitions of large integers to the Peh interpreter machine with its own registers, two stacks (data and control, respectively) and the Cutouts mechanism for controlling access to sensitive data. While initially I was planning simply to add to my previous notes on FFA Ch1-Ch4, my experience of getting back to FFA after a significant break pointed to the need for a higher-level overview of the what-where-why even before any such detailed illustrations of the code involved. So here's my current overview of FFA - open to any criticism you are able to throw at it, as always - consisting first of a brief list of the main approaches in FFA and then of the 4 "layers" that I see to the whole thing so far.
Main FFA Approaches and Algorithms
- Strictly non-branching on input values and therefore practically the death of "if". No ifs, no buts, so what's left is MUXes^{1}: essentially the "if" goes down one level, from a logical switch (which operation to perform) to a mechanical one (which buffer to copy the result from); the same operation is always performed but its result is not always used.
- Strictly constant-time execution for any given input size. In practice this means strictly worst-case time, of course, as is always the case when you require everything to be the same: the only way for all to be equal is to make them all equal to the worst, no way around it. Then again, some people's worst turns out better than other people's best, so take "worst" with a pinch of context, as always.
- Comba multiplication algorithm: calculating sequentially the *columns* of a product; additional optimisations for the squaring case (mainly avoiding calculating the same products twice).
- Karatsuba multiplication algorithm: calculating the product XY as a combination of cheaper operations (additions and shifts preferred by far to multiplications) on the half-words of X and Y; additional optimisations for the squaring case.
- Barrett's modular reduction: calculating X mod M by means of an approximation using the "Barrettoid", B, that can be precomputed for any given modulus; X mod M is approximated first as X - M*B and then adjusted by at most 2 subtractions of M (i.e. the error of X - M*B as an approximation of X mod M has only three possible values: 0, M or 2M).
- Stein's binary GCD: rather a quasi-Stein, as it's adapted to FFA's requirements, most notably the constant-time execution requirement (by iterating over the full number of bits rather than stopping when the smaller operand reaches 0).
- Miller-Rabin compositeness test: an adapted version of MR, forcing any given witness into the acceptable range and providing a "one-shot" test for any given value N and witness W.
- Peh interpreter machine: basically the "cryptographic machine" that the user gets to push the buttons of; Peh has a data stack and a control stack, a set of registers and an instruction set including the ability to define and call subroutines and loops as well as to define Cutouts aka a protected area for sensitive data.
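To make the first point above concrete, here's my own minimal sketch of the branch-free MUX idea in C (FFA's actual code is Ada and operates on its own word type; the function name here is illustrative):

```c
#include <stdint.h>

/* Constant-time MUX sketch: returns a when sel == 1, b when sel == 0,
   with no branch on sel. sel MUST be exactly 0 or 1. The "if" becomes
   a mask: the caller always computes both candidates, then copies one. */
static uint64_t ct_mux(uint64_t sel, uint64_t a, uint64_t b) {
    uint64_t mask = (uint64_t)0 - sel;  /* sel=1 -> all ones; sel=0 -> all zeros */
    return (a & mask) | (b & ~mask);
}
```

Note that both `a` and `b` must already exist when the MUX runs: the same work is always performed, only the copy differs.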
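The column-wise idea of Comba multiplication can be sketched as follows (my illustration with 16-bit "digits" and a fixed digit count, not FFA's Ada code, which works on full machine words with an explicit multi-word accumulator):

```c
#include <stdint.h>

/* Comba (column-wise) multiplication sketch: little-endian 16-bit
   digits, 64-bit accumulator. Computes the 2N-digit product of two
   N-digit numbers by summing each *column* of partial products in
   turn, then propagating the carry to the next column. */
#define N 4
static void comba_mul(const uint16_t *a, const uint16_t *b, uint16_t *z) {
    uint64_t acc = 0;                      /* running column accumulator */
    for (int k = 0; k < 2 * N - 1; k++) {
        int lo = k < N ? 0 : k - N + 1;    /* indices i with 0<=i<N and */
        int hi = k < N ? k : N - 1;        /* 0 <= k-i < N              */
        for (int i = lo; i <= hi; i++)
            acc += (uint64_t)a[i] * b[k - i];
        z[k] = (uint16_t)acc;              /* emit digit k of the product */
        acc >>= 16;                        /* carry into the next column  */
    }
    z[2 * N - 1] = (uint16_t)acc;
}
```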
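The Karatsuba identity itself fits in a few lines; here's one step of it on a 32-bit toy case (FFA applies the same identity recursively on FZ halves, down to a Comba base case):

```c
#include <stdint.h>

/* One Karatsuba step: split 32-bit X, Y into 16-bit halves and use
   XY = hi*2^32 + ((x1+x0)(y1+y0) - hi - lo)*2^16 + lo,
   i.e. three half-size multiplications instead of four, at the cost
   of a few cheap additions and shifts. */
static uint64_t karatsuba32(uint32_t x, uint32_t y) {
    uint32_t x1 = x >> 16, x0 = x & 0xFFFF;
    uint32_t y1 = y >> 16, y0 = y & 0xFFFF;
    uint64_t hi  = (uint64_t)x1 * y1;
    uint64_t lo  = (uint64_t)x0 * y0;
    uint64_t mid = (uint64_t)(x1 + x0) * (y1 + y0) - hi - lo;
    return (hi << 32) + (mid << 16) + lo;
}
```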
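The shape of Barrett's reduction can be shown on a 32-bit toy case (a sketch of the textbook scheme, not FFA's full Barrettoid, and written with plain ifs for readability where FFA would use MUXes; assumes M > 1):

```c
#include <stdint.h>

/* Precompute the Barrett constant B = floor(2^32 / m) once per modulus. */
static uint32_t barrett_precompute(uint32_t m) {
    return (uint32_t)(((uint64_t)1 << 32) / m);
}

/* Reduce x mod m: one multiply and shift give an approximate quotient
   that is off by at most 2, so at most two subtractions of m fix up
   the remainder (error values: 0, m or 2m). */
static uint32_t barrett_reduce(uint32_t x, uint32_t m, uint32_t b) {
    uint64_t q = ((uint64_t)x * b) >> 32;   /* approximate floor(x/m) */
    uint32_t r = x - (uint32_t)(q * m);     /* approximate remainder  */
    if (r >= m) r -= m;                     /* at most two corrections */
    if (r >= m) r -= m;
    return r;
}
```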
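The quasi-Stein flavour - same work for a fixed number of iterations, no early exit - looks roughly like this (my sketch with readable branches where the real thing would MUX, and with a generously rounded fixed bound; assumes the inputs are not both zero):

```c
#include <stdint.h>

/* Binary GCD with a FIXED iteration count (generous ~4x the word
   size), so the loop never exits early on the values: once the gcd
   is reached, the remaining iterations are harmless no-ops. The
   invariant throughout: gcd(a0, b0) == gcd(a, b) << twos. */
static uint64_t ct_gcd(uint64_t a, uint64_t b) {
    uint64_t twos = 0;
    for (int i = 0; i < 256; i++) {
        int a_odd = (int)(a & 1), b_odd = (int)(b & 1);
        if (!a_odd && !b_odd) { a >>= 1; b >>= 1; twos++; } /* strip shared 2 */
        else if (!a_odd)      { a >>= 1; }                  /* drop odd-side-irrelevant 2 */
        else if (!b_odd)      { b >>= 1; }
        else {                 /* both odd: replace larger with difference */
            uint64_t hi = a > b ? a : b, lo = a > b ? b : a;
            a = hi - lo; b = lo;
        }
    }
    /* one of a, b is now 0; restore the stripped factors of two */
    return (a | b) << twos;
}
```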
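And the "one-shot with a clamped witness" shape of the MR test, again on a 32-bit toy case (a sketch of the standard single-witness round; the exact way FFA forces the witness into range differs in detail, and of course FFA does all of this in constant time on FZs):

```c
#include <stdint.h>

/* Modular exponentiation helper; m < 2^32 so products fit in 64 bits. */
static uint32_t powmod(uint64_t base, uint32_t exp, uint32_t m) {
    uint64_t r = 1; base %= m;
    while (exp) {
        if (exp & 1) r = r * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return (uint32_t)r;
}

/* One-shot Miller-Rabin for odd n > 3 with a single witness w:
   w is first forced into the valid range [2, n-2], then one round
   runs. Returns 1 if w proves n composite, 0 if n passes this test
   (which is NOT a proof of primality). */
static int mr_composite(uint32_t n, uint32_t w) {
    uint32_t d = n - 1; int s = 0;
    while (!(d & 1)) { d >>= 1; s++; }        /* n-1 = d * 2^s, d odd */
    w = 2 + w % (n - 3);                      /* clamp into [2, n-2]  */
    uint32_t x = powmod(w, d, n);
    if (x == 1 || x == n - 1) return 0;
    for (int i = 1; i < s; i++) {
        x = (uint32_t)((uint64_t)x * x % n);
        if (x == n - 1) return 0;
    }
    return 1;
}
```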
FFA Layers (from bottom up)
- Representation of arbitrarily large integers as fixed-size (i.e. specified for every run) arrays of machine-words (with iron-specific size of machine-words changeable via knobs in the code). In FFA terminology, such integers are called "FZ".
- Basic arithmetical, modular-arithmetical and logical operations on FZ entities, running in constant time and handling any overflows/underflows. This part builds up from the most "naive" approach (aka the most straightforward, if rather costly one), dubbed "egyptological", through successive refinements (optimisations of the original algorithm followed by radical changes of approach), until the final result uses Barrett's reduction for modular operations, Karatsuba with a Comba core case (i.e. below empirically-established thresholds) for multiplications, and a variation of those (Karatsuba + Comba + specific threshold) specifically optimised for squarings.
- Primality-related algorithms working in constant time on FZ entities: Stein's binary GCD^{2} and an adapted MR^{3} that can adequately handle any value of a given witness (by forcing it effectively within the valid range).
- Peh, the finite-tape, dual-stack and cutouts-enabled interpreter. While I still need to turn this inside out a few times more, I can tell already that I want a Peh-iron!
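For the bottom layer, the FZ idea - an arbitrarily large integer as a fixed-length array of machine words, with every operation touching every word - can be sketched in C like this (names and layout are my illustration only; FFA's FZ is an Ada type, with the word size a compile-time knob, hardcoded here to 64 bits):

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t word;             /* iron-specific word size: a knob */

/* An FZ-style integer: a word count fixed for the whole run and a
   little-endian word array (w[0] least significant). Never resized. */
typedef struct {
    size_t  len;
    word   *w;
} fz;

static fz fz_new(size_t len) {
    fz z = { len, calloc(len, sizeof(word)) };  /* NULL check omitted in sketch */
    return z;
}

/* Full-width add with carry out, FFA-style: every word is processed
   regardless of the values, and the carry is derived without branching.
   The >> 63 ties this sketch to 64-bit words. */
static word fz_add(const fz *x, const fz *y, fz *sum) {
    word carry = 0;
    for (size_t i = 0; i < x->len; i++) {
        word a = x->w[i], b = y->w[i];
        word s = a + b + carry;
        carry = ((a & b) | ((a | b) & ~s)) >> 63;  /* carry out of top bit */
        sum->w[i] = s;
    }
    return carry;                  /* 1 iff the sum overflowed the FZ */
}
```

The caller, not the library, decides the width, and an overflow is reported rather than silently grown into - which is precisely what makes constant-time operation possible.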
And while there are still a lot of experiments and timings and playings-about to go through with all of FFA, I'm happy at this point to sign the remaining chapters: I went through them with pen in hand and, as far as I can tell, they do what it says on the tin (not to mention that I certainly would use FFA and in fact I'm itching to drop already the shitty gpg, ffs!):
- ffa_ch12_karatsuba_redux.kv.vpatch
- ffa_w_borrow_expr.kv.vpatch
- ffa_ch13_measure_and_qshifts.kv.vpatch
- ffa_ch14_barrett.kv.vpatch
- ffa_ch15_gcd.kv.vpatch
- ffa_ch16_miller_rabin.kv.vpatch
- ffa_ch17_peh.kv.vpatch
- ffa_ch18_subroutines.kv.vpatch
- ffa_ch19_peh_tuning_and_demos.kv.vpatch
Dear Diana,
This is an accurate summary. Would add to (1) in "Approaches & Algorithms", however, that the Mux is not only to avoid branching on secrets, but also memory indexing on same (on irons having a cache of any form, indexing memory by a secret reliably leaks the secret.)
Yours,
-S
Comment by Stanislav Datskovskiy — July 29, 2019 @ 1:51 p.m.
Good point, thank you!
Comment by Diana Coman — July 29, 2019 @ 2:52 p.m.