Ossa Sepia

February 28, 2019

ZCX vs SJLJ - Data Set

Filed under: Coding — Diana Coman @ 5:37 p.m.

This all started with a misbehaving Ada program: the one that keeps "killing" its tasks with blind bullets that do precisely and exactly nothing. The horror of having code that refuses to actually abort its own spawned tasks is amply discussed in the logs linked above but I think it's worth pointing out the thick layers of horrid at play: it's not only (as if that wasn't enough by itself) that it won't abort its tasks but also that it does this on the quiet as it were - the abort code is effectively hijacked to do nothing for as long as one uses the default ZCX1. Now why exactly is ZCX the default given that it's not providing what the Ada standard says it should? And what is anyway its rationale2 for chopping away bits of the standard? No idea really, the docs are silent on this: it's apparently enough bother that they admit at least that indeed "the ZCX run-time does not support asynchronous abort of tasks(abort and select-then-abort constructs)" and otherwise throw in there an unsubstantiated statement that "most programs should experience a substantial improvement by being compiled with a ZCX run-time." Why would they bother with rationale or with justifying decisions - nobody ever asks for or indeed reads such things anyway as they are way too complex, aren't they?

Once the culprit was identified as ZCX, a quick test3 showed that the older SJLJ4 is indeed perfectly capable of killing spawned tasks. And so the investigation moved on to the possible cost of using SJLJ vs ZCX. Supposedly ZCX is faster, SJLJ is slower but suppositions are one thing and hard data quite another. To figure out if that's indeed the case, one needs simply to design an experiment for the exact sort of thing investigated and then set to work implementing and running it. Given the docs' claim that SJLJ comes with a signficant penalty for any exception handlers present, the experiment included also runs with and without exception handlers to try and gauge if those add indeed signficantly to the running time. Specifically, there were two main programs used (linked below if you want to reproduce this experiment):
1. Nested procedure calls (with and without exception handlers): there are three procedures A, B, C that increment each a global counter and pseudo-randomly5 call one or both of the other two procedures for as long as the counter is below a certain value. Note that you'll likely need to increase the size of the stack for this experiment since otherwise it will quickly overflow.
2. Nested for loops with Serpent encryption (NO exception handlers): there are between 1 and 22 for loops counting either from 1 to 10 or from 1 to 100 and going to the next level on a subset of values (those divisible by 2, by 3 or by 4 - those are essentially knobs to adjust for different runs); the inner-most loop performs a Serpent encryption.

I ran this experiment on two different computers:

  • S.MG Test server: this is an x86_64 AMD Opteron(tm) Processor 6376 (8 cores, 2.3GHz).
  • A torture-room desktop code-named D1: this is an Intel i5, 4GB RAM (2 cores, 3.2GHz).

The S.MG Test server runs 3 different incarnations of Ave1's GNAT: the ZCX variant which is obtained as a straight compilation using Ave1's scripts from the 15th of July 2018; the SJLJ-compilation of the same (i.e. supporting both dynamic and static linking); the SJLJ compilation of the latest, static-only version from September 2018 .

The D1 machine runs Adacore's GNAT 2016 and simply switches between ZCX and SJLJ via the --RTS parameter given to gprbuild.

The full set of data obtained is this:

S.MG Test server
Ave1's GNAT, build 301 (dynamic linking) and build 202 (static only) for SJLJ, older Ave1's GNAT for ZCX

Procedure Calls testing  with MT as PRNG;
Max = 16777216; 3 procs, mod 3 calls both, one or the other of the 2;
NO serpent calls; X final gives the total number of calls;
Seeding of MT: 16#111#, 16#222#, 16#333#, 16#444#  -> X = 22368144.
NB: as MT is used, X final is repeatable between runs (same seed).

Handlers | ZCX (s)  | SJLJ (s) (build 301) | SJLJ (s) (build 202 - static only)
_______________________________________________________________________________
0        | 1.119    | 1.211                | 1.165
0        | 1.174    | 1.210                | 1.161
1        | 1.299    | 4.106                | 3.850
1        | 1.302    | 3.777                | 3.755
2        | 1.599    | 3.929                | 3.301
3        | 1.862    | 4.125                | 4.193

 Nested Loops testing 
loops 1 to 100; if mod 2
Loops (runs)    | ZCX	(s)    |	Serpent Timing             | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1 (a)  (1k runs)| 0.000168893  | 0.000168893 / (50^1) = 3.377e-6   |   0.000168199   | 0.000168727
2 (b)  (1k runs)| 0.007213758  | 0.007213758 / (50^2) = 2.8855e-6  |   0.007055130   | 0.007084592
3 (c)  (1k runs)| 0.351611073  | 0.351611073 / (50^3) = 2.81e-6    |   0.352684722   | 0.351471743
4 (d)  (1 run)  | 17.740324000 | 17.740324000 / (50^4) = 2.83e-6   |  17.580107000   | 17.73699000
5 (e)  (1 run)  | 879.95117100 | 879.951171000 / (50^5) = 2.81e-6  | 881.139986000   | 879.7342540

loops 1 to 10; if mod 4
Loops (runs=1)  | ZCX (s)      | Time per Serpent (ZCX)            | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1               | 0.000008     | 0.000008 / (2^1) = 4e-6           |  0.000010       |  0.000009
2               | 0.000017     | 0.000017 / (2^2) = 4.25e-6        |  0.000016       |  0.000017
3               | 0.000031     | 0.000031 / (2^3) = 3.875e-6       |  0.000030       |  0.000024
4               | 0.000046     | 0.000046 / (2^4) = 2.875e-6       |  0.000057       |  0.000056
5               | 0.000150     | 0.000150 / (2^5) = 4.6875e-6      |  0.000110       |  0.000089
10              | 0.027650     | 0.027650 / (2^10)= 2.7e-5         |  0.002937       |  0.002915
22              | 11.98272     | 11.98272 / (2^22)= 2.8569e-6      | 12.045471       | 12.011244

loops 1 to 10; if mod 3
Loops (runs=1)  | ZCX (s)      | Time per Serpent (ZCX)            | SJLJ (s) (b301) | SJLJ (s) (b202 - static ONLY)
_____________________________________________________________________________________________________________________
1               | 0.000012000  | 0.000012000 / (3^1) = 4e-6        |   0.000013000   |   0.000010000
2               | 0.000034000  | 0.000034000 / (3^2) = 3.778e-6    |   0.000027000   |   0.000033000
3               | 0.000076000  | 0.000076000 / (3^3) = 2.815e-6    |   0.000111000   |   0.000075000
4               | 0.000219000  | 0.000219000 / (3^4) = 2.704e-6    |   0.000221000   |   0.000220000
5               | 0.000654000  | 0.000654000 / (3^5) = 2.691e-6    |   0.000823000   |   0.000673000
10              | 0.167347000  | 0.167347000 / (3^10)= 2.834e-6    |   0.167906000   |   0.167250000
15              | 40.69105200  | 40.69105200 / (3^15)= 2.836e-6    |  40.743956000   |  40.716526000
16              | 121.9171600  | 121.9171600 / (3^16)= 2.832e-6    | 123.760958000   | 121.816518000

D1 computer 
Adacore's GNAT 2016, switching with --RTS between ZCX and SJLJ

Procedure Calls testing  with MT as PRNG;
Max = 16777216; 3 procs, mod 3 calls both, one or the other of the 2;
NO serpent calls; X final gives the total number of calls;
Seeding of MT: 16#111#, 16#222#, 16#333#, 16#444#  -> X = 22368144.
NB: as MT is used, X final is repeatable between runs (same seed).

Handlers | ZCX (s)  | SJLJ (s)
______________________________
0        | 0.896    | 0.882
0        | 0.905    | 0.906
1        | 1.058    | 184.516
1        | 1.064    | 265.339
2        | 1.215    | 372.574
3        | 1.329    | 446.821

 Nested Loops testing 
loops 1 to 100; if mod 2
Loops           | ZCX (s)      | Time per Serpent                  | SJLJ (s)
___________________________________________________________________________________
1 (a)  (1k runs)|   0.000095269|  0.000095269 / (50^1) = 1.905e-6  |   0.000098855
2 (b)  (1k runs)|   0.004609351|  0.004609351 / (50^2) = 1.844e-6  |   0.004598492
3 (c)  (1k runs)|   0.231582664|  0.231582664 / (50^3) = 1.853e-6  |   0.230719036
4 (d)  (1 run)  |  11.548194   | 11.548194 / (50^4)    = 1.848e-6  |  11.579138
5 (e)  (1 run)  | 580.261706   |580.261706 / (50^5)    = 1.857e-6  | 586.996228

loops 1 to 10; if mod 4
Loops (runs=1)  | ZCX (s)      | Time per Serpent                  | SJLJ (s)
_______________________________________________________________________________
1               | 0.000009     | 0.000009 / (2^1) = 4.5e-6         | 0.000013
2               | 0.000023     | 0.000023 / (2^2) = 5.75e-6        | 0.000023
3               | 0.000036     | 0.000036 / (2^3) = 4.5e-6         | 0.000043
4               | 0.000083     | 0.000083 / (2^4) = 5.18e-6        | 0.000100
5               | 0.000167     | 0.000167 / (2^5) = 5.218e-6       | 0.000167
10              | 0.005537     | 0.005537 / (2^10)= 5.4e-6         | 0.007399
22              | 8.007915     | 8.007915 / (2^22)= 1.909e-6       | 7.944913
22              | 7.971711     | 7.971711 / (2^22)= 1.9006e-6      | 7.966864         

loops 1 to 10; if mod 3
Loops (runs=1)  | ZCX (s)      | Time per Serpent                  | SJLJ (s)
_______________________________________________________________________________
1               |  0.000008    |  0.000008 / (3^1)  = 2.6e-6       |  0.000009
2               |  0.000017    |  0.000017 / (3^2)  = 1.8e-6       |  0.000019
3               |  0.000142    |  0.000142 / (3^3)  = 1.75e-6      |  0.000143
4               |  0.000427    |  0.000427 / (3^4)  = 5.27e-6      |  0.000150
5               |  0.001267    |  0.001267 / (3^5)  = 5.21e-6      |  0.001259
10              |  0.107943    |  0.107943 / (3^10) = 1.82e-6      |  0.113296
15              | 27.163182    | 27.163182 / (3^15) = 1.89e-6      | 27.167316
16              | 81.724382    | 81.724382 / (3^16) = 1.89e-6      | 81.951616        

Based on the data above, my conclusion so far is that there is rather precious little to justify the use of ZCX: when no exception handlers are present in the code, the running times are similar under ZCX and SJLJ; when exception handlers are present in the code, there is indeed a penalty for using SJLJ but this penalty is not all that big on some irons. This being said, there is little reason that I can see for having lots of exception handlers in any sane code anyway so I'm really not all that concerned about this honest cost of SJLJ especially when compared with the "you can't kill your own tasks" cost of the so-called ZCX.

Separate from the above but still relevant, there are still quite a few issues remaining with SJLJ including the fact that it's apparently broken on ARM. But every pain in its own time and so for now simply look at the data above and let me know: do you see any real reason why one would *not* simply go with SJLJ?


  1. "Zero-Cost Exceptions" being the full name of this great idea of reducing costs by not doing the work - meaning here by simply choosing to not implement some parts of the Ada standard, namely the asynchronous abort.  

  2. Or its excuse at the very least since rationale might be too much to ask for under those circumstances. 

  3. You can try this yourself: the code is in test_tasks_ada.zip, download, read it, run it and see if the tasks actually abort or not. Try it also with SJLJ (gprbuild --RTS=sjlj) and spot the difference. 

  4. "Setjmp / longjmp" by its name, the older exception handling model that actually implements the Ada standard so it allows asynchronous abort. 

  5. Using the Mersenne-Twister pseudo-random generator

2 Comments »

  1. [...] republican standard. Still, as part of investigating those unwanted surprises and especially the actual differences between ZCX and SJLJ, I gained a lot more experience with compiling GNAT itself in all sorts of configurations and on [...]

    Pingback by GNAT Compilation Notes « Ossasepia — March 4, 2019 @ 4:56 p.m.

  2. [...] Miserable failures and unwanted problems make great blog fodder1 and so here I am, writing yet another set of compilation notes. This time it's all about how to ditch the non-standard ZCX run-time for its well-behaved and otherwise similarly fast ancestor, SJLJ. [...]

    Pingback by Compiling Crystal Space with SJLJ « Ossasepia — March 11, 2019 @ 12:43 p.m.

RSS feed for comments on this post. TrackBack URL

Leave a comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <pre> <q cite=""> <strike> <strong>

Theme and content by Diana Coman