Results of Testing UDP - Take 1

October 4th, 2018 by Diana Coman

For one full week, between 26 September 2018 and 3 October 2018, my UDP Tester ran on 2 computers, one in the UK and one in Uruguay (UY), sending and receiving UDP messages in both directions. On each side, the receiver ran continuously, logging all UDP messages that it received during the whole interval. By contrast, the sender ran on both sides hourly but at different times so that the communications did not overlap. I don’t expect it would have been any trouble even if they did overlap but this was meant to be a test of UDP under best conditions and for this reason I set the times so that the sender on one end always finished a full run before the one on the other end started. For the same reason, the messages were sent at a rate of at most 1 message per second [1].
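For illustration only, the setup above (a sender that pauses between messages and a receiver that logs whatever arrives) might look roughly like this in Python. This is not the UDP Tester's own code: the function names, the loopback address and the three sample sizes are all assumptions made for the sketch.

```python
import socket
import time

def send_paced(sock, dest, payloads, delay=1.0):
    """Send each payload as one UDP datagram, sleeping `delay` seconds between sends."""
    sent_log = []
    for i, payload in enumerate(payloads):
        if i > 0:
            time.sleep(delay)
        sock.sendto(payload, dest)
        sent_log.append((time.time(), dest, len(payload)))  # local time, destination, size
    return sent_log

def receive_n(sock, count, timeout=2.0):
    """Receive up to `count` datagrams, logging local time, source and size."""
    sock.settimeout(timeout)
    received_log = []
    try:
        for _ in range(count):
            data, source = sock.recvfrom(4096)
            received_log.append((time.time(), source, len(data)))
    except socket.timeout:
        pass  # UDP gives no delivery guarantee: stop waiting and report what arrived
    return received_log

if __name__ == "__main__":
    # Both ends on the loopback interface, purely as a demonstration.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payloads = [b"x" * n for n in (6, 100, 2048)]
    sent = send_paced(sender, receiver.getsockname(), payloads, delay=0.1)
    got = receive_n(receiver, len(sent))
    print(len(sent), "sent,", len(got), "received")
    sender.close()
    receiver.close()
```

The real test used a 1 second delay and two machines on different continents; the loopback run only shows the shape of the two logs.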

At each run, the sender sent exactly 2043 UDP messages with lengths between 6 and 2048, each message having a different length. The order of messages was pseudo-random, relying on the Mersenne Twister PRNG using as seed the local time at the start of the run (in Unix format). The sender kept a log of all messages it sent, including destination IP and port, seed used for MT, local time when the message was sent and size of the message. The receiver also logged basic information about each message: source IP and port; local time and message size, both as observed by the receiver and as contained in the message’s own header; the number of incorrect bits observed in the message’s payload; and the expected and actual values of incorrect octets.
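As a rough sketch of what one run involves, the Python below builds the 2043 lengths in a seeded pseudo-random order and checks a received payload against the Pos mod 256 fill pattern. It is an illustration under assumptions rather than the tester's code: the function names are made up, Pos is taken as 0-based over the whole message, the message header is ignored, and Python's shuffle will not reproduce the order the real tester produced for the same seed.

```python
import random
import time

MIN_LEN, MAX_LEN = 6, 2048  # message lengths used in the test

def run_lengths(seed):
    """Return every length from MIN_LEN to MAX_LEN exactly once, in seeded pseudo-random order."""
    rng = random.Random(seed)  # CPython's random module is a Mersenne Twister
    lengths = list(range(MIN_LEN, MAX_LEN + 1))
    rng.shuffle(lengths)
    return lengths

def expected_payload(length):
    """Fill pattern Pos mod 256 (assumption: Pos is 0-based, header ignored)."""
    return bytes(pos % 256 for pos in range(length))

def compare_payload(received):
    """Return (number of wrong bits, [(position, expected, actual) for each wrong octet])."""
    expected = expected_payload(len(received))
    wrong_octets = [
        (pos, e, a)
        for pos, (e, a) in enumerate(zip(expected, received))
        if e != a
    ]
    wrong_bits = sum(bin(e ^ a).count("1") for _, e, a in wrong_octets)
    return wrong_bits, wrong_octets

if __name__ == "__main__":
    seed = int(time.time())  # Unix time at the start of the run
    order = run_lengths(seed)
    print(len(order), "messages in this run; first lengths:", order[:5])

    damaged = bytearray(expected_payload(16))
    damaged[3] ^= 0b101  # flip two bits in one octet
    print(compare_payload(bytes(damaged)))
```

Logging the seed, as the sender did, is what makes a run's order reproducible afterwards with the same generator.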

A first look at the week-long data yielded a bit of a surprise in that the UY receiver had actually received *more* messages than were sent from the UK! At a closer look, it turned out that 4933 UDP messages arriving at the UY node were actually sent by its own local switch! And moreover, they were all, without exception, recorded as corrupted since neither size nor payload matched the expected values [2]. At the moment those switch-generated messages are a bit of a mystery - it’s unclear what they are exactly or why and how they appeared. Working hypotheses would be that they are either local DHCP messages (although the port number would be a weird choice for those) or stray frags of bigger UDP messages. My one single attempt to replicate this behaviour while simultaneously capturing everything with tcpdump has so far failed - there were no such unexpected messages at all over several hours of UK sender at work.

I might perhaps try again at a later date after I’m done with the more pressing tests that I need for SMG comms or simply process the existing error log and reconstitute the already observed weird messages from there. Anyway, for now I put those anomalous messages to the side and focus instead on the rest of the messages (which were sent as expected either by the node in the UK or by the one in UY). Here’s a summary of the data thus cleaned:

                      UK node       UY node
Total sent:           345928 [3]    345267 [4]
Total received:       344717 [5]    345183 [6]
% received:           99.84% [7]    99.78% [8]
Errors received [9]:  0             0
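The arithmetic behind the summary can be re-done in a few lines of Python. The totals are the ones reported in the table; the variable and function names are only illustrative. Each node's received count is compared against what the *other* node sent, as in footnotes 7 and 8.

```python
MESSAGES_PER_RUN = 2043  # one message for each length from 6 to 2048

# Totals as reported in the summary table
sent = {"UK": 345928, "UY": 345267}
received = {"UK": 344717, "UY": 345183}

def link_stats(sent_by_peer, received_here):
    """Return (messages lost, percentage received) for one direction."""
    lost = sent_by_peer - received_here
    pct_received = 100 * received_here / sent_by_peer
    return lost, pct_received

uk_lost, uk_pct = link_stats(sent["UY"], received["UK"])  # UY -> UK direction
uy_lost, uy_pct = link_stats(sent["UK"], received["UY"])  # UK -> UY direction
overall_loss_pct = 100 * (uk_lost + uy_lost) / (sent["UK"] + sent["UY"])

print(f"UK node: lost {uk_lost}, received {uk_pct:.2f}%")
print(f"UY node: lost {uy_lost}, received {uy_pct:.2f}%")
print(f"Overall loss: {overall_loss_pct:.2f}%")

# How many full runs of 2043 messages does each sent total correspond to?
for node, total in sent.items():
    full_runs, extra = divmod(total, MESSAGES_PER_RUN)
    print(f"{node} sent {full_runs} full runs plus {extra} messages")
```

By this arithmetic 345267 is exactly 169 × 2043, while 345928 is 169 full runs plus 661 further messages.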

Arguably the lost messages are of most interest in all the above: can one say perhaps that the largest messages [10] get lost more often? Not really or at least not based on this little set of data. Compare the summary stats for three groups of messages: all messages sent from the UK (reflecting as expected the sizes sent and the fact that the same number of messages of each size are sent), all messages lost at UY (i.e. did not make it on the way from UK to UY) and all messages lost at UK (i.e. did not make it on the way from UY to UK):

Data                  Min   1st Q   Median   Mean   3rd Q   Max
All sent from UK      6     516     1027     1027   1538    2048
Lost at UY (UK->UY)   13    513     1049     1051   1602    2045
Lost at UK (UY->UK)   16    553.5   1072     1061   1576    2047
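For anyone wanting to recompute such summaries from the logs, here is a minimal Python sketch of the same six figures. The tool and the quantile convention behind the table are not stated here, so the "inclusive" method below is an assumption and quartiles may differ from the table by a fraction; the function name is illustrative too.

```python
import statistics

def six_number_summary(sizes):
    """Min, quartiles, mean and max of a list of message sizes."""
    q1, median, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
    return {
        "min": min(sizes),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(sizes),
        "q3": q3,
        "max": max(sizes),
    }

if __name__ == "__main__":
    # One full run: every size from 6 to 2048 exactly once.
    one_run = list(range(6, 2049))
    print(six_number_summary(one_run))
```

Applied to the sizes of the lost messages from each receiver's log, the same function gives the rows to compare against the "all sent" baseline.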

While the data set of lost messages is quite small (550 messages lost at UK and 745 at UY), note that this is mainly due to the fact that there are relatively few losses overall: less than 0.4% of messages sent got lost on the way. So it would seem that at least under the conditions and on the routes considered [11], UDP is not all that unreliable anyway. In any case, those summaries above seem to me remarkably close to one another - meaning that there isn’t any visible evidence that some sizes would get lost more than others, at least not for the set of sizes considered. Arguably sizes of up to 2048 octets of message are quite fine for communications over UDP - or at any rate, just as fine as smaller sizes.

In terms of order of received messages, the UY node received ALL messages precisely in the order in which they were sent but the UK node reported 66 messages in total that arrived out of order. Although this is a tiny number, it is perhaps reasonable to assume that it might increase in worse conditions (e.g. significantly less than 1 second between sending messages).
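How exactly "out of order" was counted is not spelled out here, so the following Python sketch only shows one plausible definition: a message counts as out of order if it arrives after a message that was sent later. The sequence numbers (position of each message in the sender's log) and the function name are assumptions.

```python
def count_out_of_order(received_seq):
    """Count messages that arrive after a later-sent message has already arrived.

    `received_seq` holds, in arrival order, each message's position in the sender's log.
    Lost messages simply never appear and are not counted.
    """
    highest_seen = -1
    out_of_order = 0
    for seq in received_seq:
        if seq < highest_seen:
            out_of_order += 1
        else:
            highest_seen = seq
    return out_of_order

if __name__ == "__main__":
    print(count_out_of_order([0, 1, 2, 3]))  # all in order
    print(count_out_of_order([0, 2, 1, 3]))  # message 1 overtaken by message 2
    print(count_out_of_order([0, 1, 3]))     # message 2 lost, the rest in order
```

Other definitions (for instance counting inversions) would give different totals for the same log, so the 66 above should be read with the tester's own definition in mind.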

The actual timings are a bit iffier to investigate since the precision of UDP Tester turns out to be less than what would be needed for such task. Moreover, there is something weird going on with the way I recorded the time because the difference between the two nodes should be of ~34 seconds (UY node local time = UK node local time + 34) but this doesn't quite square with all the data especially at the UY receiver end [12]. On the more positive side though, at least the measurement bias there is constant for all the data and it doesn't introduce any weird effects so I can still attempt to infer something considering also that observed behaviour suggests that most UDP messages really make it to the other end within 1 second. Consequently, I calculated the delta on both sides as TR - TS at first and then I added (on UK side) respectively subtracted (on UY side) the quantity needed to make the lowest delta 0. So at the UY receiver, delta = TR - TS - 11 while at the UK receiver, delta = TR - TS + 32. With this correction, the summary stats for the delta on both sides turn out to be remarkably similar:

Data                 Min   1st Q   Median   Mean    3rd Q   Max
Deltas at UK node:   0     5       11       10.60   16      21
Deltas at UY node:   0     5       11       10.62   16      21

Note that I do *not* recommend taking the above delta values for anything really, as the tester's precision in recording time is just not enough for this.

You are of course warmly invited to run your own tests and to play with this dataset in any way you see fit. So here's the data from both nodes, including the additional 4933 messages that the UY node received from its own switch:

  • (~10MB)
  • SHA512SUM: 963b8a1467630eea35532122ab7c2d25cb8741001808841f7cf02b34abb6ad5300adcb1d667dd902b4278dd2b373dc46427b0b0bbc918ee52f326456535a4114
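To check a downloaded copy against the checksum above, something along these lines should do in Python. The digest is the one published here; the function names are illustrative and the path of the downloaded archive is whatever you saved it as.

```python
import hashlib

# SHA512 digest published alongside the dataset
PUBLISHED_SHA512 = (
    "963b8a1467630eea35532122ab7c2d25cb8741001808841f7cf02b34abb6ad53"
    "00adcb1d667dd902b4278dd2b373dc46427b0b0bbc918ee52f326456535a4114"
)

def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path):
    """True if the file at `path` has the published SHA512 digest."""
    return sha512_of(path) == PUBLISHED_SHA512

# Usage, with the path of your own download:
#   print(matches_published("path/to/downloaded/archive"))
```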

Have fun!

  1. Specifically: the sender had a delay of 1 second between any two consecutive messages. 

  2. The UDP tester simply fills the message up to any length with values calculated as Pos mod 256 where Pos is the position of the respective octet in the full message. 

  3. This is 345928/2043 ≈ 169.3 runs, i.e. 169 full runs (169 × 2043 = 345267 messages) plus 661 further messages.

  4. This includes a partial 170th run since I stopped the whole test while the UY sender was running already its 170th run. 

  5. 550 messages lost in total. 

  6. 745 messages lost in total. 

  7. 344717 / 345267 * 100 

  8. 345183 / 345928 * 100 

  9. This refers to messages received but with payloads that don’t match the expected values. 

  10. Note that this test capped the messages at 2048 so “largest” here means strictly < 2049 octets. 

  11. The UK node is a “consumer” node i.e. behind a router and on a residential connection; the UY node is S.MG’s test server with Pizarro. 

  12. Considering TR as TimeReceived and TS as TimeSent, at UY receiver the delta should be calculated as TR - (TS + 34) = TR - TS - 34; however, there are entries with TR-TS as low as 11 so basically it would seem that messages arrived before they were even sent. 


8 Responses to “Results of Testing UDP - Take 1”

  1. PeterL says:

    "less than 0.4% of messages sent got lost on the way." This should be less than 0.2% of messages. Although, that is also less than 0.4%, so you are technically correct.

  2. Diana Coman says:

    Yeah, overall the observed figure is less than 0.2% strictly speaking but on UY side it is actually slightly more than that so I basically rounded it up - not that it makes much difference since it's not a number you can take as a fixed value anyway (i.e. I wouldn't expect the *same* exact number on *all* repeat tests).

  3. BT says:

    (Dropping comment here, as commenting is disabled on the Reference Code Shelf):

    There is a problem with patch eucrypt_keccak_bitrate_fix.vpatch after it was converted to keccak: this patch intends to update file eucrypt/smg_keccak/smg_keccak.adb with hash 88e40423c88ba2ac7d44225b388794d61719746b02412e2dae4684bcfa72399978d599f9301b4a2be101b41769d3c5b20de6ff94e76a01ff767edc00746b8b96, but files with this hash are not present in any other patch.

    The list of files available to vtron:

    Also, there is a typo in the filename eucrypt_ch10_oeap_tmsr.vpatch (oaep in signature, oeap in vpatch).

  4. Diana Coman says:

    I do see that hash though, namely in ch10 patch:

    And from your list of files available to vtron it would seem that the ch10 vpatch IS present so not sure what is going on there. Perhaps it's either the old vpatch? Or maybe some ordering issue?

    I've checked now again both the hashes with a grep (it shows the hash in the 2 .vpatch files so it seems fine) and then by pressing with vtools - it presses fine here. I'm using now an adapted that calls vpatch from vtools and that is fine too so I'm a bit at a loss. Would you check you got the correct ch10 .vpatch and that it does get pressed *before* the bitrate_fix .vpatch?

  5. BT says:

    That was my mistake, sorry for the noise. I messed up the URL, file eucrypt_ch10_oaep_tmsr.vpatch was actually missing.

  6. Diana Coman says:

    No worries, glad it's sorted!

    And welcome to #trilema btw.

  7. [...] the first test revealed that most UDP packages make it safely at least when sent 1 second apart, I decided that [...]

  8. [...] delay there - although it could end up then either firing too many too quickly resulting in more losses than usual or otherwise becoming a bottleneck for the whole application on top of it. At any rate, the exact [...]
