Ossa Sepia

December 14, 2021

The Half-Life of UDP Sockets or REUSEADDR

Filed under: Coding,Open Sores — Diana Coman @ 1:33 pm


“After reading deep in the bowels of code…”
“Bowels of code sounds like a pretty awesome haunted house btw”

To start off, here’s how sockets are meant to be conceived of, in the first place, straight from the original source, namely BSD docs 1:

BSD sockets are built on the basic UNIX® model: Everything is a file.

So far so good and indeed, the BSD manual pages for the functions that obtain a socket in the first place fit well with this definition. For instance, to create a new socket, one uses the “socket” function that requires the family (e.g. AF_INET for IPv4), type (e.g. SOCK_DGRAM for UDP sockets) and protocol (e.g. IPPROTO_UDP) and has the following description:

socket() returns a socket file descriptor (sockFD) which is a small non-negative integer. This file descriptor number should be used for all other socket operations on that socket. If socket() encounters an error, it will return -1, in which case the application should call GetMITLibError() to get the error code.

By contrast to the above though, the Linux manual page 2 for socket comes with a different 3 – improved? you tell me – description of the socket() function that takes the domain (aka the “family” in BSD terms, e.g. AF_INET for IPv4), type (e.g. SOCK_DGRAM) and protocol (e.g. IPPROTO_UDP):

socket() creates an endpoint for communication and returns a descriptor.

Since the Linux page discards most of the specific, concrete information provided by the BSD page and introduces instead a general and undefined “endpoint for communication” term, let’s see what that might be. Perhaps the docs of the well-used Wireshark tool serve as a reasonable source, so here’s their definition of what a network endpoint is specifically when the UDP protocol is used since UDP is what I’m interested in anyway:

A combination of the IP address and the UDP port used, so different UDP ports on the same IP address are different UDP endpoints.

So an endpoint is a pair (IP address, port) and this makes some sense, since this is after all how the packets are delivered, from one (IP address, port) to another. But it’s dubious how a socket would *create* an endpoint as the Linux manual page claims, since the socket function does not require either IP address or port at all. Moreover, the “everything is a file” model seems in no need of “improvements” like this since it works perfectly fine: a socket is defined by a (family, type, protocol) set and socket() returns a file descriptor to be used for all further operations. There is, of course, the step *after* creating a socket, when the socket is *bound to an endpoint*, meaning that indeed, a pair (IP address, port) is given, but it doesn’t bode well when the documentation itself seems rather confused between two steps that are not only separate but literally require two different function calls entirely.
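To make the two separate steps concrete, here is a minimal sketch in C, assuming a POSIX system with a loopback interface. The helper names local_port and bind_loopback are made up for this illustration and are not part of any API. The sketch only shows that socket() hands back a descriptor with no port attached, and that the (IP address, port) pair appears only once bind() is called.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: report the local port (host byte order) that the
 * socket is bound to, 0 if it is not bound yet, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}

/* Hypothetical helper for the second step only: bind an existing socket
 * to 127.0.0.1 and the given port (0 lets the kernel choose one).
 * Returns 0 on success, -1 on error. */
int bind_loopback(int fd, unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    return bind(fd, (struct sockaddr *)&addr, sizeof addr);
}
```

A caller would first do fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP), at which point local_port(fd) is expected to report 0, and only after bind_loopback(fd, 0) succeeds should it report an actual port number.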

Obviously, you can wave away this sort of “improvements” in manual entries, after all it’s just “a small thing” and “unimportant” and “don’t split hairs” over nothing etc. Absolutely, it’s all nice and rosy and look how wonderfully it plays out in practice, step by step:

  1. First, if a socket “creates an endpoint,” it follows quite logically that there can always be at most one single socket for any specific (IP, port) pair. This is already a bit of an odd interpretation of that “everything is a file” approach since it adds to it the equivalent of “there can always be at most one single file descriptor at a time for any specific file” but sure, not a problem yet in itself, since it’s a well thought design and model so everything will work out splendidly, definitely.
  2. Second, when a socket is closed, it follows quite logically that it will release that (IP, port) pair, which will thus become available again to be bound to a new socket. Except there’s a snag in practice at this point, since that release is not done immediately and moreover, it can *not* be forced to happen immediately either, no matter what one sets. The manual page makes that much clear, stating happily that a socket may “linger” and you can set a timer if that makes you happy but you shouldn’t expect it to make a difference where it matters – because it won’t:

    SO_LINGER
    Sets or gets the SO_LINGER option. The argument is a linger structure.
    struct linger {
    int l_onoff; /* linger active */
    int l_linger; /* how many seconds to linger for */
    };

    When enabled, a close(2) or shutdown(2) will not return until all queued messages for the socket have been successfully sent or the linger timeout has been reached. Otherwise, the call returns immediately and the closing is done in the background. When the socket is closed as part of exit(2), it always lingers in the background.

  3. Third, a lingering socket is effectively a zombie, being as it is quite stinky and very much in the way of the living (as it can make a perfectly legitimate bind call fail simply because of some lingering earlier socket that is entirely outside the control of the caller anyway). To deal with the stinky situation of the socket-zombies, there isn’t apparently any suspicion that such stink needs the rotting thing to be a bit deeper than the very surface and so there’s only the most superficial of fixes: adding a new socket option, SO_REUSEADDR, to simply… relax some of the rules 4:

    SO_REUSEADDR
    Indicates that the rules used in validating addresses supplied in a bind(2) call should allow reuse of local addresses. For AF_INET sockets this means that a socket may bind, except when there is an active listening socket bound to the address. When the listening socket is bound to INADDR_ANY with a specific port then it is not possible to bind to this port for any local address. Argument is an integer boolean flag.

  4. Fourth, when enabling SO_REUSEADDR for UDP sockets, one has indeed achieved full immunity to zombie-sockets interference, gaining at the same time (if on the quiet, as such gains are usually provided) a maimed half-life for all sockets and otherwise a guaranteed half-way interference from all the other half-alive sockets bound to the same (IP, port) combination! Such win. What happens in practice is that one can indeed bind one single (IP, port) pair to any number of UDP sockets 5 and everything will… seem to be fine. Specifically, the sending of packets will even *be* fine indeed, as any of the sockets sharing the same (IP, port) will be able to send just fine. For the receiving part though, things go wrong and in just about the worst way possible: it’s not so much that only one socket will receive (this would even be expected for any packet that is unicast, indeed) but the fact that only the *latest opened* socket can see any incoming packets, meaning that the whole “bind” suddenly works only one-way (sending) while also having potential side-effects (since a new bind effectively stops any previously bound socket from receiving any further packets, despite being supposedly alive). Moreover, even the closing of a socket will then have side effects since the “can receive” token will be passed on to whatever other socket the kernel picks out of its hashtable when looking for the given (IP, port).
  5. Fifth, from the bowels of the Linux kernel code, the rot causing the stink above is quite easily identified: the kernel indeed confuses “socket” with “endpoint” at all times and as a result it keeps queues of incoming/outgoing messages per socket, not per (IP, port), effectively assuming thus that there can be at all times at most one socket bound to any given (IP, port) but failing to enforce this restrictive 6 view when SO_REUSEADDR is set.
  6. To replicate the above half-alive UDP sockets from point 4, the simplest of C code will be enough: create several sockets with SO_REUSEADDR and bind them all to the same (IP, port), send through any number of them and see which socket can actually receive anything at all.
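The replication described in point 6 could look roughly like the sketch below. It is only one possible way to set up the experiment, assuming a POSIX system with a loopback interface. The names open_reuse_udp, count_receivers and MAX_SOCKS are invented for this illustration. The code binds several UDP sockets with SO_REUSEADDR to one (IP, port) pair, sends a single datagram to that pair and then records which of the sockets can actually read it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_SOCKS 8   /* arbitrary upper bound chosen for this sketch */

/* Create a UDP socket with SO_REUSEADDR set and bind it to 127.0.0.1:port
 * (port in host byte order; 0 lets the kernel choose).
 * Returns the descriptor, or -1 on error. */
int open_reuse_udp(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Bind n sockets (2..MAX_SOCKS) to one (IP, port), send a single datagram
 * to that pair from the first socket and store in got[i] whether socket i
 * (numbered in bind order) could read it.
 * Returns how many sockets received the datagram, or -1 on error. */
int count_receivers(int n, int got[])
{
    struct sockaddr_in dst;
    socklen_t len = sizeof dst;
    struct pollfd pfd[MAX_SOCKS];
    char buf[16];
    int i, opened = 0, receivers = -1;

    if (n < 2 || n > MAX_SOCKS)
        return -1;

    /* The first bind picks a free port; all the others reuse it. */
    pfd[0].fd = open_reuse_udp(0);
    if (pfd[0].fd < 0)
        return -1;
    opened = 1;
    if (getsockname(pfd[0].fd, (struct sockaddr *)&dst, &len) < 0)
        goto out;
    for (i = 1; i < n; i++) {
        pfd[i].fd = open_reuse_udp(ntohs(dst.sin_port));
        if (pfd[i].fd < 0)
            goto out;
        opened++;
    }
    for (i = 0; i < n; i++) {
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }

    /* Sending works from any of the sockets; the first one is used here. */
    if (sendto(pfd[0].fd, "ping", 4, 0,
               (struct sockaddr *)&dst, sizeof dst) != 4)
        goto out;
    /* Wait up to half a second for the datagram to become readable. */
    if (poll(pfd, n, 500) < 0)
        goto out;

    receivers = 0;
    for (i = 0; i < n; i++) {
        got[i] = recv(pfd[i].fd, buf, sizeof buf, MSG_DONTWAIT) > 0;
        receivers += got[i];
    }
out:
    for (i = 0; i < opened; i++)
        close(pfd[i].fd);
    return receivers;
}
```

Calling count_receivers(4, got) with an int got[4] and then printing the array shows which of the four sockets, in bind order, saw the datagram. Going by point 4 above, the flagged entry would be the last one bound, but that is the behaviour being investigated and depends on the kernel, so the sketch reports it rather than assuming it.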

    Instead of an ending, I know that “you are not supposed to do that” and even that “nobody uses UDP anymore” if it comes to it. The rot is still there though and the Linux kernel still provides half-dead UDP sockets when SO_REUSEADDR is set. So if you don’t see any problem with this… might I suggest you add this description as such to the manual page at least?

    1. All sockets are essentially BSD sockets, since all other operating systems merely copied the BSD socket implementation at one point in time or another and then went on to stick on it all sorts of “original developments” since they knew -by default!- better than to follow at least what they copied in the first place, of course. While not unique to operating systems or computing, this approach doesn’t seem to yield better results in this context than it does in any other – if anything, the rather… inflexible nature of computers makes the results worse perhaps. But this is “just how things are” and anyway, “what else is there to do”, right? If right, then… enjoy!
    2. The manual pages I looked at claim to be “release 3.22 of the Linux man-pages project”.
    3. While still noting at the bottom that “socket() appeared in 4.2BSD.”
    4. Because yes, that’s *just* the way to “solve” problems, by adding essentially an exception to the existing rules and not being bothered at all about addressing the root cause that is clearly deeper if the rules themselves – aka the *model* of the whole thing – turn out to be broken. Why take the time to get to the bottom of the issue and fix the damned model if and when it turns out to be broken? Who has the time for that and then “what else could be done”, right?
    5. As UDP is connectionless, there isn’t really any problem with this in itself.
    6. It is a restrictive view, because it limits that stated “everything is a file” model to single-access.

December 3, 2021

They’ll Take Away the Oxen from My Bike

Filed under: Word Therapy — Diana Coman @ 1:54 pm

I grew up during the final gasps of a scientifically created (and therefore better!) way of life 1 that had started by forcibly and purposefully destroying the previous, organically grown way of life that was identified as the most significant obstacle. Obstacle to what, exactly? Why, to progress, obviously! Well, to the *new* progress freshly redefined at that time, of course.

That initial, destructive part had succeeded quite well and even quickly, as it was already done and dusted -if still present in people’s minds and lives 2- by the time my parents were adults. A little later, by the time I was growing up, very few people still talked or had anything to say of either that destruction or the initial enthusiasm that had led to it, grounded as it was in a genuine conviction that the old way was an obstacle and the new way was better. I could glimpse at times the enthusiasm and the conviction in old notes found in abandoned notebooks from the young years of people that were about as old as my grandparents or even older than them, that’s about it. And I think that the silence on it wasn’t so much because there were indeed few people remaining able to speak clearly of such matters but mainly because those remaining, whether few or many, could already see all too well how misplaced that old conviction and its underlying hope had been. What was there to say further of it, after it had been all done and in such a final manner, too, what was there more *for them* to say about it other than nothing at all?

After the destruction phase, the replacement part vigorously proceeded and then stubbornly persisted, following scientifically made plans at all stages (and there were many plans, possibly even plans to have plans and plans on how to produce the required plans, one can’t ever be faulted for planning their work, can they?) and against all obstacles. The bigger the problems were, the stronger the push to go further down the same path for there was after all no way to turn back anyway. So problems were met as they tend to be – with more of the same approach, more scientific planning and more scientific evidence of how well it all works, since it’s quite clear that more of all that is needed to finally push through. It was all going in the right direction and only getting better with every step, doesn’t it sound familiar?

Despite the will and conviction of people though, reality somehow persisted too in growing ever further away from the well-planned heights of achievement, prosperity and overall improvement. And in this sort of widening gap, humour of a certain absurdist bent thrived for sure but it hadn’t sprung out of nowhere and it seemed to me so familiar at the time that it was only years later that I noticed how far back its roots really went. That focused destruction of everything standing tall or strong enough to make a visible target had left nevertheless here and there the more frail and less visible fruits of previous experience gained while painfully going through previous hopeful destructions turning into failed reconstructions and their aftermath, in other words experience gained on that ever revolving wheel of “change” and “progress” – a circular path that one can perhaps hope manages to deviate to some degree before it reaches its end, so as to miss the original starting point by enough to make the trajectory into some sort of spiral at least.

Of these more frail and less visible fruits of others’ experience, there were often some very concise sayings coming to mind, mostly due to their unexpectedly apt description of so-called “new” reality. At the time I thought these sayings to be simply part of the wider language’s “folk wisdom” or such, but in time it turned out that quite a few of them were indeed very specific to my own immediate surroundings, to the extent that it’s quite possible I suppose that they were the production of one of the villages of my great-grandparents or who knows, even directly of one of my great-grandparents themselves, not that I’ll ever have a real clue of it. At any rate, since most of these don’t seem to be part of any other attempted listing of sayings and I find myself thus still surprising others in conversations (Romanians included), I’ll try to collect here those that come currently to mind, together with my attempted translation to English, to have at least the explanation at hand, when needed:

  1. Ori bati capu’ ori bati curu’.

    It’s either your head or your ass that will hurt 3.

  2. Daca vrei ceva facut, da-l unuia ocupat.

    If you want something done, give it to someone busy.

  3. Are cap – sa nu-i ploua-n gat.

    He has a head on his shoulders – to keep his neck dry in the rain.

  4. Capul lui mai fuse la un cur de baba.

    His head has been of use before, to an old hag’s ass.

  5. Lenesu’ la toate zice ca “nu poate!”

    The bum’s reply to everything is “I can’t do it!”

  6. S-a dus si el ca sa fie drumul cu lume.

    He went too, to keep the roads busy.

  7. S-a repezit ca sageata si-a cazut ca balega.

    He rushed like the arrow and flopped like the cow pie.

  8. Vine de parca se duce.

    He’s coming as if he was leaving.

  9. A plecat bou si s-a intors magar.

    He left being an ox and came back an ass. 4

  10. ‘nalt ca bradu’, prost ca gardu’.

    As tall as the pine, as stupid as the fence.

  11. Ti-or lua boii de la bicicleta.

    They’ll take away the oxen from your bike.

  12. Schimbarea domnilor, bucuria nebunilor.

    The change of rulers – the happiness of fools.

  1. What, you thought all those Communist regimes that fell in the ’90s in Eastern Europe started somehow as anything *other* than “doing what is best for everyone” and following “a scientific approach” to clearly and obviously -look at what the calculations say!- improve the lives of all? They even were specifically addressing the “problem” of some things being “too expensive” for some people and therefore in great need of being cheapened no matter what. It’s really a wonder how far the world got in all these passing years!
  2. The story of my piano teacher touches only superficially and very briefly on only one of many such stories from that time. I suppose I should write more of the ones I know, since the people involved are already gone and it turns out quite frequently that I recall more of the stories they had to tell than is otherwise recorded directly anywhere.
  3. This doesn’t work nearly as well in English. The Romanian version takes advantage of the ambiguous use of “a bate” (to beat), basically making the point that either you put in a lot of hard thinking (hence you make your head hurt with it) or you’ll put up with getting it in your ass.
  4. This is clearly a variation on the much more commonly known “a plecat bou si s-a intors vaca” meaning “he left being an ox and came back a cow.” Depending on one’s point of view, perhaps it’s better to have become a cow (ready to be milked by others) than an ass.

Work on what matters, so you matter too.