The Half-Life of UDP Sockets or REUSEADDR

December 14th, 2021 by Diana Coman

"After reading deep in the bowels of code..."
"Bowels of code sounds like a pretty awesome haunted house btw"

To start off, here's how sockets are meant to be conceived of in the first place, straight from the original source, namely the BSD docs[1]:

BSD sockets are built on the basic UNIX® model: Everything is a file.

So far so good, and indeed the BSD manual pages for obtaining a socket in the first place fit this definition well. For instance, to create a new socket, one uses the "socket" function, which requires the family (e.g. AF_INET, historically also spelled PF_INET, for IPv4), the type (e.g. SOCK_DGRAM for UDP sockets) and the protocol (e.g. IPPROTO_UDP, or 0 for the default protocol of the given family and type), and has the following description:

socket() returns a socket file descriptor (sockFD) which is a small non-negative integer. This file descriptor number should be used for all other socket operations on that socket. If socket() encounters an error, it will return -1, in which case the application should call GetMITLibError() to get the error code.

By contrast with the above, though, the Linux manual page[2] for socket comes with a different[3] - improved? you tell me - description of the socket() function, which takes the domain (aka the "family" in BSD terms, e.g. AF_INET for IPv4), type (e.g. SOCK_DGRAM) and protocol (e.g. IPPROTO_UDP):

socket() creates an endpoint for communication and returns a descriptor.

Since the Linux page discards most of the specific, concrete information provided by the BSD page and instead introduces a general and undefined "endpoint for communication" term, let's see what that might be. The docs of the widely used Wireshark tool seem a reasonable source, so here's their definition of what a network endpoint is specifically when the UDP protocol is used, since UDP is what I'm interested in anyway:

A combination of the IP address and the UDP port used, so different UDP ports on the same IP address are different UDP endpoints.

So an endpoint is a pair (IP address, port), and this makes some sense, since this is after all how packets are delivered: from one (IP address, port) to another. But it's dubious how a socket would *create* an endpoint, as the Linux manual page claims, since the socket function requires neither an IP address nor a port at all. Moreover, the "everything is a file" model seems in no need of "improvements" like this, since it works perfectly fine: a socket is defined by a (family, type, protocol) set and socket() returns a file descriptor to be used for all further operations. There is, of course, the step *after* creating a socket, when the socket is *bound to an endpoint*, meaning that at that point a pair (IP address, port) is indeed given. But it doesn't bode well when the documentation itself seems confused between two steps that are not only separate but literally require two entirely different function calls.
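The two separate steps can be seen in a minimal sketch, assuming Linux and IPv4; the helper names local_port and bind_endpoint are illustrative, not from any library. socket() yields a descriptor with no (IP, port) attached, and only the separate bind() call attaches one:

```c
#include <assert.h>
#include <arpa/inet.h>    /* inet_pton, htons, ntohs */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <string.h>       /* memset */
#include <sys/socket.h>   /* socket, bind, getsockname */
#include <unistd.h>       /* close */

/* Return the local port the socket is bound to (host byte order),
   0 if it has none yet, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;

    if (getsockname(fd, (struct sockaddr *)&sin, &len) == -1)
        return -1;
    return ntohs(sin.sin_port);
}

/* Second, separate step: bind an existing socket to (ip, port).
   Port 0 asks the kernel to pick a free ephemeral port.
   Returns 0 on success, -1 on error. */
int bind_endpoint(int fd, const char *ip, unsigned short port)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
        return -1;
    return bind(fd, (struct sockaddr *)&sin, sizeof sin);
}
```

On Linux, getsockname() on a freshly created UDP socket reports port 0, i.e. no (IP, port) has been attached yet; only after something like bind_endpoint(fd, "127.0.0.1", 0) does it report a real port, in this case one picked by the kernel because port 0 was requested.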

Obviously, you can wave away this sort of "improvement" in manual entries; after all, it's just "a small thing" and "unimportant" and "don't split hairs" over nothing etc. Absolutely, it's all nice and rosy, and look how wonderfully it plays out in practice, step by step:

  1. First, if a socket "creates an endpoint," it follows quite logically that there can be at most one single socket at any time for any specific (IP, port) pair. This is already a somewhat odd interpretation of the "everything is a file" approach, since it adds to it the equivalent of "there can be at most one single file descriptor at a time for any specific file", but sure, that's not a problem in itself yet, since it's a well-thought-out design and model, so everything will work out splendidly, definitely.
  2. Second, when a socket is closed, it follows quite logically that it will release that (IP, port) pair, which will thus become available again to be bound to a new socket. Except there's a snag in practice at this point, since that release is not done immediately and, moreover, it can *not* be forced to happen immediately either, no matter what one sets. The manual page makes this much clear, stating happily that a socket may "linger" and that you can set a timer if that makes you happy, but you shouldn't expect it to make a difference where it matters - because it won't:

    Sets or gets the SO_LINGER option. The argument is a linger structure.
    struct linger {
        int l_onoff;    /* linger active */
        int l_linger;   /* how many seconds to linger for */
    };

    When enabled, a close(2) or shutdown(2) will not return until all queued messages for the socket have been successfully sent or the linger timeout has been reached. Otherwise, the call returns immediately and the closing is done in the background. When the socket is closed as part of exit(2), it always lingers in the background.

  3. Third, a lingering socket is effectively a zombie, being quite stinky and very much in the way of the living (it can make a perfectly legitimate bind call fail simply because of some lingering earlier socket that is entirely outside the caller's control anyway). To deal with the stink of the socket-zombies, apparently nobody suspected that the rot causing it might go a bit deeper than the very surface, and so there's only the most superficial of fixes: adding a new socket option, SO_REUSEADDR, to simply... relax some of the rules[4]:

    Indicates that the rules used in validating addresses supplied in a bind(2) call should allow reuse of local addresses. For AF_INET sockets this means that a socket may bind, except when there is an active listening socket bound to the address. When the listening socket is bound to INADDR_ANY with a specific port then it is not possible to bind to this port for any local address. Argument is an integer boolean flag.

  4. Fourth, when enabling SO_REUSEADDR for UDP sockets, one has indeed achieved full immunity to zombie-socket interference, gaining at the same time (if quietly, as such gains are usually provided) a maimed half-life for all sockets and a guaranteed half-way interference from all the other half-alive sockets bound to the same (IP, port) combination! Such win. What happens in practice is that one can indeed bind one single (IP, port) pair to any number of UDP sockets[5] and everything will... seem to be fine. Specifically, sending packets will even *be* fine, as any of the sockets sharing the same (IP, port) will be able to send just fine. For receiving, though, things go wrong in just about the worst way possible: it's not so much that only one socket will receive (this would even be expected for any unicast packet) but rather that only the *latest* socket to be bound can see any incoming packets. This means that the whole "bind" suddenly works only one way (sending) while also having side effects, since a new bind effectively stops any previously bound socket from receiving any further packets, despite that socket being supposedly alive. Moreover, even closing a socket will then have side effects, since the "can receive" token is passed on to whatever other socket the kernel picks out of its hashtable when looking up the given (IP, port).
  5. Fifth, from the bowels of the Linux kernel code, the rot causing the stink above is quite easily identified: the kernel does indeed confuse "socket" with "endpoint" at all times, and as a result it keeps queues of incoming/outgoing messages per socket, not per (IP, port), thus effectively assuming that there can be at most one socket bound to any given (IP, port) at any time, yet failing to enforce this restrictive[6] view when SO_REUSEADDR is set.
  6. To replicate the half-alive UDP sockets from point 4 above, the simplest of C code will be enough: create several sockets with SO_REUSEADDR, bind them all to the same (IP, port), send through any number of them and see which socket can actually receive anything at all.
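A minimal sketch of that experiment, assuming Linux, IPv4 and the loopback address 127.0.0.1; the helper names (reuse_bind, bound_port, receivers), the socket count and the 500 ms wait are illustrative choices, not anything taken from the kernel or its docs. It binds several SO_REUSEADDR sockets to one (IP, port), sends a single datagram from a separate socket and reports which of the bound sockets got it:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NSOCK 3 /* how many sockets share the (IP, port); illustrative */

/* Create a UDP socket with SO_REUSEADDR set and bind it to 127.0.0.1:port
   (port 0 lets the kernel choose). Returns the descriptor, or -1 on error. */
static int reuse_bind(unsigned short port)
{
    int one = 1;
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) == -1 ||
        bind(fd, (struct sockaddr *)&sin, sizeof sin) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Local port (host byte order) that fd is bound to, or -1 on error. */
static int bound_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;

    if (getsockname(fd, (struct sockaddr *)&sin, &len) == -1)
        return -1;
    return ntohs(sin.sin_port);
}

/* Send one datagram to 127.0.0.1:port from a separate socket, wait up to
   500 ms, and return how many of fds[0..n-1] received it; *who gets the
   index of the receiver (-1 if none). Entries set to -1 are ignored,
   since poll() skips negative descriptors. Returns -1 on setup errors. */
static int receivers(const int *fds, int n, unsigned short port, int *who)
{
    struct pollfd pfd[NSOCK];
    struct sockaddr_in dst;
    char buf[16];
    int i, count = 0;
    int tx;

    *who = -1;
    if (n > NSOCK || (tx = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
        return -1;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (sendto(tx, "ping", 4, 0, (struct sockaddr *)&dst, sizeof dst) != 4) {
        close(tx);
        return -1;
    }
    close(tx);

    for (i = 0; i < n; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    if (poll(pfd, (nfds_t)n, 500) > 0)
        for (i = 0; i < n; i++)
            if ((pfd[i].revents & POLLIN) &&
                recv(fds[i], buf, sizeof buf, 0) > 0) {
                count++;
                *who = i;
            }
    return count;
}
```

Typical use: bind the first socket with reuse_bind(0) so the kernel picks a free port, read that port back with bound_port(), bind the remaining sockets to it with reuse_bind(), then call receivers() and look at the reported index. Closing the socket that received, setting its slot to -1 and calling receivers() again shows the "can receive" token moving on to another of the still-open sockets. The sketch only reports which index received; whether that is always the most recently bound socket is the behavior described above, and worth checking on your own kernel rather than taking the sketch as proof.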

    Instead of an ending: I know that "you are not supposed to do that", and even that "nobody uses UDP anymore" if it comes to it. The rot is still there, though, and the Linux kernel still provides half-dead UDP sockets when SO_REUSEADDR is set. So if you don't see any problem with this... might I suggest you at least add this description as such to the manual page?

    1. All sockets are essentially BSD sockets, since all other operating systems merely copied the BSD socket implementation at one point in time or another and then went on to stick all sorts of "original developments" onto it, since they knew - by default! - better than to follow at least what they had copied in the first place, of course. While not unique to operating systems or computing, this approach doesn't seem to yield better results in this context than it does in any other - if anything, the rather... inflexible nature of computers perhaps makes the results worse. But this is "just how things are" and anyway, "what else is there to do", right? If right, then... enjoy! 

    2. The manual pages I looked at claim to be "release 3.22 of the Linux man-pages project". 

    3. While still noting at the bottom that "socket() appeared in 4.2BSD." 

    4. Because yes, that's *just* the way to "solve" problems: by adding essentially an exception to the existing rules and not being bothered at all about addressing the root cause, which is clearly deeper if the rules themselves - aka the *model* of the whole thing - turn out to be broken. Why take the time to get to the bottom of the issue and fix the damned model if and when it turns out to be broken? Who has the time for that, and then "what else could be done", right? 

    5. As UDP is connectionless, there isn't really any problem with this in itself. 

    6. It is a restrictive view, because it limits the stated "everything is a file" model to single access. 
