Ossasepia

February 7, 2019

Seppuku Job Market: Minimal Dynamic Tasking in Ada

Filed under: Coding — Diana Coman @ 10:44 p.m.

Eulora's server needs a reliable and robust way of performing various jobs, preferably in parallel whenever possible, for all the players that might connect to the game at any given time. Given the parallel requirement, there isn't really any way around the fact that multi-threading is needed. Nevertheless, since multi-threading is by its nature complex enough to give subtle errors and heavy headaches at any time, I'd much rather make sure any implementation that deals with multiple threads of execution is as small, clear, plain and easy to follow as possible. In other words, if it has to be multi-threaded then it had better be minimal, self-healing, self-adjusting and ruthlessly functional with all and any bells and whistles chucked as far away from it as possible. To drive this point home and keep it in mind at all times [1], I'll call this self-reliant unit of the server the Seppuku Job Market or SJM for short.

The list of requirements for the SJM is this:
1. Accept Jobs from all and sundry in a thread-safe manner and execute them in order of their priorities.
2. Generate and kill Worker tasks [2] *dynamically* and on an *as-needed basis* to perform jobs as soon as possible but remaining at all times within a pre-set maximum number of Workers [3].
3. Creation and destruction of Workers should be reliable and robust: in particular, SJM should run forever unless explicitly stopped and it should re-spawn Workers as needed, even if they get killed from outside the code (cosmic-ray event or not).
4. Aim to perform jobs in order of their specified priority but taking into account that at most ONE job per player is actually executed at any given time. In other words: do NOT allow a player to hog the whole thing and run as many jobs as they want; this is parallelism aimed to increase the number of players served not merely the number of jobs performed!

Point 1 of requirements hints at the nature of SJM: as it needs to accept jobs from many, unknown sources, it is effectively a "server" of sorts and moreover it is essentially a resource from the point of view of job producers. The best Ada construct that readily fits this description is a protected unit (aka a passive entity that guarantees thread-safe access to the data it encapsulates - in this case to the queue of jobs waiting to be performed). One significant benefit of an Ada protected entity is the fact that it is specifically not a task itself nor is there a task associated with it. Instead, the mutually exclusive access to services provided by a protected unit is ensured by the run-time system and therefore the whole thing has at least one less headache to think of: while Worker tasks may get killed, the SJM itself at least cannot get killed unless the whole program (i.e. the main thread of execution of the server itself) gets killed.
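To make the idea of a protected unit concrete, here is a minimal sketch that is not part of EuJobs: a shared counter guarded by a protected object and hammered by four tasks at once. The names Protected_Demo, Counter, Increment, Value and Producer are all made up for this illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Protected_Demo is

  -- Hypothetical shared resource: no task of its own, just guarded data.
  protected Counter is
    procedure Increment;            -- exclusive read/write access
    function  Value return Natural; -- read-only access
  private
    Count : Natural := 0;
  end Counter;

  protected body Counter is
    procedure Increment is
    begin
      Count := Count + 1;
    end Increment;

    function Value return Natural is
    begin
      return Count;
    end Value;
  end Counter;

  Final : Natural;

begin
  declare
    -- four concurrent producers, each making 1000 calls
    task type Producer;

    task body Producer is
    begin
      for I in 1 .. 1000 loop
        Counter.Increment;
      end loop;
    end Producer;

    Producers : array( 1 .. 4 ) of Producer;
  begin
    null; -- this block is left only after all four Producers terminated
  end;

  Final := Counter.Value;
  Put_Line("Final count:" & Natural'Image(Final));
end Protected_Demo;
```

Since every Increment runs with exclusive access to Count, no update is lost and the program should print a final count of 4000 regardless of how the four tasks interleave.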

Point 2 of requirements (dynamic, self-adjusting number of tasks) means that I'll need to actually create and dispose of tasks programmatically - there is no way to have only statically allocated tasks. In turn, this means that a few restrictions have to go away: No_Allocators, No_Finalization, No_Task_Allocators, No_Tasking, No_Unchecked_Deallocation. The need to drop the No_Finalization and No_Unchecked_Deallocation restrictions comes from the way in which Ada handles dynamically allocated memory. Essentially, dynamically allocated tasks receive memory from a "pool". Once allocated, memory from a pool is reclaimed ONLY when the whole pool goes out of scope or in other words when it can be guaranteed that there is no piece of code left that can actually attempt to access that bit of memory. This is very robust and quite useful of course but in the case of dynamically allocated tasks it means that tasks that finish will STILL effectively occupy memory unless specifically deallocated (with Ada.Unchecked_Deallocation as that's the only way to do it as far as I can tell). In turn, this creates the undesirable but very real and quite horrible possibility that the code will run just fine *until* the pool in which tasks are created runs out of memory because of all the previous tasks that finished a long time ago but whose space was never reclaimed. To avoid this, the code has to keep track of terminated tasks and explicitly deallocate the memory they occupy before chucking away their pointer and/or re-spawning a replacement Worker (as there is no way to "restart" a task).
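The allocate, wait for termination, then deallocate cycle described above can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: Short_Lived, Short_Lived_Access and Reclaimed are hypothetical names, the task does nothing at all, and the polling loop with a delay is there purely to keep the example tiny.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Dynamic_Task_Demo is

  -- hypothetical short-lived task type, standing in for a Worker
  task type Short_Lived;

  task body Short_Lived is
  begin
    null; -- nothing to do: the task completes at once
  end Short_Lived;

  -- access type needed to create tasks with "new"
  type Short_Lived_Access is access Short_Lived;
  procedure Free is
    new Ada.Unchecked_Deallocation(Short_Lived, Short_Lived_Access);

  T         : Short_Lived_Access;
  Reclaimed : Natural := 0;

begin
  for Round in 1 .. 3 loop
    -- a task cannot be restarted, so each round allocates a fresh one
    T := new Short_Lived;

    -- wait until the task has terminated before touching its storage
    while not T'Terminated loop
      delay 0.01;
    end loop;

    -- give the storage back; Free also sets T to null
    Free(T);
    if T = null then
      Reclaimed := Reclaimed + 1;
    end if;
  end loop;

  Put_Line("Reclaimed" & Natural'Image(Reclaimed) & " terminated tasks");
end Dynamic_Task_Demo;
```

Without the call to Free, each of the three terminated tasks would keep its storage until the access type itself went out of scope.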

Point 3 means that Workers need to be effectively managed based on the evolution of the number of available jobs and the number of Workers themselves. One approach would be of course to have a Supervisor task but the problem then is twofold: first, the Supervisor needs to be aware of changes to the jobs queue as they happen; second, having a Supervisor task creates the potential problem of who supervises the supervisor (esp. with respect to recovery from killed thread since in this case the Supervisor itself might die unexpectedly). Given however that the SJM protected unit effectively guards precisely the jobs queue, it's also in the best position to react promptly to an increase or decrease in jobs and so it follows that it should in fact manage the Workers too. After all, it can do a bit more on receiving a job than merely chucking it into the queue: ideally it would in fact pass it on to a Worker immediately.

While at first sight "take job, spawn Worker, pass it on and let him do it" sounds precisely fine, in practice it's really not fine at all and not least because of the requirement at Point 4: passing a job on to a Worker requires some ordering of jobs (by priority) and even a sort of guarded access to a player since a new job cannot be accepted (and especially cannot be passed on to a Worker for execution) while an existing Worker may still be toiling away on a previous job for the same player. So the SJM needs to find out when a job is finished in order to accept jobs again for that specific player. As always, there are only a few ways to know when something has finished: either look for it [4], as one rather has to do when Workers are just passive executors of jobs, or otherwise expect a signal to be sent back by a more active type of Worker task when it has finished the job it had.

This distinction between active and passive Workers (or tasks in general) is quite significant. As passive entities, Workers can at most simply wait to be handed a job or any other signal. Typically, a Worker would be created and handed a job, they would do it and then they would quietly die keeping out of the way of everyone else. This can be a great fit in various cases but I can see several problems with this for Eulora's server: first, Workers cannot be reused even when jobs are available so there is a rather inefficient kill/create overhead [5] precisely at busy times when one wants it even less than at any other time; second, the only way for the SJM to find out when a job has finished is by a sort of polling i.e. going through the whole set of workers and checking which one is in a terminated state - note that it is not at all clear just *when* this should be done or how it would be triggered (sure, one can use a sort of scheduled event e.g. check it every 3 seconds or some such but it's more of a workaround than addressing the problem); third, the SJM needs to do both Worker creation and Job allocation (i.e. priority ordering + only one job per player at any given time) at the same time and while keeping a job creator waiting.

The first of the above issues (no reuse of Workers) is easily addressed by making Workers active rather than passive: they get created and then they actively ask for a job; once they get a job, they do it and then they go back and report it done, after which they queue again to get another job or perhaps the boot if there are no jobs to be had. And since such active Workers do not finish by default when a job is finished, they need to have rather suicidal tendencies and ask not merely for a job but rather for either a job or permission to seppuku (hopefully in a clean manner though!).

Making Workers active (if suicidal) neatly separates Worker creation from Job allocation: when jobs start pouring in, the SJM can simply create a bunch of Workers and release the job creators before it makes time to actually hand the jobs out to queuing workers. When the jobs keep pouring in, Workers keep working and there's no need to kill them now only to create them again a few milliseconds later. Moreover, finished jobs are simply reported and marked as such without any need to poll. In the (hopefully rare) case when a Worker dies unexpectedly before sending the signal that it finished its job, its death will anyway be noticed sooner or later when the state of Workers is assessed to decide if more or fewer Workers are needed. Essentially the only trouble this approach brings is the added responsibility on the SJM: it controls access to the Job queue for job creators AND for Workers while ALSO effectively managing and keeping track of all Worker-related aspects. But then it's not a Seppuku Job Market for no reason: if it needs to do it, it will have to do it and do it well.

As a proof of concept of the above, I have implemented the SJM precisely as described: as a protected unit that encapsulates a Job queue and manages active Worker tasks, creating and destroying them as needed while also de-allocating memory of any terminated Workers, ensuring that only one Job per player is accepted at any given time and allowing a graceful stop that does not block any job producers that may come at a later time and does not leave dangling Worker tasks either. Jobs are simply record types with a discriminant that specifies their type and therefore the exact form a variable part of the record takes (since each Job type is likely to have specific data structures it requires). Note that I specifically avoided the Object-Oriented option (i.e. tagged type in Ada) with a hierarchy of Job types and reliance on polymorphism for "Complete" to do the right thing depending on the exact type of Job. The reason for this avoidance is mainly that there really isn't much to gain from it as far as I can see at the moment. Similarly, I prefer not to rely on generic containers (for the Job Queue for instance) unless they become clearly and absolutely needed. Finally, I am quite aware of Ada's relevant annexes such as Real-Time Systems and I know that it provides a whole infrastructure of worker pools and jobs with futures even (i.e. a way to provide results at a later time) but they are quite at odds with the significant aim of keeping it all as short [6] and clear and easy to follow as possible (not to mention potential issues with the way in which some parts might be implemented using a secondary stack for instance which I specifically do not want to have).

The public part of the EuJobs package is this:

with Interfaces; use Interfaces;
with Data_Structs;
with Ada.Finalization;
with Ada.Unchecked_Deallocation; -- to clean up worker tasks if needed.

package EuJobs is

  pragma Elaborate_Body;

  -- knobs and constants
  Max_Workers : constant Natural := 64;
  Max_Idle_W  : constant Natural := Max_Workers;
  -- max jobs
  Max_Jobs    : constant Natural := Max_Workers * Max_Workers;

  -----------------------WORKERS--------------------
  -- Generic Eulora Workers type: simply perform given Jobs
  subtype Worker_Index is Natural range 1..Max_Workers;
  -- Those are to be FULLY managed (including created/ended) by the Job Market
  -- ACTIVE but suicidal elements:
  --   a worker will keep requesting jobs/permission to seppuku
  --     until allowed to terminate
  -- Pos is a token identifying Worker with the Job Market
  -- NB: ALL workers WILL use this Job Market
  -- NB: do NOT create workers from outside the Job Market!
  task type Worker( Pos: Worker_Index );

  -- needed to dynamically generate Workers
  type Worker_Address is access Worker;
  procedure Free is new Ada.Unchecked_Deallocation(Worker, Worker_Address);

  -- ALL the info that the Job Market holds on workers to manage them
  type Worker_Rec is
    record
      Assigned  : Boolean := False;
      Player_Id : Interfaces.Unsigned_64;
      WA        : Worker_Address;  -- actual pointer to worker
    end record;  

  -- for storing pointers to generated workers including if assigned and id
  type Worker_Array is array( Worker_Index'First ..
                              Worker_Index'Last) of Worker_Rec;

  -- limited controlled type that ensures no dangling workers at Finalize time
  type Controlled_Workers is new Ada.Finalization.Limited_Controlled with
    record
      Workers: Worker_Array;
    end record;

  overriding
  procedure Finalize( S: in out Controlled_Workers );
  overriding
  procedure Initialize( S: in out Controlled_Workers );

  -------------------------------JOBS-------------------
  -- Job types; NB: do NOT map (nor have to) directly on message types!
  type Job_Types is ( Do_Nothing,
                      Print_Job,
                      Create_Acct );

  -- Data structure with relevant information for each type of job
  type Job_Data ( T: Job_Types := Do_Nothing ) is
    record
      -- common information relating to the one requesting this job
      Player_ID: Interfaces.Unsigned_64 := 0;
      Source_IP: Interfaces.Unsigned_32 := 0;
      --NB: this is SOURCE port - reply WILL be sent here, whether RSA or S!
      Source_P : Interfaces.Unsigned_16 := 0;
      -- Message counter, as received
      Counter  : Interfaces.Unsigned_16 := 0;
      Priority : Natural := 0; --lowest possible priority
      case T is
        when Create_Acct =>
          Acct_Info: Data_Structs.Player_RSA;
        when Do_Nothing =>
          null;
        when Print_Job =>
          null;
      end case;
    end record;

  procedure Complete( JD   : in Job_Data );

  subtype Job_Count is Natural range 0..Max_Jobs;
  type Job_Array is array( 1..Max_Jobs ) of Job_Data;
  type Jobs_List is
    record
      Len  : Job_Count := 0;
      JA   : Job_Array;
    end record;

  ---------------------------Job_Market--------------------

  -- FULLY self-managed Job Market for euloran jobs:
  --   -- accepts jobs to do
  --   -- spawns, kills and manages workers that complete the jobs
  -- NB: Job_Market will DISCARD a new job when:
  --    -- it is FULL (i.e. can't handle anymore)
  --    -- it is stopping
  --    -- it already has a job for the same player
  -- Jobs are performed according to specific criteria (not strictly fifo):
  --   - FIFO but ensuring no more than 1 job per player served at any time
  --   - ALSO: there might be other priorities (e.g. type of job)
  protected Job_Market is
    -- adding a new job that needs to be done
    -- this can be ANY variant of Job_Data (i.e. any value of discriminant T)
    -- NB: Added will be true if J was indeed accepted and False otherwise
    entry Add_Job( J     : in Job_Data;
                   Added : out Boolean );

    -- workers request jobs when they are out of work
    -- workers need to provide their token (Pos)
    -- they can get to do: either a job OR seppuku signal.
    procedure Get_Job( Pos    : in Worker_Index;
                       J      : out Job_Data;
                       Seppuku: out Boolean );

    -- workers have to report back when a job is done
    -- (or they get swept up eventually if/when they abort).
    procedure Done_Job( Pos: in Worker_Index );

    -- sets in motion the process to stop gracefully:
    --   -- no more jobs received, existing discarded
    --   -- all workers will be given Seppuku signal
    -- NB: NO reverse for this.
    procedure Stop;

    -- for any external supervisors
    -- returns TRUE if it is NOT stopping
    -- returns False if it is stopping
    function Operating(  Waiting_Jobs: out Natural;
                         Idle_Workers: out Natural;
                         Active_Workers: out Natural;
                         Terminated_Workers: out Natural;
                         Is_Full: out Boolean)
      return Boolean;

  private

    -- internal storage of jobs and mgm of workers
    Board  : Jobs_List;

    -- NB: Workers are in the BODY of the package
    --   because they HAVE to be after the body of Finalize

    -- when stopping:
    -- discard new jobs; give out stop on get/done; empty jobs map
    Stopping : Boolean := False;
    Fullboard: Boolean := False;

    -- Retrieves next available job from the Board and returns it in JD
    -- Sets Found to True if an available job was found (i.e. JD is valid)
    -- Sets Found to False (and JD is undefined) if NO available job was found.
    -- NB: this DOES remove the element from the board!
    procedure Get_Available( JD    : out Job_Data;
                             Found : out Boolean );

    -- checks if the given player_id IS currently served by any worker
    function Is_Assigned( Player_ID: in Interfaces.Unsigned_64 )
             return Boolean;

    -- Checks in Board list ONLY if there is a job for this player
    -- Returns True if a job was found (i.e. a job waiting for a worker)
    -- Returns False otherwise.
    -- NB: Player might STILL have a job in progress (assigned to a worker)
    function Has_Waiting_Job( Player_ID: in Interfaces.Unsigned_64 )
             return Boolean;

    -- releases any player_id that might be stuck with aborted workers
    -- *creates* new workers if needed (specific conditions met)
    procedure Manage_Workers;

  end Job_Market;

private

  -- create new Worker with identification token (position) P
  function Create_Worker(P: in Worker_Index)
             return Worker_Address;

end EuJobs;

Workers are very simple tasks with an ID received at creation time to identify them within the Job_Market (very simply by position in the array of Worker addresses). They run a loop in which they request jobs or permission to Seppuku and when they receive either of them they proceed to do as instructed. Perhaps you noticed above that the array of Worker pointers is wrapped inside Controlled_Workers, which is a controlled type. A controlled type in Ada guarantees that the provided Initialize and Finalize routines are run precisely at the stages that their names suggest to enable the type to start off cleanly and to end up cleaning after itself. In the case of Controlled_Workers, the Initialize simply makes sure that the array has all pointers marked as null and moreover as not assigned any jobs while the Finalize goes one more time through the array and finishes off (with abort) any workers that are not null already. Note that the scope of Worker tasks is in fact the package level since the Worker_Address type is declared at this level (and that's how the scope is defined for such types in Ada). You might have noticed also that there is no concrete array of Workers defined anywhere so far: indeed, the array of workers is defined inside the package body for two main reasons: first, it should NOT be accessed by anyone from outside (not even potential child packages at a later time); second, it has to be defined after the bodies of Initialize and Finalize since otherwise it can't be created.
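For readers who have not met controlled types before, the sketch below shows when the two hooks run. It is deliberately unrelated to Workers: Guard, Guards, Slots_In_Use and the two counters are hypothetical names used only for this illustration, and the example assumes an Ada 2005 or later compiler since the type is declared inside a procedure.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Finalization;

procedure Controlled_Demo is

  -- counters updated by the hooks so that their effect can be inspected
  Initialized : Natural := 0;
  Finalized   : Natural := 0;

  package Guards is
    -- hypothetical stand-in for Controlled_Workers
    type Guard is new Ada.Finalization.Limited_Controlled with
      record
        Slots_In_Use : Natural := 0;
      end record;

    overriding
    procedure Initialize( G: in out Guard );
    overriding
    procedure Finalize( G: in out Guard );
  end Guards;

  package body Guards is
    procedure Initialize( G: in out Guard ) is
    begin
      G.Slots_In_Use := 0;
      Initialized := Initialized + 1;
      Put_Line("Initialize: all slots marked empty");
    end Initialize;

    procedure Finalize( G: in out Guard ) is
    begin
      -- the place to release whatever the object still holds
      G.Slots_In_Use := 0;
      Finalized := Finalized + 1;
      Put_Line("Finalize: cleaning up");
    end Finalize;
  end Guards;

begin
  declare
    G : Guards.Guard;  -- Initialize runs as part of this declaration
  begin
    G.Slots_In_Use := 3;
    Put_Line("Guard in scope, slots in use:" &
             Natural'Image(G.Slots_In_Use));
  end;                 -- Finalize runs when G goes out of scope here

  Put_Line("Guard out of scope");
end Controlled_Demo;
```

The four lines should come out in the order Initialize, in scope, Finalize, out of scope, which is the guarantee that Controlled_Workers relies on to avoid leaving Workers dangling.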

Jobs are barely sketched for now as Job_Data structures with a discriminant to distinguish different types and a variable part for specific data that each type of job needs. The Complete procedure then simply does different things for each type of job in a straightforward manner (at the moment it does something for the print job only for basic testing purposes).

The Job_Market itself is a protected object that offers a handful of services (aka public entries, procedures or functions): entry Add_Job for job producers to provide their new jobs; procedure Get_Job for Workers who are looking for something to do; procedure Done_Job for Workers who report they finished their previously allocated job; procedure Stop for any higher-level caller who is in a position to turn off the whole Job_Market; function Operating that simply provides information on the current state (i.e. operating or stopping) and status (e.g. number of jobs and workers) of the Job_Market. Note that there are important differences between functions, procedures and entries: functions can only *read* protected data so they are effectively banned from modifying anything, hence Operating being exactly a function as it provides a snapshot of current state and metrics for the Job_Market; procedures can modify data but a call to them is unconditional meaning it gets accepted as soon as the protected object is available and the caller is first in queue for it, without any further restrictions - hence Stop, Done_Job and Get_Job are procedures since there is no constraint on them being called at any time; finally, entries can also modify data but they have entry barriers meaning they accept a call only when certain conditions are met - in this case Add_Job has the simple but necessary condition that either the Job_Market is stopping (in which case callers should not be blocked since it's pointless to wait anyway) or the Job queue is not full since it makes little sense to allow a job producer in just to discard their job for lack of space anyway. Note however that this is merely for completeness here since in practice there will be several other levels of measures taken so that the job queue does NOT become full since that is clearly not a sane way to have the server running.
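The difference between the three kinds of protected operations can be seen in a much smaller setting than the Job_Market. The following is a rough sketch only: Board, Put, Take, Length and Capacity are hypothetical names, and a conditional entry call (select ... else) is used so that the example can show a closed barrier without blocking forever.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Barrier_Demo is

  Capacity : constant := 2;
  type Slot_Array is array( 1 .. Capacity ) of Integer;

  protected Board is
    -- entry: the call is accepted only while the barrier is open
    entry Put( Item : in Integer );
    -- procedure: may modify data, accepted unconditionally
    procedure Take( Item : out Integer; Found : out Boolean );
    -- function: read-only snapshot
    function Length return Natural;
  private
    Items : Slot_Array := (others => 0);
    Len   : Natural := 0;
  end Board;

  protected body Board is
    entry Put( Item : in Integer ) when Len < Capacity is
    begin
      Len := Len + 1;
      Items(Len) := Item;
    end Put;

    procedure Take( Item : out Integer; Found : out Boolean ) is
    begin
      Found := Len > 0;
      if Found then
        Item := Items(Len);
        Len  := Len - 1;
      else
        Item := 0;
      end if;
    end Take;

    function Length return Natural is
    begin
      return Len;
    end Length;
  end Board;

  Item          : Integer;
  Found         : Boolean;
  While_Full    : Boolean;
  After_Removal : Boolean;

begin
  Board.Put(1);
  Board.Put(2);  -- the board is now full, so the barrier is closed

  select
    Board.Put(3);
    While_Full := True;
  else
    While_Full := False;  -- barrier closed: give up instead of waiting
  end select;
  Put_Line("Put accepted while full: " & Boolean'Image(While_Full));

  Board.Take(Item, Found);  -- frees one slot and re-opens the barrier

  select
    Board.Put(3);
    After_Removal := True;
  else
    After_Removal := False;
  end select;
  Put_Line("Put accepted after Take: " & Boolean'Image(After_Removal));
  Put_Line("Length:" & Natural'Image(Board.Length));
end Barrier_Demo;
```

The first conditional call should report FALSE and the second TRUE, with a final length of 2. A plain call to Put on a full board would instead queue the caller until some other task made room, which is the behaviour a barrier such as the one on Add_Job gives job producers.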

In addition to the above public services, the Job_Market also has a private part where it keeps the job queue (as a basic array for now - this can easily change at a later time if there is a good reason for the change), a flag to know if it's stopping and one to register if/when the board is full as well as a few helper procedures and functions for its own use. The Get_Available procedure effectively implements the strategy of picking the next Job to execute: it's here that priorities are really considered and it's here that there is another check to make sure that no two jobs of the same player are ever executed at the same time. The Is_Assigned function checks the set of Workers to see if any of them is performing a job for the specified player. Has_Waiting_Job, on the other hand, checks the job queue to see if there is any job from the specified player waiting in the queue. Arguably the most important of those is "Manage_Workers" that does precisely what the name says: it does a headcount of Workers in various states, cleans up any aborted/unexpectedly dead ones, reclaims memory for terminated ones and then, if required, creates new Workers to match the current Job provision. Note that there really are only 64 workers in total (and at any rate this is unlikely to become a huge number) so this headcount of workers is not really terribly costly.
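The picking strategy can be tried out on its own, away from any tasking. The sketch below is a simplified illustration and not the code of Get_Available: Job, Job_List, Player_Set, Is_Busy and Pick are hypothetical names, and the set of players currently being served is passed in as a plain array instead of being read from the Workers' records.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Pick_Demo is

  type Job is
    record
      Player   : Natural;
      Priority : Natural;
    end record;
  type Job_List is array( Positive range <> ) of Job;

  -- hypothetical: ids of players that some worker is serving right now
  type Player_Set is array( Positive range <> ) of Natural;

  function Is_Busy( P : Natural; Busy : Player_Set ) return Boolean is
  begin
    for I in Busy'Range loop
      if Busy(I) = P then
        return True;
      end if;
    end loop;
    return False;
  end Is_Busy;

  -- Returns the index of the job to run next, or 0 if none is eligible.
  -- Walks the FULL list since a higher priority may sit further down;
  -- the strict ">" keeps the earliest job among equal priorities.
  function Pick( Jobs : Job_List; Busy : Player_Set ) return Natural is
    Best : Natural := 0;
  begin
    for I in Jobs'Range loop
      if not Is_Busy(Jobs(I).Player, Busy) and then
         (Best = 0 or else Jobs(I).Priority > Jobs(Best).Priority) then
        Best := I;
      end if;
    end loop;
    return Best;
  end Pick;

  Waiting : constant Job_List :=
    ( (Player => 7, Priority => 1),
      (Player => 9, Priority => 5),
      (Player => 7, Priority => 3),
      (Player => 4, Priority => 3) );

  Nobody     : constant Player_Set := (1 .. 0 => 0);
  Serving_9  : constant Player_Set := (1 => 9);
  Everyone   : constant Player_Set := (7, 9, 4);

  Pick_Free  : Natural;
  Pick_Busy9 : Natural;
  Pick_None  : Natural;

begin
  Pick_Free  := Pick(Waiting, Nobody);
  Pick_Busy9 := Pick(Waiting, Serving_9);
  Pick_None  := Pick(Waiting, Everyone);

  Put_Line("Nobody served :" & Natural'Image(Pick_Free));
  Put_Line("Player 9 busy :" & Natural'Image(Pick_Busy9));
  Put_Line("Everyone busy :" & Natural'Image(Pick_None));
end Pick_Demo;
```

With nobody being served the priority 5 job at index 2 wins; once player 9 is busy the choice falls to index 3, the first of the two priority 3 jobs; and with all three players busy the result is 0, which corresponds to Found being False in Get_Available.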

The overall package further has a private function that dynamically creates a new Worker task with the given ID, returning its address. This is more for convenience than anything else since one could easily call new directly so perhaps it will even go away at the next round of trimming the code.

The implementation in eujobs.adb starts with the Initialize and Finalize procedures, declares the Controlled_Workers object and then proceeds with the internals of the Job_Market itself:

with Ada.Text_IO; use Ada.Text_IO;

package body EuJobs is

  procedure Finalize( S: in out Controlled_Workers ) is
    -- ALL this needs to do is to make SURE no worker is still running!
  begin
    for I in S.Workers'First .. S.Workers'Last loop
      if S.Workers(I).WA /= null then
        abort S.Workers(I).WA.all;
        S.Workers(I).WA := null;
        S.Workers(I).Assigned := False;
      end if;
    end loop;
  end Finalize;

  procedure Initialize( S: in out Controlled_Workers ) is
  begin
    for I in S.Workers'First .. S.Workers'Last loop
      S.Workers(I).WA := null;
      S.Workers(I).Assigned := False;
    end loop;
  end Initialize;

  -- actual workers slots; workers are managed internally here
  -- this type is needed though, to Finalize properly
  CW: Controlled_Workers;

  protected body Job_Market is
    -- adding a new job that needs to be done
    -- this can be ANY variant of Job_Data (i.e. any value of discriminant T)
    entry Add_Job( J     : in Job_Data;
                   Added : out Boolean )
      when Stopping or    --to unblock producers
           (not Fullboard) is
    begin
      -- if stopping, discard job -- allows callers to finish too...
      -- check Player_ID and add job ONLY if none exist for this player
      if (not Stopping) and
         (not Is_Assigned(J.Player_ID)) and
         (not Has_Waiting_Job(J.Player_ID)) then
        -- board is known to have space, so add to it
        Board.JA(Board.JA'First + Board.Len) := J;
        Board.Len := Board.Len + 1;

        -- job added may mean full board
        FullBoard := Board.Len >= Board.JA'Last;

        -- Quick worker management to adjust if needed
        Manage_Workers;
        -- Let caller know that job was indeed added
        Added := True;
      else
        Added := False; --not added, aka discarded
      end if;
    end Add_Job;

    -- workers request jobs or seppuku when they are out of work
    procedure Get_Job( Pos    : in Worker_Index;
                       J      : out Job_Data;
                       Seppuku: out Boolean ) is
      Found : Boolean;
    begin
      if Stopping then
        -- when stopping: all seppuku
        Seppuku := True;
      else
        -- try first to get some job that should be done
        Get_Available(J, Found);
        if (not Found) then
          Seppuku := True; --since no job is available..
        else
          -- have a job so no seppuku for now
          Seppuku := False;
          -- update Worker record to mark player as being served etc.
          CW.Workers(Pos).Assigned := True;
          CW.Workers(Pos).Player_ID := J.Player_ID;
          -- this SURELY means board is NOT full!
          Fullboard := False;
        end if;
      end if;
      -- LAST: manage workers in ANY CASE!
      Manage_Workers;
    end Get_Job;

    -- workers have to report back when a job is done
    procedure Done_Job( Pos: in Worker_Index ) is
    begin
      -- update record for this worker and let him go
      CW.Workers(Pos).Assigned := False;
    end Done_Job;

    -- aim to stop gracefully:
    --   -- no new jobs stored, existing discarded, workers killed.
    -- NB: NO reverse for this.
    procedure Stop is
    begin
      Stopping := True; -- NO need for anything else, really
    end Stop;

    function Operating(  Waiting_Jobs: out Natural;
                         Idle_Workers: out Natural;
                         Active_Workers: out Natural;
                         Terminated_Workers: out Natural;
                         Is_Full: out Boolean)
      return Boolean is
    begin
      Waiting_Jobs := Natural( Board.Len );
      Is_Full := Fullboard;
      Idle_Workers := 0;
      Active_Workers := 0;
      Terminated_Workers := 0;

      for I in CW.Workers'Range loop
        if CW.Workers(I).WA /= null then
          if CW.Workers(I).WA'Terminated then
            Terminated_Workers := Terminated_Workers+1;
          elsif CW.Workers(I).Assigned then
            Active_Workers := Active_Workers + 1;
          else
            Idle_Workers := Idle_Workers + 1;
          end if;
        end if;
      end loop;
      return (not Stopping);
    end Operating;

    -- anything needed for external load checking (?)

--private stuff

    procedure Get_Available( JD    : out Job_Data;
                             Found : out Boolean ) is
      Pos   : Job_Count;
      P     : Natural := 0; --priority of job found so far
    begin
      Found := False;
      -- ALWAYS walk the FULL set: higher priority might have come in later
      for I in 1 .. Board.Len loop
        if ( (not Found) or (Board.JA(I).Priority > P) ) and
           (not Is_Assigned(Board.JA(I).Player_ID) ) then
          Found := True;
          Pos   := I;
          P     := Board.JA(I).Priority;
          -- but don't copy just yet, as there might be higher priority further
        end if;
      end loop;
      -- retrieve the found job data but ONLY if found!
      if Found then
        JD := Board.JA(Pos);
        -- if not last job, shift to avoid gaps in the array
        if Pos < Board.Len then
          Board.JA(Pos..Board.Len-1) :=
              Board.JA(Pos + 1 .. Board.Len);
        end if;
        -- update count of jobs in the array
        Board.Len := Board.Len -1;
      end if;
    end Get_Available;

    function Is_Assigned( Player_ID: in Interfaces.Unsigned_64 )
             return Boolean is
      Found: Boolean := False;
    begin
      -- walk the array of workers and check
      for I in CW.Workers'Range loop
        if CW.Workers(I).WA /= null and
           CW.Workers(I).Assigned and
-- Will have to rely on .assigned being SET properly by the manager!
--  (not CW.Workers(I).WA'Terminated) and
           CW.Workers(I).Player_ID = Player_ID then
          -- found it!
          Found := True;
          exit;
        end if;
      end loop;
      return Found;
    end Is_Assigned;

    function Has_Waiting_Job( Player_ID: in Interfaces.Unsigned_64 )
             return Boolean is
      Found: Boolean := False;
    begin
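      -- only the first Board.Len slots hold waiting jobs; the range is
      -- empty when Len = 0 and never runs past Board.JA'Last when full
      for I in Board.JA'First .. Board.JA'First + Board.Len - 1 loop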
        if Board.JA(I).Player_ID = Player_ID then
          Found := True;
          exit;
        end if;
      end loop;
      return Found;
    end Has_Waiting_Job;

    procedure Manage_Workers is
      Active_W: Natural := 0;
      Idle_W  : Natural := 0;
      Total_W : Natural := 0;
      To_Create: Natural:= 0;
    begin
      -- release player ids if workers terminated
      -- count also precisely how many are active
      for I in CW.Workers'Range loop
        if CW.Workers(I).WA /= null then
          if CW.Workers(I).WA'Terminated then
            -- this terminated abnormally -> LOG?
            CW.Workers(I).Assigned := False;
            -- claim this space to restart a worker here if needed
            --CW.Workers(I).WA := null;
            -- deallocate it too as otherwise memory space slowly gets lost
            -- NB: Free proc sets it to null anyway
            Free(CW.Workers(I).WA);

          --if NOT null and NOT terminated-> idle or active
          elsif CW.Workers(I).Assigned then
              -- this is an active worker, count it
              Active_W := Active_W + 1;
          else
            -- this is an idle worker, count it
            Idle_W := Idle_W + 1;
          end if;
          -- null workers are simply empty spaces, no need to count them
        end if;
      end loop;
      -- calculate total workers
      Total_W := Active_W + Idle_W;

      if (not Stopping) and
         (Board.Len > Total_W) and
         (Total_W < Max_Workers ) and
         (Idle_W = 0) then
        -- need (perhaps) to create workers: how many?
        To_Create := Board.Len - Total_W;

        -- create them for as long as there is ANY space..
        -- NB: MORE workers MIGHT have terminated meanwhile,
        -- but they won't be null!
        for I in CW.Workers'Range loop
          if CW.Workers(I).WA = null then
            -- found a place, so create a worker
            CW.Workers(I).Assigned := False;
            CW.Workers(I).WA := Create_Worker(I);
            To_Create := To_Create - 1;
            Total_W := Total_W + 1;

            if To_Create <= 0 or Total_W >= Max_Workers then
              exit;
            end if;
          end if;
        end loop;
      end if;
    end Manage_Workers;

  end Job_Market;

  -- Worker body
  task body Worker is
    JD      : Job_Data;
    Seppuku : Boolean := False;
  begin
    -- main Loop: get a job or die, work and repeat.
    Work_Loop:
    loop
      -- ask the Job Market for a job or permission to seppuku
      Job_Market.Get_Job( Pos, JD, Seppuku );

      if Seppuku then
        exit Work_Loop;
      else
        -- do the job
        EuJobs.Complete( JD );
        -- report job done
        Job_Market.Done_Job( Pos );
      end if;
    end loop Work_Loop;
    -- worker is done and will die gracefully!
  end Worker;

  -- Jobs themselves
  procedure Complete( JD   : in Job_Data ) is
    Stop: Boolean;
  begin
    -- do different things for different types of jobs...
    case JD.T is
        when Create_Acct =>
          --Acct_Info: Data_Structs.Player_RSA;
          Stop := False;
        when Set_SKeys =>
          -- SKeys: Data_Structs.Serpent_Keyset;
          Stop := False;
        when Mgm_SKeys =>
          --SMgm: Data_Structs.Keys_Mgm;
          Stop := False;
        when Print_Job =>
          Put_Line("Completing: job counter " &
                   Interfaces.Unsigned_16'Image(JD.Counter) &
                   " priority " & Natural'Image(JD.Priority) &
                   " for player " &
                   Interfaces.Unsigned_64'Image(JD.Player_ID) &
                   " from IP:P " & Interfaces.Unsigned_32'Image(JD.Source_IP) &
                   ":" & Interfaces.Unsigned_16'Image(JD.Source_P));
        when others =>
          -- no job or dubious at best, better stop.
          Stop := True;
    end case;
  end Complete;

  function Create_Worker(P: in Worker_Index)
             return Worker_Address is
  begin
    return new Worker(P);
  end Create_Worker;

end EuJobs;
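
For anyone who wants to poke at the reclaim step of Manage_Workers in isolation, here is a minimal sketch of it, not the SJM code itself. It assumes a do-nothing Worker and a plain array of slots; Reap_Demo, Slots and Reaped are names made up for this illustration. Free is instantiated here from Ada.Unchecked_Deallocation, which is one plausible way of obtaining the Free called in the listing above, not necessarily the one used there.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Reap_Demo is
   -- hypothetical stand-in for Worker: it only sleeps a bit and then ends
   task type Worker (Pos : Positive);
   type Worker_Address is access Worker;

   -- assumption: Free as an instance of Unchecked_Deallocation;
   -- it sets its argument to null after deallocating
   procedure Free is new Ada.Unchecked_Deallocation (Worker, Worker_Address);

   task body Worker is
   begin
      delay 0.01 * Pos;
   end Worker;

   -- access values are null by default, i.e. all slots start empty
   Slots  : array (1 .. 3) of Worker_Address;
   Reaped : Natural := 0;
begin
   -- fill every slot with a dynamically created worker
   for I in Slots'Range loop
      Slots (I) := new Worker (I);
   end loop;

   -- poll until every slot has been reclaimed
   while Reaped < Slots'Length loop
      for I in Slots'Range loop
         if Slots (I) /= null and then Slots (I)'Terminated then
            -- only a terminated task is deallocated here
            Free (Slots (I));
            Reaped := Reaped + 1;
         end if;
      end loop;
      delay 0.01;
   end loop;

   Put_Line ("reaped" & Natural'Image (Reaped) & " workers");
end Reap_Demo;
```

The point of checking 'Terminated before calling Free is the same as in Manage_Workers: a slot is reclaimed only once its task has actually finished, and since Free leaves the slot null it is immediately available for a new worker.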

Your thoughts, observations and critiques on the above are welcome below in the comments section. If there is a problem with the above approach or with the code itself, I really want to hear of it sooner rather than later, since it's of course easier to do something about it now. This is, after all, the whole reason why I'm publishing this proof of concept, so go ahead and point out any faults you see.


  1. Also to reflect some suicidal tendencies of my Workers but that becomes clearer later. 

  2. "Threads" if you prefer non-Ada terminology. 

  3. There isn't much point in having more Workers than your underlying iron can actually support.

  4. blocking until it's done or checking at some intervals 

  5. Ada's documentation claims that dynamic creation of a task has a big overhead anyway so it's best avoided whenever possible but I can't say I have any idea just what "big overhead" means here. 

  6. The full .ads + .adb code+comments shown below is 500 lines, it uses no secondary stack, no heap and no containers or other similar external packages. Even the "use Ada.Text_IO" will go away as it's in there now strictly to allow the Print job to be seen as it completes for testing purposes. 

4 Comments »

  1. > So the SJM needs to find out when a job is finished in order to accept again jobs for that specific player.

    I suspect this is cutting too close to the bone. Why can't it accept more jobs while working on an accepted job ?

    > so that the full does NOT become full

    Queue ?

    Comment by Mircea Popescu — February 7, 2019 @ 11:56 p.m.

  2. It can and in fact a previous version did. The only issue I see with it is that it opens up potential filling of the job queue with jobs from only one player i.e. against the whole idea of parallelism. One can protect against this, of course, either at a higher level or simply by imposing a maximum number of jobs per player kept at any and all times. The advantage is that there might presumably be work that is better done split in several jobs and therefore it's a valid situation to have more than one job per player.

    Indeed, will correct.

    Comment by Diana Coman — February 8, 2019 @ 7:29 a.m.

  3. Depending on the exact nature of the jobs, it may be entirely sensible to have more Workers than the CPU has cores, as it could enable performance improvements with the same sort of CPU-bound/IO-bound process juggling that multi-tasking kernels have been doing since long before multi-threading.

    Speaking of the kernel, have you considered the heavy-weight option of using process-level parallelism? You have all of the kernel's mechanisms for dividing work available already, and which will be managing your program regardless.

    Comment by DangerNorm — February 9, 2019 @ 12:09 a.m.

  4. The number of Workers is a knob essentially: meant to be adjusted if, when and as needed; not written in stone or anything, no.

    It might well be the case that the jobs end up performed in a distributed manner before going for process-level parallelism. In any case, it's certainly an overkill at the moment.

    Comment by Diana Coman — February 10, 2019 @ 1:37 p.m.


Theme and content by Diana Coman