Cooperating sequential processes

E.W.Dijkstra

Department of Mathematics
Technological University
P.O. Box 513
EINDHOVEN
The Netherlands

2. Loosely Connected Processes.

The subject matter of this monograph is the cooperation between loosely connected sequential processes and this section will be devoted to a thorough discussion of a simple, but representative problem, in order to give the reader some feeling for the problems in this area.

In the previous section we have described the nature of a single sequential process, performing its sequence of actions autonomously, i.e. independent of its surroundings as soon as it has been started.

When two or more of such processes have to cooperate with each other, they must be connected, i.e. they must be able to communicate with each other in order to exchange information. As we shall see below, the properties of these means of intercommunication play a vital role.

Furthermore, we have stipulated that the processes should be connected loosely; by this we mean that apart from the (rare) moments of explicit intercommunication, the individual processes themselves are to be regarded as completely independent of each other. In particular we disallow any assumption about the relative speeds of the different processes. (Such an assumption —say"processes geared to the same clock"— could be regarded as implicit intercommunication.) This independence of speed ratios is in strict accordance with our appreciation of the single sequential process: its only essential feature is, that its elementary steps are performed in sequence. If we prefer to observe the performance with a chronometer in our hand, we may do so, but the process itself remains remarkably unaffected by this observation.

I warn the reader that my consistent refusal to make any assumptions about the speed ratios will at first sight appear as a mean trick to make things more difficult than they already are. I feel, however, fully justified in my refusal. Firstly, we may have to cope with situations in which, indeed, very little is known about the speeds. For instance, part of the system may be a manually operated input station, another part of the system might be such, that it can be stopped externally for any period of time, thus reducing its speed temporarily to zero. Secondly —and this is much more important— when we think that we can rely upon certain speed ratios, we shall discover that we have been "pound foolish and penny wise". True that certain mechanisms can be made simpler under the assumption of speed ratio restrictions. The verification, however, that such an assumption is always justified, is in general extremely tricky and the task to make, in a reliable manner, a well behaved structure out of many interlinked components is seriously aggravated when such "analogue interferences" have to be taken into account as well. (For one thing: it will make the proper working a rather unstable equilibrium, sensitive to any change in the different speeds, as may easily arise by replacement of a component by another —say, replacement of a line printer by a faster model— or reprogramming of a certain portion.)

2.1. A Simple Example.

After these introductory remarks I shall discuss the first problem.

We consider two sequential processes, "process 1" and "process 2", which for our purposes can be regarded as cyclic. In each cycle a so-called "critical section" occurs, critical in the sense that the processes have to be constructed in such a way, that at any moment at most one of the two is engaged in its critical section. In order to effectuate this mutual exclusion the two processes have access to a number of common variables. We postulate, that inspecting the present value of such a common variable and assigning a new value to such a common variable are to be regarded as indivisible, non-interfering actions. I.e. when the two processes assign a new value to the same common variable "simultaneously", then the assignments are to be regarded as done the one after the other, the final value of the variable will be one of the two values assigned, but never a "mixture" of the two. Similarly, when one process inspects the value of a common variable "simultaneously" with the assignment to it by the other one, then the first process will find either the old or the new value, but never a mixture.

For our purposes ALGOL 60 as it stands is not suited, as ALGOL 60 has been designed to describe one single sequential process. We therefore propose the following extension to enable us to describe parallellism of execution. When a sequence of statements —separated by semicolons as usual in ALGOL 60— is surrounded by the special statement bracket pair "`parbegin`" and "`parend`", this is to be interpreted as parallel execution of the constituent statements. The whole construction —let us call it "a parallel compound"— can be regarded as a statement. Initiation of a parallel compound implies simultaneous initiation of all its constituent statements, its execution is completed after the completion of the execution of all its constituent statements. E.g.:

`begin S1; parbegin S2; S3; S4; parend; S5 end`

(in which `S1`, `S2`, `S3`. `S4` and `S5` are used to indicate statements) means that after the completion of `S1`, the statements `S2`. `S3` and `S4` will be executed in parallel, and only when they are all finished, then the execution of statement `55` will be initiated.

With the above conventions we can describe our first solution:

```begin integer turn; turn:= 1;       parbegin       process 1: begin Ll: if turn = 2 then goto Ll;                            critical section 1;                            turn:= 2;                            remainder of cycle 1; goto L1                  end;       process 2: begin L2: if turn = 1 then goto L2;                            critical section 2;                            turn:= 1;                            remainder of cycle 2; goto L2                   end       parend end``` .

(Note for the inexperienced ALGOL 60 reader. After "`begin`" in the first line we find the so-called declaration "`integer` `turn`", thereby sticking to the rule of ALGOL 60 that program text is not allowed to refer to variables without having introduced them with the aid of a declaration. As this declaration occurs after the "`begin`" of the outermost statement bracket pair it means that for the whole duration of the program a variable has been introduced that will only take on integer values and to which the program text can refer by means of the name "`turn`".)

The two processes communicate with each other via the common integer "`turn`", the value of which indicates which of the two processes is the first to perform (or rather: to finish) its critical section. From the program it is clear that after the first assignment, the only possible values of the variable "`turn`" are 1 and 2. The condition for process 2 to enter its critical section is that it finds at some moment "`turn ≠ 1`", i.e. "`turn = 2`". But the only way in which the variable "`turn`" can get this value is by the assignment "`turn:= 2`" in process 1. As process 1 performs this assignment only at the completion of its critical section, process 2 can only initiate its critical section after the completion of critical section 1. And critical section 1 could indeed be initiated, because the initial condition "`turn = 1`" implied "`turn ≠ 2`", so that the potential wait cycle, labeled `L1`, was initially inactive. After the assignment "`turn:= 2`" the roles of the two processes are interchanged. (N.B. It is assumed that the only references to the variable "`turn`" are the ones explicitly shown in the program.)

Our solution, though correct, is, however, unnecessarily restrictive: after the completion of critical section 1, the value of the variable "`turn`" becomes "2", and it must be =1 again, before the next entrance into critical section 1. As a result the only admissible succession of critical sections is the strictly alternating one "1,2,1,2,1,2,1,.....", in other words, the two processes are synchronized. In order to stress explicitly that this is not the kind of solution we wanted, we impose the further condition "If one of the processes is stopped well outside its critical section, this is not allowed to lead to potential blocking of the other process.". This makes our previous solution unacceptable and we have to look for another.

Our second effort works with two integers "`c1`" and "`c2`", where c = 0 / 1 respectively will indicate that the corresponding process in inside / outside its critical section respectively. We may try the following construction:

```begin integer c1, c2;       c1:= 1; c2:= 1;       parbegin       process1: begin L1: if c2 = 0 then goto L1;                           c1: = 0;                           critical section 1;                           c1:= 1;                           remainder of cycle 1; goto L1                 end;       process2: begin L2: if c1 = 0 then goto L2;                           c2: = 0;                           critical section 2;                           c2:= 1;                           remainder of cycle 2; goto L2                 end       parend end . ```

The first assignments set both `c`'s = 1, in accordance with the fact that the processes are started outside their critical sections. During the entire execution of critical section 1 the relation "`c1` = 0" holds and the first line of process 2 is effectively a wait "Wait as long as process 1 is in its critical section.". The trial solution gives indeed some protection against simultaneity of critical section execution, but is, alas, too simple, because it is wrong. Let first process 1 find that `c2` = 1; let process 2 inspect `c1` immediately afterwards, then it will (still) find `c1` = 1. Both processes, having found that the other is not in its critical section, will conclude that they can enter their own section safely!

We have been too optimistic, we must play a safer game. Let us invert, at the beginning of the parallel processes, the inspection of the "`c`" of the other and the setting of the own "`c`". We then get the construction:

```begin integer c1, c2;       c1:= 1; c2:= 1;       parbegin       process 1: begin A1: c1:= 0;                        L1: if c2 = 0 then goto Ll;                            critical section 1;                            c1:= 1;                            remainder of cycle 1; goto A1                  end;       process 2: begin A2: c2:= 0;                        L2: if c1 = 0 then goto L2;                            critical section 2;                            c2:= 1;                            remainder of cycle 2; goto A2                   end       parend end``` .

It is worth while to verify that this solution is at least completely safe. Let us focus our attention on the moment that process 1 finds `c2` = 1 and therefore decides to enter its critical section. At this moment we can conclude
1) that the relation "`c1` = 0" already holds and will continue to hold until process 1 has completed the execution of its critical section,
2) that, as "`c2` = 1" holds, process 2 is well outside its critical section, which it cannot enter as long as "`c1` = 0" holds, i.e. as long as process 1 is still engaged in its critical section.
Thus the mutual exclusion is indeed guaranteed.

But this solution, alas, must also be rejected: in its safety measures it has been too drastic, for it contains the danger of definite mutual blocking. When after the assignment "`c1:= 0`" but yet before the inspection of `c2` (both by process 1) process 2 performs the assignment "`c2:= 0`" then both processes have arrived at label `L1` or `L2` respectively and both relations "`c1` = 0" and "`c2` = 0" hold, with the result that both processes will wait upon each other until eternity. Therefore also this solution must be rejected.

It was OK to set one's own "`c`" before inspecting the "`c`" of the other, but it was wrong to stick to one's own `c`-setting and just to wait. This is (somewhat) remedied in the following construction:

```begin integer c1, c2;       c1:= 1; c2:= 1;       parbegin       process 1: begin L1: c1:= 0;                            if c2 = 0 then                                begin c1:= 1; goto L1 end;                            critical section 1;                            c1:= 1;                            remainder of cycle 1; goto L1                  end;       process 2: begin L2: c2:= 0;                            if c1 = 0 then                                begin c2:= 1; goto L2 end;                            critical section 2;                            c2:= 1;                            remainder of cycle 2; goto L2                  end       parend end``` .

This construction is as safe as the previous one and, when the assignments "`c1:= 0`" and "`c2:= 0`" are performed "simultaneously" it will not necessarily lead to mutual blocking ad infinitum, because both processes will reset their own "`c`" back to 1 before restarting the entry rites, thereby enabling the other process to catch the opportunity. But our principles force us to reject also this solution, for the refusal to make any assumptions about the speed ratio implies that we have to cater for all speeds, and the last solution admits the speeds to be so carefully adjusted that the processes inspect the other's "`c`" only in those periods of time that its value is = 0. To make clear that we reject such solutions that only work with some luck, we state our next requirement: "If the two processes are about to enter their critical sections, it must be impossible to devise for them such finite speeds, that the decision which one of the two is the first to enter its critical section is postponed until eternity.".

In passing we note, that the solution just rejected is quite acceptable everyday life. E.g., when two people are talking over the telephone and they are suddenly disconnected, as a rule both try to reestablish the connection. They both dial and if they get the signal "Number Engaged", they put down the receiver and, if not already called, they try "some" seconds later. Of course, this may coincide with the next effort of the other party, but as a rule the connection is reestablished succesfully after very few trials. In our mechanical circumstances, however, we cannot accept this pattern of behaviour: our parties might very well be identical!

Quite a collection of trial solutions have been shown to be incorrect and at some moment people that had played with the problem started to doubt whether it could be solved at all. To the Dutch mathematician Th.J.Dekker the credit is due for the first correct solution. It is, in fact, a mixture of our previous efforts: it uses the "safe sluice" of our last constructions, together with the integer "`turn`" of the first one, but only to resolve the indeterminateness when neither of the two immediately succeeds. The initial value of "`turn`" could have been 2 as well.

```begin integer c1, c2 turn;       c1:= 1; c2:= 1; turn = 1;       parbegin       process 1: begin A1: c1:= 0;                        L1: if c2 = 0 then                                begin if turn = 1 then goto L1;                                      c1:= 1;                                  B1: if turn = 2 then goto B1;                                      goto A1                                end;                            critical section 1;                            turn:= 2; c1:= 1;                            remainder of cycle 1; goto A1                  end;       process 2: begin A2: c2:= 0;                        L2: if c1 = 0 then                               begin if turn = 2 then goto L2;                                     c2:= 1;                                 B2: if turn = 1 then goto B2;                                     goto A2                               end;                            critical section 2;                            turn:= 1; c2:= 1;                            remainder of cycle 2; goto A2                  end       parend end .```

We shall now prove the correctness of this solution. Our first observation is that each process only operates on its own "`c`". As a result process 1 inspects "`c2`" only while "`c1` = 0", it will only enter its critical section provided it finds "`c2` = 1"; for process 2 the analogous observation can be made.

In short, we recognize the safe sluice of our last constructions and the solution is safe in the sense that the two processes can never be in their critical sections simultaneously. The second part of the proof has to show that in case of doubt the decision which of the two will be the first to enter cannot be postponed until eternity. Now we should pay some attention to the integer "`turn`": we note that assignment to this variable only occurs at the end —or, if you wish: as part— of critical sections and therefore we can regard the variable "`turn`" as a constant during this decision process. Suppose that "`turn` = 1". Then process 1 can only cycle via `Ll`, that is with "`c1` = 0" and only as long as it finds "`c2` = 0". But if "`turn` = 1" then process 2 can only cycle via `B2`, but this state implies "`c2` = 1", so that process 1 cannot and is bound to enter its critical section. For "`turn` = 2" the mirrored reasoning applies. As third and final part of the proof we observe that stopping, say, process 1 in "`remainder of cycle 1`" will not restrict process 2: the relation "`c1` = 1" will then hold and process 2 can enter its critical section gaily, quite independent of the current value of "`turn`". And this completes the proof of the correctness of Dekker's solution. Those readers that fail to appreciate its ingenuity are kindly asked to realize, that for them I have prepared the ground by means of a carefully selected set of rejected constructions.