Small intro to threads, race conditions and locking


Programming with threads has some pitfalls. This post deals with a basic problem of concurrent code execution: race conditions.

Definitions

Code – part of a program

Threads – code executing in parallel (concurrently) with other code

Race condition (in our context) – a situation where several threads run in parallel and their overall result depends on how the threads are scheduled.

The problem

Below is a very small, naive piece of code that demonstrates the problem.

We have a global variable i which is incremented by several threads (which run concurrently). The code for incrementing i is i = i + 1 . Multiple threads executing the code below have a race condition. One would think that 100 threads doing this increment 100 times each would result in i being 10000 at the end.

Incorrect code

THREADS = 100
ITERATIONS = 100

i=0
THREADS.ptimes(F() {
    for(j;ITERATIONS)
        i=i+1
})
echo(i)

Incorrect code explanation

The code is in the NGS language but the logic would be the same across many languages that have the same concurrency model: C, Java, etc.

Side note (skip freely):

This exact example would not misbehave in languages that guarantee an atomic variable increment. Once the computation becomes more complex than a single increment, though, even those languages are back to code similar to the above, with the same problem.

ptimes – “parallel times” function – runs the given code in parallel threads.

N.ptimes(code) – runs N threads executing code.

F() { ... } – literal function

for(j;N) – loops with values of j from zero to N (not including N). Sugar for for(j=0; j<N; j=j+1).

Code summary: in 100 parallel threads do i = i + 1 100 times in each thread.
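For readers who do not know NGS, here is a rough Python rendering of the same incorrect code. It is a sketch, not a translation of the author's file: the names increment_many and run_unsafe are made up for illustration. Note also that on CPython the interpreter lock can make lost updates rare, so this version may well print 10000 on some runs and interpreter versions. The race is still there.

```python
import threading

THREADS = 100
ITERATIONS = 100

i = 0  # shared global, as in the NGS example


def increment_many():
    """Increment the shared counter ITERATIONS times without any locking."""
    global i
    for _ in range(ITERATIONS):
        i = i + 1  # read i, add one, write back: separate steps, not one


def run_unsafe():
    """Start THREADS threads, wait for all of them, return the final counter."""
    global i
    i = 0
    threads = [threading.Thread(target=increment_many) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return i


# Anything up to THREADS * ITERATIONS may be printed, depending on scheduling.
print(run_unsafe())
```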

Incorrect code output

The scheduling of the threads is done by the operating system. Since the scheduling is out of our control, the result is unpredictable when a race condition is present.

The output is actually somewhere between 7900 and 8400 on my system, different each time I run this code, not the 10000 you might expect.

So why does the result of i = i + 1 depend on the scheduling of the threads? Let’s examine the following two scheduling alternatives:

Incorrect code threads scheduling alternatives

Scheduling alternative A:

Thread 1 runs all the code (i = i + 1), then thread 2 runs all the code. No problem there. i would be incremented by 2.

Scheduling alternative B:

Thread 1 runs i + 1 : it fetches the value of i and adds one to it. Thread 2 also runs i + 1. Thread 1 saves its computed value of i + 1 to i. Then thread 2 does the same. The problem is that the fetched value of i was the same for both threads. Summary: both threads fetched the same value of i , incremented it and stored the result. The total increment of i is one, not two.

Since scheduling alternatives A and B have different outcomes, this is a race condition: the output is unpredictable and sometimes incorrect. The incorrect code above is deliberately written so that the output is incorrect most of the time.
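The two alternatives can be replayed deterministically, without real threads, by splitting i = i + 1 into an explicit "fetch" step and a "store" step and feeding in the order of steps by hand. This is only an illustrative model. The function run_schedule and the step names are assumptions made for this sketch, not part of any threading API.

```python
def run_schedule(schedule):
    """Replay a list of (thread_id, step) pairs against a shared counter."""
    i = 0
    fetched = {}  # value of i each thread saw when it last fetched
    for thread_id, step in schedule:
        if step == "fetch":
            fetched[thread_id] = i
        elif step == "store":
            i = fetched[thread_id] + 1
    return i


# Alternative A: thread 1 finishes before thread 2 starts.
schedule_a = [(1, "fetch"), (1, "store"), (2, "fetch"), (2, "store")]

# Alternative B: both threads fetch before either one stores.
schedule_b = [(1, "fetch"), (2, "fetch"), (1, "store"), (2, "store")]

print(run_schedule(schedule_a))  # 2
print(run_schedule(schedule_b))  # 1
```

Same two increments, different interleaving, different result: that difference is the race condition.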

Fixing the code

Add l = Lock() at the beginning of the code.

Replace i = i + 1 with l.acquire(F() i=i+1) .

Code explanation

Lock() – creates a new lock

l.acquire(code) – acquires the lock l, runs the code and releases the lock.
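The same fix in Python might look like the sketch below. It assumes Python's threading.Lock used through a with block, which acquires the lock, runs the body and releases the lock, much like l.acquire(code) is described here. The function names are made up for illustration.

```python
import threading

THREADS = 100
ITERATIONS = 100

i = 0
l = threading.Lock()  # one lock shared by all threads


def increment_many_locked():
    """Increment the shared counter ITERATIONS times, one thread at a time."""
    global i
    for _ in range(ITERATIONS):
        with l:        # acquire l, run the body, release l
            i = i + 1


def run_locked():
    """Start THREADS threads, wait for all of them, return the final counter."""
    global i
    i = 0
    threads = [threading.Thread(target=increment_many_locked) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return i


print(run_locked())  # 10000
```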

Locks

Locks provide a way to make sure that only one thread at a time executes a given section of code. In our case, using the lock allows only “Scheduling alternative A”.

When several threads try to acquire a lock simultaneously, only one will succeed and then enter the code. When this thread finishes executing the given code it releases the lock. After the lock is released, it can be acquired by another thread.
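The "only one will succeed" behaviour can be observed even from a single thread. The sketch below uses Python's threading.Lock with a non-blocking acquire, purely so that a failed attempt returns immediately instead of waiting. A normal blocking acquire would wait until the lock is released.

```python
import threading

lock = threading.Lock()

first = lock.acquire()                 # lock is free, so this succeeds
second = lock.acquire(blocking=False)  # lock is held, so this fails at once
lock.release()                         # the holder is done
third = lock.acquire(blocking=False)   # lock is free again, so this succeeds
lock.release()

print(first, second, third)  # True False True
```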

In practice, locking mechanisms are built on an atomic hardware operation such as Test-and-set or Compare-and-swap. Trying to come up with your own lock out of ordinary code, for example by checking a flag and then setting it, will not work: you will only be shifting the race condition from one place to another.
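To see why a home-made lock shifts the race rather than removing it, consider a naive lock that checks a flag and then sets it as two separate steps. The sketch below replays hand-written interleavings of those steps. The function naive_lock_trace and the step names are hypothetical and exist only for this illustration.

```python
def naive_lock_trace(schedule):
    """Replay (thread_id, step) pairs against a check-then-set 'lock'.

    Returns the list of threads that ended up inside the protected section.
    """
    locked = False
    saw_free = {}  # what each thread concluded when it checked the flag
    inside = []
    for thread_id, step in schedule:
        if step == "check":
            saw_free[thread_id] = not locked
        elif step == "set" and saw_free.get(thread_id):
            locked = True
            inside.append(thread_id)
    return inside


# Thread 1 checks and sets before thread 2 looks: only thread 1 gets in.
lucky = [(1, "check"), (1, "set"), (2, "check"), (2, "set")]

# Both threads check before either sets: both believe the lock is free.
unlucky = [(1, "check"), (2, "check"), (1, "set"), (2, "set")]

print(naive_lock_trace(lucky))    # [1]
print(naive_lock_trace(unlucky))  # [1, 2]
```

The check and the set have the same gap between them that the fetch and the store of i = i + 1 have. An atomic test-and-set closes that gap by doing both as one indivisible operation.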

Summary

Watch out for race conditions, as they are a common problem when using threads. Use locks to avoid race conditions. For best performance, the code that runs while holding a lock should be as short as possible.
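As a sketch of keeping the locked section short: each thread below does its own computation without the lock and holds the lock only for the final update of the shared total. The function slow_square is a hypothetical stand-in for real work, and the constants are arbitrary.

```python
import threading

THREADS = 8
ITEMS_PER_THREAD = 1000

total = 0
total_lock = threading.Lock()


def slow_square(n):
    """Hypothetical stand-in for an expensive computation."""
    return n * n


def worker(start):
    """Sum squares of one range, then add the partial sum to the shared total."""
    global total
    # The long part touches no shared state, so it needs no lock.
    partial = sum(slow_square(n) for n in range(start, start + ITEMS_PER_THREAD))
    # Only the update of the shared variable happens while holding the lock.
    with total_lock:
        total = total + partial


def run():
    """Run all workers and return the combined total."""
    global total
    total = 0
    threads = [
        threading.Thread(target=worker, args=(k * ITEMS_PER_THREAD,))
        for k in range(THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


print(run())
```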

See also:

  1. Concurrent data structures
  2. Message Passing Concurrency / Message passing
  3. Event Loop Concurrency

Full incorrect and correct code: https://github.com/ilyash/ngs/blob/6add46e6f60ce398f37f5138ece16cf8fdfd719b/c/demo/locks.ngs
