Concurrency and synchronization in POSIX Bourne Shell (sh or bash)

By Ewan

March 24, 2013

It seems like ages ago now that I found my customer had a process that connected to hundreds of Oracle databases to run predefined SQL for health checks. These databases were hosted all over the world, and the SQL could take up to fifteen minutes to complete for a single database (with a huge number of TNS timeouts). The end result was a CSV file that was ultimately formatted into a spreadsheet to provide management information. It took about a day to obtain this final result.

I thought there was a better way.

Surely the time spent waiting for a slow or distant database could be used for running commands on another database? My solution was to partition the database list into eight groups and process these groups concurrently. Now I just needed some synchronisation, as the password system was not thread-safe. Looking for a method of using a mutex in a shell script, I stumbled across this article amongst others.
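The partitioning idea can be sketched roughly as follows. This is not the original customer script: NGROUPS, check_db and the dummy db1..db20 list are all hypothetical stand-ins, and the sketch assumes mktemp is available. Each group writes to its own output file, so this part needs no locking at all.

```shell
#!/bin/sh
# Sketch only: NGROUPS, check_db and the db1..db20 list are hypothetical.
NGROUPS=8
workdir=$(mktemp -d) || exit 1

# Stand-in for the real health check; it just reports the name it was given.
check_db() {
  echo "checked $1"
}

# Build a dummy list of twenty database names, one per line.
i=1
while [ "$i" -le 20 ]; do
  echo "db$i"
  i=$((i + 1))
done >"$workdir/all"

# Deal the names round-robin into group.0 .. group.7.
awk -v n="$NGROUPS" -v dir="$workdir" '
  { f = dir "/group." ((NR - 1) % n); print > f }
' "$workdir/all"

# One background subshell per group; each writes to its own output file,
# so no locking is needed for the results.
for f in "$workdir"/group.*; do
  (
    while read -r db; do
      check_db "$db"
    done <"$f" >"$workdir/out.${f##*.}"
  ) &
done

# Block until every group has finished, then merge the results.
wait
total=$(cat "$workdir"/out.* | grep -c '^checked ')
echo "processed: $total"
rm -rf "$workdir"
```

With a real check in place of check_db, a slow database only holds up its own group while the other seven carry on.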

Putting this together, here is an example script that creates multiple 'threads', each incrementing a protected counter variable.

#!/bin/sh
#
# Use a mutex to serialize access to a variable.  Test this by using multiple
# processes mutating this variable.  Synchronize all subprocesses and exit after
# a defined number of iterations.

THREADS=20
COUNTFILE=counter
RUNTIME=5

mutex_acquire() {
  # $1 = mutex name (default: lock)
  # $2 = sleep time between failed attempts (default: 0.01s)
  if [ "$1" = "" ]; then lock=.lock; else lock=.$1.lock; fi
  if [ "$2" = "" ]; then sleep=0.01s; else sleep=$2; fi
  locked=false
  while [ $locked = "false" ]; do
    mkdir $lock 2>/dev/null
    rc=$?
    if [ $rc -ne 0 ]; then sleep $sleep
    else locked=true; fi
  done
}

mutex_release() {
  # $1 = mutex name (default: lock)
  if [ "$1" = "" ]; then lock=.lock; else lock=.$1.lock; fi
  rmdir $lock
}

process() {
  echo T$1 starting
  count=1
  while [ $count -le $RUNTIME ]; do
    mutex_acquire
      if [ ! -f $COUNTFILE ]; then echo 1 >$COUNTFILE
      else
        countval=`cat $COUNTFILE`
        echo `expr $countval + 1` >$COUNTFILE
      fi
      echo T$1 counter=`cat $COUNTFILE`
    mutex_release
    count=`expr $count + 1`
  done
  echo T$1 exiting
}

thread=1
while [ $thread -le $THREADS ]; do
  process $thread &
  thread=`expr $thread + 1`
done

wait

mutex_acquire
  rm $COUNTFILE
mutex_release
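The mutex in the script above depends on mkdir acting as a test-and-set: it either creates the directory or fails because the directory already exists, with nothing in between. A small demonstration of that property, assuming a local file system and an available mktemp (the racer and marker-file names are made up for illustration):

```shell
#!/bin/sh
# Hypothetical demonstration: ten background processes race to create the
# same directory. Only the process whose mkdir succeeds leaves a marker.
tmp=$(mktemp -d) || exit 1
lock=$tmp/demo.lock

racer=1
while [ "$racer" -le 10 ]; do
  ( mkdir "$lock" 2>/dev/null && : >"$tmp/won.$racer" ) &
  racer=$((racer + 1))
done
wait

# Count the marker files; each one represents a successful mkdir.
winners=$(ls "$tmp" | grep -c '^won\.')
echo "winners: $winners"
rm -rf "$tmp"
```

However the ten processes are scheduled, only one of them should report success, which is what lets the directory stand in for a lock.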

More recently I read an article in Linux Journal that discussed using signals to limit the processing time of multiple processes in a shell script. It was thought-provoking, but digging deeper it became apparent that there were many race conditions that had not been solved (e.g. signalling a process that has already died could actually kill a different process that has since been given the same PID). Turning this on its head, surely it is better to have the top-level shell control the execution of the subshells rather than the other way around?
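Stripped of the mutex, the parent-controls-children idea might look like the sketch below. The worker function, the marker files and the one-second timings are hypothetical, and the sketch assumes mktemp is available. The parent records each PID from $! as it starts the subshell and is the only process that sends signals.

```shell
#!/bin/sh
# Sketch only: three workers loop until the parent tells them to stop.
tmp=$(mktemp -d) || exit 1

worker() {
  wid=$1
  # The handler runs once the sleep in progress returns; it leaves a
  # marker file and ends this subshell.
  trap ': >"$tmp/stopped.$wid"; exit 0' ALRM
  while :; do
    sleep 1
  done
}

pids=
for n in 1 2 3; do
  worker "$n" &
  pids="$pids $!"
done

# Give the workers time to install their traps, then signal each one and
# wait for them all to exit.
sleep 1
kill -ALRM $pids
wait

stopped=$(ls "$tmp" | grep -c '^stopped\.')
echo "stopped: $stopped"
rm -rf "$tmp"
```

The initial sleep matters: a worker signalled before its trap is installed would simply be terminated by the default SIGALRM action and leave no marker.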

Putting this together, the earlier script is enhanced to exit after a set amount of time.

#!/bin/sh
#
# Use a mutex to serialize access to a variable.  Test this by using multiple
# processes mutating this variable.  Synchronize all subprocesses and exit after
# a specified period of time.

THREADS=20
COUNTFILE=counter
RUNTIME=5

mutex_acquire() {
  # $1 = mutex name (default: lock)
  # $2 = sleep time between failed attempts (default: 0.01s)
  if [ "$1" = "" ]; then lock=.lock; else lock=.$1.lock; fi
  if [ "$2" = "" ]; then sleep=0.01s; else sleep=$2; fi
  locked=false
  while [ $locked = "false" ]; do
    mkdir $lock 2>/dev/null
    rc=$?
    if [ $rc -ne 0 ]; then sleep $sleep
    else locked=true; fi
  done
}

mutex_release() {
  # $1 = mutex name (default: lock)
  if [ "$1" = "" ]; then lock=.lock; else lock=.$1.lock; fi
  rmdir $lock
}

process_start() {
  thread=$1
  trap 'process_stop $thread' ALRM
  echo T$thread starting
  # Loop until the parent sends SIGALRM; the trap above then ends the subshell.
  while [ 0 -eq 0 ]; do
    mutex_acquire
      if [ ! -f $COUNTFILE ]; then echo 1 >$COUNTFILE
      else
        countval=`cat $COUNTFILE`
        echo `expr $countval + 1` >$COUNTFILE
      fi
      echo T$thread counter=`cat $COUNTFILE`
    mutex_release
  done
}

process_stop() {
  echo T$1 exiting
  exit 0
}

thread=1
while [ $thread -le $THREADS ]; do
  process_start $thread &
  subpids="$! $subpids"
  thread=`expr $thread + 1`
done

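sleep $RUNTIME
# Hold the mutex while signalling so that no subprocess can be inside the
# critical section (and exit still holding the lock) when SIGALRM arrives.
mutex_acquire
  kill -ALRM $subpids
  wait
mutex_release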

mutex_acquire
  rm $COUNTFILE
mutex_release

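Note that the above scripts were developed using dash and tested using bash. The default 'sleep 0.01s' interval is accepted by GNU sleep, but POSIX itself only specifies a whole number of seconds, so other systems may need a different value.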

