It seems like ages ago now that I found my customer had a process that connected to hundreds of Oracle databases to run predefined SQL for health checks. These databases were hosted all over the world, and the SQL could take up to fifteen minutes to complete for a single database (with a huge number of TNS timeouts). The end result was a CSV file that was ultimately formatted into a spreadsheet to provide management information. It took about a day to obtain this final result.
I thought there was a better way.
Surely the time spent waiting for a slow or distant database could be used for running commands on another database? My solution was to partition the database list into eight groups and process these groups concurrently. Now I just needed some synchronisation, as the password system was not thread safe. Looking for a way to use a mutex in a shell script, I stumbled across this article, amongst others.
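To make the partitioning idea concrete, here is a minimal sketch of dealing a list into groups and running the groups side by side. It is not the customer's script: the database names, the `check_database` function and the file names are all hypothetical stand-ins, and it assumes `mktemp -d` is available.

```shell
#!/bin/sh
# Sketch: split a list of databases into NGROUPS groups and process the
# groups concurrently. check_database() and the db names are hypothetical
# placeholders for the real health-check SQL.

NGROUPS=8
WORKDIR=$(mktemp -d)

check_database() {
    # Placeholder: the real version would connect to database $1 and run SQL.
    echo "$1,OK"
}

process_group() {
    # $1 = file listing one database per line; output goes to $1.csv
    while read -r db; do
        check_database "$db"
    done <"$1" >"$1.csv"
}

# Hypothetical input: 20 database names, one per line.
n=1
while [ $n -le 20 ]; do
    echo "db$n"
    n=$(($n + 1))
done >"$WORKDIR/databases.txt"

# Deal the databases round-robin into NGROUPS group files.
n=0
while read -r db; do
    echo "$db" >>"$WORKDIR/group.$(($n % $NGROUPS))"
    n=$(($n + 1))
done <"$WORKDIR/databases.txt"

# One background process per group, then wait for all of them to finish.
for group in "$WORKDIR"/group.*; do
    process_group "$group" &
done
wait

# Merge the per-group output into a single CSV.
cat "$WORKDIR"/group.*.csv >"$WORKDIR/result.csv"
RESULT_LINES=$(wc -l <"$WORKDIR/result.csv")
echo "checked $RESULT_LINES databases"
rm -r "$WORKDIR"
```

A slow database now only holds up the other members of its own group, while the remaining groups carry on independently.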
Putting this together, here is an example script that creates multiple 'threads', each incrementing a protected counter variable.
    #!/bin/sh
    #
    # Use a mutex to serialize access to a variable. Test this by using multiple
    # processes mutating this variable. Synchronize all subprocesses and exit after
    # a defined number of iterations.

    THREADS=20
    COUNTFILE=counter
    RUNTIME=5       # iterations per process

    mutex_acquire() {
        # $1 = mutex name (default: lock)
        # $2 = miss sleep time (default: 0.01s, which needs a sleep that
        #      accepts fractions and unit suffixes, e.g. GNU sleep)
        if [ "$1" = "" ]; then lock=.lock; else lock=.$1.lock; fi
        if [ "$2" = "" ]; then sleep=0.01s; else sleep=$2; fi
        locked=false
        while [ $locked = "false" ]; do
            # mkdir fails if the directory already exists, so only one process
            # at a time can create the lock directory
            mkdir "$lock" 2>/dev/null
            rc=$?
            if [ $rc -ne 0 ]; then
                sleep "$sleep"
            else
                locked=true
            fi
        done
    }

    mutex_release() {
        # $1 = mutex name (default: lock)
        if [ "$1" = "" ]; then lock=.lock; else lock=.$1.lock; fi
        rmdir "$lock"
    }

    process() {
        echo T$1 starting
        count=1
        while [ $count -le $RUNTIME ]; do
            mutex_acquire
            if [ ! -f $COUNTFILE ]; then
                echo 1 >$COUNTFILE
            else
                countval=`cat $COUNTFILE`
                echo `expr $countval + 1` >$COUNTFILE
            fi
            echo T$1 counter=`cat $COUNTFILE`
            mutex_release
            count=`expr $count + 1`
        done
        echo T$1 exiting
    }

    thread=1
    while [ $thread -le $THREADS ]; do
        process $thread &
        thread=`expr $thread + 1`
    done

    wait
    mutex_acquire
    rm $COUNTFILE
    mutex_release
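The script above always uses the default lock name, although `mutex_acquire` accepts a name as its first argument. As a rough illustration of a named mutex, the cut-down sketch below uses the same `mkdir` trick with a mutex called `tally`. The mutex name, the file `tally.txt`, the worker and iteration counts, and the 0.1 second retry delay are my own assumptions for the example (a fractional `sleep` is not guaranteed by POSIX).

```shell
#!/bin/sh
# Sketch: the mkdir-based lock from the script above, used with an explicit
# mutex name. "tally", tally.txt and the loop sizes are hypothetical.

mutex_acquire() {
    # $1 = mutex name. mkdir either creates the directory or fails, so only
    # one process at a time gets past this loop.
    until mkdir ".$1.lock" 2>/dev/null; do
        sleep 0.1    # assumes sleep accepts fractional seconds
    done
}

mutex_release() {
    # $1 = mutex name
    rmdir ".$1.lock"
}

worker() {
    # Increment the shared counter five times, one locked update at a time.
    i=0
    while [ $i -lt 5 ]; do
        mutex_acquire tally
        val=$(cat tally.txt)
        echo $(($val + 1)) >tally.txt
        mutex_release tally
        i=$(($i + 1))
    done
}

# Work in a scratch directory so the lock and counter files do not collide
# with anything else.
START=$(pwd)
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

echo 0 >tally.txt
w=0
while [ $w -lt 4 ]; do
    worker &
    w=$(($w + 1))
done
wait

# Four workers times five increments each: no update should be lost.
TOTAL=$(cat tally.txt)
echo "total=$TOTAL"

cd "$START" && rm -r "$WORKDIR"
```

Without the `mutex_acquire` and `mutex_release` calls, two workers can read the same value and one of the increments is lost, which is exactly what the lock prevents.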
More recently I read an article on Linux Journal that discussed using signals to limit the processing time of multiple processes in a shell script. This was thought-provoking, but on digging deeper it became apparent that many race conditions had not been solved (e.g. killing a process that has already died could actually kill a different process that has since reused its PID). Turning this on its head, surely it is better to have the top-level shell control the execution of the subshells rather than the other way around?
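Stripped of the mutex, the parent-controlled pattern might look like the sketch below. The workers never exit on their own, so every PID the parent signals still belongs to one of its own children. The worker count, the one second run time, the log file and the fractional `sleep` are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: the parent starts the workers, lets them run for a fixed time,
# signals them to stop and waits for them. All names here are hypothetical.

LOG=$(mktemp)

worker() {
    id=$1
    # Exit cleanly only when the parent sends SIGALRM.
    trap 'echo "worker $id stopping" >>"$LOG"; exit 0' ALRM
    while :; do
        sleep 0.1    # stand-in for real work; assumes fractional sleep
    done
}

pids=""
n=1
while [ $n -le 3 ]; do
    worker $n &
    pids="$pids $!"
    n=$(($n + 1))
done

# Let the workers run, then stop them. They are all still alive at this
# point, so none of these PIDs can have been reused by another process.
sleep 1
kill -ALRM $pids
wait

STOPPED=$(wc -l <"$LOG")
echo "$STOPPED workers stopped"
rm "$LOG"
```

The shell runs the trap once the current foreground command (here the short `sleep`) has finished, so each worker stops at a well-defined point in its loop.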
Putting this together, the earlier script is enhanced to exit after a set amount of time.
    #!/bin/sh
    #
    # Use a mutex to serialize access to a variable. Test this by using multiple
    # processes mutating this variable. Synchronize all subprocesses and exit after
    # a specified period of time.

    THREADS=20
    COUNTFILE=counter
    RUNTIME=5       # seconds to run before stopping the subprocesses

    mutex_acquire() {
        # $1 = mutex name (default: lock)
        # $2 = miss sleep time (default: 0.01s, which needs a sleep that
        #      accepts fractions and unit suffixes, e.g. GNU sleep)
        if [ "$1" = "" ]; then lock=.lock; else lock=.$1.lock; fi
        if [ "$2" = "" ]; then sleep=0.01s; else sleep=$2; fi
        locked=false
        while [ $locked = "false" ]; do
            mkdir "$lock" 2>/dev/null
            rc=$?
            if [ $rc -ne 0 ]; then
                sleep "$sleep"
            else
                locked=true
            fi
        done
    }

    mutex_release() {
        # $1 = mutex name (default: lock)
        if [ "$1" = "" ]; then lock=.lock; else lock=.$1.lock; fi
        rmdir "$lock"
    }

    process_start() {
        thread=$1
        # Stop only when the top-level shell sends SIGALRM
        trap 'process_stop $thread' ALRM
        echo T$thread starting
        while true; do
            mutex_acquire
            if [ ! -f $COUNTFILE ]; then
                echo 1 >$COUNTFILE
            else
                countval=`cat $COUNTFILE`
                echo `expr $countval + 1` >$COUNTFILE
            fi
            echo T$thread counter=`cat $COUNTFILE`
            mutex_release
        done
    }

    process_stop() {
        echo T$1 exiting
        exit 0
    }

    thread=1
    while [ $thread -le $THREADS ]; do
        process_start $thread &
        subpids="$! $subpids"
        thread=`expr $thread + 1`
    done

    sleep $RUNTIME

    # Take the lock before signalling, so that no subprocess is holding it
    # (or is part-way through an update) when it exits
    mutex_acquire
    kill -ALRM $subpids
    wait
    mutex_release

    mutex_acquire
    rm $COUNTFILE
    mutex_release
Note that the above scripts were developed using dash and tested using bash.