
Easily Parallelize Commands in Linux (Multiprocessing)


About

A while ago I created a Python script that would:

  • automatically find out the number of vCPUs on a computer.
  • read a file that lists one command per line.
  • execute each command in its own process, running only as many processes at a time as there are vCPUs.
    • This is important because we want the speed of parallelization without overloading the computer, which would waste CPU time swapping processes in and out.

Do not use these scripts to execute a series of commands that depend on each other and need to run in a certain order. In other words, each command must be independent.
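If you do have a couple of steps that must run in sequence, you can keep each line independent by joining those steps on a single line. For example, a hypothetical commands file might look like this:

# BAD: the second line depends on the first having finished
mkdir /tmp/backups
cp -r /var/www /tmp/backups/

# OK: the dependent steps share one line, so the line as a whole is independent
mkdir /tmp/backups && cp -r /var/www /tmp/backups/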

Simple Xargs Way

The Python script was just 24 lines long and worked well. However, if you are on Linux, you can achieve the same result in just two lines using xargs.

NUM_PROCS=`cat /proc/cpuinfo | awk '/^processor/{print $3}'| wc -l`
< $@ xargs -d '\n' -P $NUM_PROCS -I {} /bin/bash -c "{}"

This first line figures out how many vCPUs you have in a way that should work on any computer, no matter how your processors are indexed.
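If GNU coreutils is available (it is on most Linux distributions), nproc reports the number of processing units available and on a typical system gives the same count, so you could use it instead:

NUM_PROCS=$(nproc)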

With regards to the second line (a quick demonstration follows this list):

  • The < $@ redirects the first argument (which should be the path to the file of commands) to standard input.
  • The -d '\n' specifies that the newline is the delimiter between arguments passed to xargs.
  • The -P $NUM_PROCS tells xargs to maintain a number of processes equal to the number of vCPUs we have (you can manually set $NUM_PROCS if you desire).
  • The -I {} tells xargs to substitute {} in the command that follows with each line read from the file (an argument passed to xargs).
  • The /bin/bash -c "{}" tells xargs to use bash to execute that line, which is passed in as a string.
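As a quick demonstration of the same xargs pattern (the sleep/echo commands here are just placeholders), the following feeds three commands in via a pipe and runs at most two of them at a time:

printf '%s\n' 'sleep 1; echo first' 'sleep 1; echo second' 'sleep 1; echo third' \
    | xargs -d '\n' -P 2 -I {} /bin/bash -c "{}"

The first two commands start immediately; the third starts as soon as one of them finishes.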

Example Usage

  • Copy and paste the two-line script into a file called multiprocess in your $PATH (see the example after this list).
  • Create a text file with all the commands you want to run in parallel. E.g.
sudo virsh snapshot-create server1.programster.org
sudo virsh snapshot-create server2.programster.org
sudo virsh snapshot-create server3.programster.org
sudo virsh snapshot-create server4.programster.org
# more commands here...
  • Now execute multiprocess /path/to/commands/file to process those commands in parallel.
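For example, assuming ~/bin is on your $PATH (adjust the path to suit your setup), you could install the two-line script like this:

# write the script to ~/bin/multiprocess (shebang added for good measure)
cat > ~/bin/multiprocess << 'EOF'
#!/bin/bash
NUM_PROCS=`cat /proc/cpuinfo | awk '/^processor/{print $3}'| wc -l`
< $@ xargs -d '\n' -P $NUM_PROCS -I {} /bin/bash -c "{}"
EOF
chmod +x ~/bin/multiprocess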

Advanced BASH Script Version

This version of the script allows you to optionally specify the number of parallel processes you wish to have executing.

#!/bin/bash

# Script to run a file full of BASH commands in parallel
# Usage:
#   multiprocess file-full-of-bash-commands.sh
#   multiprocess file-full-of-bash-commands.sh 16

if [[ $2 ]]; then
    NUM_PROCS=$2
else
    NUM_PROCS=`cat /proc/cpuinfo | awk '/^processor/{print $3}' | wc -l`
fi

< "$1" xargs -d '\n' -P "$NUM_PROCS" -I {} /bin/bash -c "{}"

Example Usage

The command below will have your computer execute the commands in parallel using 4 separate processes (4 jobs at a time):

NUM_PARALLEL_JOBS=4
multiprocess /path/to/my/commands/file.txt $NUM_PARALLEL_JOBS

However, if you wish for the computer to automatically determine the number of jobs to run based on the number of CPU threads, then just don't specify the number of parallel jobs:

multiprocess /path/to/my/commands/file.txt


Last updated: 23rd September 2023
First published: 16th August 2018
