Python Multithreading - You Could Be Wasting Time
Python is an excellent scripting language that seems especially well suited to Linux applications. You may have heard that Python supports multithreading. It does, and threads can dramatically improve your application's performance, as the two examples below demonstrate. Each one simulates connecting to 4 different computers on a network.
Single Threaded Implementation
#!/usr/bin/python
import time

def connect():
    # Simulate a slow network connection
    print("connecting...")
    time.sleep(3)
    print("connected.")

# Single threaded: each call blocks until the previous one finishes
for i in range(4):
    connect()
Multi Threaded Implementation
#!/usr/bin/python
import time
import threading

def connect():
    # Simulate a slow network connection
    print("connecting...")
    time.sleep(3)
    print("connected.")

# Multi threaded: start all four connections at once
threads = []
for i in range(4):
    t = threading.Thread(target=connect)
    threads.append(t)
    t.start()
The single-threaded implementation takes 12 seconds to run, but the multi-threaded one takes just 3, because the four 3-second waits overlap instead of running back to back. This works well for file I/O too, not just networking.
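If you would rather measure the difference than watch the clock, the comparison can be sketched as below. This is a minimal illustration assuming Python 3. The names DELAY, run_sequential and run_threaded are made up for this sketch, and the delay is shortened from 3 seconds so that it finishes quickly.

```python
import threading
import time

DELAY = 0.2  # hypothetical stand-in for the 3-second connection delay


def connect():
    """Simulate a blocking network call; a sleeping thread lets others run."""
    time.sleep(DELAY)


def run_sequential(n):
    """Call connect() n times in a row and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        connect()
    return time.perf_counter() - start


def run_threaded(n):
    """Run n connect() calls concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=connect) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before stopping the clock
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: %.2fs" % run_sequential(4))
    print("threaded:   %.2fs" % run_threaded(4))
```

The join() calls matter here: without them the timer would stop as soon as the threads had been started, not when they had finished.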
Global Interpreter Lock (GIL)
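Unfortunately, Python (specifically the standard CPython interpreter) suffers from the global interpreter lock (GIL), which lets only one thread execute Python code at a time. Threads therefore cannot spread CPU-bound work across cores, as the following example shows: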
#!/usr/bin/python
import threading

def thread_test():
    print("starting thread")
    counter = 0
    # Busy loop that never exits, to keep the CPU occupied
    while True:
        counter = 2

threads = []
for i in range(4):
    t = threading.Thread(target=thread_test)
    threads.append(t)
    t.start()
The program runs forever, which gives you time to monitor your CPU and see how hard it is working. You might expect 4 cores to be maxed out, but because of the GIL only one thread runs at a time, so total usage stays at roughly one core's worth.
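A version that terminates makes the same point with timings instead of a CPU monitor. The sketch below assumes Python 3 on a standard CPython build with the GIL enabled. The helper names (count_down, timed, sequential, threaded) and the loop size are illustrative choices, not taken from the examples above.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop: CPU-bound work that needs the GIL to run."""
    while n > 0:
        n -= 1
    return n


def timed(fn):
    """Return how many seconds a call to fn() takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sequential(n, jobs):
    """Run the busy loop `jobs` times, one after another."""
    for _ in range(jobs):
        count_down(n)


def threaded(n, jobs):
    """Run the busy loop in `jobs` threads and wait for all of them."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    N, JOBS = 5000000, 4  # arbitrary workload size for illustration
    print("sequential: %.2fs" % timed(lambda: sequential(N, JOBS)))
    print("threaded:   %.2fs" % timed(lambda: threaded(N, JOBS)))
```

On a GIL-enabled interpreter the two timings should come out in the same ballpark, since the threads take turns rather than running side by side. Exact numbers depend on your machine and Python version.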
If you have a CPU-intensive application, you can work around the GIL in Python by using multiprocessing instead of multithreading. This means running multiple processes, each with its own interpreter and its own GIL, instead of trying to run multiple threads in a single process. Luckily for us, Python has libraries for multiprocessing that automate this in your code, so it is fairly simple to implement.
Below is an example that uses multiprocessing [source]. It takes a file path as an argument and executes each line in that file as a command in its own process. It runs only as many worker processes at once as your CPU count, which is the number of threads your CPU can handle in parallel.
#!/usr/bin/env python
"""
Script that takes a path to a file as an argument, and executes each
line in that file with a process pool equal in size to the cpu_count
of the computer, thus fully utilizing the CPU.
"""
import multiprocessing
import subprocess
import sys

def process_line(line_command):
    # Run one line of the file as a shell command
    subprocess.check_call(line_command, shell=True)

if __name__ == "__main__":
    num_cpu = multiprocessing.cpu_count()
    job_pool = multiprocessing.Pool(num_cpu)

    # Fetch the commands file path which
    # contains each command we wish
    # to run on a separate line
    cmds_fp = sys.argv[1]
    with open(cmds_fp, 'r') as cmds_file:
        lines = [line.strip() for line in cmds_file]

    job_pool.map(process_line, lines)
    job_pool.close()
    job_pool.join()
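The script above hands its work to shell commands. When the CPU-heavy work is a Python function instead, the same pool can call it directly. Here is a minimal sketch assuming Python 3. count_primes and the list of limits are hypothetical workloads invented for this illustration.

```python
import multiprocessing


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def main():
    limits = [20000, 30000, 40000, 50000]  # hypothetical workloads
    # With no argument, Pool sizes itself from the machine's CPU count
    with multiprocessing.Pool() as pool:
        results = pool.map(count_primes, limits)
    for limit, primes in zip(limits, results):
        print("primes below %d: %d" % (limit, primes))


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard is there because some platforms start worker processes by importing the script again, and without the guard each worker would try to create a pool of its own.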
Below is a video from Jack of Some demonstrating that Python is technically a multithreaded language and how the GIL may get in your way. If you are using libraries like NumPy, which do much of their work in native code, this may not be so much of an issue.
Python multithreading is useful for removing I/O bottlenecks, but if your application is CPU intensive and it cannot use multiprocessing, then you may need to use another language such as Java or Rust instead.
- The two code examples for demonstrating multithreading are based on Darren's blog - Basic Python Multithreading
First published: 16th August 2018