A Quick Primer on Python Concurrency

10/26/2017


Python is often thought of as a single-threaded language, but there are several avenues for executing tasks concurrently.
The threading module allows us to spin up native operating-system threads to execute multiple tasks concurrently. The threading API has methods for creating thread objects and then using the object to start, and join on, the underlying thread.

```python
import threading

# define the function to execute in a thread
def do_some_work(val):
    print("doing some work in thread")
    print("echo: {}".format(val))

val = "text"
# create a thread object, passing in the target function and optional args in the constructor
t = threading.Thread(target=do_some_work, args=(val,))
# start the thread
t.start()
# block execution of the main thread until thread t has completed
t.join()
```

The threading module also provides several synchronization and inter-thread communication mechanisms for when threads need to communicate and coordinate with each other, or when multiple threads are mutating the same area of memory. Locks and Queues are the most common of those synchronization methods, but Python also provides RLock, Semaphore, Condition, Event and Barrier implementations in the threading API.
```python
import threading

lock = threading.Lock()

### assume that the code below runs in multiple threads ###
lock.acquire()  # acquire the lock, preventing other threads from doing so
try:
    pass  # access the shared resource
finally:
    lock.release()  # release the lock so that other blocked threads can now run
```
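The try/finally pattern above can be written more compactly, since Lock supports the context-manager protocol: a `with lock:` block acquires the lock on entry and releases it on exit, even if an exception is raised. A minimal sketch (the counter and thread count here are illustrative, not from the original post):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # acquires the lock on entry, releases it on exit
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- without the lock, increments could be lost
```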
```python
from queue import Queue

queue = Queue()

### assume the code below runs in a separate thread t1 ###
def producer(queue):
    item = make_an_item()
    queue.put(item)

### assume the code below runs in a separate thread t2 ###
def consumer(queue):
    item = queue.get()  # gets an item put in the queue by another thread; blocks if no item is there yet
    queue.task_done()   # marks the last item retrieved as done
```

However, the current implementation of CPython has a Global Interpreter Lock (GIL), which makes Python easier to implement and faster to run for single-threaded programs. But because the GIL allows only one thread to execute Python bytecode at a time, threading is not suitable for CPU-bound tasks (tasks in which most of the time is spent performing a computation instead of waiting on IO). For those we have the multiprocessing package, which uses processes instead of threads as the actors of parallel execution. The multiprocessing API mimics the threading API as much as possible, to reduce the dissonance between the two and to make switching easier.
```python
import hashlib
import multiprocessing

# define the function to execute in a new process
def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

text = b"some long text here..."

if __name__ == '__main__':
    # create a process object, passing in the target function and optional args in the constructor
    p = multiprocessing.Process(target=generate_hash, args=(text,))
    # start and join the process
    p.start()
    p.join()

One of the major areas where the threading and multiprocessing APIs differ is in the implementation of shared state. Threads automatically share memory with each other, but processes don't, so special accommodations must be made to allow processes to communicate and share state. Processes can either allocate and use OS shared-memory areas, or communicate with a server process that maintains the shared data.
The concurrent.futures module provides a layer of abstraction over both concurrency mechanisms (threads and processes).
The module also introduced futures into Python. A future represents a pending result, and it lets us manage the execution of the computation that produces that result. Future API methods include result(), cancel() and add_done_callback(fn).
```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

# define the function to execute in a new process
def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

text = b"some long text here..."

if __name__ == '__main__':
    executor = ProcessPoolExecutor()  # can be replaced with ThreadPoolExecutor()
    # submit a job to the pool; immediately returns a future object
    future_result = executor.submit(generate_hash, text)
```
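Once we have the future object, the API methods mentioned above come into play: result() blocks until the value is ready, and add_done_callback(fn) registers a function to run when the future completes. A minimal sketch using a ThreadPoolExecutor (the lambda callback is illustrative):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

executor = ThreadPoolExecutor()
future = executor.submit(generate_hash, b"some long text here...")

# register a callback that fires when the future completes
future.add_done_callback(lambda f: print("hash is ready"))

# result() blocks until the computation finishes, then returns its value
digest = future.result()
print(len(digest))  # 96 hex characters for SHA-384
executor.shutdown()
```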

Finally, the most recent addition to the Python concurrency family is the asyncio module. asyncio brings single-threaded asynchronous programming to Python. It provides an event loop which runs specialized functions called coroutines. A coroutine can pause itself and yield control back to the event loop when it needs to wait for IO or some other long-running task. The event loop can then go on to execute other coroutines, resuming the prior coroutine when an event indicates that the IO or long-running task is complete. As a result, we have multiple tasks running on the same thread and yielding to one another instead of blocking.
```python
import asyncio

# a coroutine function, as denoted by the async keyword
async def delayed_hello():
    print("Hello ")
    # the coroutine will pause here and yield back to the event loop
    await asyncio.sleep(1)
    print("World!")

# get the event loop
loop = asyncio.get_event_loop()
# pass the coroutine to the event loop for execution
loop.run_until_complete(delayed_hello())
loop.close()
```
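To see several tasks actually interleaving on one thread, asyncio.gather runs multiple coroutines concurrently and collects their results. A minimal sketch (the fetch coroutine and delays are illustrative; the sleeps stand in for real IO waits):

```python
import asyncio

async def fetch(name, delay):
    # simulate an IO wait; while this coroutine is paused, the others run
    await asyncio.sleep(delay)
    return name

async def main():
    # schedule three coroutines on the same thread and wait for all of them
    return await asyncio.gather(fetch("a", 0.3), fetch("b", 0.2), fetch("c", 0.1))

loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
print(results)  # ['a', 'b', 'c'] -- results in submission order, run concurrently
```

Because the three sleeps overlap, the whole thing takes about 0.3 seconds rather than 0.6.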

There are several resources that provide an in-depth look into Python concurrency, like the Python Module of the Week blog and the Python Parallel Programming Cookbook. If you are a Pluralsight user, you can also check out my Pluralsight course Python Concurrency: Getting Started.
