Python Concurrency & Parallelism: Threading, Multiprocessing & AsyncIO
1. What are concurrency and parallelism in Python?
Q: Concurrency vs Parallelism?
Concurrency: Managing multiple tasks by interleaving their execution, not necessarily simultaneously. Suitable for I/O-bound tasks (e.g., network requests).
Parallelism: Executing multiple tasks simultaneously, typically on multiple CPU cores. Suitable for CPU-bound tasks (e.g., computations).
Python Context:
- Global Interpreter Lock (GIL): Limits true parallelism for threads in CPython, as only one thread executes Python bytecode at a time (see the sketch below).
- Solutions: Threading for concurrency (I/O-bound), multiprocessing for parallelism (CPU-bound), AsyncIO for asynchronous concurrency.
Use Case: Threading/AsyncIO for network I/O; multiprocessing for heavy computations.
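The GIL's effect is easiest to see with a CPU-bound loop: adding threads does not shrink the wall-clock time, because only one thread runs Python bytecode at a time. A minimal sketch of the comparison (the cpu_work function and loop size are arbitrary, and timings will vary by machine):
import threading
import time

def cpu_work(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

# Sequential: run the CPU-bound work twice in the main thread
start = time.time()
cpu_work(N)
cpu_work(N)
print(f"Sequential: {time.time() - start:.2f}s")

# Threaded: two threads still share one interpreter, so the GIL
# serializes the bytecode and total time stays roughly the same
start = time.time()
threads = [threading.Thread(target=cpu_work, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Threaded:   {time.time() - start:.2f}s")
On CPython the threaded run typically takes about as long as the sequential one; the same workload under multiprocessing (question 4) does scale across cores.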
2. What is threading in Python?
Q: What is threading?
Threading: Running multiple threads within a single process, sharing memory.
Module: threading.
Key Classes/Functions: threading.Thread, threading.Lock (for synchronization; see the Lock sketch below).
Use Case: I/O-bound tasks (e.g., downloading files, database queries).
Limitation: GIL prevents true parallelism for CPU-bound tasks in CPython.
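threading.Lock is listed above but not used in the example that follows, so here is a minimal sketch of guarding shared state with a lock (the counter and thread count are arbitrary):
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # The lock makes the read-modify-write on counter atomic
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; may be lower without it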
3. Can you give an example of threading?
import threading
import time
def download_file(file_name):
    print(f"Starting download: {file_name}")
    time.sleep(2)  # Simulate I/O delay
    print(f"Finished download: {file_name}")

# Create threads
threads = [
    threading.Thread(target=download_file, args=(f"file_{i}.txt",))
    for i in range(3)
]

# Start and join threads
start_time = time.time()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"Total time: {time.time() - start_time:.2f} seconds")
Note: Threads run concurrently, reducing total time for I/O-bound tasks. time.sleep simulates the I/O delay, so the threads' waits overlap. join() ensures the main program waits for every thread to finish.
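The example above discards each thread's work; when return values are needed, the standard-library concurrent.futures.ThreadPoolExecutor is a common alternative. A minimal sketch reusing the same simulated-download idea:
from concurrent.futures import ThreadPoolExecutor
import time

def download_file(file_name):
    time.sleep(2)  # Simulate I/O delay
    return f"Finished download: {file_name}"

# The executor manages the threads and collects return values
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(download_file, [f"file_{i}.txt" for i in range(3)]))
print(results)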
4. What is multiprocessing in Python?
Q: What is multiprocessing?
Multiprocessing: Running multiple processes, each with its own memory space and Python interpreter, bypassing the GIL.
Module: multiprocessing.
Key Classes/Functions: multiprocessing.Process, multiprocessing.Pool, multiprocessing.Queue (for communication between processes; see the Queue sketch below).
Use Case: CPU-bound tasks (e.g., mathematical computations, data processing).
Limitation: Higher memory usage due to separate processes.
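multiprocessing.Process and multiprocessing.Queue from the list above can be combined to send results back from worker processes. A minimal sketch (the worker function is illustrative):
import multiprocessing

def worker(n, queue):
    # Each process has its own memory; results come back through the queue
    queue.put(n * n)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    processes = [multiprocessing.Process(target=worker, args=(i, queue)) for i in range(3)]
    for p in processes:
        p.start()
    results = [queue.get() for _ in range(3)]  # Drain the queue before joining
    for p in processes:
        p.join()
    print(results)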
5. Can you give an example of multiprocessing?
import multiprocessing
import time
def compute_square(num):
    print(f"Computing square of {num}")
    return num * num

if __name__ == "__main__":
    numbers = [1, 2, 3, 4]
    # Using Pool for parallelism
    start_time = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(compute_square, numbers)
    print(f"Squares: {results}")
    print(f"Total time: {time.time() - start_time:.2f} seconds")
Note: Pool distributes tasks across worker processes, enabling true parallelism. The if __name__ == "__main__": guard stops child processes from re-running the module's top-level code when they import it (required with the spawn start method, e.g. on Windows and macOS). For CPU-bound work this is faster than threading.
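The standard-library concurrent.futures.ProcessPoolExecutor offers the same pattern behind the executor interface; a minimal sketch equivalent to the Pool example above:
from concurrent.futures import ProcessPoolExecutor

def compute_square(num):
    return num * num

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(compute_square, [1, 2, 3, 4]))
    print(results)  # [1, 4, 9, 16]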
6. What is AsyncIO in Python?
Q: What is AsyncIO?
AsyncIO: A framework for asynchronous programming using coroutines, event loops, and the async/await syntax.
Module: asyncio.
Key Concepts:
- async def: Defines an asynchronous function (coroutine).
- await: Pauses coroutine execution until a task completes.
- asyncio.run(): Runs the main coroutine.
- asyncio.gather(): Runs multiple coroutines concurrently.
Use Case: I/O-bound tasks (e.g., HTTP requests, async file operations).
Limitation: Requires async-compatible libraries (e.g., aiohttp).
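Because AsyncIO needs async-compatible libraries, a realistic fetch uses something like aiohttp rather than the blocking urllib. A minimal sketch, assuming aiohttp is installed (the URLs are placeholders):
import asyncio
import aiohttp  # Third-party: pip install aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return response.status

async def main():
    urls = ["https://example.com", "https://example.org"]  # Placeholder URLs
    async with aiohttp.ClientSession() as session:
        statuses = await asyncio.gather(*(fetch(session, url) for url in urls))
    print(statuses)

if __name__ == "__main__":
    asyncio.run(main())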
7. Can you give an example of AsyncIO basics?
import asyncio
import time
async def fetch_data(name):
    print(f"Starting fetch: {name}")
    await asyncio.sleep(2)  # Simulate async I/O
    print(f"Finished fetch: {name}")
    return f"Data from {name}"

async def main():
    # Run coroutines concurrently
    results = await asyncio.gather(
        fetch_data("source1"),
        fetch_data("source2")
    )
    return results

if __name__ == "__main__":
    start_time = time.time()
    results = asyncio.run(main())
    print(f"Results: {results}")
    print(f"Total time: {time.time() - start_time:.2f} seconds")
Note: asyncio.sleep simulates async I/O. asyncio.gather runs coroutines concurrently, reducing total time. asyncio.run manages the event loop.
8. Comprehensive example + Best Practices
Q: When to choose threading, multiprocessing, or AsyncIO?
- Threading: Simple I/O-bound tasks, shared memory needed.
- Multiprocessing: CPU-bound tasks requiring true parallelism.
- AsyncIO: High-concurrency I/O (thousands of connections), modern async libraries.
Q: Best practices?
- Profile your code to determine whether it is I/O-bound or CPU-bound.
- Use if __name__ == "__main__": in multiprocessing scripts.
- Avoid race conditions on shared state by using locks or queues.
- Prefer AsyncIO over threading for new I/O-heavy projects.
- Handle exceptions and clean up properly (see the sketch below).
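For the last point, a minimal sketch of exception handling with asyncio.gather: passing return_exceptions=True collects failures as results instead of cancelling the remaining coroutines (the task function is illustrative):
import asyncio

async def task(i):
    if i == 2:
        raise ValueError(f"task {i} failed")
    await asyncio.sleep(0.1)
    return i

async def main():
    # Errors are returned alongside successful results
    results = await asyncio.gather(*(task(i) for i in range(4)), return_exceptions=True)
    for result in results:
        if isinstance(result, Exception):
            print(f"Failed: {result}")
        else:
            print(f"Succeeded: {result}")

asyncio.run(main())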
Comprehensive Example:
import threading
import multiprocessing
import asyncio
import time
# Threading: I/O-bound task
def download_task(url):
    print(f"Thread downloading {url}")
    time.sleep(2)  # Simulate I/O
    return f"Downloaded {url}"

# Multiprocessing: CPU-bound task
def compute_task(num):
    print(f"Process computing {num}")
    return sum(i * i for i in range(num))

# AsyncIO: Async I/O-bound task
async def async_fetch(name):
    print(f"Async fetching {name}")
    await asyncio.sleep(2)  # Simulate async I/O
    return f"Fetched {name}"

def run_threading(urls):
    threads = [threading.Thread(target=download_task, args=(url,)) for url in urls]
    start_time = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.time() - start_time

def run_multiprocessing(numbers):
    start_time = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(compute_task, numbers)
    return results, time.time() - start_time

async def run_asyncio(names):
    start_time = time.time()
    results = await asyncio.gather(*(async_fetch(name) for name in names))
    return results, time.time() - start_time

if __name__ == "__main__":
    try:
        # Threading
        urls = ["url1", "url2", "url3"]
        thread_time = run_threading(urls)
        print(f"Threading time: {thread_time:.2f} seconds\n")

        # Multiprocessing
        numbers = [1000, 2000, 3000]
        mp_results, mp_time = run_multiprocessing(numbers)
        print(f"Multiprocessing results: {mp_results}")
        print(f"Multiprocessing time: {mp_time:.2f} seconds\n")

        # AsyncIO
        names = ["source1", "source2"]
        async_results, async_time = asyncio.run(run_asyncio(names))
        print(f"AsyncIO results: {async_results}")
        print(f"AsyncIO time: {async_time:.2f} seconds")
    except Exception as e:
        print(f"Error: {e}")