Python Concurrency & Parallelism: Threading, Multiprocessing & AsyncIO

1. What are concurrency and parallelism in Python?

Q: Concurrency vs Parallelism?

Concurrency: Managing multiple tasks by interleaving their execution, not necessarily simultaneously. Suitable for I/O-bound tasks (e.g., network requests).

Parallelism: Executing multiple tasks simultaneously, typically on multiple CPU cores. Suitable for CPU-bound tasks (e.g., computations).

Python Context:

GIL: In CPython, the Global Interpreter Lock lets only one thread execute Python bytecode at a time, so threads provide concurrency but not CPU parallelism.

Use Case: Threading/AsyncIO for network I/O; multiprocessing for heavy computations.

2. What is threading in Python?

Q: What is threading?

Threading: Running multiple threads within a single process, sharing memory.

Module: threading.

Key Classes/Functions: threading.Thread, threading.Lock (for synchronization; see the lock sketch below).

Use Case: I/O-bound tasks (e.g., downloading files, database queries).

Limitation: GIL prevents true parallelism for CPU-bound tasks in CPython.
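Because threads share memory, concurrent updates to shared state need synchronization. A minimal sketch of threading.Lock protecting a shared counter (the counter and increment function are illustrative, not part of the examples below):

import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:  # only one thread may update the counter at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; without it, updates can be lost to races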

3. Can you give an example of threading?

import threading
import time

def download_file(file_name):
    print(f"Starting download: {file_name}")
    time.sleep(2)  # Simulate I/O delay
    print(f"Finished download: {file_name}")

# Create threads
threads = [
    threading.Thread(target=download_file, args=(f"file_{i}.txt",))
    for i in range(3)
]

# Start and join threads
start_time = time.time()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"Total time: {time.time() - start_time:.2f} seconds")
Output:

Starting download: file_0.txt
Starting download: file_1.txt
Starting download: file_2.txt
Finished download: file_0.txt
Finished download: file_1.txt
Finished download: file_2.txt
Total time: 2.02 seconds

Note: Threads run concurrently, so the three 2-second "downloads" overlap and finish in about 2 seconds instead of 6. time.sleep simulates the I/O delay, and join() makes the main program wait for every thread to finish.
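The same pattern is often written with concurrent.futures.ThreadPoolExecutor, which manages the threads and returns results through futures. A self-contained sketch mirroring the example above:

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_file(file_name):
    print(f"Starting download: {file_name}")
    time.sleep(2)  # Simulate I/O delay
    return f"Finished download: {file_name}"

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(download_file, f"file_{i}.txt") for i in range(3)]
    for future in as_completed(futures):
        print(future.result())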

4. What is multiprocessing in Python?

Q: What is multiprocessing?

Multiprocessing: Running multiple processes, each with its own memory space and Python interpreter, bypassing the GIL.

Module: multiprocessing.

Key Classes/Functions: multiprocessing.Process, multiprocessing.Pool, multiprocessing.Queue (for inter-process communication; see the queue sketch below).

Use Case: CPU-bound tasks (e.g., mathematical computations, data processing).

Limitation: Higher memory usage due to separate processes.
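Since processes do not share memory, data is exchanged explicitly. A minimal sketch of multiprocessing.Queue, where a worker process sends its result back to the parent (the worker function is illustrative):

import multiprocessing

def worker(numbers, queue):
    # Runs in its own memory space and reports back through the queue
    queue.put([n * n for n in numbers])

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker, args=([1, 2, 3], queue))
    process.start()
    print(queue.get())  # [1, 4, 9]
    process.join()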

5. Can you give an example of multiprocessing?

import multiprocessing
import time

def compute_square(num):
    print(f"Computing square of {num}")
    return num * num

if __name__ == "__main__":
    numbers = [1, 2, 3, 4]
    
    # Using Pool for parallelism
    start_time = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(compute_square, numbers)
    
    print(f"Squares: {results}")
    print(f"Total time: {time.time() - start_time:.2f} seconds")
Output:

Computing square of 1
Computing square of 2
Computing square of 3
Computing square of 4
Squares: [1, 4, 9, 16]
Total time: 0.05 seconds

Note: Pool distributes tasks across worker processes, enabling true parallelism. The if __name__ == "__main__": guard is required so that child processes, which may re-import the module (e.g., under the spawn start method), do not re-execute the pool-creating code. For genuinely CPU-bound work this beats threading because each process has its own interpreter and GIL; for a trivial task like squaring four numbers, process startup overhead dominates the runtime.
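An equivalent, futures-based spelling uses concurrent.futures.ProcessPoolExecutor, which shares its interface with ThreadPoolExecutor. A sketch under the same __main__ guard requirement:

from concurrent.futures import ProcessPoolExecutor

def compute_square(num):
    return num * num

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        # map distributes the inputs across the worker processes
        results = list(executor.map(compute_square, [1, 2, 3, 4]))
    print(results)  # [1, 4, 9, 16]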

6. What is AsyncIO in Python?

Q: What is AsyncIO?

AsyncIO: A framework for asynchronous programming using coroutines, event loops, and the async/await syntax.

Module: asyncio.

Key Concepts: coroutines (defined with async def), await (suspends at I/O points), the event loop (schedules coroutines), and tasks (asyncio.create_task) for running coroutines concurrently.

Use Case: I/O-bound tasks (e.g., HTTP requests, async file operations).

Limitation: Requires async-compatible libraries (e.g., aiohttp); a single blocking call stalls the entire event loop.
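As an illustration of such a library, a minimal sketch using aiohttp (a third-party package installed separately; the URLs are placeholders):

import asyncio
import aiohttp  # third-party: pip install aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = ["https://example.com", "https://example.org"]  # placeholder URLs
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(fetch(session, url) for url in urls))
    print([len(page) for page in pages])

asyncio.run(main())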

7. Can you give an example of AsyncIO basics?

import asyncio
import time

async def fetch_data(name):
    print(f"Starting fetch: {name}")
    await asyncio.sleep(2)  # Simulate async I/O
    print(f"Finished fetch: {name}")
    return f"Data from {name}"

async def main():
    # Run coroutines concurrently
    results = await asyncio.gather(
        fetch_data("source1"),
        fetch_data("source2")
    )
    return results

if __name__ == "__main__":
    start_time = time.time()
    results = asyncio.run(main())
    print(f"Results: {results}")
    print(f"Total time: {time.time() - start_time:.2f} seconds")
Output:

Starting fetch: source1
Starting fetch: source2
Finished fetch: source1
Finished fetch: source2
Results: ['Data from source1', 'Data from source2']
Total time: 2.01 seconds

Note: asyncio.sleep simulates async I/O. asyncio.gather runs coroutines concurrently, reducing total time. asyncio.run manages the event loop.
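Besides asyncio.gather, coroutines can be scheduled explicitly with asyncio.create_task, which starts them on the event loop as soon as they are created. A small sketch:

import asyncio

async def fetch_data(name):
    await asyncio.sleep(1)  # Simulate async I/O
    return f"Data from {name}"

async def main():
    # Tasks start running immediately; awaiting them collects the results
    tasks = [asyncio.create_task(fetch_data(f"source{i}")) for i in range(3)]
    results = [await task for task in tasks]
    print(results)

asyncio.run(main())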

8. Comprehensive example + Best Practices

Q: When to choose threading, multiprocessing, or AsyncIO?

Threading: I/O-bound tasks that use blocking libraries (e.g., requests, database drivers).

AsyncIO: I/O-bound tasks at high concurrency, when async-compatible libraries are available.

Multiprocessing: CPU-bound tasks that need true parallelism across cores.

Q: Best practices?

Guard process-spawning code with if __name__ == "__main__":, always join() threads and processes, protect shared state with locks, prefer pools (multiprocessing.Pool or concurrent.futures executors) over managing workers by hand, and keep blocking calls out of the asyncio event loop.

Comprehensive Example:

import threading
import multiprocessing
import asyncio
import time

# Threading: I/O-bound task
def download_task(url):
    print(f"Thread downloading {url}")
    time.sleep(2)  # Simulate I/O
    return f"Downloaded {url}"

# Multiprocessing: CPU-bound task
def compute_task(num):
    print(f"Process computing {num}")
    return sum(i * i for i in range(num))

# AsyncIO: Async I/O-bound task
async def async_fetch(name):
    print(f"Async fetching {name}")
    await asyncio.sleep(2)  # Simulate async I/O
    return f"Fetched {name}"

def run_threading(urls):
    threads = [threading.Thread(target=download_task, args=(url,)) for url in urls]
    start_time = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.time() - start_time

def run_multiprocessing(numbers):
    start_time = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(compute_task, numbers)
    return results, time.time() - start_time

async def run_asyncio(names):
    start_time = time.time()
    results = await asyncio.gather(*(async_fetch(name) for name in names))
    return results, time.time() - start_time

if __name__ == "__main__":
    try:
        # Threading
        urls = ["url1", "url2", "url3"]
        thread_time = run_threading(urls)
        print(f"Threading time: {thread_time:.2f} seconds\n")
        
        # Multiprocessing
        numbers = [1000, 2000, 3000]
        mp_results, mp_time = run_multiprocessing(numbers)
        print(f"Multiprocessing results: {mp_results}")
        print(f"Multiprocessing time: {mp_time:.2f} seconds\n")
        
        # AsyncIO
        names = ["source1", "source2"]
        async_results, async_time = asyncio.run(run_asyncio(names))
        print(f"AsyncIO results: {async_results}")
        print(f"AsyncIO time: {async_time:.2f} seconds")
    except Exception as e:
        print(f"Error: {e}")
Output:

Thread downloading url1
Thread downloading url2
Thread downloading url3
Threading time: 2.02 seconds

Process computing 1000
Process computing 2000
Process computing 3000
Multiprocessing results: [332833500, 2664667000, 8995500500]
Multiprocessing time: 0.06 seconds

Async fetching source1
Async fetching source2
AsyncIO results: ['Fetched source1', 'Fetched source2']
AsyncIO time: 2.01 seconds
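One practice worth showing when these models meet: never call blocking functions directly inside the event loop. Since Python 3.9, asyncio.to_thread offloads a blocking call to a worker thread so other coroutines keep running. A sketch with a simulated blocking download:

import asyncio
import time

def blocking_download(url):
    time.sleep(2)  # A blocking call that would otherwise stall the event loop
    return f"Downloaded {url}"

async def main():
    # Each blocking call runs in its own worker thread; the waits overlap
    results = await asyncio.gather(
        asyncio.to_thread(blocking_download, "url1"),
        asyncio.to_thread(blocking_download, "url2"),
    )
    print(results)

asyncio.run(main())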