Concurrency & Parallel Algorithms in C: POSIX Threads, Parallel Sum & Concurrent Data Structures
1. What is Parallelism in computing?
Parallelism is the process of executing multiple tasks or computations simultaneously, leveraging multiple processors or cores to improve performance. It contrasts with sequential execution, where tasks run one after another.
2. What are the types of parallelism?
- Data Parallelism: Divide data into chunks and process each chunk independently (e.g., parallel matrix addition).
- Task Parallelism: Divide tasks into independent units that run concurrently (e.g., running different algorithms simultaneously).
- Pipeline Parallelism: Break a task into stages, with each stage processed concurrently (e.g., streaming data processing).
3. What is Concurrency vs. Parallelism?
- Concurrency: Managing multiple tasks that may not execute simultaneously but are interleaved (e.g., thread scheduling on a single core).
- Parallelism: Executing multiple tasks simultaneously on multiple cores/processors.
- Key Difference: Concurrency enables parallelism but can exist without it (e.g., multitasking on a single CPU).
4. What are common parallel programming models in C?
- POSIX Threads (pthreads): Low-level thread management for concurrency.
- OpenMP: High-level directives for parallelism (requires compiler support, e.g., gcc).
- MPI (Message Passing Interface): For distributed systems (less common for beginners).
- Fork/Join: Divide tasks and merge results (can be implemented with pthreads).
5. Can you give an example of a parallel algorithm using pthreads in C?
Example: Parallel sum of an array using pthreads, dividing the array into chunks.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#define N 1000
#define THREADS 4
int arr[N], partialSums[THREADS];
typedef struct { int start, end; } Range;
void* sumRange(void* arg) {
Range* range = (Range*)arg;
if (range->start >= range->end || range->start < 0 || range->end > N) {
printf("Invalid range\n");
return NULL;
}
int sum = 0;
for (int i = range->start; i < range->end; i++) {
sum += arr[i];
}
partialSums[range->start / (N / THREADS)] = sum;
return NULL;
}
int main() {
for (int i = 0; i < N; i++) arr[i] = 1; // Initialize array with 1s
pthread_t threads[THREADS];
Range ranges[THREADS];
// Divide array into THREADS parts
for (int i = 0; i < THREADS; i++) {
ranges[i].start = i * (N / THREADS);
ranges[i].end = (i + 1) * (N / THREADS);
if (pthread_create(&threads[i], NULL, sumRange, &ranges[i])) {
printf("Thread creation failed\n");
return 1;
}
}
// Wait for threads to finish
for (int i = 0; i < THREADS; i++) {
pthread_join(threads[i], NULL);
}
// Combine partial sums
int totalSum = 0;
for (int i = 0; i < THREADS; i++) {
totalSum += partialSums[i];
}
printf("Total Sum: %d\n", totalSum); // Output: Total Sum: 1000
return 0;
}
Note: Compile with -pthread (e.g., gcc -pthread file.c).
6. What are the time and space complexities of parallel algorithms?
- Time Complexity: Ideally reduced by a factor of P (number of processors), but overhead (e.g., thread creation, synchronization) may limit speedup (Amdahl’s Law). Example: Parallel sum above:
O(n/P)work per thread, plusO(P)overhead. - Space Complexity: Same as sequential (
O(n)), but additional space for thread stacks or synchronization primitives (O(P)).
7. What are the advantages and disadvantages of parallelism?
- Advantages: Faster execution on multicore systems, scalability for large data.
- Disadvantages: Synchronization overhead, complex debugging, potential for race conditions or deadlocks.
8. What are key challenges in parallel programming?
- Race Conditions: Multiple threads accessing shared data inconsistently.
- Deadlocks: Threads waiting indefinitely for resources.
- Load Balancing: Ensuring even work distribution among threads.
- Scalability: Diminishing returns with more processors (Amdahl’s Law).
9. What are Concurrent Data Structures?
Concurrent data structures are designed to handle simultaneous access by multiple threads safely and efficiently, minimizing contention and ensuring thread-safety. Examples include concurrent queues, stacks, hash tables, and linked lists.
10. Why are Concurrent Data Structures important?
They enable safe data sharing in multithreaded programs, preventing race conditions and ensuring correctness without excessive locking, which can degrade performance.
11. What are common techniques for concurrent data structures?
- Locks (Mutexes): Ensure exclusive access to shared data.
- Lock-Free Structures: Use atomic operations (e.g., Compare-and-Swap) to avoid locks.
- Read-Write Locks: Allow multiple readers or one writer for better concurrency.
- Fine-Grained Locking: Lock specific parts of a data structure (e.g., per-node in a list).
- Atomic Operations: Hardware-supported operations (e.g.,
__sync_fetch_and_addin GCC).
12. Can you give an example of a concurrent data structure (e.g., Lock-Based Queue) in C?
Example: Thread-safe queue using a mutex for concurrent access.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#define SIZE 100
struct Queue {
int items[SIZE];
int front, rear;
pthread_mutex_t mutex;
};
void initQueue(struct Queue* q) {
q->front = 0;
q->rear = -1;
if (pthread_mutex_init(&q->mutex, NULL)) {
printf("Mutex initialization failed\n");
exit(1);
}
}
void enqueue(struct Queue* q, int value) {
pthread_mutex_lock(&q->mutex);
if (q->rear < SIZE - 1) {
q->items[++q->rear] = value;
printf("Enqueued %d\n", value);
} else {
printf("Queue full\n");
}
pthread_mutex_unlock(&q->mutex);
}
int dequeue(struct Queue* q) {
pthread_mutex_lock(&q->mutex);
if (q->front <= q->rear) {
int value = q->items[q->front++];
pthread_mutex_unlock(&q->mutex);
return value;
}
pthread_mutex_unlock(&q->mutex);
return -1; // Empty
}
void* producer(void* arg) {
struct Queue* q = (struct Queue*)arg;
for (int i = 1; i <= 5; i++) {
enqueue(q, i);
}
return NULL;
}
void* consumer(void* arg) {
struct Queue* q = (struct Queue*)arg;
for (int i = 0; i < 5; i++) {
int value = dequeue(q);
if (value != -1) printf("Dequeued %d\n", value);
}
return NULL;
}
int main() {
struct Queue q;
initQueue(&q);
pthread_t prod, cons;
if (pthread_create(&prod, NULL, producer, &q) || pthread_create(&cons, NULL, consumer, &q)) {
printf("Thread creation failed\n");
return 1;
}
pthread_join(prod, NULL);
pthread_join(cons, NULL);
pthread_mutex_destroy(&q.mutex);
return 0;
}
Note: Compile with -pthread (e.g., gcc -pthread file.c). Output may interleave due to concurrent execution.
13. What are the challenges with concurrent data structures?
- Contention: Multiple threads competing for locks reduces performance.
- Deadlocks: Improper lock ordering can cause threads to stall.
- Scalability: Coarse-grained locks limit parallelism; fine-grained or lock-free designs are complex.
- Correctness: Ensuring atomicity and consistency across threads.
14. What are examples of concurrent data structures?
- Concurrent Queue: Thread-safe FIFO (e.g., above example).
- Concurrent Stack: Thread-safe LIFO using mutexes or atomic operations.
- Concurrent Hash Table: Uses per-bucket locks or lock-free techniques.
- Concurrent Linked List: Fine-grained locking per node or lock-free updates.
15. What are the performance considerations for concurrent data structures?
- Lock-Based: Simple but may suffer from contention (
O(1)operations with low contention). - Lock-Free: Higher throughput but complex to implement, relies on atomic operations (e.g., CAS).
- Scalability: Depends on workload and number of threads; lock-free scales better for high contention.
16. Can you give an example of a lock-free concurrent operation in C?
Example: Atomic counter using __sync_fetch_and_add (GCC extension).
#include <stdio.h>
#include <pthread.h>
volatile int counter = 0;
void* increment(void* arg) {
for (int i = 0; i < 1000; i++) {
__sync_fetch_and_add(&counter, 1); // Atomic increment
}
return NULL;
}
int main() {
pthread_t threads[4];
for (int i = 0; i < 4; i++) {
if (pthread_create(&threads[i], NULL, increment, NULL)) {
printf("Thread creation failed\n");
return 1;
}
}
for (int i = 0; i < 4; i++) {
pthread_join(threads[i], NULL);
}
printf("Counter: %d\n", counter); // Output: Counter: 4000 (4 threads * 1000 increments)
return 0;
}
Note: Compile with -pthread. __sync_fetch_and_add is a GCC extension; use stdatomic.h (C11) for portability (e.g., atomic_fetch_add).