Advanced DSA Topics in C: Persistent Data Structures, Suffix Automaton & Heavy-Light Decomposition
1. What is a Persistent Data Structure?
A Persistent Data Structure preserves previous versions of itself after modifications, allowing access to any historical version without fully copying the structure. It achieves this through structural sharing, where unchanged parts are reused across versions. Persistence is useful in functional programming, version control systems, and concurrent environments.
2. What are the types of persistence?
- Partial Persistence: Allows updates to the latest version and queries on any version.
- Full Persistence: Allows updates and queries on any version, creating branching histories.
- Confluent Persistence: Allows merging of versions.
3. How do Persistent Data Structures work?
They use node-copying or path-copying: When modifying a node, copy it and update pointers, reusing unchanged subtrees. This ensures immutability and version history with O(log n) time/space overhead for balanced structures like trees.
4. What are common Persistent Data Structures?
- Persistent Arrays (using fat nodes or path copying).
- Persistent Linked Lists (copy nodes on modification).
- Persistent Binary Search Trees (e.g., persistent AVL or Red-Black trees).
5. Can you give an example of a Persistent Linked List in C?
Example: A simple persistent singly linked list where insertion creates a new version without modifying the old one.
#include <stdio.h>
#include <stdlib.h>
struct Node {
int data;
struct Node* next;
};
struct Node* createNode(int data) {
struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
if (!newNode) {
printf("Memory allocation failed\n");
exit(1);
}
newNode->data = data;
newNode->next = NULL;
return newNode;
}
// Insert at front, returning a new version (persistent)
struct Node* insertPersistent(struct Node* head, int data) {
struct Node* newHead = createNode(data);
newHead->next = head; // Share the rest of the list
return newHead; // Old head remains unchanged
}
void printList(struct Node* head) {
if (!head) {
printf("(empty)\n");
return;
}
while (head) {
printf("%d ", head->data);
head = head->next;
}
printf("\n");
}
void freeList(struct Node* head) {
while (head) {
struct Node* temp = head;
head = head->next;
free(temp);
}
}
int main() {
struct Node* version1 = NULL;
version1 = insertPersistent(version1, 3); // Version 1: 3
version1 = insertPersistent(version1, 2); // Version 2: 2 -> 3
version1 = insertPersistent(version1, 1); // Version 3: 1 -> 2 -> 3
printf("Version 3: ");
printList(version1); // Output: 1 2 3
// Create a new version from an old one
struct Node* version4 = insertPersistent(version1->next, 4); // From "2 -> 3", insert 4: 4 -> 2 -> 3
printf("Version 4: ");
printList(version4); // Output: 4 2 3
printf("Version 3 (unchanged): ");
printList(version1); // Output: 1 2 3
freeList(version1);
freeList(version4);
return 0;
}
6. What are the time and space complexities of Persistent Data Structures?
- Time Complexity:
O(log n)per operation for balanced trees (e.g., persistent BST). - Space Complexity:
O(log n)per update (path copying), totalO(n + m log n)for m versions.
7. What are the advantages and disadvantages of Persistent Data Structures?
- Advantages: Immutable, supports versioning, thread-safe for read-only operations.
- Disadvantages: Higher memory usage due to copying, more complex implementation.
8. What are the applications of Persistent Data Structures?
- Version control systems (e.g., Git-like branching).
- Functional programming (e.g., immutable lists).
- Undo/redo mechanisms in editors.
- Concurrent programming (safe sharing without locks).
9. What is a Suffix Automaton?
A Suffix Automaton is a finite automaton that accepts all suffixes of a given string and represents them in a compact way. It’s the smallest automaton recognizing all suffixes, useful for string matching and processing.
10. How does a Suffix Automaton work?
- Built incrementally by adding characters to the string.
- Each state represents a set of substrings (endpositions).
- Transitions represent character extensions.
- Suffix links connect states to their longest proper suffix states, enabling compression.
11. What are the key components of a Suffix Automaton?
- States: Represent equivalence classes of substrings.
- Transitions: From state s on character c to a new or existing state.
- Suffix Links: Link to the state representing the longest proper suffix.
- Terminal States: States corresponding to suffixes of the string.
12. Can you give an example of building a Suffix Automaton in C?
Example: Simplified Suffix Automaton construction for a string (full implementation is complex; this shows basic structure).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ALPHABET_SIZE 26
#define MAXN 1000
struct SAMNode {
int len; // Length of the longest string in this state
int link; // Suffix link
int next[ALPHABET_SIZE]; // Transitions
};
struct SuffixAutomaton {
struct SAMNode* nodes;
int size, last;
};
void initSAM(struct SuffixAutomaton* sam, int maxn) {
sam->nodes = (struct SAMNode*)malloc(maxn * 2 * sizeof(struct SAMNode));
if (!sam->nodes) {
printf("Memory allocation failed\n");
exit(1);
}
sam->size = sam->last = 1; // Root state 0, initial state 1
memset(sam->nodes, 0, maxn * 2 * sizeof(struct SAMNode));
sam->nodes[0].len = -1; // Root
for (int i = 0; i < ALPHABET_SIZE; i++) sam->nodes[0].next[i] = 1;
}
int addChar(struct SuffixAutomaton* sam, char c) {
int cur = sam->size++, p = sam->last;
sam->nodes[cur].len = sam->nodes[p].len + 1;
int idx = c - 'a';
while (p && !sam->nodes[p].next[idx]) {
sam->nodes[p].next[idx] = cur;
p = sam->nodes[p].link;
}
if (!p) {
sam->nodes[cur].link = 1;
} else {
int q = sam->nodes[p].next[idx];
if (sam->nodes[p].len + 1 == sam->nodes[q].len) {
sam->nodes[cur].link = q;
} else {
int clone = sam->size++;
sam->nodes[clone] = sam->nodes[q];
sam->nodes[clone].len = sam->nodes[p].len + 1;
sam->nodes[cur].link = sam->nodes[q].link = clone;
while (p && sam->nodes[p].next[idx] == q) {
sam->nodes[p].next[idx] = clone;
p = sam->nodes[p].link;
}
}
}
sam->last = cur;
return cur;
}
void buildSAM(struct SuffixAutomaton* sam, char* str) {
initSAM(sam, strlen(str));
for (int i = 0; str[i]; i++) {
if (str[i] < 'a' || str[i] > 'z') {
printf("Invalid character\n");
exit(1);
}
addChar(sam, str[i]);
}
}
void freeSAM(struct SuffixAutomaton* sam) {
free(sam->nodes);
}
int main() {
char str[] = "abc";
struct SuffixAutomaton sam;
buildSAM(&sam, str);
printf("Suffix Automaton built with %d states\n", sam.size); // Output: Suffix Automaton built with 4 states
freeSAM(&sam);
return 0;
}
Note: This is a simplified implementation; full Suffix Automaton requires additional logic for queries.
13. What are the time and space complexities of Suffix Automaton?
- Construction Time:
O(n)(linear in string length). - Space Complexity:
O(n * alphabet_size)(states and transitions). - Query Time:
O(m)for pattern matching (m = pattern length).
14. What are the advantages and disadvantages of Suffix Automaton?
- Advantages: Compact representation of all suffixes, efficient for multiple pattern matching or substring queries.
- Disadvantages: Complex to implement, higher constant factors than simpler string algorithms like KMP.
15. What are the applications of Suffix Automaton?
- String matching and searching (faster than Naive for multiple patterns).
- Longest common substring.
- Palindrome counting.
- Suffix-based compression.
16. What is Heavy-Light Decomposition?
Heavy-Light Decomposition (HLD) is a technique to decompose a tree into disjoint paths (chains), allowing efficient queries (e.g., path sums, updates) by mapping paths to segments that can be queried with data structures like Segment Trees or Fenwick Trees. It’s useful for tree-based problems with path queries.
17. How does Heavy-Light Decomposition work?
- DFS Size Calculation: Compute subtree sizes.
- Heavy Child Selection: For each node, mark the child with the largest subtree as "heavy"; others are "light."
- Decompose into Chains: Chain heavy edges together; light edges start new chains.
- Assign Indices: Assign chain indices and positions for segment tree mapping.
- Query/Update: Break paths into
O(log n)segments and query the underlying data structure.
18. What are the key components of Heavy-Light Decomposition?
- Chains: Sequences of heavy edges.
- Chain Head: Starting node of each chain.
- Position Array: Maps nodes to positions in a 1D array for segment queries.
- Underlying DS: Segment Tree or BIT for range operations on chains.
19. Can you give a simplified example of Heavy-Light Decomposition in C?
Example: Basic HLD setup for a tree (full implementation is complex; this shows DFS and decomposition).
#include <stdio.h>
#include <stdlib.h>
#define V 5
struct Node {
int vertex;
struct Node* next;
};
struct Graph {
struct Node* adjList[V];
};
void initGraph(struct Graph* g) {
for (int i = 0; i < V; i++) {
g->adjList[i] = NULL;
}
}
void addEdge(struct Graph* g, int src, int dest) {
struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
if (!newNode) {
printf("Memory allocation failed\n");
exit(1);
}
newNode->vertex = dest;
newNode->next = g->adjList[src];
g->adjList[src] = newNode;
// Add reverse edge for undirected tree
newNode = (struct Node*)malloc(sizeof(struct Node));
if (!newNode) {
printf("Memory allocation failed\n");
exit(1);
}
newNode->vertex = src;
newNode->next = g->adjList[dest];
g->adjList[dest] = newNode;
}
void freeGraph(struct Graph* g) {
for (int i = 0; i < V; i++) {
struct Node* temp = g->adjList[i];
while (temp) {
struct Node* next = temp->next;
free(temp);
temp = next;
}
}
}
int size[V], parent[V], heavy[V], chainHead[V], chainIndex[V], pos[V], index = 0;
void dfsSize(struct Graph* g, int u, int p) {
size[u] = 1;
parent[u] = p;
heavy[u] = -1;
struct Node* temp = g->adjList[u];
while (temp) {
int v = temp->vertex;
if (v != p) {
dfsSize(g, v, u);
size[u] += size[v];
if (heavy[u] == -1 || size[v] > size[heavy[u]]) heavy[u] = v;
}
temp = temp->next;
}
}
void hldDecompose(struct Graph* g, int u, int p, int head, int chain) {
chainHead[u] = head;
chainIndex[u] = chain;
pos[u] = index++;
if (heavy[u] != -1) hldDecompose(g, heavy[u], u, head, chain); // Heavy child in same chain
struct Node* temp = g->adjList[u];
while (temp) {
int v = temp->vertex;
if (v != p && v != heavy[u]) {
hldDecompose(g, v, u, v, chain + 1); // Light child starts new chain
}
temp = temp->next;
}
}
int main() {
struct Graph g;
initGraph(&g);
addEdge(&g, 0, 1);
addEdge(&g, 0, 2);
addEdge(&g, 1, 3);
addEdge(&g, 1, 4);
dfsSize(&g, 0, -1);
hldDecompose(&g, 0, -1, 0, 0);
for (int i = 0; i < V; i++) {
printf("Node %d: Chain %d, Pos %d\n", i, chainIndex[i], pos[i]);
}
freeGraph(&g);
return 0;
}
Note: This is a simplified implementation; full HLD requires a Segment Tree or Fenwick Tree for queries.
20. What are the time and space complexities of Heavy-Light Decomposition?
- Preprocessing Time:
O(n)(DFS for sizes and decomposition). - Query/Update Time:
O(log² n)(break path intoO(log n)segments, each queried inO(log n)). - Space Complexity:
O(n)(arrays for chain heads, positions, etc.).
21. What are the advantages and disadvantages of Heavy-Light Decomposition?
- Advantages: Enables efficient path queries on trees (e.g., sum, max on path), combines with other DS like Segment Trees.
- Disadvantages: Complex to implement, requires careful handling of chains and positions.
22. What are the applications of Heavy-Light Decomposition?
- Tree path queries/updates (e.g., sum, max on path).
- Heavy-light path problems in competitive programming.
- Network optimization with tree structures.