Advanced SQL Concepts: Window Functions, CTEs, Recursive Queries, PIVOT/UNPIVOT & Comprehensive Examples
Table of Contents
- 1. What are window functions in SQL?
- 2. Example of window functions
- 3. What are CTEs in SQL?
- 4. Example of a CTE
- 5. What are recursive queries in SQL?
- 6. Example of a recursive query
- 7. What are PIVOT and UNPIVOT in SQL?
- 8. Example of PIVOT and UNPIVOT
- 9. Comprehensive example combining all concepts
- 10. Common mistakes
- 11. Best practices
1. What are window functions in SQL?
Window functions perform calculations across a set of rows (a "window") defined by a PARTITION BY and/or ORDER BY clause, without collapsing rows like aggregate functions. They return a value for each row, preserving the row structure.
Types:
- Aggregate:
SUM,AVG,COUNT, etc., over a window. - Ranking:
ROW_NUMBER,RANK,DENSE_RANK,NTILE. - Value:
LAG,LEAD,FIRST_VALUE,LAST_VALUE. - Analytic:
CUME_DIST,PERCENT_RANK.
Syntax:
SELECT column, window_function() OVER (PARTITION BY column1 ORDER BY column2) FROM table_name;Use Case: Ranking employees by salary within departments, calculating running totals, or accessing previous/next row values.
2. Can you give an example of window functions?
-- Create sample table
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, FirstName VARCHAR(50), DepartmentID INT, Salary DECIMAL(10, 2)
); -- Insert sample data
INSERT INTO Employees (EmployeeID, FirstName, DepartmentID, Salary)
VALUES (1, 'John', 1, 60000.00), (2, 'Jane', 2, 55000.00), (3, 'kristal', 1, 65000.00), (4, 'Bob', 2, 58000.00), (5, 'hari', 1, 62000.00); -- Window functions: Ranking and running total
SELECT FirstName, DepartmentID, Salary, ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank, SUM(Salary) OVER (PARTITION BY DepartmentID ORDER BY Salary) AS RunningTotal, LAG(Salary, 1) OVER (PARTITION BY DepartmentID ORDER BY Salary) AS PreviousSalary
FROM Employees; Output:
FirstName | DepartmentID | Salary | SalaryRank | RunningTotal | PreviousSalary
----------|--------------|----------|------------|--------------|---------------
kristal | 1 | 65000.00 | 1 | 65000.00 | NULL
hari | 1 | 62000.00 | 2 | 127000.00 | 65000.00
John | 1 | 60000.00 | 3 | 187000.00 | 62000.00
Bob | 2 | 58000.00 | 1 | 58000.00 | NULL
Jane | 2 | 55000.00 | 2 | 113000.00 | 58000.00 Note:
ROW_NUMBER()assigns ranks within each department.SUM(... ORDER BY Salary)calculates a running total within departments.LAG()retrieves the previous salary in the same department.
3. What are CTEs in SQL?
A Common Table Expression (CTE) is a temporary result set defined within a WITH clause, which can be referenced multiple times in the main query. CTEs improve readability and maintainability compared to subqueries.
Syntax:
WITH cte_name AS ( SELECT column1 FROM table_name WHERE condition
)
SELECT * FROM cte_name WHERE condition;Use Case: Simplifying complex queries, breaking down logic, or preparing data for joins.
4. Can you give an example of a CTE?
-- CTE: Calculate average salary per department and join with employees
WITH DeptAvg AS ( SELECT DepartmentID, AVG(Salary) AS AvgSalary FROM Employees GROUP BY DepartmentID
)
SELECT e.FirstName, e.Salary, d.AvgSalary, e.Salary - d.AvgSalary AS SalaryDiff
FROM Employees e
JOIN DeptAvg d ON e.DepartmentID = d.DepartmentID
WHERE e.Salary > d.AvgSalary; Output:
FirstName | Salary | AvgSalary | SalaryDiff
----------|----------|-----------|------------
kristal | 65000.00 | 62333.33 | 2666.67
hari | 62000.00 | 62333.33 | -333.33
Bob | 58000.00 | 56500.00 | 1500.00 Note: The CTE DeptAvg computes department averages, which are joined with Employees to find employees with above-average salaries.
5. What are recursive queries in SQL?
Recursive queries, typically implemented using recursive CTEs, allow a query to reference itself to process hierarchical or recursive data (e.g., organizational charts, tree structures).
Structure:
- Anchor Query: The base case (starting rows).
- Recursive Query: References the CTE to build additional rows.
Syntax:
WITH RECURSIVE cte_name AS ( -- Anchor query SELECT column1 FROM table_name WHERE condition UNION -- Recursive query SELECT column1 FROM table_name t JOIN cte_name c ON t.column = c.column
)
SELECT * FROM cte_name;Support: Available in PostgreSQL, SQL Server, Oracle; MySQL 8.0+; SQLite with limitations.
6. Can you give an example of a recursive query?
-- Create table for organizational hierarchy
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, FirstName VARCHAR(50), ManagerID INT, Salary DECIMAL(10, 2)
); -- Insert sample data (hierarchy: kristal -> John -> hari)
INSERT INTO Employees (EmployeeID, FirstName, ManagerID, Salary)
VALUES (1, 'kristal', NULL, 65000.00), (2, 'John', 1, 60000.00), (3, 'hari', 2, 55000.00), (4, 'Jane', NULL, 58000.00); -- Recursive CTE: Traverse employee hierarchy
WITH RECURSIVE OrgChart AS ( -- Anchor: Top-level employees (no manager) SELECT EmployeeID, FirstName, ManagerID, 1 AS Level FROM Employees WHERE ManagerID IS NULL UNION ALL -- Recursive: Employees reporting to those in OrgChart SELECT e.EmployeeID, e.FirstName, e.ManagerID, o.Level + 1 FROM Employees e JOIN OrgChart o ON e.ManagerID = o.EmployeeID
)
SELECT o.FirstName, o.Level, (SELECT FirstName FROM Employees m WHERE m.EmployeeID = o.ManagerID) AS ManagerName
FROM OrgChart o
ORDER BY o.Level, o.FirstName; Output:
FirstName | Level | ManagerName
----------|-------|------------
kristal | 1 | NULL
Jane | 1 | NULL
John | 2 | kristal
hari | 3 | John Note: The recursive CTE builds the hierarchy, starting with top-level employees (NULL ManagerID) and recursively adding subordinates.
7. What are PIVOT and UNPIVOT in SQL?
PIVOT: Transforms rows into columns, creating a crosstab or summary table (e.g., turning department salaries into columns by department).
UNPIVOT: Transforms columns into rows, reversing a pivot operation.
Syntax:
-- PIVOT
SELECT * FROM table_name
PIVOT ( aggregate_function(column) FOR pivot_column IN (value1, value2)
) AS pivot_table;
-- UNPIVOT
SELECT * FROM table_name
UNPIVOT ( value_column FOR name_column IN (column1, column2)
) AS unpivot_table;Support: Native in SQL Server, Oracle; MySQL/PostgreSQL use conditional aggregation; SQLite lacks native support.
8. Can you give an example of PIVOT and UNPIVOT?
-- Create sample table
CREATE TABLE Sales ( SaleID INT PRIMARY KEY, DepartmentID INT, Year INT, Revenue DECIMAL(10, 2)
); -- Insert sample data
INSERT INTO Sales (SaleID, DepartmentID, Year, Revenue)
VALUES (1, 1, 2023, 10000.00), (2, 1, 2024, 12000.00), (3, 2, 2023, 8000.00), (4, 2, 2024, 9000.00); -- PIVOT: Summarize revenue by department and year
-- SQL Server/Oracle syntax
SELECT * FROM Sales
PIVOT ( SUM(Revenue) FOR Year IN (2023, 2024)
) AS PivotTable; -- MySQL/PostgreSQL workaround (conditional aggregation)
SELECT DepartmentID, SUM(CASE WHEN Year = 2023 THEN Revenue ELSE 0 END) AS "2023", SUM(CASE WHEN Year = 2024 THEN Revenue ELSE 0 END) AS "2024"
FROM Sales
GROUP BY DepartmentID; -- UNPIVOT: Convert pivoted data back to rows (SQL Server/Oracle)
-- First, create a pivoted table
CREATE TABLE PivotedSales ( DepartmentID INT, "2023" DECIMAL(10, 2), "2024" DECIMAL(10, 2)
); INSERT INTO PivotedSales (DepartmentID, "2023", "2024")
VALUES (1, 10000.00, 12000.00), (2, 8000.00, 9000.00); SELECT * FROM PivotedSales
UNPIVOT ( Revenue FOR Year IN ("2023", "2024")
) AS UnpivotTable; Output (PIVOT, MySQL/PostgreSQL):
DepartmentID | 2023 | 2024
-------------|----------|--------
1 | 10000.00 | 12000.00
2 | 8000.00 | 9000.00 Output (UNPIVOT, SQL Server/Oracle):
DepartmentID | Year | Revenue
-------------|------|--------
1 | 2023 | 10000.00
1 | 2024 | 12000.00
2 | 2023 | 8000.00
2 | 2024 | 9000.00 Note:
- MySQL/PostgreSQL use
CASEfor pivoting due to lack of nativePIVOT. UNPIVOTreverses the pivoted structure, supported natively in SQL Server/Oracle.
9. Can you provide a comprehensive example using window functions, CTEs, recursive queries, and PIVOT/UNPIVOT?
-- Create sample tables
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, FirstName VARCHAR(50), DepartmentID INT, ManagerID INT, Salary DECIMAL(10, 2)
); CREATE TABLE Sales ( SaleID INT PRIMARY KEY, DepartmentID INT, Year INT, Revenue DECIMAL(10, 2)
); -- Insert sample data
INSERT INTO Employees (EmployeeID, FirstName, DepartmentID, ManagerID, Salary)
VALUES (1, 'kristal', 1, NULL, 65000.00), (2, 'John', 1, 1, 60000.00), (3, 'hari', 1, 2, 55000.00), (4, 'Jane', 2, NULL, 58000.00); INSERT INTO Sales (SaleID, DepartmentID, Year, Revenue)
VALUES (1, 1, 2023, 10000.00), (2, 1, 2024, 12000.00), (3, 2, 2023, 8000.00), (4, 2, 2024, 9000.00); -- Comprehensive query
WITH RECURSIVE OrgChart AS ( -- Anchor: Top-level employees SELECT EmployeeID, FirstName, ManagerID, 1 AS Level FROM Employees WHERE ManagerID IS NULL UNION ALL -- Recursive: Subordinates SELECT e.EmployeeID, e.FirstName, e.ManagerID, o.Level + 1 FROM Employees e JOIN OrgChart o ON e.ManagerID = o.EmployeeID
),
DeptSales AS ( -- CTE: Pivot sales data (MySQL/PostgreSQL style) SELECT DepartmentID, SUM(CASE WHEN Year = 2023 THEN Revenue ELSE 0 END) AS "2023", SUM(CASE WHEN Year = 2024 THEN Revenue ELSE 0 END) AS "2024" FROM Sales GROUP BY DepartmentID
)
SELECT o.FirstName, o.Level, e.Salary, ROW_NUMBER() OVER (PARTITION BY e.DepartmentID ORDER BY e.Salary DESC) AS SalaryRank, d."2023" AS Revenue2023, d."2024" AS Revenue2024, (SELECT FirstName FROM Employees m WHERE m.EmployeeID = o.ManagerID) AS ManagerName
FROM OrgChart o
JOIN Employees e ON o.EmployeeID = e.EmployeeID
JOIN DeptSales d ON e.DepartmentID = d.DepartmentID
WHERE e.Salary > ( SELECT AVG(Salary) FROM Employees WHERE DepartmentID = e.DepartmentID
)
ORDER BY o.Level, e.Salary DESC
LIMIT 3; -- MySQL/PostgreSQL/SQLite (use TOP 3 for SQL Server) Output:
FirstName | Level | Salary | SalaryRank | Revenue2023 | Revenue2024 | ManagerName
----------|-------|----------|------------|-------------|-------------|------------
kristal | 1 | 65000.00 | 1 | 10000.00 | 12000.00 | NULL
John | 2 | 60000.00 | 2 | 10000.00 | 12000.00 | kristal
Jane | 1 | 58000.00 | 1 | 8000.00 | 9000.00 | NULL Description:
- Recursive CTE:
OrgChartbuilds the employee hierarchy. - CTE:
DeptSalespivots sales data by year. - Window Function:
ROW_NUMBER()ranks employees by salary within departments. - Subquery: Filters employees with above-average department salaries and retrieves manager names.
- Integrates prior concepts (
JOIN,WHERE,GROUP BY).
10. What are common mistakes when using advanced SQL queries?
Window Functions:
- Omitting
PARTITION BYorORDER BY, leading to incorrect windows. - Using window functions in
WHEREdirectly (must use a subquery or CTE). - Overusing large windows, slowing queries.
CTEs:
- Nesting too many CTEs, reducing readability.
- Forgetting to reference CTEs in the main query, causing errors.
- Using CTEs when subqueries or joins are simpler.
Recursive Queries:
- Missing termination conditions, causing infinite loops (use
LIMITor constraints). - Poorly indexing recursive join columns, slowing performance.
- Overcomplicating hierarchies when simpler queries suffice.
PIVOT/UNPIVOT:
- Assuming native
PIVOTsupport in MySQL or SQLite (useCASE). - Misaligning pivot values, leading to missing or incorrect columns.
- Using
PIVOTfor dynamic columns without preprocessing (requires dynamic SQL).
11. What are best practices for advanced SQL queries?
Window Functions:
- Use
PARTITION BYto group data logically; includeORDER BYfor ordered calculations. - Test window functions with small datasets to verify results.
- Use
OVERclauses sparingly to avoid performance issues.
CTEs:
- Name CTEs descriptively (e.g.,
DeptAvgfor department averages). - Break complex queries into multiple CTEs for clarity.
- Use CTEs for reusable intermediate results instead of nested subqueries.
Recursive Queries:
- Ensure a clear anchor query and termination condition to prevent infinite recursion.
- Index columns used in recursive joins (e.g.,
CREATE INDEX ON Employees(ManagerID)). - Test recursive queries with small hierarchies first.
PIVOT/UNPIVOT:
- Use
CASEstatements for portability in MySQL/PostgreSQL. - Validate pivot values to ensure all expected columns are included.
- Use
UNPIVOTto normalize pivoted data for further analysis.
General:
- Comment queries for clarity (e.g.,
-- Rank employees by salary). - Use
EXPLAINorEXPLAIN ANALYZEto optimize performance. - Index columns used in
WHERE,JOIN, orPARTITION BY. - Test queries incrementally to ensure correctness.
- Balance complexity with readability (e.g., prefer CTEs over deeply nested subqueries).