Common Table Expressions in SQL

Ayush Prashar

Date: 11th November, 2024

Contents

Agenda

Introduction to CTEs
Basic Syntax of a CTE
Types of CTEs
- Non-Recursive CTEs
- Recursive CTEs
Multiple CTEs in a Single Query
Advanced CTEs Techniques
- Nested CTEs
- Applying Window Functions in CTEs
Limitations of CTEs Techniques

Introduction to CTEs

A Common Table Expression (CTE) is a powerful feature in SQL, providing a way to create a temporary result set that you can refer to multiple times within a single query. This result set exists only for the duration of the query it’s associated with, making it particularly useful for organizing complex queries, improving readability, and simplifying debugging and maintenance. CTEs allow you to structure queries in a more modular way by breaking them down into smaller, more manageable parts, which can lead to better-optimized and more understandable SQL code.

Defined using the WITH keyword, CTEs enable a name to be assigned to a specific query. You can think of a CTE as a temporary, named subquery that behaves like a virtual table. Once defined, this virtual table can be referenced multiple times in the main query. This helps avoid the need to write the same subquery repeatedly, making the query cleaner and reducing the chance of errors.

Why use CTEs?

Improved Readability: CTEs allow you to break complex queries into smaller parts, making SQL code easier to read and maintain.
Reusability: Once defined, CTEs can be referenced multiple times within the same query, eliminating redundant code and simplifying updates.
Recursive Queries: CTEs support recursion, which is especially useful for working with hierarchical data like organizational structures.
Enhanced Modularity: CTEs let you tackle parts of a query step-by-step, enabling cleaner logic and better debugging.
Temporary Scope: CTEs act as temporary views within a query, helping process data without affecting the underlying database structure.

Basic Syntax of a CTE

The structure of a CTE begins with the WITH keyword, followed by the CTE name and an AS keyword, and finally, a set of parentheses that encloses the CTE's query. This query generates a temporary result set that you can reference in the main query.

Syntax Breakdown:

WITH cte_name AS (
-- Define your query here
SELECT column1, column2, ...
FROM table_name
WHERE condition
)
-- Main query that uses the CTE
SELECT columns
FROM cte_name;

Parts of the Syntax:

WITH: The keyword that starts the CTE. It signals the beginning of a temporary result set.
CTE Name (cte_name):

This is the alias for the CTE and acts as a temporary table name.
Naming conventions are similar to naming tables or columns in SQL.

AS: This keyword follows the CTE name and precedes the CTE’s query definition.
Parentheses: The CTE query is wrapped in parentheses (...), which defines the result set of the CTE.
CTE Query:

This is the actual query that defines what the CTE will contain.
It can include any valid SQL query, such as SELECT statements with JOIN, GROUP BY, or other clauses.

Main Query: After defining the CTE, the main query (which can follow the CTE or refer to it multiple times within a larger SQL statement) retrieves data from the CTE, treating it as a temporary table.

Example:

Let’s look at an example of a CTE that selects employees who earn over $50,000.

WITH HighEarningEmployees AS (
SELECT EmployeeID, FirstName, LastName, Salary
FROM Employees
WHERE Salary > 50000
)
SELECT EmployeeID, FirstName, LastName
FROM HighEarningEmployees;

Explanation:

WITH HighEarningEmployees AS (...):

HighEarningEmployees is the name of the CTE. The query inside the parentheses selects employees with a salary above $50,000.

Main Query:

The SELECT statement following the CTE references HighEarningEmployees, selecting EmployeeID, FirstName, and LastName from the filtered result.

Types of CTEs

There are two main types of Common Table Expressions (CTEs) in SQL: Non-Recursive CTEs and Recursive CTEs. Each has distinct use cases and capabilities.

Non-Recursive CTEs

A Non-Recursive CTE is the simplest form of a CTE. It is used to structure and organize complex SQL queries by creating a temporary result set that can be referenced within the main query. Non-recursive CTEs do not involve any recursive operations; they simply hold the result of a single, non-recursive query.

Syntax:

WITH cte_name AS (
SELECT column1, column2, ...
FROM table_name
WHERE condition
)
SELECT columns
FROM cte_name;

Example of Non-Recursive CTE: Using Employee Table

You want to list employees with a salary over $50,000.

WITH HighEarningEmployees AS (
SELECT EmployeeID, FirstName, LastName, Salary
FROM Employees
WHERE Salary > 50000
)
SELECT EmployeeID, FirstName, LastName
FROM HighEarningEmployees;

Advantages of Non-Recursive CTEs

Code Readability: They simplify complex queries by breaking them down into parts.
Reusability: You can reuse the same result set within a query multiple times.
Encapsulation: Allows performing calculations or aggregations in one step, making it easy to reference in subsequent queries.

Use Cases for Non-Recursive CTEs

Data Transformation: Use CTEs to preprocess or filter data before applying additional operations in the main query.
Intermediate Calculations: Simplify intermediate steps, such as calculating subtotals, averages, or rankings.
Modular Complex Queries: Break down complex queries with multiple joins or aggregations into manageable sections.

Recursive CTEs

A Recursive CTE is a CTE that references itself, allowing for repetitive operations on hierarchical or recursive data. Recursive CTEs are useful when dealing with data structures that have a self-referential relationship, such as organizational charts, bill of materials, or hierarchical file structures.

Syntax

The structure of a Recursive CTE has two main parts:

Anchor Member: The initial query that provides the base result set.
Recursive Member: A query that references the CTE itself, allowing for repeated execution based on the result set of the previous iteration.

WITH Recursive_CTE AS (
-- Anchor member
SELECT column1, column2
FROM table_name
WHERE condition

UNION ALL

-- Recursive member (references Recursive_CTE)
SELECT column1, column2
FROM table_name
INNER JOIN Recursive_CTE ON condition
)
SELECT *
FROM Recursive_CTE;

Example of Recursive CTE

We have an Employees table with EmployeeID, ManagerID, and FirstName. Each employee is managed by another employee, forming a hierarchy. We want to find all employees in the reporting hierarchy under a specific manager, say, ManagerID = 1.

WITH EmployeeHierarchy AS (
-- Anchor member: Start from the top manager (e.g., ManagerID = 1)
SELECT EmployeeID, ManagerID, FirstName, 1 AS Level
FROM Employees
WHERE ManagerID = 1

UNION ALL

-- Recursive member: Get employees reporting to the ones found in the previous level
SELECT e.EmployeeID, e.ManagerID, e.FirstName, eh.Level + 1
FROM Employees e
INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT EmployeeID, FirstName, Level
FROM EmployeeHierarchy;

In this example:

The Anchor Member starts with employees who report directly to ManagerID = 1.
The Recursive Member then retrieves employees reporting to the current level’s employees, incrementing the hierarchy level each time.
The recursion continues until no more employees are found in the hierarchy

Key Considerations in Recursive CTEs

Cycle Detection: Recursive CTEs must be carefully designed to avoid infinite loops, which occur if the recursive part keeps joining indefinitely. SQL engines generally impose a maximum recursion level (often 100 by default), which can be adjusted.
Recursion Depth: You can control recursion depth using a MAXRECURSION option in some SQL engines (e.g., OPTION (MAXRECURSION n) in SQL Server).

Advantages of Recursive CTEs

Simplifies Hierarchical Data Queries: Recursive CTEs are ideal for querying hierarchical data without complex self-joins.
Modular Approach: They offer a structured, easy-to-understand approach to dealing with repetitive or nested data structures.
Efficient Tree Traversal: Recursive CTEs simplify traversing trees or hierarchical relationships in data, which would be difficult with traditional joins.

Use Cases for Recursive CTEs

Organizational Hierarchies: Fetching reporting relationships in a company.
File System Directories: Traversing folder structures.
Parent-Child Relationships: Querying relationships like bill of materials, where items have component parts.
Graph and Tree Structures: Navigating graph-like data where nodes have multiple children or parent relationships.

Examples of Combining Non-Recursive and Recursive CTEs

CTEs can also be combined in a query where you might use multiple non-recursive CTEs along with a recursive CTE.

WITH DepartmentTotal AS (
-- Non-recursive CTE to calculate total salary per department
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID
),
EmployeeHierarchy AS (
-- Recursive CTE to build hierarchy for a specific department
SELECT EmployeeID, ManagerID, FirstName, DepartmentID, 1 AS Level
FROM Employees
WHERE DepartmentID = 1 AND ManagerID IS NULL

UNION ALL

SELECT e.EmployeeID, e.ManagerID, e.FirstName, e.DepartmentID, eh.Level + 1
FROM Employees e
INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
-- Main query using both CTEs
SELECT eh.EmployeeID, eh.FirstName, eh.Level, dt.TotalSalary
FROM EmployeeHierarchy eh
JOIN DepartmentTotal dt ON eh.DepartmentID = dt.DepartmentID;

In this example:

DepartmentTotal (a non-recursive CTE) calculates the total salary for each department.
EmployeeHierarchy (a recursive CTE) builds an employee hierarchy for a specific department.
The main query combines results from both CTEs to produce a hierarchical list of employees with their department’s total salary.

Multiple CTEs in a Single Query

Using multiple Common Table Expressions (CTEs) in a single query can be very useful when you need to structure complex queries with multiple stages of data processing. Each CTE can perform a separate task, and together they allow you to build a more readable and manageable query.

Syntax for Multiple CTEs:oSyntaxCTEsrMultiple CTEsEs

WITH
cte1 AS (
-- First CTE definition
SELECT ...
),
cte2 AS (
-- Second CTE definition that can reference cte1
SELECT ...
FROM cte1
),
cte3 AS (
-- Third CTE definition that can reference cte1 and cte2
SELECT ...
FROM cte2
)
-- Final SELECT statement that uses one or more of the defined CTEs
SELECT ...
FROM cte3;

In this structure:

cte1 is defined first.
cte2 can reference cte1, allowing data to be processed in stages.
cte3 can then use both cte1 and cte2 as sources.
Finally, you have a SELECT statement that can reference any of the CTEs.

Example: Multiple CTEs

Consider the Employees table we created earlier. Let’s say we want to:

Calculate the average salary in each department.
Identify employees with salaries above their department's average.
Find the names of these employees along with their managers’ names.

WITH
DeptAvgSalary AS (
-- CTE 1: Calculate the average salary per department
SELECT
DepartmentID,
AVG(Salary) AS AvgSalary
FROM
Employees
GROUP BY
DepartmentID
),
HighEarners AS (
-- CTE 2: Find employees earning above the department average
SELECT
e.EmployeeID,
e.FirstName,
e.LastName,
e.Salary,
e.DepartmentID,
e.ManagerID
FROM
Employees e
JOIN
DeptAvgSalary d
ON
e.DepartmentID = d.DepartmentID
WHERE
e.Salary > d.AvgSalary
),
Managers AS (
-- CTE 3: Add manager names for employees earning above the department average
SELECT
h.EmployeeID,
h.FirstName AS EmployeeFirstName,
h.LastName AS EmployeeLastName,
h.Salary,
m.FirstName AS ManagerFirstName,
m.LastName AS ManagerLastName
FROM
HighEarners h
LEFT JOIN
Employees m
ON
h.ManagerID = m.EmployeeID
)
-- Final SELECT statement that retrieves high earners and their managers
SELECT
EmployeeFirstName,
EmployeeLastName,
Salary,
ManagerFirstName,
ManagerLastName
FROM
Managers;

Explanation of Each Step:

DeptAvgSalary CTE:

Calculates the average salary in each department.
This CTE gives us the average salary for each department (DepartmentID and AvgSalary), which we’ll use to find high earners.

HighEarners CTE:

Selects employees whose salaries are above their department’s average.
This CTE joins Employees with DeptAvgSalary to filter employees who have salaries higher than AvgSalary within their DepartmentID.

Managers CTE:

Uses HighEarners to retrieve additional information, specifically the manager names for each high-earning employee.
It joins the HighEarners CTE with the Employees table to get the names of the managers based on the ManagerID.

Final Query:

Selects the high-earning employees along with their managers’ names from the Managers CTE, making the query output cleaner and organized.

Benefits of Using Multiple CTEs

Readability: By breaking down complex operations, each CTE handles a specific part of the problem, making it easier to read and debug.
Reusability: CTEs can reference each other, allowing intermediate results to be reused in subsequent CTEs or the final query.
Maintainability: If any part of the query needs to change, it can be modified within its respective CTE without affecting other parts of the query.

Advanced CTEs Techniques

Advanced Common Table Expressions (CTEs) techniques allow for even greater flexibility and functionality in SQL.

Nested CTEs (Using CTEs within CTEs)

Nested CTEs involve placing one CTE inside another to break down complex calculations. This can be useful when performing multiple transformations in stages or when simplifying logic.

Example: Multi-step Transformation with Nested CTEs

Imagine you want to calculate the average salary per department, then find employees earning above average, and finally determine the count of such high earners per department.

WITH
DeptAvgSalary AS (
SELECT
DepartmentID,
AVG(Salary) AS AvgSalary
FROM
Employees
GROUP BY
DepartmentID
),
HighEarners AS (
SELECT
e.EmployeeID,
e.FirstName,
e.DepartmentID,
e.Salary
FROM
Employees e
JOIN
DeptAvgSalary d ON e.DepartmentID = d.DepartmentID
WHERE
e.Salary > d.AvgSalary
),
HighEarnerCount AS (
SELECT
DepartmentID,
COUNT(EmployeeID) AS HighEarnerCount
FROM
HighEarners
GROUP BY
DepartmentID
)
SELECT * FROM HighEarnerCount;

In this example:

DeptAvgSalary calculates average salary per department.
HighEarners selects employees above their department’s average.
HighEarnerCount provides the count of such high earners by department.

Applying Window Functions within CTEs

CTEs can benefit greatly from window functions, which allow complex analytics, ranking, running totals, moving averages, and more within each partition of data.

Example: Ranking Employees by Salary within Departments

WITH RankedSalaries AS (
SELECT
EmployeeID,
FirstName,
DepartmentID,
Salary,
RANK() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
FROM
Employees
)
SELECT
EmployeeID,
FirstName,
DepartmentID,
Salary,
SalaryRank
FROM
RankedSalaries
WHERE
SalaryRank = 1; -- Only show top earners per department

Here:

RANK() assigns ranks within each department, with the highest salary ranked as 1.
The main query filters only the top earners in each department.

Limitations of CTEs Techniques

Performance Issues

Materialization: Many database systems may materialize CTEs as temporary tables, which can lead to performance overhead, especially if the CTE is referenced multiple times in a query. This can result in slower execution times compared to subqueries or derived tables.
Optimization Limitations: Some databases do not optimize CTEs as effectively as regular queries. The query optimizer may not consider CTEs when generating execution plans, which could lead to inefficient query performance.

2. Scope and Lifetime

Limited Scope: CTEs are only valid within the execution of the single SQL statement that they are defined in. They cannot be referenced in other queries, procedures, or CTEs outside their immediate context.
Lifetime of CTEs: Once the query that uses a CTE is executed, the CTE is discarded, meaning it cannot be reused in subsequent queries. This can lead to code duplication if the same logic is needed in multiple places.

3. Recursive CTE Limitations

Recursion Depth: Many databases impose limits on the recursion depth of recursive CTEs. For example, SQL Server limits recursive CTEs to 100 levels of recursion by default, which can hinder queries that need deeper hierarchies.
Infinite Recursion Risks: If the termination condition is not well-defined in a recursive CTE, it may lead to infinite recursion, resulting in errors or crashes in the database.

4. Readability and Complexity

Readability Concerns: While CTEs can improve query readability, overly complex CTE structures or nested CTEs can make queries difficult to understand and maintain, especially for those unfamiliar with the structure.
Overuse: Relying heavily on CTEs for complex queries might lead to convoluted code that is harder to debug and understand, particularly when multiple CTEs are nested.

5. Limited Functionality Compared to Views

No Indexing: Unlike indexed views (materialized views), CTEs cannot be indexed, which can impact performance when dealing with large datasets or frequently queried data.
Static Definition: CTEs are defined in the query itself and do not store any schema information. In contrast, views can encapsulate complex logic and provide a static interface for users.

Related Blogs