bosscoder_logo
Right arrow

Common Table Expressions in SQL

author image

Ayush Prashar

Date: 11th November, 2024

feature image

Contents

    Agenda

    • Introduction to CTEs
    • Basic Syntax of a CTE
    • Types of CTEs
      • Non-Recursive CTEs
      • Recursive CTEs
    • Multiple CTEs in a Single Query
    • Advanced CTEs Techniques
      • Nested CTEs
      • Applying Window Functions in CTEs
    • Limitations of CTEs Techniques

    Introduction to CTEs

    A Common Table Expression (CTE) is a powerful feature in SQL, providing a way to create a temporary result set that you can refer to multiple times within a single query. This result set exists only for the duration of the query it’s associated with, making it particularly useful for organizing complex queries, improving readability, and simplifying debugging and maintenance. CTEs allow you to structure queries in a more modular way by breaking them down into smaller, more manageable parts, which can lead to better-optimized and more understandable SQL code.

    Defined using the WITH keyword, CTEs enable a name to be assigned to a specific query. You can think of a CTE as a temporary, named subquery that behaves like a virtual table. Once defined, this virtual table can be referenced multiple times in the main query. This helps avoid the need to write the same subquery repeatedly, making the query cleaner and reducing the chance of errors.

    Why use CTEs?

    1. Improved Readability: CTEs allow you to break complex queries into smaller parts, making SQL code easier to read and maintain.
    2. Reusability: Once defined, CTEs can be referenced multiple times within the same query, eliminating redundant code and simplifying updates.
    3. Recursive Queries: CTEs support recursion, which is especially useful for working with hierarchical data like organizational structures.
    4. Enhanced Modularity: CTEs let you tackle parts of a query step-by-step, enabling cleaner logic and better debugging.
    5. Temporary Scope: CTEs act as temporary views within a query, helping process data without affecting the underlying database structure.

    Basic Syntax of a CTE

    The structure of a CTE begins with the WITH keyword, followed by the CTE name and an AS keyword, and finally, a set of parentheses that encloses the CTE's query. This query generates a temporary result set that you can reference in the main query.

    Syntax Breakdown:

    WITH cte_name AS (
        -- Define your query here
        SELECT column1, column2, ...
        FROM table_name
        WHERE condition
    )
    -- Main query that uses the CTE
    SELECT columns
    FROM cte_name;

    Parts of the Syntax:

    1. WITH: The keyword that starts the CTE. It signals the beginning of a temporary result set.
    2. CTE Name (cte_name):
    • This is the alias for the CTE and acts as a temporary table name.
    • Naming conventions are similar to naming tables or columns in SQL.
    1. AS: This keyword follows the CTE name and precedes the CTE’s query definition.
    2. Parentheses: The CTE query is wrapped in parentheses (...), which defines the result set of the CTE.
    3. CTE Query:
    • This is the actual query that defines what the CTE will contain.
    • It can include any valid SQL query, such as SELECT statements with JOIN, GROUP BY, or other clauses.
    1. Main Query: After defining the CTE, the main query (which can follow the CTE or refer to it multiple times within a larger SQL statement) retrieves data from the CTE, treating it as a temporary table.

    Example:

    Let’s look at an example of a CTE that selects employees who earn over $50,000.

    WITH HighEarningEmployees AS (
        SELECT EmployeeID, FirstName, LastName, Salary
        FROM Employees
        WHERE Salary > 50000
    )
    SELECT EmployeeID, FirstName, LastName
    FROM HighEarningEmployees;

    Explanation:

    1. WITH HighEarningEmployees AS (...):
    • HighEarningEmployees is the name of the CTE. The query inside the parentheses selects employees with a salary above $50,000.
    1. Main Query:
    • The SELECT statement following the CTE references HighEarningEmployees, selecting EmployeeID, FirstName, and LastName from the filtered result.

    Types of CTEs

    There are two main types of Common Table Expressions (CTEs) in SQL: Non-Recursive CTEs and Recursive CTEs. Each has distinct use cases and capabilities.

    Non-Recursive CTEs

    A Non-Recursive CTE is the simplest form of a CTE. It is used to structure and organize complex SQL queries by creating a temporary result set that can be referenced within the main query. Non-recursive CTEs do not involve any recursive operations; they simply hold the result of a single, non-recursive query.

    Syntax:

    WITH cte_name AS (
        SELECT column1, column2, ...
        FROM table_name
        WHERE condition
    )
    SELECT columns
    FROM cte_name;

    Example of Non-Recursive CTE: Using Employee Table

    You want to list employees with a salary over $50,000.

    WITH HighEarningEmployees AS (
        SELECT EmployeeID, FirstName, LastName, Salary
        FROM Employees
        WHERE Salary > 50000
    )
    SELECT EmployeeID, FirstName, LastName
    FROM HighEarningEmployees;

    Advantages of Non-Recursive CTEs

    • Code Readability: They simplify complex queries by breaking them down into parts.
    • Reusability: You can reuse the same result set within a query multiple times.
    • Encapsulation: Allows performing calculations or aggregations in one step, making it easy to reference in subsequent queries.

    Use Cases for Non-Recursive CTEs

    • Data Transformation: Use CTEs to preprocess or filter data before applying additional operations in the main query.
    • Intermediate Calculations: Simplify intermediate steps, such as calculating subtotals, averages, or rankings.
    • Modular Complex Queries: Break down complex queries with multiple joins or aggregations into manageable sections.

    Recursive CTEs

    A Recursive CTE is a CTE that references itself, allowing for repetitive operations on hierarchical or recursive data. Recursive CTEs are useful when dealing with data structures that have a self-referential relationship, such as organizational charts, bill of materials, or hierarchical file structures.

    Syntax

    The structure of a Recursive CTE has two main parts:

    • Anchor Member: The initial query that provides the base result set.
    • Recursive Member: A query that references the CTE itself, allowing for repeated execution based on the result set of the previous iteration.

    WITH Recursive_CTE AS (
        -- Anchor member
        SELECT column1, column2
        FROM table_name
        WHERE condition

        UNION ALL

        -- Recursive member (references Recursive_CTE)
        SELECT column1, column2
        FROM table_name
        INNER JOIN Recursive_CTE ON condition
    )
    SELECT *
    FROM Recursive_CTE;

    Example of Recursive CTE

    We have an Employees table with EmployeeID, ManagerID, and FirstName. Each employee is managed by another employee, forming a hierarchy. We want to find all employees in the reporting hierarchy under a specific manager, say, ManagerID = 1.

    WITH EmployeeHierarchy AS (
        -- Anchor member: Start from the top manager (e.g., ManagerID = 1)
        SELECT EmployeeID, ManagerID, FirstName, 1 AS Level
        FROM Employees
        WHERE ManagerID = 1

        UNION ALL

        -- Recursive member: Get employees reporting to the ones found in the previous level
        SELECT e.EmployeeID, e.ManagerID, e.FirstName, eh.Level + 1
        FROM Employees e
        INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
    )
    SELECT EmployeeID, FirstName, Level
    FROM EmployeeHierarchy;

    In this example:

    • The Anchor Member starts with employees who report directly to ManagerID = 1.
    • The Recursive Member then retrieves employees reporting to the current level’s employees, incrementing the hierarchy level each time.
    • The recursion continues until no more employees are found in the hierarchy

    Key Considerations in Recursive CTEs

    • Cycle Detection: Recursive CTEs must be carefully designed to avoid infinite loops, which occur if the recursive part keeps joining indefinitely. SQL engines generally impose a maximum recursion level (often 100 by default), which can be adjusted.
    • Recursion Depth: You can control recursion depth using a MAXRECURSION option in some SQL engines (e.g., OPTION (MAXRECURSION n) in SQL Server).

    Advantages of Recursive CTEs

    • Simplifies Hierarchical Data Queries: Recursive CTEs are ideal for querying hierarchical data without complex self-joins.
    • Modular Approach: They offer a structured, easy-to-understand approach to dealing with repetitive or nested data structures.
    • Efficient Tree Traversal: Recursive CTEs simplify traversing trees or hierarchical relationships in data, which would be difficult with traditional joins.

    Use Cases for Recursive CTEs

    • Organizational Hierarchies: Fetching reporting relationships in a company.
    • File System Directories: Traversing folder structures.
    • Parent-Child Relationships: Querying relationships like bill of materials, where items have component parts.
    • Graph and Tree Structures: Navigating graph-like data where nodes have multiple children or parent relationships.

    Examples of Combining Non-Recursive and Recursive CTEs

    CTEs can also be combined in a query where you might use multiple non-recursive CTEs along with a recursive CTE.

    WITH DepartmentTotal AS (
        -- Non-recursive CTE to calculate total salary per department
        SELECT DepartmentID, SUM(Salary) AS TotalSalary
        FROM Employees
        GROUP BY DepartmentID
    ),
    EmployeeHierarchy AS (
        -- Recursive CTE to build hierarchy for a specific department
        SELECT EmployeeID, ManagerID, FirstName, DepartmentID, 1 AS Level
        FROM Employees
        WHERE DepartmentID = 1 AND ManagerID IS NULL


        UNION ALL

        SELECT e.EmployeeID, e.ManagerID, e.FirstName, e.DepartmentID, eh.Level + 1
        FROM Employees e
        INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
    )
    -- Main query using both CTEs
    SELECT eh.EmployeeID, eh.FirstName, eh.Level, dt.TotalSalary
    FROM EmployeeHierarchy eh
    JOIN DepartmentTotal dt ON eh.DepartmentID = dt.DepartmentID;

    In this example:

    • DepartmentTotal (a non-recursive CTE) calculates the total salary for each department.
    • EmployeeHierarchy (a recursive CTE) builds an employee hierarchy for a specific department.
    • The main query combines results from both CTEs to produce a hierarchical list of employees with their department’s total salary.

    Multiple CTEs in a Single Query

    Using multiple Common Table Expressions (CTEs) in a single query can be very useful when you need to structure complex queries with multiple stages of data processing. Each CTE can perform a separate task, and together they allow you to build a more readable and manageable query.

    Syntax for Multiple CTEs:oSyntaxCTEsrMultiple CTEsEs

    WITH
        cte1 AS (
            -- First CTE definition
            SELECT ...
        ),
        cte2 AS (
            -- Second CTE definition that can reference cte1
            SELECT ...
            FROM cte1
        ),
        cte3 AS (
            -- Third CTE definition that can reference cte1 and cte2
            SELECT ...
            FROM cte2
        )
    -- Final SELECT statement that uses one or more of the defined CTEs
    SELECT ...
    FROM cte3;

    In this structure:

    1. cte1 is defined first.
    2. cte2 can reference cte1, allowing data to be processed in stages.
    3. cte3 can then use both cte1 and cte2 as sources.
    4. Finally, you have a SELECT statement that can reference any of the CTEs.

    Example: Multiple CTEs 

    Consider the Employees table we created earlier. Let’s say we want to:

    1. Calculate the average salary in each department.
    2. Identify employees with salaries above their department's average.
    3. Find the names of these employees along with their managers’ names.

    WITH
        DeptAvgSalary AS (
            -- CTE 1: Calculate the average salary per department
            SELECT
                DepartmentID,
                AVG(Salary) AS AvgSalary
            FROM
                Employees
            GROUP BY
                DepartmentID
        ),
        HighEarners AS (
            -- CTE 2: Find employees earning above the department average
            SELECT
                e.EmployeeID,
                e.FirstName,
                e.LastName,
                e.Salary,
                e.DepartmentID,
                e.ManagerID
            FROM
                Employees e
            JOIN
                DeptAvgSalary d
            ON
                e.DepartmentID = d.DepartmentID
            WHERE
                e.Salary > d.AvgSalary
        ),
        Managers AS (
            -- CTE 3: Add manager names for employees earning above the department average
            SELECT
                h.EmployeeID,
                h.FirstName AS EmployeeFirstName,
                h.LastName AS EmployeeLastName,
                h.Salary,
                m.FirstName AS ManagerFirstName,
                m.LastName AS ManagerLastName
            FROM
                HighEarners h
            LEFT JOIN
                Employees m
            ON
                h.ManagerID = m.EmployeeID
        )
    -- Final SELECT statement that retrieves high earners and their managers
    SELECT
        EmployeeFirstName,
        EmployeeLastName,
        Salary,
        ManagerFirstName,
        ManagerLastName
    FROM
        Managers;

    Explanation of Each Step:

    1. DeptAvgSalary CTE:
    • Calculates the average salary in each department.
    • This CTE gives us the average salary for each department (DepartmentID and AvgSalary), which we’ll use to find high earners.
    1. HighEarners CTE:
    • Selects employees whose salaries are above their department’s average.
    • This CTE joins Employees with DeptAvgSalary to filter employees who have salaries higher than AvgSalary within their DepartmentID.
    1. Managers CTE:
    • Uses HighEarners to retrieve additional information, specifically the manager names for each high-earning employee.
    • It joins the HighEarners CTE with the Employees table to get the names of the managers based on the ManagerID.
    1. Final Query:
    • Selects the high-earning employees along with their managers’ names from the Managers CTE, making the query output cleaner and organized.

    Benefits of Using Multiple CTEs

    • Readability: By breaking down complex operations, each CTE handles a specific part of the problem, making it easier to read and debug.
    • Reusability: CTEs can reference each other, allowing intermediate results to be reused in subsequent CTEs or the final query.
    • Maintainability: If any part of the query needs to change, it can be modified within its respective CTE without affecting other parts of the query.

    Advanced CTEs Techniques

    Advanced Common Table Expressions (CTEs) techniques allow for even greater flexibility and functionality in SQL.

    Nested CTEs (Using CTEs within CTEs)

    Nested CTEs involve placing one CTE inside another to break down complex calculations. This can be useful when performing multiple transformations in stages or when simplifying logic.

    Example: Multi-step Transformation with Nested CTEs

    Imagine you want to calculate the average salary per department, then find employees earning above average, and finally determine the count of such high earners per department.

    WITH
        DeptAvgSalary AS (
            SELECT
                DepartmentID,
                AVG(Salary) AS AvgSalary
            FROM
                Employees
            GROUP BY
                DepartmentID
        ),
        HighEarners AS (
            SELECT
                e.EmployeeID,
                e.FirstName,
                e.DepartmentID,
                e.Salary
            FROM
                Employees e
            JOIN
                DeptAvgSalary d ON e.DepartmentID = d.DepartmentID
            WHERE
                e.Salary > d.AvgSalary
        ),
        HighEarnerCount AS (
            SELECT
                DepartmentID,
                COUNT(EmployeeID) AS HighEarnerCount
            FROM
                HighEarners
            GROUP BY
                DepartmentID
        )
    SELECT * FROM HighEarnerCount;

    In this example:

    1. DeptAvgSalary calculates average salary per department.
    2. HighEarners selects employees above their department’s average.
    3. HighEarnerCount provides the count of such high earners by department.

    Applying Window Functions within CTEs

    CTEs can benefit greatly from window functions, which allow complex analytics, ranking, running totals, moving averages, and more within each partition of data.

    Example: Ranking Employees by Salary within Departments

    WITH RankedSalaries AS (
        SELECT
            EmployeeID,
            FirstName,
            DepartmentID,
            Salary,
            RANK() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
        FROM
            Employees
    )
    SELECT
        EmployeeID,
        FirstName,
        DepartmentID,
        Salary,
        SalaryRank
    FROM
        RankedSalaries
    WHERE
        SalaryRank = 1-- Only show top earners per department

    Here:

    • RANK() assigns ranks within each department, with the highest salary ranked as 1.
    • The main query filters only the top earners in each department.

    Limitations of CTEs Techniques

    Performance Issues

    • Materialization: Many database systems may materialize CTEs as temporary tables, which can lead to performance overhead, especially if the CTE is referenced multiple times in a query. This can result in slower execution times compared to subqueries or derived tables.
    • Optimization Limitations: Some databases do not optimize CTEs as effectively as regular queries. The query optimizer may not consider CTEs when generating execution plans, which could lead to inefficient query performance.

        2. Scope and Lifetime

    • Limited Scope: CTEs are only valid within the execution of the single SQL statement that they are defined in. They cannot be referenced in other queries, procedures, or CTEs outside their immediate context.
    • Lifetime of CTEs: Once the query that uses a CTE is executed, the CTE is discarded, meaning it cannot be reused in subsequent queries. This can lead to code duplication if the same logic is needed in multiple places.

        3. Recursive CTE Limitations

    • Recursion Depth: Many databases impose limits on the recursion depth of recursive CTEs. For example, SQL Server limits recursive CTEs to 100 levels of recursion by default, which can hinder queries that need deeper hierarchies.
    • Infinite Recursion Risks: If the termination condition is not well-defined in a recursive CTE, it may lead to infinite recursion, resulting in errors or crashes in the database.

        4. Readability and Complexity

    • Readability Concerns: While CTEs can improve query readability, overly complex CTE structures or nested CTEs can make queries difficult to understand and maintain, especially for those unfamiliar with the structure.
    • Overuse: Relying heavily on CTEs for complex queries might lead to convoluted code that is harder to debug and understand, particularly when multiple CTEs are nested.

        5. Limited Functionality Compared to Views

    • No Indexing: Unlike indexed views (materialized views), CTEs cannot be indexed, which can impact performance when dealing with large datasets or frequently queried data.
    • Static Definition: CTEs are defined in the query itself and do not store any schema information. In contrast, views can encapsulate complex logic and provide a static interface for users.