Pseudorandom Knowledge

SQL Server: recursion in queries

Some data is hierarchical or recursive in nature. Such data can be stored in a table by having it point back to itself.

Id ParentId Name
1 NULL /
2 1 var/
3 1 boot/
4 3 vmlinuz
5 3 initrd.img

To query this table recursively we can use the WITH clause to create a Common Table Expression. A CTE can be useful instead of temporary tables or sub queries. But here we use it for its desirable property of being able to reference itself.

WITH Branches (Id, Name)
AS
(
    SELECT t.Id, CAST(t.Name AS VARCHAR(512))
    FROM Tree t
    WHERE t.ParentId IS NULL
 
    UNION ALL
 
    SELECT t.Id, CAST(b.Name + t.Name AS VARCHAR(512)) AS Name
    FROM Tree t INNER JOIN Branches b ON t.ParentId = b.Id
)
SELECT Name FROM Branches ORDER BY Name

A recursive WITH query consists of three parts. First we select all the root elements. In this case there’s just one but it could be many more.

Secondly we select all the children. This is where the magic happens because the second SELECT will be called recursively until there are no more children. By the way, the CASTs are necessary when concatenating strings.

Finally we use the result in an ordinary SELECT query.

Name
/
/boot/
/boot/initrd.img
/boot/vmlinuz
/var/

Recursion in tables

There is a better way to store hierarchical data in SQL Server, by using a specific SQL Server type called hierarchyid.

HierarchyId Name
/ /
/1/ var/
/2/ boot/
/2/1/ vmlinuz
/2/2/ initrd.img

The query to get the same results as above would then look like follows.

WITH Branches (HierarchyId, Name)
AS
(
    SELECT t.HierarchyId, CAST(t.Name AS VARCHAR(512))
    FROM Tree t
    WHERE t.HierarchyId = hierarchyid::GetRoot()

    UNION ALL

    SELECT t.HierarchyId, CAST(b.Name + t.Name AS VARCHAR(512)) AS Name
    FROM Tree t INNER JOIN Branches b ON t.HierarchyId.GetAncestor(1) = b.HierarchyId
)
SELECT Name FROM Branches ORDER BY Name

This query doesn't seem much simpler, but there are other queries that are easier with this approach. Another, very important, benefit of using hierarchyid is data integrity. For example, it becomes to introduce loops and, as long as the column has a unique constraint, it is impossible to add more than one root.