Pseudorandom Knowledge

SQL Server: aggregative queries

Quite often we need to aggregate results from many rows. Here are a couple of techniques for doing this in Transact-SQL, most of these queries will probably only work in SQL Server.

Id Name
1 Denmark
2 Finland
3 Iceland
4 Norway
5 Sweden

In order to have some data to play with I have made up some competition results between the five Nordic countries. The data is held in two tables; Nations (above) and Medals (below). For simplicity’s sake the medal values are stored as CHAR(6) with a CHECK constraint instead of adding another table.

Id NationId Value
1 1 Gold
2 1 Gold
3 1 Silver
4 1 Silver
5 1 Bronze
6 1 Bronze
7 1 Bronze
8 2 Gold
9 2 Gold
10 4 Gold
11 4 Gold
12 4 Bronze
13 4 Bronze
14 4 Bronze
15 5 Silver
16 5 Silver
17 5 Silver
18 5 Silver

GROUP BY

First of we have the standard GROUP BY clause. This trivial example counts the total amount of medals for each nation.

SELECT
    n.Name AS Nation,
    COUNT(m.Value) AS Medals
FROM
    Nations n
    LEFT JOIN Medals m ON m.NationId = n.Id
GROUP BY
    n.Name
ORDER BY
    Medals DESC,
    Nation
Nation Medals
Denmark 7
Norway 5
Sweden 4
Finland 2
Iceland 0

XML PATH

FOR XML is intended to be used to transform result sets into XML. But it can also be abused to concatenate text. For example a comma separated list of medals. Here the STUFF function takes care of the leading comma and the SUBSTRING call is a simple trick to sort the medal values in the natural order (Gold, Silver, Bronze).

SELECT
    n.Name AS Nation,
    COALESCE(STUFF((
        SELECT
            ', ' + CAST(COUNT(m.Value) AS VARCHAR(5))
                + ' ' + RTRIM(m.Value)
        FROM
            Medals m
        WHERE
            m.NationId = n.Id
        GROUP BY
            m.Value
        ORDER BY
            SUBSTRING(m.Value, 3, 2)
        FOR XML PATH('')
    ), 1, 2, ''), '') AS Medals
FROM
    Nations n
ORDER BY
    Nation
Nation Medals
Denmark 2 Gold, 2 Silver, 3 Bronze
Finland 2 Gold
Iceland
Norway 2 Gold, 3 Bronze
Sweden 4 Silver

PIVOT

Sometimes we may want to rotate rows to columns or columns to rows. This is kind of what the PIVOT and UNPIVOT operators in a FROM clause do. Here the result is a pretty useful medal table.

SELECT
    p.Nation,
    COALESCE(p.Gold, 0) AS Gold,
    COALESCE(p.Silver, 0) AS Silver,
    COALESCE(p.Bronze, 0) AS Bronze
FROM
    (SELECT
        n.Name AS Nation,
        m.Value
    FROM
        Nations n
        LEFT JOIN Medals m ON m.NationId = n.Id
    ) o
    PIVOT(
        COUNT(o.Value)
    FOR
        o.Value IN (Gold, Silver, Bronze)
    ) p
ORDER BY
    Gold DESC,
    Silver DESC,
    Bronze DESC,
    Nation
Nation Gold Silver Bronze
Denmark 2 2 3
Norway 2 0 3
Finland 2 0 0
Sweden 0 4 0
Iceland 0 0 0

GROUPING SETS

The GROUPING SETS, CUBE and ROLLUP operators are extensions to the GROUP BY clause that allows grouping by more than one set. This example uses it to include a total of all awarded medals per nation together with the totals for each value. The GROUPING function provides a way to distinguish between a NULLs that are the result of missing data and NULLs that are the result of grouping.

SELECT
    n.Name AS Nation,
    COUNT(m.Value) AS Quantity,
    COALESCE(m.Value, 'Total') AS Value
FROM
    Nations n
    LEFT JOIN Medals m ON m.NationId = n.Id
GROUP BY GROUPING SETS
    (
        (n.Name, m.Value),
        (n.Name)
    )
HAVING
    m.Value IS NOT NULL
    OR GROUPING(m.Value) = 1
ORDER BY
    Nation,
    COALESCE(SUBSTRING(m.Value, 3, 2), 'Z')
Nation Quantity Value
Denmark 2 Gold
Denmark 2 Silver
Denmark 3 Bronze
Denmark 7 Total
Finland 2 Gold
Finland 2 Total
Iceland 0 Total
Norway 2 Gold
Norway 3 Bronze
Norway 5 Total
Sweden 4 Silver
Sweden 4 Total

APPLY

CROSS APPLY and OUTER APPLY are similar to INNER JOIN and OUTER JOIN but with the difference that the left side is always evaluated first. This allows for some queries that are impossible otherwise or can give performance benefits over alternative queries. Here it is used to get the top 3 medals for each nation, something that may be complicated to do in other ways.

SELECT
    n.Name AS Nation,
    t.Value
FROM
    Nations n
    CROSS APPLY
    (SELECT
        TOP 3
        m.Value
    FROM
        Medals m
    WHERE
        m.NationId = n.Id
    ORDER BY
        SUBSTRING(m.Value, 3, 2)
    ) t
ORDER BY
    Nation
Nation Value
Denmark Gold
Denmark Gold
Denmark Silver
Finland Gold
Finland Gold
Norway Gold
Norway Gold
Norway Bronze
Sweden Silver
Sweden Silver
Sweden Silver

OVER

The OVER clause is used to apply a function over its own range. This would normally be a partitioned range for aggregative functions or an ordered range for ranking functions. In this case we use it to return the total amount of medals awarded in the competition and then calculate how many percent of the total each nation has earned.

SELECT
    n.Name AS Nation,
    CAST(ROUND(
        100.0
        * COUNT(m.Value)
        / SUM(COUNT(m.Value)) OVER ()
    , 0) AS INT) AS Percentage
FROM
    Nations n
    LEFT JOIN Medals m ON m.NationId = n.Id
GROUP BY
    n.Name
ORDER BY
    Percentage DESC,
    Nation
Nation Percentage
Denmark 39
Norway 28
Sweden 22
Finland 11
Iceland 0