SQL Server: aggregative queries
Quite often we need to aggregate results from many rows. Here are a couple of techniques for doing this in Transact-SQL, most of these queries will probably only work in SQL Server.
Id | Name |
1 | Denmark |
2 | Finland |
3 | Iceland |
4 | Norway |
5 | Sweden |
In order to have some data to play with I have made up some competition results between the five Nordic countries. The data is held in two tables; Nations (above) and Medals (below). For simplicity’s sake the medal values are stored as CHAR(6) with a CHECK constraint instead of adding another table.
Id | NationId | Value |
1 | 1 | Gold |
2 | 1 | Gold |
3 | 1 | Silver |
4 | 1 | Silver |
5 | 1 | Bronze |
6 | 1 | Bronze |
7 | 1 | Bronze |
8 | 2 | Gold |
9 | 2 | Gold |
10 | 4 | Gold |
11 | 4 | Gold |
12 | 4 | Bronze |
13 | 4 | Bronze |
14 | 4 | Bronze |
15 | 5 | Silver |
16 | 5 | Silver |
17 | 5 | Silver |
18 | 5 | Silver |
GROUP BY
First of we have the standard GROUP BY clause. This trivial example counts the total amount of medals for each nation.
SELECT
n.Name AS Nation,
COUNT(m.Value) AS Medals
FROM
Nations n
LEFT JOIN Medals m ON m.NationId = n.Id
GROUP BY
n.Name
ORDER BY
Medals DESC,
Nation
Nation | Medals |
Denmark | 7 |
Norway | 5 |
Sweden | 4 |
Finland | 2 |
Iceland | 0 |
XML PATH
FOR XML is intended to be used to transform result sets into XML. But it can also be abused to concatenate text. For example a comma separated list of medals. Here the STUFF function takes care of the leading comma and the SUBSTRING call is a simple trick to sort the medal values in the natural order (Gold, Silver, Bronze).
SELECT
n.Name AS Nation,
COALESCE(STUFF((
SELECT
', ' + CAST(COUNT(m.Value) AS VARCHAR(5))
+ ' ' + RTRIM(m.Value)
FROM
Medals m
WHERE
m.NationId = n.Id
GROUP BY
m.Value
ORDER BY
SUBSTRING(m.Value, 3, 2)
FOR XML PATH('')
), 1, 2, ''), '') AS Medals
FROM
Nations n
ORDER BY
Nation
Nation | Medals |
Denmark | 2 Gold, 2 Silver, 3 Bronze |
Finland | 2 Gold |
Iceland | |
Norway | 2 Gold, 3 Bronze |
Sweden | 4 Silver |
PIVOT
Sometimes we may want to rotate rows to columns or columns to rows. This is kind of what the PIVOT and UNPIVOT operators in a FROM clause do. Here the result is a pretty useful medal table.
SELECT
p.Nation,
COALESCE(p.Gold, 0) AS Gold,
COALESCE(p.Silver, 0) AS Silver,
COALESCE(p.Bronze, 0) AS Bronze
FROM
(SELECT
n.Name AS Nation,
m.Value
FROM
Nations n
LEFT JOIN Medals m ON m.NationId = n.Id
) o
PIVOT(
COUNT(o.Value)
FOR
o.Value IN (Gold, Silver, Bronze)
) p
ORDER BY
Gold DESC,
Silver DESC,
Bronze DESC,
Nation
Nation | Gold | Silver | Bronze |
Denmark | 2 | 2 | 3 |
Norway | 2 | 0 | 3 |
Finland | 2 | 0 | 0 |
Sweden | 0 | 4 | 0 |
Iceland | 0 | 0 | 0 |
GROUPING SETS
The GROUPING SETS, CUBE and ROLLUP operators are extensions to the GROUP BY clause that allows grouping by more than one set. This example uses it to include a total of all awarded medals per nation together with the totals for each value. The GROUPING function provides a way to distinguish between a NULLs that are the result of missing data and NULLs that are the result of grouping.
SELECT
n.Name AS Nation,
COUNT(m.Value) AS Quantity,
COALESCE(m.Value, 'Total') AS Value
FROM
Nations n
LEFT JOIN Medals m ON m.NationId = n.Id
GROUP BY GROUPING SETS
(
(n.Name, m.Value),
(n.Name)
)
HAVING
m.Value IS NOT NULL
OR GROUPING(m.Value) = 1
ORDER BY
Nation,
COALESCE(SUBSTRING(m.Value, 3, 2), 'Z')
Nation | Quantity | Value |
Denmark | 2 | Gold |
Denmark | 2 | Silver |
Denmark | 3 | Bronze |
Denmark | 7 | Total |
Finland | 2 | Gold |
Finland | 2 | Total |
Iceland | 0 | Total |
Norway | 2 | Gold |
Norway | 3 | Bronze |
Norway | 5 | Total |
Sweden | 4 | Silver |
Sweden | 4 | Total |
APPLY
CROSS APPLY and OUTER APPLY are similar to INNER JOIN and OUTER JOIN but with the difference that the left side is always evaluated first. This allows for some queries that are impossible otherwise or can give performance benefits over alternative queries. Here it is used to get the top 3 medals for each nation, something that may be complicated to do in other ways.
SELECT
n.Name AS Nation,
t.Value
FROM
Nations n
CROSS APPLY
(SELECT
TOP 3
m.Value
FROM
Medals m
WHERE
m.NationId = n.Id
ORDER BY
SUBSTRING(m.Value, 3, 2)
) t
ORDER BY
Nation
Nation | Value |
Denmark | Gold |
Denmark | Gold |
Denmark | Silver |
Finland | Gold |
Finland | Gold |
Norway | Gold |
Norway | Gold |
Norway | Bronze |
Sweden | Silver |
Sweden | Silver |
Sweden | Silver |
OVER
The OVER clause is used to apply a function over its own range. This would normally be a partitioned range for aggregative functions or an ordered range for ranking functions. In this case we use it to return the total amount of medals awarded in the competition and then calculate how many percent of the total each nation has earned.
SELECT
n.Name AS Nation,
CAST(ROUND(
100.0
* COUNT(m.Value)
/ SUM(COUNT(m.Value)) OVER ()
, 0) AS INT) AS Percentage
FROM
Nations n
LEFT JOIN Medals m ON m.NationId = n.Id
GROUP BY
n.Name
ORDER BY
Percentage DESC,
Nation
Nation | Percentage |
Denmark | 39 |
Norway | 28 |
Sweden | 22 |
Finland | 11 |
Iceland | 0 |