May 2019 T-SQL Tuesday #114: Puzzles

This month’s T-SQL Tuesday is hosted by Matthew McGiffen. His challenged is to come up with a puzzle.  My entry involves a simple request that is more difficult to solve than it seems on the surface.

TSQL2SDAY-300x300

I was recently asked by one of my siblings during a family party how to find the row with the maximum count from a dataset. Yes, I do have that geeky of a family. He was working with MySQL, but I figured the solution would be similar to T-SQL. Since neither type of database was handy during dinner, I just explained that you have to separate the logic with something like a subquery.

He had his own data, but I’ll explain with a similar scenario in AdventureWorks. The requirement is to find the customer and order count for the customer or customers who have placed the most orders. You could write a query like this which returns the rows first along with all the other rows.

 

SELECT CustomerID, COUNT(*) CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY COUNT(*) DESC;

One row is returned for each customer, not exactly what is needed.

puzzle1

You could use TOP to return just one row, but this is still not accurate since two customers have the maximum count, and only one of them will show up in these results.

SELECT TOP(1) CustomerID, COUNT(*) CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY COUNT(*) DESC;

puzzle2

If nesting aggregate functions was possible, you could filter in the HAVING clause. This query won’t work!

SELECT TOP(1) CustomerID, COUNT(*) CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
HAVING COUNT(*) = MAX(COUNT(*));

puzzle3

What about using a CTE (common table expression) or derived table to find the customers and counts first? You might think a query like this would work:

WITH Counts AS (
SELECT CustomerID, COUNT(*) CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT CustomerID, CountOfOrders, MAX(CountOfOrders) AS MaxCount
FROM Counts
GROUP BY CustomerID, CountOfOrders
HAVING CountOfOrders = MAX(CountOfOrders);

puzzle4

Unfortunately, since it’s grouped by CustomerID and CountOfOrders, the Max(CountOfOrders) is returned for each group. This is not the solution.

What about my favorite type of function, windowing functions? By adding an OVER clause to MAX(CountOfOrders), the overall max count can be returned.

WITH Counts AS (
SELECT CustomerID, COUNT(*) CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT CustomerID, CountOfOrders, MAX(CountOfOrders) OVER() AS MaxCount
FROM Counts
GROUP BY CustomerID, CountOfOrders
HAVING CountOfOrders = MAX(CountOfOrders);

puzzle5

The problem is that you cannot directly filter a windowing function. This returns an error:

WITH Counts AS (
SELECT CustomerID, COUNT(*) CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT CustomerID, CountOfOrders, MAX(CountOfOrders) OVER() AS MaxCount
FROM Counts
GROUP BY CustomerID, CountOfOrders
HAVING CountOfOrders = MAX(CountOfOrders) OVER();

puzzle6

The beauty of window aggregates is that you can have multiple levels of aggregation in the same query. What about moving the MAX(Count) to the CTE? You may think this looks strange since you can’t nest aggregate functions, but in this case, the window function is operating on an aggregate function!

WITH Counts AS (
SELECT CustomerID, COUNT(*) CountOfOrders,
MAX(COUNT(*)) OVER() AS MaxCount
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT CustomerID, CountOfOrders, MaxCount
FROM Counts;

puzzle7

The only thing left to do is to add a filter, and this time it will work since the aggregations are completed in the CTE.

WITH Counts AS (
SELECT CustomerID, COUNT(*) CountOfOrders,
MAX(COUNT(*)) OVER() AS MaxCount
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT CustomerID, CountOfOrders, MaxCount
FROM Counts
WHERE CountOfOrders = MaxCount;

puzzle8

There are many other ways of solving this. You could use a view or a derived table (subquery) for example, but I think this is the best method. Another thing to note is that the solution will require three levels if you don’t use the windowing function. Here is an example:

WITH Counts AS (
SELECT CustomerID, COUNT(*) CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
),
MaxCount AS (
SELECT MAX(CountOfOrders) AS MaxCount
FROM Counts
)
SELECT CustomerID, CountOfOrders, MaxCount
FROM Counts
CROSS APPLY MaxCount
WHERE CountOfOrders = MaxCount;

puzzle9

The first level returns the list of customers and counts. The second finds the overall maximum count. The outer query uses CROSS APPLY to join the two CTEs. Now that the aggregation is completed before the outer query, the filter works.

I thought this was a nice puzzle for T-SQL Tuesday because the initial requirements sound so simple. The solution is not difficult but also not intuitive.

 

About Kathi Kellenberger

I am the editor of the online journal Simple Talk. I love talking about SQL Server to anyone who will listen, just ask my seven year old granddaughter. I love to write and teach. I am so humbled and honored to say I am a Data Platform MVP, a volunteer with LaunchCode and co-leader of PASS Women in Technology Virtual Group.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s