Getting Data to Group Better

A lot of the time you’re going to be asked to group items. Sometimes, you have to look at the exact same data in multiple ways. It can be just flat ugly to look at but it does what you need to. Well, today we can talk about a way to make it a bit prettier!

Let’s look at our dbo.Person1 table that we worked with earlier. Today, I want to find a count of all of the persons in each of the following categories: ZipCode, Gender, and Email Domains. And just for fun, let’s add in there where each of those categories cross – for example, Zipcode and Gender, ZipCode and Email Domain, etc…

Most people would think all kinds of awful thoughts at this point about all of the GROUP BY statements you’ll have to write. For anyone wondering – this is one way to do it. Notice all kinds of UNION statements and I’m sure someone is wondering if that’s truly all of the combinations. And we don’t want to go into the maintenance on this if things do happen to change…

SELECT NULL AS EmailDomain, NULL AS Gender, ZipCode, COUNT(1)
FROM dbo.Person1
GROUP BY ZipCode
UNION ALL
SELECT NULL, Gender, NULL, COUNT(1)
FROM dbo.Person1
GROUP BY Gender
UNION ALL
SELECT RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), NULL, NULL, COUNT(1)
FROM dbo.Person1
GROUP BY RIGHT(Email, LEN(Email) - CHARINDEX('@', Email))
UNION ALL
SELECT NULL, Gender, ZipCode, COUNT(1)
FROM dbo.Person1
GROUP BY Gender, ZipCode
UNION ALL
SELECT RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), Gender, NULL, COUNT(1)
FROM dbo.Person1
GROUP BY RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), Gender
UNION ALL 
SELECT RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), NULL, ZipCode, COUNT(1)
FROM dbo.Person1
GROUP BY RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), ZipCode
UNION ALL 
SELECT RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), Gender, ZipCode, COUNT(1)
FROM dbo.Person1
GROUP BY RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), Gender, ZipCode

If there is a fourth category added to the mix that someone wants to see, you might be tempted to punch that person! (Though I don’t condone the violence, I completely agree with the sentiment.)

With SQL Server 2016, you have some new goodness with GROUPING SETS functionality. You now just need to tell SQL Server the different ways to categorize your data.

SELECT RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)) AS EmailDomain, 
	Gender, 
	ZipCode, 
	COUNT(1)
FROM dbo.Person1
GROUP BY 
	GROUPING SETS (
		(RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), Gender, ZipCode),
		(RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), Gender),
		(RIGHT(Email, LEN(Email) - CHARINDEX('@', Email)), ZipCode),
		(RIGHT(Email, LEN(Email) - CHARINDEX('@', Email))),
		(Gender, ZipCode),
		(Gender),
		(ZipCode))

You’ll notice that that code is a heck of a lot cleaner. We just have to put in the GROUP BY clause the wonderful GROUPING SETS keyphrase and then tell SQL Server all of the ways to categorize that we want our data categorized.

If someone wants to add another category to our groups, they’ll just need to add the categories separated by commas. The code is fairly clean and fairly readable by anyone.

Even better item, though, is that it is much better performance-wise than the original query we came up with!

Next time that you want to categorize multiple columns in one query, be sure to remember GROUPING SETS just to make your life easier.

With that, my friends, I must bid you adieu. But next time, just know that we’ll add even more functionality to GROUPING SETS to make your life just a little bit easier and find out lots of goodness when it comes to data.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.