Distinctity. Yes, it’s a thing.

There are a lot of things that can be said about the keyword DISTINCT within Snowflake. One of them you’re probably used to, and one of them you probably aren’t.

SELECT DISTINCT DEPARTMENT_ID FROM public.employees;

In this case, DISTINCT is simply ensuring we have unique data. If you remove the DISTINCT keyword, the version I’m looking at will display 290 rows. Many of them are the same, but some are different. With the DISTINCT keyword, I get a total of 16 rows, non-sorted.

So, yes, think – uniqueness.

Now, the more fun – “new-ish” – version of the DISTINCT keyword.

Let’s take two values – A and B. Let’s define A = 7 and B = 2.

Snowflake will allow you to ask if A IS DISTINCT FROM B. Thankfully, in this case, it is.

Now, we all kinda get this one. But what if we define A = 7 and B = NULL? If we’ve been dealing with data for a while, we know that NULL is one of those magic words that make things sometimes not work as we would expect it to.

The only thing that will not show up data-wise is if you actually put the same value on both sides of this equation. For example, 2 NULLs or 2 character strings. Everything else will show up here!

Now, we do have where we only want those wild cases where things are equal – to make that happen we do the following:

Now, this is actually kinda cool because to write this out the long way would take this long string:

((A <> B OR A IS NULL OR B IS NULL) AND NOT (A IS NULL AND B IS NULL))

I don’t know about you, but I much prefer the shorter version. Way more readable, at least to me.

Now, for all of you SQL Server fans, don’t feel like you’re the red-headed stepchild. With SQL Server 2022, you too can have this functionality. Obviously, the same holds true for Azure, Managed Instance, and Fabric.

With that my friends, I hope you learned a little bit about DISTINCT and some of the things you can do with it in both SQL Server and Snowflake.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.