Different ways to get to the same thing

When you’re trying to complete a task for the business, the most important thing is to get the data right. Second most important task is to get the data quickly.

One of the ways we can get better at speed is to attempt several slightly different ways that can get you (hopefully) the same data. Some tables work better with one query while some work better with another query.

Let’s work through a scenario in Snowflake and we’ll see which one is faster under “normal” conditions.

The business asks that we find all of the employees in our NewEmployees table who were hired in 2005 and still work with us.

One way that we can get this data is by the following query:

SELECT *
FROM Public.NewEmployee
WHERE YEAR(HireDate) = 2005
AND ExitDate IS NULL;

My results show 25 employees (and hopefully yours does too).

More importantly – in our scenario today – this takes 155 ms.

Next, let’s try for a small change to see how Snowflake handles it.

SELECT *
FROM Public.NewEmployee
WHERE YEAR(HireDate) = '2005'
AND ExitDate IS NULL;

Notice that really the only change here is that I made the year a string instead of an integer.

The good thing is that it gave the same number of results back. The bad news is that it took over 5 times as long to get the answer! This means a mental note to ourselves that we shouldn’t probably try this trick again…

Next, we can try messing with dates to see if there are better ways to get the data. Here, I’m just asking to see if it’s between Jan 1st and Dec 31 of the year in question.

SELECT *
FROM Public.NewEmployee
WHERE HireDate BETWEEN '2005-01-01' AND '2005-12-31'
AND ExitDate IS NULL;

A little faster than our initial query just looking at dates! 8 milliseconds doesn’t sound like a lot but if you’re running the same query thousands or millions of times per day, this can add up.

This time, I want to try converting the Jan 1 and Dec 31 strings to a date – so that Snowflake doesn’t have to do a conversion behind the scenes.

SELECT *
FROM Public.NewEmployee
WHERE HireDate BETWEEN '2005-01-01'::date AND '2005-12-31'::date
AND ExitDate IS NULL;

And Snowflake returns it in:

Just a little bit faster!

Maybe, we can try something besides the BETWEEN clause….

SELECT *
FROM Public.NewEmployee
WHERE (HireDate >= '2005-01-01'::date AND 
    HireDate <= '2005-12-31'::date)
AND ExitDate IS NULL;

Argh! Looks like breaking apart the BETWEEN into 2 different clauses was not the way to go – at least this time.

Now, what did I really prove with all of these different queries and watching the clock?

  1. There are many ways to get to the same answer.
  2. Some ways just work better than others.
  3. When you need to run a query hundreds of times a day, even milliseconds can make a difference.

And so with that, we have come to the end of another fun post. Hopefully, you can see that without doing anything very complicated I can test some queries that make significant differences in speed, especially when you’re running the same query hundreds of times – or even hundreds of queries once.

As a previous manager of mine would state – “Computer Science is a science, not an art form.” (We definitely had multiple discussions about how he wasn’t quite right on this one.) Hopefully, you can see that there is some creativity, aka art, in being able to work with the queries – knowing there is more than one way to make things happen – but also there is lots of data, aka science, to show how right or wrong you are.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.