Questioning the Top Results in Snowflake

One of the things you may have noticed after reading our last post on Top (found here) is that sometimes SAMPLE doesn’t give the answer you want.

For example, we can run the same query to get 20% of the table. Remember that this table has 290 rows in total.

As you can see on the right, there are 69 rows in 20% of the table.

I don’t know about y’all, but 20% of 290 is 58 with my fancy calculator.

Just because I like to give Snowflake a chance, I run my query again. And this time – a completely different answer. Yes, that’s 50 rows this time.

So, what’s going on here? Is Snowflake confused?

This time, we’re actually asking the wrong question of Snowflake. When we ask Snowflake for a sample of 20% of the rows, we’re actually saying “For each row in this table, there is a 20% chance that the row will be returned.”

Very different question. Very different result. It does very much explain why we get different answers each time.

This, my friends, is why we should be checking our answers and asking the correct questions of our databases. Yes, I promise – next time I’ll show you a real way to get a percentage of a table back in response to our queries.

As a side note – if you ask SAMPLE for a certain number of rows – it does give back that number of rows each time.

I want to finish out today’s post with what it felt like when you’re calling the query above and getting different answers. Hopefully, everyone knows the true definition of insanity… According to Albert Einstein – “Insanity is doing the same thing over and over and expecting different results.”

2 comments

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.