Now that we know so much about getting data from Snowflake using the query history, we should be able to get all kinds of data that will be helpful for us. In the next few queries, I want to provide some queries I use that should make your Snowflake usage a bit more useful.
The first one I want to tackle today is seeing what the longest-running queries are:
SELECT query_id
,total_elapsed_time/1000 as ElapsedTimeSeconds
,user_name
,query_text
,start_time
FROM snowflake.account_usage.query_history
WHERE start_time > current_date - 30
ORDER BY total_elapsed_time desc
limit 100;
In my SQL Server environment, I like to keep an eye on the queries that run each week, which take a good amount of time. If, for some reason, they’re not on the list one week – or they take a good amount of time longer than is normal -, then it’s time to check into what is going on – even if there are no true errors or warnings happening.
Another of my favorite queries is this one:
select *
from snowflake.account_usage.query_history
where warehouse_size is null
and start_time > current_date - 90
This shows those queries that haven’t needed a warehouse in the last 90 days. You might be wondering – “Sherpa, doesn’t everything need a warehouse in Snowflake?”
It’s actually not required for all tasks. If you were to run the following query:
SELECT COUNT(1) FROM MyTable;
Snowflake can actually get all of the data it needs to do this without querying the database at all – just the metadata that Snowflake can get it’s little grubby hands on with no problems. Also, all of our queries that I have shown over the last few posts do not require a warehouse – since they deal with query history.
Now, probably my most useful query that managers love to see. This query shows the most expensive, i.e. processor intensive, queries – and how many times they’re called – in the last 7 days.
SELECT query_parameterized_hash,
COUNT(1) as NumberOfExecutions,
any_value(query_text) as ExampleText,
SUM(credits_used_cloud_services) as TotalCreditsUsed
FROM snowflake.account_usage.query_history
WHERE start_time > current_date - 7
GROUP BY query_parameterized_hash
ORDER BY TotalCreditsUsed desc
limit 100;
Not only can I now show which departments are more expensive than others, but I can show their worst performing queries and how often they’re run! I cannot tell you how extremely useful this data can be to make queries faster for a development team and a company.
In my next blog post, I have a few more queries to share with you that can help you find very useful information for your Snowflake instance and for your teams. Hopefully, you’ll be able to use the queries I’ve shown today to make your life just a little bit easier. Until next time my friends!
2 comments