Even More Great Queries Using Query History

In our last post, we talked about some of my favorite queries I use in Snowflake to see various items of interest – such as finding the worst-performing queries. For today’s post, though, I want to talk about performance tuning.

Yes, you read that right. We’re going to use query history to do some fun performance tuning.

Finding Queries with Poor Pruning

SELECT
        query_text as Query,
        partitions_scanned as PScanned,
        partitions_total as PTotal,
        partitions_scanned/partitions_total as FractionScanned,
        bytes_scanned/power(1024,3) as TotalGBytesScanned -- convert to gigabytes
FROM snowflake.account_usage.query_history
WHERE
    start_time > current_date - 7
    AND partitions_total > 0 -- avoid dividing by zero
    AND bytes_scanned/power(1024,3) > 1
    AND partitions_scanned/partitions_total > 0.8
ORDER BY TotalGBytesScanned desc
LIMIT 100

This returns up to 100 queries from the past 7 days that scanned more than 1 GB of data and still had to read over 80% of their tables' micro-partitions. Pruning that poor usually means the table is not well clustered for the filters being used – if it's clustered at all – and your queries are working harder than they probably should.
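If you suspect one of the tables this query surfaces is poorly clustered, you can confirm it directly with Snowflake's SYSTEM$CLUSTERING_INFORMATION function. A minimal sketch – the table and column names here are placeholders, so swap in your own:

-- How well is this (hypothetical) table clustered on EVENT_DATE?
SELECT SYSTEM$CLUSTERING_INFORMATION('MY_DB.MY_SCHEMA.MY_TABLE', '(EVENT_DATE)');

-- If the reported average depth is high, consider a clustering key
-- on the columns your queries filter by most often:
ALTER TABLE MY_DB.MY_SCHEMA.MY_TABLE CLUSTER BY (EVENT_DATE);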

As you work with the data from the pruning query, be prepared to adjust the bytes_scanned threshold frequently. On a system that sees a lot of daily use, you may want to look only at queries scanning more than 40 GB for the first few weeks while you – or your team – work through the various items that make the system both slow and expensive.
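One way to avoid editing the query every time you change your mind is to park the threshold in a session variable. A quick sketch, starting at a hypothetical 40 GB:

SET min_gb_scanned = 40; -- raise or lower this as you clean things up

SELECT
        query_text as Query,
        bytes_scanned/power(1024,3) as TotalGBytesScanned
FROM snowflake.account_usage.query_history
WHERE
    start_time > current_date - 7
    AND bytes_scanned/power(1024,3) > $min_gb_scanned
ORDER BY TotalGBytesScanned desc
LIMIT 100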

Queries Spilling to Disk

SELECT
        query_text,
        start_time,
        partitions_scanned as PScanned,
        partitions_total as PTotal,
        bytes_scanned/power(1024,3) as TotalGBytesScanned, -- convert to gigabytes
        bytes_spilled_to_local_storage/power(1024,3) as SpilledToLocalStorageGB,
        bytes_spilled_to_remote_storage/power(1024,3) as SpilledToRemoteStorageGB,
        SpilledToLocalStorageGB + SpilledToRemoteStorageGB as TotalSpillageInGB -- Snowflake lets earlier aliases in the select list be reused here
FROM snowflake.account_usage.query_history
WHERE
    start_time > current_date - 7
    AND bytes_spilled_to_local_storage + bytes_spilled_to_remote_storage > 0 -- aliases aren't reliably available in WHERE, so use the raw columns
ORDER BY TotalSpillageInGB desc
LIMIT 100

When the warehouse you're running a query on runs out of memory, the query starts writing its intermediate results to the warehouse's local disk. When there's no space left on local disk, Snowflake starts pushing data out to remote storage. Spilling – remote spilling especially – makes a query both slower and more expensive than one that fits in memory.

Possible ways to make these queries faster are to process less data – by filtering it down further or by splitting the work into smaller chunks – or, if neither is possible, to run on a larger warehouse size, e.g. going from L to XL. The bigger warehouse costs more per second, but a query that stops spilling often finishes enough faster to come out cheaper overall.
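If you do go the bigger-warehouse route, the change itself is a one-liner. A sketch, assuming a hypothetical warehouse named MY_WH:

-- Each size up roughly doubles the compute (and the per-second credits),
-- so a query that spills on L may fit in memory on XL.
ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'XLARGE';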

And those, my friends, are the queries that help me work with Snowflake. Hopefully they'll be just as useful for you in tuning your own queries. Until next time, my friends!


We may spill to disk, but hopefully we never spill the wine!
