The Power of Memoizable Functions in Snowflake

If you’ve been working with data for several years like I have – mostly using the SQL language – then I have a term for you that other languages, like JavaScript or Python, have had for a few years. The term is “memoizable” and it means, in a nutshell, to remember. A memoizable function caches the results so that it can return the resultset in record time, given the same parameters.

Sadly, SQL Server does not have this ability at least not yet. As we all know with Microsoft, it could come with any upcoming version of SQL Server or not at all. Snowflake, on the other hand, allows for memoizable functions – at least since early 2023.

First, let’s look at a “regular” function and then we’ll compare it to a memoizable function and review the results.

CREATE OR REPLACE FUNCTION LargeOrderCustomers()
returns array
as $$
select array_agg(o_custkey)
from (
      select o_custkey, sum(o_totalprice) total
      from snowflake_sample_data.tpch_sf100.orders
      group by o_custkey
      having total > 7500000
)
$$;

CREATE OR REPLACE FUNCTION LargeOrderCustomersMem()
returns array
memoizable
as $$
select array_agg(o_custkey)
from (
      select o_custkey, sum(o_totalprice) total
      from snowflake_sample_data.tpch_sf100.orders
      group by o_custkey
      having total > 5000000
)
$$;

Notice that the only real difference is the term memoizable in the definition of the 2nd function. The way that I have the functions defined they will return slightly different resultsets so be aware of that when you compare the two functions.

The first function, LargeOrderCustomers, returns an array of integers identifying all customers who have spent over $7.5 Million in approximately 4 seconds on a small warehouse. The second function, LargeOrderCustomersMem, returns an array of integers for those customers who have spent over $5 Million in approximately .3 seconds on a small warehouse (after the first run)!

I don’t know about y’all but for functions that must be run often, this could be a huge game changer! If you were to run this function 1,000 times per day, you would be saving a little over an hour of processing time. And that, my friend, translates to some cold, hard cash.

Now, there are some limitations to the “memoizable” keyword.

  1. It only works with SQL UDFs.
  2. It only works on UDFs without parameters.
  3. It only works on UDFs that return scalars or arrays.
  4. Each memoizable function has a 10 KB limit for results.

With those limitations, you can see that there is a small subset of functions that work great with these limitations, but it can be truly a game changer for those functions.

Next time, I’ll go over where I use a memoizable function in Production to make my life just a little bit easier (and the users never even know about it!) So until next time my friends!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.