Making Python work with Snowflake For Fun and Pleasure

As we did last week in our post on Creating a new function in Snowflake, I want to talk about some of the scripts and languages I use to make my life easier in Snowflake.

As the site LearnPython says: “Today, Python is one of the most popular programming languages because of its beginner-friendly syntax, efficiency, and applicability to a wide range of use cases. Python syntax is similar to English, which makes it relatively easy to read and understand even if you’ve never written a line of code before.”

I use a Python script as a function inside of my Snowflake instance to make my life easier and today I’d like to share it with you.

create or replace function text_cleansing(s string)
returns string
language python
runtime_version = '3.8'
handler = 'cleansing'
as
$$
import re
import string

def cleansing(x):
    # pass SQL NULL through unchanged
    if x is None:
        return None

    # lower case
    x = x.lower()

    # remove HTML tags/markup
    x = re.sub(r'<.*?>', '', x)

    # remove digits
    x = re.sub(r'\d+', '', x)

    # remove punctuation
    x = x.translate(str.maketrans('', '', string.punctuation))

    # collapse runs of spaces, tabs, and newlines into one space
    x = re.sub(r'\s+', ' ', x)

    # remove leading/trailing whitespace
    x = x.strip()

    return x
$$;
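
If you want to try the cleansing logic before creating the function in Snowflake, the same steps can be run as plain Python. This is only a minimal local sketch: the sample input and the variable name `sample` are made up for illustration.

```python
import re
import string


def cleansing(x):
    # Same steps as the Snowflake handler, run locally.
    x = x.lower()                      # lower case
    x = re.sub(r'<.*?>', '', x)        # remove HTML tags/markup
    x = re.sub(r'\d+', '', x)          # remove digits
    x = x.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    x = re.sub(r'\s+', ' ', x)         # collapse runs of whitespace
    return x.strip()                   # trim leading/trailing whitespace


# Hypothetical sample input, for illustration only.
sample = "  <p>Hello, World! It's 2023.</p>\t\n"
print(cleansing(sample))  # prints: hello world its
```

Note that removing punctuation also removes apostrophes, so "It's" comes out as "its".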

Unlike my usual practice, I’m not going to explain the main section of code, at least no more than to tell you to go learn Python. It is truly awesome and can be a game-changer.

But – and you knew there was going to be a but – there are a few things to remember.

First, and probably most important, please remember to declare that you’re working in Python. If you declare that it’s LISP, Haskell, or even C#, then I cannot be held responsible for what Snowflake does or does not do for you. Hint: it won’t create your function, because it doesn’t recognize any of those languages and will hand you an error instead. At some point in the future, it could, but for now, no. Just no.

Also note that, unlike JavaScript, Python requires you to declare the runtime_version. As any Python developer knows, the behavior of many functions can change from one version of the Python interpreter to the next. If you have any doubt as to when your runtime version will be deprecated, go to this page to find out when yours goes out of support; note that 3.8 goes out in 2024.
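
As a small illustration of why the version matters: `str.removeprefix` was added in Python 3.9, so code that calls it would fail on a 3.8 runtime. The sketch below checks the interpreter it runs on and falls back to slicing on older versions. The helper name `strip_prefix` and the sample value are hypothetical.

```python
import sys


def strip_prefix(text, prefix):
    # str.removeprefix exists only on Python 3.9+; use slicing on older versions.
    if sys.version_info >= (3, 9):
        return text.removeprefix(prefix)
    return text[len(prefix):] if text.startswith(prefix) else text


print(sys.version_info[:2])                  # the (major, minor) version in use
print(strip_prefix("raw_customer", "raw_"))  # prints: customer
```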

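And for anyone who is wondering about the “handler” phrase: in general, handlers are “functions that ‘handle’ certain events that they are registered for.” Here, the handler is simply the name of the Python function that Snowflake calls whenever text_cleansing is invoked, which is 'cleansing' in this case. It is a requirement for Python functions in Snowflake.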
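
One way to picture what the handler value does is to load a body that defines more than one function and then pick out the function by name. The sketch below is only an analogy in plain Python, not a description of how Snowflake works internally. The body text and the `helper` function are hypothetical.

```python
# Hypothetical function body, standing in for the text between the $$ markers.
body = '''
def helper(x):
    return x.strip()

def cleansing(x):
    return helper(x).lower()
'''

handler = 'cleansing'  # same value as in the create function statement

namespace = {}
exec(body, namespace)             # load every function defined in the body
entry_point = namespace[handler]  # pick the one named by the handler

print(entry_point("  Hello  "))   # prints: hello
```

The body can define as many helper functions as you like; the handler name says which one is the entry point.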

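Finally, again note the use of “$$”. The pair of dollar signs marks where the function body starts and ends, so the single quotes and backslashes in the Python code do not need to be escaped. It is an all-around good practice with functions in Snowflake.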
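
To see what “$$” saves you, here is a rough sketch of the escaping a single-quoted body would need. It assumes Snowflake’s usual rules for single-quoted string constants: backslash is an escape character, and a single quote inside the string is written as two. The helper `as_single_quoted` is hypothetical and exists only for illustration.

```python
def as_single_quoted(body):
    # Double the backslashes first, then double the single quotes, then wrap in quotes.
    escaped = body.replace('\\', '\\\\').replace("'", "''")
    return "'" + escaped + "'"


# One line taken from the function body.
line = r"x = re.sub('\d+', '', x)"
print(as_single_quoted(line))
```

Even this one line becomes noticeably harder to read once every quote and backslash is doubled, which is the trouble the dollar signs avoid.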

And with that, I want to thank you for reading this post. Hopefully, you have learned something and will come back next week to read more about functions. Now, let’s go out with one of my favorite artists and songs.
