Basic Analysis of Data with Snowflake

Now and again, you’ll be asked to do some basic analysis of your data in Snowflake. Now by “basic analysis,” I mean for each column tell how many rows are populated, what’s the largest and smallest value in the column, etc…

Some of you will want to review each and every column of every table in your database by hand.

I don’t know about y’all but no matter how much I like working with data, I despise doing manual repetitive tasks. Thus, the reason for today’s script – automating the heck out of work.

declare

    sql string;

    final_sql string;

    ccur cursor for (select TABLE_NAME, COLUMN_NAME, DATA_TYPE from INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'PUBLIC');

    res resultset;

   

begin

  final_sql := '';


  for record in ccur do

    if(record.data_type = 'TEXT') THEN

        sql := 'SELECT ''' || record.table_name || ''' as TABLENAME, ''' || record.column_name || ''' as COLUMN_NAME, ''' || record.data_type || ''' as DATA_TYPE, count(' || record.column_name || ') as count, MAX(LEN(' || record.column_name || ')) as max_length, null as min_value, null as max_value FROM ' || record.TABLE_NAME || ' ';

    else

        sql := 'SELECT ''' || record.TABLE_NAME || ''' as TABLENAME, ''' || record.column_name|| ''' as COLUMN_NAME, ''' || record.data_type || ''' as DATA_TYPE, count(' || record.column_name || ') as count, null as max_length, MIN(TO_CHAR(' || record.column_name || ')) as min_value, MAX(TO_CHAR(' || record.column_name || ')) as max_value FROM ' || record.TABLE_NAME || ' ';

    end if;

    if (final_sql<>'') then

        final_sql := final_sql || ' UNION ALL ';

    end if;

    final_sql := final_sql || sql;

     

  end for;

  res := (execute immediate :final_sql);
  return table(res);

end;

Now, I’m sure you have questions as you review the script. Let’s hit the highlights here:

  1. Why are we using “:=” instead of “=”? Is this some odd language we didn’t declare in Snowflake?
    • Nope. This is called Snowflake Scripting. As you can see, it’s close to SQL syntax, but not quite. We’re using the “:=” to assign a value to a variable we’ve already declared.
  2. Cursors? In SQL Server, they’re “nasty” and should be avoided like the plague.
    • Yes, you should feel dirty after writing a cursor. Is it perfect? Absolutely not. But a limited subset of data will flow into the cursor, so it’s not completely awful. Will you need to take a bath afterward? Probably.
  1. Why do you break out the TEXT data type? And what is the TEXT data type? I didn’t declare them in my tables!
    • TEXT is just the varchar or nvarchar data types. Snowflake made it easy on themselves by just saying TEXT.
    • Most users will not want to see the MAX value of a text field. I know I don’t need to see it normally. If this is not true, or only true sometimes – I will let you change the script above to fit your needs.
  2. A resultset? What is that and why do I need to table-fy it at the end of my procedure before I’m done?
    • A Resultset is a data type that points to the result set of a query.
      • Yes, it’s a pointer. You thought you got away from those when you stopped coding C, C#, PASCAL, or C++, right?
    • The resultset allows us to do dynamic SQL in our case.
    • The table() function makes a resultset be a table.

And that, my friends, is a fairly elegant way of creating Dynamic SQL within Snowflake to do what is needed in this case. Have fun messing around with the cursor to get exactly what you need into the resultset. Then have some fun with the resultset analyzing data to your heart’s content – without having to result to Python, R, or (shivers) manually going through each column. Until next time!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.