Flattening out data in SnowFlake

Feb 14, 2023

—

Sometimes you have to work with different kinds of data in Snowflake – just like in most databases. Today I want to work with some JSON – the same basic premise works for XML, Arrays, and other variants.

SELECT *
FROM TABLE(FLATTEN(input => parse_json('[3089, NULL, 217]')));

So, first, let’s go thru the parse_json portion. Notice that the JSON contained within is truly a simple JSON example. 3 fields within – 2 having data.

With the parse_json function, the largest that the JSON can be is 8 MB compressed in size.

If we are to run just the following – SELECT parse_json(‘[3089, NULL, 217]’);

Notice that Snowflake converts our string into actual JSON. Not just some run-of-the-mill string with lots of pretty brackets and whatnot…

The function FLATTEN takes our JSON object and gives us the ability to put the data into an actual table. If we run the original query above, then we’ll get the data shown below:

Notice the THIS column has the values from our original JSON. The Value column has the values that we could insert into a table. INDEX is the position within the JSON. PATH is the position as well – notice that it’s within an array. Within a larger set of data, you could see [1] [1] [2] quite easily. KEY is, well, the primary key for the data that we’re pushing to a table. SEQ is the overall sequence of the data itself. Put another way,

SEQ: The input record has a unique sequence number; the sequence is not guaranteed to be gap-free or organized in any specific way.
KEY: This column contains the key to the exploded value for Maps or Objects.
PATH: The path to the element that needs to be flattened within a Data Structure.
INDEX: If the element is an array, INDEX is the index; otherwise, NULL.
VALUE: The value of the flattened array/object.
THIS: It is the element that has been Flattened (useful in Recursive Flattening).

Hopefully, you’ll play around with this data. You could find the differences such as below:

Notice that the NULL is now a string from within the JSON. Snowflake simply shows us all of this by looking at the VALUE column and seeing the quotes around the data. Notice that we now have multiple data types in our JSON.

This time, instead of NULL, we changed it to undefined – something that is used within arrays in Snowflake. It means that, simply, it is undefined – not that it is NULL. Notice that instead of the three rows that we’ve been showing heretofore, we now only have two. The row missing – index = 1 – is now gone.

With that, my friends, we’ve run through a lot this time. Next time, I hope to wrap a few things up for you to be able to use the data and functionality that I’ve put forth in Snowflake. So until then…

Snowflake

Flattening out data in SnowFlake

Comments

Leave a Reply Cancel reply