Getting the Minimum or Maximum of One Column

By the title of this post, you’re thinking “Ho hum. Sherpa will tell us about the stupid MIN and MAX functions in Snowflake and how great they are.”

Well, they are great. Let’s not kid ourselves about that one.

SELECT department_id, MIN(employee_id) AS FirstEmp
FROM public.employees
GROUP BY department_id
ORDER BY department_id;

This simply tells us the first employee id in the employees table for each department.

SELECT DISTINCT department_id, 
    MIN(employee_id) OVER 
           (PARTITION BY DEPARTMENT_ID) AS FirstEmp
FROM public.employees
ORDER BY department_id;

Same thing – this time using MIN as a windowing function. Note the lack of GROUP BY in this query.

By this time, you’re probably saying “Yes, Sherpa! We’ve seen this in SQL Server a thousand times. What makes Snowflake do it any better?”

What if I told you that Snowflake could find the minimum value of one column and tell you the value of another?

For example, I want to find out the last name of the first employee we put into the database. You’d have to do this in SQL Server in multiple steps. In Snowflake, just one…

SELECT MIN_BY(last_name, employee_id) AS LastName
FROM public.employees;

Notice that the column you want the minimum (or maximum) of is second in the list. The column that you want to be returned is first in the list of parameters.

Now if I want to get really fancy, I can actually show you the last names of the first 5 employees put into the database.

SELECT 
    MIN_BY(last_name, employee_id, 5) AS LastName
FROM public.employees;

Notice that the 3rd optional parameter of the MIN_BY function lets us tell Snowflake how many items you’d like returned. And that the results – when there are more than 1 – are returned in an array. (HINT: Snowflake has array functions as well as functions for JSON and XML.)

Also of huge note, if more than one row had the same employee_id – besides shame on whoever architected that disaster – then the results will be non-deterministic. Running it one time could mean one answer. Running it again could mean a totally different answer…

If there are NULLs in the employee_id column – in this example – the MIN_BY function totally skips over them.

With that, I hope you have fun working with MIN_BY (and its sibling function MAX_BY) can be used to find the minimum (or maximum) value of a table and then give you a completely different column. Useful? Possibly. It does have some nuances, so be ready and check your data first!

Until next time my friends!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.