Walking Through Advent of Code Day 5

Yes, that’s right my friends! We have reached day 5!

There was a lot going on with this one, so let’s go ahead and get started. First thing we want to do, as always, is to go ahead and load all of our data into SQL Server. With this one, I cheated a little and loaded it into multiple tables to start with – since there are 7 different groups of data that we have to play with.

Could I have loaded it into one table and pull from there into other tables? Absolutely! Do I really want to? Heck, no!

Now that I have all of the data that they gave me loaded into tables, I can start putting all of the data into tables that I can actually work with.

We’re told that the mapping tables are all in the format of:

[Destination Range Start] [Source Range Start] [Range Length]

And we’re told that any number that isn’t listed is just the same number.

With that information, let’s start creating some real data tables we can mess with.

For example, I have the following data as part of my input for the Seed To Soil.

3957953582 2540115966 24844899

First, we have to separate out the data we’ve been given into the format shown above. My first thought is to start using the LEFT, RIGHT, and SUBSTRING functions. But there’s an easier way!

SELECT 	TheList,
  PARSENAME(REPLACE(TheList, ' ', '.'), 3) AS Destination,
  PARSENAME(REPLACE(TheList, ' ', '.'), 2) AS Source,
  PARSENAME(REPLACE(TheList, ' ', '.'), 1) AS RangeLength
FROM SeedToSoil

Yes, the PARSENAME function is normally used for seperating out things like IP Addresses or database objects, but it can be used for any items that are seperated by a “.”. Now, it won’t work if you have 5 or more items in your list and the positions go from right to left, but as seen above, it’s a heck of a lot easier to do this then to attempt to figure out the correct string wizardry to grab the second piece of data…

With the above all split out and looking pretty, we can now start creating real tables!

The first thing I wanted to do was to do something like the following:

DECLARE cCur CURSOR READ_ONLY FOR 
	select	
		parsename(REPLACE(TheList, ' ', '.'), 3) AS Destination,
		parsename(REPLACE(TheList, ' ', '.'), 2) AS Source,
		parsename(REPLACE(TheList, ' ', '.'), 1) AS RangeLength
	FROM seedtosoil

DECLARE @Destination bigint,
	@Source bigint,
	@RangeLength bigint,

        @Counter bigint
OPEN cCur

FETCH NEXT FROM cCur INTO @Destination, @Source, @RangeLength
WHILE (@@fetch_status <> -1)
BEGIN
	IF (@@fetch_status <> -2)
	BEGIN

             SET @Counter = 0
	     WHILE @Counter < @RangeLength

             BEGIN
                  INSERT INTO #SeedToSoil
                  SELECT @Destination + @Counter, 
                         @Source + @Counter
             END 
	END
	FETCH NEXT FROM cCur INTO @Destination, @Source, @RangeLength
END

CLOSE cCur
DEALLOCATE cCur
GO

Now, this is a somewhat common practice since we’re adding in a fixed amount of numbers and SQL Server can loop through these fairly quickly and we’re all done!

Except, these numbers are rather large. Notice in the above example for my input, you would be looping through the inner while loop 24,844,899 times. Yes, that’s almost 25 Million. And there are several more rows of data like this. There has to be a better way…

And thankfully, there is. That way, my friends, is with what is known as a Tally table. Yes, the concepts for it started in SQL Server sometime back – noted here. You will need a fairly large set of numbers in the table – otherwise you won’t get to do the exercise as it has large numbers.

From there, you can lose the cursor and the while loops!

with cte as (select	
		TheList,
		parsename(REPLACE(TheList, ' ', '.'), 3) AS Destination,
		parsename(REPLACE(TheList, ' ', '.'), 2) AS Source,
		parsename(REPLACE(TheList, ' ', '.'), 1) AS RangeLength
	FROM seedtosoil)

INSERT INTO SeedToSoilGoodness
SELECT cte.Source, t.n
FROM Tally t
JOIN cte on t.n > cte.Destination 
     AND t.n < cte.Destination + cte.RangeLength

As long as your Tally table is properly indexed, then this will be many times faster than the cursor method shown above. Like thousands of times faster…

From here, we still need to add all of the values that don’t currently exist in the – in what I call here – SeedToSoilGoodness table.

INSERT INTO SeedToSoilGoodness
SELECT n, n  -- Since we want the same number on both sides
FROM Tally
WHERE NOT EXISTS (SELECT 1
	FROM SeedToSoilRange
	WHERE n = Destination
        OR n = Source)

Now, once you have all of this in place, remember you still have several tables to do basically the same thing as above to: SoilToFertilizer, FertilizerToWater, WaterToLight, LightToTemperature, TemperatureToHumidity, and HumidityToLocation.

Once you have all of that in place, you can finally create the query that will get the data you were requested in the first place!

SELECT TL.Value, MIN(HL.Destination)
FROM Seeds S
CROSS APPLY STRING_SPLIT(TheList, ' ') TL
JOIN SeedToSoilGoodness SS ON SS.Source = TL.Value
JOIN SoilToFertilizerGoodness SF ON SF.Source = SS.Destination
JOIN FertilizerToWaterGoodness FW ON FW.Source = SF.Destination
JOIN WaterToLightGoodness WL ON WL.Source = FW.Destination
JOIN LightToTemperatureGoodness LT ON LT.Source = WL.Destination
JOIN TemperatureToHumidityGoodness TH ON TH.Source = LT.Destination
JOIN HumidityToLocationGoodness HL ON HL.Source = LT.Destination 
GROUP BY TL.Value

Yes, it’s everyone’s friends CROSS APPLY and STRING_SPLIT because the seeds table is truly made up on one row in the table. Then we have to use all of our reference tables to find what our lowest location number.

Lots of work here. Hopefully, you have found a great “new” use for your tally tables. And how much time it can save because we are set-based instead of relying on cursors and loops in SQL Server. Until tomorrow when we’re back with you for Part 2 of Day 5 of #AdventOfCode!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.