A Different Way to Compare Data

Last week, we discussed the INTERSECT operator and how it is a great way to show what data points are the same between two disparate sources.

INTERSECT in diagram form

As with most operators in SQL Server, there is another way to get the same data. It’s one that many of you are familiar with – it is the INNER JOIN.

Now, some of you are asking yourselves “Just how the heck would I check for where there are the same values in two different queries with an INNER JOIN?”

I am so glad that you asked!

First, let’s use the query that we used last week to match up against:

SELECT FirstName, LastName
FROM dbo.Person1
WHERE Gender = 'Male'
INTERSECT 
SELECT FirstName, LastName
FROM dbo.Person2
WHERE email LIKE '%.com'

Notice that the above query is fairly easy to read. Even if you’re not aware of what the INTERSECT operator does, then you should be able to make out the two queries on either side of it and what they do.

To do the same thing with an INNER JOIN, we should do something like the following:

;WITH Point1 AS (SELECT FirstName, LastName
		FROM dbo.Person1
		WHERE Gender = 'Male'),
      Point2 AS (SELECT FirstName, LastName
		FROM dbo.Person2
		WHERE email LIKE '%.com%')
SELECT p1.FirstName, p1.LastName
FROM Point1 P1 
INNER JOIN Point2 P2 ON P1.FirstName = P2.FirstName AND P1.LastName = P2.LastName

Notice that there’s a lot more here that is required. First, Point1 and Point2 are defined in what is known as a CTE at the top (don’t forget the semicolon before the WITH clause!) as the two queries that we’ll be working on.

Then, we have to set up the INNER JOIN itself. Please notice that for every point we’re comparing Point1 and Point2 on, we have to add that to the INNER JOIN’s ON clause. If we miss even one, then we are truly not comparing every field to see if it’s the same!

Now, as you’ve probably noticed rather quickly, this is not an efficient way to check to see if all values are the same between the two queries. Heck, the more fields that you add to the INNER JOIN, the higher your chances are that you’ll miss one of the important fields for comparison.

In this case, the INNER JOIN doesn’t perform as well as our INTERSECT query. In the screen capture below, the lower the percentage, the better performance it has.

Now, after altering the table a little to get better performance – aka adding Primary Keys to the table – I can get this query to perform similarly between the two queries.

I say all of this to show that there are multiple ways to get to the same data. Sometimes, one way works more efficiently than another. If performance matters to you (and you don’t bring down the system doing it one way), then you will need to know multiple ways to get to the same answer.

Even if this doesn’t apply to you at the moment, or you truly don’t care about performance, this is something to remember for the future as you work with others and see them trying multiple ways to get better performance out of their database queries.

And with that, my friends, it is time to come to a close for this week. Until next week when we talk more about comparisons!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.