T-SQL Tuesday #23: WHERE The JOIN Ends
This month's T-SQL Tuesday topic comes to us courtesy of Stuart Ainsworth (blog | @codegumbo), and is early due to the PASS Summit being next week. Perhaps those planning next year's summit will do a better job to make sure they don't interfere with any blogging parties :)
Stuart's topic for the month is joins, and while they are certainly a building block of the skyscraper that is database usage, their complexities can still trip up experienced users from time to time. I ran into a new (to me) join "gotcha" not too long ago and had this post in my queue to release later, but once this topic was announced I knew it would be appropriate to discuss here.
When learning SQL, one of the first keywords taught is there WHERE clause, as filtering is a rather basic operation that anyone can conceptualize. After that, INNER JOINs tend to follow pretty quickly, and then OUTER JOINs. WHERE can of course be used in conjunction with any JOIN, and that's where things can get a little tricky, if not down right confusing.
To make my point we'll need an example, and rather than use AdventureWorks we'll just create our own example tables since they needn't be very large.
1CREATE TABLE TestOld (
2 Ltr CHAR(1),
3 Nbr TINYINT
4);
5
6CREATE TABLE TestNew (
7 Ltr CHAR(1),
8 Nbr TINYINT
9);
10
11INSERT INTO TestOld VALUES ('A',1),('A',2),('A',3),('B',4),('B',6);
12INSERT INTO TestNew VALUES ('B',2),('B',3),('B',6),('C',7),('C',12);
13
14SELECT * FROM TestOld;
15SELECT * FROM TestNew;
We'll start with a simple left join query based on the Ltr
column of each table:
1SELECT o.Ltr AS OldLtr, o.Nbr AS OldNbr, n.Ltr AS NewLtr, n.Nbr AS NewNbr
2FROM TestOld o
3LEFT JOIN TestNew n ON o.Ltr = n.Ltr;
No surprises here. Our results contain all combinations of rows from both tables matching on the Ltr
value of "B" as well as rows from TestOld
that have no match, just as we would expect. Now let's put a twist on it and say we're only interested in results from TestNew
where the Nbr
value is even. We execute the following:
1SELECT o.Ltr AS OldLtr, o.Nbr AS OldNbr, n.Ltr AS NewLtr, n.Nbr AS NewNbr
2FROM TestOld o
3LEFT JOIN TestNew n ON o.Ltr = n.Ltr
4WHERE n.Nbr %2 =0;
But wait, this looks more like an INNER JOIN than a LEFT JOIN. In fact it is an INNER JOIN, as the rows with NULL
values for the TestNew
columns are now gone. Filtering on TestNew.Nbr
gives you way more than you bargained for because not only does it remove values that aren't even from the result set like we specified, it also removes values that aren't NULL
. The WHERE clause implies that rows in TestNew
must exist in order to be compared, and when rows exist on both sides of a join, it's an inner join.
To get what we want and enforce our constraint in the context of a left join, the filter has to be part of the join statement. The query can be rewritten like this:
1SELECT o.Ltr AS OldLtr, o.Nbr AS OldNbr, n.Ltr AS NewLtr, n.Nbr AS NewNbr
2FROM TestOld o
3LEFT JOIN TestNew n ON o.Ltr = n.Ltr AND n.Nbr%2=0;
Now our results contain records for which TestNew
has no matching rows in addition to records where TestNew
has even values – just like we asked for. The moral of the story is that if you're going to filter values from an outer-joined table, you need to include the filtering statement as part of the JOIN instead of in it's own WHERE clause.