Saturday, August 10, 2013

NOT IN is faster than LEFT JOIN IS NULL combo

The best thing about pickup lines is in spite of its blatant approach, their intent is very clear. You don't need to take a second guess, they can spare you the trouble of being friendzoned as they will tell you right away if they like you are not, or at least you can feel it.


If you will try to date a DBA, perhaps you can use this pickup line:


NOT IN is faster than LEFT JOIN + IS NULL combo

Why?

Because 135 millisecond vs 2,194 millisecond


The best wingman for dating DBAs is http://explainextended.com/



NOT IN: 135 milliseconds only:

/*------------------------
 
SELECT  l.id, l.value
FROM    [20090915_anti].t_left l
WHERE   l.value NOT IN
        (
        SELECT  value
        FROM    [20090915_anti].t_right r
        )
           
------------------------*/
SQL Server parse and compile time:
   CPU time = 6 ms, elapsed time = 6 ms.
 
(10 row(s) affected)
Table 't_left'. Scan count 1, logical reads 499, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_right'. Scan count 1, logical reads 3112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
   CPU time = 125 ms,  elapsed time = 128 ms.





LEFT JOIN IS NULL: 2,194 milliseconds:

/*------------------------
SELECT  l.id, l.value
FROM    [20090915_anti].t_left l
LEFT JOIN
        [20090915_anti].t_right r
ON      r.value = l.value
WHERE   r.value IS NULL 
------------------------*/
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 10 ms.
 
(10 row(s) affected)
Table 't_left'. Scan count 9, logical reads 1984, physical reads 27, read-ahead reads 498, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_right'. Scan count 9, logical reads 3166, physical reads 37, read-ahead reads 3106, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 9999, logical reads 2239770, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
   CPU time = 14646 ms,  elapsed time = 2184 ms.



NOT IN is fast and efficient in CPU and Reads


NOT IN            --  CPU:    128 Reads:     3,619 Duration:   135
LEFT JOIN IS NULL --  CPU: 14,642 Reads: 2,254,967 Duration: 2,194




Even we are using an smart RDBMS (we are using SQL Server though) and it can discern that our LEFT JOIN IS NULL is doing an anti-join(i.e., a NOT IN) instead, hence our RDBMS gives our LEFT JOIN IS NULL the same fast performance as NOT IN, there's no reason for us to use LEFT JOIN IS NULL for anti-joins. Don't obscure the code's NOT IN intent, don't replace NOT IN with LEFT JOIN IS NULL. Use NOT IN when we need to detect if a column's value is NOT IN another table. See how intuitive is that? Do not complicate a simple concept with LEFT JOIN IS NULL combo.

LEFT JOIN IS NULL is too highfalutin-a-programmin', LEFT JOIN IS NULL is not the place where we should showboat our programming prowess on SQL Server or any RDBMS for that matter. I'm seeing a few (no, not many, fortunately our company hires the best devs in town) highfalutin codes not just in our SQL codes but on C# codes too, I'll put them on a blog when the list is comprehensive enough.



Do not confuse motion and progress. A rocking horse keeps moving but does not make any progress. -- Alfred Montapert



SELECT p.*, e.IsTerminated
FROM dbo.Person p
join dbo.Employee e on p.PersonID = e.EmployeePersonId

left join dbo.PersonResponsibility pr on p.PersonId = pr.ReviewerPersonId
-- Hey IS NULL buddy! what ya doin' down far there?

where
    e.AllowLogin = 1

    /*
    other 
    humongous 
    filters 
    here 
    competing 
    for 
    your 
    attention
    */        
 
    -- Hey LEFT JOIN! keeps movin' here buddy, just chillin' and filterin' your dumb join, makin' queries slow
    and pr.ReviewerPersonId is null



With LEFT JOIN and IS NULL, those two star-crossed lovers could be far apart from each other, thus making code analysis a little harder; contrast that to NOT IN which is pretty much a one-stop-shop syntax, everything is in one place, thus making NOT IN easier to analyze if there's even analysis at all, the code's intent is self-describing. And oh, did I mention NOT IN is fast? Ironically, the humble-looking NOT IN approach is fast:


SELECT p.*, e.IsTerminated
FROM dbo.Person p
join dbo.Employee e on p.PersonID = e.EmployeePersonId

where
    e.AllowLogin = 1

    /*
    other 
    humongous 
    filters 
    here 
    competing 
    for 
    your 
    attention
    */         

    -- Yay! no more LEFT JOIN buddy that can ruin the party!

   and p.PersonId not in (select pr.ReviewerPersonId from dbo.PersonResponsibility pr)



Unless we really found a corner case where LEFT JOIN IS NULL combo could be faster than NOT IN, we should just keep using NOT IN, aside from it's faster, it is easier to read. And before someone suggest NOT EXISTS, have I forgot to mention IN / NOT IN is easier to read



Happy Coding! ツ

No comments:

Post a Comment