Thursday, July 25, 2013

Debunking the myth that JOIN is faster than IN

We see now that contrary to the popular opinion, IN / EXISTS queries are not less efficient than a JOIN query in SQL Server. In fact, JOIN queries are less efficient on non-indexed tables, since Semi Join methods allow aggregation and matching against a single hash table, while a JOIN needs to do these two operations in two steps. -- http://explainextended.com/2009/06/16/in-vs-join-vs-exists/




Converted this JOIN (with DISTINCT) … (From 1 minute and 24 seconds) :

INNER JOIN dbo.fn_ssrs_ConstrainID(@LanguageCultureCode,@NeutralLanguageCode,@UserID,@PersonID,@Application,@Module,'PEOPLE','Review') cons
 
ON rpp.PersonID = cons.id


…to IN expression (optimized, down to 43 seconds) :

and rpp.PersonID in
    (select cons.id
    from dbo.fn_ssrs_ConstrainID(@LanguageCultureCode,@NeutralLanguageCode,@UserID,@PersonID,@Application,@Module,'PEOPLE','Review') cons)


Advised the two colleagues to further optimize the query's existence filters by converting all the JOIN+DISTINCT combo to IN. They were able to optimized the original query of 1 minute and 24 seconds down to 1 second.



Using the same technique as above, another colleague and I were able to optimized a stored proc that is very crucial to the web app’s performance, as that stored proc determines the visibility of some buttons on the web page. The original stored proc was 5 seconds slow, on one page there are two buttons that are using the stored proc, that makes the web page to appear only after 10 seconds or so, we were able to tame it down to sub-second response time just by converting some of the JOINs to INs.





Happy Computing! ツ

No comments:

Post a Comment