Tuesday, July 23, 2013

Debunking the myth that CTE is slow

There are some folks who believe that CTE is slow. I don't know where they got that impression. We might as well say that our SELECT statement is slow because our code is using er.. SELECT statement. And there are some folks who believe CTEs underpinning is a cursor, really, no joking. I'm starting to lose faith in humanity ^_^ lol


For those who believe CTE is made out of cursor, here's how to partner cursor to a CTE:

http://stackoverflow.com/questions/14601407/using-a-cursor-with-a-cte


So if one can define a cursor out of CTE, what is a cursor on CTE? a cursor on cursor? Head-scratching! :D There's a lot of engineering effort that goes into CTE and all some folks can say about CTE is that it is a cursor, this kind of folks is dangerous to our industry. It might be their excuse for not learning CTE and windowing functions, which by the way can be used separately, but commonly seen together as they come out at the same time on most RDBMSes.


As with derived tables, CTEs allow you to create complex queries that would otherwise require the use of temporary tables and cursors, which can be inefficient. CTEs remove the overhead of creating and dropping those temporary tables, and of the INSERT and UPDATE statements required to populate them. This can make for faster execution, though you should always compare all available options when performance is critical -- http://www.blackwasp.co.uk/SQLCTEs.aspx


What we do with CTE is what determines its performance. CTE is nothing but just a programming functionality that makes our query more manageable, it helps when we are simplifying very complex and slow queries; and granted that you are doing the right approach, it aids the developer on making the query performant.


If you have tried programming with C or C++, you might be familiar with macro. Yes, CTE is just like that, a macro. It is expanded to full form when you use it. It will not magically make your query faster nor slower.



CTE is just an inline view:

with product_count_query as
(
    select product_id, product_count = count(*)
    from tbl
    group by product_id
)

select p.*, the_count = isnull(c.product_count, 0)
from product p 
left join product_count_query c -- derive table from CTE
on p.product_id = c.product_id;

CTE is just a shorthand for an explicit view:

create view product_count_query as

    select product_id, product_count = count(*)
    from tbl
    group by product_id;
go


select p.*, the_count = isnull(c.product_count, 0)
from product p 
left join product_count_query c -- derive table from VIEW
on p.product_id = c.product_id;


A view, and a CTE for that matter, don't have their own execution plan, both are ultimately just being expanded and bolted to the main query:

select p.*, the_count = isnull(c.product_count, 0)
from product p
left join
(
    select product_id, product_count = count(*)
    from tbl
    group by product_id
) as c -- derive table from an inline view, er.. a CTE? ;-)
 on p.product_id = c.product_id


All of them, same banana, exactly the same execution plan, same speed.


If you believe CTE is just a cursor, you might as well believe that view is just a cursor too. A darndest and silly belief.


Time to stop spewing the nonsense that CTE is just a cursor, all those three queries have exactly the same execution plan and same speed. I don't know what magic sauce that could give the table-deriving query a performance boost over its CTE cousin query, there's none.


If you believe that there's something inherently slow with CTE, thus you prefer to write your query using table-deriving approach because you believe it is faster than CTE, you are merely doing Cargo Cult Programming. You are depriving yourself of the benefits of abstractions, readability and manageability of CTE offers to the table by having a cult-like belief that CTE is slow.


It's what you do with CTE that makes it slow or fast. If you will use JOIN+DISTINCT combo on a CTE or any form of query abstraction for that matter, expect bad performance and rewrites.


CTE is just one of the vast array of SQL tools at your disposal that can make your daily craft easier. You don't need to use it exclusively, you can mix-and-match it with other SQL tools too.


So what makes a query fast?


Start with the right assumption, it's just a macro:
@crokusek they(CTE) don't get materialized at all AFAIK - they are just a table expression -- http://dba.stackexchange.com/questions/13112/whats-the-difference-between-a-cte-and-a-temp-table#comment69061_13117



This is what gives CTE a bad name: wrong assumption:

One reason CTE's can be employable for performance reasons is because their (possibly forced) materialization could be more easily parallelized whereby temp tables are generally computed serially before their use. -- http://dba.stackexchange.com/questions/13112/whats-the-difference-between-a-cte-and-a-temp-table#comment69060_13117


Until MATERIALIZED is supported...
with 
T MATERIALIZED AS
(
    select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * from test
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
    select CustomerCode, Rn, OrderID, Qty, Qty
    from t 
    where rn = 1
     
    union all
     
    select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
    from t t
    join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
     
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId 
option(maxrecursion 0);

...we should mix-and-match CTE with temporary table to give our query a performance boost. That is, there are times we need to materialize the result of a query, e.g., we should materialize the results of a row_numbering query to an actual table(temporary table or variable table will do) first when we wanted to use that row-numbered set to a recursive query, this is fast:

select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * into #xxx 
from test;
 
with 
T AS
(
    select * from #xxx
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
    select CustomerCode, Rn, OrderID, Qty, Qty
    from t 
    where rn = 1
     
    union all
     
    select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
    from t t
    join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
     
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId 
option(maxrecursion 0);
 
drop table #xxx;

Happy Computing! ツ

No comments:

Post a Comment