Wednesday, May 23, 2012

Running total. Here and now

Sql Server 2012 is upon us, it's 2012 already. But for some people, their here and now is still Sql Server 2008


Read first how not to do(set-based) running total at http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx


Given Sql Server 2008 windowing capability limitation, most would give their new-fangled CTE skills a shot by writing running total query in a recursive manner:

with T AS
(
 select ROW_NUMBER() over(order by OrderID) as rn, * from test
)
,R(Rn, OrderId, Qty, RunningTotal) as
(
 select Rn, OrderID, Qty, Qty
 from t 
 where rn = 1
 
 union all
 
 select t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
 from t t
 join r p on t.rn = p.rn + 1
 
)
select R.OrderId, R.Qty, R.RunningTotal from r
option(maxrecursion 0)


All is fine and dandy, except when that query is ran on a production database, that query will not scale, that query took a good 9 seconds on a 5,000 rows table. (DDL at the bottom of this post)

Now let's try another approach, let's think for a while we are back in time, say Sql Server 2000. What would your DBA grandpa would do to facilitate that running total report?

create function TestRunningTotal()
returns @ReturnTable table(
    OrderId int, Qty int, RunningTotal int
)
as begin

    insert into @ReturnTable(OrderID, Qty, RunningTotal)
    select OrderID, Qty, 0 from Test
    order by OrderID;

    declare @RunningTotal int = 0;

    update @ReturnTable set 
           RunningTotal = @RunningTotal, 
           @RunningTotal = @RunningTotal + Qty;

    return;
end;

And that query took 0 second.


And go back further in time, say Sql Server 7, what would he do? He would follow Adam Machanic's approach: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx. Cursor is one of the rarity cases where running total is a very good choice.


The following is every developer's heaven, works on Postgresql 8.4, Sql Server 2012 and Oracle 9(or 8.1.7 ?):

select OrderID, Qty, sum(Qty) over(order by OrderID) as RunningTotal from test

Now back to regular programming(Sql Server 2008).

Some called that kind of update that relies on physical sort a quirky update. If you feel uneasy with quirky update, you might want to put a guard statement to prevent wrong update order.

create function TestRunningTotalGuarded()
returns @ReturnTable table(
    OrderId int, Qty int, RunningTotal int not null, RN int identity(1,1) not null
)
as begin

    insert into @ReturnTable(OrderID, Qty, RunningTotal)
    select OrderID, Qty, 0 from Test
    order by OrderID;
    
    declare @RunningTotal int = 0;
    
    declare @RN_check INT = 0;
    
    update @ReturnTable set 
            @RN_check = @RN_check + 1,
            @RunningTotal = (case when RN = @RN_check then @RunningTotal + Qty else 1/0 end),
            RunningTotal = @RunningTotal;

    return;
end;


If UPDATE really update rows in unpredictable order, the @RN_Check will not be equal to RN(identity order) anymore, the code will raise a divide-by-zero error then.



DDL

create table Test(
 OrderID int primary key,
 Qty int not null
);


declare @i int = 1;

while @i <= 5000 begin
 insert into Test(OrderID, Qty) values (@i * 2,rand() * 10); 
 set @i = @i + 1;
end;


Running total example results:
OrderId     Qty         RunningTotal
----------- ----------- ------------
2           4           4
4           8           12
6           4           16
8           5           21
10          3           24
12          8           32
14          2           34
16          9           43
18          1           44
20          2           46
22          0           46
24          2           48
26          6           54
28          2           56
30          8           64
32          6           70
34          0           70
36          4           74
38          2           76
40          5           81
42          4           85



UPDATE: May 29, 2012

Use cursor, quirky update is well.. quirky. Until further notice, please use something predictable, like cursor. http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html

UPDATE: May 29, 2012 8:27 PM

Just by putting a clustered primary key on the table variable, it makes the updating of rows in order. Check the test(looped 100 times) on the bottom of this article, it has no divide-by-zero(guard statement) error anymore: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html

Until further notice, I would say quirky update is not really quirky.

No comments:

Post a Comment