Wednesday, May 30, 2012

Troubleshooting: There was no endpoint listening.

If you received this kind of error:


There was no endpoint listening at http://localhost:9384/Forumer.svc that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.



Chances are your WCF is using a different port number (or it is Auto-assigned), right click your WCF project, then look at the Web tab, then select the Specific port, put a number there, e.g. 9384

Tuesday, May 29, 2012

Recursive CTE is evil. And cursor is an angel

Someone made a comment on Stackoverflow that I'm painting the Recursive CTE as evil and should always be avoided on my post Running Total. Here and Now

Well, like Stackoverflow, I like facts, not opinions, give me numbers any time of day, numbers don't lie. Numbers could lie if you are a boxer though. How? By cherry-picking opponents. Ok, I digress ;-)

First, I would never declare a too encompassing statement that this and that thing is evil. What I actually conveyed on that previous post is, using recursive CTE for running total is slow. How could I even make a fast hierarchical report in this 21st century if I don't use recursive CTE then?

Anyway, I don't have too many blog post declaring things as evil, in fact since I do blogging, I have only one blog post that mention the word evil, two if you counted Dracula as evil, three after I deliberately title this post with headline-grabbing four-letter word evil. With this in mind, I would give my blog a favor and give it a traffic-grabbing title, XXX is EVIL. I will then title this article as Recursive CTE is Evil, unmindful of the fact that what I really conveyed is Running Total with Recursive CTE is Evil :-)


This blog post has twofold purposes. First, I find using *recursive* CTE for running total is really dog-slow^H^H^H^H^H^H^H^H evil; second, I'm ashamed that quirky update approach for making running total semi-works only, and yeah it's really quirky. I put a guard statement on the UPDATE statement, and that guard statement gives a warning(divide-by-zero) if UPDATE update rows unpredictably(i.e. it run out-of-order). And unpredictable it is, sometimes the query work, sometimes it don't(divide-by-zero is invoked), I have yet to explore the suggestions by many to make a clustered index on temporary table to make the update be predictable.


And before I forgot, a third purpose of this post, I would like to re-introduce to you our old friend cursor; he's nice, he's dependable, he's our most unsung hero. I would hazard a guess that for every smooth-running system, there's a cursor-using report that is lurking around that is not given much recognition or accolade, because you know, cursors are evil. While our new-found friend(CTE) is being given all the mad props, even with the mere fact that it is just doing a glorified row_numbering chore, yes folks, SQL Server 2008 don't have decent windowing capability, that a running total blog post such as this exists because of that limitation. This blog post will be moot when most of us will be using the latest Sql Server 2012 by then, but that's neither here nor there, even we are already at the end of the second quarter of year 2012, most of us are still using Sql Server 2008.


Ok, so let's start with some data to verify our query correctness:

create table Test(
    OrderID int primary key,
    CustomerCode varchar(50),
    Qty int not null
);


declare @i int = 1;

while @i <= 20 begin
    insert into Test(OrderID, CustomerCode, Qty) values (
        @i * 2
        ,case @i % 4 
        when 0 then 'JOHN' 
        when 1 then 'PAUL'
        when 2 then 'GEORGE'
        when 3 then 'RINGO'
        end
        ,rand() * 10);    
    set @i = @i + 1;
end;


Then make a Recursive CTE approach:

with T AS
(
    select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * from test
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
    select CustomerCode, Rn, OrderID, Qty, Qty
    from t 
    where rn = 1
    
    union all
    
    select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
    from t t
    join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
    
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId 
option(maxrecursion 0);

Output:

CustomerCode                                       OrderId     Qty         RunningTotal
-------------------------------------------------- ----------- ----------- ------------
GEORGE                                             4           7           7
GEORGE                                             12          8           15
GEORGE                                             20          4           19
GEORGE                                             28          4           23
GEORGE                                             36          9           32
JOHN                                               8           2           2
JOHN                                               16          9           11
JOHN                                               24          0           11
JOHN                                               32          3           14
JOHN                                               40          3           17
PAUL                                               2           5           5
PAUL                                               10          8           13
PAUL                                               18          3           16
PAUL                                               26          7           23
PAUL                                               34          0           23
RINGO                                              6           7           7
RINGO                                              14          9           16
RINGO                                              22          2           18
RINGO                                              30          3           21
RINGO                                              38          3           24

(20 row(s) affected)


Then make a Quirky Update approach:

create function TestRunningTotalGuarded()
returns @ReturnTable table(
    CustomerCode varchar(50), OrderId int, Qty int, RunningTotal int not null, RN int identity(1,1) not null
)
as begin
 
    insert into @ReturnTable(CustomerCode, OrderID, Qty, RunningTotal)
    select CustomerCode, OrderID, Qty, 0 from Test
    order by CustomerCode, OrderID;
     
    declare @RunningTotal int;
     
    declare @RN_check INT = 0;
    declare @PrevCustomerCode varchar(50) = NULL;
     
    update @ReturnTable set
            @RN_check = @RN_check + 1,
            @RunningTotal = 
                (case when RN = @RN_check then 
                    case when @PrevCustomerCode = CustomerCode then
                        @RunningTotal + Qty 
                    else
                        Qty
                    end
                else 
                    1/0 
                end),
            @PrevCustomerCode = CustomerCode,
            RunningTotal = @RunningTotal;
 
    return;
end;

Output:

CustomerCode                                       OrderId     Qty         RunningTotal RN
-------------------------------------------------- ----------- ----------- ------------ -----------
GEORGE                                             4           7           7            1
GEORGE                                             12          8           15           2
GEORGE                                             20          4           19           3
GEORGE                                             28          4           23           4
GEORGE                                             36          9           32           5
JOHN                                               8           2           2            6
JOHN                                               16          9           11           7
JOHN                                               24          0           11           8
JOHN                                               32          3           14           9
JOHN                                               40          3           17           10
PAUL                                               2           5           5            11
PAUL                                               10          8           13           12
PAUL                                               18          3           16           13
PAUL                                               26          7           23           14
PAUL                                               34          0           23           15
RINGO                                              6           7           7            16
RINGO                                              14          9           16           17
RINGO                                              22          2           18           18
RINGO                                              30          3           21           19
RINGO                                              38          3           24           20

(20 row(s) affected)


Then make a Running Total via Cursor approach:

create function TestRunningTotalCursor()
returns @ReturnTable table(
    CustomerCode varchar(50), OrderId int, Qty int, RunningTotal int not null
) as
begin

    declare @c_CustomerCode varchar(50);
    declare @c_OrderID int;
    declare @c_qty int;
    
    declare @PrevCustomerCode varchar(50) = null;
    declare @RunningTotal int = 0;
    
    

    declare o_cur cursor for
    select CustomerCode, OrderID, Qty from Test
    order by CustomerCode, OrderID;

    open o_cur;


    fetch next from o_cur
    into @c_CustomerCode, @c_OrderID, @c_Qty;

    while @@FETCH_STATUS = 0 begin
                
        if @c_CustomerCode = @PrevCustomerCode begin
            set @RunningTotal = @RunningTotal + @c_qty;
        end
        else begin
            set @RunningTotal = @c_Qty;
        end
        
        set @PrevCustomerCode = @c_CustomerCode;
        
        insert into @ReturnTable(CustomerCode, OrderId, Qty, RunningTotal)
        values(@c_CustomerCode, @c_OrderID, @c_Qty, @RunningTotal);
                    
        
        fetch next from o_cur
        into @c_CustomerCode, @c_OrderID, @c_Qty;
    end


    close o_cur;
    deallocate o_cur;
    
    return;    
end;

Output:

CustomerCode                                       OrderId     Qty         RunningTotal
-------------------------------------------------- ----------- ----------- ------------
GEORGE                                             4           7           7
GEORGE                                             12          8           15
GEORGE                                             20          4           19
GEORGE                                             28          4           23
GEORGE                                             36          9           32
JOHN                                               8           2           2
JOHN                                               16          9           11
JOHN                                               24          0           11
JOHN                                               32          3           14
JOHN                                               40          3           17
PAUL                                               2           5           5
PAUL                                               10          8           13
PAUL                                               18          3           16
PAUL                                               26          7           23
PAUL                                               34          0           23
RINGO                                              6           7           7
RINGO                                              14          9           16
RINGO                                              22          2           18
RINGO                                              30          3           21
RINGO                                              38          3           24

(20 row(s) affected)


After verifying all those queries are correct, now let's bump the number of rows to 5,000. Here are the metrics:


* Recursive CTE : 49 seconds
* Quirky Update : 0 second
* Cursor        : 0 second


Those 0 seconds are not meaningful, let's bump the rows to 50,000 to see a meaningful metrics:
* Quirky Update : 1 second
* Cursor        : 3 seconds
* Recursive CTE : An hour

Caveat, quirky update look like a speed demon, but there's some problem with it, with Sql Server(and other RDBMS might be too. Sybase was the first RDBMS that uses the quirky update approach for running total and encourages it, so YMMV if you are using Sybase. Microsoft SQL Server was a Sybase before), the order of updating rows is unpredictable when it is given large number of rows. I now see a divide-by-error intermittently on 50,000 rows for about one time in every five run.

CustomerCode                                       OrderId     Qty         RunningTotal RN
-------------------------------------------------- ----------- ----------- ------------ -----------
Msg 8134, Level 16, State 1, Line 1
Divide by zero error encountered.
The statement has been terminated.


From the metrics and correctness, we should be now comfortable using cursor for running total. No one could paint FUD on this baby.


If someone could enlighten me if making quirky update not quirky is really a lost cause, I would stop here and would not find a way to make the quirky update reliable. Until further notice, I'm scouring the web if there's a way to make the quirky update reliable.



UPDATE

The pure CTE approach is the root of all evil :-) When a row-numbering query(that is being joined to a recursive CTE afterwards) is materialized to an actual table first before being joined to a recursive CTE, the speed goes up

select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * into #xxx 
from test;

with T AS
(
    select * from #xxx
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
    select CustomerCode, Rn, OrderID, Qty, Qty
    from t 
    where rn = 1
    
    union all
    
    select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
    from t t
    join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
    
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId 
option(maxrecursion 0);

drop table #xxx;



Using view is also as slow as the pure CTE, also an hour:
create view xxx 
as
    select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, *
    from test;
go



with T AS
(
    select * from xxx
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
    select CustomerCode, Rn, OrderID, Qty, Qty
    from t 
    where rn = 1
     
    union all
     
    select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
    from t t
    join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
     
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId 
option(maxrecursion 0);



To speed up that view, materialize the view to an actual table:
create view xxx 
as
    select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, *
    from test;
go


select * into #xxx from xxx;


with T AS
(
    select * from #xxx
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
    select CustomerCode, Rn, OrderID, Qty, Qty
    from t 
    where rn = 1
     
    union all
     
    select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
    from t t
    join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
     
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId 
option(maxrecursion 0);




CTE is not slow, not understanding how queries work is what makes your query slow.



To recap, prior to materializing the row numbering to an actual table(i.e. temporary table), here are the metrics on 50,000 rows:

* Quirky Update       : 1 second
* Cursor              : 3 seconds
* Recursive CTE(Pure) : An hour

Here are the new metrics after row numbering table materialization (50,000 rows):

* Quirky Update         : 1 second
* Cursor                : 3 seconds
* Recursive CTE(Hybrid) : 2 seconds (inclusive of row numbering table materialization)


UPDATE May 29, 2012 8:00 PM

The quirky update is not so quirky after all. Just by putting a clustered primary key on sequential row number, SQL Server UPDATE is able to do an orderly update of rows.

The only change needed be done is to put a not null primary key clustered
alter function TestRunningTotalGuarded()
returns @ReturnTable table(
    CustomerCode varchar(50), OrderId int, Qty int, RunningTotal int not null, RN int identity(1,1) not null primary key clustered
)

I tried looping it 100 times, didn't encounter any divide-by-zero error
declare @i int = 0;
while @i < 100 begin
 select * from dbo.TestRunningTotalGuarded() order by CustomerCode, OrderId
 set @i = @i + 1;
end
And it's still the same 1 second query, and it's faster than Hybrid Recursive CTE. Here's the metric for 100,000 rows:
Quirky Update        : 3 seconds
Hybrid Recursive CTE : 5 seconds
Cursor               : 6 seconds
Quirky update is not so quirky after all.
UPDATE July 27, 2013 10:47 AM Table variable is faster by a few milliseconds...
declare @t table
(
    RN INT,
    OrderID int,
    CustomerCode varchar(50),
    Qty int,
    primary key(CustomerCode, RN)
);

insert into @t(RN, OrderID, CustomerCode, Qty)
select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * from test;

with T AS
(
    select * from @t
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
    select CustomerCode, Rn, OrderID, Qty, Qty
    from t 
    where rn = 1
     
    union all
     
    select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
    from t t
    join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
     
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId 
option(maxrecursion 0);
...than temporary table:
create table #t
(
    RN INT,
    OrderID int,
    CustomerCode varchar(50),
    Qty int,
    primary key(CustomerCode, RN)
);

insert into #t(RN, OrderID, CustomerCode, Qty)
select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * from test;

with T AS
(
    select * from #t
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
    select CustomerCode, Rn, OrderID, Qty, Qty
    from t 
    where rn = 1
     
    union all
     
    select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
    from t t
    join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
     
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId 
option(maxrecursion 0);


drop table #t
It's not possible to create a temporary table inside a function, only a table variable is possible:
create function TestRunningTotalMaterialized()
returns @ReturnTable table(
    CustomerCode varchar(50), OrderId int, Qty int, RunningTotal int not null, RN int identity(1,1) not null
)
as
begin

    declare @t table
    (
        RN INT,
        OrderID int,
        CustomerCode varchar(50),
        Qty int,
        primary key(CustomerCode, RN)
    );

    insert into @t(RN, OrderID, CustomerCode, Qty)
    select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * from test;

    
    with T AS
    (
        select * from @t
    )
    ,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
    (
        select CustomerCode, Rn, OrderID, Qty, Qty
        from t 
        where rn = 1
     
        union all
     
        select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
        from t t
        join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
     
    )
    insert into @ReturnTable(CustomerCode, OrderId, Qty, RunningTotal)
    select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
    order by R.CustomerCode, R.OrderId 
    option(maxrecursion 0);
    
    return;
end;

In terms of performance, even we use table variable on Hybrid Recursive CTE, things stay the same: Quirky Update > Hybrid Recursive CTE > Cursor

Monday, May 28, 2012

Anonymous type + Dynamic + ServiceStack == Consuming cloud have never been easier

In this post, I will show you how to request a service without all those codegen'd classes.

What's wrong with WCF? Everything revolves around the toolchain(i.e. Visual Studio), e.g. every classes must be generated not only for the returned objects, but also for the class(es) of your service(s). WCF would have versioning problem too when you update a method and add a new parameter to it or arranging its parameters, the calling app would break when the parameter mismatches. An example, you have this function:

public void Save(string firstname, string lastname)

And then you add a new parameter later on:

public void Save(string firstname, string lastname, string middlename)

The calling app would break as it cannot find a Save method that has three parameters. In order to fix that, the dev need to update the service reference, then it will code-generate the service's class with the updated method and parameter(s).


Then let's say you notice that the parameter arrangement is not ideal, you need to switch the Middle Name and Last Name.

public void Save(string firstname, string middlename, string lastname)


Then that will silently break the existing app saving of record, let's say you call it like this before:

Save("John", "Lennon", "Winston");

With the updated service, the Last Name: Winston, will fall on Middle Name; and the Middle Name: Winston, will fall on Last Name.


Though that is not exactly a problem of WCF, but since the mechanism for Request-Response protocol is not so formalized with WCF, most WCF developers could fall into that potential problem. If the Save method was made to only accept a class from the get-go, you could re-design your service's Save mechanism without affecting the dependent client using the service.


The following is how ServiceStack is able to prevent that problem. With ServiceStack, the notion of request and response is so formalized well that it's nigh impossible to make your code brittle as the one we discussed above. The following is a typical ServiceStack code:

class Hello      
{
    public string Name { get; set; }
            
}

class HelloResponse
{
    public string Result { get; set; }      
    public int Viva { get; set; }
}

class HelloService : ServiceStack.ServiceHost.IService<Hello>
{
    
    public object Execute(Hello request) 
    {
        return  new HelloResponse { Result = "Good morning " + request.Name + "!", Viva = 1976 };
    }
    
}   

For a complete walkthrough how to make a ServiceStack service, read it at http://www.ienablemuch.com/2012/04/servicestack-setup-screenshots-guide.html. Or read it at http://www.servicestack.net/ServiceStack.Hello/

With ServiceStack, even you add a property on before or after of property Name, the existing code that call the ServiceStack will not fail, as ServiceStack can enforce (via interface) that your service's method's paramater can only accept a class type.


The ServiceHost's IService is very simple:

namespace ServiceStack.ServiceHost
{
    public interface IService<T>
    {
        object Execute (T request);
    }
}


So here's how you will call the ServiceStack on front-end (JsonServiceClient is in ServiceStack.Common.DLL):

public class MainClass
{
    public static void Main()
    {               
        var client = new ServiceStack.ServiceClient.Web.JsonServiceClient();                
        HolaResponse r =  client.Send<HolaResponse>("GET","http://127.0.0.1:8080/yourServiceStack/como-esta", new Hola { Name = "Michael Buen" });
        System.Console.WriteLine ("* {0}\n* {1}", r.Result, r.Viva);
    }
    
}


class Hola {
    public string Name { get; set; }
}
                
        
class HolaResponse {
    public string Result { get; set; }
    public int Viva { get; set; }
}


You can also make your class name different from ServiceStack's returned class. And with ServiceStack, you can avoid code generated classes (as opposed to WCF every-classes-are-generated approach), making your code very lightweight. Even when there are new methods added on your ServiceStack, there will be no such thing on your ServiceStack-consuming code analogous to WCF's Update Service Reference(on which by itself is another code-generated stub class mess), you just call the ServiceStack services via their RESTful urls. How frictionless is that? ツ



And with C# 3's anonymous type and 4's dynamic, you can eliminate those two classes altogether.
public class MainClass
{
    public static void Main()
    {   
        var client = new ServiceStack.ServiceClient.Web.JsonServiceClient();                
        dynamic r =  client.Send<string>("GET","http://127.0.0.1:8080/yourServiceStack/como-esta", new { Name = "Michael Buen" }).JsDeserialize();
        System.Console.WriteLine ("* {0}\n* {1}", r.Result, r.Viva);
    }
}


Output:
* Good morning Michael Buen!
* 1976


Happy Coding! ツ


By the way, the following is the JSON dynamic deserializer. Code sourced here: http://www.drowningintechnicaldebt.com/ShawnWeisfeld/archive/2010/08/22/using-c-4.0-and-dynamic-to-parse-json.aspx


using System;
using System.Collections.Generic;
using System.Linq;
using System.Dynamic;
using System.Collections;
using System.Web.Script.Serialization;
using System.Collections.ObjectModel;



namespace DynamicJson
{
    public class DynamicJsonObject : DynamicObject
    {
        private IDictionary<string, object> Dictionary { get; set; }

        public DynamicJsonObject(IDictionary<string, object> dictionary)
        {
            this.Dictionary = dictionary;
        }

        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            result = this.Dictionary[binder.Name];

            if (result is IDictionary<string, object>)
            {
                result = new DynamicJsonObject(result as IDictionary<string, object>);
            }
            else if (result is ArrayList && (result as ArrayList) is IDictionary<string, object>)
            {
                result = new List<DynamicJsonObject>((result as ArrayList).ToArray().Select(x => new DynamicJsonObject(x as IDictionary<string, object>)));
            }
            else if (result is ArrayList)
            {
                result = new List<object>((result as ArrayList).ToArray());
            }

            return this.Dictionary.ContainsKey(binder.Name);
        }
    }

    public class DynamicJsonConverter : JavaScriptConverter
    {
        public override object Deserialize(IDictionary<string, object> dictionary, Type type, JavaScriptSerializer serializer)
        {
            if (dictionary == null)
                throw new ArgumentNullException("dictionary");

            if (type == typeof(object))
            {
                return new DynamicJsonObject(dictionary);
            }

            return null;
        }

        public override IDictionary<string, object> Serialize(object obj, JavaScriptSerializer serializer)
        {
            throw new NotImplementedException();
        }

        public override IEnumerable<Type> SupportedTypes
        {
            get { return new ReadOnlyCollection<Type>(new List<Type>(new Type[] { typeof(object) })); }
        }
    }
}


namespace DynamicJson.Extensions
{
    public static class Helper
    {
        public static string JsSerialize(this object obj)
        {
            return new JavaScriptSerializer().Serialize(obj);
        }

        public static dynamic JsDeserialize(this string str)
        {
            var jss = new JavaScriptSerializer();
            jss.RegisterConverters(new JavaScriptConverter[] { new DynamicJsonConverter() });

            // Mono don't have this method:
            // return jss.Deserialize(str, typeof(object)) as dynamic;          

            // Just call the generic-styled method(of which Microsoft C# also has). Portability FTW!
            return jss.Deserialize<object>(str) as dynamic;
        }
    }

    
}

Wednesday, May 23, 2012

Running total. Here and now

Sql Server 2012 is upon us, it's 2012 already. But for some people, their here and now is still Sql Server 2008


Read first how not to do(set-based) running total at http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx


Given Sql Server 2008 windowing capability limitation, most would give their new-fangled CTE skills a shot by writing running total query in a recursive manner:

with T AS
(
 select ROW_NUMBER() over(order by OrderID) as rn, * from test
)
,R(Rn, OrderId, Qty, RunningTotal) as
(
 select Rn, OrderID, Qty, Qty
 from t 
 where rn = 1
 
 union all
 
 select t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
 from t t
 join r p on t.rn = p.rn + 1
 
)
select R.OrderId, R.Qty, R.RunningTotal from r
option(maxrecursion 0)


All is fine and dandy, except when that query is ran on a production database, that query will not scale, that query took a good 9 seconds on a 5,000 rows table. (DDL at the bottom of this post)

Now let's try another approach, let's think for a while we are back in time, say Sql Server 2000. What would your DBA grandpa would do to facilitate that running total report?

create function TestRunningTotal()
returns @ReturnTable table(
    OrderId int, Qty int, RunningTotal int
)
as begin

    insert into @ReturnTable(OrderID, Qty, RunningTotal)
    select OrderID, Qty, 0 from Test
    order by OrderID;

    declare @RunningTotal int = 0;

    update @ReturnTable set 
           RunningTotal = @RunningTotal, 
           @RunningTotal = @RunningTotal + Qty;

    return;
end;

And that query took 0 second.


And go back further in time, say Sql Server 7, what would he do? He would follow Adam Machanic's approach: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx. Cursor is one of the rarity cases where running total is a very good choice.


The following is every developer's heaven, works on Postgresql 8.4, Sql Server 2012 and Oracle 9(or 8.1.7 ?):

select OrderID, Qty, sum(Qty) over(order by OrderID) as RunningTotal from test

Now back to regular programming(Sql Server 2008).

Some called that kind of update that relies on physical sort a quirky update. If you feel uneasy with quirky update, you might want to put a guard statement to prevent wrong update order.

create function TestRunningTotalGuarded()
returns @ReturnTable table(
    OrderId int, Qty int, RunningTotal int not null, RN int identity(1,1) not null
)
as begin

    insert into @ReturnTable(OrderID, Qty, RunningTotal)
    select OrderID, Qty, 0 from Test
    order by OrderID;
    
    declare @RunningTotal int = 0;
    
    declare @RN_check INT = 0;
    
    update @ReturnTable set 
            @RN_check = @RN_check + 1,
            @RunningTotal = (case when RN = @RN_check then @RunningTotal + Qty else 1/0 end),
            RunningTotal = @RunningTotal;

    return;
end;


If UPDATE really update rows in unpredictable order, the @RN_Check will not be equal to RN(identity order) anymore, the code will raise a divide-by-zero error then.



DDL

create table Test(
 OrderID int primary key,
 Qty int not null
);


declare @i int = 1;

while @i <= 5000 begin
 insert into Test(OrderID, Qty) values (@i * 2,rand() * 10); 
 set @i = @i + 1;
end;


Running total example results:
OrderId     Qty         RunningTotal
----------- ----------- ------------
2           4           4
4           8           12
6           4           16
8           5           21
10          3           24
12          8           32
14          2           34
16          9           43
18          1           44
20          2           46
22          0           46
24          2           48
26          6           54
28          2           56
30          8           64
32          6           70
34          0           70
36          4           74
38          2           76
40          5           81
42          4           85



UPDATE: May 29, 2012

Use cursor, quirky update is well.. quirky. Until further notice, please use something predictable, like cursor. http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html

UPDATE: May 29, 2012 8:27 PM

Just by putting a clustered primary key on the table variable, it makes the updating of rows in order. Check the test(looped 100 times) on the bottom of this article, it has no divide-by-zero(guard statement) error anymore: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html

Until further notice, I would say quirky update is not really quirky.

Thursday, May 10, 2012

Generics object-orientation. Untyped generic is the key to generic's OOPness

Suppose you have these classes:


class Animal {
}

class Dog : Animal {
}

class Plant {

}


We knew that...
...
{
 // these works:
 MakeSound(new Animal());
 MakeSound(new Dog());
 
 // and this doesn't:
 MakeSound(new Plant());   
}


public static void MakeSound(Animal a) {
}





Then suppose we have this existing code:

public static void AddAnimal(IList<Animal> aList) {
 foreach(Animal a in aList) {
 }
 
 aList.Add(new Animal());
}


And we want that function to be instantly accessible to all Animal's derived type. That is, we want the IList<Dog> be accepted on that function too.


That is not possible, and if that could be possible, it will be dangerous, which we shall discover later on. So this will fail:

IList<Dog> dogs = new List<Dog>();
AddAnimal(dogs);

Produces this compile-time error:


cannot convert `System.Collections.Generic.IList<Dog>' expression to type `System.Collections.Generic.IList<Animal>'


For an AddAnimal to accept other types, we follow this pattern:

public static void AddAnimal<T>(IList<T> aList) where T : new() {
 foreach(Animal a in aList) {
 }
 
 aList.Add(new T());
}  

Using that function, the IList<Dog>'s Dog can be slotted on untyped T, hence the compiler allowing us to pass the dogs of type IList<T> to that function. You need to put new() on function declaration if you intend to create an object out of T. So this will work now:


IList<Dog> dogs = new List<Dog>();
AddAnimal(dogs);

And you could do this as well:


IList<Plant> plants = new List<Plant>();
AddAnimal(plants);

Oops! Any discerning object-oriented programmers worth his salt, could quickly discern that the above-code is not object-oriented, plant did not derive from Animal, AddAnimal should accept Animal only. To do that, simply put a constraint on the accepted types on the generic's parameter. We just put a where T : BaseType where the BaseType here is the Animal class

public static void AddAnimal<T>(IList<T> aList) where T : Animal, new() {
  foreach(Animal a in aList) {
  }
  
  aList.Add(new T());
}  

This will not work anymore:

IList<Plant> plants = new List<Plant>();
AddAnimal(plants);

Its compilation error:
Plant' cannot be used as type parameter `T' in the generic type or method `TestGenCompat.MainClass.AddAnimal<T>(System.Collections.Generic.IList<T>)'. There is no implicit reference conversion from `Plant' to `Animal'


To recap, these should work:

IList<Animal> anims = new List<Animal>();
AddAnimal(anims);

IList<Dog> dogs = new List<Dog>();
AddAnimal(dogs);

Now let's explore again the old code, I mentioned that it's dangerous if it's possible to pass dogs to this method:

public static void AddAnimal(IList<Animal> aList) {
 foreach(Animal a in aList) {
 }
 
 aList.Add(new Animal());
}



What will happen if they allowed passing derived types to that method? Let's simulate if that is allowed in the first place.

public static void AddAnimal<T>(IList<T> xList) where T : Animal, new() {
 IList<Animal> aList = (IList<Animal>) xList;  

 foreach(Animal a in aList) {
 }
 
 aList.Add(new Animal());
}

But alas, C#'s generic carries the type it is genericizing. Though our casting of IList<T> to IList<Animal> is allowed, during runtime it is checked if the passed variable's type signature matches the type we are casting to. So if we pass an instance of IList<Dog>, that would result to casting error during runtime.


So to simulate the inherent danger if a given language allows us to merely use the untyped generic, let's look at other languages, let's choose choose Java.

First we already knew that this is not valid and can be caught during compile-time, same with C# :

List<Dog> dogs = new ArrayList<Dog>();
List<Animal> anims = (List<Animal>)dogs;

Now let's turn to Java's method that is constrained on Animal type. Then we try to cast it:

public static <T extends Animal> void addAnimal(List<T> aList)
  throws InstantiationException, IllegalAccessException
{
 // On Java, not exactly equal generic types can't be caught during runtime.
 // C# can
 List<Animal> list = (List<Animal>) aList;

 for(Animal x : list) {
 }
 
 list.add(new Animal());
}


Now let's iterate the list after we passed it to that function:

{
 List<Dog> dogs = new ArrayList<Dog>();
 addAnimal(dogs);
 addAnimal(dogs);
 System.out.println("See " + dogs.size());

 for(Animal x : dogs ) {
  System.out.println(x);
 }
}

That code prints 2. The problem is in the for loop.

Exception in thread "main" java.lang.ClassCastException: Animal cannot be cast to Dog

Though the content of the dogs collection are two Animals, and is compatible to Animal x. The for loop don't even reach that part(Animal x) of the loop. The mere act of extracting an object from dogs' iterator is actually doing these steps:


Dog d = dogs.get(0); 
Animal x = d; 

The second line is perfectly fine. However, the first line has the problem, or rather the object in the collection is the root cause of the problem, if the Animal was not possible to be added in dogs collections, we will not be receiving any casting exception, as all dogs' elements are Dog.


So while a Dog Is-An Animal:

Dog x = new Dog();
Animal y = x;

An Animal Is-Not-A Dog, hence this would result to casting exception:

Animal a = new Animal(); // think of this as dogs.get(0)
Dog b = a; // casting exception
Animal x = b; // no error

With type erasure, this code:

public static <T extends Animal> void addAnimal(List<T> aList)
  throws InstantiationException, IllegalAccessException
{
 // Not exactly equal generic can't be caught during runtime
 List<Animal> list = (List<Animal>) aList;
}

Is actually compiled to JVM like this:

public static void addAnimal(List aList) {
   List list = aList;
   
   list.add(new Animal());
}


So that's it, in Java it's not entirely feasible during runtime that adding an Animal to a List<Dog> type can be prevented. And the consequence is, when we ultimately needed to unbox the object out of that list to its proper type, it will cause a casting exception. C# generics can prevent that scenario, as its generics carry the type; Java's generics erases the type, its generics merely shift the burden of castings away from the programmer. Behind the scenes(in JVM level), Java generics are untyped objects and are merely cast back when accessing the object.


So there goes the rationale of not allowing OOP on typed generics on function. And it requires type erasure on generic's parameter, of which C# is not designed to be.


To summarize, untyped generics coupled with type constraining (via where T : typehere) is the only way to achieve OOP nirvana on generics