Monday, August 26, 2013

Linq's Composability and Reusability

Nothing can beat the composability and reusability of Linq, it enables one-stop-shop coding.


Case in point, getting a page from a filtered list and getting the total count of the filtered list is just a walk in the park with Linq, they can all be done on one spot.


using (var session = TheMapper.GetSessionFactory().OpenSession())
using (var tx = session.BeginTransaction())
{
    var products = session.Query<Product>();

    if (filter.ProductName != null) 
        products = products.Where(x => x.Name.Contains(filter.ProductName));                

    if (filter.ProductModelName != null)
        products = products.Where(x => x.ProductModel.Name.Contains(filter.ProductModelName));


    var result = new ProductResultDto();                
    
    result.Products =
            products
            .OrderBy(x => x.Name)
            .Take(filter.PageSize).Skip((filter.PageNumber - 1) * filter.PageSize)
            .Select(x => new ProductDto { ProductName = x.Name, ProductModelName = x.ProductModel.Name })
            .ToList();

    result.Count = products.Count();

    return result;                
}//using


How it is done when writing queries directly on database instead:

create procedure GetList(
    @ProductName nvarchar(400),
    @ProductModelName nvarchar(400),
    @PageNumber int,
    @PageSize int
)
as
begin


    declare @rn int = (@PageNumber-1) * @PageSize;

    with a as
    (
        select 
            RN = row_number() over(order by p.Name),
            ProductName = p.Name, ProductModelName = m.Name
        from Production.Product p
        left join Production.ProductModel m on p.ProductModelID = m.ProductModelID
        where 
            (
                @ProductName is null 
                or p.Name like '%' + @ProductName + '%'
            )
            and
            (
                @ProductModelName is null 
                or m.Name like '%' + @ProductModelName + '%'
            )
    )
    select 
        top (@PageSize)
        a.ProductName, a.ProductModelName 
    from a
    where a.rn > @rn;


    select count(*) as cnt
    from Production.Product p
    left join Production.ProductModel m on p.ProductModelID = m.ProductModelID
    where 
        (
            @ProductName is null 
            or p.Name like '%' + @ProductName + '%'
        )
        and
        (
            @ProductModelName is null 
            or m.Name like '%' + @ProductModelName + '%'
        );


end;


The above code is not DRY, you can see the filter being repeated in two places. We can avoid wetting the query by refactoring it:


create function GetList_Func(
    @ProductName nvarchar(400),
    @ProductModelName nvarchar(400)
) returns table
as
return
    select 
        RN = row_number() over(order by p.Name),
        ProductName = p.Name, ProductModelName = m.Name
    from Production.Product p
    left join Production.ProductModel m on p.ProductModelID = m.ProductModelID
    where 
        (
            @ProductName is null 
            or p.Name like '%' + @ProductName + '%'
        )
        and
        (
            @ProductModelName is null 
            or m.Name like '%' + @ProductModelName + '%'
        );
go



create procedure GetList_Refactored(
    @ProductName nvarchar(400),
    @ProductModelName nvarchar(400),
    @PageNumber int,
    @PageSize int
)
as
begin

    declare @rn int = (@PageNumber-1) * @PageSize;
    
    select 
        top (@PageSize)
        a.ProductName, a.ProductModelName 
    from GetList_Func(@ProductName, @ProductModelName) a
    where a.rn > @rn;

    select count(*) as cnt
    from GetList_Func(@ProductName, @ProductModelName);

end;

go

However, it's not one-stop-shop anymore. The query is now being defined in two places.



Happy Computing! ツ


One-stop-shop code:
namespace OrmFtw
{

    using System;
    using System.Collections.Generic;
    using System.Linq;

    using NHibernate.Linq;

    using OrmFtw.Models;
    using OrmFtw.Mapper;


    class Program
    {
        static void Main(string[] args)
        {
            // var result = GetList(new FilterRequest { ProductName = "Black", PageSize = 10, PageNumber = 2 });
            var result = GetList(new FilterRequest { ProductModelName = "Frame", PageSize = 10, PageNumber = 2 });

            Console.WriteLine("Total: {0}", result.Count);
            Console.WriteLine("Second Page: ");
            foreach (var prod in result.Products)
            {
                Console.WriteLine("Product: {0}", prod.ProductName);
                Console.WriteLine("Model: {0}", prod.ProductModelName);
            }

            Console.ReadKey();
        }

        private static ProductResultDto GetList(FilterRequest filter)
        {
            using (var session = TheMapper.GetSessionFactory().OpenSession())
            using (var tx = session.BeginTransaction())
            {
                var products = session.Query<Product>();

                if (filter.ProductName != null)
                    products = products.Where(x => x.Name.Contains(filter.ProductName));

                if (filter.ProductModelName != null)
                    products = products.Where(x => x.ProductModel.Name.Contains(filter.ProductModelName));


                var result = new ProductResultDto();

                result.Products =
                        products
                        .OrderBy(x => x.Name)
                        .Take(filter.PageSize).Skip((filter.PageNumber - 1) * filter.PageSize)
                        .Select(x => new ProductDto { ProductName = x.Name, ProductModelName = x.ProductModel.Name })
                        .ToList();

                result.Count = products.Count();

                return result;
            }//using

        }//GetList()
    }

    public class FilterRequest
    {
        public string ProductName { get; set; }
        public string ProductModelName { get; set; }

        public int PageNumber { get; set; }
        public int PageSize { get; set; }
    }

    public class ProductResultDto
    {
        public int Count { get; set; }

        public IList<ProductDto> Products { get; set; }
    }

    public class ProductDto
    {
        public string ProductName { get; set; }
        public string ProductModelName { get; set; }
    }

}



namespace OrmFtw.Models
{
    class Product
    {
        public virtual int ProductID { get; set; }
        public virtual string Name { get; set; }
        public virtual string ProductNumber { get; set; }

        public virtual ProductModel ProductModel { get; set; }
    }
}


namespace OrmFtw.Models
{

    public class ProductModel
    {
        public virtual int ProductModelID { get; set; }
        public virtual string Name { get; set; }
    }
}

namespace OrmFtw.ModelMappings
{
    using NHibernate.Mapping.ByCode.Conformist;
    using OrmFtw.Models;

    class ProductMapping : ClassMapping<Product>
    {
        public ProductMapping()
        {
            Table("Production.Product");
            Id(x => x.ProductID);
            Property(x => x.Name);
            Property(x => x.ProductNumber);


            ManyToOne(x => x.ProductModel, x => x.Column("ProductModelID"));

        }
    }
}


namespace OrmFtw.ModelMappings
{

    using NHibernate.Mapping.ByCode.Conformist;
    using OrmFtw.Models;

    class ProductModelMapping : ClassMapping<ProductModel>
    {
        public ProductModelMapping()
        {
            Table("Production.ProductModel");
            Id(x => x.ProductModelID);
            Property(x => x.Name);
        }
    }
}



namespace OrmFtw.Mapper
{
    using NHibernate;
    using NHibernate.Cfg;
    using NHibernate.Mapping.ByCode;
    using NHibernate.Cfg.MappingSchema;

    using OrmFtw.ModelMappings;

    public class TheMapper
    {
        static ISessionFactory _sessionFactory;
        public static ISessionFactory GetSessionFactory()
        {
            if (_sessionFactory != null)
                return _sessionFactory;

            var cfg = new Configuration();
            var mapper = new ModelMapper();

            mapper.AddMappings(
                new[] {
                    // Entities
                    typeof(ProductMapping), 
                    typeof(ProductModelMapping)
                });



            cfg.DataBaseIntegration(c =>
            {
                c.Driver<NHibernate.Driver.Sql2008ClientDriver>();
                c.Dialect<NHibernate.Dialect.MsSql2012Dialect>();
                c.ConnectionString = "Server=localhost; Database=AdventureWorks2012; Trusted_Connection=true;";

                c.LogFormattedSql = true;
                c.LogSqlInConsole = true;
            });


            HbmMapping domainMapping = mapper.CompileMappingForAllExplicitlyAddedEntities();

            cfg.AddMapping(domainMapping);

            _sessionFactory = cfg.BuildSessionFactory();

            return _sessionFactory;
        }


    }

}

Sunday, August 25, 2013

Safe Navigation via Linq

AdventureWorks product's model is nullable. Null exception will be raised when we navigate the Name of the ProductModel from the Product.

var products = session.Query<Product>().Fetch(x => x.ProductModel);

foreach (var prod in products)
{
        Console.WriteLine("Product: {0}", prod.Name);
   
        // this will raise a null exception when Production.Product.ProductModelID is null
        Console.WriteLine("Model: {0}", prod.ProductModel.Name); 
}


Aside from using null-safe navigation extension method...

var products = session.Query<Product>().Fetch(x => x.ProductModel);

foreach (var prod in products)
{
        Console.WriteLine("Product: {0}", prod.Name);
        Console.WriteLine("Model: {0}", prod.ProductModel.NullSafe(x => x.Name));
}

...we can just flatten the domain models from Linq's projection, the database won't balk on navigating nullable product model from the product, then the ORM can happily map the null value to the property.

var products = session.Query<Product>()
                    // .Fetch(x => x.ProductModel) // This is not needed anymore, as we efficiently select only the needed columns on our projection
                    .Select(x => 
                            new { 
                                 ProductName = x.Name, 
                                 ProductModelName = x.ProductModel.Name // database don't have problem projecting null
                            });

foreach (var prod in products)
{
        Console.WriteLine("Product: {0}", prod.ProductName);
        Console.WriteLine("Model: {0}", prod.ProductModelName);
}

We can't do that trick on in-memory objects, for those scenario just use safe navigation extension method.



Happy Coding! ツ

Friday, August 23, 2013

Microsoft === Open Source

In my dream world, Microsoft is truly open source. Some years ago, you won’t see Microsoft and Open Source on the same sentence. But times they are a-changin’:


Microsoft recently made a change to their OSS page which now lists alternative web frameworks on the .Net framework which is great and thanks to Scott Hanselman for doing this.

And here are the reasons why:



But it looks like Microsoft is having a hard time breaking the old habits: http://haacked.com/archive/2013/06/24/platform-limitations-harm-net.aspx


Just a digression on Phil Haack’s post, look at the acronym naming convention for three letters or more, Microsoft is pretty much consistent on it. Hence Microsoft’s Base Class Libraries acronym is named as Bcl, i.e.:

Microsoft.Bcl.Compression
Microsoft.Bcl.Immutable

…, the acronyms are not named in an obnoxious manner:

Microsoft.BCL.Compression
Microsoft.BCL.Immutable



You can also notice this on the latest framework for our world domination, ASP.NET MVC. Its namespace in the framework is not named as System.Web.MVC, it is named as System.Web.Mvc.



Happy Computing! ツ

ServiceStack: A Fast Service Framework

If you are planning for world domination, your service framework must be able to serve requests fast, obscenely fast. Here's a benchmark for ServiceStack, WebAPI, Nancy, WCF:






ServiceStack is very fast. WebAPI is more than 3x slower than ServiceStack



The only problem with ServiceStack and NHibernate for that matter, though both are written in Microsoft language, both of them are not made by Microsoft



Downloaded the code from: https://github.com/ServiceStack/ServiceStack/wiki/Real-world-performance


Just add the corresponding WCF benchmarking code, it's trivial to add


Happy Computing! ツ

Saturday, August 10, 2013

NOT IN is faster than LEFT JOIN IS NULL combo

The best thing about pickup lines is in spite of its blatant approach, their intent is very clear. You don't need to take a second guess, they can spare you the trouble of being friendzoned as they will tell you right away if they like you are not, or at least you can feel it.


If you will try to date a DBA, perhaps you can use this pickup line:


NOT IN is faster than LEFT JOIN + IS NULL combo

Why?

Because 135 millisecond vs 2,194 millisecond


The best wingman for dating DBAs is http://explainextended.com/



NOT IN: 135 milliseconds only:

/*------------------------
 
SELECT  l.id, l.value
FROM    [20090915_anti].t_left l
WHERE   l.value NOT IN
        (
        SELECT  value
        FROM    [20090915_anti].t_right r
        )
           
------------------------*/
SQL Server parse and compile time:
   CPU time = 6 ms, elapsed time = 6 ms.
 
(10 row(s) affected)
Table 't_left'. Scan count 1, logical reads 499, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_right'. Scan count 1, logical reads 3112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
   CPU time = 125 ms,  elapsed time = 128 ms.





LEFT JOIN IS NULL: 2,194 milliseconds:

/*------------------------
SELECT  l.id, l.value
FROM    [20090915_anti].t_left l
LEFT JOIN
        [20090915_anti].t_right r
ON      r.value = l.value
WHERE   r.value IS NULL 
------------------------*/
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 10 ms.
 
(10 row(s) affected)
Table 't_left'. Scan count 9, logical reads 1984, physical reads 27, read-ahead reads 498, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_right'. Scan count 9, logical reads 3166, physical reads 37, read-ahead reads 3106, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 9999, logical reads 2239770, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
   CPU time = 14646 ms,  elapsed time = 2184 ms.



NOT IN is fast and efficient in CPU and Reads


NOT IN            --  CPU:    128 Reads:     3,619 Duration:   135
LEFT JOIN IS NULL --  CPU: 14,642 Reads: 2,254,967 Duration: 2,194




Even we are using an smart RDBMS (we are using SQL Server though) and it can discern that our LEFT JOIN IS NULL is doing an anti-join(i.e., a NOT IN) instead, hence our RDBMS gives our LEFT JOIN IS NULL the same fast performance as NOT IN, there's no reason for us to use LEFT JOIN IS NULL for anti-joins. Don't obscure the code's NOT IN intent, don't replace NOT IN with LEFT JOIN IS NULL. Use NOT IN when we need to detect if a column's value is NOT IN another table. See how intuitive is that? Do not complicate a simple concept with LEFT JOIN IS NULL combo.

LEFT JOIN IS NULL is too highfalutin-a-programmin', LEFT JOIN IS NULL is not the place where we should showboat our programming prowess on SQL Server or any RDBMS for that matter. I'm seeing a few (no, not many, fortunately our company hires the best devs in town) highfalutin codes not just in our SQL codes but on C# codes too, I'll put them on a blog when the list is comprehensive enough.



Do not confuse motion and progress. A rocking horse keeps moving but does not make any progress. -- Alfred Montapert



SELECT p.*, e.IsTerminated
FROM dbo.Person p
join dbo.Employee e on p.PersonID = e.EmployeePersonId

left join dbo.PersonResponsibility pr on p.PersonId = pr.ReviewerPersonId
-- Hey IS NULL buddy! what ya doin' down far there?

where
    e.AllowLogin = 1

    /*
    other 
    humongous 
    filters 
    here 
    competing 
    for 
    your 
    attention
    */        
 
    -- Hey LEFT JOIN! keeps movin' here buddy, just chillin' and filterin' your dumb join, makin' queries slow
    and pr.ReviewerPersonId is null



With LEFT JOIN and IS NULL, those two star-crossed lovers could be far apart from each other, thus making code analysis a little harder; contrast that to NOT IN which is pretty much a one-stop-shop syntax, everything is in one place, thus making NOT IN easier to analyze if there's even analysis at all, the code's intent is self-describing. And oh, did I mention NOT IN is fast? Ironically, the humble-looking NOT IN approach is fast:


SELECT p.*, e.IsTerminated
FROM dbo.Person p
join dbo.Employee e on p.PersonID = e.EmployeePersonId

where
    e.AllowLogin = 1

    /*
    other 
    humongous 
    filters 
    here 
    competing 
    for 
    your 
    attention
    */         

    -- Yay! no more LEFT JOIN buddy that can ruin the party!

   and p.PersonId not in (select pr.ReviewerPersonId from dbo.PersonResponsibility pr)



Unless we really found a corner case where LEFT JOIN IS NULL combo could be faster than NOT IN, we should just keep using NOT IN, aside from it's faster, it is easier to read. And before someone suggest NOT EXISTS, have I forgot to mention IN / NOT IN is easier to read



Happy Coding! ツ

Wednesday, August 7, 2013

Fail Fast: Why we should avoid staging tables as much as possible

We are not talking about Fail Fast the philosophical one as not everyone are sold with that idea. We will tackle Fail Fast the engineering one



http://martinfowler.com/ieeeSoftware/failFast.pdf:
The most annoying aspect of software development, for me, is debugging. I don't mind the kinds of bugs that yield to a few minutes' inspection. The bugs I hate are the ones that show up only after hours of successful operation, under unusual circumstances, or whose stack traces lead to dead ends. Fortunately, there’s a simple technique that will dramatically reduce the number of these bugs in your software. It won't reduce the overall number of bugs, at least not at first, but it'll make most defects much easier to find. The technique is to build your software to "fail fast."



In pursuit of speed, one will argue that First is better than Single:


var e = session.Query<Employee>().First(x => x.PersonId == personId);


An HR system which is a multi-tenant one and was designed around composite primary keys, but the dev forgot to add a company filter on the query above, then a few days later their client Microsoft is complaining that their system is randomly returning an employee: "Lee Kai-Fu is not working with us anymore, why is he included in our employee list? Please effin' correct your app, Steve Ballmer will throw chair at us, Tim Cook is not our CEO!"



The cause of the error is on the usage of First, which doesn't Fail Fast. Have they used Single, the following will not continue if it returns more than one employee, it will throw an exception, it will fail fast.

var e = session.Query<Employee>().Single(x => x.PersonId == personId);


Another scenario where we should especially strive for Fail Fast is on reports. While we are optimizing one stored proc which has a performance problem due to staging tables approach, we uncovered a subtle bug that doesn't fail fast, the error is not so pronounced with staging tables as compared with straight query.



The following is an oversimplification of the original stored proc we are optimizing, but is simple enough to illustrate the nature of the problem, given the following data:

create table Person
(
    PersonCode char(1) primary key,
    PersonName nvarchar(20) not null
);
 
create table Product
(
    ProductCode char(1) primary key,
    ProductName nvarchar(20) not null
);
 
create table Person_Likes_Product
(
    PersonCode char(1) not null references Person(PersonCode),
    ProductCode char(1) not null references Product(ProductCode),
    Rating int not null,
    constraint ck_Rating check(Rating between 1 and 5)
);
 
insert into Person(PersonCode, PersonName) values
('J','John'), ('P','Paul'), ('G','George'), ('R','Ringo');
 
insert into Product(ProductCode, ProductName) values
('C','CPU'), ('K','Keyboard'), ('M','Mouse'), ('S','Smartphone');
 
insert into Person_Likes_Product(PersonCode, ProductCode, Rating) values
('J', 'C', 5),
('J', 'K', 3),
('P', 'C', 5),
('P', 'K', 4),
('P', 'S', 5),
('G', 'S', 5);

...we should list all products with Paul's rating on them. It should produce the following output:


ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
K           Keyboard             4
M           Mouse                NULL
S           Smartphone           5
(4 row(s) affected)
This is the original stored proc:

create procedure PersonLikesStagingTables
(
    @PersonID char(1)
) as
begin
     
    -- staging table
    declare @productRating table
    (
        ProductCode char(1),
        ProductName nvarchar(20),
        Rating int
    );
     
    insert into @productRating(ProductCode, ProductName)
    select p.ProductCode, p.ProductName
    from Product p;
     
     
    update p
        set p.Rating = l.Rating
    from @productRating p
    left join Person_Likes_Product l on p.ProductCode = l.ProductCode;
     
     
    select p.ProductCode, p.ProductName, p.Rating
    from @productRating p
    order by p.ProductName;
     
end;
go
 
exec PersonLikesStagingTables 'P';
The output of that stored proc has an error:

ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
K           Keyboard             3
M           Mouse                NULL
S           Smartphone           5
(4 row(s) affected)
Holy Guacamole! Have you spotted the error on the code above? Why Paul's keyboard has a rating of 3 instead of 4? If this a price listing, your company could go bankrupt.



Upon further inspection, Paul's rating was not applied to the staging table:

update p
    set p.Rating = l.Rating
from @productRating p
left join Person_Likes_Product l on p.ProductCode = l.ProductCode;
That should be done like this:

update p
    set p.Rating = l.Rating
from @productRating p
left join Person_Likes_Product l on p.ProductCode = l.ProductCode and l.PersonCode = @PersonCode;
Had the original stored proc was written with no staging tables, that is it is using straight query instead, the error will be more pronounced, failing fast:

create procedure PersonLikesFailFast(@PersonCode char(1)) as
begin
    select
        p.ProductCode,
        p.ProductName,
        l.Rating
    from Product p   
    left join Person_Likes_Product l on p.ProductCode = l.ProductCode
     
end;   
go
 
exec PersonLikesFailFast 'P'
That will have an output of this:
ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
C           CPU                  5
K           Keyboard             3
K           Keyboard             4
M           Mouse                NULL
S           Smartphone           5
S           Smartphone           5
(7 row(s) affected)
The QA or user could quickly report to the devs that the person's product rating output has an error. If your system has an error, it should fail it fast, it's easy to correct errors when they are discovered early. Staging tables and First extension method deprives your system of failing fast. The errors done on the system with no fail fast mechanism is hard to debug as they sometimes semi-works, they can even linger producing corrupted data.



Aside from making our reports perform fast, it should also be fast on failing when it has an error.



How about the optimization after the removal of staging tables? After removing 5 insert statements, 13 update statements, 3 table variables then converted them to straight query, it improved from 20 seconds to 11 seconds.



And before we do aggressive optimization, let's pay attention with correctness first:

http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning



It's easier to make a correct program fast than to make a fast program correct.





Happy Computing! ツ

Monday, August 5, 2013

My 5-year old nephew is very smart




My 5-year old nephew is dreaming of Windows, he likes it very much. But I hope he outgrows it before reaching high school, it's good if he know other operating systems :)


"Real programmers don't write in BASIC. Actually, no programmers write in BASIC after reaching puberty" -- http://www.isaacsoft.com/Humor/RoalProgrammers.html

Friday, August 2, 2013

Let's Just Suck Less




Someone is seeking an advice based on my blog post Why So Fond Of UNION


I don't have the wisdom of a Sage, but here's my piece: Let's just suck less every sprint, every year, it's good enough


The initial step for sucking less is to identify that our code sucks


That advice somehow sucks, but I'll make sure that I’ll be giving more advice and code that sucks less through the passage of time


And let’s not forget: Let's be happy, but not satisfied, my code sucks, your code sucks, suffice to say everyone's code sucks


If we really want to truly build a product that doesn't suck, let's try to build vacuum cleaners ^_^ lol



Happy S... QL writing! ツ