Monday, December 30, 2013

The proper way to deal with old database design (read: composite keys) on NHibernate

In an ideal world, we have a database that is devoid of composite primary keys.


create table ProductCategory
(
    ProductCategoryId int identity(1,1) primary key, -- ideally the entity is accessed through surrogate primary key

    ProductId int not null references Product(ProductId),
    CategoryId int not null references Category(CategoryId),

    CustomizedProductCategoryDescription nvarchar(200) not null,

    constraint uk_ProductCategory unique(ProductId, CategoryId) -- ideally the natural key is demoted to a unique constraint
);


create table Model
(
    ModelId int identity(1,1) primary key,
    
    -- ideally the entity referenced is accessed by the key, the whole key and nothing but the key.
    -- hear it loud, not keys! not plural, singular key only. capiche? :-)
    ProductCategoryId int not null references ProductCategory(ProductCategoryId), 

    ModelDescription nvarchar(200) not null
);


However, not everyone is afforded a perfect world; composite keys are pervasive in old database designs:

create table ProductCategory
(
    ProductId int not null references Product(ProductId),
    CategoryId int not null references Category(CategoryId),

    CustomizedProductCategoryDescription nvarchar(200) not null,

    constraint pk_ProductCategory primary key(ProductId, CategoryId) -- what an imperfect world
);


create table Model
(
    ModelId int identity(1,1) primary key,
    
    ProductId int not null,
    CategoryId int not null,

    ModelDescription nvarchar(200) not null,

    constraint fk_Model__ProductCategory foreign key(ProductId, CategoryId) references ProductCategory(ProductId, CategoryId) -- why does the world have to be imperfect?
);


Ideally, even when mapping that imperfect design, related entities should still be navigable through object references:

public class ProductCategory
{

    public virtual Product Product { get; set; }
    public virtual Category Category { get; set; }

    public virtual string CustomizedProductCategoryDescription { get; set; }        

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;
        var t = obj as ProductCategory;
        if (t == null)
            return false;
        if (Product == t.Product && Category == t.Category)
            return true;
        return false;
    }
    public override int GetHashCode()
    {
        return (Product.ProductId + "|" + Category.CategoryId).GetHashCode();
    }
}


class ProductCategoryMapping : ClassMapping<ProductCategory>
{
    public ProductCategoryMapping()
    {            
        ComposedId(
            c =>
            {                    
                c.ManyToOne(x => x.Product, x => x.Column("ProductId"));
                c.ManyToOne(x => x.Category, x => x.Column("CategoryId"));
            });

        Property(x => x.CustomizedProductCategoryDescription);
    }
}

However, the real problem with that kind of domain modeling is that lazy-loading is defeated. Think of an Edit screen: you just need the IDs of both Product and Category via ProductCategory, yet there's no way to avoid unnecessarily fetching the whole ProductCategory object with a domain model like the one above. For a good detailed explanation of why accessing the ProductId from the Product of ProductCategory unnecessarily fetches the whole ProductCategory object, read this: http://devlicio.us/blogs/anne_epstein/archive/2009/11/20/nhibernate-and-composite-keys.aspx


A mere read of the ProductId from the Product of ProductCategory makes the app unnecessarily fetch the whole ProductCategory object. This kind of problem doesn't happen in applications with no composite keys.

So this code..

public static Model LoadModel(int id)
{
    using (var session = SessionMapper.Mapper.SessionFactory.OpenSession())
    {
        var x = session.Load<Model>(id); // use the id parameter, not a hard-coded 1
        Console.WriteLine("\nHey! {0} {1}", x.ProductCategory.Product.ProductId, x.ModelDescription);
        
        return x;
    }
}

..produces this SQL:
NHibernate:
    SELECT
        model0_.ModelId as ModelId3_0_,
        model0_.ProductId as ProductId3_0_,
        model0_.CategoryId as CategoryId3_0_,
        model0_.ModelDescription as ModelDes4_3_0_
    FROM
        Model model0_
    WHERE
        model0_.ModelId=@p0;
    @p0 = 1 [Type: Int32 (0)]
NHibernate:
    SELECT
        productcat0_.ProductId as ProductId2_0_,
        productcat0_.CategoryId as CategoryId2_0_,
        productcat0_.CustomizedProductCategoryDescription as Customiz3_2_0_
    FROM
        ProductCategory productcat0_
    WHERE
        productcat0_.ProductId=@p0
        and productcat0_.CategoryId=@p1;
    @p0 = 1 [Type: Int32 (0)], @p1 = 2 [Type: Int32 (0)]

Hey! 1 Viking shoe


Not optimized. As you can see, even though we just read the ProductId, the whole ProductCategory object is also fetched by our app. It looks amateurish: we are merely accessing the ProductId, which is already available right there in the source table, yet our app still insists on loading the whole ProductCategory just to get it.


Another problem with the kind of domain model above: when we persist the Model object, the persistence code becomes convoluted:

public static string AddModel()
{
    using (var session = SessionMapper.Mapper.SessionFactory.OpenSession())
    {
        var m = new Model
        {
            ProductCategory = session.Load<ProductCategory>(
                  new ProductCategory { Product = session.Load<Product>(1), Category = session.Load<Category>(2) }),
            ModelDescription = "Bad " + DateTime.Now.ToString()
        };

        session.Save(m);
        session.Flush();

        return m.ModelId + " " +  m.ModelDescription;
    }
}


To rectify that amateurish SQL and hideous persistence code, we must isolate the composite key into its own class:
[Serializable]
public class ProductCategoryIdentifier
{
    public virtual int ProductId { get; set; }
    public virtual int CategoryId { get; set; }

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;
        var t = obj as ProductCategoryIdentifier;
        if (t == null)
            return false;
        if (ProductId == t.ProductId && CategoryId == t.CategoryId)
            return true;
        return false;
    }
    public override int GetHashCode()
    {
        return (ProductId + "|" + CategoryId).GetHashCode();
    }
}

That class will be the primitive type for the composite key of our ProductCategory:
public class ProductCategory
{
    public virtual ProductCategoryIdentifier ProductCategoryIdentifier { get; set; }
    
    // To enforce a single source of truth when creating a product category, both the Product and Category setters are protected.
    // Assigning Product and Category must be done through the composite key class, i.e., through ProductCategoryIdentifier above.

    public virtual Product Product { get; protected set; }
    public virtual Category Category { get; protected set; }
   
    public virtual string CustomizedProductCategoryDescription { get; set; }
}


class ProductCategoryMapping : ClassMapping<ProductCategory>
{
    public ProductCategoryMapping()
    {
        ComponentAsId(
            i => i.ProductCategoryIdentifier, 
            c =>
            {
                c.Property(x => x.ProductId);
                c.Property(x => x.CategoryId);
            });

        ManyToOne(x => x.Product, m =>
        {
            m.Column("ProductId");
            m.Update(false);
            m.Insert(false);                
        });

        ManyToOne(x => x.Category, m =>
        {
            m.Column("CategoryId");
            m.Update(false);
            m.Insert(false);                
        });

        Property(x => x.CustomizedProductCategoryDescription);
    }
}
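
For completeness, here's a minimal sketch of how the Model entity and its mapping might look (property names assumed from the tables above; see the next post below for why the composite many-to-one needs the Columns form):

public class Model
{
    public virtual int ModelId { get; set; }
    public virtual ProductCategory ProductCategory { get; set; }
    public virtual string ModelDescription { get; set; }
}

class ModelMapping : ClassMapping<Model>
{
    public ModelMapping()
    {
        Id(x => x.ModelId, m => m.Generator(NHibernate.Mapping.ByCode.Generators.Identity));

        // composite foreign key: one object reference, two columns
        ManyToOne(x => x.ProductCategory,
            c => c.Columns(x => x.Name("ProductId"), x => x.Name("CategoryId")));

        Property(x => x.ModelDescription);
    }
}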


This is how our app will fetch the ModelDescription property and the ProductId..
public static Model LoadModel(int id)
{
    using (var session = SessionMapper.Mapper.SessionFactory.OpenSession())
    {                
        // Read the ID from the composite key's separate class (ProductCategoryIdentifier);
        // this way the whole ProductCategory object won't be unnecessarily fetched.
        var x = session.Load<Model>(id); // use the id parameter, not a hard-coded 1
        Console.WriteLine("{0} {1}", x.ProductCategory.ProductCategoryIdentifier.ProductId, x.ModelDescription);
        

        return x;
    }        
}

..and the following is the SQL produced by that data access. As expected, no unnecessary data is fetched; ProductCategory is not loaded. Very optimized:
NHibernate:
    SELECT
        model0_.ModelId as ModelId3_0_,
        model0_.ProductId as ProductId3_0_,
        model0_.CategoryId as CategoryId3_0_,
        model0_.ModelDescription as ModelDes4_3_0_
    FROM
        Model model0_
    WHERE
        model0_.ModelId=@p0;
    @p0 = 1 [Type: Int32 (0)]

1 Viking shoe

This is how we persist the Model object when it has a composite foreign key..
public static string AddModel()
{
    using (var session = SessionMapper.Mapper.SessionFactory.OpenSession())
    {
        var m = new Model
        {
            ProductCategory = session.Load<ProductCategory>(
                 new ProductCategoryIdentifier { ProductId = 1, CategoryId = 2 }),                    
            ModelDescription = "Good " + DateTime.Now.ToString()
        };

        session.Save(m);
        session.Flush();

        return m.ModelId + " " + m.ModelDescription;
    }
}


..the code looks cleaner compared to the model without a separate class for the composite key.


It would be cleaner still if we didn't have composite keys in the database in the first place. Ah, legacy systems..!



Complete code: https://github.com/MichaelBuen/TestComposite



Happy Coding! ツ

NHibernate: Foreign key must have same number of columns as the referenced primary key

If you get this kind of error when mapping a many-to-one from child to parent, where the parent uses a composite key mapped via mapping-by-code (e.g., ComponentAsId)..


System.TypeInitializationException was unhandled
  HResult=-2146233036
  Message=The type initializer for 'GoodCompositeMapping.SessionMapper.Mapper' threw an exception.
  Source=GoodCompositeMapping
  TypeName=GoodCompositeMapping.SessionMapper.Mapper
  StackTrace:
       at GoodCompositeMapping.SessionMapper.Mapper.get_SessionFactory()
       at GoodCompositeMapping.SampleLoad.LoadModel(Int32 id) in c:\Users\Michael\Documents\GitHub\TestComposite\TestComposite\GoodCompositeMapping\SampleLoad.cs:line 21
       at TestComposite.Program.Main(String[] args) in c:\Users\Michael\Documents\GitHub\TestComposite\TestComposite\TestComposite\Program.cs:line 15
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: NHibernate.FKUnmatchingColumnsException
       HResult=-2146232832
       Message=Foreign key (FK2F6A0A3BACD3E029:Model [CategoryId])) must have same number of columns as the referenced primary key (ProductCategory [ProductId, CategoryId])
       Source=NHibernate
       StackTrace:
            at NHibernate.Cfg.Configuration.LogAndThrow(Exception exception)
            at NHibernate.Cfg.Configuration.SecondPassCompileForeignKeys(Table table, ISet done)
            at NHibernate.Cfg.Configuration.SecondPassCompile()
            at NHibernate.Cfg.Configuration.BuildSessionFactory()
            at GoodCompositeMapping.SessionMapper.Mapper.GetSessionFactory() in c:\Users\Michael\Documents\GitHub\TestComposite\TestComposite\GoodCompositeMapping\SessionMapper\Mapper.cs:line 58
            at GoodCompositeMapping.SessionMapper.Mapper..cctor() in c:\Users\Michael\Documents\GitHub\TestComposite\TestComposite\GoodCompositeMapping\SessionMapper\Mapper.cs:line 17
       InnerException:  


..chances are you are using the wrong mapping; with two separate Column calls like this, only the last column ends up mapped (note the lone [CategoryId] in the error above):

ManyToOne(x => x.ProductCategory, pm =>
    {
        pm.Column("ProductId");
        pm.Column("CategoryId");
    });

I saw a solution on Stack Overflow that works:
ManyToOne(x => x.ProductCategory,
    c => c.Columns(
        new Action<NHibernate.Mapping.ByCode.IColumnMapper>[]
        {
            x  => x.Name("ProductId"),                        
            x  => x.Name("CategoryId")
        }));

This is the shorthand:
ManyToOne(x => x.ProductCategory, c => c.Columns(x => x.Name("ProductId"), x => x.Name("CategoryId")));


Happy Coding! ツ

Wednesday, December 18, 2013

Say No to ViewData and ViewBag

We've all been there: we've got a perfect Model/ViewModel for the view…


public class HomeController : Controller
{

    public ActionResult Index()
    {

        var people = new[]
        {
            new Person { Firstname = "Ely", Lastname = "Buendia"},
            new Person { Firstname = "Raymund", Lastname = "Marasigan"},
            new Person { Firstname = "Buddy", Lastname = "Zabala"},
            new Person { Firstname = "Marcus", Lastname = "Adoro"},
        };

        return View(people);

    } // Index Action

} //Home Controller    


View:

@model IEnumerable<TestAdhocViewModel.Models.Person>

 
<table>    
@foreach (var p in Model)
{
    <tr>
        <td>@p.Firstname</td>
        <td>@p.Lastname</td>
    </tr>
}
</table>


...and then one day there's a new requirement to bring new info to the view.

You can put the new info in ViewData or ViewBag, but only reluctantly. It pains you to see the contract (Model/ViewModel) between the controller and the view violated in broad daylight: the various pieces of information now live in disparate sources, some stored in the strongly-typed Model/ViewModel, some in ViewData/ViewBag. ViewData/ViewBag mars the beauty of operating exclusively within the domain of the Model/ViewModel.
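
For illustration, the duct-tape version might look like this (BandName, borrowed from the example further below, stands for the new info):

public ActionResult Index()
{
    var people = new[]
    {
        new Person { Firstname = "Ely", Lastname = "Buendia"},
        new Person { Firstname = "Raymund", Lastname = "Marasigan"},
    };

    ViewBag.BandName = "Eraserheads"; // dynamic: no compile-time checking, no autocomplete

    return View(people);
}

..and in the view, @ViewBag.BandName; a typo there compiles fine and blows up only at runtime.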


You can serialize/deserialize anonymous types with JSON, but you will lose the benefit of autocomplete; dynamic types are not navigable. Just because you could doesn't mean you should.


So it's safe to say we all love our strongly-typed language. C#: a language so strongly-typed that we don't have to type strongly; thanks, autocomplete! :D


We can introduce a view model, but it feels too ceremonial when we just want to bring a few new pieces of info to the view. Scratch that: there's nothing ceremonial about a view model when it's done in an ad hoc way. Whether there's really a place for ad hoc constructs in a very formalized system (a system that leads to class explosion, read: a gazillion classes, which tends to be rampant in strongly-typed languages) is up for another contentious discussion; we will not go into that.


When an approach is too ceremonial (read: tedious), we tend to forgo it and reach for duct tape instead, e.g., ViewData/ViewBag or Session.


What we really don't want is a view model that is used in only one part of the system yet pollutes the global namespace; classes in the global namespace tend to give the illusion that a view model is highly reusable. We want that ad hoc view model to be local to one part of the system only, i.e., visible in that part only.


Without further ado, just declare the view model right inside the controller:

public class HomeController : Controller
{

    public ActionResult Index()
    {

        var people = new[]
            {
                new Person { Firstname = "Ely", Lastname = "Buendia"},
                new Person { Firstname = "Raymund", Lastname = "Marasigan"},
                new Person { Firstname = "Buddy", Lastname = "Zabala"},
                new Person { Firstname = "Marcus", Lastname = "Adoro"},
            };


        var band = new Band
            {
                BandName = "Eraserheads",
                Members = people
            };
        
        return View(band);

    } // Index Action


    public class Band
    {
        public string BandName { get; set; }
        public IEnumerable<Person> Members { get; set; }
    }

} //Home Controller



View:


@model TestAdhocViewModel.Controllers.HomeController.Band

 
@Model.BandName

 
<table>        
    @foreach (var p in Model.Members)
    {
        <tr>
            <td>@p.Firstname</td>
            <td>@p.Lastname</td>
        </tr>
    }
</table>


If it still feels like you violated the contract between your controller and view, since there's a big impact on your code when the shape of the object changes (i.e., the code changed to foreach (var p in Model.Members) instead of retaining the old foreach (var p in Model)), and you want a smaller change footprint, tap the features of your language. In C#, you can provide an enumerator for your class instances. Redefine your class to this:


// requires: using System.Collections; using System.Collections.Generic;
public class Band : IEnumerable<Person>
{
    public string BandName { get; set; }
    // internal: the Members property is accessible to the controller only, not visible to views
    internal IEnumerable<Person> Members { get; set; }

    public IEnumerator<Person> GetEnumerator() { return Members.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

With that, you can use the old code on foreach, i.e., you can just use the Model directly on foreach:


@model TestAdhocViewModel.Controllers.HomeController.Band

 
@Model.BandName

 
<table>        
    @foreach (var p in Model)
    {
        <tr>
            <td>@p.Firstname</td>
            <td>@p.Lastname</td>
        </tr>
    }
</table>


For a single object you can apply the same concept, just with inheritance. If you have this code..

public ViewResult Detail(int id)
{
    var p = new Person
        {
            Firstname = "Ely",
            Lastname = "Buendia"
        };
    
    return View(p);
} 

View:
@model TestAdhocViewModel.Models.Person

 
<p>Firstname: @Model.Firstname</p>
<p>Lastname: @Model.Lastname</p>


..and you want to bring just one more piece of information to the view, create a derived class inside the controller, inheriting the existing Model/ViewModel:

public class HomeController : Controller
{
    public ViewResult Detail(int id)
    {
        var p = new Person
            {
                Firstname = "Ely",
                Lastname = "Buendia"
            };
        var qp = new QualifiedPerson();
        qp.Firstname = p.Firstname;
        qp.Lastname = p.Lastname;
        qp.IsQualified = true;
        
        return View(qp);
    } 

    public class QualifiedPerson : Person
    {
        public bool IsQualified { get; set; }
    }
    
} //Home Controller

This is now your view, with minimal modification; just change the type in the @model directive:


@model TestAdhocViewModel.Controllers.HomeController.QualifiedPerson
 
<p>Firstname: @Model.Firstname</p>  @* Look Ma! No Changes! *@
<p>Lastname: @Model.Lastname</p> @* Look Pa! No Changes! *@
<p>Qualified: @Model.IsQualified</p>

Assigning each value from the base class to the derived class is tedious though. Just use the awesome ValueInjecter, a.k.a. Omu ValueInjecter:



public class HomeController : Controller
{
    public ViewResult Detail(int id)
    {
        var p = new Person
            {
                Firstname = "Ely",
                Lastname = "Buendia"
            };
        var qp = new QualifiedPerson();
        qp.InjectFrom(p); // extension method from Omu.ValueInjecter
 
        qp.IsQualified = true;

        return View(qp);
    } 

    public class QualifiedPerson : Person
    {
        public bool IsQualified { get; set; }
    }
    
} //Home Controller


Then just use the InjectFrom extension method to copy the values from the base class to the derived class.


Another beauty of placing the new information in its own ad hoc model/viewmodel: when the model/viewmodel stops feeling ad hoc (e.g., it becomes reusable, or it merges multiple model/viewmodel service calls into one), it's easy to move it to the global namespace (e.g., to WCF), and doing so incurs only minimal adjustments to your controller and view. Contrast that with passing related information to the view in a non-cohesive manner, i.e., some of it in the model/viewmodel and some in ViewData/ViewBag; refactoring that later will require far more adjustments to your code. Use ViewData and ViewBag sparingly.


Can we say goodbye to ViewData and ViewBag now?


Happy Coding! ツ

Sunday, December 8, 2013

Pragmatic Domain-Driven Design

There's DDD, and then there's pragmatic DDD. When we say pragmatic, we mean the application must not sacrifice performance.


You want to get the count of a person's hobbies, and you want your code to be performant.

public class Person
{
    public virtual int PersonId { get; set; }    
    public virtual string Name { get; set; }
    
    public virtual IList<Hobby> Hobbies { get; set; }
}


This is not DDD. DDD must encapsulate; if we access the Hobbies count directly, we "can't" add any further condition to it (say, counting only the active hobbies) and still expect the code to be performant; see further comments below.

var p = session.Load<Person>(1);
Console.WriteLine("Count: {0}", p.Hobbies.Count());


This is DDD: intelligence is encapsulated by the domain model. And this is also performant; the list is not eagerly loaded, and the count is performed directly by the database.

public virtual int FavoriteHobbiesCount
{
    get
    {
        // Thanks Extra Lazy! This is on PersonMapping:
        //    rel.Lazy(NHibernate.Mapping.ByCode.CollectionLazy.Extra);

        // With Extra Lazy, counting will be performed at the database-side instead of counting the in-memory objects
        return this.Hobbies.Count();
    }
}

// On Main()
var p = session.Load<Person>(1);
var count = p.FavoriteHobbiesCount;
Console.WriteLine("Count: {0}", count);    

Output:


NHibernate:
    SELECT
        person0_.person_id as person1_0_0_,
        person0_.first_name as first2_0_0_,
        person0_.last_name as last3_0_0_
    FROM
        person person0_
    WHERE
        person0_.person_id=:p0;
    :p0 = 1 [Type: Int32 (0)]
NHibernate:
    SELECT
        count(favorite_hobby_id)
    FROM
        favorite_hobby
    WHERE
        person_id=:p0;
    :p0 = 1 [Type: Int32 (0)]
Count: 10
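
For reference, the Extra Lazy setting mentioned in the comments might be configured like this minimal sketch (table and column names assumed from the SQL above):

class PersonMapping : ClassMapping<Person>
{
    public PersonMapping()
    {
        Table("person");
        Id(x => x.PersonId, m => m.Column("person_id"));
        Property(x => x.Name);

        Bag(x => x.Hobbies,
            rel =>
            {
                rel.Table("favorite_hobby");
                rel.Key(k => k.Column("person_id"));

                // Extra Lazy: Count() is translated to a select count(*)
                // instead of loading the collection items into memory
                rel.Lazy(NHibernate.Mapping.ByCode.CollectionLazy.Extra);
            },
            map => map.OneToMany());
    }
}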





However, that code is not future-proof; Extra Lazy won't work efficiently when you add a condition on the list, i.e., the collection will be eagerly loaded once a condition is added:

public virtual int FavoriteHobbiesCount
{
    get
    {
        // Thanks Extra Lazy! This is on PersonMapping
        //     rel.Lazy(NHibernate.Mapping.ByCode.CollectionLazy.Extra);

        // Hobbies' items will be eagerly loaded when we add a condition on the Count, even when we use Extra Lazy
        return this.Hobbies.Count(x => x.IsActive); 
    }
}

// On Main()
var p = session.Load<Person>(1);
var count = p.FavoriteHobbiesCount;
Console.WriteLine("Count: {0}", count);    

Output:

NHibernate:
    SELECT
        person0_.person_id as person1_0_0_,
        person0_.first_name as first2_0_0_,
        person0_.last_name as last3_0_0_
    FROM
        person person0_
    WHERE
        person0_.person_id=:p0;
    :p0 = 1 [Type: Int32 (0)]
NHibernate:
    SELECT
        favoriteho0_.person_id as person2_1_,
        favoriteho0_.favorite_hobby_id as favorite1_1_,
        favoriteho0_.favorite_hobby_id as favorite1_1_0_,
        favoriteho0_.person_id as person2_1_0_,
        favoriteho0_.hobby as hobby1_0_,
        favoriteho0_.is_active as is4_1_0_
    FROM
        favorite_hobby favoriteho0_
    WHERE
        favoriteho0_.person_id=:p0;
    :p0 = 1 [Type: Int32 (0)]
Count: 9


The Count(x => x.IsActive) happens on the application side instead of being run on the database.


To fix the slow performance, we must query the database directly. Pass an IQueryable to Person:


public virtual int GetFavoriteActiveHobbiesCountFromQueryable(IQueryable<FavoriteHobby> fh)
{            
    return fh.Count(x => x.Person == this && x.IsActive);            
}

    
// On Main()
var p = session.Load<Person>(1);
var count = p.GetFavoriteActiveHobbiesCountFromQueryable(session.Query<FavoriteHobby>()); 
Console.WriteLine("Count: {0}", count);    


Output:

NHibernate:
    SELECT
        person0_.person_id as person1_0_0_,
        person0_.first_name as first2_0_0_,
        person0_.last_name as last3_0_0_
    FROM
        person person0_
    WHERE
        person0_.person_id=:p0;
    :p0 = 1 [Type: Int32 (0)]
NHibernate:
    select
        cast(count(*) as int4) as col_0_0_
    from
        favorite_hobby favoriteho0_
    where
        favoriteho0_.person_id=:p0
        and favoriteho0_.is_active=TRUE;
    :p0 = 1 [Type: Int32 (0)]
Count: 9


However, you'll notice that even though we don't access any property of Person, the Person model is still eagerly loaded from the database. NHibernate eagerly loads the model when we access any of its properties/methods, whether mapped or unmapped.


To really fix that slow performance, move the model's behavior to an extension method:

public static class PersonMethodsWithPerformanceConcerns
{
    public static int GetActiveFavoriteHobbies(this Person p, IQueryable<FavoriteHobby> fh)
    {            
        Console.WriteLine("Extension method version");
        return fh.Count(x => x.Person == p && x.IsActive);
    }
}


// On Main()
var p = session.Load<Person>(1);
var count = p.GetActiveFavoriteHobbies(session.Query<FavoriteHobby>()); 
Console.WriteLine("Count: {0}", count);

Output:

Extension method version
NHibernate:
    select
        cast(count(*) as int4) as col_0_0_
    from
        favorite_hobby favoriteho0_
    where
        favoriteho0_.person_id=:p0
        and favoriteho0_.is_active=TRUE;
    :p0 = 1 [Type: Int32 (0)]
Count: 9


That's it; performance must not be sacrificed on the altar of DDD.


Full code: https://github.com/MichaelBuen/TestAggregate


Update 2018-May-20

On NHibernate 5, the collection is no longer eagerly loaded when you add a condition on it; a real database query is performed instead. Prior to 5, this:

public virtual int FavoriteHobbiesCount
{
    get
    {
        // Thanks Extra Lazy! This is on PersonMapping
        //     rel.Lazy(NHibernate.Mapping.ByCode.CollectionLazy.Extra);

        // Hobbies' items will be eagerly loaded when we add a condition on the Count, even when we use Extra Lazy
        return this.Hobbies.Count(x => x.IsActive); 
    }
}

will run this query and perform the Count in the application instead:
NHibernate:
    SELECT
        favoriteho0_.person_id as person2_1_,
        favoriteho0_.favorite_hobby_id as favorite1_1_,
        favoriteho0_.favorite_hobby_id as favorite1_1_0_,
        favoriteho0_.person_id as person2_1_0_,
        favoriteho0_.hobby as hobby1_0_,
        favoriteho0_.is_active as is4_1_0_
    FROM
        favorite_hobby favoriteho0_
    WHERE
        favoriteho0_.person_id=:p0;
    :p0 = 1 [Type: Int32 (0)]


With NHibernate 5, just add AsQueryable() to the entity's collection property, and NHibernate will happily run the query on the database instead, even when there's a condition on the collection's query.


public virtual int FavoriteHobbiesCount
{
    get
    {
        // Hobbies' items will not be eagerly loaded anymore on NHibernate 5, even when we add a condition on the Count.
        return this.Hobbies.AsQueryable().Count(x => x.IsActive); 
    }
}

The resulting query looks like this:
NHibernate:
    SELECT
        cast(count(*) as int4) as col_0_0_
    FROM
        favorite_hobby favoriteho0_
    WHERE
        favoriteho0_.person_id=:p0
        and favoriteho0_.is_active=:p1;
    :p0 = 1 [Type: Int32 (0)], :p1 = true [Type: Boolean]


Happy Computing! ツ

Thursday, December 5, 2013

Let the API do the heavy lifting

Instead of concatenating directory and filename manually:

string dir = @"C:\Windows";
string filename = "notepad.exe";
Console.WriteLine(dir + @"\" + filename);


Use Path.Combine:

string dir = @"C:\Windows";
string filename = "notepad.exe";
Console.WriteLine(Path.Combine(dir, filename));


Output: C:\Windows\notepad.exe


Aside from taking care of the nuances of platform differences (slash for *nix-based systems, e.g., Mono on Linux; backslash for Windows), it also takes care of a trailing slash/backslash on the directory.


The output of the following is still: C:\Windows\notepad.exe

string dir = @"C:\Windows\";
string filename = "notepad.exe";
Console.WriteLine(Path.Combine(dir, filename));
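
Path.Combine also has overloads that take three, four, or a params array of parts (since .NET 4), handy for nested directories:

string root = @"C:\Windows";
Console.WriteLine(Path.Combine(root, "System32", "drivers", "etc"));

Output: C:\Windows\System32\drivers\etc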


C# on Unix: http://ideone.com/rTACSJ

C# on Windows: http://dotnetfiddle.net/kxVNW5



Happy Coding! ツ

Saturday, November 30, 2013

Typical NHibernate SessionFactory boilerplate

using DomainMapping.Mappings;
using NHibernate.Cfg;

using System.Linq;
using System.Collections.Generic;



namespace DomainMapping
{
    public static class Mapper
    {


        static NHibernate.ISessionFactory _sessionFactory = Mapper.BuildSessionFactory();


        // call this on production
        public static NHibernate.ISessionFactory SessionFactory
        {
            get { return _sessionFactory; }
        }


        // Call this on unit testing, so we can test caching on each test method independently
        public static NHibernate.ISessionFactory BuildSessionFactory(bool useUnitTest = false)
        {
            var mapper = new NHibernate.Mapping.ByCode.ModelMapper();

            mapper.AddMappings
                (
                    typeof(Mapper).Assembly.GetTypes()
                          // x.BaseType can be null (e.g., for interfaces), guard against a NullReferenceException
                          .Where( x => x.BaseType != null
                                       && x.BaseType.IsGenericType
                                       && x.BaseType.GetGenericTypeDefinition() == typeof(NHibernate.Mapping.ByCode.Conformist.ClassMapping<>)) 
                );

            // Or you can manually add the mappings
            // mapper.AddMappings(new[]
            //    {
            //        typeof(PersonMapping)
            //    });


            var cfg = new NHibernate.Cfg.Configuration();

            // .DatabaseIntegration! Y U EXTENSION METHOD?
            cfg.DataBaseIntegration(c =>
            {
                // or read the connection string from config instead of hard-coding it below:
                // var cs = System.Configuration.ConfigurationManager.ConnectionStrings["TheJunBetConnection"].ConnectionString;

                // SQL Server
                c.Driver<NHibernate.Driver.SqlClientDriver>();
                c.Dialect<NHibernate.Dialect.MsSql2008Dialect>();
                c.ConnectionString = "Server=localhost;Database=TestTheDatabase;Trusted_Connection=True;MultipleActiveResultSets=True";

                // // PostgreSQL
                // c.Driver<NHibernate.Driver.NpgsqlDriver>();
                // c.Dialect<NHibernate.Dialect.PostgreSQLDialect>();
                // c.ConnectionString = "Server=localhost; Database=test_the_database; User=postgres; password=opensesame";                


                if (useUnitTest)
                {
                    c.LogSqlInConsole = true;
                    c.LogFormattedSql = true;
                }
            });

            var domainMapping = mapper.CompileMappingForAllExplicitlyAddedEntities();


            // // AsString is an extension method from NHibernate.Mapping.ByCode:
            // string mappingXml = domainMapping.AsString();

            // // Life without Resharper. 
            // string mappingXml = NHibernate.Mapping.ByCode.MappingsExtensions.AsString(domainMapping);

            // System.Console.WriteLine(mappingXml);


            cfg.AddMapping(domainMapping);


            // http://www.ienablemuch.com/2013/06/multilingual-and-caching-on-nhibernate.html
            //var filterDef = new NHibernate.Engine.FilterDefinition("lf", /*default condition*/ null,
            //                                                       new Dictionary<string, NHibernate.Type.IType>
            //                                                           {
            //                                                               { "LanguageCultureCode", NHibernate.NHibernateUtil.String}
            //                                                           }, useManyToOne: false);
            //cfg.AddFilterDefinition(filterDef);



            cfg.Cache(x =>
            {
                // SysCache is not stable on unit testing
                if (!useUnitTest)
                {
                    x.Provider<NHibernate.Caches.SysCache.SysCacheProvider>();

                    // I don't know why SysCacheProvider is not stable on simultaneous unit testing;
                    // it might be that SysCacheProvider serves just one underlying cache, so simultaneous tests see each other's caches.
                    // This solution doesn't work: http://stackoverflow.com/questions/700043/mstest-executing-all-my-tests-simultaneously-breaks-tests-what-to-do                    
                }
                else
                {
                    // This is more stable in unit testing
                    x.Provider<NHibernate.Cache.HashtableCacheProvider>();
                }





                // http://stackoverflow.com/questions/2365234/how-does-query-caching-improves-performance-in-nhibernate

                // Need to be explicitly turned on so the .Cacheable directive on Linq will work:                    
                x.UseQueryCache = true;
            });


            if (useUnitTest)
                cfg.SetInterceptor(new NHSqlInterceptor());

            var sf = cfg.BuildSessionFactory();




            //using (var file = new System.IO.FileStream(@"c:\x\ddl.txt",
            //       System.IO.FileMode.Create,
            //       System.IO.FileAccess.ReadWrite))
            //using (var sw = new System.IO.StreamWriter(file))
            //{
            //    new NHibernate.Tool.hbm2ddl.SchemaUpdate(cfg)
            //        .Execute(scriptAction: sw.Write, doUpdate: false);
            //}

            return sf;
        }

        class NHSqlInterceptor : NHibernate.EmptyInterceptor
        {
            // http://stackoverflow.com/questions/2134565/how-to-configure-fluent-nhibernate-to-output-queries-to-trace-or-debug-instead-o
            public override NHibernate.SqlCommand.SqlString OnPrepareStatement(NHibernate.SqlCommand.SqlString sql)
            {

                Mapper.NHibernateSQL = sql.ToString();
                return sql;
            }

        }

        public static string NHibernateSQL { get; set; }


    } // Mapper



}
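
Usage then boils down to this (a sketch; Person is an assumed mapped entity):

// production: reuse the cached session factory
using (var session = DomainMapping.Mapper.SessionFactory.OpenSession())
{
    var person = session.Get<Person>(1);
}

// unit tests: build an isolated factory, with SQL logged to the console
var testFactory = DomainMapping.Mapper.BuildSessionFactory(useUnitTest: true);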



Happy Coding!

Friday, November 22, 2013

Add a condition on joined entities in NHibernate

This query works, but the cross-cutting concern (e.g., localization) is cluttering the main Where clause:

var x = 
    (from q in 
        from ps in session.Query<GetProductSold>()
        join pl in session.Query<ProductLanguage>() on ps.ProductId equals pl.ProductLanguageCompositeKey.ProductId
        select new { ps, pl }
    where q.pl.ProductLanguageCompositeKey.LanguageCode == languageCode
    select q).Cacheable();


We could convert it to the following; however, NHibernate doesn't support this yet:
var x = 
    (from q in 
         from ps in session.Query<GetProductSold>()
         join pl in session.Query<ProductLanguage>().Where(plc => plc.ProductLanguageCompositeKey.LanguageCode == languageCode) on ps.ProductId equals pl.ProductLanguageCompositeKey.ProductId
         select new { ps, pl }                         
     select q).Cacheable();


We have to do it this way, folding the localization condition into the composite join key:
var x =
    (from q in
         from ps in session.Query<GetProductSold>()
         join pl in session.Query<ProductLanguage>() on new { ps.ProductId, LanguageCode = languageCode } equals new { pl.ProductLanguageCompositeKey.ProductId, pl.ProductLanguageCompositeKey.LanguageCode }
         select new { ps, pl }                         
     select q).Cacheable();

Saturday, November 2, 2013

Managed code success stories

If you think managed code is slow, think twice.

C++:
1 million lines of code
40 developers
15 years
output: 10k trades per second

Managed code:
6k lines of code*
1 developer
3 months
output: 200k trades per second

http://stackoverflow.com/questions/4257659/c-sharp-versus-c-performance/19505716#19505716

* Wow! Only 6k lines of F# code (nope, I'm not cursing lol); my thesis is almost that size. The realization that 6k lines of code could deliver a world-class solution to a world-class problem, given the right tool, just humbled the C++ programmer in me; likewise my C# skills too.


Thoughts on managed languages vs C++, regarding the above success of managed code:

https://twitter.com/jonharrop/status/392415184045633536


Of course there are success stories for C++ too:

http://www.computerworlduk.com/news/open-source/3260727/london-stock-exchange-in-historic-linux-go-live/

http://www.computerworlduk.com/in-depth/open-source/3246835/london-stock-exchange-linux-record-breaking-system-faces-new-challengers/


Happy Computing! ツ

Monday, October 7, 2013

Thanks SQL Server! My Bonus is Now Higher* than My Salary

Given these two functions:

create function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go
 
create function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.*
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

The output of this query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId




Seeing that it's best to present the salary before the bonus, we move Salary right after EmployeePersonID:

alter function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Salary, e.Bonus
    from Employee e
go

Then you go home, not even bothering to check the output since it's a trivial fix anyway. On the 15th of the month you receive your salary and check your pay slip. You are delighted that they made your bonus equal to your salary; then your jaw drops: you didn't receive any salary. Upon investigating the cause of that anomaly, you find the erring query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId




It yields wrong output!


Whoa, holy guacamole! Why don't I have any salary? It's as if the Bonus and Salary swapped contents. Yeah right, they really did. The presentation concern you tried to fix, by placing Salary right after EmployeePersonId, has an error: the salary is now slotted into the bonus field, and the bonus into the salary field.


Checking whether your eyes are fooling you, you try explicitly selecting all the columns:

select top 5 es.FirstName, es.LastName, es.EmployeePersonId, es.Salary, es.Bonus from EmployeeSummary() es order by es.EmployeePersonId

But you still don't have any salary.




You also check whether the function you modified returns correct information:

select top 5 * from EmployeeMoolahs() em order by em.EmployeePersonId

It has correct output:




Then you wonder: what causes the error in the EmployeeSummary function? What makes the query wild? Why is it not returning proper information? Hmm.. wild? Maybe it's the wild…card; there's something wild about the wildcard! Got to remove the wildcard from EmployeeSummary(), currently defined as:

create function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.*
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

Change that to explicit form:

alter function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.EmployeePersonId, em.Salary, em.Bonus
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

Then you check the result of this query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId

It's correct now!



To check whether the fix is just a fluke, we swap the order of Bonus and Salary in the source function back to their original order:

alter function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go

Re-check again:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId

Below is the result; the columns' information is still correct! The order of fields won't change, as we are selecting from EmployeeSummary, which now selects its columns explicitly. What is important is that the information is always correct; the ordering of columns should be done in EmployeeSummary, not in EmployeeMoolahs.





Let's investigate the root cause of the error. If you have a wildcard in your function's query, SQL Server builds metadata for the function's returned fields at creation time, as if it invisibly burns that metadata into the function:

create function EmployeeMoolahs()
returns table
-- burn these metadata...
(
    EmployeePersonID int,
    Bonus money,
    Salary money
)
-- ...burn
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go
 
create function EmployeeSummary()
returns table
-- burn metadata
(
    FirstName nvarchar(max),
    LastName nvarchar(max),
          -- burn these metadata based on the metadata obtained from EmployeeMoolah's wildcard
    EmployeePersonId int,   -- 1
    Bonus money,            -- 2
    Salary money            -- 3
)
-- ...burn
as
return
    select p.FirstName, p.LastName, em.*
     
          -- At the time this EmployeeSummary function is created, this is the order of fields on
         -- EmployeeMoolah()'s wildcard:
     
          -- 1. EmployeePersonId
          -- 2. Bonus
          -- 3. Salary
     
     
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

In fact, that is how other RDBMSes do it, i.e., you need to explicitly declare the metadata of your returned fields. Application developers using other RDBMSes avoid the wildcard problem because they can see that the wildcard could one day fail to slot properly into the correct column metadata, so they tend to select each column explicitly and to avoid wildcards in functions. This kind of problem is virtually non-existent for them.


So in SQL Server, if we change the order of the fields in the source function (EmployeeMoolahs) of the dependent function (EmployeeSummary), e.g., swap the positions of Bonus and Salary so Salary comes before Bonus in EmployeeMoolahs, EmployeeSummary's metadata will not be updated accordingly. That is, EmployeeSummary's metadata positions stay the same, i.e., Bonus still comes before Salary. Hence when we issue a query on the wildcard-bearing EmployeeSummary function, the wildcard falls into the wrong slots:


EmployeeSummary()'s metadata (not automatically updated):        EmployeePersonId    Bonus     Salary
EmployeeMoolahs()' expanded wildcard (from its swapped columns): EmployeePersonId    Salary    Bonus

So that's how the wildcard can cause problems in SQL Server. Try to imagine what fields would be good to slot into the salary :P
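
As an aside, SQL Server can re-burn a module's metadata on demand via sp_refreshsqlmodule, which refreshes the metadata of a non-schema-bound function or view; a sketch, assuming the dbo schema:

-- refresh EmployeeSummary's burned metadata after EmployeeMoolahs changed shape
exec sp_refreshsqlmodule 'dbo.EmployeeSummary';

Still, the safer habit is to select columns explicitly and avoid the wildcard in functions altogether.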


Think of the damage that can be caused by wrong information slotted by the wildcard: IsHired goes to IsAdmin, OriginalPrice goes to CurrentPrice, PassingScore goes to Score, etc.


* It can happen; try reproducing the problem above :P


Sample data:
create table Person
(
 PersonId int identity(1,1) primary key,
 FirstName nvarchar(max) not null,
 LastName nvarchar(max) not null
);

create table Employee
(
 EmployeePersonId int not null primary key references Person(PersonId),
 Salary money not null,
 Bonus money not null default(0)
);

insert into Person(FirstName, LastName) values
('John', 'Lennon'),
('Paul', 'McCartney'),
('George', 'Harrison'),
('Ringo', 'Starr');


insert into Employee(EmployeePersonId, Salary, Bonus) values
(1, 60000, 20000),
(2, 60000, 0),
(3, 40000, 0),
(4, 40000, 0);




Happy Computing! ツ

Saturday, October 5, 2013

Variable-based Table-Valued Function vs Inline Table-Valued Function

Variable-based table-valued functions, a.k.a. multistatement table-valued functions, are performance killers; for a good explanation, see the answer here: http://stackoverflow.com/questions/2554333/multi-statement-table-valued-function-vs-inline-table-valued-function

Here's a sample variable-based table-valued function:

create function getSamplePersonOrgsViaVariableBasedTableValuedFunction()
returns @t table
(
    BusinessEntityID int primary key clustered,
    OrgUnits nvarchar(max)
)
as
begin
    insert into @t(BusinessEntityID, OrgUnits)
    select p.BusinessEntityID, eorg.orgUnits
    from Person.Person p
    cross apply
    (
        select 'blah'
    ) eorg(orgUnits);
     
    return;
     
end;
go


Here's a sample inline table-valued function:

create function getSamplePersonOrgsViaInlineTableValuedFunction() returns table
as
return
    select p.BusinessEntityID, eorg.orgUnits
    from Person.Person p
    cross apply
    (
        select 'blah'
    ) eorg(orgUnits)
go


Fifty invocations of this sample query on the inline table-valued function took only 1 second:

SELECT p.BusinessEntityID, p.FirstName, p.LastName, gsp.orgUnits
FROM Person.Person p
join getSamplePersonOrgsViaInlineTableValuedFunction() gsp on p.BusinessEntityID = gsp.BusinessEntityID
where p.BusinessEntityID < 9
go 50


Contrast that efficiency with the variable-based table-valued function, which took 9 seconds:

SELECT p.BusinessEntityID, p.FirstName, p.LastName, gsp.orgUnits
FROM Person.Person p
join getSamplePersonOrgsViaVariableBasedTableValuedFunction() gsp on p.BusinessEntityID = gsp.BusinessEntityID
where p.BusinessEntityID < 9
go 50

Eager loading is the root cause of the performance problem of the variable-based table-valued function: even though we only need 8 rows from the source table (p.BusinessEntityID < 9), when the source table is joined to the variable-based table-valued function, it has to wait for the function's entire result (19,972 rows) before the join is finally performed.


An inline table-valued function is very efficient and smart; SQL Server treats it like a view, i.e., the function's query is expanded to the actual tables when joined to another table, behaving exactly like joining a table to a table-deriving query / view / CTE. Hence when the inline table-valued function is joined to a source table and you fetch only 8 rows from the source (p.BusinessEntityID < 9), the query also fetches only 8 rows through the inline table-valued function.


With a variable-based table-valued function, the function's result is the one that gets expanded and put into another table bucket (the table variable), causing the performance problem: when we have 8 rows and join them to a variable-based table-valued function, we are joining the 8 rows to the eagerly-loaded 19,972 rows.


To illustrate the efficiency of the inline table-valued function, let's cause a divide-by-zero when the query encounters person ID number 9:

alter function getSamplePersonOrgsViaInlineTableValuedFunction() returns table
as
return
    select p.BusinessEntityId, eorg.orgUnits
    from Person.Person p
    cross apply
    (
        select 'blah ' + convert(varchar,case when p.BusinessEntityId = 9 then 1 / 0 else 7 end)
    ) eorg(orgUnits)
go
 
alter function getSamplePersonOrgsViaVariableBasedTableValuedFunction()
returns @t table
(
    BusinessEntityId int primary key clustered,
    OrgUnits nvarchar(max)
)
as
begin
    insert into @t(BusinessEntityId, OrgUnits)
    select p.BusinessEntityId, eorg.orgUnits
    from Person.Person p
    cross apply
    (
        select 'blah ' + convert(varchar,case when p.BusinessEntityId = 9 then 1 / 0 else 7 end)
    ) eorg(orgUnits);
     
    return;
     
end;
go


This query will not cause a divide-by-zero error on the inline table-valued function, proof that the function doesn't fetch organization units past BusinessEntityID number 8:

SELECT p.BusinessEntityID, p.FirstName, p.LastName, gsp.orgUnits
FROM Person.Person p
join getSamplePersonOrgsViaInlineTableValuedFunction() gsp on p.BusinessEntityID = gsp.BusinessEntityID
where p.BusinessEntityID < 9


This query gets a divide-by-zero error though, proof that all 19,972 rows of the variable-based table-valued function are fetched first before it is joined to:

SELECT p.BusinessEntityID, p.FirstName, p.LastName, gsp.orgUnits
FROM Person.Person p
join getSamplePersonOrgsViaVariableBasedTableValuedFunction() gsp on p.BusinessEntityID = gsp.BusinessEntityID
where p.BusinessEntityID < 9




Happy Computing! ツ

Wednesday, October 2, 2013

Stored Procedure Name + F12 + Tada!

Repartee is something we think of twenty-four hours too late



Someone made an astute observation that the ap + tab + stored proc name + tab + tada combo shortcut (the tada is not included, you have to utter it yourself :p) has a problem scripting the content, i.e., it strips out the comments at the top of the code.



I should have replied that it's okay to remove those comments; they are just a pile of names of the original developer(s) tacked on one after another, too tacky :p Those comments do not in any way help a new dev decipher the intent of the code. Hmm.. but they have a use: they help others in their witch-hunt for the culprit(s) of erring code heheh



But I would not give an answer that strips others of their right to maintain comments on top of the code, or any preference for that matter; cut the drama :P



Without further ado, this is the new shortcut:

exec + stored proc name + F12



RedGate will script the whole contents of the stored procedure back into a new query editor, i.e., RedGate will also include the comments on top of the stored proc.



Other shortcuts:

For view, type: ssf + tab + view name + F12

For function, type: ssf + tab + function name + backspace + F12



An observation: there's no need to type EXEC or SELECT * FROM; just type the stored proc / view / function name directly, then press F12. Sometimes RedGate is flaky though, so type EXEC or ssf anyway :-)





Happy Computing! ツ

Thursday, September 26, 2013

Debunking the myth that CTE is just a cursor

Some devs with a fervent belief that a CTE is just a cursor got me worked up.


Hearing that broken theory from them, I decided to create a proof-of-concept that would prove otherwise. As we only have a limited number of keystrokes left in our hands, I decided to practice first at play.typeracer.com; being able to type fast is certainly a win when one is worked up enough to quickly prove something by quickly making a proof-of-concept and quickly blogging it afterwards, all in one sitting. That's how seriously worked up I am to prove something lol


Heheh, scratch that; I already made an analysis and proof an eon ago that a CTE is not slow.


A CTE is just an inline view, which in turn is just a table-deriving query; blogged here: http://www.ienablemuch.com/2013/07/debunking-myth-that-cte-is-slow.html


A CTE is not slow; not understanding the intent and how the query works is what makes a query slow. Blogged here: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html



Please do note that a **recursive** CTE is executed sequentially, hence the "cursor-like" belief of some devs; however, the recursive CTE's sequential steps are not done via a cursor. The CTE's loop is implemented in C++, which makes the CTE faster, while a CURSOR's loop runs in T-SQL, and T-SQL loops are slow; even a C# loop is an order of magnitude faster than a T-SQL loop. Push the loop down to C++, i.e., push the loop into CTEs. For a non-recursive CTE, rest assured, the query still operates in a set-based manner, not in a cursor-like manner.
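
For reference, a recursive CTE like this classic number generator runs its sequential steps inside the engine, not in interpreted T-SQL (a minimal sketch):

with Numbers(n) as
(
    select 1
    union all
    select n + 1 from Numbers where n < 100
)
select n from Numbers;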



Time to stop spewing the nonsense that a CTE is just a cursor.



Happy Coding!

Monday, September 16, 2013

Using Partial Index On SQL Server To Enforce Uniqueness On Denormalized Design

In my last post, I showed how, using a partial index (a.k.a. filtered index), we can still enforce a unique constraint even when soft deletion is the chosen mechanism for deleting records.


We will again put a unique filtered index to good use. Given this table:

create table PeriodReviewers
(
     PeriodId int not null references Period(PeriodId),
     RevieweePersonId int not null references Person(PersonId),
     ReviewerPersonId int not null references Person(PersonId),

     IsPrimaryReviewer bit not null default(0),

     constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId)
);


These are all valid data:

PeriodId    RevieweePersonId    ReviewerPersonId    IsPrimaryReviewer
1           Marcus              Buddy               1
1           Marcus              Raymund             0
1           Marcus              Ely                 0
1           Buddy               Raymund             1
1           Buddy               Ely                 0
1           Raymund             Ely                 1

Invalid data such as the following can be blocked:

PeriodId    RevieweePersonId    ReviewerPersonId    IsPrimaryReviewer
1           Marcus              Buddy               1
1           Marcus              Raymund             0   
1           Marcus              Raymund             0   <-- this is blocked, two Raymunds
1           Marcus              Ely                 0
1           Buddy               Raymund             1
1           Buddy               Ely                 0
1           Raymund             Ely                 1

However the above unique constraint can't defend the system on invalid data such as the following:

PeriodId  RevieweePersonId    ReviewerPersonId    IsPrimaryReviewer
1           Marcus            Buddy               1
1           Marcus            Raymund             1 <-- should error, only one primary reviewer per person
1           Marcus            Ely                 0
1           Buddy             Raymund             1
1           Buddy             Ely                 0
1           Raymund           Ely                 1

There should be only one primary reviewer for each reviewee, but since our unique constraint is applied on three fields (PeriodId + RevieweePersonId + ReviewerPersonId), invalid data such as the above could creep into our system.


This is the ideal solution:

create table PeriodReviewers
(
     PeriodId int not null references Period(PeriodId),
     RevieweePersonId int not null references Person(PersonId),
     ReviewerPersonId int not null references Person(PersonId),

     constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId)
);
 
create table PeriodPrimaryReviewers
(
     PeriodId int not null references Period(PeriodId),
     RevieweePersonId int not null references Person(PersonId),

     ReviewerPersonId int not null references Person(PersonId),

     constraint pk_PeriodPrimaryReviewers primary key(PeriodId, RevieweePersonId)
);

That's a fully normalized design. The advantage of normalization is that we save space on redundant data. In our business case there can be only one primary reviewer per person in a given period; say we have 20 reviewers for a person in a period, then the other 19 reviewers, who are not primary reviewers and whose IsPrimaryReviewer fields are all set to false, are just wasting space with those 19 fields, so IsPrimaryReviewer should be removed from the table. To normalize, just create a table that maintains each person's primary reviewer for a given period, as illustrated in the DDL above.


Normalize until it hurts, denormalize until it works -- Jeff Atwood

However, fully normalizing the database might not be the norm, nor even an option, so a denormalized schema like the first one shown in this article might be present in your system. As we all know, a denormalized design can be faster for queries. For example, if we want to list all of a person's reviewers and indicate whether each one is the person's primary reviewer, it's just this simple with the denormalized table:


select
    
    PeriodId, 
    RevieweePersonId, 
    ReviewerPersonId, 
    IsPrimaryReviewer
    
from PeriodReviewers


Now, to do that on fully normalized tables:

select 

    emp.PeriodId, 
    emp.RevieweePersonId, 
    emp.ReviewerPersonId, 
    emp.IsPrimaryReviewer = 
        case when pri.ReviewerPersonId is not null then
            convert(bit, 1)
        else
            convert(bit, 0)
        end
        
from PeriodReviewers emp
left join PeriodPrimaryReviewers pri 
on  emp.PeriodId = pri.PeriodId
    and emp.RevieweePersonId = pri.RevieweePersonId
    and emp.ReviewerPersonId = pri.ReviewerPersonId


With a fully normalized design, queries could become slower; the denormalized table makes for faster queries, as there's no need to join tables. However, a denormalized design can cause data integrity problems: a concurrent update or an improperly implemented application could let bad data slip in. So in our denormalized design, this bad data could slip in:


PeriodId    RevieweePersonId    ReviewerPersonId    IsPrimaryReviewer
1           Marcus              Buddy               1
1           Marcus              Raymund             1
1           Marcus              Ely                 0
1           Buddy               Raymund             1
1           Buddy               Ely                 0
1           Raymund             Ely                 1


As we can see, Marcus now has two primary reviewers. This kind of error doesn't happen with a normalized database design. But if the denormalized design is already in place, it could be far too costly to redesign the database, and it's not guaranteed that fully normalizing the design won't hurt query performance. So how can we prevent bad data such as the above from happening if we keep the denormalized design?

Enter the partial index, er.. filtered index. With SQL Server's filtered indexes (available since SQL Server 2008), we can create a unique filtered index to prevent two or more primary reviewers per reviewee.


Note that the following is still allowed even after we add the unique filtered index; it is a valid business case.

PeriodId    RevieweePersonId    ReviewerPersonId    IsPrimaryReviewer
1           Marcus              Buddy               1
1           Marcus              Ely                 0
1           Marcus              Robin               0


However, adding another reviewer for the same person in the same period whose IsPrimaryReviewer field is also set to true should be blocked by the database.

PeriodId    RevieweePersonId    ReviewerPersonId    IsPrimaryReviewer
1           Marcus              Buddy               1
1           Marcus              Ely                 0
1           Marcus              Robin               0
1           Marcus              Raymund             1   <-- adding this fourth row will not be allowed by the database.


To enforce the desired behavior above, we just need to add a unique index on the primary reviewers only, hence the filtered index nomenclature; most other RDBMSes call this a partial index. This will be the table definition:

create table PeriodReviewers
(
     PeriodId int not null references Period(PeriodId),
     RevieweePersonId int not null references Person(PersonId),
     ReviewerPersonId int not null references Person(PersonId),

     IsPrimaryReviewer bit not null,

     constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId)
);

 
create unique index ix_PeriodReviewers_PrimaryReviewer
on PeriodReviewers(PeriodId, RevieweePersonId)
where IsPrimaryReviewer = 1;


We just need to create a unique index on PeriodId and RevieweePersonId for primary reviewers only. When we try to add another primary reviewer for the same person in the same period, the database will stop it; no bad data can creep in. That's how easy it is to prevent bad data on SQL Server. Validating business logic should be done in the application layer, but it should be done in the database too.
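To demonstrate, here's a quick sketch (the person IDs are hypothetical; assume the referenced Period and Person rows exist):

insert into PeriodReviewers(PeriodId, RevieweePersonId, ReviewerPersonId, IsPrimaryReviewer)
values (1, 1, 2, 1); -- first primary reviewer for reviewee 1: accepted

insert into PeriodReviewers(PeriodId, RevieweePersonId, ReviewerPersonId, IsPrimaryReviewer)
values (1, 1, 3, 0); -- non-primary reviewer: accepted, the filtered index ignores rows where IsPrimaryReviewer = 0

insert into PeriodReviewers(PeriodId, RevieweePersonId, ReviewerPersonId, IsPrimaryReviewer)
values (1, 1, 4, 1); -- a second primary reviewer for the same reviewee and period: rejected with a duplicate key error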



Happy Computing! ツ

Monday, August 26, 2013

Linq's Composability and Reusability

Nothing can beat the composability and reusability of Linq; it enables one-stop-shop coding.


Case in point: getting a page from a filtered list and getting the total count of the filtered list is just a walk in the park with Linq; it can all be done in one spot.


using (var session = TheMapper.GetSessionFactory().OpenSession())
using (var tx = session.BeginTransaction())
{
    var products = session.Query<Product>();

    if (filter.ProductName != null) 
        products = products.Where(x => x.Name.Contains(filter.ProductName));                

    if (filter.ProductModelName != null)
        products = products.Where(x => x.ProductModel.Name.Contains(filter.ProductModelName));


    var result = new ProductResultDto();                
    
    result.Products =
            products
            .OrderBy(x => x.Name)
            .Skip((filter.PageNumber - 1) * filter.PageSize).Take(filter.PageSize) // Skip must come before Take; Take-then-Skip would return the wrong page
            .Select(x => new ProductDto { ProductName = x.Name, ProductModelName = x.ProductModel.Name })
            .ToList();

    result.Count = products.Count();

    return result;                
}//using


Here's how it's done when writing the queries directly in the database instead:

create procedure GetList(
    @ProductName nvarchar(400),
    @ProductModelName nvarchar(400),
    @PageNumber int,
    @PageSize int
)
as
begin


    declare @rn int = (@PageNumber-1) * @PageSize;

    with a as
    (
        select 
            RN = row_number() over(order by p.Name),
            ProductName = p.Name, ProductModelName = m.Name
        from Production.Product p
        left join Production.ProductModel m on p.ProductModelID = m.ProductModelID
        where 
            (
                @ProductName is null 
                or p.Name like '%' + @ProductName + '%'
            )
            and
            (
                @ProductModelName is null 
                or m.Name like '%' + @ProductModelName + '%'
            )
    )
    select 
        top (@PageSize)
        a.ProductName, a.ProductModelName 
    from a
    where a.rn > @rn
    order by a.RN; -- deterministic paging needs an order


    select count(*) as cnt
    from Production.Product p
    left join Production.ProductModel m on p.ProductModelID = m.ProductModelID
    where 
        (
            @ProductName is null 
            or p.Name like '%' + @ProductName + '%'
        )
        and
        (
            @ProductModelName is null 
            or m.Name like '%' + @ProductModelName + '%'
        );


end;


The above code is not DRY; you can see the filter being repeated in two places. We can avoid making the query WET (write everything twice) by refactoring it:


create function GetList_Func(
    @ProductName nvarchar(400),
    @ProductModelName nvarchar(400)
) returns table
as
return
    select 
        RN = row_number() over(order by p.Name),
        ProductName = p.Name, ProductModelName = m.Name
    from Production.Product p
    left join Production.ProductModel m on p.ProductModelID = m.ProductModelID
    where 
        (
            @ProductName is null 
            or p.Name like '%' + @ProductName + '%'
        )
        and
        (
            @ProductModelName is null 
            or m.Name like '%' + @ProductModelName + '%'
        );
go



create procedure GetList_Refactored(
    @ProductName nvarchar(400),
    @ProductModelName nvarchar(400),
    @PageNumber int,
    @PageSize int
)
as
begin

    declare @rn int = (@PageNumber-1) * @PageSize;
    
    select 
        top (@PageSize)
        a.ProductName, a.ProductModelName 
    from GetList_Func(@ProductName, @ProductModelName) a
    where a.rn > @rn
    order by a.RN; -- deterministic paging needs an order

    select count(*) as cnt
    from GetList_Func(@ProductName, @ProductModelName);

end;

go

However, it's not one-stop-shop anymore; the query is now defined in two places.
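Contrast that with Linq, where even the filter itself can be extracted once and reused by both the page query and the count query. A minimal sketch (ApplyFilter is a hypothetical name; Product and FilterRequest are from the full listing below):

using System.Linq;

public static class ProductFilterExtensions
{
    // The filter logic written once; it stays IQueryable, so both the
    // paging query and the count query compose on top of it server-side.
    public static IQueryable<Product> ApplyFilter(this IQueryable<Product> products, FilterRequest filter)
    {
        if (filter.ProductName != null)
            products = products.Where(x => x.Name.Contains(filter.ProductName));

        if (filter.ProductModelName != null)
            products = products.Where(x => x.ProductModel.Name.Contains(filter.ProductModelName));

        return products;
    }
}

With that, GetList below could start with var products = session.Query<Product>().ApplyFilter(filter); and the rest stays the same.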



Happy Computing! ツ


One-stop-shop code:
namespace OrmFtw
{

    using System;
    using System.Collections.Generic;
    using System.Linq;

    using NHibernate.Linq;

    using OrmFtw.Models;
    using OrmFtw.Mapper;


    class Program
    {
        static void Main(string[] args)
        {
            // var result = GetList(new FilterRequest { ProductName = "Black", PageSize = 10, PageNumber = 2 });
            var result = GetList(new FilterRequest { ProductModelName = "Frame", PageSize = 10, PageNumber = 2 });

            Console.WriteLine("Total: {0}", result.Count);
            Console.WriteLine("Second Page: ");
            foreach (var prod in result.Products)
            {
                Console.WriteLine("Product: {0}", prod.ProductName);
                Console.WriteLine("Model: {0}", prod.ProductModelName);
            }

            Console.ReadKey();
        }

        private static ProductResultDto GetList(FilterRequest filter)
        {
            using (var session = TheMapper.GetSessionFactory().OpenSession())
            using (var tx = session.BeginTransaction())
            {
                var products = session.Query<Product>();

                if (filter.ProductName != null)
                    products = products.Where(x => x.Name.Contains(filter.ProductName));

                if (filter.ProductModelName != null)
                    products = products.Where(x => x.ProductModel.Name.Contains(filter.ProductModelName));


                var result = new ProductResultDto();

                result.Products =
                        products
                        .OrderBy(x => x.Name)
                        .Skip((filter.PageNumber - 1) * filter.PageSize).Take(filter.PageSize) // Skip must come before Take; Take-then-Skip would return the wrong page
                        .Select(x => new ProductDto { ProductName = x.Name, ProductModelName = x.ProductModel.Name })
                        .ToList();

                result.Count = products.Count();

                return result;
            }//using

        }//GetList()
    }

    public class FilterRequest
    {
        public string ProductName { get; set; }
        public string ProductModelName { get; set; }

        public int PageNumber { get; set; }
        public int PageSize { get; set; }
    }

    public class ProductResultDto
    {
        public int Count { get; set; }

        public IList<ProductDto> Products { get; set; }
    }

    public class ProductDto
    {
        public string ProductName { get; set; }
        public string ProductModelName { get; set; }
    }

}



namespace OrmFtw.Models
{
    class Product
    {
        public virtual int ProductID { get; set; }
        public virtual string Name { get; set; }
        public virtual string ProductNumber { get; set; }

        public virtual ProductModel ProductModel { get; set; }
    }
}


namespace OrmFtw.Models
{

    public class ProductModel
    {
        public virtual int ProductModelID { get; set; }
        public virtual string Name { get; set; }
    }
}

namespace OrmFtw.ModelMappings
{
    using NHibernate.Mapping.ByCode.Conformist;
    using OrmFtw.Models;

    class ProductMapping : ClassMapping<Product>
    {
        public ProductMapping()
        {
            Table("Production.Product");
            Id(x => x.ProductID);
            Property(x => x.Name);
            Property(x => x.ProductNumber);


            ManyToOne(x => x.ProductModel, x => x.Column("ProductModelID"));

        }
    }
}


namespace OrmFtw.ModelMappings
{

    using NHibernate.Mapping.ByCode.Conformist;
    using OrmFtw.Models;

    class ProductModelMapping : ClassMapping<ProductModel>
    {
        public ProductModelMapping()
        {
            Table("Production.ProductModel");
            Id(x => x.ProductModelID);
            Property(x => x.Name);
        }
    }
}



namespace OrmFtw.Mapper
{
    using NHibernate;
    using NHibernate.Cfg;
    using NHibernate.Mapping.ByCode;
    using NHibernate.Cfg.MappingSchema;

    using OrmFtw.ModelMappings;

    public class TheMapper
    {
        static ISessionFactory _sessionFactory;
        public static ISessionFactory GetSessionFactory()
        {
            if (_sessionFactory != null)
                return _sessionFactory;

            var cfg = new Configuration();
            var mapper = new ModelMapper();

            mapper.AddMappings(
                new[] {
                    // Entities
                    typeof(ProductMapping), 
                    typeof(ProductModelMapping)
                });



            cfg.DataBaseIntegration(c =>
            {
                c.Driver<NHibernate.Driver.Sql2008ClientDriver>();
                c.Dialect<NHibernate.Dialect.MsSql2012Dialect>();
                c.ConnectionString = "Server=localhost; Database=AdventureWorks2012; Trusted_Connection=true;";

                c.LogFormattedSql = true;
                c.LogSqlInConsole = true;
            });


            HbmMapping domainMapping = mapper.CompileMappingForAllExplicitlyAddedEntities();

            cfg.AddMapping(domainMapping);

            _sessionFactory = cfg.BuildSessionFactory();

            return _sessionFactory;
        }


    }

}

Sunday, August 25, 2013

Safe Navigation via Linq

In AdventureWorks, a product's model is nullable. A null reference exception will be raised when we navigate to the Name of the ProductModel from the Product.

var products = session.Query<Product>().Fetch(x => x.ProductModel);

foreach (var prod in products)
{
        Console.WriteLine("Product: {0}", prod.Name);
   
        // this will raise a null exception when Production.Product.ProductModelID is null
        Console.WriteLine("Model: {0}", prod.ProductModel.Name); 
}


Aside from using a null-safe navigation extension method...

var products = session.Query<Product>().Fetch(x => x.ProductModel);

foreach (var prod in products)
{
        Console.WriteLine("Product: {0}", prod.Name);
        Console.WriteLine("Model: {0}", prod.ProductModel.NullSafe(x => x.Name));
}

...we can just flatten the domain models in Linq's projection; the database won't balk at navigating the nullable product model from the product, and the ORM can happily map the null value to the property.

var products = session.Query<Product>()
                    // .Fetch(x => x.ProductModel) // This is not needed anymore, as we efficiently select only the needed columns on our projection
                    .Select(x => 
                            new { 
                                 ProductName = x.Name, 
                                 ProductModelName = x.ProductModel.Name // the database has no problem projecting null
                            });

foreach (var prod in products)
{
        Console.WriteLine("Product: {0}", prod.ProductName);
        Console.WriteLine("Model: {0}", prod.ProductModelName);
}

We can't do that trick on in-memory objects; for those scenarios, just use the null-safe navigation extension method.
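The NullSafe helper used earlier is not part of the BCL; here's a minimal sketch of such an extension method (an assumption of how the original helper is shaped):

using System;

public static class NullSafeExtensions
{
    // Returns default(TResult), i.e. null for reference types, instead of
    // throwing when the source object is null.
    public static TResult NullSafe<T, TResult>(this T source, Func<T, TResult> selector)
        where T : class
    {
        return source != null ? selector(source) : default(TResult);
    }
}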



Happy Coding! ツ

Friday, August 23, 2013

Microsoft === Open Source

In my dream world, Microsoft is truly open source. Some years ago, you wouldn't see Microsoft and Open Source in the same sentence. But the times they are a-changin':


Microsoft recently made a change to their OSS page, which now lists alternative web frameworks on the .NET Framework. That's great, and thanks to Scott Hanselman for doing this.

But it looks like Microsoft is having a hard time breaking the old habits: http://haacked.com/archive/2013/06/24/platform-limitations-harm-net.aspx


Just a digression on Phil Haack's post: look at the naming convention for acronyms of three letters or more; Microsoft is pretty much consistent about it. Hence Microsoft's Base Class Libraries acronym is written as Bcl, i.e.:

Microsoft.Bcl.Compression
Microsoft.Bcl.Immutable

…the acronyms are not written in an obnoxious all-caps manner:

Microsoft.BCL.Compression
Microsoft.BCL.Immutable



You can also notice this in the latest framework for our world domination, ASP.NET MVC. Its namespace in the framework is not System.Web.MVC; it is System.Web.Mvc.



Happy Computing! ツ

ServiceStack: A Fast Service Framework

If you are planning for world domination, your service framework must be able to serve requests fast, obscenely fast. Here's a benchmark for ServiceStack, WebAPI, Nancy, WCF:






ServiceStack is very fast; WebAPI is more than 3x slower than ServiceStack.



The only problem with ServiceStack, and NHibernate for that matter, is that though both are written in a Microsoft language, neither is made by Microsoft.



I downloaded the code from: https://github.com/ServiceStack/ServiceStack/wiki/Real-world-performance


Just add the corresponding WCF benchmarking code; it's trivial to add.


Happy Computing! ツ

Saturday, August 10, 2013

NOT IN is faster than LEFT JOIN IS NULL combo

The best thing about pickup lines is that in spite of their blatant approach, their intent is very clear. You don't need to second-guess; they spare you the trouble of being friendzoned, as they tell you right away whether they like you or not, or at least you can feel it.


If you will try to date a DBA, perhaps you can use this pickup line:


NOT IN is faster than LEFT JOIN + IS NULL combo

Why?

Because 135 milliseconds vs 2,194 milliseconds.


The best wingman for dating DBAs is http://explainextended.com/



NOT IN: 135 milliseconds only:

/*------------------------
 
SELECT  l.id, l.value
FROM    [20090915_anti].t_left l
WHERE   l.value NOT IN
        (
        SELECT  value
        FROM    [20090915_anti].t_right r
        )
           
------------------------*/
SQL Server parse and compile time:
   CPU time = 6 ms, elapsed time = 6 ms.
 
(10 row(s) affected)
Table 't_left'. Scan count 1, logical reads 499, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_right'. Scan count 1, logical reads 3112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
   CPU time = 125 ms,  elapsed time = 128 ms.





LEFT JOIN IS NULL: 2,194 milliseconds:

/*------------------------
SELECT  l.id, l.value
FROM    [20090915_anti].t_left l
LEFT JOIN
        [20090915_anti].t_right r
ON      r.value = l.value
WHERE   r.value IS NULL 
------------------------*/
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 10 ms.
 
(10 row(s) affected)
Table 't_left'. Scan count 9, logical reads 1984, physical reads 27, read-ahead reads 498, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_right'. Scan count 9, logical reads 3166, physical reads 37, read-ahead reads 3106, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 9999, logical reads 2239770, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
   CPU time = 14646 ms,  elapsed time = 2184 ms.



NOT IN is fast and efficient in CPU and Reads


NOT IN            --  CPU:    128 Reads:     3,619 Duration:   135
LEFT JOIN IS NULL --  CPU: 14,642 Reads: 2,254,967 Duration: 2,194




Even if we were using an RDBMS smart enough to discern that our LEFT JOIN IS NULL is really doing an anti-join (i.e., a NOT IN) and to give it the same fast performance as NOT IN (and as the benchmark above shows, SQL Server doesn't), there would still be no reason to use LEFT JOIN IS NULL for anti-joins. Don't obscure the code's NOT IN intent; don't replace NOT IN with LEFT JOIN IS NULL. Use NOT IN when we need to detect that a column's value is NOT IN another table. See how intuitive that is? Do not complicate a simple concept with the LEFT JOIN IS NULL combo.

LEFT JOIN IS NULL is too highfalutin-a-programmin'; it is not the place to showboat our programming prowess on SQL Server, or any RDBMS for that matter. I'm seeing a few (no, not many; fortunately our company hires the best devs in town) highfalutin constructs, not just in our SQL code but in our C# code too. I'll put them in a blog post when the list is comprehensive enough.



Do not confuse motion and progress. A rocking horse keeps moving but does not make any progress. -- Alfred Montapert



SELECT p.*, e.IsTerminated
FROM dbo.Person p
join dbo.Employee e on p.PersonID = e.EmployeePersonId

left join dbo.PersonResponsibility pr on p.PersonId = pr.ReviewerPersonId
-- Hey IS NULL buddy! what ya doin' down far there?

where
    e.AllowLogin = 1

    /*
    other 
    humongous 
    filters 
    here 
    competing 
    for 
    your 
    attention
    */        
 
    -- Hey LEFT JOIN! keeps movin' here buddy, just chillin' and filterin' your dumb join, makin' queries slow
    and pr.ReviewerPersonId is null



With LEFT JOIN and IS NULL, those two star-crossed lovers can be far apart from each other, making code analysis a little harder. Contrast that with NOT IN, which is pretty much a one-stop-shop syntax: everything is in one place, making NOT IN easier to analyze, if any analysis is even needed at all, since the code's intent is self-describing. And oh, did I mention NOT IN is fast? Ironically, the humble-looking NOT IN approach is fast:


SELECT p.*, e.IsTerminated
FROM dbo.Person p
join dbo.Employee e on p.PersonID = e.EmployeePersonId

where
    e.AllowLogin = 1

    /*
    other 
    humongous 
    filters 
    here 
    competing 
    for 
    your 
    attention
    */         

    -- Yay! no more LEFT JOIN buddy that can ruin the party!

   and p.PersonId not in (select pr.ReviewerPersonId from dbo.PersonResponsibility pr)



Unless we really find a corner case where the LEFT JOIN IS NULL combo could be faster than NOT IN, we should just keep using NOT IN; aside from being faster, it is easier to read. One caveat: NOT IN behaves well only when the subquery's column can never be NULL, because a single NULL in the list makes NOT IN return no rows at all. And before someone suggests NOT EXISTS, have I mentioned that IN / NOT IN is easier to read?
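A quick sketch of that caveat, reusing the tables above and assuming, hypothetically, that ReviewerPersonId were nullable:

-- If dbo.PersonResponsibility.ReviewerPersonId contained a single NULL,
-- NOT IN against the unguarded subquery would return zero rows, because
-- x NOT IN (1, NULL) evaluates to UNKNOWN, never TRUE.
select p.PersonId
from dbo.Person p
where p.PersonId not in
(
    select pr.ReviewerPersonId
    from dbo.PersonResponsibility pr
    where pr.ReviewerPersonId is not null -- guard needed only when the column is nullable
);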



Happy Coding! ツ

Wednesday, August 7, 2013

Fail Fast: Why we should avoid staging tables as much as possible

We are not talking about Fail Fast the philosophy, as not everyone is sold on that idea. We will tackle Fail Fast the engineering technique.



http://martinfowler.com/ieeeSoftware/failFast.pdf:
The most annoying aspect of software development, for me, is debugging. I don't mind the kinds of bugs that yield to a few minutes' inspection. The bugs I hate are the ones that show up only after hours of successful operation, under unusual circumstances, or whose stack traces lead to dead ends. Fortunately, there’s a simple technique that will dramatically reduce the number of these bugs in your software. It won't reduce the overall number of bugs, at least not at first, but it'll make most defects much easier to find. The technique is to build your software to "fail fast."



In pursuit of speed, one might argue that First is better than Single:


var e = session.Query<Employee>().First(x => x.PersonId == personId);


Picture an HR system that is multi-tenant and designed around composite primary keys, and a dev who forgot to add a company filter on the query above. A few days later, their client Microsoft complains that the system is randomly returning an employee: "Lee Kai-Fu is not working with us anymore, why is he included in our employee list? Please effin' correct your app, Steve Ballmer will throw a chair at us, Tim Cook is not our CEO!"



The cause of the error is the usage of First, which doesn't fail fast. Had they used Single, the following would not continue if the query returned more than one employee; it would throw an exception, failing fast.

var e = session.Query<Employee>().Single(x => x.PersonId == personId);
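
For completeness, a sketch of the tenant-aware query the dev should have written (CompanyId is a hypothetical column name here):

// Hypothetical multi-tenant fix: also filter by the tenant's company.
// Single still fails fast by throwing if more than one row matches.
var e = session.Query<Employee>()
               .Single(x => x.PersonId == personId && x.CompanyId == companyId);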


Another scenario where we should especially strive to fail fast is reports. While optimizing a stored proc that had a performance problem due to its staging tables approach, we uncovered a subtle bug that doesn't fail fast; the error is not as pronounced with staging tables as it would be with a straight query.



The following is an oversimplification of the original stored proc we were optimizing, but it is simple enough to illustrate the nature of the problem. Given the following data:

create table Person
(
    PersonCode char(1) primary key,
    PersonName nvarchar(20) not null
);
 
create table Product
(
    ProductCode char(1) primary key,
    ProductName nvarchar(20) not null
);
 
create table Person_Likes_Product
(
    PersonCode char(1) not null references Person(PersonCode),
    ProductCode char(1) not null references Product(ProductCode),
    Rating int not null,
    constraint ck_Rating check(Rating between 1 and 5)
);
 
insert into Person(PersonCode, PersonName) values
('J','John'), ('P','Paul'), ('G','George'), ('R','Ringo');
 
insert into Product(ProductCode, ProductName) values
('C','CPU'), ('K','Keyboard'), ('M','Mouse'), ('S','Smartphone');
 
insert into Person_Likes_Product(PersonCode, ProductCode, Rating) values
('J', 'C', 5),
('J', 'K', 3),
('P', 'C', 5),
('P', 'K', 4),
('P', 'S', 5),
('G', 'S', 5);

...we should list all products with Paul's rating on each of them. It should produce the following output:


ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
K           Keyboard             4
M           Mouse                NULL
S           Smartphone           5
(4 row(s) affected)

This is the original stored proc:

create procedure PersonLikesStagingTables
(
    @PersonCode char(1)
) as
begin
     
    -- staging table
    declare @productRating table
    (
        ProductCode char(1),
        ProductName nvarchar(20),
        Rating int
    );
     
    insert into @productRating(ProductCode, ProductName)
    select p.ProductCode, p.ProductName
    from Product p;
     
     
    update p
        set p.Rating = l.Rating
    from @productRating p
    left join Person_Likes_Product l on p.ProductCode = l.ProductCode;
     
     
    select p.ProductCode, p.ProductName, p.Rating
    from @productRating p
    order by p.ProductName;
     
end;
go
 
exec PersonLikesStagingTables 'P';

The output of that stored proc has an error:

ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
K           Keyboard             3
M           Mouse                NULL
S           Smartphone           5
(4 row(s) affected)

Holy Guacamole! Have you spotted the error in the code above? Why does Paul's keyboard have a rating of 3 instead of 4? If this were a price listing, your company could go bankrupt.



Upon further inspection, Paul's rating was not the one applied to the staging table; the update is missing the person filter:

update p
    set p.Rating = l.Rating
from @productRating p
left join Person_Likes_Product l on p.ProductCode = l.ProductCode;

That should be done like this:

update p
    set p.Rating = l.Rating
from @productRating p
left join Person_Likes_Product l on p.ProductCode = l.ProductCode and l.PersonCode = @PersonCode;

Had the original stored proc been written with no staging tables, that is, using a straight query instead, the error would be more pronounced, failing fast:

create procedure PersonLikesFailFast(@PersonCode char(1)) as
begin
    select
        p.ProductCode,
        p.ProductName,
        l.Rating
    from Product p   
    left join Person_Likes_Product l on p.ProductCode = l.ProductCode
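    -- same missing person filter as before, but with a straight query the duplicate rows expose the bug immediately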
     
end;   
go
 
exec PersonLikesFailFast 'P'

That will produce this output:
ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
C           CPU                  5
K           Keyboard             3
K           Keyboard             4
M           Mouse                NULL
S           Smartphone           5
S           Smartphone           5
(7 row(s) affected)

The QA or the user could quickly report to the devs that the person's product rating output has an error. If your system has an error, it should fail fast; it's easier to correct errors when they are discovered early. Staging tables and the First extension method deprive your system of failing fast. Errors in a system with no fail-fast mechanism are hard to debug, as the system sometimes semi-works; the errors can even linger, producing corrupted data.



Aside from making our reports perform fast, we should also make them fail fast when they have an error.



How about the performance after the removal of the staging tables? After removing 5 insert statements, 13 update statements, and 3 table variables, then converting them to a straight query, the stored proc improved from 20 seconds to 11 seconds.



And before we do aggressive optimization, let's pay attention to correctness first:

http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning



It's easier to make a correct program fast than to make a fast program correct.





Happy Computing! ツ