Thursday, June 30, 2011

NHibernate foreign key property, solution on model binding impedance mismatch.

There's a little problem with NHibernate if you strictly adhere to its proper domain modelling. You will end up with a class like this (note Line 6):

public class SalesDetail 
{
    public virtual SalesHeader SalesHeader { get; set; }

    public virtual int SalesDetailId { get; set; } // Primary Key
    public virtual Product ProductX { get; set; }
    public virtual int Qty { get; set; }
    public virtual decimal UnitPrice { get; set; }
    public virtual decimal Amount { get; set; }

    public virtual byte[] Version { get; set; }           
}



If some components could allow direct binding to ProductX, there won't be too much problem with clean entity modeling of NHibernate. You can not use the ProductX directly as a DataPropertyName of your DataGridView for example.


There's a solution on that though, we can take a page from Entity Framework. EF forces one to use primitive types on model's foreign keys instead of stating the fact that it is a pointer to other records; for example, this is how your EF model looks like:


public class SalesDetail 
{
    public SalesHeader SalesHeader { get; set; }

    public int SalesDetailId { get; set; }
    public int ProductId { get; set; }
    public int Qty { get; set; }
    public decimal UnitPrice { get; set; }
    public decimal Amount { get; set; }

    public byte[] Version { get; set; }
           
}

See the ProductID there? It is model-binding-ready. However, if you just quick-glance that class, it is not immediately obvious that an entity(ProductId) has a relationship with other entity; for all we know, ProductId could just be a barcode misrepresented as integer type. And notice that reaching the ProductId's other properties(say Category, or StockLevel) is not readily available from that property alone. Fear not however, as we all know that the first thing you shun after using ORM is joining tables; with an ORM, you don't need to join the classes to reach the foreign key's other properties(fields, e.g. Category, StockLevel); when you have a foreign key attribute or necessary mapping for foreign key on a model, EF automatically populates its corresponding object. Like this example:


public class SalesDetail 
{
    public SalesHeader SalesHeader { get; set; }

    public int SalesDetailId { get; set; }
    public int ProductId { get; set; }
    public int Qty { get; set; }
    public decimal UnitPrice { get; set; }
    public decimal Amount { get; set; }

    public byte[] Version { get; set; }
           
   [ForeignKey("ProductId")]
   public virtual Product ProductX { get; set; }
}


That's how we will also tackle NHibernate model-binding-mismatch. But with a twist, we will do it in reverse instead, we will introduce primitive types, so we can directly bind them to UI widgets. An example implementation:

public class SalesDetail 
{
    public virtual SalesHeader SalesHeader { get; set; }

    public virtual int SalesDetailId { get; set; }
    public virtual Product ProductX { get; set; }
    public virtual int Qty { get; set; }
    public virtual decimal UnitPrice { get; set; }
    public virtual decimal Amount { get; set; }

    public virtual byte[] Version { get; set; }            

    [Lookup("ProductId")]
    public virtual int? lookup_ProductX { get; set; }
}



That's very clean compared to EF's approach, NH's approach directly mimics the problem domain as we still use the ProductX as a pointer to Product entity; UI-concerns-wise, we just need to introduce a new property on our model whenever we need something for the widgets to bind upon.


However, there's a problem with NHibernate populating those properties at the same time. It cannot bind a field to two properties. So for that, we have to do this:


After you get the records from NHibernate:
foreach (SalesDetail d in sh.Sales)
{ 
 d.lookup_ProductX.Value = d.ProductX.ProductId;
}

Before you save the records via NHibernate:
foreach (SalesDetail d in sh.Sales)
{ 
 d.ProductX = s.Load<Product>(d.lookup_ProductX.Value);
}



But that easily gets old, we need to make a helper to automatically re-hydrate those properties.


This is our NHibernate assigner helper, it's very short and concise:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using NHibernate;
using System.Reflection;
using System.Collections;

namespace NhLookupAssigner
{
    public static class Helper
    {
        public static void SetupModel(this ISession s, object model)
        {
            Type modelType = model.GetType();

            foreach (PropertyInfo p in modelType.GetProperties())
            {
                LookupAttribute a = p.GetCustomAttributes(false).OfType<LookupAttribute>().SingleOrDefault();

                if (a != null)
                {
                    string lookupName = p.Name.Substring(p.Name.IndexOf('_') + 1);

                    PropertyInfo targetProperty = modelType.GetProperty(lookupName);
                    Type targetPropertyType = targetProperty.PropertyType;

                    // get property's(public virtual Product Product { get; set; }) object
                    object nhValue = modelType.InvokeMember(lookupName, BindingFlags.GetProperty, null, model, new object[] { });
                    // get property's object's primary key
                    object pkValue = nhValue.GetType().InvokeMember(a.PrimaryKey, BindingFlags.GetProperty, null, nhValue, new object[] { });


                    // set property(e.g. public virtual int? lookup_Product { get; set; }) value
                    modelType.InvokeMember(p.Name, BindingFlags.SetProperty, null, model, new object[] { pkValue });

                    
                }
                else
                {
                    if (p.PropertyType.IsGenericType && typeof(IEnumerable).IsAssignableFrom(p.PropertyType))
                    {
                        object list = modelType.InvokeMember(p.Name, BindingFlags.GetProperty, null, model, new object[] { });
                        foreach (var x in (IEnumerable)list)
                        {
                            s.SetupModel(x);
                        }
                    }//if
                }
            }//foreach
            
        }//void SetupModel

        

        public static void SetupNh(this ISession s, object model)
        {            
            Type modelType = model.GetType();

            foreach (PropertyInfo p in modelType.GetProperties())
            {
                LookupAttribute a = p.GetCustomAttributes(false).OfType<LookupAttribute>().SingleOrDefault();

                if (a != null)
                {
                    string lookupName = p.Name.Substring(p.Name.IndexOf('_') + 1);

                    PropertyInfo targetProperty = modelType.GetProperty(lookupName);
                    Type targetPropertyType = targetProperty.PropertyType;
                    object inputValue = modelType.InvokeMember(p.Name, BindingFlags.GetProperty, null, model, new object[] { });

                    object nhValue = s.Load(targetPropertyType, inputValue);
                    modelType.InvokeMember(lookupName, BindingFlags.SetProperty, null, model, new object[] { nhValue });
                }
                else
                {                    
                    if (p.PropertyType.IsGenericType && typeof(IEnumerable).IsAssignableFrom(p.PropertyType))
                    {                                                
                        object list = modelType.InvokeMember(p.Name, BindingFlags.GetProperty, null, model, new object[] { });
                        foreach (var x in (IEnumerable)list)
                        {
                            s.SetupNh(x);
                        }                       
                    }//if
                }
            }//foreach

        }//void SetupNh

    }//class Helper

}//namespace NhLookupAssigner



And this is the Lookup attribute:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace NhLookupAssigner
{
    [AttributeUsage(AttributeTargets.Property)]
    public class LookupAttribute : Attribute
    {
        public string PrimaryKey { get; set; }

        public LookupAttribute(string primaryKey)
        {
            this.PrimaryKey = primaryKey;
        }
    }
}

To use on opening:
void DoOpen()
{
    using (var s = Mapper.GetSessionFactory().OpenSession())
    {
        var z = s.Query<SalesHeader>().Where(x => x.SalesHeaderId == int.Parse(uxRecordIdentifier.Text)).Single()

        s.SetupModel(z);

        bdsHeader.DataSource = z;
    }
}

To use on saving:
void DoSave()
{
    using (var s = Mapper.GetSessionFactory().OpenSession())
    using (var tx = s.BeginTransaction())
    {
        var sh = (SalesHeader)bdsHeader.Current;

        s.SetupNh(sh);

        s.SaveOrUpdate(sh);

        tx.Commit();
    }
}

Tuesday, June 28, 2011

When to use dynamic vs pure reflection?

First of all, a reflection on reflection. Reflection is not a black magic, to know it you don't need to face the mirror and summon some dark forces, but somehow this a bit advance programming is making some feel proud and boasting to peers if they grok reflection, thinking others don't. There's nothing rocket science about reflection. Reflection function names and property names are very intuitive on .NET, you will hardly need documentation. What one need is a due diligence and a bit of googling-fu and asking questions on stackoverflow if one cannot easily infer the right method and property names.


Now back to regular programming. Given this, using pure reflection:

// List all properties of a model
foreach (PropertyInfo p in modelSourceType.GetProperties())
{ 
 // If the property is collection, list all the ID of each collection's item
 if (p.PropertyType.IsGenericType &&  typeof(IEnumerable).IsAssignableFrom(p.PropertyType))
 {  
  object list = modelSourceType.InvokeMember(p.Name, BindingFlags.GetProperty, null, modelSource, new object[] { });
  foreach (var x in (IEnumerable)list)
  {
   object objValue = x.GetType().InvokeMember("ID", BindingFlags.GetProperty, null, x, new object[] { });
   System.Windows.Forms.MessageBox.Show(objValue.ToString());
  } 
 }//if
}

And this, using C# 4's dynamic capability:

// List all properties of a model
foreach (PropertyInfo p in modelSourceType.GetProperties())
{ 
 // If the property is collection, list all the ID of each collection's item
 if (p.PropertyType.IsGenericType &&  typeof(IEnumerable).IsAssignableFrom(p.PropertyType))
 {
  dynamic list = modelSourceType.InvokeMember(p.Name, BindingFlags.GetProperty, null, modelSource, new object[] { });
  foreach (var x in list)
  {
   System.Windows.Forms.MessageBox.Show(x.ID.ToString());
  }
 }//if
}

Which one would you use? I would choose the second code on two conditions, if I'm sure on certain terms that the environment to deploy on will allow to install .NET 4 runtime, and also sure that the certain property (e.g. ID) always exists; the pure reflection is very tedious to code. Make a third condition for choosing the second code, if there's .NET 5 already, no one will sneer at the .NET 4-specific code, second code will have the illusion of backward-compatibility if .NET 5 is out already :p

Some look at latest-version-specific code with disdain ;-)


And oh, it won't hurt to embed the if statement on array's extension method:

foreach( PropertyInfo p in 
 modelSourceType.GetProperties().Where( x => x.PropertyType.IsGenericType && typeof(IEnumerable).IsAssignableFrom(x.PropertyType)) )
{
 dynamic list = modelSourceType.InvokeMember(p.Name, BindingFlags.GetProperty, null, modelSource, new object[] { });
 foreach (var x in list)
 {
  System.Windows.Forms.MessageBox.Show(x.ID.ToString());
 } 
}

Saturday, June 25, 2011

My site's style, how-to

Someone asked how I accomplish the styling on my blog. Without further ado, head to Alex Gorbatchev's syntax highlighter

Since I'm using the <pre /> method, I have to HTML-encode my code before I post it in my blog, for that I created the following ASP.NET MVC utility:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Mvc;

namespace HtmlEncoder.Controllers
{
    
    [ValidateInput(false)]
    public class HomeController : Controller
    {
        public ActionResult Index(string Encode = "")
        {
            ViewData["Encode"] = Encode;
            return View();
        }

    }
}

Note the use of ValidateInput(false), if you don't put that attribute, ASP.NET MVC will be in paranoid mode, anytime it encounters tags from user input, it will refuse to continue doing anything to prevent cross-site scripting.

And this will be your code in view:

<%@ Control Language="C#" Inherits="System.Web.Mvc.ViewUserControl<dynamic>"   %>



<% using(var c = Html.BeginForm()) { %>

    Paste your code here:
    
    <p><%: Html.TextArea("Encode", new { cols = 120, rows = 12 }) %></p>
        
    <p><input type="submit" value="Encode"/></p>



    Copy the following HTML-encoded code:


    <% string s = Html.Encode(Request["Encode"]); %>
    <p><%: Html.TextArea("Output", s, new { cols = 120, rows = 12 } ) %></p>


<% } %>

I wrote that code(using WebForm tags) prior to Razor support in ASP.NET MVC, and that's the only currently supported in ASP.NET MVC on Mono, I'm on Mac OS X now.

For console-like output, example:

Example

I'm using the following css style:
<style type='text/css'>
.console {
background-color: black;
color: #00FF00;
font-family: courier;
}</style>

And to use full-screen content on blogger, disable the content-outer class in your blog's content, the easiest way to do it is to change the <div class='content-outer'> to <div class='x_content-outer'>

Lastly, if you don't want to include the namespace when copying code or you just want to paste the only relevant code to your blog(good also for pasting code to stackoverflow), use block highlighting, if you are in Visual Studio, hold Alt then start highlighting the indented code of your class; on MonoDevelop on Mac, hold the command button, unfortunately this doesn't work well, it can only highlight up to the last column only, and unfortunately, most of the last line of the code you are copying is a single character only, i.e. the curly bracket }

Wednesday, June 22, 2011

Mapping class to a database view with Entity Framework. EntityType 'ClassHere' has no key defined.

You shall receive this kind of error if you map your class to a view if your class has no designated primary key:

One or more validation errors were detected during model generation:

System.Data.Edm.EdmEntityType: : EntityType 'SalesOnEachCountry' has no key defined. Define the key for this EntityType.
System.Data.Edm.EdmEntitySet: EntityType: The EntitySet SalesOnEachCountryList is based on type SalesOnEachCountry that has no keys defined.

Data source:
create table Country
(
CountryId int identity(1,1) not null primary key,
CountryCode varchar(4) not null unique,
CountryName varchar(30) not null unique
);
 
 
 
insert into Country(CountryCode,CountryName)
values('PH','Philippines');
 
insert into Country(CountryCode,CountryName)
values('CN','China');
 
insert into Country(CountryCode,CountryName)
values('JP','Japan');
 
 
create table City
(
CityId int identity(1,1) not null primary key,
CityCode varchar(3) not null unique,
CityName varchar(30) not null unique,
CountryId int not null references Country(CountryId)
);

 
insert into City(CityCode, CityName, CountryId)
values('MKT','Makati',1);
 
insert into City(CityCode, CityName, CountryId)
values('MNL','Manila',1);
 
 
insert into City(CityCode, CityName, CountryId)
values('BEI','Beijing',2);
 
insert into City(CityCode, CityName, CountryId)
values('SHA','Shanghai',2);
 
 
insert into City(CityCode, CityName, CountryId)
values('TKY','Tokyo',3);
 
insert into City(CityCode, CityName, CountryId)
values('KYT','Kyoto',3);

create table Sales
(
OrNo int identity(1,1) not null primary key,
OrYear int not null,
CityId int not null references City(CityId),
OrAmount numeric(18,6) not null
);

declare @i int;
 
set @i = 0;
while @i < 5 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,1,2);
 set @i = @i + 1;
end;
 
set @i = 0;
while @i < 5 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,2,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 10 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,1,2);
 set @i = @i + 1;
end;
 
set @i = 0;
while @i < 10 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,2,2);
 set @i = @i + 1;
end;
 

create view SalesOnEachCountry -- with SchemaBinding
as
select x.CountryId, x.CountryName, s.OrYear, 
COUNT_BIG(*) AS SalesCount, sum(s.OrAmount) as TotalSales
from dbo.Sales s
join dbo.City c on c.CityId = s.CityId
join dbo.Country x on c.CountryId = x.CountryId
group by x.CountryId, s.OrYear
    ,x.CountryName -- ancillary field(s)
-- CREATE UNIQUE CLUSTERED INDEX ux_SalesOnEachCountry on SalesOnEachCountry(CountryId, OrYear);


Ancillary fields on GROUP BY

select * from SalesOnEachCountry;


CountryId   CountryName                    OrYear      SalesCount TotalSales
----------- ------------------------------ ----------- ---------- -------------
1           Philippines                    2010        10         20.000000
1           Philippines                    2011        20         40.000000

(2 row(s) affected)



You must put a primary key on your class. Howevery, if you just put the primary key on CountryId for the following mapping...

public class SalesOnEachCountry
{        
    [Key]
    public int CountryId { get; set; }
    public string CountryName { get; set; }        
    public int OrYear { get; set; }
    public long SalesCount { get; set; }
    public decimal TotalSales { get; set; }
}

..., even there's no runtime error for this code...
foreach (SalesOnEachCountry s in db.SalesOnEachCountryList)
{
    Console.WriteLine("{0} {1} {2}", s.CountryName, s.OrYear, s.TotalSales);
}

...,Entity Framework will still function normally and produces this incorrect output...
Philippines 2010 20.000000
Philippines 2010 20.000000

You must put the unique combination on Country and Year, remove the Key attribute on SalesOnEachCountry class, and move it on OnModelCreating:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);
    modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();

    modelBuilder.Entity<SalesOnEachCountry>().HasKey(x => new { x.CountryId, x.OrYear });     
}

Now that will produce correct result:
Philippines 2010 20.000000
Philippines 2011 40.000000

Another point, though you can opt to use attribute approach on designating composite primary key on the class itself, the intent of composite key is more clear and intuitive if it is placed on OnModelCreating. As for those who preferred to indicate composite primary key by using attributes, here's how it is done (remove first the HasKey code from OnModelCreating):

public class SalesOnEachCountry
{        
    [Key, Column(Order=0)] public int CountryId { get; set; }
    public string CountryName { get; set; }
    [Key, Column(Order=1)] public int OrYear { get; set; }
    
    public long SalesCount { get; set; }      
    public decimal TotalSales { get; set; }
}

Tuesday, June 21, 2011

Non-validation on DateTime and other primitive types on ASP.NET MVC

Even you don't have any Required attribute on a value type property, i.e. primitive types and struct; ASP.NET MVC validation will still kick-in. You must make the field nullable

public class SalesFilterViewModel
{
    public int CompanyId { get; set; } // will be required, even there's no Required attribute
    public int? BranchId { get; set; } // will not be required
    
    public DateTime? StartingDate { get; set; } // nullable, will not be required
    public DateTime? EndDate { get; set; } // nullable, will not be required
}

// A true-false type of polling
public class Poll
{        
    public string QuestionId { get; set; }

    // When a default value of false(when you don't make the bool nullable, 
    // it will default to false on page load) doesn't make sense, 
    // but an answer(either true or false) is still required, make the 
    // property nullable and add a Required attribute.
    [Required]
    public bool? YourAnswer { get; set; }
}

Monday, June 20, 2011

Update on jQuery Ajax ComboBox Helper. jQuery-compatible ID

Made the widget id compatible to jQuery id naming convention, all the dots are replaced with underscore. The widget name is still in object-dot-property naming form. To see things in effect from the sample code at google, I removed the DTO pattern, I used the model from ORM directly, for example, this was the DTO before:

public class ProductDto
{
    public virtual int ProductId { get; set; }
    public virtual string ProductCode { get; set; }
    public virtual string ProductName { get; set; }
    public virtual int CategoryId { get; set; }
}

But since ASP.NET MVC will not emit a property with dots using that class, we removed the DTO pattern for populating the inputs and then used the model directly(nested objects) to simulate dots conversion to underscores. ASP.NET MVC don't have problems binding the inputs to model, no matter how complex or deep the object is, and NHibernate can also use the following class(nested object) directly when saving the object that was automatically populated by ASP.NET MVC model binding:

public class Product
{
    public virtual int ProductId { get; set; }
    public virtual string ProductCode { get; set; }
    public virtual string ProductName { get; set; }
    public virtual Category Category { get; set; }
}

So if we use this kind of input...

@Html.TextBoxFor(model => model.Category.CategoryId)

...this will be the emitted html...
<input data-val="true" data-val-number="The field CategoryId must be a number." data-val-required="The CategoryId field is required." id="Category_CategoryId" name="Category.CategoryId" type="text" value="" />

...and you can access the text box's value and backing html using this:
alert($('input[name=Category.CategoryId]').val());
$('#Category_CategoryId').html('POCO YO!');

The last statement will remove the textbox and replace it with the text: POCO YO!

All the mentioned behavior above, now also applies to jQuery Ajax ComboBox, it now uses underscore for jQuery IDs, and dots for models/inputs.

This is now the new way to clear the other jQuery Ajax ComboBox's value (note the underscores( instead of dots) to access a jQuery Ajax ComboBox from jQuery):

new { on_selected = @"$('#Purchased_Product_ProductId').ajc().clearValue();" }


Get the latest changes at http://code.google.com/p/jquery-ajax-combobox-aspnet-mvc-helper/downloads/list

Friday, June 17, 2011

Materialized view in Sql Server

If you have a high volume database, and you want to aggregate things, materializing/pre-computing the result is the way things should be done. Fortunately for Sql Server users, we don't have to roll our own views materialization, this is a built-in functionality in Sql Server, without such, we will need to put triggers on each involved tables and increment/decrement the summary column of the summary table.

Let's create the tables and populate them:

create table Country
(
CountryId int identity(1,1) not null primary key,
CountryCode varchar(4) not null unique,
CountryName varchar(30) not null unique
);



insert into Country(CountryCode,CountryName)
values('PH','Philippines');

insert into Country(CountryCode,CountryName)
values('CN','China');

insert into Country(CountryCode,CountryName)
values('JP','Japan');


create table City
(
CityId int identity(1,1) not null primary key,
CityCode varchar(3) not null unique,
CityName varchar(30) not null unique,
CountryId int not null references Country(CountryId)
);




insert into City(CityCode, CityName, CountryId)
values('MKT','Makati',1);

insert into City(CityCode, CityName, CountryId)
values('MNL','Manila',1);


insert into City(CityCode, CityName, CountryId)
values('BEI','Beijing',2);

insert into City(CityCode, CityName, CountryId)
values('SHA','Shanghai',2);


insert into City(CityCode, CityName, CountryId)
values('TKY','Tokyo',3);

insert into City(CityCode, CityName, CountryId)
values('KYT','Kyoto',3);



create table Sales
(
OrNo int identity(1,1) not null primary key,
OrYear int not null,
CityId int not null references City(CityId),
OrAmount numeric(18,6) not null
);



declare @i int;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,1,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,2,2);
 set @i = @i + 1;
end;


set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,1,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,2,2);
 set @i = @i + 1;
end;



set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,3,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,4,2);
 set @i = @i + 1;
end;



set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,3,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,4,2);
 set @i = @i + 1;
end;


set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,5,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,6,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,5,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,6,2);
 set @i = @i + 1;
end;
Then let's create a view for the summary:
create view SalesOnEachCountry 
as 
select x.CountryId, x.CountryName, s.OrYear, 
COUNT_BIG(*) AS SalesCountPerCountryPerYear, sum(s.OrAmount) as TotalSales
from dbo.Sales s
join dbo.City c on c.CityId = s.CityId
join dbo.Country x on c.CountryId = x.CountryId
group by x.CountryId, s.OrYear
        ,x.CountryName -- ancillary field
Running this...
select * from SalesOnEachCountry order by CountryName, OrYear
, has a result of...
CountryId   CountryName                    OrYear      SalesCountPerCountryPerYear TotalSales
----------- ------------------------------ ----------- --------------------------- ---------------------------------------
2           China                          2010        1000000                     2000000.000000
2           China                          2011        1000000                     2000000.000000
3           Japan                          2010        1000000                     2000000.000000
3           Japan                          2011        1000000                     2000000.000000
1           Philippines                    2010        1000000                     2000000.000000
1           Philippines                    2011        1000000                     2000000.000000

(6 row(s) affected)
...which took 4 seconds. My benchmarking is muddled up a bit, as my development machine has SSD(with read speed of 250 MB/s), 7200 rpm drive has 35 MB/s read speed only. Anyway, for the sake of discussion, we are interested with how fast can materialized view(indexed view in Sql Server parlance) optimize aggregations. Now that 4 seconds, could really hurt the web response time, it's time to rectify that. We shall materialize that 6 rows view. It's just a two step process, first we must schema-bind the views:
alter view SalesOnEachCountry with SchemaBinding
as 
select x.CountryId, x.CountryName, s.OrYear, 
COUNT_BIG(*) AS SalesCountPerCountryPerYear, sum(s.OrAmount) as TotalSales
from dbo.Sales s
join dbo.City c on c.CityId = s.CityId
join dbo.Country x on c.CountryId = x.CountryId
group by x.CountryId, s.OrYear
        ,x.CountryName -- ancillary field
Second, we put index on those fact columns(i.e. CountryId, OrYear):
CREATE UNIQUE CLUSTERED INDEX ux_Country on SalesOnEachCountry(CountryId, OrYear);
Then try querying the view again:
select * from SalesOnEachCountry order by CountryName, OrYear;
Surprise! That query took 0 second(I don't know how to switch the display to millisecond) :-) And even if you try to insert/update/delete a row in Sales table, Sql Server will not re-aggregate the 3 million and 1 rows, instead it will just increase/decrease the count and summary column. Likewise, updating the OrAmount won't cause re-aggregation on rows; instead it will efficiently increment/decrement the summary column based on the delta of updated OrAmount. Your query will not visit the million rows to produce information for 6 rows when using materialized views indexed views. It will just get the 6 rows.

Ancillary field(s)

Friday, June 3, 2011

Don't bikeshed

Let's not bikeshed during development. If you see some systems that uses GUID, it has a rationale why it is chosen. Don't outright dismiss/demean the design just because the system uses GUID as primary key. Try to grok why certain decisions are made as such.


I can hazard a guess that those Microsoft developers(Microsoft employees) doesn't bikeshed and didn't spent an inordinate amount of time on what field type to use for primary keys when they are designing Microsoft Dynamics ERP, Microsoft CRM, Microsoft Sharepoint. All those mentioned products uses GUID for primary keys. They have made a decision that the advantages far outweighs the disadvantages. And if Microsoft uses GUID for primary keys on an ERP-type of product is not confidence-inducing enough to those who are reluctant to use GUID as primary key, I don't know what could.


Another bikeshedding is debating whether LastModifiedDate is nullable or not. Trivial choices such as this won't go wrong either way. And if populating the non-nullable date column is a hassle on initialization(i.e. INSERT), just put GETDATE() as the default value on that column. Debating this decision rigorously have reminded me of the quote: "The minute you start to strategize too much, the more you start to think you're in control of your own fate. And you're not, really." Which reminded me of another technology, ah.. the list of things that can be blogged is endless :-) Do things in moderation.



And suggesting to use an integer type instead of GUID, this might lead you to fall into a dangerous trap that you will naturally assume things are slow just because the system uses particular data types (e.g. GUID), then you will be reluctant to know the real deep reason why things are slow.



And to prove that the choice of field type is not the major factor on causes of system bottleneck, here's an anecdote, I optimized a query that took 222 seconds; then optimizing it again, it became 71 milliseconds; then tried another approach but suprisingly is slower, 399 milliseconds; then tried another approach, the sweet spot, 37 milliseconds. I used varchar for GUID, Postgresql prior version 8.4 don't have a native(uses 16 bytes) GUID then, I used GUID to roll my own replication, I had an unenviable job to implement a system to make a selective replication based on business rules, and using GUID as primary keys for tables perfectly fits the bill. I digress. Here's the optimization I've done on a system that uses varchar'd GUID. http://stackoverflow.com/questions/3177038/why-the-most-natural-queryi-e-using-inner-join-instead-of-left-join-is-very


Don't bikeshed.