Sunday, June 30, 2013

Multilingual + Caching on NHibernate: Made Compatible

"There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors" -- http://martinfowler.com/bliki/TwoHardThings.html


We will not create computer science today; we will just use one of its fruits. Today I will show you how to use NHibernate's built-in caching mechanism and make it compatible with localization.



I've built a seamless multilingual app on NHibernate before, and the approach works even on brownfield projects. I've also built a cache-enabled NHibernate app with Redis; it just works, and it's simply amazing. Mixing the two has a problem, though: localized fields must be mapped to their own classes, otherwise they can't be switched to another language once the master entity is already cached.



This post will show you how to make a multilingual app on NHibernate without compromising entity and query cacheability.



First, let's start with the domain model.

public class Product
{
    public virtual int ProductId { get; set; }
    public virtual int YearIntroduced { get; set; }    
}


[Serializable]
public class ProductLanguageCompositeKey
{
    public virtual int ProductId { get; set; }
    public virtual string LanguageCode { get; set; }


    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;
        
        var t = obj as ProductLanguageCompositeKey;     
        if (t == null)
            return false;
            
        if (ProductId == t.ProductId && LanguageCode == t.LanguageCode)
            return true;
            
        return false;
    }
    
    public override int GetHashCode()
    {
        return (ProductId + "|" + LanguageCode).GetHashCode();
    }
}

public class ProductLanguage
{
    public virtual ProductLanguageCompositeKey ProductLanguageCompositeKey { get; set; }         

    public ProductLanguage()
    {
        this.ProductLanguageCompositeKey = new ProductLanguageCompositeKey();
    }

    public virtual string ProductName { get; set; }
    public virtual string ProductDescription { get; set; }

    // A guide for the user, so they know which language an untranslated string came from
    public virtual string ActualLanguageCode { get; set; }
}   


For a detailed explanation of why the composite key is marked Serializable, and the rationale for extracting the composite key of the product localization into its own class, read: http://devlicio.us/blogs/anne_epstein/archive/2009/11/20/nhibernate-and-composite-keys.aspx


To get the multi-lingual "table":

create function dbo.get_product_i18n(@language_code nvarchar(6))
returns table --  (product_id int, language_code nvarchar(6), product_name nvarchar(1000), product_description nvarchar(1000))
as
    return 
    with a as
    (
        select
            the_rank =
                row_number() 
                    over(partition by product_id
                    order by
                     case language_code
                     when @language_code then 1
                     when 'en' then 2
                     else 3
                     end) 

                ,* 
                ,actual_language_code = language_code

        from product_i18n 
    )
    select    

        -- composite key for ORM:
        a.product_id, language_code = @language_code
        -- ...composite key

        , a.product_name, a.product_description

        , a.actual_language_code
    from a 
    where the_rank = 1;


GO

That function returns the entity's localized fields (product_name and product_description in our example). If there's no match for the requested language, it returns the English version; if there's no English version either, it returns whichever localization exists for the entity.


Source data (the product table, then the product_i18n table):
product_id  year_introduced
----------- ---------------
1           2016
2           2007
3           1964
4           1994

(4 row(s) affected)


product_id  language_code product_name       product_description
----------- ------------- ------------------ -------------------------------
1           en            Apple I            First Personal Computer
1           zh            Pingguo Xian       Xian Dian Nao
2           en            iPhone             First Truly Smartphone
3           ph            Sarao              World's Top Jeepney Brand
4           zh            Anta               China's Top Shoe Brand

(5 row(s) affected)


Sample output of get_product_i18n:

select * from dbo.get_product_i18n('zh');


    product_id  language_code product_name       product_description          actual_language_code
    ----------- ------------- ------------------ ---------------------------- --------------------
    1           zh            Pingguo Xian       Xian Dian Nao                zh
    2           zh            iPhone             First Truly Smartphone       en
    3           zh            Sarao              World's Top Jeepney Brand    ph
    4           zh            Anta               China's Top Shoe Brand       zh

    (4 row(s) affected)



There's a zh translation for First Personal Computer, hence the TVF (table-valued function) returns its localized version, i.e., Pingguo Xian.

There's no zh translation for First Truly Smartphone, but there's an English (fallback language) version of it, hence the TVF returns the English version.

There's neither a zh nor an English version of World's Top Jeepney Brand, hence the TVF returns the only version available, the ph one, i.e., Sarao.

There's a zh localization for China's Top Shoe Brand, hence get_product_i18n just returns that.
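To make the fallback rule of get_product_i18n concrete, here is a minimal in-memory sketch of the same ranking done with LINQ to Objects. It is only an illustration, not part of the original project: the I18nRow and ResolvedRow classes and the I18nFallback helper are hypothetical names, and descriptions are left out of the sample rows for brevity. GroupBy plays the role of "partition by product_id", the OrderBy key mirrors the CASE expression, and First() stands in for "where the_rank = 1".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the product_i18n table.
public class I18nRow
{
    public int ProductId;
    public string LanguageCode;
    public string ProductName;
}

// Hypothetical stand-in for one row returned by get_product_i18n.
public class ResolvedRow
{
    public int ProductId;
    public string LanguageCode;        // always the requested language (composite key for the ORM)
    public string ProductName;
    public string ActualLanguageCode;  // the language the text really came from
}

public static class I18nFallback
{
    // Per product: prefer the requested language, then English, then any other row.
    public static List<ResolvedRow> Resolve(string languageCode, IEnumerable<I18nRow> rows)
    {
        return rows
            .GroupBy(r => r.ProductId)                               // partition by product_id
            .Select(g => g
                .OrderBy(r => r.LanguageCode == languageCode ? 1
                            : r.LanguageCode == "en" ? 2
                            : 3)                                     // the CASE ranking
                .First())                                            // where the_rank = 1
            .Select(r => new ResolvedRow
            {
                ProductId = r.ProductId,
                LanguageCode = languageCode,
                ProductName = r.ProductName,
                ActualLanguageCode = r.LanguageCode
            })
            .OrderBy(r => r.ProductId)
            .ToList();
    }

    // The product_i18n sample rows shown above (descriptions omitted).
    public static List<I18nRow> SampleRows()
    {
        return new List<I18nRow>
        {
            new I18nRow { ProductId = 1, LanguageCode = "en", ProductName = "Apple I" },
            new I18nRow { ProductId = 1, LanguageCode = "zh", ProductName = "Pingguo Xian" },
            new I18nRow { ProductId = 2, LanguageCode = "en", ProductName = "iPhone" },
            new I18nRow { ProductId = 3, LanguageCode = "ph", ProductName = "Sarao" },
            new I18nRow { ProductId = 4, LanguageCode = "zh", ProductName = "Anta" },
        };
    }
}
```

Calling I18nFallback.Resolve("zh", I18nFallback.SampleRows()) should yield the same four names and actual_language_code values as the sample output above. One difference worth knowing: OrderBy is stable, so among several rank-3 rows this sketch picks the first one in input order, whereas row_number() in SQL Server makes no such guarantee.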



In the ProductLanguage mapping, you'll notice that we use MERGE: it inserts the product + language pair if it doesn't exist yet, and updates it if it already exists. The sample output above returns a zh row for iPhone even though there's no zh translation for it yet; if a zh user decides to change that product's name and description, the MERGE will INSERT the zh translation. If the row already exists in the database, the MERGE performs an UPDATE instead.


SqlInsert and SqlUpdate don't support named parameters yet, but the order of the positional parameters (the question marks) is simple: the columns follow the exact order of their corresponding properties in the class, with one caveat, namely that the primary key column(s) come last. Hence the parameter order is: product_name, product_description, actual_language_code, pk_product_id, pk_language_code, where pk_product_id and pk_language_code make up the composite key.


using NHibernate.Mapping.ByCode.Conformist;
using NHibernate.Mapping.ByCode;

using LocalizationWithCaching.Models;

namespace LocalizationWithCaching.ModelMappings
{
    public class ProductLanguageMapping : ClassMapping<ProductLanguage>
    {

        // Verbatim (@) string literal, so the MERGE statement can span several lines.
        string save =
@"merge product_i18n as dst
using( values(?,?,?, ?,?) ) 
    as src(product_name, product_description, actual_language_code, pk_product_id, pk_language_code)
on
    src.pk_product_id = dst.product_id and src.pk_language_code = dst.language_code

when matched then
    update set dst.product_name = src.product_name, dst.product_description = src.product_description
when not matched then
    insert (product_id, language_code, product_name, product_description)
    values (src.pk_product_id, src.pk_language_code, src.product_name, src.product_description);";


        public ProductLanguageMapping()
        {            
            // When queries from this mapping run under different languages, each language gets its own isolated query cache entries.
            // That behavior comes from NHibernate filters. 
            
            Table("dbo.get_product_i18n(:lf.LanguageCode)"); // lf is an NHibernate filter
            // Hence the following behavior:
                // TestQueryCache("en"); // database hit
                // TestQueryCache("zh"); // database hit
                // TestQueryCache("en"); // cached query hit
                // TestQueryCache("zh"); // cached query hit
                // TestQueryCache("ca"); // database hit
                
            // If we don't use NHibernate filters (e.g., using the CONTEXT_INFO technique instead), identical queries run under different languages will share the same query cache entry.
            // Thus this mapping:
            //      Table("dbo.get_product_i18n(convert(nvarchar, substring(context_info(), 5, convert(int, substring(context_info(), 1, 4)) )  ))");
            // Will have this behavior:            
                // TestQueryCache("en"); // database hit
                // TestQueryCache("zh"); // cached query hit
                // TestQueryCache("en"); // cached query hit
                // TestQueryCache("zh"); // cached query hit
                // TestQueryCache("ca"); // cached query hit


            // Need to be turned on, so N+1 won't happen
            // http://stackoverflow.com/questions/8761249/how-do-i-make-nhibernate-cache-fetched-child-collections
            Cache(x => x.Usage(CacheUsage.ReadWrite));


            ComponentAsId(key => key.ProductLanguageCompositeKey, m =>
            {
             m.Property(x => x.ProductId, c => c.Column("product_id"));
             m.Property(x => x.LanguageCode, c => c.Column("language_code"));
            });


            SqlInsert(save);
            SqlUpdate(save);

            Property(x => x.ProductName, c => c.Column("product_name"));
            Property(x => x.ProductDescription, c => c.Column("product_description"));
            Property(x => x.ActualLanguageCode, c => c.Column("actual_language_code"));            
        }
    }
}    
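Why do filters give each language its own query cache entry, while CONTEXT_INFO does not? As I understand NHibernate's query cache, the values of enabled filter parameters become part of the cache key, whereas a language smuggled in through CONTEXT_INFO is invisible to it. The sketch below is a toy model of that idea, not NHibernate's real classes; ToyQueryCache and QueryCacheDemo are hypothetical names, and the SQL string is a placeholder. It replays the en, zh, en, zh, ca sequence from the comments above.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: NOT NHibernate's actual query cache implementation.
public class ToyQueryCache
{
    private readonly Dictionary<string, List<string>> _entries =
        new Dictionary<string, List<string>>();

    public int DatabaseHits { get; private set; }

    // filterValue is the language when it is passed as a filter parameter;
    // null models the CONTEXT_INFO mapping, where the language never reaches the key.
    public List<string> Run(string sql, string filterValue, Func<List<string>> hitDatabase)
    {
        string key = sql + "|" + (filterValue ?? "");
        List<string> cached;
        if (_entries.TryGetValue(key, out cached))
            return cached;                 // cached query hit

        DatabaseHits++;                    // database hit
        List<string> rows = hitDatabase();
        _entries[key] = rows;
        return rows;
    }
}

public static class QueryCacheDemo
{
    // Replays the test for en, zh, en, zh, ca and returns what each call was served.
    public static List<string> Replay(bool languageIsFilterParameter, out int databaseHits)
    {
        var cache = new ToyQueryCache();
        var served = new List<string>();
        const string sql = "hypothetical SQL against dbo.get_product_i18n";
        foreach (string lang in new[] { "en", "zh", "en", "zh", "ca" })
        {
            string current = lang;         // copy used by the lambda below
            List<string> rows = cache.Run(sql,
                languageIsFilterParameter ? current : null,
                () => new List<string> { "rows in " + current });
            served.Add(rows[0]);
        }
        databaseHits = cache.DatabaseHits;
        return served;
    }
}
```

With the filter, the toy records three database hits (en, zh, ca). Without it, there is a single hit, and in this model the zh calls are served the cached en rows, which is the practical reason to prefer the filter-based mapping.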

The product mapping:


using NHibernate.Mapping.ByCode.Conformist;
using NHibernate.Mapping.ByCode;

using LocalizationWithCaching.Models;

namespace LocalizationWithCaching.ModelMappings
{
    public class ProductMapping : ClassMapping<Product>
    {
        public ProductMapping()
        {
            Table("product");
            Id(x => x.ProductId, c =>
            {
                c.Column("product_id");
                c.Generator(Generators.Identity);
            });


            // Need to be turned on, so N+1 won't happen
            // http://stackoverflow.com/questions/8761249/how-do-i-make-nhibernate-cache-fetched-child-collections
            Cache(x => x.Usage(CacheUsage.ReadWrite));


            Property(x => x.YearIntroduced, c => c.Column("year_introduced"));
        }
    }
}    

Now for the interesting part: when mapping a table-valued function...

create function dbo.tvf_get_product_sold() 
returns table
as
 return
  select p.product_id, ordered_count = coalesce(sum(o.qty), 0)
  from dbo.product p          
  left join dbo.ordered_product o on p.product_id = o.product_id
  group by p.product_id
go



namespace LocalizationWithCaching.Models
{
    public class GetProductSold
    {
        public virtual int ProductId { get; set; }        
        public virtual int Sold { get; set; }
    }
}


...we must have a mechanism to invalidate the query cache whenever ordered_product changes. NHibernate has exactly that: just specify Synchronize(new[] { "ordered_product" }); in the GetProductSold mapping:

public class GetProductSoldMapping : ClassMapping<GetProductSold>
{
    public GetProductSoldMapping()
    {
        Table("dbo.tvf_get_product_sold()");

        Cache(x => x.Usage(CacheUsage.ReadOnly));

        Synchronize(new[] { "ordered_product" });

        Id(x => x.ProductId, c => c.Column("product_id"));
        Property(x => x.Sold, c => c.Column("ordered_count"));
    }
}


Thus we will get this behavior when we specify synchronize:

TestTvfGetProductSoldQueryCache("en"); // database hit
TestTvfGetProductSoldQueryCache("en"); // cached query hit
UpdateOrderedProduct(orderedProductId: 1, languageCode: "en"); // database hit on entity get. database hit on update. refresh entity cache. invalidates GetProductSold query cache
TestOrderedProductEntityCache(orderedProductId: 1, languageCode: "en"); // cached entity hit on entity get
TestTvfGetProductSoldQueryCache("en"); // was invalidated on line 3. database hit
TestTvfGetProductSoldQueryCache("en"); // cached query hit


private void TestTvfGetProductSoldQueryCache(string languageCode)
{
    using (var session = Mapper.TheMapper.GetSessionFactory().OpenSession())
    using (var tx = session.BeginTransaction().SetLanguage(session, languageCode))
    {
        var x = 
            (from q in 
                  from ps in session.Query<GetProductSold>()
                  join pl in session.Query<ProductLanguage>() on ps.ProductId equals pl.ProductLanguageCompositeKey.ProductId
                  select new { ps, pl }
            where q.pl.ProductLanguageCompositeKey.LanguageCode == languageCode
            select q).Cacheable();

        // Rationale for Cacheable at the end:
        // http://www.ienablemuch.com/2013/06/nhibernate-query-caching.html

        var l = x.ToList();
    }
}


private void UpdateOrderedProduct(int orderedProductId, string languageCode)
{
    using (var session = Mapper.TheMapper.GetSessionFactory().OpenSession())
    using (var tx = session.BeginTransaction().SetLanguage(session, languageCode))
    {
        var x = session.Get<OrderedProduct>(orderedProductId);
        x.Quantity = x.Quantity + 1;
        session.Save(x);
        tx.Commit();
    }
}


private void TestOrderedProductEntityCache(int orderedProductId, string languageCode)
{
    using (var session = Mapper.TheMapper.GetSessionFactory().OpenSession())
    using (var tx = session.BeginTransaction().SetLanguage(session, languageCode))
    {
        var x = session.Get<OrderedProduct>(orderedProductId);                
    }
} 
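The invalidation behavior above can be pictured with timestamps: every cached query result remembers when it was stored, every table remembers when it was last written, and a result is stale if any table it depends on changed afterwards. That is the general idea behind NHibernate's update-timestamps cache, and Synchronize simply adds ordered_product to the set of tables the TVF-backed entity depends on. The sketch below is a simplified model under those assumptions, not NHibernate code; ToyTimestampCache and SynchronizeDemo are hypothetical names, and "product_i18n" stands in for the ProductLanguage table space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of table-based query cache invalidation (not NHibernate's classes).
public class ToyTimestampCache
{
    private long _clock;
    private readonly Dictionary<string, long> _lastWrite = new Dictionary<string, long>();
    private readonly Dictionary<string, long> _cachedAt = new Dictionary<string, long>();

    public int DatabaseHits { get; private set; }

    // Called after a committed INSERT/UPDATE/DELETE on a table.
    public void TableUpdated(string table)
    {
        _lastWrite[table] = ++_clock;
    }

    // Runs a query depending on the given tables; returns true on a cache hit.
    public bool Query(string queryKey, params string[] tables)
    {
        long storedAt;
        if (_cachedAt.TryGetValue(queryKey, out storedAt)
            && tables.All(t => !_lastWrite.ContainsKey(t) || _lastWrite[t] < storedAt))
            return true;                   // cached query hit

        DatabaseHits++;                    // database hit
        _cachedAt[queryKey] = ++_clock;
        return false;
    }
}

public static class SynchronizeDemo
{
    // Replays the sequence from the post; true means "cached query hit".
    public static List<bool> Replay()
    {
        var cache = new ToyTimestampCache();
        // GetProductSold joined with ProductLanguage depends on both table spaces.
        string[] spaces = { "ordered_product", "product_i18n" };
        var hits = new List<bool>();
        hits.Add(cache.Query("product_sold", spaces)); // database hit
        hits.Add(cache.Query("product_sold", spaces)); // cached query hit
        cache.TableUpdated("ordered_product");          // UpdateOrderedProduct
        hits.Add(cache.Query("product_sold", spaces)); // invalidated: database hit
        hits.Add(cache.Query("product_sold", spaces)); // cached query hit
        cache.TableUpdated("product");                  // UpdateProduct: not a dependency
        hits.Add(cache.Query("product_sold", spaces)); // still a cached query hit
        return hits;
    }
}
```

The last step mirrors the post's observation that updating Product does not invalidate the GetProductSold query cache: product is not among the tables that query is synchronized with.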

With the right modeling, multilingual support with caching can easily be achieved on NHibernate.


Here's the detailed behavior of NHibernate caching:

TestProductAndLanguageQueryCache("en"); // database hit
TestProductAndLanguageQueryCache("zh"); // database hit
TestProductAndLanguageQueryCache("en"); // cached query hit
TestProductAndLanguageQueryCache("zh"); // cached query hit
TestProductAndLanguageQueryCache("ca"); // database hit


TestTvfGetOrderInfoQueryCache("en"); // database hit
TestTvfGetOrderInfoQueryCache("en"); // cached query hit
TestTvfGetOrderInfoQueryCache("zh"); // database hit
TestTvfGetOrderInfoQueryCache("zh"); // cached query hit

   
TestTvfGetOrderInfoQueryCache("en"); // cached query hit
UpdateProduct(productId: 1, languageCode: "en"); // cached entity hit on entity get. database hit on update. refresh entity cache. Invalidates GetOrderInfo query cache
TestTvfGetOrderInfoQueryCache("en"); // database hit
TestTvfGetOrderInfoQueryCache("en"); // cached query hit
UpdateProductLanguage(productId: 1, languageCode: "zh"); // cached entity hit on entity get. database hit on update. refresh entity cache. Invalidates GetOrderInfo query cache
TestTvfGetOrderInfoQueryCache("en"); // database hit, even though we only touched the Chinese version above
TestTvfGetOrderInfoQueryCache("en"); // cached query hit



TestTvfGetProductSoldQueryCache("en"); // database hit
UpdateProduct(productId: 1, languageCode: "en"); // cached entity hit on entity get. database hit on update. refresh entity cache. does not invalidate GetProductSold query cache
TestTvfGetProductSoldQueryCache("en"); // was not invalidated. cached query hit. GetProductSold query cache is Synchronized with ordered_product only
TestTvfGetProductSoldQueryCache("en"); // cached query hit
UpdateProductLanguage(productId: 1, languageCode: "zh"); // cached entity hit on entity get. database hit on update. refresh entity cache. invalidates GetProductSold query cache as it joins the ProductLanguage entity
TestTvfGetProductSoldQueryCache("en"); // cached query was invalidated. database hit
TestTvfGetProductSoldQueryCache("en"); // cached query hit


TestTvfGetProductSoldQueryCache("en"); // cached query hit
UpdateOrderedProduct(orderedProductId: 1, languageCode: "en"); // database hit on entity get. database hit on update. refresh entity cache. invalidates GetProductSold query cache
TestOrderedProductEntityCache(orderedProductId: 1, languageCode: "en"); // cached entity hit on entity get
TestTvfGetProductSoldQueryCache("en"); // database hit
TestTvfGetProductSoldQueryCache("en"); // cached query hit
UpdateOrderedProduct(orderedProductId: 1, languageCode: "zh"); // cached entity hit on entity get. database hit on update. refresh entity cache. invalidates GetProductSold query cache
TestTvfGetProductSoldQueryCache("en"); // database hit, even though we only touched the Chinese version above
TestTvfGetProductSoldQueryCache("en"); // cached query hit
UpdateOrderedProduct(orderedProductId: 1, languageCode: "en"); // cached entity hit on entity get. database hit on update. refresh entity cache
TestOrderedProductEntityCache(orderedProductId: 1, languageCode: "en"); // cached entity hit on entity get



TestProductEntityCache(productId: 1,languageCode: "en"); // cached entity hit
TestProductLanguageEntityCache(productId: 1, languageCode: "en"); // cached entity hit
TestProductEntityCache(productId: 2, languageCode: "zh"); // cached entity hit
TestProductLanguageEntityCache(productId: 2, languageCode: "en"); // cached entity hit

UpdateProduct(productId: 1, languageCode: "en"); // cached entity hit on entity get. database hit on update. refresh entity cache
TestProductEntityCache(productId: 1, languageCode: "en"); // cached entity hit

UpdateProduct(productId: 1, languageCode: "en"); // cached entity hit on entity get. database hit. refresh entity cache. invalidates cached query
TestProductAndLanguageQueryCache("en"); // cached query was invalidated. database hit
TestProductAndLanguageQueryCache("en"); // cached query hit

TestProductQueryCache("en"); // cached query was invalidated. database hit
TestProductQueryCache("ca"); // cached query was invalidated. database hit

TestProductQueryCache("en"); // cached query hit
TestProductQueryCache("ca"); // cached query hit

UpdateProduct(productId: 1, languageCode: "en"); // cached entity hit on entity get. database hit on update. refresh entity cache. invalidates cached query 
TestProductEntityCache(productId: 1, languageCode: "en"); // cached entity hit
TestProductAndLanguageQueryCache("en"); // cached query was invalidated. database hit
TestProductAndLanguageQueryCache("ca"); // database hit

TestProductLanguageEntityCache(productId: 1, languageCode: "ca"); // cached entity hit

TestProductLanguageEntityCache(productId: 1, languageCode: "es"); // database hit

TestProductLanguageEntityCache(productId: 1, languageCode: "es"); // cached entity hit

// cached entity hit on entity get. database hit on update. entity cache is refreshed. invalidates *ALL* language versions of the ProductLanguage query cache
UpdateProductLanguage(productId: 1, languageCode: "es");


TestProductAndLanguageQueryCache("zh"); // was invalidated. database hit
TestProductAndLanguageQueryCache("es"); // was invalidated. database hit
TestProductAndLanguageQueryCache("es"); // cached query hit           
TestProductAndLanguageQueryCache("en"); // was invalidated. database hit            

TestProductAndLanguageQueryCache("zh"); // cached query hit
TestProductAndLanguageQueryCache("en"); // cached query hit


TestProductLanguageEntityCache(productId: 1, languageCode: "es"); // cached entity hit


Full code: https://github.com/MichaelBuen/DemoLocalizationWithCaching/tree/optimized

Tuesday, June 25, 2013

Taking the Optimistic Approach. Say goodbye to deadlocks!

“I've received a lot of support from fellow pessimists. I've considered starting a support group, but nobody thinks it will help.” --  https://twitter.com/raganwald/status/273118149073846272




The following caught my attention in the SQL Server 2014 announcement:

“Taking the Optimistic Approach

The Hekaton team also found that multi-version concurrency control (MVCC) proved robust in scenarios with higher workloads and higher contention”



MVCC greatly reduces deadlocks, since readers no longer block writers (and vice versa). Let’s hope the SQL Server team turns MVCC on by default in SQL Server 2014, like the Entity Framework team did for EF6 Code First’s database creation:

Default transaction isolation level is changed to READ_COMMITTED_SNAPSHOT for databases created using Code First, potentially allowing for more scalability and fewer deadlocks. -- http://entityframework.codeplex.com/wikipage?title=specs
               



Happy Computing! 

Monday, June 24, 2013

Returning strongly-typed distinct objects from a function without the drama of GetHashCode and Equals

In my last post, I detailed how to tame Distinct. The following does exactly that without the drama of GetHashCode and Equals. You might still need GetHashCode and Equals if you are using an ORM; however, for in-memory operations, a GroupBy approach suffices for a distinct operation:


using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var list = new List<Fact> () {
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Violets are", Color = 0x0000FF },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                };
                foreach(var item in GetFacts(list)) {
                    Console.WriteLine ("{0:x} {1:x}", item.Thing, item.Color);
                }
        }
        
        public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) {
            return list.GroupBy(x => new { x.Thing, x.Color }, (key,group) => group.First());
        }
}
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}


Roses are ff0000
Violets are ff
Sugar is 1337

To some, the above distinct construct may be obvious, but rediscovering what Linq/lambda can do gives me a touch of impostor syndrome, especially since I'm way past the 10,000 hours on this beautiful craft called programming. Well, I'll just heed Scott Hanselman's advice; read it at the end of this article: http://www.hanselman.com/blog/ImAPhonyAreYou.aspx


Live code: http://ideone.com/uWtZrT


If Linq is your thing, you can also implement the above code the following way:

public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) 
{
    return 
        from x in list
        group x by new { x.Thing, x.Color } into grp
        select grp.First();                 
}

Live Code: http://ideone.com/J0PzRN

Output:

Roses are ff0000
Violets are ff
Sugar is 1337


If you are not a technology bigot, you won't be snotty about a pure Linq or pure lambda approach; you will use the right tool for the job. In fact, you won't mind mixing Linq and lambda in one statement, especially if that gives you the right balance of maintainability and readability:
public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list)
{
     return 
         from x in list.GroupBy(x => new {x.Thing, x.Color })                 
         select x.First();                      
}

Live Code: http://ideone.com/tVMHS6

Output:
Roses are ff0000
Violets are ff
Sugar is 1337


And then there's another approach:

public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) 
{
    return list.GroupBy(x => new {x.Thing, x.Color }).Select(x => x.First());                  
}

Live Code: http://ideone.com/9aUl4W

Output:

Roses are ff0000
Violets are ff
Sugar is 1337


This is why many folks call programming a subjective, er... scratch that, an art thing. There are way too many approaches to achieve a common goal. Beauty is in the eye of the beholder, or so they say.


If I became blind and my profession were still programming, the last thing I would want is to write things in a pure lambda style and be forced to make sure every parenthesis is properly matched. Lambda syntax doesn't let me phrase the logic in the most natural way, and it's not friendly to the finger joints either, as one needs to press the shift key incessantly when typing fat arrows and parentheses. Linq syntax, on the other hand, lets me write logic fluently without constantly worrying whether the parentheses match; there are few to no parentheses in the Linq approach, so it doesn't discriminate against people with impaired vision.


To sway you away from writing exclusively in the lambda style, and to sell you on Linq instead (mixing in some lambda here and there where it makes the code more readable), imagine you are the protagonist in the movie Crank: if you can't see output as fast as you need to, your heart rate will drop and you will die.

And so Jason Statham writes the join construct using the lambda approach:

var categorizedProducts = product
    .Join(productcategory, p => p.Id, pc => pc.ProdId, (p, pc) => new { p, pc })
    .Join(category, ppc => ppc.CatId, c => c.Id, (ppc, c) => new { ppc, c }))
    .Select(m => new { 
        ProdId = m.ppc.Id, 
        CatId = m.c.CatId
        // other assignments
    });


And much to his chagrin, when he hits build, there are errors indicating that some properties cannot be found on an object, and to add insult to his heart injury, the compiler reports a syntax error. Frustration sets in, his heart rate drops, and he dies. If you cannot spot the errors in the above lambda construct, you shall die too! lol


To spot the errors in the above lambda code, get the correct code here: http://stackoverflow.com/questions/9720225/how-to-perform-join-between-multiple-tables-in-linq-lambda/9722744#9722744, then use the Diff plugin of Notepad++ to compare it with the code above. That has only two joins, but the navigation for ProdId is already three dots away (i.e., m.ppc.p.Id). Now imagine if the code had 5+ joins: property navigation would become way too deep, and the volume of dots in the lambda approach could put to shame the number of parentheses in typical Lisp code.
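For comparison, here is a small self-contained sketch with the lambda join fixed next to the query-syntax version. The entity shapes are hypothetical, invented only so the code compiles, so the category's key is exposed as Id rather than the CatId used in the snippets in this post. The fixes are the ones the nesting forces: the second key selector must reach through the pair (ppc.pc.CatId), the product is m.ppc.p, and the stray closing parenthesis after the second Join is gone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinDemo
{
    // Hypothetical entity shapes, invented only so the join compiles.
    public class Product { public int Id; }
    public class ProductCategory { public int ProdId; public int CatId; }
    public class Category { public int Id; }

    // Method (lambda) syntax: each Join wraps the previous pair in an anonymous
    // type, so the bridge row is ppc.pc and the product is m.ppc.p.
    public static List<Tuple<int, int>> WithLambda(
        IEnumerable<Product> product,
        IEnumerable<ProductCategory> productcategory,
        IEnumerable<Category> category)
    {
        return product
            .Join(productcategory, p => p.Id, pc => pc.ProdId, (p, pc) => new { p, pc })
            .Join(category, ppc => ppc.pc.CatId, c => c.Id, (ppc, c) => new { ppc, c })
            .Select(m => Tuple.Create(m.ppc.p.Id, m.c.Id))   // (ProdId, CatId)
            .ToList();
    }

    // Query (Linq) syntax: p, pc and c stay in scope without any nesting.
    public static List<Tuple<int, int>> WithQuery(
        IEnumerable<Product> product,
        IEnumerable<ProductCategory> productcategory,
        IEnumerable<Category> category)
    {
        var q = from p in product
                join pc in productcategory on p.Id equals pc.ProdId
                join c in category on pc.CatId equals c.Id
                select Tuple.Create(p.Id, c.Id);             // (ProdId, CatId)
        return q.ToList();
    }

    // Tiny sample: product 1 is in categories 10 and 20, product 2 is in category 20.
    public static List<Tuple<int, int>> RunSample(bool useLambda)
    {
        var product = new[] { new Product { Id = 1 }, new Product { Id = 2 } };
        var productcategory = new[]
        {
            new ProductCategory { ProdId = 1, CatId = 10 },
            new ProductCategory { ProdId = 2, CatId = 20 },
            new ProductCategory { ProdId = 1, CatId = 20 },
        };
        var category = new[] { new Category { Id = 10 }, new Category { Id = 20 } };
        return useLambda
            ? WithLambda(product, productcategory, category)
            : WithQuery(product, productcategory, category);
    }
}
```

Both versions return the same pairs in the same order; the compiler translates the query syntax into essentially the same Join calls, it just hides the anonymous-type plumbing.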


Writing code in lambda would not make a good Crank movie script; the protagonist would die way too early. Upon realizing that grave mistake, the scriptwriter rewrites the part where Jason Statham writes the join construct; in the revised script, he writes it using Linq syntax instead:


var categorizedProducts =
    from p in product
    join pc in productcategory on p.Id equals pc.ProdId
    join c in category on pc.CatId equals c.Id
    select new {
        ProdId = p.Id, // or pc.ProdId
        CatId = c.CatId
        // other assignments
    };


He was able to produce results with the above Linq code in no time; his heart rate didn't drop a bit, he survived, and that made a sequel possible. I think in the sequel he writes AngularJS code instead of jQuery lol :D


Happy Computing! ツ

Thursday, June 20, 2013

Distinct() nuances

You are unique, just like everyone else; it's just hard to convince the computer. According to the computer, you are just a series of bits, 1s and 0s.


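A straightforward Distinct() works if the duplicate elements are the very same object (i.e., they share the same memory address), because a class that doesn't override Equals and GetHashCode is compared by reference.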

using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var rose = new Fact { Thing = "Roses are", Color = 0xFF0000 };
                var violet = new Fact { Thing = "Violets are", Color = 0x0000FF };
                var sugar = new Fact { Thing = "Sugar is", Color = 0x1337 };
 
                var list = new List<Fact> () {
                    rose, rose, violet, sugar, sugar
                };
 
                foreach (var item in list.Distinct()) {
                    Console.WriteLine ("{0:x} {1:x}", item.Thing, item.Color);
                }
        }
}
 
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}




Live code: http://ideone.com/psEvjS
Output:
Roses are ff0000
Violets are ff
Sugar is 1337




In a real application, it's not that simple: even if some elements have the same content, they are not the same object in memory, so by default Distinct() doesn't work in that scenario. To wit, the following doesn't work:

using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var list = new List<Fact> () {
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Violets are", Color = 0x0000FF },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                };
                
                foreach (var item in list.Distinct()) {
                    Console.WriteLine ("{0:x} {1:x}", item.Thing, item.Color);
                }
        }
}
 
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}

Live code: http://ideone.com/58gynx

Thus the above Distinct() incorrectly returns duplicate objects:
Roses are ff0000
Roses are ff0000
Violets are ff
Sugar is 1337
Sugar is 1337



You can use an anonymous type with Distinct, though. Anonymous types have compiler-generated Equals and GetHashCode implementations that compare property values, which is why Distinct() on an anonymous type can detect objects with the same content:


using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var list = new List<Fact> () {
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Violets are", Color = 0x0000FF },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                };
                
                foreach (var item in list.Select(x => new { x.Thing, x.Color } ).Distinct()) {
                    Console.WriteLine ("{0} {1:x}", item.Thing, item.Color); // Color in hex, to match the output below
                }
        }
}
 
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}


Live code: http://ideone.com/8K79Dn
Hence that shall return distinct elements:
Roses are ff0000
Violets are ff
Sugar is 1337
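The difference between the two kinds of equality can be seen directly, without Distinct in the way. This minimal sketch (AnonymousEqualityDemo is a hypothetical helper name) compares two Fact instances with the same content against two anonymous objects with the same property names, types and values; within one assembly those anonymous objects share a single compiler-generated type whose Equals and GetHashCode compare the property values.

```csharp
using System;

public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}

public static class AnonymousEqualityDemo
{
    // Fact does not override Equals, so object.Equals compares references.
    public static bool FactsEqual()
    {
        var a = new Fact { Thing = "Roses are", Color = 0xFF0000 };
        var b = new Fact { Thing = "Roses are", Color = 0xFF0000 };
        return a.Equals(b);   // false: two different objects
    }

    // Anonymous types get value-based Equals and GetHashCode from the compiler.
    public static bool AnonymousEqual()
    {
        var a = new { Thing = "Roses are", Color = 0xFF0000 };
        var b = new { Thing = "Roses are", Color = 0xFF0000 };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();   // true
    }
}
```

Distinct() relies on exactly these two methods (through the default equality comparer), which is why the anonymous projection deduplicates and the plain Fact list does not.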



However, you cannot use that technique as-is when you need to return a list of Fact from a function: the projection produces a sequence of an anonymous type, which is not convertible to IEnumerable<Fact>. Thus the following won't compile:


public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) {
    return list.Select(x => new { x.Thing, x.Color } ).Distinct();
}

Live Code: http://ideone.com/1v6x0i



The following works, but it feels kludgy and violates DRY:

public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) {
            return list.Select(x => new { x.Thing, x.Color }).Distinct()
                       .Select(x => new Fact { Thing = x.Thing, Color = x.Color } );
}

Live Code: http://ideone.com/N4iKTx
Output:
Roses are ff0000
Violets are ff
Sugar is 1337


So what should a good developer do to write elegant code? Learn how to do a proper object equality comparison. Remember the compiler-generated Equals and GetHashCode on anonymous types I mentioned a while ago? That's what we will do, explicitly, on the class concerned. To wit, we need this code in the Fact class:


public override bool Equals(object obj)
{
    if (obj == null) return false;

    var x = obj as Fact;
    if (x == null) return false;

    return this.Thing == x.Thing && this.Color == x.Color;
}


public override int GetHashCode()
{
    // http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode
    unchecked
    {
        int hash = 17;
        // Guard against a null Thing, which would otherwise throw here
        hash = hash * 23 + (this.Thing == null ? 0 : this.Thing.GetHashCode());
        hash = hash * 23 + this.Color.GetHashCode();

        return hash;
    }
}


With those overrides in place, Distinct() on the Fact class itself returns the distinct elements, even though each object has a different memory address:

Live Code: http://ideone.com/eMrO4y
Output:
Roses are ff0000
Violets are ff
Sugar is 1337
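If you can't (or would rather not) change the Fact class itself, the same comparison can live in a separate IEqualityComparer<Fact> and be passed to the Distinct overload that accepts a comparer. A minimal sketch; FactComparer and FactQueries are hypothetical names for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}

// Equality rules kept outside the entity, same fields as the overrides above.
public sealed class FactComparer : IEqualityComparer<Fact>
{
    public bool Equals(Fact a, Fact b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.Thing == b.Thing && a.Color == b.Color;
    }

    public int GetHashCode(Fact f)
    {
        if (f == null) return 0;
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (f.Thing == null ? 0 : f.Thing.GetHashCode());
            hash = hash * 23 + f.Color.GetHashCode();
            return hash;
        }
    }
}

public static class FactQueries
{
    public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list)
    {
        // Enumerable.Distinct has an overload taking an IEqualityComparer<T>
        return list.Distinct(new FactComparer());
    }
}
```

Overriding Equals/GetHashCode on the class makes the equality apply everywhere (Distinct, Contains, Dictionary keys); a comparer keeps it opt-in per call, which is handy when different call sites need different notions of "same".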


That's it folks!

Happy Coding! ツ

Musings on class diagram vs database diagram

A database (and the data model, for that matter) looks flat. When developers look at a database diagram, it often feels like a leaky abstraction, especially since their mental model of querying and saving business entities is the domain model. It’s hard to grasp the richer context of the whole business from a database diagram. A class diagram provides that richer context.


It would be a lot better for developers to see the bigger picture via a class diagram rather than a database diagram; business entities and needs are easier and more intuitive to grasp from a class diagram (the domain model, for that matter). This is in line with operating in the mental model of ”There’s No Database”. The more developers look at class diagrams, the more seamlessly their minds can work with the domain models.


Ayende often uses class diagrams. A class diagram paints a thousand words. And with a class diagram, we don’t have to press F12 (Control+click, or Control+B with ReSharper) on class references/collections to navigate around the domain models.


Happy Computing! ツ

Friday, June 14, 2013

Abstraction, a friend or foe?

If the repository component can persist things in memory, there's no need to mock IQueryable. An example:
https://github.com/MichaelBuen/ToTheEfnhX/blob/master/TestProject/TestTheIRepository.cs

Here's the repository component I made that works on memory, NHibernate, Entity Framework:
https://github.com/MichaelBuen/ToTheEfnhX
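To illustrate the idea, here's a minimal sketch of an in-memory repository. IRepository<T>, MemoryRepository<T> and ProductQueries are hypothetical names, not the ToTheEfnhX API: the in-memory implementation exposes a List<T> through AsQueryable(), so query logic under test runs for real via LINQ to Objects, with no mocked IQueryable.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical repository contract (NOT the actual ToTheEfnhX API).
public interface IRepository<T> where T : class
{
    IQueryable<T> All { get; }   // callers compose LINQ queries on this
    void Save(T entity);
    void Delete(T entity);
}

// In-memory implementation: queries against All run through LINQ to Objects.
public class MemoryRepository<T> : IRepository<T> where T : class
{
    private readonly List<T> _store = new List<T>();

    public IQueryable<T> All { get { return _store.AsQueryable(); } }

    public void Save(T entity)
    {
        if (!_store.Contains(entity)) _store.Add(entity);
    }

    public void Delete(T entity) { _store.Remove(entity); }
}

public class Product
{
    public int ProductId { get; set; }
    public int YearIntroduced { get; set; }
}

// Query logic that would be unit-tested against MemoryRepository<Product>
// and run against an ORM-backed IRepository<Product> in production.
public static class ProductQueries
{
    public static IList<Product> IntroducedSince(IRepository<Product> repo, int year)
    {
        return repo.All.Where(p => p.YearIntroduced >= year)
                       .OrderBy(p => p.ProductId)
                       .ToList();
    }
}
```

One caveat of this design: LINQ to Objects doesn't translate queries to SQL, so a query that passes in memory can still behave differently, or be unsupported, under the ORM's LINQ provider.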

Another approach to unit testing, without a repository component plus mocking, is to use an in-memory database like SQLite; this removes the need for a repository component:
http://ayende.com/blog/3983/nhibernate-unit-testing

I stopped developing my repository component project because it's hard to abstract away the differences in fetching strategies between NHibernate and Entity Framework:
http://www.ienablemuch.com/2012/08/solving-nhibernate-thenfetchmany.html

I tried many ways to make the repository component conform to Entity Framework's intuitive fetching API, but it's hard to make NHibernate's fetching strategy behave like Entity Framework's. Even though I don't like Entity Framework, I'm not a technology bigot, and I can see some of its good points. So if I were able to abstract the fetching strategy in the repository component, I would pattern it after Entity Framework's fetching strategies, i.e., not after NHibernate's buggy ThenFetchMany nor NHibernate's ToFuture. Entity Framework's fetching strategy is easier to use and intuitive; It Just Works.

http://www.ienablemuch.com/2012/08/solving-nhibernate-thenfetchmany.html
http://msdn.microsoft.com/en-us/library/gg671236%28VS.103%29.aspx



Here's what can happen when the API doesn't empower developers: it compels them into N+1 queries. A case of too much abstraction: http://ayende.com/blog/156577/on-umbracos-nhibernates-pullout

Thursday, June 13, 2013

NHibernate query caching

Query caching won't work using this Linq construct:

var query = 
    (from p in session.Query<Product>()
    join t in session.Query<ProductTranslation>() on p.ProductId equals t.ProductId
    where p.YearIntroduced >= 1950 && string.Compare(t.ProductName, "B") >= 0
    select new { p, t }).Cacheable();

var list = query.ToList();



It must be done this way instead, with the filter applied to the projected join:
var query = from p in session.Query<Product>()
            join t in session.Query<ProductTranslation>() on p.ProductId equals t.ProductId
            select new { p, t };

query = query.Where(q => q.p.YearIntroduced >= 1950 && string.Compare(q.t.ProductName, "B") >= 0).Cacheable();

var list = query.ToList();


If you are a one-stop-shop kind of person, you might prefer to keep the multiple Linq statements together in a single Linq statement:

var query =
            (from q in
                from p in session.Query<Product>()
                join t in session.Query<ProductTranslation>() on p.ProductId equals t.ProductId
                select new { p, t }
            where q.p.YearIntroduced >= 1950 && string.Compare(q.t.ProductName, "B") >= 0
            select q).Cacheable();

var list = query.ToList();

None of the above is possible if a DAL/repository component layered on top of the ORM hides or neglects to expose useful functionality (e.g., caching, fetching), or if there's a project policy that forbids developers from using NHibernate's API directly.



The developer should be able to access the caching API when there's a need.



"You can solve every problem with another level of indirection, except for the problem of too many levels of indirection" – D.Wheeler



Here's a case of too many indirections: http://ayende.com/blog/156577/on-umbracos-nhibernates-pullout



Happy Coding! ツ