Monday, June 24, 2013

Returning strongly-typed distinct objects from a function without the drama of GetHashCode and Equals

In my last post, I detailed how to tame Distinct. The following does the same thing without the drama of GetHashCode and Equals. You might still need GetHashCode and Equals if you are using an ORM. For in-memory operations, however, a GroupBy approach suffices for a distinct operation:


using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var list = new List<Fact> () {
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Violets are", Color = 0x0000FF },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                };
                foreach(var item in GetFacts(list)) {
                    Console.WriteLine ("{0:x} {1:x}", item.Thing, item.Color);
                }
        }
        
        public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) {
            return list.GroupBy(x => new { x.Thing, x.Color }, (key,group) => group.First());
        }
}
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}


Roses are ff0000
Violets are ff
Sugar is 1337
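
If you find yourself repeating that GroupBy construct, it could be wrapped in a small extension method. The following is only a minimal sketch: DistinctByKey, DistinctExtensions and FactQueries are names I made up for illustration, they are not part of the code above nor of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}

public static class DistinctExtensions
{
    // Hypothetical helper (an assumption for this sketch): keeps the first
    // element seen for each key, using the same GroupBy overload as GetFacts.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        return source.GroupBy(keySelector, (key, items) => items.First());
    }
}

public static class FactQueries
{
    public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list)
    {
        // The anonymous type supplies value-based Equals/GetHashCode for the key.
        return list.DistinctByKey(x => new { x.Thing, x.Color });
    }
}
```

The return type stays IEnumerable<Fact>, so the caller still gets strongly-typed objects; the anonymous type is used only as the grouping key.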

To some, the above distinct construct may be obvious, but rediscovering what Linq/lambda can do gives me a feeling of impostor syndrome, especially since I'm way past the 10,000 hours on this beautiful craft called programming. Well, I'll just heed Scott Hanselman's advice; read it in the last part of this article: http://www.hanselman.com/blog/ImAPhonyAreYou.aspx


Live code: http://ideone.com/uWtZrT


If Linq is your thing, you can also implement the above code the following way:

public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) 
{
    return 
        from x in list
        group x by new { x.Thing, x.Color } into grp
        select grp.First();                 
}

Live Code: http://ideone.com/J0PzRN

Output:

Roses are ff0000
Violets are ff
Sugar is 1337


If you are not a technology bigot, you won't be snotty about a pure Linq or pure lambda approach; you will use the right tool for the job. In fact, you won't mind mixing Linq and lambda in one statement, especially if that gives you the right balance of code maintainability and readability:
public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list)
{
     return 
         from x in list.GroupBy(x => new {x.Thing, x.Color })                 
         select x.First();                      
}

Live Code: http://ideone.com/tVMHS6

Output:
Roses are ff0000
Violets are ff
Sugar is 1337


And then there's another approach:

public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) 
{
    return list.GroupBy(x => new {x.Thing, x.Color }).Select(x => x.First());                  
}

Live Code: http://ideone.com/9aUl4W

Output:

Roses are ff0000
Violets are ff
Sugar is 1337


This is why many folks call programming a subjective, er.. scratch that, an art thing. There are way too many approaches to achieve a common goal. Beauty is in the eye of the beholder, or so they say.


If I become blind and my profession is still programming, the last thing I would want is to write things in a pure lambda approach that forces me to make sure every parenthesis is properly matched. Lambda syntax won't let me phrase the logic in the most natural way, and it's not friendly to the finger joints either, as one needs to incessantly press the shift key when typing fat arrows and parentheses. Linq syntax, on the other hand, would allow me to write logic fluently without incessantly worrying whether the parentheses match. There are few to no parentheses in the Linq approach, and it doesn't discriminate against people with impaired vision.


To sway you from writing exclusively in the lambda approach and sell you on the Linq approach instead (mixing in some lambda here and there where it makes sense and keeps the code readable), imagine you are the protagonist in the movie Crank: if you are not able to see an output as fast as you need to, your heart rate will drop, and then you will die.

And so Jason Statham writes the join construct using the lambda approach:

var categorizedProducts = product
    .Join(productcategory, p => p.Id, pc => pc.ProdId, (p, pc) => new { p, pc })
    .Join(category, ppc => ppc.CatId, c => c.Id, (ppc, c) => new { ppc, c }))
    .Select(m => new { 
        ProdId = m.ppc.Id, 
        CatId = m.c.CatId
        // other assignments
    });


And much to his chagrin, when he hits build, there are errors indicating that some properties cannot be found on the object, and to add insult to his heart injury, the compiler says there's a syntax error. The frustration sets in, then his heart rate drops, then he dies. If you cannot spot the errors in the above lambda construct, you shall die too! lol


To spot where the errors in the above lambda code are, get the right code here: http://stackoverflow.com/questions/9720225/how-to-perform-join-between-multiple-tables-in-linq-lambda/9722744#9722744, then use the Diff plugin of Notepad++ to compare the code above to the right code. That has only two joins, but the navigation for ProdId is already three dots away (i.e., m.ppc.p.Id). Now imagine the above code with 5+ joins: property navigation becomes way too deep, and the volume of dots in the lambda approach could put to shame the number of parentheses in typical Lisp code.
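
To see how the dots pile up, here is a self-contained sketch of a two-join lambda chain over in-memory lists. The entity shapes (Product, ProductCategory, Category) and the JoinSketch class are assumptions made up for this illustration; only the nesting of the anonymous types matters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes, assumed only for this sketch.
public class Product { public int Id { get; set; } }
public class ProductCategory { public int ProdId { get; set; } public int CatId { get; set; } }
public class Category { public int Id { get; set; } }

public static class JoinSketch
{
    // Returns "ProdId:CatId" strings so the result is easy to inspect.
    public static List<string> Categorize(
        IEnumerable<Product> product,
        IEnumerable<ProductCategory> productcategory,
        IEnumerable<Category> category)
    {
        return product
            .Join(productcategory, p => p.Id, pc => pc.ProdId, (p, pc) => new { p, pc })
            // After the first join each element is { p, pc }, so CatId lives at ppc.pc.CatId.
            .Join(category, ppc => ppc.pc.CatId, c => c.Id, (ppc, c) => new { ppc, c })
            // After the second join each element is { ppc, c }: the product Id is three dots away.
            .Select(m => m.ppc.p.Id + ":" + m.c.Id)
            .ToList();
    }
}
```

Every additional Join wraps the previous result in one more anonymous type, which is why each new join adds one more dot to reach the innermost entity.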


Writing code in lambda would not make a good Crank movie script; the protagonist would die way too early. Upon realizing that grave mistake, the scriptwriter rewrites the part where Jason Statham is writing the join construct using lambda syntax; in the revised script, Jason Statham writes the join construct using Linq syntax instead:


var categorizedProducts =
    from p in product
    join pc in productcategory on p.Id equals pc.ProdId
    join c in category on pc.CatId equals c.Id
    select new {
        ProdId = p.Id, // or pc.ProdId
        CatId = c.CatId
        // other assignments
    };


He was able to produce results with the above Linq code in no time, his heart rate didn't drop a bit, and thanks to that Linq code he survived and made the sequel a possibility. I think in the sequel he wrote AngularJS code instead of jQuery lol :D


Happy Computing! ツ

Thursday, June 20, 2013

Distinct() nuances

You are unique, just like everyone else is; it's just hard to convince the computer. According to the computer, you are just a series of bits, 1s and 0s.


Using a straightforward Distinct() works if the duplicate elements are references to the same object, i.e., they share the same memory address.

using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var rose = new Fact { Thing = "Roses are", Color = 0xFF0000 };
                var violet = new Fact { Thing = "Violets are", Color = 0x0000FF };
                var sugar = new Fact { Thing = "Sugar is", Color = 0x1337 };
 
                var list = new List<Fact> () {
                    rose, rose, violet, sugar, sugar
                };
 
                foreach (var item in list.Distinct()) {
                    Console.WriteLine ("{0:x} {1:x}", item.Thing, item.Color);
                }
        }
}
 
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}




Live code: http://ideone.com/psEvjS
Output:
Roses are ff0000
Violets are ff
Sugar is 1337




In a real application, it is not that simple. Even when some elements have the same content, they are separate objects in memory, and by default Distinct() does not work in that kind of scenario. To wit, the following doesn't work:

using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var list = new List<Fact> () {
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Violets are", Color = 0x0000FF },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                };
                
                foreach (var item in list.Distinct()) {
                    Console.WriteLine ("{0:x} {1:x}", item.Thing, item.Color);
                }
        }
}
 
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}

Live code: http://ideone.com/58gynx

Thus the above Distinct() incorrectly returns duplicate objects:
Roses are ff0000
Roses are ff0000
Violets are ff
Sugar is 1337
Sugar is 1337



You can use an anonymous type with Distinct though. Anonymous types have an implicit implementation of Equals and GetHashCode; that's why Distinct() on an anonymous type can detect objects with the same content:


using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var list = new List<Fact> () {
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Violets are", Color = 0x0000FF },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                };
                
                foreach (var item in list.Select(x => new { x.Thing, x.Color } ).Distinct()) {
                    Console.WriteLine ("{0} {1:x}", item.Thing, item.Color);
                }
        }
}
 
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}


Live code: http://ideone.com/8K79Dn
Hence that shall return distinct elements:
Roses are ff0000
Violets are ff
Sugar is 1337
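
The implicit Equals and GetHashCode can also be observed directly. This is just a small sketch (the class and method names are made up for illustration) showing that two separately created anonymous objects with the same property values compare equal by value, even though they are not the same reference:

```csharp
using System;

public static class AnonymousTypeEquality
{
    public static bool SameContentIsEqual()
    {
        var a = new { Thing = "Roses are", Color = 0xFF0000 };
        var b = new { Thing = "Roses are", Color = 0xFF0000 };

        // Two separate instances, yet Equals and GetHashCode agree because
        // the compiler generates both from the property values.
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    public static bool SameContentIsSameReference()
    {
        var a = new { Thing = "Roses are", Color = 0xFF0000 };
        var b = new { Thing = "Roses are", Color = 0xFF0000 };

        // They are still two distinct objects in memory.
        return object.ReferenceEquals(a, b);
    }
}
```

SameContentIsEqual() returns true and SameContentIsSameReference() returns false, which is the behavior Distinct() relies on here.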



However, you cannot use that technique when you need to return the list from a function, since a function's return type must be a named (non-anonymous) type. Thus you cannot do the following; it causes a compilation error:


public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) {
    return list.Select(x => new { x.Thing, x.Color } ).Distinct();
}

Live Code: http://ideone.com/1v6x0i



The following works, though it feels kludgy and DRY-violating:

public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) {
            return list.Select(x => new { x.Thing, x.Color }).Distinct()
                       .Select(x => new Fact { Thing = x.Thing, Color = x.Color } );
}

Live Code: http://ideone.com/N4iKTx
Output:
Roses are ff0000
Violets are ff
Sugar is 1337


So what shall a good developer do to write elegant code? The dev shall learn how to do a proper object equality comparison. Do you remember the implicit implementation of Equals and GetHashCode on anonymous types I mentioned a while ago? That's what we will do, explicitly, on the class concerned. To wit, we need to have this code in the Fact class:


public override bool Equals (object obj)
{
    if (obj == null) return false;
    
    var x = obj as Fact;
    if (x == null) return  false;
    
    return this.Thing == x.Thing && this.Color == x.Color;
}


public override int GetHashCode ()
{        
    // http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode
    unchecked 
    {
        int hash = 17;
        // Guard against a null Thing, which would otherwise throw here
        hash = hash * 23 + (this.Thing != null ? this.Thing.GetHashCode () : 0);
        hash = hash * 23 + this.Color.GetHashCode ();

        return hash;
    }
}   


When we use Distinct() on that class, it returns the distinct elements even though each object has a different memory address:

Live Code: http://ideone.com/eMrO4y
Output:
Roses are ff0000
Violets are ff
Sugar is 1337
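
If you cannot (or would rather not) touch the Fact class, Distinct() also has an overload that accepts an IEqualityComparer<T>. The following is a minimal sketch of that alternative; FactComparer and FactComparerQueries are hypothetical names for this illustration, and the hash mirrors the 17/23 approach shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}

// Hypothetical comparer (an assumption for this sketch): the equality
// logic lives outside Fact, so Fact itself stays untouched.
public class FactComparer : IEqualityComparer<Fact>
{
    public bool Equals(Fact a, Fact b)
    {
        if (object.ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.Thing == b.Thing && a.Color == b.Color;
    }

    public int GetHashCode(Fact f)
    {
        if (f == null) return 0;
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (f.Thing != null ? f.Thing.GetHashCode() : 0);
            hash = hash * 23 + f.Color.GetHashCode();
            return hash;
        }
    }
}

public static class FactComparerQueries
{
    public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list)
    {
        // Distinct uses the supplied comparer instead of Fact's own Equals/GetHashCode.
        return list.Distinct(new FactComparer());
    }
}
```

The trade-off: overriding Equals/GetHashCode changes equality for the class everywhere, while a comparer applies only where you pass it in.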


That's it folks!

Happy Coding! ツ

Musings on class diagram vs database diagram

Databases (and data models, for that matter) look flat. When developers look at a database diagram, it often gives a feel of leaky abstraction, especially since the mental model of querying and saving business entities works through the domain model. It’s hard to grasp the richer context of the whole business when looking at a database diagram. A class diagram provides richer context.


It would be a lot better for developers to see the bigger picture via a class diagram rather than a database diagram; it’s easier and more intuitive to grasp business entities/needs via a class diagram (the domain model, for that matter). This is in line with operating in the mental model of “There’s No Database”. The developer’s mind can seamlessly work with the domain models when looking at class diagrams more.


Ayende often uses class diagrams. A class diagram paints a thousand words. And we don’t have to use F12 (Control+click or Control+B if using ReSharper) on class references/collections to navigate around the domain models when we have a class diagram.


Happy Computing! ツ

Friday, June 14, 2013

Abstraction, a friend or foe?

If the repository component can persist things in memory, we don't need to mock IQueryable. An example:
https://github.com/MichaelBuen/ToTheEfnhX/blob/master/TestProject/TestTheIRepository.cs
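
To illustrate the idea without pulling in the whole project, here is a minimal sketch of an in-memory repository. The IRepository<T> interface, MemoryRepository<T>, Product and ProductService below are hypothetical and made up for this illustration; the actual ToTheEfnhX API may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface for this sketch; the real component's API may differ.
public interface IRepository<T> where T : class
{
    IQueryable<T> All { get; }
    void Save(T entity);
}

// In-memory implementation: a List<T> exposed as IQueryable<T>,
// so queries run through Linq to Objects and nothing needs to be mocked.
public class MemoryRepository<T> : IRepository<T> where T : class
{
    private readonly List<T> _items = new List<T>();

    public IQueryable<T> All { get { return _items.AsQueryable(); } }

    public void Save(T entity)
    {
        if (!_items.Contains(entity)) _items.Add(entity);
    }
}

public class Product
{
    public string Name { get; set; }
    public int YearIntroduced { get; set; }
}

public static class ProductService
{
    // The code under test depends only on IRepository<Product>.
    public static List<string> NamesIntroducedSince(IRepository<Product> repo, int year)
    {
        return repo.All
            .Where(p => p.YearIntroduced >= year)
            .OrderBy(p => p.Name)
            .Select(p => p.Name)
            .ToList();
    }
}
```

A unit test can then fill a MemoryRepository<Product> with a few objects and pass it to the code under test, while production code would receive an ORM-backed implementation of the same interface.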

Here's the repository component I made that works with in-memory storage, NHibernate, and Entity Framework:
https://github.com/MichaelBuen/ToTheEfnhX

Others approach unit testing without a repository component plus mocking by using an in-memory database like SQLite; this removes the need for a repository component:
http://ayende.com/blog/3983/nhibernate-unit-testing

I ceased development of my repository component project, as it is hard to abstract the differences in fetching strategies between NHibernate and Entity Framework:
http://www.ienablemuch.com/2012/08/solving-nhibernate-thenfetchmany.html

I tried many ways to make the repository component conform to Entity Framework's intuitive fetching API, but it's hard to make NHibernate's fetching strategy resemble Entity Framework's. Even though I don't like Entity Framework, I'm not a technology bigot, and I can see some of its good points. So if I were able to abstract the fetching strategy in the repository component, it would be patterned after Entity Framework's fetching strategies, i.e., not after NHibernate's buggy ThenFetchMany nor NHibernate's ToFuture. Entity Framework's fetching strategy is easier to use and intuitive; It Just Works.

http://www.ienablemuch.com/2012/08/solving-nhibernate-thenfetchmany.html
http://msdn.microsoft.com/en-us/library/gg671236%28VS.103%29.aspx



Here's what can happen if the API doesn't empower developers: it compels them to use N+1. A case of too much abstraction: http://ayende.com/blog/156577/on-umbracos-nhibernates-pullout

Thursday, June 13, 2013

NHibernate query caching

Query caching won't work using this Linq construct:

var query = 
    (from p in session.Query<Product>()
    join t in session.Query<ProductTranslation>() on p.ProductId equals t.ProductId
    where p.YearIntroduced >= 1950 && string.Compare(t.ProductName, "B") >= 0
    select new { p, t }).Cacheable();

var list = query.ToList();



Must be done this way:
var query = from p in session.Query<Product>()
            join t in session.Query<ProductTranslation>() on p.ProductId equals t.ProductId
            select new { p, t };

query = query.Where(q => q.p.YearIntroduced >= 1950 && string.Compare(q.t.ProductName, "B") >= 0).Cacheable();

var list = query.ToList();


If you are a one-stop-shop kind of person, you might prefer the multiple Linq statements be kept together in just one Linq statement:

var query = 
            (from q in 
                from p in session.Query<Product>()
                join t in session.Query<ProductTranslation>() on p.ProductId equals t.ProductId
                select new { p, t }
            where q.p.YearIntroduced >= 1950 && string.Compare(q.t.ProductName, "B") >= 0
            select q).Cacheable();               

var list = query.ToList();

The above won't be possible if a DAL/Repository component layered on top of an ORM hides or neglects to expose useful functionality, or if there's a project policy of not allowing developers to use NHibernate's API directly, e.g., caching, fetching.



The developer should be able to access the caching API when there's a need.



"You can solve every problem with another level of indirection, except for the problem of too many levels of indirection" – D.Wheeler



Here's a case of too many indirections: http://ayende.com/blog/156577/on-umbracos-nhibernates-pullout



Happy Coding! ツ