Monday, June 24, 2013

Returning strongly-typed distinct objects from a function without the drama of GetHashCode and Equals

In my last post, I detailed how to tame Distinct. The following achieves the same result without the drama of GetHashCode and Equals. You might still need GetHashCode and Equals if you are using ORMs. For in-memory operations, however, a GroupBy approach suffices for a distinct operation:


using System;
using System.Linq;
using System.Collections.Generic;
 
public class Test
{
        public static void Main()
        {
                var list = new List<Fact> {
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Roses are", Color = 0xFF0000 },
                    new Fact { Thing = "Violets are", Color = 0x0000FF },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                    new Fact { Thing = "Sugar is", Color = 0x1337 },
                };
                foreach (var item in GetFacts(list)) {
                    // {1:x} prints Color as lowercase hexadecimal
                    Console.WriteLine("{0} {1:x}", item.Thing, item.Color);
                }
        }
        
        public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) {
            return list.GroupBy(x => new { x.Thing, x.Color }, (key,group) => group.First());
        }
}
 
public class Fact
{
    public string Thing { get; set; }
    public int Color { get; set; }
}


Roses are ff0000
Violets are ff
Sugar is 1337
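
Why does grouping by new { x.Thing, x.Color } work without any GetHashCode/Equals drama? The C# compiler generates value-based Equals and GetHashCode for anonymous types, so two keys with the same property values land in the same group. Here is a minimal sketch of that behavior; the KeysEqual helper is my own illustrative name, not something from the code above:

```csharp
using System;

public class AnonymousKeyDemo
{
    // Hypothetical helper: builds two anonymous keys and compares them by value.
    public static bool KeysEqual(string thing1, int color1, string thing2, int color2)
    {
        var a = new { Thing = thing1, Color = color1 };
        var b = new { Thing = thing2, Color = color2 };
        // Equals on anonymous types compares property values, not references.
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    public static void Main()
    {
        Console.WriteLine(KeysEqual("Roses are", 0xFF0000, "Roses are", 0xFF0000));   // True
        Console.WriteLine(KeysEqual("Roses are", 0xFF0000, "Violets are", 0x0000FF)); // False

        var x = new { Thing = "Sugar is", Color = 0x1337 };
        var y = new { Thing = "Sugar is", Color = 0x1337 };
        // Two separate instances, yet equal as grouping keys.
        Console.WriteLine(ReferenceEquals(x, y)); // False
        Console.WriteLine(x.Equals(y));           // True
    }
}
```

Note that this value equality comes from Equals, which is what GroupBy's default comparer uses; the == operator on anonymous types still compares references.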

To some, the above distinct construct may be obvious, but rediscovering what Linq/lambda can do gives me a feeling of impostor syndrome, especially since I'm way past 10,000 hours in this beautiful craft called programming. Well, I'll just heed Scott Hanselman's advice; read it in the last part of this article: http://www.hanselman.com/blog/ImAPhonyAreYou.aspx


Live code: http://ideone.com/uWtZrT


If Linq is your thing, you can also implement the above code the following way:

public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) 
{
    return 
        from x in list
        group x by new { x.Thing, x.Color } into grp
        select grp.First();                 
}

Live Code: http://ideone.com/J0PzRN

Output:

Roses are ff0000
Violets are ff
Sugar is 1337


If you are not a technology bigot, you won't be snobbish about a pure Linq or pure lambda approach; you will use the right tool for the job. In fact, you won't mind mixing Linq and lambda in one statement, especially if that gives you the right balance of code maintainability and readability:
public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list)
{
     return 
         from grp in list.GroupBy(x => new { x.Thing, x.Color })
         select grp.First();
}

Live Code: http://ideone.com/tVMHS6

Output:
Roses are ff0000
Violets are ff
Sugar is 1337


And then there's another approach:

public static IEnumerable<Fact> GetFacts(IEnumerable<Fact> list) 
{
    return list.GroupBy(x => new {x.Thing, x.Color }).Select(x => x.First());                  
}

Live Code: http://ideone.com/9aUl4W

Output:

Roses are ff0000
Violets are ff
Sugar is 1337


This is why many folks call programming a subjective, er.. scratch that, an art thing. There are way too many approaches to achieve a common goal. Beauty is in the eye of the beholder, or so they say.


If I became blind and my profession were still programming, the last thing I would want to do is write things in a pure lambda approach that forces me to make sure every parenthesis is properly matched. Lambda syntax won't let me phrase the logic in the most natural way, and it's not friendly to the finger joints either, as one needs to incessantly press the Shift key when typing fat arrows and parentheses. Using Linq syntax instead of lambda would let me write logic fluently, without incessantly worrying whether the parentheses match. There are few to no parentheses if one goes the Linq approach, and it doesn't discriminate against people with impaired vision.


To sway you away from writing exclusively in the lambda approach and sell you on the Linq approach instead (mixing in some lambda here and there where it makes sense and keeps the code readable), imagine you are the protagonist in the movie Crank: if you cannot see output as fast as you need to, your heart rate will drop, and you will die in the process.

And so Jason Statham writes the join construct using the lambda approach:

var categorizedProducts = product
    .Join(productcategory, p => p.Id, pc => pc.ProdId, (p, pc) => new { p, pc })
    .Join(category, ppc => ppc.CatId, c => c.Id, (ppc, c) => new { ppc, c }))
    .Select(m => new { 
        ProdId = m.ppc.Id, 
        CatId = m.c.CatId
        // other assignments
    });


And much to his chagrin, when he hits build, there are errors indicating that some properties cannot be found on the object, and to add insult to his heart injury, the compiler says there's a syntax error. Frustration sets in, his heart rate drops, then he dies. If you cannot spot the errors in the above lambda construct, you shall die too! lol


To spot where the errors in the above lambda code are, get the right code here: http://stackoverflow.com/questions/9720225/how-to-perform-join-between-multiple-tables-in-linq-lambda/9722744#9722744, then use the Diff plugin of Notepad++ to compare the code above to the right code. That has only two joins, but the navigation for ProdId is already three dots away (i.e., m.ppc.p.Id). Now imagine the above code had 5+ joins: property navigation would become way too deep, and the volume of dots in the lambda approach could put to shame the amount of parentheses in typical Lisp code.
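
To see how the dots pile up, here is a rough sketch that adds just one more join to a working lambda chain. Everything about the data model is an assumption on my part: the Product, ProductCategory, Category and Department classes, their properties, and the Describe helper are hypothetical, made up only to have something runnable.

```csharp
using System;
using System.Linq;
using System.Collections.Generic;

// Hypothetical entities; the real classes are not shown in this post.
public class Product { public int Id { get; set; } public string Name { get; set; } }
public class ProductCategory { public int ProdId { get; set; } public int CatId { get; set; } }
public class Category { public int Id { get; set; } public int DeptId { get; set; } public string Name { get; set; } }
public class Department { public int Id { get; set; } public string Name { get; set; } }

public class DeepJoinDemo
{
    public static List<string> Describe(
        IEnumerable<Product> product,
        IEnumerable<ProductCategory> productcategory,
        IEnumerable<Category> category,
        IEnumerable<Department> department)
    {
        return product
            .Join(productcategory, p => p.Id, pc => pc.ProdId, (p, pc) => new { p, pc })
            .Join(category, ppc => ppc.pc.CatId, c => c.Id, (ppc, c) => new { ppc, c })
            .Join(department, ppcc => ppcc.c.DeptId, d => d.Id, (ppcc, d) => new { ppcc, d })
            // With three joins, the product name is already four dots away.
            .Select(m => m.ppcc.ppc.p.Name + " / " + m.ppcc.c.Name + " / " + m.d.Name)
            .ToList();
    }

    public static void Main()
    {
        var products = new List<Product> {
            new Product { Id = 1, Name = "Keyboard" },
            new Product { Id = 2, Name = "Apple" }
        };
        var links = new List<ProductCategory> {
            new ProductCategory { ProdId = 1, CatId = 10 },
            new ProductCategory { ProdId = 2, CatId = 20 }
        };
        var categories = new List<Category> {
            new Category { Id = 10, DeptId = 100, Name = "Hardware" },
            new Category { Id = 20, DeptId = 200, Name = "Fruit" }
        };
        var departments = new List<Department> {
            new Department { Id = 100, Name = "Electronics" },
            new Department { Id = 200, Name = "Grocery" }
        };

        foreach (var line in Describe(products, links, categories, departments))
            Console.WriteLine(line);
        // Keyboard / Hardware / Electronics
        // Apple / Fruit / Grocery
    }
}
```

Each extra Join wraps the previous result in another anonymous object, which is why every earlier table moves one dot further away.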


Writing code in lambda would not make a good Crank movie script; the protagonist would die way too early. Upon realizing that grave mistake, the scriptwriter rewrites the part where Jason Statham writes the join construct using lambda syntax; in the revised script, Jason Statham writes the join construct using Linq syntax instead:


var categorizedProducts =
    from p in product
    join pc in productcategory on p.Id equals pc.ProdId
    join c in category on pc.CatId equals c.Id
    select new {
        ProdId = p.Id, // or pc.ProdId
        CatId = c.CatId
        // other assignments
    };


He was able to produce results with the above Linq code in no time, his heart rate didn't drop a bit, and thanks to that Linq code he survived and made the sequel a possibility. I think in the sequel he wrote AngularJS code instead of jQuery lol :D
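
If you want to convince yourself that the Linq form and a working lambda form of a two-join query return the same rows, a small check like the one below can help. Treat it as a sketch under assumptions: the Product, ProductCategory and Category classes, their properties, and the BothFormsAgree helper are hypothetical, and I project c.Id as CatId because my made-up Category only has an Id.

```csharp
using System;
using System.Linq;
using System.Collections.Generic;

// Hypothetical entities, invented for this check only.
public class Product { public int Id { get; set; } }
public class ProductCategory { public int ProdId { get; set; } public int CatId { get; set; } }
public class Category { public int Id { get; set; } }

public class JoinEquivalenceDemo
{
    public static bool BothFormsAgree(
        IEnumerable<Product> product,
        IEnumerable<ProductCategory> productcategory,
        IEnumerable<Category> category)
    {
        // Linq (query) syntax: range variables p, pc and c stay flat.
        var viaQuery =
            from p in product
            join pc in productcategory on p.Id equals pc.ProdId
            join c in category on pc.CatId equals c.Id
            select new { ProdId = p.Id, CatId = c.Id };

        // Lambda (method) syntax: earlier tables are nested inside anonymous objects.
        var viaLambda = product
            .Join(productcategory, p => p.Id, pc => pc.ProdId, (p, pc) => new { p, pc })
            .Join(category, ppc => ppc.pc.CatId, c => c.Id, (ppc, c) => new { ppc, c })
            .Select(m => new { ProdId = m.ppc.p.Id, CatId = m.c.Id });

        // Both projections share one anonymous type, so SequenceEqual compares by value.
        return viaQuery.SequenceEqual(viaLambda);
    }

    public static void Main()
    {
        var products = new List<Product> { new Product { Id = 1 }, new Product { Id = 2 } };
        var links = new List<ProductCategory> {
            new ProductCategory { ProdId = 1, CatId = 10 },
            new ProductCategory { ProdId = 2, CatId = 20 },
            new ProductCategory { ProdId = 2, CatId = 10 }
        };
        var categories = new List<Category> { new Category { Id = 10 }, new Category { Id = 20 } };

        Console.WriteLine(BothFormsAgree(products, links, categories)); // True
    }
}
```

The comparison works for the same reason the GroupBy trick at the top of this post does: anonymous types with the same property names and types are compared by their values.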


Happy Computing! ツ
