Tuesday, October 30, 2012

Alphabet Soup 101: Sargable

Programming industry is notorious for alphabet soup of acronyms and jargons. If some concept or phrase don't have any acronym or catchy word yet, someone will make a made-up one.



When I'm talking to someone about database optimization, I'm referring to an index-friendly condition as index-friendly condition. Could it be any harder than that? But lo and behold, there's a handy word for that programming discipline, it's called SARGable



In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from a contraction of Search ARGument ABLE.

A query failing to be sargable is known as Non-Sargable query and has an effect in query time, so one of the steps in query optimization is convert them to be sargable.

The typical thing that will make a sql query non-sargable is to include a function in left part of a condition of a Where clause.



Read the rest of the article at: http://en.wikipedia.org/wiki/Sargable

http://www.sql-server-performance.com/2007/t-sql-where/

Sunday, October 14, 2012

Collection initializer

public class Country
    {
        public int CountryId { get; set; }
        public string CountryText { get; set; }
        public List<City> Cities { get; set; }
    }



Found out some (not-so)obscure syntax allowed in C#, whereas given the above class, this is allowed, albeit has null runtime exception on Cities:


public List<Country> Countries = new List<Country>
{
 new Country 
 { 
  CountryId = 1, CountryText = "Philippines",
  Cities = 
  {
   new City { CityId = 1, CityText = "Manila" },
   new City { CityId = 2, CityText = "Makati" },
   new City { CityId = 3, CityText = "Quezon" }
  }
 }
};


Where normally, I do it as this:


public List<Country> Countries = new List<Country>
{
 new Country 
 { 
  CountryId = 1, CountryText = "Philippines",
  Cities = new List<City>
  {
   new City { CityId = 1, CityText = "Manila" },
   new City { CityId = 2, CityText = "Makati" },
   new City { CityId = 3, CityText = "Quezon" }
  }
 }
};


If you don't want to use new List<City> in collection initializer, assign it an instance first, otherwise the collection initializer would result to null runtime exception. The following could allow no new List<City> on collection initializer:

    public class Country
    {
        public int CountryId { get; set; }
        public string CountryText { get; set; }
        public List<City> Cities = new List<City>(); // instantiate List
    }

But that would break the sanctity of everything-must-be-a-property, property makes your code more future-proof

Redis

When your question gets answered very fast in stackoverflow, you have Redis to thanks for that. Ok, that's an oversimplification.


When you asked a question, its INSERT can be served very fast by SQL Server as there are no SELECTs needed be ran when other users are merely browsing for existing answers and questions, and that includes your currently written question too. Less database contentions, more questions gets easily served.


Sources:

http://highscalability.com/blog/2011/3/3/stack-overflow-architecture-update-now-at-95-million-page-vi.html

http://meta.stackoverflow.com/questions/69164/does-stack-overflow-use-caching-and-if-so-how

Salivating at C# 5's async/await + Redis combo: http://marcgravell.blogspot.com/2011/04/async-redis-await-booksleeve.html

Friday, October 12, 2012

Shades of XY problem

Shades of XY problem. A colleague asked me how to prevent nested substitution of the token when it's inside of an attribute, the following...


foreach(var token in arr) 
{
 sb.Replace( token, string.Format("<li id='{0}' class='meh'>{1}</li>", token, GetDescriptive(token)) );
}

...resulted to this (say token is "bp") nested substitution:

<li id='<li id='bp' class='meh'>BasePay</li>' class='meh'>BasePay</li> 
+ (<li id='<li id='bp' class='meh'>BasePay</li>' class='meh'>BasePay</li> 
* <li id='pi' class='meh'>PercentIncrease</li>)


...then I quickly suggested this:


var d = new Dictionary<Guid,string>();


foreach(var token in arr) 
{
 var g = Guid.NewGuid();
 d[g] = token;
 sb.Replace( token, string.Format("<li id='{0}' class='meh'>{1}</li>", g, GetDescriptive(token)) );
}


foreach(KeyValuePair<Guid,string> kv in d) 
{
        // essentially putting back the original token inside of attribute
 sb.Replace(kv.Key, kv.Value); 
}


That produces correct output:


<li id='bp' class='meh'>BasePay</li> 
+ (<li id='bp' class='meh'>BasePay</li> * <li id='pi' class='meh'>PercentIncrease</li>)



However, I dwell too much on how to prevent values substitution when it's inside of an attribute, in fact I'm thinking of regular expression approach too; sometimes though, when you formulate a regular expression solution you'll have two problems later. And so the proverbial light bulb lit up on my puny brain, why not just use Guid so when the next repeating substitution occur it won't replace those tokens inside of attribute?! Then I devise the Dictionary+Guid combo solution above. It works! Brilliant! Or so I thought I'm brilliant, it works but...


Then it occurred to me while I'm going back home, why did the string replacement occurred two times in the first place? Should doing it once would suffice?


Then upon arriving back to the office the next day, I re-check why the string substitution could occur two times, and then I saw this:


BasePay + (BasePay * PercentIncrease)


Darn, that's why the nested substitution occurred, BasePay occurred two times in the equation.


I even remember my colleague explained the above equation the day before. The colleague explained both the X (equation with repeating token) problem and the Y solution (prevent operands inside of attribute from being nestedly replaced) being attempted first. The colleague cannot be faulted, I focused too much on Y. Admit it or not, to most programmers, challenging problems are what piqued our curiosity the most, and to me it was how to prevent that-goddamn-token-inside-of-attribute-from-being-nestedly-replaced. When we can't connect to a database server, we are most likely to suggest to a colleague to look at the firewall configuration first, or connect directly to IP address, or connect to named pipes instead of TCP/IP, etc, it's rare of us to suggest to look first if the network cable is disconnected, you get my drift.


The most elegant solution is to prevent duplicate tokens from appearing multiple times in the list:


foreach(var token in arr.Distinct()) 
{
 sb.Replace(token, string.Format("<li id='{0}' class='meh'>", token));
}


It works and very simple. And while I'm writing this post, it hit my mind that it is best to tackle the problem from its root cause.


var arr = formula.Split("/+-*()".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Distinct();

foreach(var token in arr) 
{
 sb.Replace(token, string.Format("<li id='{0}' class='meh'>", token));
}


The root cause of any problems in a Y solution comes from X requirement. So when someone is asking you a question, and you think they are just presenting you their Y, demand for X.