The most annoying aspect of software development, for me, is debugging. I don't mind the kinds of bugs that yield to a few minutes' inspection. The bugs I hate are the ones that show up only after hours of successful operation, under unusual circumstances, or whose stack traces lead to dead ends. Fortunately, there’s a simple technique that will dramatically reduce the number of these bugs in your software. It won't reduce the overall number of bugs, at least not at first, but it'll make most defects much easier to find. The technique is to build your software to "fail fast."
In pursuit of speed, one will argue that First is better than Single:
var e = session.Query<Employee>().First(x => x.PersonId == personId);
An HR system which is a multi-tenant one and was designed around composite primary keys, but the dev forgot to add a company filter on the query above, then a few days later their client Microsoft is complaining that their system is randomly returning an employee: "Lee Kai-Fu is not working with us anymore, why is he included in our employee list? Please effin' correct your app, Steve Ballmer will throw chair at us, Tim Cook is not our CEO!"
The cause of the error is on the usage of First, which doesn't Fail Fast. Have they used Single, the following will not continue if it returns more than one employee, it will throw an exception, it will fail fast.
var e = session.Query<Employee>().Single(x => x.PersonId == personId);
Another scenario where we should especially strive for Fail Fast is on reports. While we are optimizing one stored proc which has a performance problem due to staging tables approach, we uncovered a subtle bug that doesn't fail fast, the error is not so pronounced with staging tables as compared with straight query.
The following is an oversimplification of the original stored proc we are optimizing, but is simple enough to illustrate the nature of the problem, given the following data:
create table Person ( PersonCode char(1) primary key, PersonName nvarchar(20) not null ); create table Product ( ProductCode char(1) primary key, ProductName nvarchar(20) not null ); create table Person_Likes_Product ( PersonCode char(1) not null references Person(PersonCode), ProductCode char(1) not null references Product(ProductCode), Rating int not null, constraint ck_Rating check(Rating between 1 and 5) ); insert into Person(PersonCode, PersonName) values ('J','John'), ('P','Paul'), ('G','George'), ('R','Ringo'); insert into Product(ProductCode, ProductName) values ('C','CPU'), ('K','Keyboard'), ('M','Mouse'), ('S','Smartphone'); insert into Person_Likes_Product(PersonCode, ProductCode, Rating) values ('J', 'C', 5), ('J', 'K', 3), ('P', 'C', 5), ('P', 'K', 4), ('P', 'S', 5), ('G', 'S', 5);
...we should list all products with Paul's rating on them. It should produce the following output:
ProductCode ProductName Rating ----------- -------------------- ----------- C CPU 5 K Keyboard 4 M Mouse NULL S Smartphone 5 (4 row(s) affected)This is the original stored proc:
create procedure PersonLikesStagingTables ( @PersonID char(1) ) as begin -- staging table declare @productRating table ( ProductCode char(1), ProductName nvarchar(20), Rating int ); insert into @productRating(ProductCode, ProductName) select p.ProductCode, p.ProductName from Product p; update p set p.Rating = l.Rating from @productRating p left join Person_Likes_Product l on p.ProductCode = l.ProductCode; select p.ProductCode, p.ProductName, p.Rating from @productRating p order by p.ProductName; end; go exec PersonLikesStagingTables 'P';The output of that stored proc has an error:
ProductCode ProductName Rating ----------- -------------------- ----------- C CPU 5 K Keyboard 3 M Mouse NULL S Smartphone 5 (4 row(s) affected)Holy Guacamole! Have you spotted the error on the code above? Why Paul's keyboard has a rating of 3 instead of 4? If this a price listing, your company could go bankrupt.
Upon further inspection, Paul's rating was not applied to the staging table:
update p set p.Rating = l.Rating from @productRating p left join Person_Likes_Product l on p.ProductCode = l.ProductCode;That should be done like this:
update p set p.Rating = l.Rating from @productRating p left join Person_Likes_Product l on p.ProductCode = l.ProductCode and l.PersonCode = @PersonCode;Had the original stored proc was written with no staging tables, that is it is using straight query instead, the error will be more pronounced, failing fast:
create procedure PersonLikesFailFast(@PersonCode char(1)) as begin select p.ProductCode, p.ProductName, l.Rating from Product p left join Person_Likes_Product l on p.ProductCode = l.ProductCode end; go exec PersonLikesFailFast 'P'That will have an output of this:
ProductCode ProductName Rating ----------- -------------------- ----------- C CPU 5 C CPU 5 K Keyboard 3 K Keyboard 4 M Mouse NULL S Smartphone 5 S Smartphone 5 (7 row(s) affected)The QA or user could quickly report to the devs that the person's product rating output has an error. If your system has an error, it should fail it fast, it's easy to correct errors when they are discovered early. Staging tables and First extension method deprives your system of failing fast. The errors done on the system with no fail fast mechanism is hard to debug as they sometimes semi-works, they can even linger producing corrupted data.
Aside from making our reports perform fast, it should also be fast on failing when it has an error.
How about the optimization after the removal of staging tables? After removing 5 insert statements, 13 update statements, 3 table variables then converted them to straight query, it improved from 20 seconds to 11 seconds.
And before we do aggressive optimization, let's pay attention with correctness first:
It's easier to make a correct program fast than to make a fast program correct.
Happy Computing! ツ