Wednesday, August 7, 2013

Fail Fast: Why we should avoid staging tables as much as possible

We are not talking about Fail Fast the philosophical one as not everyone are sold with that idea. We will tackle Fail Fast the engineering one



http://martinfowler.com/ieeeSoftware/failFast.pdf:
The most annoying aspect of software development, for me, is debugging. I don't mind the kinds of bugs that yield to a few minutes' inspection. The bugs I hate are the ones that show up only after hours of successful operation, under unusual circumstances, or whose stack traces lead to dead ends. Fortunately, there’s a simple technique that will dramatically reduce the number of these bugs in your software. It won't reduce the overall number of bugs, at least not at first, but it'll make most defects much easier to find. The technique is to build your software to "fail fast."



In pursuit of speed, one will argue that First is better than Single:


var e = session.Query<Employee>().First(x => x.PersonId == personId);


An HR system which is a multi-tenant one and was designed around composite primary keys, but the dev forgot to add a company filter on the query above, then a few days later their client Microsoft is complaining that their system is randomly returning an employee: "Lee Kai-Fu is not working with us anymore, why is he included in our employee list? Please effin' correct your app, Steve Ballmer will throw chair at us, Tim Cook is not our CEO!"



The cause of the error is on the usage of First, which doesn't Fail Fast. Have they used Single, the following will not continue if it returns more than one employee, it will throw an exception, it will fail fast.

var e = session.Query<Employee>().Single(x => x.PersonId == personId);


Another scenario where we should especially strive for Fail Fast is on reports. While we are optimizing one stored proc which has a performance problem due to staging tables approach, we uncovered a subtle bug that doesn't fail fast, the error is not so pronounced with staging tables as compared with straight query.



The following is an oversimplification of the original stored proc we are optimizing, but is simple enough to illustrate the nature of the problem, given the following data:

create table Person
(
    PersonCode char(1) primary key,
    PersonName nvarchar(20) not null
);
 
create table Product
(
    ProductCode char(1) primary key,
    ProductName nvarchar(20) not null
);
 
create table Person_Likes_Product
(
    PersonCode char(1) not null references Person(PersonCode),
    ProductCode char(1) not null references Product(ProductCode),
    Rating int not null,
    constraint ck_Rating check(Rating between 1 and 5)
);
 
insert into Person(PersonCode, PersonName) values
('J','John'), ('P','Paul'), ('G','George'), ('R','Ringo');
 
insert into Product(ProductCode, ProductName) values
('C','CPU'), ('K','Keyboard'), ('M','Mouse'), ('S','Smartphone');
 
insert into Person_Likes_Product(PersonCode, ProductCode, Rating) values
('J', 'C', 5),
('J', 'K', 3),
('P', 'C', 5),
('P', 'K', 4),
('P', 'S', 5),
('G', 'S', 5);

...we should list all products with Paul's rating on them. It should produce the following output:


ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
K           Keyboard             4
M           Mouse                NULL
S           Smartphone           5
(4 row(s) affected)
This is the original stored proc:

create procedure PersonLikesStagingTables
(
    @PersonID char(1)
) as
begin
     
    -- staging table
    declare @productRating table
    (
        ProductCode char(1),
        ProductName nvarchar(20),
        Rating int
    );
     
    insert into @productRating(ProductCode, ProductName)
    select p.ProductCode, p.ProductName
    from Product p;
     
     
    update p
        set p.Rating = l.Rating
    from @productRating p
    left join Person_Likes_Product l on p.ProductCode = l.ProductCode;
     
     
    select p.ProductCode, p.ProductName, p.Rating
    from @productRating p
    order by p.ProductName;
     
end;
go
 
exec PersonLikesStagingTables 'P';
The output of that stored proc has an error:

ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
K           Keyboard             3
M           Mouse                NULL
S           Smartphone           5
(4 row(s) affected)
Holy Guacamole! Have you spotted the error on the code above? Why Paul's keyboard has a rating of 3 instead of 4? If this a price listing, your company could go bankrupt.



Upon further inspection, Paul's rating was not applied to the staging table:

update p
    set p.Rating = l.Rating
from @productRating p
left join Person_Likes_Product l on p.ProductCode = l.ProductCode;
That should be done like this:

update p
    set p.Rating = l.Rating
from @productRating p
left join Person_Likes_Product l on p.ProductCode = l.ProductCode and l.PersonCode = @PersonCode;
Had the original stored proc was written with no staging tables, that is it is using straight query instead, the error will be more pronounced, failing fast:

create procedure PersonLikesFailFast(@PersonCode char(1)) as
begin
    select
        p.ProductCode,
        p.ProductName,
        l.Rating
    from Product p   
    left join Person_Likes_Product l on p.ProductCode = l.ProductCode
     
end;   
go
 
exec PersonLikesFailFast 'P'
That will have an output of this:
ProductCode ProductName          Rating
----------- -------------------- -----------
C           CPU                  5
C           CPU                  5
K           Keyboard             3
K           Keyboard             4
M           Mouse                NULL
S           Smartphone           5
S           Smartphone           5
(7 row(s) affected)
The QA or user could quickly report to the devs that the person's product rating output has an error. If your system has an error, it should fail it fast, it's easy to correct errors when they are discovered early. Staging tables and First extension method deprives your system of failing fast. The errors done on the system with no fail fast mechanism is hard to debug as they sometimes semi-works, they can even linger producing corrupted data.



Aside from making our reports perform fast, it should also be fast on failing when it has an error.



How about the optimization after the removal of staging tables? After removing 5 insert statements, 13 update statements, 3 table variables then converted them to straight query, it improved from 20 seconds to 11 seconds.



And before we do aggressive optimization, let's pay attention with correctness first:

http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning



It's easier to make a correct program fast than to make a fast program correct.





Happy Computing! ツ

No comments:

Post a Comment