Monday, October 7, 2013

Thanks SQL Server! My Bonus is Now Higher* than My Salary

Given these two functions:

create function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go
 
create function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.*
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

The output of this query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId




Seeing that it's best to present the salary information first before the bonus, we move the Salary information right after the EmployeePersonID.

alter function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Salary, e.Bonus
    from Employee e
go

Then you go back home, didn't even bother to check the output since it's a trivial fix anyway. On the 15th of the month, you receive your salary and you check your pay slip. You are so delighted that they make your bonus equal to your salary, then your jaw dropped, you don't receive any salary. Upon investigating the cause of that anomaly, you found out the erring query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId




It yields a wrong output!


Whoa, holy guacamole! Why I don't have any salary? As if, the Bonus and Salary swapped contents. Yeah right, they really are. The presentation concern you tried to fix, by placing Salary right after EmployeePersonId, has an error. The salary is now slotted to bonus field, and the bonus is now slotted salary field.


Checking if your eyes is fooling you, you tried to explicitly select all the columns:

select top 5 es.FirstName, es.LastName, es.EmployeePersonId, es.Salary, es.Bonus from EmployeeSummary() es order by es.EmployeePersonId

But still, you still don't have any salary.




You also check the function you modified if it returns correct information:

select top 5 * from EmployeeMoolahs() em order by em.EmployeePersonId

However, it has correct output:




Then you are thinking what causes the error on EmployeeSummary function? What makes the query wild? Why it's not returning proper information? Hmm.. wild? Maybe it's the wild…card, there's something wild on the wildcard! Got to remove the wildcard on EmployeeSummary()

create function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.*
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

Change that to explicit form:

alter function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.EmployeePersonId, em.Salary, em.Bonus
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

Then you check the result of this query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId

It's correct now!



Trying to check the fix if it is just a fluke, we swap again the order of Bonus and Salary of the source function to their original order:

alter function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go

Re-check again:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId

Below is the result, the columns' information are still correct! The order of fields won't change as we are selecting from EmployeeSummary which has explicitly selected columns. What is important, the information is always correct, the ordering of columns should be done on EmployeeSummary, not on EmployeeMoolahs.





Let's try to investigate the root cause of the error. If you got wildcard in the query of your function, SQL Server is trying to build a metadata for returned fields of that function, as if SQL Server burns a metadata for the returned fields of the query of your function, albeit invisibly.

create function EmployeeMoolahs()
returns table
-- burn these metadata...
(
    Bonus money,
    Salary money
)
-- ...burn
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go
 
create function EmployeeSummary()
returns table
-- burn metadata
(
    FirstName nvarchar(max),
    LastName nvarchar(max),
          -- burn these metadata based on the metadata obtained from EmployeeMoolah's wildcard
    EmployeePersonId int,   -- 1
    Bonus money,            -- 2
    Salary money            -- 3
)
-- ...burn
as
return
    select p.FirstName, p.LastName, em.*
     
          -- At the time this EmployeeSummary function is created, this is the order of fields on
         -- EmployeeMoolah()'s wildcard:
     
          -- 1. EmployeePersonId
          -- 2. Bonus
          -- 3. Salary
     
     
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

In fact that is how other RDBMS does it, i.e. you need to explicitly return the metadata of your returned fields. Wildcard problem can be avoided by application developers using other RDBMS as they can see the wildcard could one day not be slotted properly to the correct column metadata, so they tend to be explicit on selecting each columns. They tend to avoid wildcards on functions. This kind of problem is virtually non-existent to them.


So in SQL Server, if we change the order of the fields on the source function (EmployeeMoolahs) of the dependent function(EmployeeSummary), e.g. swap the position of Bonus and Salary, Salary comes before Bonus on EmployeeMoolahs, the EmployeeSummary's metadata will not be updated accordingly, it will not be automatically updated. That is, EmployeeSummary metadata's position will still be the same, i.e. Bonus still comes before Salary. Hence when we issue a query on wildcard-bearing EmployeeSummary function, the wildcard will fall to the wrong slots.


EmployeeSummary() 's Metadata is not automatically updated     EmployeePersonId    Bonus    Salary
EmployeeMoolah()'s expanded wildcard from its swapped columns  EmployeePersonId    Salary   Bonus

So that's how the wildcard can cause problems on SQL Server. Trying to imagine what fields are good to slot on the salary :P


Think of the damages that can be caused by wrong information slotted by the wildcard. IsHired field goes to IsAdmin, OriginalPrice goes to CurrentPrice, PassingScore goes to Score, etc.


* It can happen, try to copy the problem above :P


Sample data:
create table Person
(
 PersonId int identity(1,1) primary key,
 FirstName nvarchar(max) not null,
 LastName nvarchar(max) not null,
);

create table Employee
(
 EmployeePersonId int not null primary key references Person(PersonId),
 Salary money not null,
 Bonus money not null default(0)
);

insert into Person(FirstName, LastName) values
('John', 'Lennon'),
('Paul', 'McCartney'),
('George', 'Harrison'),
('Ringo', 'Starr');


insert into Employee(EmployeePersonId, Salary, Bonus) values
(1, 60000, 20000),
(2, 60000, 0),
(3, 40000, 0),
(4, 40000, 0);




Happy Computing! ツ

2 comments:

  1. Stop using * anywhere; magically, the problem disappears!

    ReplyDelete
    Replies
    1. True indeed ツ But somehow like some application developers who believes they are not paid enough(http://visualstudiogallery.msdn.microsoft.com/7179e851-a263-44b7-a177-1d31e33c84fd) to align their codes, some back-end developers who writes queries likewise believe they are not paid enough to write out each columns :D

      Delete