Monday, October 7, 2013

Thanks SQL Server! My Bonus is Now Higher* than My Salary

Given these two functions:

create function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go
 
create function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.*
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

The output of this query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId




Seeing that it's best to present the salary information first before the bonus, we move the Salary information right after the EmployeePersonID.

alter function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Salary, e.Bonus
    from Employee e
go

Then you go back home, didn't even bother to check the output since it's a trivial fix anyway. On the 15th of the month, you receive your salary and you check your pay slip. You are so delighted that they make your bonus equal to your salary, then your jaw dropped, you don't receive any salary. Upon investigating the cause of that anomaly, you found out the erring query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId




It yields a wrong output!


Whoa, holy guacamole! Why I don't have any salary? As if, the Bonus and Salary swapped contents. Yeah right, they really are. The presentation concern you tried to fix, by placing Salary right after EmployeePersonId, has an error. The salary is now slotted to bonus field, and the bonus is now slotted salary field.


Checking if your eyes is fooling you, you tried to explicitly select all the columns:

select top 5 es.FirstName, es.LastName, es.EmployeePersonId, es.Salary, es.Bonus from EmployeeSummary() es order by es.EmployeePersonId

But still, you still don't have any salary.




You also check the function you modified if it returns correct information:

select top 5 * from EmployeeMoolahs() em order by em.EmployeePersonId

However, it has correct output:




Then you are thinking what causes the error on EmployeeSummary function? What makes the query wild? Why it's not returning proper information? Hmm.. wild? Maybe it's the wild…card, there's something wild on the wildcard! Got to remove the wildcard on EmployeeSummary()

create function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.*
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

Change that to explicit form:

alter function EmployeeSummary()
returns table
as
return
    select p.FirstName, p.LastName, em.EmployeePersonId, em.Salary, em.Bonus
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

Then you check the result of this query:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId

It's correct now!



Trying to check the fix if it is just a fluke, we swap again the order of Bonus and Salary of the source function to their original order:

alter function EmployeeMoolahs()
returns table
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go

Re-check again:

select top 5 * from EmployeeSummary() es order by es.EmployeePersonId

Below is the result, the columns' information are still correct! The order of fields won't change as we are selecting from EmployeeSummary which has explicitly selected columns. What is important, the information is always correct, the ordering of columns should be done on EmployeeSummary, not on EmployeeMoolahs.





Let's try to investigate the root cause of the error. If you got wildcard in the query of your function, SQL Server is trying to build a metadata for returned fields of that function, as if SQL Server burns a metadata for the returned fields of the query of your function, albeit invisibly.

create function EmployeeMoolahs()
returns table
-- burn these metadata...
(
    Bonus money,
    Salary money
)
-- ...burn
as
return
    select
        e.EmployeePersonID, e.Bonus, e.Salary
    from Employee e
go
 
create function EmployeeSummary()
returns table
-- burn metadata
(
    FirstName nvarchar(max),
    LastName nvarchar(max),
          -- burn these metadata based on the metadata obtained from EmployeeMoolah's wildcard
    EmployeePersonId int,   -- 1
    Bonus money,            -- 2
    Salary money            -- 3
)
-- ...burn
as
return
    select p.FirstName, p.LastName, em.*
     
          -- At the time this EmployeeSummary function is created, this is the order of fields on
         -- EmployeeMoolah()'s wildcard:
     
          -- 1. EmployeePersonId
          -- 2. Bonus
          -- 3. Salary
     
     
    from Person p
    join EmployeeMoolahs() em on p.PersonID = em.EmployeePersonId
go

In fact that is how other RDBMS does it, i.e. you need to explicitly return the metadata of your returned fields. Wildcard problem can be avoided by application developers using other RDBMS as they can see the wildcard could one day not be slotted properly to the correct column metadata, so they tend to be explicit on selecting each columns. They tend to avoid wildcards on functions. This kind of problem is virtually non-existent to them.


So in SQL Server, if we change the order of the fields on the source function (EmployeeMoolahs) of the dependent function(EmployeeSummary), e.g. swap the position of Bonus and Salary, Salary comes before Bonus on EmployeeMoolahs, the EmployeeSummary's metadata will not be updated accordingly, it will not be automatically updated. That is, EmployeeSummary metadata's position will still be the same, i.e. Bonus still comes before Salary. Hence when we issue a query on wildcard-bearing EmployeeSummary function, the wildcard will fall to the wrong slots.


EmployeeSummary() 's Metadata is not automatically updated     EmployeePersonId    Bonus    Salary
EmployeeMoolah()'s expanded wildcard from its swapped columns  EmployeePersonId    Salary   Bonus

So that's how the wildcard can cause problems on SQL Server. Trying to imagine what fields are good to slot on the salary :P


Think of the damages that can be caused by wrong information slotted by the wildcard. IsHired field goes to IsAdmin, OriginalPrice goes to CurrentPrice, PassingScore goes to Score, etc.


* It can happen, try to copy the problem above :P


Sample data:
create table Person
(
 PersonId int identity(1,1) primary key,
 FirstName nvarchar(max) not null,
 LastName nvarchar(max) not null,
);

create table Employee
(
 EmployeePersonId int not null primary key references Person(PersonId),
 Salary money not null,
 Bonus money not null default(0)
);

insert into Person(FirstName, LastName) values
('John', 'Lennon'),
('Paul', 'McCartney'),
('George', 'Harrison'),
('Ringo', 'Starr');


insert into Employee(EmployeePersonId, Salary, Bonus) values
(1, 60000, 20000),
(2, 60000, 0),
(3, 40000, 0),
(4, 40000, 0);




Happy Computing! ツ

Saturday, October 5, 2013

Variable-based Table-Valued Function vs Inline Table-Valued Function

Variable-Based Table-Valued Function, a.k.a. Multistatement Table-Valued Function are performance killers, for a good explanation, see the answer here: http://stackoverflow.com/questions/2554333/multi-statement-table-valued-function-vs-inline-table-valued-function

Here's a sample variable-based table-valued Function:

create function getSamplePersonOrgsViaVariableBasedTableValuedFunction()
returns @t table
(
    BusinessEntityID int primary key clustered,
    OrgUnits nvarchar(max)
)
as
begin
    insert into @t(BusinessEntityID, OrgUnits)
    select p.BusinessEntityID, eorg.orgUnits
    from Person.Person p
    cross apply
    (
        select 'blah'
    ) eorg(orgUnits);
     
    return;
     
end;
go


Here's a sample inline table-valued function:

create function getSamplePersonOrgsViaInlineTableValuedFunction() returns table
as
return
    select p.BusinessEntityID, eorg.orgUnits
    from Person.Person p
    cross apply
    (
        select 'blah'
    ) eorg(orgUnits)
go


For 50 invocations of this sample query on inline table-valued function, this took 1 second only

SELECT p.BusinessEntityID, p.FirstName, p.LastName, gsp.orgUnits
FROM Person.Person p
join getSamplePersonOrgsViaInlineTableValuedFunction() gsp on p.BusinessEntityID = gsp.BusinessEntityID
where p.BusinessEntityID < 9
go 50


Contrast that efficiency to variable-based table-valued function, variable-based table-valued function took 9 seconds:

SELECT p.BusinessEntityID, p.FirstName, p.LastName, gsp.orgUnits
FROM Person.Person p
join getSamplePersonOrgsViaVariableBasedTableValuedFunction() gsp on p.BusinessEntityID = gsp.BusinessEntityID
where p.BusinessEntityID < 9
go 50

Eager loading is the root cause of the performance problem for variable-based table-valued function, i.e. even we only need 8 rows on source table (p.BusinessEntityID < 9), when the source table is joined to variable-based table-valued function, the source table has to wait first for the result (19,972 rows) of the variable-based table-valued function before it is finally being joined to.


Inline table-valued function is very efficient and smart, it treats the function like a view, i.e., the query of the inline table-valued function is expanded to an actual tables when being joined to another table, behaves exactly like the joining of a table to a table-deriving query / view / CTE. Hence when the inline table-valued function is joined to a source table, when you fetch 8 rows only on source table (p.BusinessEntityID < 9), the query will also fetch 8 rows only on inline-table-valued function too.


On variable-based table-valued function, the result of the function is the one being expanded then put to another table bucket (variable table), hence causing performance problem, so when for example we have 8 rows then we join it to a variable-based table-valued function, we are joining the 8 rows to eagerly-loaded 19,972 rows.


To illustrate the efficiency of inline table-valued function, let's cause a divide by zero when the record encounters Person ID number 9.

alter function getSamplePersonOrgsViaInlineTableValuedFunction() returns table
as
return
    select p.BusinessEntityId, eorg.orgUnits
    from dbo.Person.Person p
    cross apply
    (
        select 'blah ' + convert(varchar,case when p.BusinessEntityId = 9 then 1 / 0 else 7 end)
    ) eorg(orgUnits)
go
 
alter function getSamplePersonOrgsViaVariableBasedTableValuedFunction()
returns @t table
(
    BusinessEntityId int primary key clustered,
    OrgUnits nvarchar(max)
)
as
begin
    insert into @t(BusinessEntityId, OrgUnits)
    select p.BusinessEntityId, eorg.orgUnits
    from dbo.Person.Person p
    cross apply
    (
        select 'blah ' + convert(varchar,case when p.BusinessEntityId = 9 then 1 / 0 else 7 end)
    ) eorg(orgUnits);
     
    return;
     
end;
go


This query will not cause divide-by-zero error on inline table-valued function, a proof that the function doesn't fetch organization units after BusinessEntityID number 8

SELECT p.BusinessEntityID, p.FirstName, p.LastName, gsp.orgUnits
FROM Person.Person p
join getSamplePersonOrgsViaInlineTableValuedFunction() gsp on p.BusinessEntityID = gsp.BusinessEntityID
where p.BusinessEntityID < 9


This query gets a divide-by-zero error though, a proof that the query fetch all the 19,972 rows of the variable-based table-valued function first before it is being joined to.

SELECT p.BusinessEntityID, p.FirstName, p.LastName, gsp.orgUnits
FROM Person.Person p
join getSamplePersonOrgsViaVariableBasedTableValuedFunction() gsp on p.BusinessEntityID = gsp.BusinessEntityID
where p.BusinessEntityID < 9




Happy Computing! ツ

Wednesday, October 2, 2013

Stored Procedure Name + F12 + Tada!

Repartee is something we think of twenty-four hours too late



Someone made an astute observation that the ap + tab + stored proc name + tab + tada (tada is not included, you have to utter it yourself :p) combo shortcut has a problem on scripting the content, i.e. it strips out the comments on the top of the code.



I should have replied that it's okay to remove those comments, those comments are just a pile of name(s) of the original developer(s) tacked on one after another, too tacky :p Those comments does not in any way helps a new dev on deciphering the intent of the code. Hmm.. but it has a use, it helps others on their witch-hunt on the culprit(s) of erring code heheh



But I would not give an answer that would strip others of their rights to maintain comments on top of the code, any preferences for that matter, cut the drama :P



Without further ado, this is the new shortcut:

exec + stored proc name + F12



RedGate will script back the whole contents of the stored procedure to a new query editor, i.e. RedGate will also include the comments on top of the stored proc.



Other shortcuts:

For view, type: ssf + tab + view name + F12

For function, type: ssf + tab + function name + backspace + F12



An observation, there's no need to put EXEC or SELECT * FROM, just type in the stored proc / view / function name directly, then press F12. Sometimes, RedGate is flaky though, type in EXEC or ssf anyway :-)





Happy Computing! ツ