Saturday, April 24, 2010

How to speed up ORDER BY COALESCE(col,9999999) in Sql Server?

No we won't, Sql Server 2008 doesn't support creating index on function operations yet.

So how can we speed up that expression? First of all, that expression is a common idiom and a common given answer when the query need to display null values last. Though some RDBMS already has a construct for putting nulls at the end of the sorted list, Sql Server 2008 still hasn't.





For reference, here's the construct for displaying nulls last on RDBMS that supports it, works on Postgres (with or without NULLS LAST, the database still uses the col's index)

SELECT * FROM tbl ORDER BY col NULLS LAST

Now let's do the same for Sql Server. Performance aside, how we will solve it without using sentinel/magic number? Use CASE WHEN:

SELECT * FROM tbl ORDER BY CASE WHEN col IS NULL THEN 1 ELSE 0 END, col

Well it is longer than sentinel approach, but one less magic numbers to remember(e.g. 9999999,ZZZZZZZ). Though it could be a lot simpler using other RDBMS which has boolean type:

SELECT * FROM tbl ORDER BY col IS NULL, col

That works on both Postgres and Mysql, true value sorts last, false first.

So if we have the following data...

create table xtemp(Col int)
INSERT INTO xtemp VALUES(1)
INSERT INTO xtemp VALUES(5)
INSERT INTO xtemp VALUES(4)
INSERT INTO xtemp VALUES(NULL)
INSERT INTO xtemp VALUES(3)
INSERT INTO xtemp VALUES(6)
INSERT INTO xtemp VALUES(NULL)
INSERT INTO xtemp VALUES(9999999)
INSERT INTO xtemp VALUES(2)

...the Sql Server ORDER BY CASE WHEN construct, sorts like this under the hood:

Sort Col
0    1
0    2
0    3
0    4
0    5
0    6
0    9999999
1    NULL
1    NULL

Now, let's go back to our performance goal on Sql Server. First, let's check if there's an ROI for using an extra field for sorting our list:

create table ytemp(Col int, IsColNull int)
INSERT INTO ytemp VALUES(1,0)
INSERT INTO ytemp VALUES(5,0)
INSERT INTO ytemp VALUES(4,0)
INSERT INTO ytemp VALUES(NULL,1)
INSERT INTO ytemp VALUES(3,0)
INSERT INTO ytemp VALUES(6,0)
INSERT INTO ytemp VALUES(NULL,1)
INSERT INTO ytemp VALUES(9999999,0)
INSERT INTO ytemp VALUES(2,0)

Then let's create index on both xtemp and ytemp
create index ix_xtemp on xtemp(Col)
create index ix_ytemp on ytemp(IsColNull, Col)

Now let's compare their performance relative to each other. Highlight these two statements, then press Ctrl+L...

select * from ytemp order by IsColNull, col
select * from xtemp order by case when col is null then 1 else 0 end, col


...here's the output:




As we can see, the first one is faster than the CASE WHEN approach

We can conclude that an extra field is a sure win in terms of sorting performance, so let's automate that extra column(IsColNull) so it's not a chore to populate them.

First, let's drop the existing column IsColNull:

drop index ix_ytemp on ytemp;
alter table ytemp drop column IsColNull;

Then re-add the column, but this time we specify it as computed/formula column:

ALTER TABLE dbo.ytemp ADD IsColNull  AS 
(case when col is null then 1 else 0 end) PERSISTED


Don't forget the index:
create index ix_ytemp on ytemp(iscolnull, col) 

Now let's recheck the execution plan of these two statements...

select * from ytemp order by IsColNull, col
select * from xtemp order by case when col is null then 1 else 0 end, col

...here's the output:




As we can see, they have the same performance as our first test. Well it should be, the column IsColNull is a real column on our table, and it is automatically assigned the formula's result

Now if you are curious how the COALESCE fares against the extra field approach, here is the execution plan:



The extra field approach is faster than coalesce approach. Until Sql Server supported NULLS LAST, we can use the extra field approach to speed up our query that needs to put nulls last.

Final note, we can use the extra field approach on sorting that we need to override, let's say we wanted to put US, UK on the top of our dropdown list, just do this in computed column:

ALTER TABLE dbo.ytemp ADD
 CustomSort AS 
(case 
when country = 'US' then 1
when country = 'UK' then 2
when country is NULL then 4 
else 3 
end) PERSISTED

That will put US on top of list, then UK, then other countries, then other blank(null) countries last

Related: http://www.ienablemuch.com/2010/12/postgresql-speeding-up-slow-coalesce.html

No comments:

Post a Comment