Thursday, December 16, 2010

Postgresql Speeding Up Coalesce

Using this:

create table x as 
select * from generate_series(1,1000) as x(n);

insert into x values(null);

If we perform a query such as this one:

select * from x order by coalesce(n,0)

We will receive Seq Scan query plan:
QUERY PLAN                         
-----------------------------------------------------------
 Sort  (cost=64.90..67.40 rows=1001 width=4)
   Sort Key: (COALESCE(n, 0))
   ->  Seq Scan on x  (cost=0.00..15.01 rows=1001 width=4)
(3 rows)

To optimize that coalesce, we should make an index on coalesce expression:
create index ix_x on x(coalesce(n,0));

And that will produce Index Scan query plan:
QUERY PLAN                            
------------------------------------------------------------------
 Index Scan using ix_x on x  (cost=0.00..55.27 rows=1001 width=4)
(1 row)

But the problem with using sentinel approach on that kind of coalesce, if the sentinel changes, your database won't use the index you've created, so this...

select * from x order by coalesce(n,-1);

...produces Seq Scan query plan again:

QUERY PLAN                         
-----------------------------------------------------------
 Sort  (cost=64.90..67.40 rows=1001 width=4)
   Sort Key: (COALESCE(n, (-1)))
   ->  Seq Scan on x  (cost=0.00..15.01 rows=1001 width=4)
(3 rows)

On that simple example, since Postgresql has a first class support for sorting nulls first, we could just index on that expression directly, let's check first what's the execution plan of the following query without the index...

select * from x order by n nulls first;


...that produces:


QUERY PLAN                         
-----------------------------------------------------------
 Sort  (cost=64.90..67.40 rows=1001 width=4)
   Sort Key: n
   ->  Seq Scan on x  (cost=0.00..15.01 rows=1001 width=4)
(3 rows)

Now let's create an index on x's n...

create index ix_x_y on x(n nulls first);

...then the last query (select * from x order by n nulls first;), will produce:
QUERY PLAN                             
--------------------------------------------------------------------
 Index Scan using ix_x_y on x  (cost=0.00..43.27 rows=1001 width=4)
(1 row)

That's more neat than using COALESCE for sorting nulls

Related: http://www.ienablemuch.com/2010/04/how-to-speed-up-order-by.html


No comments:

Post a Comment