Friday, January 14, 2011

Postgresql LAG windowing function

Given this:

create table t
(
ts timestamp not null,
code int not null
);

insert into t values
('2011-01-13 10:00:00', 5),
('2011-01-13 10:10:00', 5),
('2011-01-13 10:20:00', 5),
('2011-01-13 10:30:00', 5),
('2011-01-13 10:40:00', 0),
('2011-01-13 10:50:00', 1),
('2011-01-13 11:00:00', 1),
('2011-01-13 11:10:00', 1),
('2011-01-13 11:20:00', 0),
('2011-01-13 11:30:00', 5),
('2011-01-13 11:40:00', 5),
('2011-01-13 11:50:00', 3),
('2011-01-13 12:00:00', 3),
('2011-01-13 12:10:00', 3);

An stackoverflow user asked:
How can I select the first date of each set of identical numbers, so I end up with this:
2011-01-13 10:00:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:50:00, 3

I answered:

with sequencer as 
(
SELECT ROW_NUMBER() OVER(ORDER BY ts) seq, ts, Code
FROM T
)
select a.ts, a.Code
from sequencer a 
left join sequencer b on b.seq + 1 = a.seq 
where b.code IS DISTINCT FROM a.code;

But I'm not giving the problem a first-class treatment it deserves, I noticed somebody gave an answer that uses lag function, albeit he/she uses where prevcode <> code or prevcode is null

with r as
(
select ts, code, lag(code,1,null) over(order by ts) as prevcode
from t
)
select ts, code 
from r
where prevcode is distinct from code

Hmm.. :-) that code is very elegant, sleek and performant

http://www.postgresql.org/docs/8.4/static/functions-window.html

No comments:

Post a Comment