Thursday, April 8, 2010

Find two particular fish in same tank

How to query which container contains two particular items?



First, let's create test data

create table tank as 
select * from (values('01',7),('02',8),('03',9),('04',10),('05',11)) as x(tank_num, volume);

create table tank_fish as 
select * from 
(values
('01','Black Moor'),
('01','Bubble Eye'),
('01','Comet'),
('02','Bubble Eye'),
('02','Lion Head'),
('02','Pompom'),
('02','Ryukin'),
('03','Comet'),
('03','Lion Head'),
('03','Ranchu'),
('04','Betta')) as x(tank_num, fish);


test=# select * from tank order by tank_num;
 tank_num | volume
----------+--------
 01       |      7
 02       |      8
 03       |      9
 04       |     10
 05       |     11
(5 rows)


test=# select * from tank_fish order by tank_num, fish;
 tank_num |    fish
----------+------------
 01       | Black Moor
 01       | Bubble Eye
 01       | Comet
 02       | Bubble Eye
 02       | Lion Head
 02       | Pompom
 02       | Ryukin
 03       | Comet
 03       | Lion Head
 03       | Ranchu
 04       | Betta
(11 rows)


Then let's use some nifty function to facilitate reporting rows in comma-delimited format.

CREATE AGGREGATE array_accum (
    sfunc = array_append,
    basetype = anyelement,
    stype = anyarray,
    initcond = '{}'
);


Then let's count the fish in each tank
select t.tank_num, array_to_string(array_accum(f.fish), ', ') as fishes, count(*)
from tank t
left join tank_fish f using(tank_num)
group by t.tank_num
order by t.tank_num

Output:
 tank_num |                fishes                 | count
----------+---------------------------------------+-------
 01       | Black Moor, Bubble Eye, Comet         |     3
 02       | Bubble Eye, Lion Head, Pompom, Ryukin |     4
 03       | Comet, Lion Head, Ranchu              |     3
 04       | Betta                                 |     1
 05       |                                       |     1
(5 rows)

As we can see there's something wrong in output, the tank# 5 indicates one fish, while there is none.

So make it a practice to use COUNT(field_name_here) instead of COUNT(*)

Amend the query:

select t.tank_num, array_to_string(array_accum(f.fish), ', ') as fishes, count(f.fish)
from tank t
left join tank_fish f using(tank_num)
group by t.tank_num
order by t.tank_num

Output:
 tank_num |                fishes                 | count
----------+---------------------------------------+-------
 01       | Black Moor, Bubble Eye, Comet         |     3
 02       | Bubble Eye, Lion Head, Pompom, Ryukin |     4
 03       | Comet, Lion Head, Ranchu              |     3
 04       | Betta                                 |     1
 05       |                                       |     0
(5 rows)

Now let's find the tank which has Comet and Ranchu in it:

select t.tank_num, array_to_string(array_accum(f.fish), ', ') as fishes, count(f.fish in ('Comet', 'Ranchu'))
from tank t
left join tank_fish f using(tank_num)
group by t.tank_num
order by t.tank_num

Output:
 tank_num |                fishes                 | count
----------+---------------------------------------+-------
 01       | Black Moor, Bubble Eye, Comet         |     3
 02       | Bubble Eye, Lion Head, Pompom, Ryukin |     4
 03       | Comet, Lion Head, Ranchu              |     3
 04       | Betta                                 |     1
 05       |                                       |     0
(5 rows)

Oops.. there's something wrong with the query, it should only show tank# 1 with count of 1, and tank# 3 with count of 2.

COUNT also counts false, it only exclude NULL.


So this query:
select count(null),count(0),count(1),count(2),count(false),count(true)

Outputs:
 count | count | count | count | count | count
-------+-------+-------+-------+-------+-------
     0 |     1 |     1 |     1 |     1 |     1
(1 row)

Here's how the GROUP BY query above looks like when expanded
select t.tank_num, f.fish, f.fish in ('Comet', 'Ranchu')
from tank t
left join tank_fish f using(tank_num)
order by t.tank_num, f.fish

Here's the output:
 tank_num |    fish    | ?column?
----------+------------+----------
 01       | Black Moor | f
 01       | Bubble Eye | f
 01       | Comet      | t
 02       | Bubble Eye | f
 02       | Lion Head  | f
 02       | Pompom     | f
 02       | Ryukin     | f
 03       | Comet      | t
 03       | Lion Head  | f
 03       | Ranchu     | t
 04       | Betta      | f
 05       |            |
(12 rows)

By the way, f.fish in ('Comet','Ranchu') is equivalent to: f.fish = 'Comet' or f.fish = 'Ranchu'.


False is also countable, like for example there's a survey "Do you think lochness monster is real?" some will answer True, some will answer False, some are undecided (NULL). Say the answers are False, True, True, Null, False, True, Null, True. If you want to count how many has an answer to the question, it should only yield 6, not 8, COUNT(Interviewed.Answer) excludes the undecided, don't do COUNT(*) nor COUNT(Interviewed.*). But if you want to count how many have answered, count the person, COUNT(Interviewed.PersonName).

And why 0 is also countable, well number 0 could be a primary key, Steve Jobs employee ID is numbered 0. So if you want to count how many employees in Apple has been issued an ID, you should do COUNT(Employee.EmployeeID). COUNT(Employee.EmployeeID) instead of COUNT(*), also takes care of not counting employees which don't have an ID yet.


So how to exclude those false? Just use NULLIF function. NULLIF converts the first parameter to null if its value is same as parameter 2. Typical uses are NULLIF(field_or_expression_here, 0),
NULLIF(field_or_expression_here, false)



select t.tank_num, array_to_string(array_accum(f.fish), ', ') as fishes, 
   count(nullif( f.fish in ('Comet', 'Ranchu'), false ))
from tank t
left join tank_fish f using(tank_num)
group by t.tank_num
order by t.tank_num

Output:
 tank_num |                fishes                 | count
----------+---------------------------------------+-------
 01       | Black Moor, Bubble Eye, Comet         |     1
 02       | Bubble Eye, Lion Head, Pompom, Ryukin |     0
 03       | Comet, Lion Head, Ranchu              |     2
 04       | Betta                                 |     0
 05       |                                       |     0
(5 rows)

Now the output is correct, so finding the tank which has Comet and Ranchu fish in it, should be as simple matter as moving the count statement from SELECT to GROUP BY's HAVING clause:

select t.tank_num, array_to_string(array_accum(f.fish), ', ') as fishes   
from tank t
left join tank_fish f using(tank_num)
group by t.tank_num
having count(nullif( f.fish in ('Comet', 'Ranchu'), false )) >= 2
order by t.tank_num

Output:
 tank_num |          fishes
----------+--------------------------
 03       | Comet, Lion Head, Ranchu
(1 row)

If you want to take advantage of the fact that booleans can be casted to integer in Postgresql, cast the boolean to integer, then just summarize results using SUM function, SUM accepts numbers only.

Query
select t.tank_num, array_to_string(array_accum(f.fish), ', ') as fishes   
from tank t
left join tank_fish f using(tank_num)
group by t.tank_num
having sum(f.fish in ('Comet', 'Ranchu') ::int) >= 2
order by t.tank_num

No comments:

Post a Comment