How to test if two NULLS are equal in SQL Server, ala-C#. Before our dear DBAs waxes philosophical that NULL = NULL is unknown, NULL <> NULL is also unknown, we must note that comparing things that results to knowns has uses, say for example we want to make audit logger in trigger code, that trigger code must be optimized. We will have database performance problem if we don't detect the field when logging changes, as this will do unconditional logging, whether the pertinent field(s) changes or not. We will also have problem in our logging logic if we cannot properly detect if there is a change that have occured in nullable field.
As typified by this:
IF new.sub_account_of <> old.sub_account_of THEN
// log changes here
ELSE
// do nothing
END
That will not work, if one of them has null. We must use this(doesn't work on Sql Server, works on Postgres, DB2):
IF new.sub_account_of IS DISTINCT FROM old.sub_account_of THEN
// log changes here
ELSE
// do nothing
END
Truth table for C#-like null comparison:
A == B
NULL == NULL : True
NULL == 1 : False
NULL == 2 : False
1 == NULL : False
1 == 1 : True
1 == 2 : False
2 == NULL : False
2 == 1 : False
2 == 2 : True
A != B
NULL != NULL : False
NULL != 1 : True
NULL != 2 : True
1 != NULL : True
1 != 1 : False
1 != 2 : True
2 != NULL : True
2 != 1 : True
2 != 2 : False
For the code above, expand the following:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace NullComparison
{
class Program
{
struct Rec
{
public int? a, b;
public Rec(int? a, int? b)
{
this.a = a;
this.b = b;
}
}
static void Main(string[] args)
{
Console.WriteLine("{0,4} == {1,4}", "A", "B");
foreach (Rec r in Values())
{
Console.WriteLine("{0,4} == {1,4} : {2}",
r.a != null ? r.a.ToString() : "NULL",
r.b != null ? r.b.ToString() : "NULL",
r.a == r.b);
}
Console.WriteLine();
Console.WriteLine("{0,4} != {1,4}", "A", "B");
foreach (Rec r in Values())
{
Console.WriteLine("{0,4} != {1,4} : {2}",
r.a != null ? r.a.ToString() : "NULL",
r.b != null ? r.b.ToString() : "NULL", r.a != r.b);
}
Console.ReadLine();
}
static IEnumerable<Rec> Values()
{
yield return new Rec(null, null);
yield return new Rec(null, 1);
yield return new Rec(null, 2);
yield return new Rec(1, null);
yield return new Rec(1, 1);
yield return new Rec(1, 2);
yield return new Rec(2, null);
yield return new Rec(2, 1);
yield return new Rec(2, 2);
}
}//class Program
}//namespace NullComparison
If you want to go directly to Sql Server logic,
click this, otherwise continue reading...
Now let's do it first in
PostgreSQL to verify the logic of our code, since it has first class support for C#-like nullable values comparison behavior. For equality, it uses
IS NOT DISTINCT FROM, for inequality it uses
IS DISTINCT FROM, pretty intuitive isn't it? If I remember correctly, MySQL uses
<=> for its IS DISTINCT FROM, I don't recall if there's an IS NOT DISTINCT FROM operator in MySQL.
To create the table(also works in SQL Server 2008):
select a,b into x_m
from
(
values
(null,null),
(null,1),
(null,2),
(1,null),
(1,1),
(1,2),
(2,null),
(2,1),
(2,2)
)as x(a,b)
Without further ado, this is the logic of equality:
coalesce(a = b or (a is null and b is null), false) as manual_comp
Why coalesce is needed there? It's when either of a or b is null, the equality comparison will result to null. So we need to translate null to false, for that we will use coalesce.
To test:
select a,b
,coalesce(a = b or (a is null and b is null), false) as manual_comp
,a is not distinct from b as built_in_comp
from x_m
Outputs:
a | b | manual_comp | built_in_comp
---+---+-------------+---------------
| | t | t
| 1 | f | f
| 2 | f | f
1 | | f | f
1 | 1 | t | t
1 | 2 | f | f
2 | | f | f
2 | 1 | f | f
2 | 2 | t | t
(9 rows)
Now how to check if b is different from a? use IS DISTINCT FROM, now how do we do it in manual SQL? One approach is we just put NOT on the whole expression, another one is we rewrite the query:
select a,b
,coalesce(a <> b and (a is not null or b is not null), true) as manual_comp
,a is distinct from b as built_in_comp
from x_m
Outputs:
a | b | manual_comp | built_in_comp
---+---+-------------+---------------
| | f | f
| 1 | t | t
| 2 | t | t
1 | | t | t
1 | 1 | f | f
1 | 2 | t | t
2 | | t | t
2 | 1 | t | t
2 | 2 | f | f
(9 rows)
As we can see,
re-wiring rewriting the query seems a bit tedious to read:
coalesce(a <> b and (a is not null or b is not null), true) as manual_comp
Programmers also tend to re-read their logic if it involves multiple NOT EQUAL comparison. I cannot just help myself to skim-read that query. Maybe one of those NOT EQUAL comparisons could launch a missile!
Let's find one example how using only one NOT happens in real life. If an interviewer asked the interviewee about her favorite singer, and the answer is neither John, Paul, George nor Ringo, then the interviewer should go find another interviewee.
Now how most programmers(not all programmers) translate that to program?
IF ans != 'John' AND ans != 'Paul' AND ans != 'George' AND ans != 'Ringo' THEN
FindAnotherInterviewee
END
See the pattern there? You never read written instructions like this in real life "Please interview another person if the interviewee's answer is not john and not paul and not george and not ringo." In real life, you just use one NOT. The code above doesn't mimic the original English sentence anymore, it's hard to read that logic, it compels the programmer to re-read the logic, multiple NOT EQUALs promotes doubtful assertion, perhaps you've seen others prefer this (translation of
if answer is neither Y, N nor C):
IF ans != 'Y' AND ans != 'N' AND ans != 'C' THEN ask again END
Over this readable one:
IF NOT(ans = 'Y' OR ans = 'N' or ans = 'C') THEN ask again END
So I shun multiple NOT EQUALs in logic, they make the program harder to read. So the English sentence of neither John,Paul,George nor Ringo, should be written like this:
IF NOT(ans = 'John' OR ans = 'Paul' OR ans = 'George' OR ans = 'Ringo') THEN
FindAnotherInterviewee
END
That directly mimics the english sentence with only one NOT. Actually the word
nor from actual English sentence is a signal in itself that one should not use
AND, should use
OR instead when translating the English sentence to code.
So let's stick with readable programs.
To continue, for inequality comparisons, we just stick NOT in front of the equality expression.
select a,b
,not coalesce(a = b or (a is null and b is null), false) as manual_comp
,a is distinct from b as built_in_comp
from x_m
Here's the output:
a | b | manual_comp | built_in_comp
---+---+-------------+---------------
| | f | f
| 1 | t | t
| 2 | t | t
1 | | t | t
1 | 1 | f | f
1 | 2 | t | t
2 | | t | t
2 | 1 | t | t
2 | 2 | f | f
(9 rows)
Now let's go back to the main gist of this article, since up to this time of writing, Microsoft SQL Server 2008 still doesn't support
IS DISTINCT FROM and
IS NOT DISTINCT FROM operators. We must find our way to support those conditions.
So how to translate those IS DISTINCT FROM and IS NOT DISTINCT FROM to Sql Server?
Postgresql version:
select a,b
,coalesce(a = b or (a is null and b is null), false) as manual_comp
,a is not distinct from b as built_in_comp
from x_m
To translate to Sql Server-consumable one, use CASE WHEN:
select a,b,
case
when a = b then 1
when a is null and b is null then 1
else 0
end as mssql_manual_comp
from x_m
Outputs:
a b mssql_manual_comp
----------- ----------- -----------------
NULL NULL 1
NULL 1 0
NULL 2 0
1 NULL 0
1 1 1
1 2 0
2 NULL 0
2 1 0
2 2 1
(9 row(s) affected)
Working properly, now how to check if
a IS DISTINCT FROM b? Since Sql Server don't have first-class support for booleans, we cannot just put NOT on front of CASE statement. This will result to incorrect syntax:
select a,b,
NOT
case
when a = b then 1
when a is null and b is null then 1
else 0
end as mssql_manual_comp
from x_m
So let's get some help from XOR operator to toggle the logic:
select a,b,
case
when a = b then 1
when a is null and b is null then 1
else 0
end ^ 1 as mssql_manual_comp
from x_m
So what's the output for our inequality comparison?
a b mssql_manual_comp
----------- ----------- -----------------
NULL NULL 0
NULL 1 1
NULL 2 1
1 NULL 1
1 1 0
1 2 1
2 NULL 1
2 1 1
2 2 0
(9 row(s) affected)
So our logic is correct, so now let's give some
IS DISTINCT FROM love to Sql Server, which if I may add can already be done on
other major RDBMS
To recap, how to do
IS NOT DISTINCT FROM in Sql Server?
select a,b
from x_m
where a IS NOT DISTINCT FROM b
Here's the equivalent in Sql Server:
select a,b
from x_m
where
(case
when a = b then 1
when a is null and b is null then 1
else 0
end) = 1
Outputs:
a b
----------- -----------
NULL NULL
1 1
2 2
(3 row(s) affected)
And how to do
IS DISTINCT FROM in Sql Server?
select a,b
from x_m
where a IS DISTINCT FROM b
Here's the equivalent in Sql Server:
select a,b
from x_m
where
(case
when a = b then 1
when a is null and b is null then 1
else 0
end) = 0
Outputs:
a b
----------- -----------
NULL 1
NULL 2
1 NULL
1 2
2 NULL
2 1
(6 row(s) affected)
Please do note that we don't need to use XOR in our CASE expression in WHERE clause, we can just directly compare the CASE expression to 1 or 0. 1 denotes equality, 0 denotes inequality.
Regarding DeMorgan logic simplifier and why multiple != (aka <>) should be suspected of code smell
http://stackoverflow.com/questions/4615113/how-can-the-and-and-or-key-word-used-together-in-a-select-query/4615182#4615182
UPDATE 2011-08-08
Here's another approach for IS NOT DISTINCT FROM:
select a,b
from x_m
where
(
a = b
or
(a is null and b is null)
)
For IS DISTINCT FROM:
select a,b
from x_m
where
a <> b
or
(
not (a is null and b is null )
and ( a is null or b is null )
)
UPDATE 2014-04-08
Here's another approach for IS DISTINCT FROM:
http://sqlfiddle.com/#!15/d41d8/1731
with x as
(
select a, b
from
(
values
(null,null),
(null,1),
(null,2),
(1,null),
(1,1),
(1,2),
(2,null),
(2,1),
(2,2)
)as x(a,b)
)
select *,
x.a IS DISTINCT FROM x.b as perfect,
x.a <> x.b as flawed,
a <> b
or
(
not (a is null and b is null )
and ( a is null or b is null )
)
as good_workaround,
(x.a <> x.b or x.a is null or x.b is null)
and not (x.a is null and x.b is null)
as perfect_workaround
from x;
Output:
| A | B | PERFECT | FLAWED | GOOD_WORKAROUND | PERFECT_WORKAROUND |
|--------|--------|---------|--------|-----------------|--------------------|
| (null) | (null) | 0 | (null) | (null) | 0 |
| (null) | 1 | 1 | (null) | 1 | 1 |
| (null) | 2 | 1 | (null) | 1 | 1 |
| 1 | (null) | 1 | (null) | 1 | 1 |
| 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 1 | 1 | 1 | 1 |
| 2 | (null) | 1 | (null) | 1 | 1 |
| 2 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 0 | 0 | 0 | 0 |