Saturday, December 11, 2010

Exceptions to Rule #1 and Rule #2. One-to-one relationship on both ends of two tables

According to Rule-1-Rule-2, the following query falls under rule #2 (primary key on first table appears on second table's join condition). This means the second table is many-to-one to first table, there are many countries to a person:

SELECT Country.CountryName, Leader.PersonName AS Leader
FROM Person Leader
JOIN Country ON Country.CurrentLeaderID = Leader.PersonID


How this applies in real world?

Let's take Hongkong as an example, imagine that the records of Hongkongers are not yet updated to China's id; so one cannot delete HK from Country table, so it means that HK still has an entry on Country table, so what one can do is just update HK's leader to China's leader. So there goes the case of two (or more) countries that belong to same leader only.

But what if we really want to enforce these conditions: (1) one leader to one country only (2) one country to one leader only. How can we do those?

First, though at first glance, we can put a LeadingCountryID to Person table so it would satisfy the second condition, this would however introduce a problem, a country can be governed by two leaders with that solution. And the Person table will be filled with nulls on those who are not leaders, that's a waste of I/O; and at most, only one row with null column is allowed on unique nullable column on Sql Server 2005 and below; there's a work-around on SQL Server 2008 though, you create unique index out of filtered-out nulls. Another solution is we put unique constraints on Country's CurrentLeaderID and reverse the query:

SELECT Country.CountryName, Person.PersonName 
FROM Country
JOIN Person Leader ON Leader.PersonID = Country.CurrentLeaderID

Hmm.. it seems we are able to achive our goal of stating the intent on code. It beautifully fall under rule #1.

However, that would have problems on entities with circular references. Take an example: a Person is born in a Country, a Country has a leader(Person), see this article: http://www.ienablemuch.com/2010/12/do-not-make-circular-references-on.html

If you can be sure that your entities won't have a circular references on the future, you can stop here.



.
.
.
.
.
.
.


Otherwise, the only neat solution is to remove the CurrentLeaderID from the Country table and make an association table:

CREATE TABLE CountryCurrentLeader
(
  dummy serial not null primary key,
  CountryID int NOT NULL unique key references Country(CountryID),
  CurrentLeaderID int NOT NULL unique references CurrentLeaderID(PersonID)
)


So does our design automatically fall to rule #1 if we use this query?

SELECT CountryCurrentLeader.CountryID, Person.PersonName 
FROM Person
JOIN CountryCurrentLeader 
ON CountryCurrentLeader.CurrentLeaderID = Person.PersonID



Apparently it isn't, a primary key on first table (though not selected on SELECT clause) does appear on second table's join condition, so that's rule #2, it means the second table is many-to-one on first table.



So how can we at least make it appear in query that the new design(association table) conforms to a right rule(one-to-one on both ends), rule #1?

To achieve rule #1, we simply arrange the query to this form...

SELECT CountryCurrentLeader.CountryID, Person.PersonName 
FROM CountryCurrentLeader
JOIN Person ON Person.PersonID = CountryCurrentLeader.CurrentLeaderID 


...so by using an association table, there won't be a primary key from first table that will appear on second table's join condition. Conforms to rule #1, FTW!


Let's extend our query further if our two rules will still hold water on complex queries:

SELECT CountryName.CountryName AS Country, Person.PersonName As LeaderName, LeaderBirthPlace.CountryName AS LeaderBirthPlace 
FROM CountryCurrentLeader -- 1
JOIN Person ON Person.PersonID = CountryCurrentLeader.CurrentLeaderID -- 2
JOIN Country CountryName ON CountryName.CountryID = CountryCurrentLeader.CountryID -- 3
JOIN Country LeaderBirthPlace ON LeaderBirthPlace.CountryID = Person.BirthCountryID -- 4

Let's dissect our query:

J1) Line 1 to Line 2: Rule #1 one-to-one. OK
J2) Line 1 to Line 3: Rule #1 one-to-one. OK
J3) Line 2 to Line 4: Rule #1 one-to-one. OK

So all is well, J1 and J2 safely conforms to Rule #1 as long we use association table with dummy primary key. J3 conventionally fall under Rule #1


There's no fullproof way to deduce the intent of a query based on just two rules. But we can at least follow the existing simplest rule (it's the simplest, one have to remember two rules only) by re-arranging the query and/or redesigning the database design.

No comments:

Post a Comment