class Programmer : IEnableMuch {: One Trick Pony


Saturday, December 11, 2010

Plural table names are one trick pony

English plurals are not ORM-friendly.

Is that a proof by blatant assertion? It is if you can assert otherwise ;-)

Case in point: Chinese has two plural forms, one for persons and another for non-persons. You use the "men" suffix for persons and the "henduo" prefix for non-persons. Filipino forms the plural of everything (person or non-person) with the "mga" prefix.

Chinese-way:

SELECT * FROM EmployeeMen

SELECT * FROM ChildMen

SELECT * FROM HenduoFruit

SELECT * FROM HenduoMouse

Filipino-way:

SELECT * FROM MgaEmployee

SELECT * FROM MgaChild

SELECT * FROM MgaFruit

SELECT * FROM MgaMouse

English-way:

SELECT * FROM Employees

SELECT * FROM Children

SELECT * FROM Fruits

SELECT * FROM Mice

See? Mice and Children are not directly mappable by ORMs. Any decent ORM without an AI of some sort will not be able to infer that the singular form of Mice is Mouse. In Filipino and Chinese, pluralizers are consistent and more amenable to ORMs.
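To make that concrete, here is a minimal sketch of what a rule-based mapper without a dictionary might do. The two helper functions are hypothetical (they are not taken from any real ORM); they only illustrate why a fixed prefix is mechanically reversible while English plurals are not.

```python
def naive_singularize(table_name: str) -> str:
    """Hypothetical rule: strip a trailing 's', as a dictionary-less mapper might."""
    return table_name[:-1] if table_name.endswith("s") else table_name


def strip_mga(table_name: str) -> str:
    """Hypothetical rule: undo the Filipino-style plural by dropping the 'Mga' prefix."""
    return table_name[3:] if table_name.startswith("Mga") else table_name


english = {t: naive_singularize(t) for t in ["Employees", "Children", "Fruits", "Mice"]}
filipino = {t: strip_mga(t) for t in ["MgaEmployee", "MgaChild", "MgaFruit", "MgaMouse"]}

print(english)   # Children and Mice come back unchanged: no entity name recovered
print(filipino)  # every table name maps back to its entity name
```

The suffix rule recovers Employee and Fruit but leaves Children and Mice as they are, while the prefix rule works for all four names.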

Let's say you are in a radical programming environment and management forbids you from using mnemonic aliases...

SELECT E.Name AS EmployeeName, C.Name as ChildName
FROM Employees AS E
JOIN Children AS C ON C.ParentId = E.ID

...you will find the AS keyword a glorified singularizer for table names:

SELECT Employee.Name AS EmployeeName, Child.Name AS ChildName 
FROM Employees AS Employee
JOIN Children AS Child ON Child.ParentId = Employee.ID

And let's say you are in a very, very extremely radical programming environment where they won't allow you to use aliases at all...

SELECT Employees.Name AS EmployeeName, Children.Name AS ChildName -- ugh, those look odd
FROM Employees 
JOIN Children ON Children.ParentId = Employees.ID

...your queries will look odd.
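For what it's worth, the three styles differ only in how they read. The sketch below runs them against an in-memory SQLite database through Python's sqlite3 module; the engine choice and the sample rows ('Ana', 'Ben') are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Children  (ParentId INTEGER, Name TEXT);
    INSERT INTO Employees VALUES (1, 'Ana');  -- hypothetical sample data
    INSERT INTO Children  VALUES (1, 'Ben');
""")

mnemonic = """
    SELECT E.Name AS EmployeeName, C.Name AS ChildName
    FROM Employees AS E
    JOIN Children AS C ON C.ParentId = E.ID
"""
singularized = """
    SELECT Employee.Name AS EmployeeName, Child.Name AS ChildName
    FROM Employees AS Employee
    JOIN Children AS Child ON Child.ParentId = Employee.ID
"""
no_alias = """
    SELECT Employees.Name AS EmployeeName, Children.Name AS ChildName
    FROM Employees
    JOIN Children ON Children.ParentId = Employees.ID
"""

# Same rows from all three; only the table qualifiers differ.
results = [conn.execute(q).fetchall() for q in (mnemonic, singularized, no_alias)]
print(results)
```

So the choice between them is about how the qualifiers read, not about what the query returns.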

BTW, we programmers are good at switching mental models; we don't really read this...

box.setX( box.getX() + 1);

...as "set X to its old X plus 1".

We read that as:

box.X = box.X + 1;
box.X += 1;
box.X++;

And the really best programmers are the ones who try to infer the intent of code despite the lack of comments or proper method names. They don't go through the motions; they infer intent, and they read the code above as: move the box to the right.

The best programmers are the ones whose mental models let them abstract a concept very well, with or without first-class constructs in the syntax.

So the best programmer would really read this...

SELECT * FROM Computer 

SELECT * FROM Toy

...as:

SELECT * FROM Computer(s)

SELECT * FROM Toy(s)

And since I'm a Filipino programmer, prefixing words to entities to make them plural easily gets old...

SELECT * FROM MgaEmployee

SELECT * FROM MgaChild

SELECT * FROM MgaFruit

SELECT * FROM MgaMouse

...and is almost akin to the incessant prefixing of tbl to table names:

SELECT * FROM tblEmployee

SELECT * FROM tblChild

SELECT * FROM tblFruit

SELECT * FROM tblMouse

Naming tables in plural form is one of those design patterns that is like a novelty: aside from wearing off easily, it is just a one-trick pony. Plural names only work well if you query one table only; reading SELECT Lastname, Firstname, Birthdate FROM Employees gives one a warm-and-fuzzy feeling. But the norm is that we join tables, and then aliasing table names is warranted, especially for plural advocates. To plural advocates: wanna bet you would give your tables aliases rather than go through the motions when joining tables? That is, this feels icky to you despite your plural-form fervor for naming tables...

SELECT Persons.Name AS PersonName, Countries.CountryCode 
FROM Persons JOIN Countries ON Countries.CountryID = Persons.BirthCountryID

...Why does that feel icky to you? Hint: because to any self-respecting programmer, any selected column with a table qualifier looks odd if the table qualifier is in plural form.

With plural table names, you can't seamlessly plug-and-play them into other components (think ORM), and you can't (or won't) leave them alias-less when querying two or more tables (looks funneh):

SELECT Children.Name AS ChildName, Children.Age, Fathers.Name AS FatherName 
FROM Children 
JOIN Fathers ON Fathers.ID = Children.FatherID

So next time you encounter advocates of singular names, don't argue with them, they are mostly right.

And if you don't want to flunk the MCSD 70-300 exam, use singular names.

In terms of ORM-friendly plural naming conventions, here are the rankings:

0. Singular (hey, you said ORM-friendly plural naming conventions! how did this manage to sneak in here?)

1. Filipino plurals

2. Chinese plurals (words are immutable, you just prefix "henduo" or suffix "men")

3. English plurals (that's why this article exists ;-) ). Aside from the suffix "s", you have irregular and mutable plural forms (child -> children, mouse -> mice, goose -> geese, moose -> moose (not mutable, I just want to expose the inconsistencies of the language ;-) ) )

So keep in mind, folks: plural table names are a one-trick pony only; the only instance where they could give you a warm and fuzzy feeling is when you are getting data from one table only.

SELECT Name, FavoriteLanguage, FavoritePizzaBrand FROM Programmers

When joining tables with plural names, pleasant-wise, all bets are off

English plural 101:
http://en.wikipedia.org/wiki/English_plural

[UPDATE March 11 2011]

My desired name for a DbSet in Entity Framework was not honored: EF tried to look for a People table for my public DbSet Persons, so I need to override things in OnModelCreating :-)

Wednesday, April 28, 2010

Debunking the myth that COUNT('Dracula') is faster than COUNT(*)

Ok, that was an attempt at humor ;-)

How to debunk the myth that COUNT(1) is faster than COUNT(*) ?

The simplest mistaken assumption is that the asterisk makes the RDBMS read all the columns. In fact, the asterisk just represents the cardinality (an unfortunate name, very computer-sciencey-sounding) of the set.

Thinking that COUNT(1) or COUNT(anyNonNullConstantHere) is faster than COUNT(*) is the most common programming-by-superstition practiced by many. First point: if this language construct had made it into the SQL standard...

SELECT COUNT(cat)
FROM cats

...we would not need COUNT(*) (or COUNT(1), for that matter) in the SQL language. But since RDBMSs don't have an AI to infer that the entity (cat) you want to count is the singular form of the cats table, this didn't make it into the final ANSI SQL spec.

Second point, a compromise: if the language permitted counting the table name directly...

SELECT users.user_id, COUNT(answers)
FROM users
JOIN answers ON answers.user_id = users.user_id
GROUP BY users.user_id

...we would not need COUNT(*) in SQL; it's very natural to read. But only Postgres supports this. Even there, it will fail if there's a nullable answers column (same name as the table) in either of the two tables, as Postgres gives priority to the field over the table. That can be worked around by putting an asterisk after the name, so the RDBMS can detect that the thing we want to count is the table's rows rather than the column:

SELECT users.user_id, COUNT(answers.*)
FROM users
JOIN answers ON answers.user_id = users.user_id
GROUP BY users.user_id

The ultimate proof that the database isn't reading all columns: this script outputs 4, not 0.

create table person
(
lastname varchar(50),
firstname varchar(50),
age int
);

insert into person(lastname,firstname,age) values(null,null,null);
insert into person(lastname,firstname,age) values(null,null,null);
insert into person(lastname,firstname,age) values(null,null,null);
insert into person(lastname,firstname,age) values(null,null,null);

select count(*) from person; -- output is 4, not 0

If the RDBMS reads all the columns, why does the count still resolve to 4? Shouldn't it be 0, since all the values are null?

What makes COUNT(1) different from COUNT(0), COUNT(2), COUNT('your blessings') or COUNT('Dracula')? Answer: nothing. These will all output 4 on the above data. There's nothing special about COUNT(1) or COUNT(anyNonNullValue) that makes it different from COUNT(*), and there's nothing slow in COUNT(*): as we can plainly see from the output above, it doesn't care what the column values are. How can COUNT(1) be faster than COUNT(*) when COUNT(*) doesn't evaluate any values in any column?

And one could argue as well that COUNT(*) is faster than COUNT(fieldhere), or COUNT(1) for that matter, since it doesn't need to evaluate any expression when it counts rows.
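The claim is easy to check. Here is a small sketch that recreates the person table in an in-memory SQLite database via Python's sqlite3 module; SQLite is only a convenient stand-in (an assumption, since the script above does not name an engine). COUNT(lastname) is added as a contrast, because counting an actual column does skip nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (lastname VARCHAR(50), firstname VARCHAR(50), age INT)")
# Four rows in which every column is NULL, as in the script above.
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [(None, None, None)] * 4)

expressions = ["*", "1", "0", "2", "'Dracula'", "lastname"]
counts = {
    expr: conn.execute(f"SELECT COUNT({expr}) FROM person").fetchone()[0]
    for expr in expressions
}
print(counts)
# {'*': 4, '1': 4, '0': 4, '2': 4, "'Dracula'": 4, 'lastname': 0}
```

Every non-null constant behaves like the asterisk; only the nullable column yields 0.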

The asterisk inside of COUNT is just a directive for the RDBMS to fetch the cardinality of the set, not to evaluate all columns of the row.

And SELECT COUNT(1) is a one-trick pony; it's only pleasant and possible to use on a single-table query.

SELECT COUNT(1) FROM person

You cannot mold COUNT(1) for LEFT JOINed tables.

It won't work in other scenarios (e.g. counting all bids of each person, regardless of whether he/she has bids):

SELECT person.person_id, COUNT(bid.1) -- ugh! won't work, syntax error
FROM person
LEFT JOIN bid ON bid.person_id = person.person_id
GROUP BY person.person_id

This is the proper way to do it:

SELECT person.person_id, COUNT(bid.person_id) -- works on all RDBMS(Sql Server, Postgresql, Mysql, Access, etc)
FROM person
LEFT JOIN bid ON bid.person_id = person.person_id
GROUP BY person.person_id

This is also a proper way to do it, on Postgresql:

-- works on Postgresql only(afaik). More intuitive, you don't count the column, you count the cardinality of the set
SELECT person.person_id, COUNT(bid.*) 
FROM person
LEFT JOIN bid ON bid.person_id = person.person_id
GROUP BY person.person_id

/* COUNT(1) is a mistake on queries like the one below:
it will always return at least 1 bid for each person, even if that person doesn't have any bids.
That is not because the parameter is 1; even if you change COUNT(1) to COUNT(2), it won't double the row count.

You will never get a count of 0 with COUNT(1), and even COUNT(0) will not result in a count of 0, when you are using LEFT JOIN.
*/
SELECT person.person_id, COUNT(1) 
FROM person
LEFT JOIN bid ON bid.person_id = person.person_id
GROUP BY person.person_id
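A quick way to see the difference is to run both variants side by side. The sketch below uses an in-memory SQLite database through Python's sqlite3 module; the engine, the amount column and the sample rows are assumptions for illustration only. Person 1 has two bids, person 2 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE bid (person_id INTEGER, amount INTEGER);  -- amount is hypothetical
    INSERT INTO person VALUES (1);
    INSERT INTO person VALUES (2);
    INSERT INTO bid VALUES (1, 100);
    INSERT INTO bid VALUES (1, 150);
""")


def bid_counts(count_expr):
    """Run the LEFT JOIN query with the given expression inside COUNT()."""
    sql = f"""
        SELECT person.person_id, COUNT({count_expr})
        FROM person
        LEFT JOIN bid ON bid.person_id = person.person_id
        GROUP BY person.person_id
        ORDER BY person.person_id
    """
    return conn.execute(sql).fetchall()


with_constant = bid_counts("1")            # counts the joined row even when bid is all NULL
with_column = bid_counts("bid.person_id")  # skips the NULL produced by the LEFT JOIN

print(with_constant)  # [(1, 2), (2, 1)]
print(with_column)    # [(1, 2), (2, 0)]
```

Person 2 gets a count of 1 with the constant because the LEFT JOIN still produces one row for that person; counting a column from bid is what lets the null-extended row drop out.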

So if COUNT(1) is just a one-trick pony, why are others still using it in their queries? Cargo cult programming, perhaps?

COUNT(*) is too common a programming idiom to be misinterpreted by the database vendors' engineers. To gain confidence that COUNT(*) is not slow compared to COUNT(1), one must visualize that the asterisk inside COUNT(*) indicates the cardinality of the set; it doesn't pertain to columns, full stop. The asterisk in COUNT(*) has no bearing on the asterisk in SELECT *; they just share a common token, i.e. the asterisk. Database vendors' engineers are smarter than you and I will ever be; they won't dumbly implement the asterisk inside COUNT to read all columns. The asterisk inside COUNT indicates set cardinality.

Well, it's also just unfortunate that this didn't make it into ANSI SQL:

SELECT COUNT() FROM person;

That would surely make debating 1 vs. asterisk inside the function a moot point. The recommended way to hint to the RDBMS that you want the cardinality of the set is to use the asterisk rather than 1; the parameterless COUNT was not able to make its way into the ANSI SQL language constructs, though.
