Sunday, May 15, 2011

What is MVCC? If you don't know, your boss will fire you and hire someone with nanosecond-perfect timing

First of all, a disclaimer: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED (revoke someone's DBA license if he/she consider it for a money-transacting app) is not even remotely considered in this discussion. We are talking serious business here, we are talking money :p

Simply put, MVCC can solve many of the database timeout woes. The writers never block the readers. The old school Sql Server application developers and even the new ones(those who are not aware that Sql Server 2005 onwards already has MVCC capability) argues that you should not be able to read a row if some code is making a transaction on that particular row. Don't heed that suggestion, especially Sql Server 2005 onwards already has MVCC, don't let yourself lose your job.


Now here's a contrive yet amusing scenario, let's say the transaction(e.g. long running batch job) takes a whole day, yet you still love your flavor of RDBMS, warts and all, as it is very performant on aggregating informations and delivering the results under 1 second. Would you let your boss down and not let him know today's sales just because some code obtained a lock on 1 row only for the entire day?


I know, I know, that bottleneck is very absurd, one day, but guess what, the pain threshold on waiting for the database to return results varies with everyone, a 30 second waiting might be tolerable to you, but a 10 second waiting might irk your boss, especially everyone now is accustomed to speed of thought web(e.g. google, online stock exchange, e-commerce, etc).


If you don't have consistent user experience when it comes to returning information to your users or web audience, as your database experiences recurring timeouts; it might turn off your users and you will lose them to your competitors.


An example, on a fairly busy website, many are repeatedly editing information(think Facebook users who often change their information, motto, slogan, status message, etc), SELECTing all rows (say, through aggregations, e.g. SUM, COUNT) won't have a fighting chance to be able to perform what you told it to do, provide reports. As in-between milliseconds, your database is experiencing UPDATEs.



MVCC can prevent that problem, as it provide row readers the last snapshot of row(s) prior someone is busy transacting on those row(s). Note: the actual implementation of MVCC is, the writers has separate copy of rows they are updating/deleting/inserting, those copy are independent from active rows, and those are not effected to database until the transaction is committed.


If someone suggested SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to make your code able to read rows, tell them it's not a good practice; as your code can potentially read partially finished transaction, think fund transfers. There's 5 bucks on Account A, and 10 bucks on Account B. If you deduct 2 bucks from Account A and increase the Account B with 2 bucks, but however, in-between that two steps, someone perform checking the total money of bank(i.e. SELECT SUM(Amount) FROM BankAccount), your code can read 13 only instead of 15.


With MVCC turned on on Sql Server, your program would be able to read a grand total of 15 no matter how slow your transaction processing is, no matter what place in code your transaction is currently processing at. Even if the other code deleted(think customers who pull-out their account) those two rows inside a transaction, your code will still be able to read 15. Why it should be that way? At that point-in-time (during transaction) say 1:15 PM, your boss is asking how much money you have in bank, and that point-in-time 1:15 PM, your bank still has money, and the transaction(pulling-out accounts) is not yet committed(code-wise and real-world-wise), your system is just honest, you still have money at that point-in-time and reports 15 bucks, and on the report printout it has a footer that indicates Printed on: May 15, 2011 1:15 PM. Then 10 minutes later(say you are unfortunate that your RDBMS of choice performs very slowly) the customer's pull-outs of his two accounts is committed to the database, then that's the only time you shall return money to your customer, and that's the only time(May 15, 2011 1:25 PM) your system can honestly report that your bank don't have money.


You should not let your system be rendered unusable if it is pressed for information, it should still be able to perform reporting values even your business operation has very heavy transactions. Give yourself a fighting chance to be able to keep your job. You: "Boss, I cannot print the report you requested, our business is doing extremely well, we have transactions in between milliseconds and perhaps microseconds, it's hard for me to make a perfect timing around the system so then I can print the report you requested" Boss: "Ok then, you are fired! I'll just hire someone who has nanosecond-perfect timing"


Here's how to turn on MVCC on your database:

ALTER DATABASE <database name>
   SET READ_COMMITTED_SNAPSHOT ON;


So what is MVCC? If you don't know it, learn it so you can concentrate on business logic rather than resolving technical problems such as timeouts. If you are just a humble user, demand it from programmers so you can keep your job.


Oracle already has MVCC since version 3 (1983). Postgresql(my favorite RDBMS) already has it since version 6.5 (1999).


Sql Server is a late-bloomer, be easy on it, don't bash it too much for causing you so much griefs on incessant timeouts. Anyway, on Sql Server 2005 and up, turning MVCC on is just an ALTER command away :-)


UPDATE


This need to be set too:

ALTER DATABASE <database name> SET ALLOW_SNAPSHOT_ISOLATION ON;

Details here:

http://stackoverflow.com/questions/1483725/select-for-update-with-sql-server/1529377#1529377

No comments:

Post a Comment