Friday, June 17, 2011

Materialized view in Sql Server

If you have a high volume database, and you want to aggregate things, materializing/pre-computing the result is the way things should be done. Fortunately for Sql Server users, we don't have to roll our own views materialization, this is a built-in functionality in Sql Server, without such, we will need to put triggers on each involved tables and increment/decrement the summary column of the summary table.

Let's create the tables and populate them:

create table Country
(
CountryId int identity(1,1) not null primary key,
CountryCode varchar(4) not null unique,
CountryName varchar(30) not null unique
);



insert into Country(CountryCode,CountryName)
values('PH','Philippines');

insert into Country(CountryCode,CountryName)
values('CN','China');

insert into Country(CountryCode,CountryName)
values('JP','Japan');


create table City
(
CityId int identity(1,1) not null primary key,
CityCode varchar(3) not null unique,
CityName varchar(30) not null unique,
CountryId int not null references Country(CountryId)
);




insert into City(CityCode, CityName, CountryId)
values('MKT','Makati',1);

insert into City(CityCode, CityName, CountryId)
values('MNL','Manila',1);


insert into City(CityCode, CityName, CountryId)
values('BEI','Beijing',2);

insert into City(CityCode, CityName, CountryId)
values('SHA','Shanghai',2);


insert into City(CityCode, CityName, CountryId)
values('TKY','Tokyo',3);

insert into City(CityCode, CityName, CountryId)
values('KYT','Kyoto',3);



create table Sales
(
OrNo int identity(1,1) not null primary key,
OrYear int not null,
CityId int not null references City(CityId),
OrAmount numeric(18,6) not null
);



declare @i int;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,1,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,2,2);
 set @i = @i + 1;
end;


set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,1,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,2,2);
 set @i = @i + 1;
end;



set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,3,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,4,2);
 set @i = @i + 1;
end;



set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,3,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,4,2);
 set @i = @i + 1;
end;


set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,5,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2010,6,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,5,2);
 set @i = @i + 1;
end;

set @i = 0;
while @i < 500000 begin
 insert into Sales(OrYear,CityId,OrAmount) values(2011,6,2);
 set @i = @i + 1;
end;
Then let's create a view for the summary:
create view SalesOnEachCountry 
as 
select x.CountryId, x.CountryName, s.OrYear, 
COUNT_BIG(*) AS SalesCountPerCountryPerYear, sum(s.OrAmount) as TotalSales
from dbo.Sales s
join dbo.City c on c.CityId = s.CityId
join dbo.Country x on c.CountryId = x.CountryId
group by x.CountryId, s.OrYear
        ,x.CountryName -- ancillary field
Running this...
select * from SalesOnEachCountry order by CountryName, OrYear
, has a result of...
CountryId   CountryName                    OrYear      SalesCountPerCountryPerYear TotalSales
----------- ------------------------------ ----------- --------------------------- ---------------------------------------
2           China                          2010        1000000                     2000000.000000
2           China                          2011        1000000                     2000000.000000
3           Japan                          2010        1000000                     2000000.000000
3           Japan                          2011        1000000                     2000000.000000
1           Philippines                    2010        1000000                     2000000.000000
1           Philippines                    2011        1000000                     2000000.000000

(6 row(s) affected)
...which took 4 seconds. My benchmarking is muddled up a bit, as my development machine has SSD(with read speed of 250 MB/s), 7200 rpm drive has 35 MB/s read speed only. Anyway, for the sake of discussion, we are interested with how fast can materialized view(indexed view in Sql Server parlance) optimize aggregations. Now that 4 seconds, could really hurt the web response time, it's time to rectify that. We shall materialize that 6 rows view. It's just a two step process, first we must schema-bind the views:
alter view SalesOnEachCountry with SchemaBinding
as 
select x.CountryId, x.CountryName, s.OrYear, 
COUNT_BIG(*) AS SalesCountPerCountryPerYear, sum(s.OrAmount) as TotalSales
from dbo.Sales s
join dbo.City c on c.CityId = s.CityId
join dbo.Country x on c.CountryId = x.CountryId
group by x.CountryId, s.OrYear
        ,x.CountryName -- ancillary field
Second, we put index on those fact columns(i.e. CountryId, OrYear):
CREATE UNIQUE CLUSTERED INDEX ux_Country on SalesOnEachCountry(CountryId, OrYear);
Then try querying the view again:
select * from SalesOnEachCountry order by CountryName, OrYear;
Surprise! That query took 0 second(I don't know how to switch the display to millisecond) :-) And even if you try to insert/update/delete a row in Sales table, Sql Server will not re-aggregate the 3 million and 1 rows, instead it will just increase/decrease the count and summary column. Likewise, updating the OrAmount won't cause re-aggregation on rows; instead it will efficiently increment/decrement the summary column based on the delta of updated OrAmount. Your query will not visit the million rows to produce information for 6 rows when using materialized views indexed views. It will just get the 6 rows.

Ancillary field(s)

No comments:

Post a Comment