Wednesday, August 1, 2012

Use components that uses yield return

When to use yield return?

You were given a task to shop for a component you cannot make by yourself, say fibonacci number generators.

You found two vendors. Both of their components produces correct output, both of them can generate up to fifty fibonaccis, and both of them are priced the same.

All things being equal, you are left with one qualifier, a good qualifier, you want to choose the component with a better quality. But for some reasons, these two vendors don't want to give you their source codes, hence you cannot see how they implement things. They just want to give you the DLL only.

But those vendors still have some kindness left in their hearts, they at least give you a clue how their components work.

Vendor A told you they uses eager loading(i.e. uses IList)
Vendor B told you they uses lazy loading(i.e. uses yield return)

In case like this, you need to purchase vendor B's component.

Vendor A's approach uses too much memory, they put elements to an allocated memory. Vendor B generates elements on-the-fly.


Desirability of vendor B's approach is more apparent if for example you just want the 8th fibonacci.

Using vendor A's component, your code still need to wait a long time in order to get the 8th element. Why it is so? Even you just want the 8th element, your code still need for vendor A's component to generate the whole 50 elements, after doing so, then that's the only time you can pick the 8th element.


Using vendor B's component, if you just want to get the 8th element, it will stop generating on the 8th element. Your code don't need vendor B's component to generate all the 50 elements in order for your code to get the 8th element. Vendor B's component has some smart on it.


To contrast the difference between IList and yield return, check the output of this code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
 
namespace ChooseLesserOfTheTwoEvils
{
    class Program
    {
        static void Main(string[] args)
        {
            TestLoopOnEvilA();
            Console.WriteLine("------");
            TestLoopOnEvilB();
            Console.WriteLine("------");
 
            Console.ReadKey();
             
            TestPick8thOnEvilA();
            Console.WriteLine("------");
            TestPick8thOnEvilB();
            Console.WriteLine("------");
 
            Console.ReadKey();
        }
 
        static void TestPick8thOnEvilA()
        {
            Console.WriteLine("TestPick8thOnEvilA");
            long eighth = EvilCompanyA.MathProvider.Fibonacci().Skip(7).Take(1).Single();
            Console.WriteLine("\nEvil A's 8th fibonacci is {0}", eighth);
        }
 
        static void TestPick8thOnEvilB()
        {
            Console.WriteLine("TestPick8thOnEvilB");
            long eighth = EvilCompanyB.MathProvider.Fibonacci().Skip(7).Take(1).Single();
            Console.WriteLine("\nEvil B's 8th fibonacci is {0}", eighth);
        }
 
        static void TestLoopOnEvilA()
        {
            Console.WriteLine("Test Loop On Evil A");
            IEnumerable<long> bucket = EvilCompanyA.MathProvider.Fibonacci();
            Console.WriteLine("\nTest start");
 
            foreach (var item in bucket)
            {
                Console.WriteLine("Evil A's Fib: {0}", item);
            }
            Console.ReadLine();
        }
 
         
        static void TestLoopOnEvilB()
        {
            Console.WriteLine("Test Loop On Evil B");
            IEnumerable<long> bucket = EvilCompanyB.MathProvider.Fibonacci();
            Console.WriteLine("\nTest start");
 
            foreach (var item in bucket)
            {               
                Console.WriteLine("Evil B's Fib: {0}", item);
            }
            Console.ReadLine();
        }
    }
 
     
}
 
 
namespace EvilCompanyA
{
    public static class MathProvider
    {
        public static IEnumerable<long> Fibonacci()
        {           
            IList<long> il = new List<long>();
 
            long a = 0, b = 1;
 
            for (int i = 1; i <= 50; ++i)
            {
                Console.Write("eager{0} ",i);
                il.Add(a);
                long n = a;
                a += b;
                b = n;
            }
                   
 
             
            return il;
        }
    }
}
 
namespace EvilCompanyB
{
    public static class MathProvider
    {
        public static IEnumerable<long> Fibonacci()
        {
            long a = 0, b = 1;
 
            for (int i = 1; i <= 50; ++i)
            {
                Console.Write("lazy{0} ",i);
                yield return a;
                long n = a;
                a += b;
                b = n;
            }
        }
    }
}

Output:

Test Loop On Evil A
eager1 eager2 eager3 eager4 eager5 eager6 eager7 eager8 eager9 eager10 eager11 eager12 eager13 eager14 eager15 eager16 eager17 eager18 eager19 eager20 eager21 eager22 eager23 eager24 eager25 eager26 eager27 eager28 eager29 eager30 eager31 eager32 eager33 eager34 eager35 eager36 eager37 eager38 eager39 eager40 eager41 eager42 eager43 eager44 eager45 eager46 eager47 eager48 eager49 eager50 
Test start
Evil A's Fib: 0
Evil A's Fib: 1
Evil A's Fib: 1
Evil A's Fib: 2
Evil A's Fib: 3
Evil A's Fib: 5
Evil A's Fib: 8
Evil A's Fib: 13
Evil A's Fib: 21
Evil A's Fib: 34
Evil A's Fib: 55
Evil A's Fib: 89
Evil A's Fib: 144
Evil A's Fib: 233
Evil A's Fib: 377
Evil A's Fib: 610
Evil A's Fib: 987
Evil A's Fib: 1597
Evil A's Fib: 2584
Evil A's Fib: 4181
Evil A's Fib: 6765
Evil A's Fib: 10946
Evil A's Fib: 17711
Evil A's Fib: 28657
Evil A's Fib: 46368
Evil A's Fib: 75025
Evil A's Fib: 121393
Evil A's Fib: 196418
Evil A's Fib: 317811
Evil A's Fib: 514229
Evil A's Fib: 832040
Evil A's Fib: 1346269
Evil A's Fib: 2178309
Evil A's Fib: 3524578
Evil A's Fib: 5702887
Evil A's Fib: 9227465
Evil A's Fib: 14930352
Evil A's Fib: 24157817
Evil A's Fib: 39088169
Evil A's Fib: 63245986
Evil A's Fib: 102334155
Evil A's Fib: 165580141
Evil A's Fib: 267914296
Evil A's Fib: 433494437
Evil A's Fib: 701408733
Evil A's Fib: 1134903170
Evil A's Fib: 1836311903
Evil A's Fib: 2971215073
Evil A's Fib: 4807526976
Evil A's Fib: 7778742049
------
Test Loop On Evil B

Test start
lazy1 Evil B's Fib: 0
lazy2 Evil B's Fib: 1
lazy3 Evil B's Fib: 1
lazy4 Evil B's Fib: 2
lazy5 Evil B's Fib: 3
lazy6 Evil B's Fib: 5
lazy7 Evil B's Fib: 8
lazy8 Evil B's Fib: 13
lazy9 Evil B's Fib: 21
lazy10 Evil B's Fib: 34
lazy11 Evil B's Fib: 55
lazy12 Evil B's Fib: 89
lazy13 Evil B's Fib: 144
lazy14 Evil B's Fib: 233
lazy15 Evil B's Fib: 377
lazy16 Evil B's Fib: 610
lazy17 Evil B's Fib: 987
lazy18 Evil B's Fib: 1597
lazy19 Evil B's Fib: 2584
lazy20 Evil B's Fib: 4181
lazy21 Evil B's Fib: 6765
lazy22 Evil B's Fib: 10946
lazy23 Evil B's Fib: 17711
lazy24 Evil B's Fib: 28657
lazy25 Evil B's Fib: 46368
lazy26 Evil B's Fib: 75025
lazy27 Evil B's Fib: 121393
lazy28 Evil B's Fib: 196418
lazy29 Evil B's Fib: 317811
lazy30 Evil B's Fib: 514229
lazy31 Evil B's Fib: 832040
lazy32 Evil B's Fib: 1346269
lazy33 Evil B's Fib: 2178309
lazy34 Evil B's Fib: 3524578
lazy35 Evil B's Fib: 5702887
lazy36 Evil B's Fib: 9227465
lazy37 Evil B's Fib: 14930352
lazy38 Evil B's Fib: 24157817
lazy39 Evil B's Fib: 39088169
lazy40 Evil B's Fib: 63245986
lazy41 Evil B's Fib: 102334155
lazy42 Evil B's Fib: 165580141
lazy43 Evil B's Fib: 267914296
lazy44 Evil B's Fib: 433494437
lazy45 Evil B's Fib: 701408733
lazy46 Evil B's Fib: 1134903170
lazy47 Evil B's Fib: 1836311903
lazy48 Evil B's Fib: 2971215073
lazy49 Evil B's Fib: 4807526976
lazy50 Evil B's Fib: 7778742049
------
TestPick8thOnEvilA
eager1 eager2 eager3 eager4 eager5 eager6 eager7 eager8 eager9 eager10 eager11 eager12 eager13 eager14 eager15 eager16 eager17 eager18 eager19 eager20 eager21 eager22 eager23 eager24 eager25 eager26 eager27 eager28 eager29 eager30 eager31 eager32 eager33 eager34 eager35 eager36 eager37 eager38 eager39 eager40 eager41 eager42 eager43 eager44 eager45 eager46 eager47 eager48 eager49 eager50 
Evil A's 8th fibonacci is 13
------
TestPick8thOnEvilB
lazy1 lazy2 lazy3 lazy4 lazy5 lazy6 lazy7 lazy8 
Evil B's 8th fibonacci is 13
------



Live Code: http://ideone.com/QxU7Sn

Sunday, July 29, 2012

From raw parameters to structured data

To avoid stringly-typed processing, convert string-based parameters to structured data. Then do further processing on that internal data structure, where the information can be easily processed.


public class Node
{
    public Node ChildOf { get; set; }
    public string Path { get; set; }

    public IList<Node> Children { get; set; }

}



public static class Helper
{

    public static Node ArrayToTree(params string[] paths)
    {
        Node root = new Node
        {
            ChildOf = null,
            Path = null,
            Children = new List<Node>()
        };

        

        foreach (string path in paths)
        {
            string[] subPaths = path.Split('.');

            Node addTo = root;

            foreach (string subPath in subPaths)
            {
                Node child = addTo.Children.FirstOrDefault(x => x.Path == subPath);

                if (child == null)
                {
                    child = new Node
                    {
                        ChildOf = addTo,
                        Path = subPath,
                        Children = new List<Node>()
                    };

                    addTo.Children.Add(child);
                }//if

                addTo = child;
            }//foreach

        }//foreach


        return root;
    }
}


Unit test:


public void TestStructuredData()
{
    Node n = Helper.ArrayToTree(
        new[]
        { 
            "Answers", 
            "Answers.Comments", 
            "Answers.Greats", 
            "Answers.Balls.Of.Fire", 
            "Comments" 
        
        });


    Assert.AreEqual(2, n.Children.Count);
    Assert.AreEqual(3, n.Children[0].Children.Count);


    Assert.AreEqual("Answers", n.Children[0].Path);

    Assert.AreEqual("Comments", n.Children[0].Children[0].Path);
    Assert.AreEqual("Greats", n.Children[0].Children[1].Path);
    Assert.AreEqual("Balls", n.Children[0].Children[2].Path);
    Assert.AreEqual("Of", n.Children[0].Children[2].Children[0].Path);
    Assert.AreEqual("Fire", n.Children[0].Children[2].Children[0].Children[0].Path);

    Assert.AreEqual("Comments", n.Children[1].Path);
}


Avoid stringly-typed programming, use proper data structure.

http://www.google.com/search?q=stringly-typed+programming+-strongly

Sunday, July 15, 2012

CASE WHEN ELSE is faster than OR approach

Common denominator feature. When ideals get trumped by reality.


I encountered an SQL query where I thought it would be ideal to make it ORM-friendly, so when time comes you need to port it to an ORM query, porting shall be easier. OR so I thought.


I saw this question on http://stackoverflow.com/questions/10646018/can-you-use-case-to-specify-a-join-to-use-in-mysql:

I am wondering if it is possible to use some sort of case in a mysql query to effectively achieve the following and reduce the amount of data that has to be looked up each time ..

SELECT um.message, UNIX_TIMESTAMP(um.time), um.read, user.username
FROM usermessages um
CASE
WHEN um.friendId = 1
INNER JOIN user ON um.sourceUserId = user.id
WHEN um.sourceUserId = 1
INNER JOIN user ON um.friendId = user.id
END
WHERE (um.friendId = 1 OR um.sourceUserId = 1)



I offered this solution: http://stackoverflow.com/questions/10646018/can-you-use-case-to-specify-a-join-to-use-in-mysql/10646233#10646233

SELECT um.message, UNIX_TIMESTAMP(um.time), um.read, user.username
FROM usermessages um
INNER JOIN user ON 
    (um.friendId = 1 AND um.sourceUserId = user.id)
    OR
    um.friendId = user.id
WHERE (um.friendId = 1 OR um.sourceUserId = 1);


Then another stackoverfellow offer this solution: http://stackoverflow.com/questions/10646018/can-you-use-case-to-specify-a-join-to-use-in-mysql/10646078#10646078


SELECT um.message, UNIX_TIMESTAMP(um.time), um.read, user.username
FROM usermessages um
INNER JOIN user ON 
    CASE 
        WHEN um.friendId = 1 THEN um.sourceUserId
                             ELSE um.friendId
    END = user.id
WHERE (um.friendId = 1 OR um.sourceUserId = 1)



I disliked his solution initially, as I embrace ORM more and more, I'm of the opinion that we should write portable code, and what's more portable than writing the query with constructs that exists on both SQL and ORM? I disliked the CASE WHEN approach to the problem as it don't have any analogous construct on ORM, Linq in particular.

With that in mind, I tried to dissuade the user from using the CASE WHEN approach. But we need to wise-up though, portability is not a panacea that can magically make our applications have more customers. How can a customer be happy if bad application performance spoils the fun on using your application. With this in mind, I tried to profile both queries to see which one is faster. I'm expecting the OR to be slower(RDBMS don't do short-circuit, it uses cost-based analysis on determining how it should perform the query) than CASE WHEN, but I'm not expecting very big difference in performance, then much to my chagrin when I see a staggering difference in performance, the CASE WHEN approach can finish the job in 88 milliseconds, while the OR approach took 4.7 seconds. The performance of OR is not acceptable




CASE WHEN approach's Query Cost is 3% only, while the OR approach is 97%. So there goes our portability ideals, it's not ideal. We need to tap the strengths of a given tool in order to get our job done in an efficient manner. I will not be dead set anymore to make a given code construct be very compatible on another platform or API.



Expand to see the code used for benchmarking: http://www.sqlfiddle.com/#!6/29531/2
create table usermessages(
  message varchar(30),
  friendId int,
  sourceUserId int
);

create table "user"(
  userId int identity(1,1) primary key,
  userName varchar(30)
);


create index ix_usermessages__friendid 
on usermessages(friendid);

create index ix_usermessages__combined
on usermessages(friendid,sourceUserId);


create index ix_usermessages__sourceuserid
on usermessages(sourceuserid);



insert into usermessages
select 'hello',row_number() over(order by x.message_id) % 50,1 from sys.messages x;


insert into "user"(userName)
select 'hello' from sys.messages;


select count(*)
from usermessages um
join "user" u

on
(case when um.friendId = 1 then um.sourceUserId else um.friendId end) = u.userId;


select count(*)
from usermessages um
join "user" u

on 
(um.friendId = 1 and um.sourceUserId = u.userId)
or 
(um.friendId = u.userId);

Wednesday, July 11, 2012

Monitoring perfect attendance. Consecutive calendar date query

In this article, I will show you how to get the longest attendance streak a given student had achieved. This will also take the weekends and holidays into consideration.





Given a student has this attendance:

July 3 Tuesday
July 4 Holiday -- Independence Day
July 5 Thursday
July 6 Friday
July 9 Monday
July 12 Thursday
July 13 Friday


The query should show the student had achieved 4 consecutive attendance on these dates:
July 3 Tuesday
July 5 Thursday
July 6 Friday
July 9 Monday


July 4 is a holiday and should not affect or penalize the student's attendance performance, the query should show the student has good four consecutive attendance from July 3 to July 9; likewise, the weekends(Saturday and Sunday) should not have an effect on attendance performance.


Student gets an attendance performance of 11 from July 18 to August 1 when she attend classes on following dates:

July 18 Wednesday
July 19 Thursday
July 20 Friday
July 23 Monday
July 24 Tuesday
July 25 Wednesday
July 26 Thursday
July 27 Friday
July 30 Monday
July 31 Tuesday
August 1 Wednesday
August 3 Friday


How to determine if the given dates are consecutive? This is a solved problem, and it has a terminology to it, it's called islands and gaps algorithm.

The core thing with islands and gaps problem is we must have a mechanism to group consecutive rows so in the final query we can count them. One mechanism is to use row_number windowing function, though row numbering approach will fail if there's duplicate rows, row_number would suffice if the rows are unique, e.g. attendance.

So given this numbers, we have 5 consecutive numbers(1 to 5) and another 3 consecutive numbers(8 to 10):
1
2
3
4
5
8
9
10
12


With row_number we can determine the consecutive-ness of those numbers
N    R#
1    1
2    2   
3    3 
4    4
5    5
8    6
9    7
10   8
12   9


Subtract row number from N, this will give us related rows
N    R#   GRP
1    1    0
2    2    0
3    3    0
4    4    0
5    5    0
8    6    2
9    7    2
10   8    2
12   9    3


Given the GRP column, we can do this query to count the longest attendance streak a given student has:


with a as
(
  select n, n -  row_number() over(order by n) as grp
  from tbl
)
select 
  *
  ,dense_rank() over(order by grp)  as nth_streak
  ,count(*) over(partition by grp)  as streak
from a;

|  N | GRP | NTH_STREAK | STREAK |
----------------------------------
|  1 |   0 |          1 |      5 |
|  2 |   0 |          1 |      5 |
|  3 |   0 |          1 |      5 |
|  4 |   0 |          1 |      5 |
|  5 |   0 |          1 |      5 |
|  8 |   2 |          2 |      3 |
|  9 |   2 |          2 |      3 |
| 10 |   2 |          2 |      3 |
| 12 |   3 |          3 |      1 |    


Now, for the final hurdle to tackle, if 5 is Friday, 6 and 7 are Saturday and Sunday respectively. So on the above data, the student should have an 8 consecutive attendance instead of separate 5 and 3. To do that, we need to shift all the date if it is over 7.


Here's the formula for that:

n - ((n/7) * 2) - row_number()over(order by n) as grp


To illustrate, use this query:

with a as
(
  select n, n - ((n/7) * 2) as shift_up, row_number() over(order by n) as rn
  from tbl
)
select 
  *
  ,n - rn as grp
from a;

Here's the output:

|  N | SHIFT_UP | RN | GRP |
----------------------------
|  1 |        1 |  1 |   0 |
|  2 |        2 |  2 |   0 |
|  3 |        3 |  3 |   0 |
|  4 |        4 |  4 |   0 |
|  5 |        5 |  5 |   0 |
|  8 |        6 |  6 |   2 |
|  9 |        7 |  7 |   2 |
| 10 |        8 |  8 |   2 |
| 12 |       10 |  9 |   3 |    

Notice that the value Monday's value(8) shifted to 6, likewise with Tuesday(9, this shifted to 7) and so on.

With this logic at hand, we can now easily determine the consecutive-ness of adjacent rows. We just need to subtract row_number from SHIFT_UP in order to make the current Monday consecutive to last week's Friday. This is the query:

with a as
(
  select n, n - ((n/7) * 2) - row_number()over(order by n) as grp
  from tbl
)
select 
  *
  ,dense_rank() over(order by grp)  as nth_streak
  ,count(*) over(partition by grp)  as streak
from a;


This is the output:
|  N | GRP | NTH_STREAK | STREAK |
----------------------------------
|  1 |   0 |          1 |      8 |
|  2 |   0 |          1 |      8 |
|  3 |   0 |          1 |      8 |
|  4 |   0 |          1 |      8 |
|  5 |   0 |          1 |      8 |
|  8 |   0 |          1 |      8 |
|  9 |   0 |          1 |      8 |
| 10 |   0 |          1 |      8 |
| 12 |   1 |          2 |      1 |    


It now yields the date from last week's Monday to Friday(1 to 5) and this week's Monday to Friday(8 to 10) as consecutive 8 days(indicated by GRP=0). Hence yielding a 5+3 consecutive attendance, this yields 8 consecutive attendance.


Another way to visualize how the date are being shifted(to determine date consecutive-ness) two days back:

Original data:
                  Weekends are not in attendance(e.g. 6,7,13,14)
 1  2  3  4  5     6  7
 8  9 10 11 12    13 14
15

If a given date falls on 15th, this will be computed as 15/7 * 2 == 4; then subtract 4 from the original number, 15 - 4 == 11. 15 will become the 11th day. Likewise the 8th day becomes the 6th day; 8 - (8/7 * 2) == 6. The above data yields the following output when the computation is applied:

 1  2  3  4  5    
 6  7  8  9 10    
11

The above attendance yields 11 consecutive good attendance. For holidays, you need to slot them on attendance, so the consecutive-ness of dates could be easily determined, then just remove them from the final query.


To apply the above logic on determining consecutive attendance even there are holidays and weekends, use this:


drop table tx;
drop table holiday;

create table tx
(
i int identity(1,1) not null primary key,
n varchar(10), d date,
constraint ux_tx unique(n,d)
);

insert into tx(n,d) values
('john','2012-7-3'),
('john','2012-7-5'),
('john','2012-7-6'),
('john','2012-7-9'),
('john','2012-7-12'),
('john','2012-7-13'),
('john','2012-7-16'),
('john','2012-7-17'),
('john','2012-7-18'),
('john','2012-7-20'),
('john','2012-7-30'),
('john','2012-7-31'),

('paul','2012-7-3'),
('paul','2012-7-5'),
('paul','2012-7-18'),
('paul','2012-7-19'),
('paul','2012-7-20'),
('paul','2012-7-23'),
('paul','2012-7-24'),
('paul','2012-7-25'),
('paul','2012-7-26'),
('paul','2012-7-27'),
('paul','2012-7-30'),
('paul','2012-7-31'),
('paul','2012-8-1'),
('paul','2012-8-3'),
('paul','2012-8-6'),
('paul','2012-8-7');

create table holiday(d date);

insert into holiday(d) values
('2012-7-4');



--- query

with first_date as
(
 select dateadd( ww, datediff(ww,0,min(d)), 0 ) as first_date -- get the monday of the earliest date
 from tx 
)
,shifted as
(
 select 
  tx.n, tx.d, 
    
  diff = datediff(day, fd.first_date, tx.d) - (datediff(day, fd.first_date, tx.d)/7 * 2) 
  
 from tx
 cross join first_date fd
 union
 select 
  xxx.n, h.d, 
  
  diff = datediff(day, fd.first_date, h.d) - (datediff(day, fd.first_date, h.d)/7 * 2) 
 from holiday h 
 cross join first_date fd
 cross join (select distinct n from tx) as xxx
)
,grouped as
(
 select 
  *
  , grp = diff - row_number() over(partition by n order by d)
 from shifted
)
select 
    
    -- remove staging columns from the output...
    -- * 
    
    -- ...just output what the user will see:
    d, n
    
 ,dense_rank() over (partition by n order by grp) as nth_streak
 ,count(*) over (partition by n, grp) as streak
from grouped
where d not in (select d from holiday)  -- remove the holidays

Output:
|          D |    N | NTH_STREAK | STREAK |
-------------------------------------------
| 2012-07-03 | john |          1 |      4 |
| 2012-07-05 | john |          1 |      4 |
| 2012-07-06 | john |          1 |      4 |
| 2012-07-09 | john |          1 |      4 |
| 2012-07-12 | john |          2 |      5 |
| 2012-07-13 | john |          2 |      5 |
| 2012-07-16 | john |          2 |      5 |
| 2012-07-17 | john |          2 |      5 |
| 2012-07-18 | john |          2 |      5 |
| 2012-07-20 | john |          3 |      1 |
| 2012-07-30 | john |          4 |      2 |
| 2012-07-31 | john |          4 |      2 |
| 2012-07-03 | paul |          1 |      2 |
| 2012-07-05 | paul |          1 |      2 |
| 2012-07-18 | paul |          2 |     11 |
| 2012-07-19 | paul |          2 |     11 |
| 2012-07-20 | paul |          2 |     11 |
| 2012-07-23 | paul |          2 |     11 |
| 2012-07-24 | paul |          2 |     11 |
| 2012-07-25 | paul |          2 |     11 |
| 2012-07-26 | paul |          2 |     11 |
| 2012-07-27 | paul |          2 |     11 |
| 2012-07-30 | paul |          2 |     11 |
| 2012-07-31 | paul |          2 |     11 |
| 2012-08-01 | paul |          2 |     11 |
| 2012-08-03 | paul |          3 |      3 |
| 2012-08-06 | paul |          3 |      3 |
| 2012-08-07 | paul |          3 |      3 |    

Live test: http://www.sqlfiddle.com/#!3/709a0/1

DATEFIRST settings side-effect on DATEPART

Upon reading this comment:

DATEPART(DW is dependant upon @@datefirst – Martin Smith Aug 24 '11 at 0:03


I tried this code:


set nocount on;

declare @d date = '2012-07-4';
 
set language italian;
-- SET DATEFIRST 1; -- Italy is Monday-first week. this is equivalent to above
select @@DATEFIRST as df;
select DATEPART(DW, @d) as x;
SELECT DATEADD(ww, DATEDIFF(ww,0,@d), 0) as y;
SELECT DATEADD(DD, 1 - DATEPART(DW, @d), @d) as z;
 
set language us_english;
-- SET DATEFIRST 7; -- US is Sunday-first week. this is equivalent to above
select @@DATEFIRST as df;
select DATEPART(DW, @d) as x;
SELECT DATEADD(ww, DATEDIFF(ww,0,@d), 0) as y;
SELECT DATEADD(DD, 1 - DATEPART(DW, @d), @d) as z;


Output
L'impostazione della lingua è stata sostituita con Italiano.
df
----
1

x
-----------
3

y
-----------------------
2012-07-02 00:00:00.000

z
----------
2012-07-02

Changed language setting to us_english.
df
----
7

x
-----------
4

y
-----------------------
2012-07-02 00:00:00.000

z
----------
2012-07-01

There's a discrepancy(the last queries, z column) when using datepart on obtaining first day of week. On countries(e.g. USA) that start their calendar week with Sunday, the first day of week for July 4, 2012 is July 1, 2012(Sunday). For countries that start their calendar week with Monday, the first day of week for July 4, 2012 is July 2, 2012(Monday).


To avoid this discrepancy, instead of using date functions(e.g. DATEPART) that has a dependency on @@DATEFIRST settings, use date functions that doesn't rely on @@DATEFIRST settings e.g. DATEDIFF