CUDA SDK convolutionSeparable example fails correctness test

I just downloaded the CUDA SDK 4.0 examples. After running the convolutionSeparable example, I found that it fails the correctness test: the relative L2 norm is on the order of 1E-4. Can someone comment on this? I am running Windows 7.

I’m also using 4.0 on Windows 7; mine passed with an L2 norm of 0.

I tried the same code on the other machines in the lab. The L2 norms are 0 there as well, so I think the failure is specific to my machine and a rare case.
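
For anyone comparing numbers, here is a minimal sketch of how a relative L2 norm check of this kind is usually computed on the host. It is not copied from the SDK source: the names relativeL2Norm and passesCheck are hypothetical, and the tolerance is left as a parameter because the exact threshold the sample uses is an assumption on my part. A result of exactly 0 means the GPU output matches the CPU reference element for element.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not taken from the SDK source): relative L2 norm of
// the difference between a CPU reference and a GPU result,
//   sqrt( sum((ref - out)^2) / sum(ref^2) ).
double relativeL2Norm(const std::vector<float>& reference,
                      const std::vector<float>& result)
{
    assert(reference.size() == result.size());

    double diffSq = 0.0;  // accumulated squared difference
    double refSq  = 0.0;  // accumulated squared reference values

    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double r = static_cast<double>(reference[i]);
        const double d = r - static_cast<double>(result[i]);
        diffSq += d * d;
        refSq  += r * r;
    }

    // Guard against dividing by zero when the reference is all zeros.
    if (refSq == 0.0) {
        return diffSq == 0.0 ? 0.0
                             : std::numeric_limits<double>::infinity();
    }
    return std::sqrt(diffSq / refSq);
}

// Hypothetical pass/fail wrapper; the tolerance is chosen by the caller.
bool passesCheck(const std::vector<float>& reference,
                 const std::vector<float>& result,
                 double tolerance)
{
    return relativeL2Norm(reference, result) <= tolerance;
}
```

With a check like this, identical buffers give a norm of 0, while a result that is off by about one part in ten thousand gives a norm near 1E-4 and fails any tolerance tighter than that.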