"cudaError at memory location" when using CUDA Toolkit 4.0

Hi,
I have the simplest possible .cu file: it does only one cudaMalloc() and nothing else.
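For reference, here is a minimal sketch of such a file with the return codes checked, so the runtime's own description of the failure gets printed instead of only the debugger message. The function name tryOneMalloc and the 1024-float allocation size are illustrative assumptions, not taken from the original file.

```cuda
// minimal_malloc.cu: one cudaMalloc (plus the matching cudaFree),
// with every runtime return code checked and printed.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocates `bytes` of device memory and frees it again.
// Returns the first error reported by the runtime (cudaSuccess if none).
cudaError_t tryOneMalloc(size_t bytes)
{
    void* devPtr = NULL;
    cudaError_t err = cudaMalloc(&devPtr, bytes);
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    err = cudaFree(devPtr);
    if (err != cudaSuccess)
        std::printf("cudaFree failed: %s\n", cudaGetErrorString(err));
    return err;
}

int main()
{
    cudaError_t err = tryOneMalloc(1024 * sizeof(float));
    if (err == cudaSuccess)
        std::printf("cudaMalloc/cudaFree OK\n");
    return err == cudaSuccess ? 0 : 1;
}
```

The message printed by cudaGetErrorString is usually more specific than "cudaError at memory location", which only says that the runtime raised an error internally.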

If I compile it with CUDA Runtime 3.2 it works,
but if I switch to CUDA Runtime 4.0 it throws
two errors: “cudaError at memory location”.

Do you have any idea why it does that?

I read that this error can be thrown when device memory is used in a host function (which is not the case here),
or when we try to run the program on a device that doesn’t have the required capabilities.

I have a GF 460, so it should support compute capability 2.1. (Of course I also tried building for sm_10, and it still throws the error.)
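To confirm which compute capability the runtime actually reports for the card, a short sketch like the following could be used. It only relies on cudaGetDeviceCount and cudaGetDeviceProperties; the helper name listDevices is a hypothetical choice for illustration.

```cuda
// list_devices.cu: print the name and compute capability of each visible device.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of devices the runtime reports,
// or 0 if cudaGetDeviceCount itself fails.
int listDevices()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;  // skip devices whose properties cannot be read
        // A GeForce GTX 460 is expected to report 2.1 here.
        std::printf("device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
    }
    return count;
}

int main()
{
    return listDevices() > 0 ? 0 : 1;
}
```

If this prints no devices at all, the problem is likely not the target architecture (sm_10 vs. sm_21), because the runtime cannot see the card in the first place.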

I also noticed that after calling
cudaGetDeviceCount(&devCount);
devCount is 0.
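Since devCount comes back as 0, it may help to print the error code returned by cudaGetDeviceCount together with the driver and runtime versions. One possible cause, assumed here rather than confirmed by the post, is an installed display driver that is older than what the 4.0 runtime requires; the runtime reports that case as cudaErrorInsufficientDriver. The helper names printVersion and diagnose are hypothetical.

```cuda
// diagnose.cu: compare driver and runtime versions and show why no device is found.
#include <cstdio>
#include <cuda_runtime.h>

// Versions are encoded as 1000*major + 10*minor, e.g. 4000 for CUDA 4.0.
static void printVersion(const char* label, int v)
{
    std::printf("%s: %d.%d\n", label, v / 1000, (v % 1000) / 10);
}

// Hypothetical helper: prints diagnostics and returns the error from
// cudaGetDeviceCount; *devCount receives the device count (0 on failure).
cudaError_t diagnose(int* devCount)
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // stays 0 if no CUDA driver is installed
    cudaRuntimeGetVersion(&runtimeVersion);  // version of the cudart being linked
    printVersion("driver supports CUDA", driverVersion);
    printVersion("runtime (cudart) version", runtimeVersion);

    *devCount = 0;
    cudaError_t err = cudaGetDeviceCount(devCount);
    std::printf("cudaGetDeviceCount: %s, devCount = %d\n",
                cudaGetErrorString(err), *devCount);
    if (err == cudaErrorInsufficientDriver)
        std::printf("The installed driver is older than this runtime requires.\n");
    return err;
}

int main()
{
    int devCount = 0;
    return diagnose(&devCount) == cudaSuccess ? 0 : 1;
}
```

If the driver version printed is lower than the runtime version, updating the display driver would be the first thing to try before changing any code.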

I have Win7 x64 and installed the x64 SDK.