Has anyone tried double-precision computation?

According to the NVIDIA OpenCL Programming Guide, section 3.1.1.1: “When
compiling for devices without native double-precision floating-point support, such
as devices of compute capability 1.2 and lower, each double variable gets
converted to single-precision floating-point format (but retains its size of 64 bits)
and double-precision floating-point arithmetic gets demoted to single-precision
floating-point arithmetic.”
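
For context, one way to check whether a device actually reports native double support is to look for the cl_khr_fp64 extension in its extension string. This is only a minimal host-side sketch; the helper name `device_has_fp64` and the fixed buffer size are my own illustrative choices, and error handling is omitted:

[codebox]
/* Host-side check: does the device advertise cl_khr_fp64?
   Assumes `device` was already obtained via clGetDeviceIDs. */
#include <CL/cl.h>
#include <string.h>

int device_has_fp64(cl_device_id device)
{
    char extensions[4096] = {0};
    /* Query the space-separated extension string for this device. */
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                    sizeof(extensions), extensions, NULL);
    return strstr(extensions, "cl_khr_fp64") != NULL;
}
[/codebox]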

I tried a kernel that basically assigns the input number to the output number.
It gives incorrect results when the numbers are double precision. I am using a
GTX 295, which is actually capable of double-precision arithmetic.
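
Roughly, the kernel looks like this (a minimal sketch; the identifier names are illustrative, not my exact code):

[codebox]
/* Copies each input element straight to the output buffer. */
__kernel void copy_double(__global const double *in,
                          __global double *out)
{
    size_t i = get_global_id(0);
    out[i] = in[i];
}
[/codebox]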