I have a kernel that works perfectly in single precision on multiple cards (GTX-280 and Tesla C1060). If I switch to double precision, literally by replacing ‘float’ with ‘double’, the kernel runs but returns all zeros. This happens on multiple cards and under both Linux and Windows (64-bit in each case). I’m using CUDA 2.1 and compiling on the GTX-280. I added ‘-arch sm_13’ to the nvcc command line, but it appears to make no difference to the final output.
Has anyone else seen something similar? Is there something extra that needs to be done to get double precision to work? The kernel is simple and I can post the code if that would be helpful.
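
For anyone trying to reproduce this, a minimal double-precision sanity check along the following lines may help separate a launch failure from a precision problem. This is only a sketch and not the kernel described above: the kernel name `scale`, the helper `check`, and the array size are hypothetical, and it assumes the file is built with the same ‘-arch sm_13’ flag. An all-zero result often just means the kernel never ran, so the sketch checks the error status after the launch and after synchronization.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: multiplies each element by 2 in double precision.
__global__ void scale(double *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0;
}

// Hypothetical helper: prints the CUDA error string and returns 1 on failure.
static int check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    const int n = 256;
    double h[n];
    for (int i = 0; i < n; ++i) h[i] = i + 0.5;

    double *d = 0;
    if (check(cudaMalloc((void **)&d, n * sizeof(double)), "cudaMalloc")) return 1;
    if (check(cudaMemcpy(d, h, n * sizeof(double), cudaMemcpyHostToDevice), "copy in")) return 1;

    scale<<<(n + 127) / 128, 128>>>(d, n);

    // A failed launch is reported here, not by the launch statement itself.
    if (check(cudaGetLastError(), "launch")) return 1;
    // On toolkits older than CUDA 4.0 (including 2.1), use cudaThreadSynchronize().
    if (check(cudaDeviceSynchronize(), "execution")) return 1;

    if (check(cudaMemcpy(h, d, n * sizeof(double), cudaMemcpyDeviceToHost), "copy out")) return 1;
    cudaFree(d);

    // If the kernel ran in double precision, these are 1.0 and 511.0.
    printf("h[0] = %f, h[%d] = %f\n", h[0], n - 1, h[n - 1]);
    return 0;
}
```

If this prints an error at the "launch" or "execution" step, the zeros are likely coming from a kernel that failed to run rather than from the arithmetic itself. It is also worth reading the nvcc output for any warning about double being demoted to float, which would suggest the architecture flag is not reaching the compile step.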