I have written a program that, as far as I can see, does not use double precision anywhere on the device side. However, it is fairly complex. Most CUDA devices are much faster using single precision. When compiling for compute capability 1.0 (sm_10), I get
ptxas somefile.ptx, line xyz Double is not supported. Demoting to float
I was not able to find this double. I have to get rid of it, because it forces me to use sm_10, which does not provide all CUDA features; compiling for sm_13 instead causes a performance drop of about 50% on my device. How can I find this annoying double?
In the PTX code, it appears near a function pointer. Here is a sketch:
ld.param.u32 %r12, [__cudaparm__kernelname…;
ld.global.f64 %fd1, [%r13+32]
So there is some array that is passed in with the function call, but I have no idea what exactly %fd1 is. All floating-point data used in the kernel is stored in arrays of structs containing only floats, and all register variables are float. Why is there a double? It doesn't make sense to me.
Is there a way to use the PTX code to identify the problem? Is it a coding problem at all, or merely a compiler problem? I tried commenting out parts of the code, which resulted in inconsistent behaviour with respect to the double problem above: sometimes it appeared, sometimes not. I am grateful for any hints to identify the problem.
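
One possible way to search the PTX mechanically is sketched below in Python. It lists every line that carries an `.f64` type suffix, together with the most recent `.loc` directive seen before it. The sketch assumes the PTX file contains `.loc <file> <line> <column>` directives (whether they are emitted depends on the nvcc version and flags), and the function name `find_f64` and the `SAMPLE` text are made up for illustration; they are not taken from the real kernel.

```python
import re

# Matches any use of the 64-bit floating-point type suffix in PTX.
F64_RE = re.compile(r"\.f64\b")
# Matches a `.loc <file index> <source line> <column>` directive.
LOC_RE = re.compile(r"^\s*\.loc\s+(\d+)\s+(\d+)")


def find_f64(ptx_text):
    """Return a list of (ptx_line_no, last_loc, instruction) tuples.

    last_loc is (file index, source line) from the closest preceding
    .loc directive, or None if no .loc has been seen yet.
    """
    hits = []
    last_loc = None
    for ptx_line_no, line in enumerate(ptx_text.splitlines(), start=1):
        loc = LOC_RE.match(line)
        if loc:
            last_loc = (int(loc.group(1)), int(loc.group(2)))
            continue
        code = line.split("//", 1)[0].strip()  # drop trailing PTX comments
        if code.startswith(".reg"):
            continue  # register declarations are not the interesting part
        if F64_RE.search(code):
            hits.append((ptx_line_no, last_loc, code))
    return hits


# Hypothetical PTX fragment, invented for this example only.
SAMPLE = """\
    .reg .f64 %fd<3>;
    .loc 14 120 0
    ld.param.u32 %r12, [some_param];
    .loc 14 123 0
    ld.global.f64 %fd1, [%r13+32];
    cvt.rn.f32.f64 %f1, %fd1;   // narrowing back to float
"""

for ptx_line_no, last_loc, code in find_f64(SAMPLE):
    print(ptx_line_no, last_loc, code)
```

To run it on real output, the text of a PTX file (for example one produced with `nvcc -ptx`) can be read with `open(path).read()` and passed to `find_f64`. The file index in each `.loc` refers to the `.file` directives in the same PTX file, so the reported source line can be looked up in the matching source or header. This only locates where the 64-bit operations come from; it does not by itself say whether the cause is the code or the compiler.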