Double precision numbers, emulation, and compute capability < 1.3


I had a very hard time tracking down a bug in my code that occurred when I was using an array of doubles instead of floats. The code worked perfectly in emulation mode but not on the device. After a lot of experimenting, I realized that using -arch=sm_13 makes it work. However, that was different from my understanding of how earlier compute capabilities handled doubles.

Essentially, what I’m doing is:

__global__ void kernel(double* data)
{
    float d = float(data[someindex]);
    ...
}



The programming guide says that for devices with cc<1.3, double precision arithmetic is demoted to single precision. But that should be harmless here: there is no double precision arithmetic apart from the conversion to float. Yet when compiled without sm_13, d would contain garbage.

I have two questions:

  1. What exactly is the behavior for devices with cc<1.3? My example above seems to be the simplest possible use of doubles. I don’t want to do anything with them except convert them to floats (because the host data is in doubles), yet that seems to just yield garbage.

  2. Is the failure of emulation mode to work correctly here a bug? It doesn’t seem that I’m doing anything illegal.



Consider the consequences of demotion to single precision. sizeof(double) on the device would then equal sizeof(float), so when you pass a double* from the CPU to a kernel compiled for < sm_13, the kernel reads four bytes per element instead of eight. Worse, indexing element N reads from byte offset 4N instead of 8N, i.e. from the middle of element N/2 of the original array. So it doesn’t work at all.

The only time demotion to single precision gives you what you expect is when you declare a double locally in your kernel: it will be demoted to float at compile time, the size differences will be taken into account, and everything’s fine.

Also, code that works in emulation does not necessarily work on the device. You could be passing host pointers, for example: emulation won’t care, but the device will throw an exception.

Ok, I see. I think it would be good if the programming guide were a little clearer on this point and made a stronger statement that it’s actually impossible to get doubles onto the device in the first place. (I can’t find it now, but I seem to recall reading something about how doubles still occupy 8 bytes even though only single precision arithmetic is used, but that’s obviously incorrect.)

But it’s perfectly possible: cudaMemcpy takes a pair of pointers (of type void*, I believe) and a number of bytes to copy between them. It’s just that, once on the device, the poor little multiprocessors don’t know what to do with the bit patterns :)

What might be better is if the compiler issued a warning when a kernel with double (or double*) arguments was compiled for sm<1.3. And on a similar point, can we have cudaNew, cudaDelete and cudaCopy, which would help minimise these confusions too… perhaps I should put that in the request thread.

Welcome to CUDA 2.3! (yes we added that)

Hmm, I get no warnings when omitting the sm_13 when compiling my kernels that take double*'s…