Double precision numbers, emulation, and compute capability < 1.3


I had a very hard time tracking down a bug in my code that occurred when I was using an array of doubles instead of floats. The code worked perfectly in emulation mode but not on the device. After a lot of experimenting, I realized that using -arch=sm_13 makes it work. However, that was different from my understanding of how earlier compute capabilities handled doubles.

Essentially, what I’m doing is:

__global__ void kernel(double* data)
{
    float d = float(data[someindex]);
    ...
}



The programming guide says that for devices with cc<1.3, double precision arithmetic is demoted to single precision. But that should be harmless here: there is no double precision arithmetic apart from the conversion to float. Yet when compiled without sm_13, d would contain garbage.

I have two questions:

  1. What exactly is the behavior for devices with cc<1.3? My example above seems to be the simplest possible use of doubles. I don’t want to do anything with them except convert them to floats (because the host data is in doubles), yet that seems to just yield garbage.

  2. Is the failure of emulation mode to work correctly here a bug? It doesn’t seem that I’m doing anything illegal.



Consider the consequences of demotion to single precision. sizeof(double) on the device would then equal sizeof(float), so when you pass a double* from the CPU to a kernel compiled for < sm_13, the kernel reads four bytes per element instead of eight. Worse, indexing element N reads from byte offset 4N instead of 8N, i.e. from the middle of element N/2 of the original array. So it doesn’t work at all.

The only time demotion to single precision gives you what you expect is when you declare a double locally in your kernel: it will be demoted to float at compile time, the size differences will be taken into account, and everything’s fine.

Also, code that works in emulation does not necessarily work on the device. You could be passing host pointers, for example: emulation won’t care, but the device will throw an exception.

Ok, I see. I think it would be good if the programming guide were a little clearer on this point and made a stronger statement that it’s actually impossible to get doubles onto the device in the first place. (I can’t find it now, but I seem to recall reading something about how doubles still occupy 8 bytes even though only single precision arithmetic is used, but that’s obviously incorrect.)

But it’s perfectly possible: cudaMemcpy takes a pair of pointers (of type void*, I believe) and a number of bytes to copy between them. It’s just that, once on the device, the poor little multiprocessors don’t know what to do with the bit patterns :)

What might be better is if the compiler issued a warning when a kernel with double (or double*) arguments was compiled for sm<1.3. And on a similar point, can we have cudaNew, cudaDelete and cudaCopy, which would help minimise these confusions too… perhaps I should put that in the request thread.

Welcome to CUDA 2.3! (yes we added that)

Hmm, I get no warnings when omitting the sm_13 when compiling my kernels that take double*'s…