double precision bug?

I have code that works fine in single precision mode (emulation, device, and device-debug) and in double precision mode (emulation and device-debug, but not on the device). I am using CUDA 2.3 on FC9 with the 190.18 driver; could this be a compiler/CUDA issue? What concerns me is that it works with the device debug flags “-g -G” but not in normal mode (-O0 or -O3). Could this be a bug, a driver issue, or some interaction with FC9? Usually when something like this happens it is an uninitialized value, but since the same code works perfectly in single precision, that doesn’t seem to be the case here.
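
For reference, the build configurations are roughly these (source and output names simplified):

    # Emulation build (works):
    nvcc -deviceemu -arch=sm_13 kernel.cu -o app_emu
    # Device debug build (works):
    nvcc -g -G -arch=sm_13 kernel.cu -o app_dbg
    # Device release build (garbage results in double precision):
    nvcc -O3 -arch=sm_13 kernel.cu -o app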

-Nachiket

My crystal ball is a bit foggy today. Could you be a little more precise about what appears to be going wrong?

Are you compiling the double precision version for the compute 1.3 architecture?

Yes.
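
For what it’s worth, a quick sanity check along these lines (a generic sketch, not my actual code) confirms the card reports 1.3:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // Native double precision requires compute capability 1.3 or higher;
        // below that, nvcc demotes double to float.
        printf("compute capability: %d.%d\n", prop.major, prop.minor);
        return 0;
    }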

OK, so care to elaborate a bit about what the problem is? “Doesn’t work” is pretty general.

I’ll try, but since I can’t post the code here, it’s hard to be precise.

My kernel performs some operations and comes back with a vector. At the end of the first call to the kernel (on the device, no debug), that array is filled with huge numbers (think 1e41 to 1e231). Later on, these numbers become NaNs. The kernel does not crash on the GPU; it merely produces incorrect results, and only in double precision mode.
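
Roughly, this is the host-side check that catches it (d_result and n are simplified placeholders, not my real code):

    #include <cstdio>
    #include <cuda_runtime.h>

    void check_result(const double *d_result, int n) {
        double *h = new double[n];
        cudaMemcpy(h, d_result, n * sizeof(double), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i) {
            // x != x is true only for NaN; also flag absurd magnitudes.
            if (h[i] != h[i] || h[i] > 1e30 || h[i] < -1e30)
                printf("bad value at %d: %g\n", i, h[i]);
        }
        delete[] h;
    }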

-Nachiket

Well, as a general rule of thumb, any time you start seeing 1e41 appearing in your calculations, you know that you’ve screwed up badly. You say this runs in single precision OK. Is that with the exact same input data? Have you checked against another card? To get much more help out of this forum, you’ll have to give some code which shows the bug. If it’s a big kernel, try reducing it to a minimal size which reproduces the bug.
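
Something like this generic skeleton (obviously not your code) is usually a good starting point; build it once with -g -G and once in release, run both on identical input, and diff the outputs:

    // Minimal repro skeleton: one kernel, fixed input.
    __global__ void repro(const double *in, double *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i] * in[i] + 1.0;  // swap in the suspect expression here
    }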

Yes, exactly. It runs on the exact same input data in single precision mode, and in double precision (device debug and emulation). It runs correctly on the device in debug, and that is the strange thing. I’ll try reducing it.

-Nachiket

Looks like that old bug again…

[topic=“103726”]different output when compiled for emulation, device, and device with -g -G[/topic]

[topic=“105700”]Changing from float to double generate wrong result[/topic]

Could you try with CUDA 3.0 beta so that we know whether this bug is finally fixed?

What are your register and shared-memory usage counts? You might be overshooting there, making the kernel fail to launch at all! … just a wild guess…
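
If that were it, checking the launch status right after the kernel call would show it; something along these lines (kernel, grid, block, and the arguments are placeholders):

    // Detect a failed launch (e.g. too many registers or too much
    // shared memory for the chosen configuration).
    kernel<<<grid, block>>>(/* args */);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));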

Sylvain,

Thanks a lot for this! It was this bug. The workaround -Xopencc -O0 worked. This was a frustrating bug, and I wonder how I missed that old post. I’ll try with CUDA 3.0 beta and post here to let you know.
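
For anyone else who hits this, the build line with the workaround is roughly (file names mine):

    # Workaround: disable optimization in the Open64 front end (nvopencc),
    # which is what miscompiles the double precision path.
    nvcc -arch=sm_13 -Xopencc -O0 kernel.cu -o app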

-Nachiket

This bug still seems to exist in CUDA 3.1, and the -Xopencc -O0 workaround doesn’t seem to work there either. Tested on a Tesla. Which sucks. The workaround still works on a GTX 280 with CUDA 2.3.

-Nachiket
