I have code that works fine in single precision (emulation, device, and device-debug) and in double precision under emulation and device-debug, but not on the device itself. I am using CUDA 2.3 on FC9 with the 190.18 driver. Could this be a compiler or CUDA issue? What concerns me is that it works with the device debug flags “-g -G” but not in a normal build with -O0 or -O3. Could this be a bug, a driver issue, or some interaction with FC9? Usually when things like this occur the cause is an uninitialized value, but since the code works perfectly in single precision that doesn’t seem to be the case here.
I’ll try, but since I can’t post the code here it’s hard to be precise.
My kernel performs some operations and returns a vector. At the end of the first call to the kernel (on the device, no debug) that array is filled with very large numbers (think 10E41 to 10E231). Later on, these numbers become NaNs. The kernel does not crash on the GPU; it merely produces incorrect results, and only in DP mode.
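One way to catch this kind of corruption right after the first kernel call, before the values decay into NaNs, is to scan the result vector on the host once it has been copied back. The following is only a sketch: the names `VectorHealth` and `check_vector` and the magnitude limit are hypothetical and do not come from the original code, which was never posted.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of suspicious values found in a result vector.
struct VectorHealth {
    std::size_t non_finite = 0;  // entries that are NaN or +/-Inf
    std::size_t too_large = 0;   // finite entries with magnitude above the limit
};

// Scan a host-side copy of the kernel output.
// `limit` is an assumed, problem-specific bound on plausible magnitudes.
VectorHealth check_vector(const std::vector<double>& v, double limit) {
    VectorHealth h;
    for (double x : v) {
        if (!std::isfinite(x)) {
            ++h.non_finite;
        } else if (std::fabs(x) > limit) {
            ++h.too_large;
        }
    }
    return h;
}
```

Calling this after every kernel launch (on a copy brought back with cudaMemcpy) should show which launch first produces the huge values, which narrows down where to start cutting the kernel.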
Well, as a general rule of thumb, any time you start seeing 1e41 appearing in your calculations, you know that you’ve screwed up badly. You say this runs in single precision OK. Is that with the exact same input data? Have you checked against another card? To get much more help out of this forum, you’ll have to give some code which shows the bug. If it’s a big kernel, try reducing it to a minimal size which reproduces the bug.
Yes, exactly. It runs on exactly the same input data in single precision mode and in double precision (device-debug and emulation). It runs correctly on the device in debug mode, and that is the strange thing. I’ll try reducing it.
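While reducing the kernel, it may help to compare the device output against the emulation (or device-debug) output element by element, so each cut can be checked for whether the bug is still present. This is a minimal sketch under assumptions: `first_mismatch` and its tolerance are hypothetical, and both vectors are assumed to have been copied to the host already.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Return the index of the first element where `device` disagrees with
// `reference` beyond a relative tolerance, or -1 if the vectors agree.
// A NaN on either side counts as a mismatch, as does a length difference.
std::ptrdiff_t first_mismatch(const std::vector<double>& device,
                              const std::vector<double>& reference,
                              double rel_tol) {
    const std::size_t n = std::min(device.size(), reference.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double scale = std::max(1.0, std::fabs(reference[i]));
        const double diff = std::fabs(device[i] - reference[i]);
        // Written as !(diff <= ...) so that a NaN difference is reported.
        if (!(diff <= rel_tol * scale)) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    if (device.size() != reference.size()) {
        return static_cast<std::ptrdiff_t>(n);
    }
    return -1;
}
```

A result of -1 after a reduction step suggests the removed code was involved in the failure; any other index points at the first corrupted element.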
Thanks a lot for this! It was this bug. The workaround -Xopencc -O0 worked. This was a frustrating bug, and I wonder how I missed that old post. I’ll try with the CUDA 3.0 beta and post here to let you know.
This bug still seems to exist in CUDA 3.1, and the -Xopencc -O0 workaround doesn’t seem to work either. Tested on Tesla. Which sucks. The workaround still seems to work on a GTX 280 with CUDA 2.3.