Hello again,
While debugging my CUDA project I have run into a rather disturbing problem: simple C-style calculations in my kernel code give completely wrong results when I step through them with cuda-gdb.
Let me give an example:
float _phi1 = _proj0[n*2] * (_globalMaxs_d[0] - _globalMins_d[0]) + 90.f;
_globalMaxs_d and _globalMins_d are constant parameters and correctly initialized, as checked in the debugger: _globalMaxs_d[0] = 180.f, _globalMins_d[0] = 0.f.
_proj0[n*2] is correctly set to 0.5f. The result should clearly be 180, but it is actually 90!
I re-checked this 10 times; it is the same for every thread, of course, and reproducible across many runs.
Changing this line of code to
float _phi1 = __fadd_rn(_proj0[n*2]*(_globalMaxs_d[0] - _globalMins_d[0]), 90.f);
cures the problem. Problems like these appear all over the execution of this particular kernel, and so far all of them could be "cured" by using the __f**** floating-point intrinsics, even for the most simple statements. There are also many spots where even a "cure" like the above does not work unless I remove all nested operations; in the example above this means writing explicitly
float _phi1 = __fadd_rn(_globalMaxs_d[0], -_globalMins_d[0]);
_phi1 = __fmul_rn(_phi1, _proj0[n*2]);
_phi1 = __fadd_rn(_phi1, 90.f);
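To make this easier to reproduce, a minimal self-contained kernel along these lines (hypothetical names and launch configuration; the constant arrays stand in for the ones in my project) would compute the value both ways side by side:

```cuda
__constant__ float globalMins_d[1];
__constant__ float globalMaxs_d[1];

// Repro sketch: the same expression once as plain C arithmetic and
// once decomposed into the round-to-nearest intrinsics, so the two
// results can be compared per thread.
__global__ void repro(const float *proj0, float *out_plain, float *out_rn)
{
    int n = blockIdx.x * blockDim.x + threadIdx.x;

    out_plain[n] = proj0[n * 2] * (globalMaxs_d[0] - globalMins_d[0]) + 90.f;

    float phi1 = __fadd_rn(globalMaxs_d[0], -globalMins_d[0]);
    phi1 = __fmul_rn(phi1, proj0[n * 2]);
    phi1 = __fadd_rn(phi1, 90.f);
    out_rn[n] = phi1;
}
```

In my kernel, out_plain shows the wrong value (90) in cuda-gdb, while out_rn shows the expected 180.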
There is definitely something amiss here. Could this have something to do with directly accessing the constant parameters? The only other idea I have is that my card (GTX 285) is currently the primary display device (I stop the X server for debugging…).
I don’t know if it is of any importance for the problem, but let me describe the layout of the project: the kernel is wrapped inside a C++ interface class that accesses C wrapper functions, which in turn call the CUDA functions. The design is very similar to the particles example in the official SDK. I am running Ubuntu 8.10 and using CMake for compilation.
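For illustration, the layering looks roughly like this (hypothetical names, heavily abridged):

```cuda
// kernel.cu -- CUDA kernel plus extern "C" wrapper, compiled by nvcc
__global__ void projectKernel(const float *proj0, float *phi)
{
    /* ... kernel body as in the snippets above ... */
}

extern "C" void launchProjectKernel(const float *proj0, float *phi, int n)
{
    // The C wrapper hides the <<<...>>> launch syntax from the host compiler.
    projectKernel<<<(n + 255) / 256, 256>>>(proj0, phi);
}

// interface.cpp -- C++ interface class, compiled by the host compiler
extern "C" void launchProjectKernel(const float *proj0, float *phi, int n);

class Projector {
public:
    void run(const float *proj0, float *phi, int n)
    {
        launchProjectKernel(proj0, phi, n);
    }
};
```

So the C++ side never sees any CUDA syntax; everything device-related goes through the extern "C" layer, just as in the particles sample.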
Thanks in advance for any insights!