Basic calculations fail in kernel code

Hello again,

While debugging my CUDA project I have run into a rather disturbing problem. Simple C-style calculations in my kernel code give completely wrong results when I step through them with cuda-gdb.

Let me give an example:

float _phi1 = _proj0[n*2] * (_globalMaxs_d[0] - _globalMins_d[0]) +  90.f;

_globalMaxs_d and _globalMins_d are constant parameters and correctly initialized, as checked in the debugger: _globalMaxs_d[0] = 180.f, _globalMins_d[0] = 0.f.

_proj0[n*2] is correctly set to 0.5f. The result should clearly be 0.5 * (180 - 0) + 90 = 180, but it is actually 90!
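As a sanity check on the expected value, the same expression can be evaluated on the host. This is only a minimal sketch in plain C++, not the kernel itself: the function names phi1_nested and phi1_stepwise are hypothetical, and the inputs are the values quoted above. Every intermediate value (180, 90, 180) is exactly representable in single precision, so neither rounding nor the order of evaluation should be able to turn 180 into 90.

```cpp
// Host-side reference for the kernel statement (hypothetical helper names).

// Nested form, written the same way as the original kernel line.
constexpr float phi1_nested(float proj, float gmax, float gmin) {
    return proj * (gmax - gmin) + 90.f;
}

// Stepwise form: one operation per statement.
constexpr float phi1_stepwise(float proj, float gmax, float gmin) {
    float phi1 = gmax - gmin;  // 180 - 0  = 180
    phi1 = phi1 * proj;        // 180 * 0.5 = 90
    phi1 = phi1 + 90.f;        // 90 + 90  = 180
    return phi1;
}
```

With proj = 0.5f, gmax = 180.f and gmin = 0.f, both forms evaluate to 180.f on the host.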

I re-checked this 10 times; it is the same for every thread, of course, and reproducible over many runs.

Changing this line of code to

float _phi1 = __fadd_rn(_proj0[n*2]*(_globalMaxs_d[0] - _globalMins_d[0]), 90.f);

cures the problem. Problems like this appear throughout this particular kernel, and so far all of them could be “cured” by using the __f* floating-point intrinsics (__fadd_rn, __fmul_rn), even for the simplest statements. There are also many spots where even a “cure” like the one above does not work unless I remove all nested calls. In the example above this would mean writing explicitly

float _phi1 = __fadd_rn(_globalMaxs_d[0], - _globalMins_d[0]);

_phi1 = __fmul_rn(_phi1, _proj0[n*2]);

_phi1 = __fadd_rn(_phi1, 90.f);

There is definitely something amiss here. Could this have something to do with directly accessing the constant parameters? The only other idea I have is that the problem is related to my card (GTX 285) being the primary display device at the moment (I stop the X server for debugging…).

I don’t know whether it matters for the problem, but let me describe the layout of the project: the kernel is wrapped inside a C++ interface class that accesses C wrapper functions, which in turn call the CUDA functions. The design is very similar to the particles example in the official SDK. I am running Ubuntu 8.10 and I am using CMake for compilation.

Thanks in advance for any insights!

That’s mighty odd. Where does _proj0 lie?

I’ve just tried

__constant__ float _proj0[1] = { 0.5f }; 

__constant__ float _globalMins_d[1] = { 0.f };

__constant__ float _globalMaxs_d[1] = { 180.f };

__global__ void wut(float p[])

{

	float _phi1 = _proj0[0] * (_globalMaxs_d[0] - _globalMins_d[0]) +  90.f;

	p[0] = _phi1;

}

The result is, obviously, 180.0.

Going to need source, preferably a small repro case.
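For a repro case, one option that does not depend on the debugger is to copy the kernel output back to the host and compare it with a CPU reference. Here is a minimal sketch of such a comparison in plain C++. The helper first_mismatch is hypothetical, and the copy from the device (for example with cudaMemcpy) is assumed to have happened already.

```cpp
#include <cstddef>

// Hypothetical helper: returns the index of the first element where `got`
// differs from `want` by more than `tol`, or -1 if all n elements agree.
// A NaN in either array counts as a mismatch.
constexpr long first_mismatch(const float* got, const float* want,
                              std::size_t n, float tol) {
    for (std::size_t i = 0; i < n; ++i) {
        float d = got[i] - want[i];
        if (d < 0.f) d = -d;                            // absolute difference
        if (!(d <= tol)) return static_cast<long>(i);   // also true when d is NaN
    }
    return -1;
}
```

If the copied-back values agree with the reference while cuda-gdb still shows 90, the discrepancy may lie in what the debugger displays rather than in the arithmetic itself; if they disagree, the index returned here gives a concrete starting point for narrowing down the repro.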

“Small” will be a problem. Are there any ideas about what could generally cause such behaviour? Maybe some memory corruption, something I am messing up completely without noticing?

I can try to isolate something :)

If I knew, I wouldn’t be asking for a repro case :)

I’ll try to deliver something over the weekend.

If there is enough motivation amongst you I could also supply the whole code :)