I will describe my problem without posting my code, because it is too long and I have not been able to locate the bug.
I have a Tesla C1060 on a Fedora 10 machine, with CUDA 2.1. I cannot use cuda-gdb because I get an error caused by a bug known to the CUDA developers, so to debug my code I store variables in global memory and read them back.
In my kernel code I call a device function that calls another device function, and that second function returns an integer which is not correct. HOWEVER, if I store the internal variables of the function in global memory (for debugging purposes), it returns the correct integer value!!
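To make the pattern concrete, here is a minimal sketch of what I mean. This is not my actual code: the names inner, outer, kernel and dbg are made up for illustration, and the arithmetic is only a placeholder for the real computation. Passing a null pointer for dbg corresponds to the version without the debug store.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the inner device function.
// When dbg is non-null, an intermediate value is also written to
// global memory so it can be inspected on the host.
__device__ int inner(int x, int *dbg)
{
    int tmp = x * 2 + 1;      // placeholder intermediate value
    if (dbg)
        dbg[0] = tmp;         // debug store to global memory
    return tmp - 1;
}

// Hypothetical stand-in for the outer device function.
__device__ int outer(int x, int *dbg)
{
    return inner(x, dbg) + 3;
}

__global__ void kernel(int *result, int *dbg)
{
    // A single thread is enough to show the pattern.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        result[0] = outer(5, dbg);
}

int main()
{
    int *d_result = 0, *d_dbg = 0;
    int h_result = 0, h_dbg = 0;

    cudaMalloc((void **)&d_result, sizeof(int));
    cudaMalloc((void **)&d_dbg, sizeof(int));
    cudaMemset(d_result, 0, sizeof(int));
    cudaMemset(d_dbg, 0, sizeof(int));

    // Pass d_dbg to enable the debug store, or a null pointer to disable it.
    kernel<<<1, 1>>>(d_result, d_dbg);

    // Check that the launch itself did not fail.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // These blocking copies also wait for the kernel to finish.
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_dbg, d_dbg, sizeof(int), cudaMemcpyDeviceToHost);

    printf("result = %d, debug intermediate = %d\n", h_result, h_dbg);

    cudaFree(d_result);
    cudaFree(d_dbg);
    return 0;
}
```

In my real code the equivalent of the dbg store is the only difference between the version that returns the wrong integer and the version that returns the right one.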
In emulation mode the function returns the correct value regardless of whether I store the intermediate variables in global memory.
Has anybody experienced similar problems? What other information would you like me to give you to understand the problem better?
Thanks!