I compiled my code in two modes: emulation and release. I have two problems:
- The emulated code runs fine, but the program built in normal release mode fails to run: cutilCheckMsg() CUTIL CUDA error: … unspecified launch failure. I suspect there is a bug in my code, although I have checked it and it does run to completion in emulation mode.
- The results from the emulation run vary between kernel calls. In my code I simply call the kernel several times in a row; the structure looks like this:
…
kernel <<< grid, threads >>>( … );
kernel <<< grid, threads >>>( … );
kernel <<< grid, threads >>>( … );
…
After the kernel executes, I copy the device memory to host memory, but the results from the individual kernel calls are not consistent with each other.
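To make the pattern concrete, here is a minimal sketch of repeated launches with an error check after every step. It is not my real code: the kernel body, the `check` helper, the array size, and the block size are all hypothetical placeholders. It assumes a toolkit that provides `cudaDeviceSynchronize()`; older toolkits use `cudaThreadSynchronize()` instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: doubles each element of data.
__global__ void kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // bounds guard so surplus threads do not write out of range
        data[i] *= 2.0f;
}

// Hypothetical helper: print the CUDA error string and abort on failure.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const int n = 1000;                 // assumed problem size
    const size_t bytes = n * sizeof(float);

    float *h_in  = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    if (h_in == NULL || h_out == NULL) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    for (int i = 0; i < n; ++i)
        h_in[i] = 1.0f;

    float *d_data = NULL;
    check(cudaMalloc((void **)&d_data, bytes), "cudaMalloc");

    const int threads = 256;                        // assumed block size
    const int grid = (n + threads - 1) / threads;   // round up to cover all n elements

    for (int call = 0; call < 3; ++call) {
        // Re-upload the input so every call starts from the same device state.
        check(cudaMemcpy(d_data, h_in, bytes, cudaMemcpyHostToDevice), "copy to device");

        kernel<<<grid, threads>>>(d_data, n);
        check(cudaGetLastError(), "kernel launch");            // launch-configuration errors
        check(cudaDeviceSynchronize(), "kernel execution");    // errors raised while the kernel runs

        check(cudaMemcpy(h_out, d_data, bytes, cudaMemcpyDeviceToHost), "copy to host");
        printf("call %d: h_out[0] = %f\n", call, h_out[0]);
    }

    check(cudaFree(d_data), "cudaFree");
    free(h_in);
    free(h_out);
    return 0;
}
```

In this sketch the input is copied to the device again before each launch. If a kernel modifies its input in place and the buffer is not reset, successive calls would be expected to produce different results, so that is one thing worth ruling out.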
Could anyone please help me? Any suggestions are appreciated.
BTW, my card is a GTX 285, running under Fedora 8.