In this thread, I just want to understand what is happening, why and how to counteract it. I tend to attract strange behaviour of hardware/software :)
In general, there are various situations, minor or major, in which one could trigger this behaviour (accidentally or not). In my case I am working with my simple assertion system for kernels running on the device (not deviceemu).
For kernels hundreds of lines long, tracking down a bug is not an easy task, and deviceemu has not proven very helpful. Certain parts of the code work on the device but fail on deviceemu, most likely because ‘warpSize’ on deviceemu is 1 and not 32.
Therefore, instead of fighting with device and deviceemu compatibility, I run my (often unfinished) code directly on the GPU, guarded with lots of asserts. When one fails, I want the kernel to terminate as quickly as possible and report the correct problem. Currently the macro looks as follows:
[codebox]#define CUDA_ASSERT(condition, raiseErrorCode, ad1, ad2, ad3, ad4) do { \
    if (!(condition)) { \
        /* ensure exactly one thread reports the problem */ \
        if (atomicAdd(&_globalError->accessSemaphor, 1) == 0) { \
            _globalError->errorCode = (raiseErrorCode); \
            _globalError->dimGrid = gridDim; \
            _globalError->dimBlock = blockDim; \
            _globalError->block = blockIdx; \
            _globalError->thread = threadIdx; \
            /* helper values, can be anything */ \
            _globalError->a.x = (ad1); \
            _globalError->a.y = (ad2); \
            _globalError->a.z = (ad3); \
            _globalError->a.w = (ad4); \
        } \
        __threadfence(); \
        return; \
    } \
    /* maybe some other thread ended in an error state (could be in a different assert) */ \
    if (_globalError->accessSemaphor > 0) \
        return; \
} while(0)
[/codebox]
Once the kernel has terminated, I copy _globalError back to the host and check whether errorCode is nonzero. If it is, I print all the data associated with the event.
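A minimal sketch of that host-side step might look like the following. The struct layout is only inferred from the fields the macro writes, with the helper values assumed to be ints. The names GlobalError, initGlobalError and checkGlobalError are hypothetical. It also assumes a current CUDA runtime, where cudaMemcpyToSymbol takes the symbol directly and cudaDeviceSynchronize is available.
[codebox]#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical layout, inferred from the fields CUDA_ASSERT writes.
struct GlobalError {
    int   accessSemaphor;  // 0 until the first failing thread claims the record
    int   errorCode;       // 0 means no assert fired
    dim3  dimGrid;
    dim3  dimBlock;
    uint3 block;
    uint3 thread;
    int4  a;               // four free-form helper values (assumed to be ints)
};

// Device-side pointer that the macro dereferences.
__device__ GlobalError *_globalError;

// Allocate and zero the record, then publish its address to the device.
GlobalError *initGlobalError() {
    GlobalError *d_err = nullptr;
    if (cudaMalloc(&d_err, sizeof(GlobalError)) != cudaSuccess) return nullptr;
    cudaMemset(d_err, 0, sizeof(GlobalError));
    cudaMemcpyToSymbol(_globalError, &d_err, sizeof(d_err));
    return d_err;
}

// Call after the kernel launch: copies the record back and prints it if an
// assert fired. Returns the stored error code (0 = no assert fired).
int checkGlobalError(const GlobalError *d_err) {
    GlobalError h;
    cudaDeviceSynchronize();  // wait for the kernel to finish
    cudaMemcpy(&h, d_err, sizeof(h), cudaMemcpyDeviceToHost);
    if (h.errorCode != 0) {
        printf("CUDA_ASSERT failed, code %d\n", h.errorCode);
        printf("  grid (%u,%u,%u), block dim (%u,%u,%u)\n",
               h.dimGrid.x, h.dimGrid.y, h.dimGrid.z,
               h.dimBlock.x, h.dimBlock.y, h.dimBlock.z);
        printf("  block (%u,%u,%u), thread (%u,%u,%u)\n",
               h.block.x, h.block.y, h.block.z,
               h.thread.x, h.thread.y, h.thread.z);
        printf("  helper values: %d %d %d %d\n", h.a.x, h.a.y, h.a.z, h.a.w);
    }
    return h.errorCode;
}
[/codebox]
The record has to be zeroed again (for example with cudaMemset) before the next launch, otherwise the stale accessSemaphor would make every thread return at its first assert.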
Note that I don’t want to use __syncthreads() in the macro, because I want it to be usable inside branches as well.
The macro only works directly inside a __global__ function; anywhere else, ‘return’ merely leaves the current function and does not terminate the thread. If I am not mistaken, there is no exit() function that could terminate the thread from anywhere, is there?
This simple mechanism has proven to be very helpful for me on several occasions already, but sometimes tricky things like those described above occur and I get confused.