Hello everyone.
I’m a beginner in CUDA programming.
I am having a weird problem. When I launch the kernel with a limited number of threads, my code works fine.
But when I launch more threads than that limit, the problem occurs. The kernel has an if statement that compares each thread's id with the limit, so the extra threads do no processing at all. My guess is that the blocks made up entirely of threads whose ids are above the limit finish so quickly that the host prints the results before the real work is done, and so the values it prints are wrong.
Here is my kernel code:
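[codebox]// Noh and ID_MAXIMO are defined elsewhere in the program.
__global__ void test(Noh* cabeca, Noh* FPTree) {
    // Global thread index across all blocks.
    int id = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads past the last valid index do nothing.
    if (id <= ID_MAXIMO) {
        // Count the nodes in the sibling chain that starts at cabeca[id].
        int j = 0;
        for (int i = cabeca[id].irmao; i != -1; i = FPTree[i].irmao)
            ++j;
        cabeca[id].frequencia = j;
    }
}[/codebox]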
So my question is: Is there a way to force the host to only continue execution when all the threads from all blocks have finished processing?
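For reference, here is a minimal sketch of the usual host-side pattern. It assumes a simplified Noh holding only the two fields the kernel uses, replaces ID_MAXIMO with a hypothetical idMaximo parameter, and uses hypothetical helper names (runTest, check). A kernel launch returns control to the host immediately; cudaDeviceSynchronize() blocks the host until every block of the preceding launch has finished, and a plain cudaMemcpy on the default stream also waits for earlier kernels before it copies.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical layout: only the two fields the kernel touches.
struct Noh {
    int irmao;       // index of the next sibling in FPTree, or -1
    int frequencia;  // output: length of the sibling chain
};

// Same logic as the kernel above; ID_MAXIMO is passed in as idMaximo (assumption).
__global__ void test(Noh* cabeca, const Noh* FPTree, int idMaximo) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id <= idMaximo) {
        int j = 0;
        for (int i = cabeca[id].irmao; i != -1; i = FPTree[i].irmao)
            ++j;
        cabeca[id].frequencia = j;
    }
}

// Hypothetical helper: aborts with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: copies both arrays to the device, launches test(),
// waits for all blocks, then copies cabeca back. nCabeca must be > 0.
void runTest(Noh* hCabeca, int nCabeca, const Noh* hTree, int nTree,
             int threadsPerBlock) {
    Noh *dCabeca = nullptr, *dTree = nullptr;
    check(cudaMalloc(&dCabeca, nCabeca * sizeof(Noh)), "cudaMalloc cabeca");
    check(cudaMalloc(&dTree, nTree * sizeof(Noh)), "cudaMalloc FPTree");
    check(cudaMemcpy(dCabeca, hCabeca, nCabeca * sizeof(Noh),
                     cudaMemcpyHostToDevice), "copy cabeca to device");
    check(cudaMemcpy(dTree, hTree, nTree * sizeof(Noh),
                     cudaMemcpyHostToDevice), "copy FPTree to device");

    // Round up so every valid id gets a thread; the surplus threads in the
    // last block fail the id <= idMaximo guard and exit immediately.
    int blocks = (nCabeca + threadsPerBlock - 1) / threadsPerBlock;
    test<<<blocks, threadsPerBlock>>>(dCabeca, dTree, nCabeca - 1);
    check(cudaGetLastError(), "kernel launch");          // bad launch configuration
    check(cudaDeviceSynchronize(), "kernel execution");  // host waits for all blocks

    check(cudaMemcpy(hCabeca, dCabeca, nCabeca * sizeof(Noh),
                     cudaMemcpyDeviceToHost), "copy cabeca to host");
    cudaFree(dCabeca);
    cudaFree(dTree);
}
```

In this sketch, cudaGetLastError() reports a launch that never ran (for example an invalid block size), and the value returned by cudaDeviceSynchronize() reports faults during execution, such as an out-of-bounds access. Because the device-to-host cudaMemcpy already waits for the kernel, results that are still wrong with this synchronization in place would suggest a cause other than timing.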
Thanks for the help!