I’m trying to initialize a unique counter in global memory that is shared by all blocks.
int bx = blockIdx.x;
int tx = threadIdx.x;
// done only once for entire launch
if (!bx && !tx) counter = 0;
Would it ever be the case that block 0 was not in the first set of blocks to be executed on the device? If so, this initialization would not work correctly. I don’t know how blocks are scheduled.
It appears that so far all cards start block 0 first, but there is no guarantee. I would initialize it from the host (or add the initialization to the previous kernel).
Anyway, that's not what a kernel is there for, so that if condition is overhead and not really necessary. Simply zero an int or long counter on the host and pass a pointer to it as an argument to the kernel.
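A minimal sketch of the host-side initialization suggested in the replies, assuming the counter lives in a buffer allocated with cudaMalloc and is passed to the kernel as a pointer. The names count_threads_kernel, count_threads and d_counter are made up for illustration, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Every thread increments the shared counter once.
// atomicAdd keeps concurrent updates from different blocks from being lost.
__global__ void count_threads_kernel(unsigned int *counter)
{
    atomicAdd(counter, 1u);
}

// Hypothetical host helper: zero the counter, launch, and read the result back.
unsigned int count_threads(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);

    unsigned int *d_counter = nullptr;
    cudaMalloc(&d_counter, sizeof(unsigned int));

    // Initialize from the host before the launch, so the result does not
    // depend on which block the hardware happens to schedule first.
    cudaMemset(d_counter, 0, sizeof(unsigned int));

    count_threads_kernel<<<blocks, threads>>>(d_counter);

    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    unsigned int result = 0;
    cudaMemcpy(&result, d_counter, sizeof(result), cudaMemcpyDeviceToHost);

    cudaFree(d_counter);
    return result;
}
```

Calling count_threads(4, 64) should return 256 on a working device. Because the counter is zeroed before the kernel starts, no thread has to test for block 0 and thread 0, and no block can touch the counter before it has been initialized.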