I'm launching a kernel on a large grid whose dimensions vary: I resize them depending on the problem.
My problem arises when I compute a global id for each thread executing the kernel. The formula I use is:
((blockDim.x * (blockIdx.x + blockIdx.y * gridDim.x) + threadIdx.x))
I thought this would give me a unique id per thread, but it doesn't.
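For reference, here is a minimal sketch of how that formula maps onto a kernel, assuming a 2D grid of 1D blocks. The names `linear_id` and `write_ids`, the output buffer `out` and the bound `n` are hypothetical and only there for illustration; they are not from my real code.

```cuda
#include <cuda_runtime.h>

// Linear id for a 2D grid of 1D blocks: blocks are numbered row by row
// (blockIdx.y selects the row, blockIdx.x the position inside the row),
// and every block contributes blockDim.x consecutive ids.
__host__ __device__ inline unsigned int linear_id(unsigned int block_dim_x,
                                                  unsigned int block_idx_x,
                                                  unsigned int block_idx_y,
                                                  unsigned int grid_dim_x,
                                                  unsigned int thread_idx_x)
{
    return block_dim_x * (block_idx_x + block_idx_y * grid_dim_x) + thread_idx_x;
}

// Hypothetical kernel: each thread stores its own id at position id.
// The guard against n (assumed element count) keeps surplus threads
// from writing past the end of the buffer.
__global__ void write_ids(unsigned int *out, unsigned int n)
{
    unsigned int id = linear_id(blockDim.x, blockIdx.x, blockIdx.y,
                                gridDim.x, threadIdx.x);
    if (id < n) {
        out[id] = id;
    }
}
```

Because `linear_id` is also callable on the host, the arithmetic can be checked without a GPU: for a 464 x 464 grid of 256-thread blocks, the last thread of the last block should get 464 * 464 * 256 - 1.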
An example launch is:
name_kernel <<< Grid, 256 >>> (params…)
where Grid, in the last execution, was (464, 464, 0)
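A minimal sketch of how such a launch could be set up and checked, assuming the grid is built as a `dim3`. The stub kernel `name_kernel_stub` and the helpers `total_threads` and `launch_checked` are hypothetical stand-ins, since the real kernel parameters are not shown here. Note that the sketch uses 1 for the z component: a grid dimension of 0 is not a valid launch configuration, so a value like (464, 464, 0) is worth checking with `cudaGetLastError`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for name_kernel; it takes no parameters and does nothing.
__global__ void name_kernel_stub() {}

// Number of threads (and therefore distinct ids) a 2D grid of 1D blocks produces.
unsigned long long total_threads(unsigned int grid_x, unsigned int grid_y,
                                 unsigned int block_x)
{
    return (unsigned long long)grid_x * grid_y * block_x;
}

// Launches the stub on a grid_x by grid_y grid of 256-thread blocks and
// returns the status reported by the CUDA runtime.
cudaError_t launch_checked(unsigned int grid_x, unsigned int grid_y)
{
    dim3 grid(grid_x, grid_y, 1);  // z set to 1 (assumption: a 2D grid is intended)
    name_kernel_stub<<<grid, 256>>>();

    cudaError_t err = cudaGetLastError();   // reports launch/configuration errors
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // reports errors raised during execution
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    }
    return err;
}
```

Calling `launch_checked(464, 464)` from `main` and comparing the result with `cudaSuccess` shows whether the kernel actually ran; if it did not, no ids are written at all, which can look like a wrong id formula.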