Debugging in Nsight reveals two variables, flattenedBlockIdx and flattenedThreadIdx, which might remove the need to manually linearise thread indices when using a multidimensional thread block.
However, I can’t find any documentation on these. Would these be safe to use?
Are you sure these are not things that Nsight shows for your convenience, i.e. they don’t exist in your actual program?
I have no idea. They could very well be; that’s why I was curious whether anyone knew anything about them.
The only thing I do not fully understand in the disassembly of various cubins that I’ve seen is this:
MOV R1, c [0x1] [0x100];
This is present at the beginning of every kernel. But it’s a constant and all threads read the same value, so I wouldn’t expect it to be flattenedBlockIdx or flattenedThreadIdx, which differ from thread to thread.
Seems like these are added by Nsight, so let’s not use them!
Having said that, do you know how the compiler figures out the linear ID of each thread/block in a multidimensional block?
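The CUDA C Programming Guide gives the formula in Section 2.2: for a block of size (Dx, Dy, Dz), the thread with index (x, y, z) has thread ID x + y*Dx + z*Dx*Dy.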
Right. That’s how I’ve been manually linearising my thread indices, within the grid:
tid = (blockDim.x*blockDim.y*blockDim.z)*blockIdx.x + (blockDim.x*blockDim.y)*threadIdx.z + blockDim.x*threadIdx.y + threadIdx.x;
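For anyone who wants to check the expression above, here is a minimal sketch that wraps it in a helper and tests it on the device. It assumes a one-dimensional grid, as in the formula above. The names threadInBlock, globalTid, writeTid and countMismatches are made up for this example and are not part of CUDA or Nsight.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Linear index of a thread inside its block: x varies fastest, then y, then z.
// This is the thread-ID formula from the CUDA C Programming Guide.
__host__ __device__ inline unsigned int threadInBlock(uint3 t, dim3 bd)
{
    return t.x + bd.x * (t.y + bd.y * t.z);
}

// Linear index within the grid. ASSUMPTION: the grid is 1D, so only
// blockIdx.x contributes, exactly as in the expression posted above.
__host__ __device__ inline unsigned int globalTid(uint3 b, uint3 t, dim3 bd)
{
    return (bd.x * bd.y * bd.z) * b.x + threadInBlock(t, bd);
}

// Each thread stores its own linear index at that index.
__global__ void writeTid(unsigned int *out)
{
    unsigned int tid = globalTid(blockIdx, threadIdx, blockDim);
    out[tid] = tid;
}

// Hypothetical host-side check: launches writeTid and returns the number of
// slots i for which out[i] != i (0 means every index was hit exactly as
// expected), or -1 if a CUDA call failed.
int countMismatches(dim3 grid, dim3 block)
{
    const std::size_t n =
        static_cast<std::size_t>(grid.x) * block.x * block.y * block.z;
    const std::size_t bytes = n * sizeof(unsigned int);

    unsigned int *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1;
    cudaMemset(d, 0xFF, bytes);  // mark every slot as "not written"

    writeTid<<<grid, block>>>(d);

    std::vector<unsigned int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (h[i] != i) ++bad;
    return bad;
}
```

Calling countMismatches(dim3(3), dim3(4, 2, 2)) from main should return 0 if the formula assigns every thread a unique, gap-free index. The helpers are marked `__host__ __device__` only so that the arithmetic can also be checked on the CPU.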
What I meant to ask was: surely the hardware needs to know the tid as I’ve defined it above during execution anyway? If so, can I get at that value without explicitly doing the calculation each time (perhaps in the PTX, though I know nothing about assembly code)?