I am trying to debug unstable block timings: sometimes a given block runs quickly, and sometimes it seems to stall and takes 10x longer than usual. This is Monte Carlo code, so it's hard to reproduce; the problem tends to jump to different blocks on every run. To aid debugging, I am measuring per-block runtime with clock() sentinels at the start and end of each block.
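A minimal sketch of this kind of per-block timing, assuming one thread per block reads the counter before and after the work. The kernel name timedKernel, the blockTicks buffer, and the dummy workload are placeholders for illustration, not the real Monte Carlo code.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 of each block records clock() before and
// after the block's work and stores the elapsed tick count.
__global__ void timedKernel(unsigned int *blockTicks, float *out, int iters)
{
    unsigned int start = 0;
    if (threadIdx.x == 0) start = (unsigned int)clock();   // start sentinel
    __syncthreads();

    // Placeholder workload standing in for the real Monte Carlo step.
    float acc = (float)threadIdx.x;
    for (int i = 0; i < iters; ++i) acc = acc * 1.0001f + 0.5f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;

    __syncthreads();                                        // wait for the whole block
    if (threadIdx.x == 0) {
        // Unsigned subtraction stays correct across one wrap of the counter.
        blockTicks[blockIdx.x] = (unsigned int)clock() - start;  // end sentinel
    }
}

int main()
{
    const int blocks = 64, threads = 128, iters = 100000;   // arbitrary sizes

    unsigned int *dTicks = nullptr;
    float *dOut = nullptr;
    cudaMalloc(&dTicks, blocks * sizeof(unsigned int));
    cudaMalloc(&dOut, blocks * threads * sizeof(float));

    timedKernel<<<blocks, threads>>>(dTicks, dOut, iters);

    // cudaMemcpy waits for the kernel to finish and reports any launch error.
    std::vector<unsigned int> ticks(blocks);
    cudaError_t err = cudaMemcpy(ticks.data(), dTicks,
                                 blocks * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int b = 0; b < blocks; ++b)
        printf("block %d: %u ticks\n", b, ticks[b]);

    cudaFree(dTicks);
    cudaFree(dOut);
    return 0;
}
```

The counter read by clock() is per multiprocessor, so the elapsed ticks within one block are meaningful, but absolute start values from blocks on different SMs are not directly comparable.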
For this debugging I want to see whether “brother” blocks are slowed at the same time, a brother block being one that runs on the same SM simultaneously. CUDA abstracts this away, so blocks don't really know where they're being run, but knowing it now would help with debugging.
I do remember a thread here on the forums from two years ago or so (?) which showed that SM IDs could be read by sneaking illegal peeks into unallocated shared memory.
Something weird (and clearly an unsupported hack!!) like (from memory):
extern __shared__ unsigned int array[];
int mySMnumber=array[-3];
I haven't been able to find this thread again via Google (or the forum's own less powerful search). Does anyone remember the details or have a link?
And yes, I do realize how ugly and unsupported this hack is…
Thanks!