To investigate power and thermal distribution,
we would like to know the physical location where each thread block or thread runs.
Are there any instructions we can use to see how CUDA assigns thread blocks or threads to physical cores?
Can anyone give me some advice or information?
I would appreciate it.
Scheduling of blocks onto multiprocessors (MPs) is done entirely by the hardware scheduler. There is no way to modify it from software in CUDA.
Older versions of the PTX manual documented a special register that gives the ID of the multiprocessor running the current thread. The reference to it has since been removed. I don’t know whether the 2.0 version of ptxas will still assemble it, but you can try. Start by looking in the README file of decuda, and search the forums.
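For anyone who wants to try this, here is a minimal sketch of reading the multiprocessor ID from inside a kernel. It assumes the register in question is the PTX special register `%smid` and that your toolchain accepts inline PTX through `asm`; neither is confirmed for the ptxas version discussed above. The names `get_smid`, `record_smid`, and `collect_sm_ids` are made up for this example. Note that the value is only a logical multiprocessor ID: it says nothing about where that multiprocessor sits on the die, and it may not be stable for the lifetime of a thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: %smid is available and inline PTX is supported by the compiler.
// Returns the ID of the multiprocessor executing the calling thread.
__device__ unsigned int get_smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Thread 0 of each block records which multiprocessor the block landed on.
__global__ void record_smid(unsigned int *sm_of_block) {
    if (threadIdx.x == 0) {
        sm_of_block[blockIdx.x] = get_smid();
    }
}

// Hypothetical host helper: launches num_blocks blocks and copies the
// per-block multiprocessor IDs into h_ids. Returns 0 on success, 1 on error.
int collect_sm_ids(unsigned int *h_ids, int num_blocks) {
    unsigned int *d_ids = nullptr;
    size_t bytes = num_blocks * sizeof(unsigned int);
    if (cudaMalloc(&d_ids, bytes) != cudaSuccess) return 1;

    record_smid<<<num_blocks, 64>>>(d_ids);

    // cudaMemcpy waits for the kernel and also reports any launch error.
    cudaError_t err = cudaMemcpy(h_ids, d_ids, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_ids);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    const int num_blocks = 16;
    unsigned int h_ids[num_blocks];

    if (collect_sm_ids(h_ids, num_blocks) != 0) {
        fprintf(stderr, "CUDA error while collecting multiprocessor IDs\n");
        return 1;
    }
    for (int b = 0; b < num_blocks; ++b) {
        printf("block %d ran on multiprocessor %u\n", b, h_ids[b]);
    }
    return 0;
}
```

Running it a few times with different grid sizes should show how the hardware scheduler spreads blocks over the multiprocessors, which is about as close as software can get to the placement information asked for. Mapping those IDs to physical positions on the chip would still need information from outside CUDA.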