I don't think you can find out which core executed which thread. The manual says that the order in which warps within a block execute, and the order in which blocks execute, is non-deterministic, so software should not make any assumptions about it.
I ran into a similar question too. I use a GeForce 8600 GT, which has 32 streaming cores. Is there really no way to find out which cores are running? How can I measure the utilization of the 32 streaming cores? Thank you.
Actually, you can, using the %physid special register, but you’d need to write PTX to use it. There are some threads about it from a while back; search for them. Mention of it has been removed from the latest PTX manuals.
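
For what it's worth, here is a minimal sketch of reading a placement register from CUDA C through inline PTX. It is an assumption-laden illustration, not the method from the older threads: it uses %smid, which the current PTX ISA documents, instead of %physid, and the names `get_smid` and `record_sm` are made up for this example. It also assumes a toolkit that supports inline PTX `asm` statements. Note that %smid identifies the multiprocessor (SM) a thread is running on, not an individual scalar core, and the PTX manual describes the value as volatile, so it is best treated as a diagnostic only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: read the %smid special register, i.e. the ID of the
// multiprocessor executing the calling thread at the moment of the read.
__device__ unsigned int get_smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical kernel: thread 0 of each block records the SM it ran on.
__global__ void record_sm(unsigned int *sm_of_block) {
    if (threadIdx.x == 0) {
        sm_of_block[blockIdx.x] = get_smid();
    }
}

int main() {
    const int blocks = 16;            // arbitrary sizes for illustration
    const int threads_per_block = 64;

    unsigned int *d_sm = nullptr;
    if (cudaMalloc(&d_sm, blocks * sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    record_sm<<<blocks, threads_per_block>>>(d_sm);

    unsigned int h_sm[blocks];
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_sm, d_sm, sizeof(h_sm),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_sm);
        return 1;
    }

    // The block-to-SM mapping may differ from run to run.
    for (int b = 0; b < blocks; ++b) {
        printf("block %d ran on SM %u\n", b, h_sm[b]);
    }

    cudaFree(d_sm);
    return 0;
}
```

Counting how many distinct SM IDs show up gives a rough idea of how many multiprocessors a launch touched, but it says nothing about how busy each one was; for real utilization numbers a profiler is the better tool.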