I don't think you can find out which core executed which thread. The manual says that the order in which warps inside a block are executed, and the order in which blocks are executed, is non-deterministic, so software should not make any assumptions about it.
Actually, you can, using the %physid special register, but you'd need to write PTX to use it. There are some threads about it from a while back; search for them. Its mention has been deleted from the latest PTX manuals.
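Here is a minimal sketch of reading a special register through inline PTX from CUDA C++. It assumes an NVIDIA GPU and nvcc. Because %physid is no longer documented, the sketch substitutes %smid (the SM the thread is running on) and %nsmid (an upper bound on SM IDs), which the current PTX ISA does document. The helper names `read_smid`, `read_nsmid`, `record_sm` and `query_block_sms` are hypothetical. The PTX manual notes that %smid is volatile: it reports where the thread is at the moment of the read, and the value can change later (e.g. after preemption). Treat it as a diagnostic, not something to build scheduling logic on.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Inline PTX: read %smid, the ID of the SM this thread is on *right now*.
// Per the PTX ISA the value may change during execution, so it is a hint only.
// "%%" escapes the register name's '%' because the asm has operands.
__device__ __forceinline__ unsigned int read_smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Inline PTX: read %nsmid, an upper bound on SM IDs (IDs need not be contiguous).
__device__ __forceinline__ unsigned int read_nsmid() {
    unsigned int n;
    asm volatile("mov.u32 %0, %%nsmid;" : "=r"(n));
    return n;
}

// Hypothetical kernel: thread 0 of each block records the SM the block ran on.
__global__ void record_sm(unsigned int* smids, unsigned int* nsmid) {
    if (threadIdx.x == 0) {
        smids[blockIdx.x] = read_smid();
        if (blockIdx.x == 0) *nsmid = read_nsmid();
    }
}

// Hypothetical host helper: launches num_blocks blocks of 32 threads and
// copies back the SM ID seen by each block. Returns false on any CUDA error.
bool query_block_sms(int num_blocks, std::vector<unsigned int>& smids,
                     unsigned int& nsmid) {
    unsigned int* d_smids = nullptr;
    unsigned int* d_nsmid = nullptr;
    if (cudaMalloc(&d_smids, num_blocks * sizeof(unsigned int)) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_nsmid, sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_smids);
        return false;
    }
    record_sm<<<num_blocks, 32>>>(d_smids, d_nsmid);
    bool ok = cudaGetLastError() == cudaSuccess;           // launch errors
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;      // execution errors
    smids.resize(num_blocks);
    ok = ok && cudaMemcpy(smids.data(), d_smids,
                          num_blocks * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(&nsmid, d_nsmid, sizeof(unsigned int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_smids);
    cudaFree(d_nsmid);
    return ok;
}
```

Calling `query_block_sms(64, smids, nsmid)` and printing `smids` shows which SM each block landed on. The mapping may differ from run to run, which is consistent with the point above that block and warp ordering is not something code should rely on.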