I have several related questions about how CUDA threads are scheduled onto CUDA cores:
First, is a CUDA thread assigned to one specific core from the moment it starts running until it finishes execution? In other words, we know that warps run concurrently on an SM in the following way: the threads of a warp (call it warp 1) begin executing an instruction; the warp may then stall if that instruction takes time to complete (say, waiting on memory for a load/store), and another warp (warp 2) is selected to run in the meantime. My question is: when warp 1 resumes, does each of its threads get assigned to the same physical core it originally started on, or may it be run by a different core within the same SM?
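To make the scenario concrete, here is a minimal sketch of the kind of kernel I have in mind (the name and parameters are just for illustration): each thread issues a global load, and while a warp waits on that load the SM's scheduler can switch to another warp.

```cuda
// Illustrative kernel: each thread loads a value, then does dependent arithmetic.
// While warp 1 is stalled on the load, the scheduler can issue warp 2.
__global__ void scale(const float *in, float *out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        float v = in[i];   // the warp may stall here, waiting on global memory
        out[i] = v * k;    // resumes once the load completes -- on the same core?
    }
}
```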
Second, did this very low-level scheduling change with Fermi, and if so, how? I know Fermi enhanced scheduling with the GigaThread engine for concurrent kernel execution, but I'm specifically asking about the part where threads are assigned to physical cores.
Third, if I call the same kernel twice consecutively, with the same data and the same launch configuration, am I guaranteed that each thread will run on the same core it ran on during the first call, given that the indexing is the same?
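For clarity, this is the pattern I mean (kernel name, sizes, and pointers are hypothetical placeholders): both launches use identical grid/block dimensions, so every thread computes the same `blockIdx`/`threadIdx` both times.

```cuda
// Hypothetical example: the same kernel launched twice with identical
// configuration and data. The logical thread indexing is identical in
// both calls; my question is whether the physical core assignment is too.
__global__ void scale(const float *in, float *out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * k;
}

void launch_twice(const float *d_in, float *d_out, int n)
{
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d_in, d_out, 2.0f, n);  // first call
    scale<<<blocks, threadsPerBlock>>>(d_in, d_out, 2.0f, n);  // second, identical call
    cudaDeviceSynchronize();
}
```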
I've searched extensively for answers to these questions but haven't found any.