I realize this is more of a hardware question, and probably not something you need to think about when developing CUDA applications, but I was wondering how much of a single warp actually executes in parallel in a physical sense.
If a thread block is assigned to a multiprocessor and that multiprocessor schedules one of its warps to run, how is the warp actually executed on the hardware?
Let’s assume 8 cores per multiprocessor.
How many threads can physically execute in parallel on a single core?
Are all 8 cores potentially involved in executing the warp at the same time, or is the whole warp handled by a single core?
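
To make the question concrete, here is the arithmetic I have in mind, as a minimal sketch. It assumes a 32-thread warp and that each core executes exactly one thread per pass; that second assumption is precisely what I am unsure about. The name `passes_per_warp` is hypothetical and not part of any CUDA API.

```cpp
#include <cassert>

// Hypothetical helper, not a CUDA API call.
// ASSUMPTION: each core executes exactly one thread of the warp per pass.
// Under that assumption, this returns how many passes a multiprocessor
// needs to push every thread of one warp through its cores.
int passes_per_warp(int warp_size, int cores_per_sm) {
    assert(warp_size > 0 && cores_per_sm > 0);
    // Round up so a partially filled last pass still counts as a pass.
    return (warp_size + cores_per_sm - 1) / cores_per_sm;
}
```

With a warp of 32 threads and the 8 cores assumed above, `passes_per_warp(32, 8)` gives 4, so under this model all 8 cores would be busy for 4 passes per warp instruction. Whether the hardware actually works this way is what I would like to have confirmed.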