Number of threads physically executing in parallel per core? Whats the physical level of parallelism

I realize this is probably more of a hardware question and probably not really necessary to think about when developing CUDA applications, but I was wondering what the level of parallelism is for a single warp in a physical sense.

If we have a threadblock assigned to a multiprocessor and this multiprocessor schedules a warp to run, how is this accomplished?

Let’s assume 8 cores per multiprocessor.

How many threads can execute in parallel physically on a single core?
Are all 8 cores potentially involved in executing the warp physically at the same time, or is it only a single core?

I realize this is probably more of a hardware question and probably not really necessary to think about when developing CUDA applications, but I was wondering what the level of parallelism is for a single warp in a physical sense.

If we have a threadblock assigned to a multiprocessor and this multiprocessor schedules a warp to run, how is this accomplished?

Let’s assume 8 cores per multiprocessor.

How many threads can execute in parallel physically on a single core?
Are all 8 cores potentially involved in executing the warp physically at the same time, or is it only a single core?

8 cores execute 8 threads of a warp at the same time, so a warp needs 4 cycles to issue one instruction.

8 cores execute 8 threads of a warp at the same time, so a warp needs 4 cycles to issue one instruction.

Great! Thanks for your precise answer.

Great! Thanks for your precise answer.