Number of threads on device at a given time

Hi,

I’m trying to figure out the maximum number of threads that can be executing at any given instant. A PDF from an NVIDIA webinar says:

  1. 1 thread block = 32 threads = Warp

  2. 1 Warp is executed physically on a multiprocessor.

So, at any given instant in time,

N_threads_total = N_multiprocs * N_threads_per_warp

To take a specific example, the GTX 280 has 30 multiprocessors, so 30*32 = 960 threads would be executing at any given time. Does this look right to you guys?

Thanks,

-Nachiket

It depends on what your definition of “instant” and “execute” are. :)

For the GTX 280:

  • Number of threads completing an instruction per clock cycle: 30 multiprocessors * 8 streaming processors/multiprocessor = 240
  • Number of threads propagating through the instruction pipeline: 30 multiprocessors * [hard to say, but at least 192] = at least 5760
  • Max number of threads active on the device: 30 multiprocessors * 1024 threads/multiprocessor = 30720
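A quick sketch that reproduces the three figures above, plus the one-warp-per-multiprocessor estimate from the question, so they can be compared side by side. The constants are the GTX 280 numbers quoted in this thread; the names are illustrative only, and the pipeline figure is a lower bound, as stated above.

```python
# GTX 280 figures as quoted in this thread (treated as assumed constants).
NUM_SMS = 30               # multiprocessors
SPS_PER_SM = 8             # streaming processors per multiprocessor
MIN_PIPELINE_PER_SM = 192  # "at least 192" threads in flight per multiprocessor
MAX_THREADS_PER_SM = 1024  # resident (active) threads per multiprocessor
WARP_SIZE = 32


def thread_counts():
    """Return the different 'threads at one instant' figures discussed above."""
    return {
        "one_warp_per_sm": NUM_SMS * WARP_SIZE,               # formula from the question
        "completing_per_clock": NUM_SMS * SPS_PER_SM,         # one instruction finished per SP per clock
        "in_pipeline_at_least": NUM_SMS * MIN_PIPELINE_PER_SM,  # lower bound on threads in flight
        "max_active": NUM_SMS * MAX_THREADS_PER_SM,           # threads holding registers/shared memory
    }


for name, count in thread_counts().items():
    print(f"{name:>22}: {count}")
```

The spread (960, 240, at least 5760, 30720) is exactly why the answer depends on what you mean by "executing".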

Active threads have been allocated registers and shared memory for their entire lifetime (this is why thread switching in CUDA has basically no overhead), so the last line does represent an instantaneous commitment of chip resources.
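Because active threads hold their registers and shared memory for their whole lifetime, how many of the 1024 slots per multiprocessor actually fill up depends on each kernel's resource usage. Below is a simplified sketch, assuming the compute capability 1.3 (GT200) per-multiprocessor limits of 1024 threads, 8 blocks, 16384 32-bit registers and 16 KB of shared memory; it ignores allocation granularity, so real hardware may fit fewer blocks. The function name `resident_threads` is hypothetical.

```python
# Assumed per-SM limits for compute capability 1.3 (GT200, e.g. GTX 280).
MAX_THREADS_PER_SM = 1024
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 16384        # 32-bit registers
SHARED_MEM_PER_SM = 16 * 1024   # bytes


def resident_threads(threads_per_block, regs_per_thread, smem_per_block):
    """Simplified count of threads that can be active on one SM at once.

    Ignores register/shared-memory allocation granularity, so this is an
    upper estimate rather than what the hardware is guaranteed to fit.
    """
    limits = [
        MAX_BLOCKS_PER_SM,                       # hard cap on resident blocks
        MAX_THREADS_PER_SM // threads_per_block,  # thread slots
    ]
    if regs_per_thread > 0:
        # Registers stay allocated for the block's entire lifetime.
        limits.append(REGISTERS_PER_SM // (regs_per_thread * threads_per_block))
    if smem_per_block > 0:
        # Shared memory is likewise committed per resident block.
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    return min(limits) * threads_per_block


print(resident_threads(256, 16, 4096))  # 1024: SM fully occupied
print(resident_threads(256, 32, 0))     # 512: register-limited
```

With 16 registers per thread, all 1024 slots fill; doubling to 32 registers halves the number of active threads, and with it the device-wide figure.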

So take your pick. :)

Well, it goes like this: each SM contains 8 SPs, each capable of running a thread of its own. That gives 8 threads per “fast” clock cycle.

Now, for reasons that can be debated, the instruction scheduler hands out a new instruction once every 4 “fast” clock cycles. So, to keep all of the SPs busy, the work is divided into warps of 32 threads (8 threads/cc * 4 cc).
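The 8 × 4 = 32 arithmetic can be pictured as one warp instruction being spread over the 8 SPs across 4 fast clocks. This is an idealized model: the assumption that thread `i` runs on SP lane `i % 8` during fast clock `i // 8` is for illustration only, not documented hardware behavior, and `warp_schedule` is a hypothetical helper.

```python
SPS_PER_SM = 8
CYCLES_PER_ISSUE = 4   # fast clocks per instruction issue, as described above

WARP_SIZE = SPS_PER_SM * CYCLES_PER_ISSUE   # 8 * 4 = 32


def warp_schedule(warp_size=WARP_SIZE, lanes=SPS_PER_SM):
    """Map each thread of one warp instruction to (fast clock, SP lane)."""
    return [(tid // lanes, tid % lanes) for tid in range(warp_size)]


schedule = warp_schedule()
for cycle in range(CYCLES_PER_ISSUE):
    threads = [tid for tid, (c, _) in enumerate(schedule) if c == cycle]
    print(f"fast clock {cycle}: threads {threads[0]}-{threads[-1]}")
```

Each fast clock keeps all 8 SPs busy, and by the time the scheduler issues the next instruction, all 32 threads of the warp have been covered.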

The instruction scheduler runs at half the speed of the SPs (“slow” clock cycles).

There are a number of theories about why it’s been designed this way, and I’m also looking for more answers.

As for your question, that would mean only 8 threads * 30 SMs = 240 threads are being executed per fast clock cycle (typically 1200 MHz). So, as seibert said, take your pick.
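Putting the 240-threads-per-clock figure next to the 30720 active threads gives a feel for how the two relate: under ideal conditions (every SP busy every fast clock, one instruction per thread, no stalls) it takes 128 fast clocks to move every resident thread forward by one instruction. The 1.2 GHz clock is the "typically 1200 MHz" from the post and is an assumption; actual shader clocks vary by card.

```python
NUM_SMS = 30
SPS_PER_SM = 8
MAX_THREADS_PER_SM = 1024
SHADER_CLOCK_HZ = 1.2e9   # "typically 1200 MHz" per the post; assumed, varies by card

threads_per_clock = NUM_SMS * SPS_PER_SM             # 240 threads finish an instruction per fast clock
max_active = NUM_SMS * MAX_THREADS_PER_SM            # 30720 resident threads
clocks_per_sweep = max_active // threads_per_clock   # fast clocks to give every resident thread one instruction
sweep_ns = clocks_per_sweep / SHADER_CLOCK_HZ * 1e9  # same, in nanoseconds

print(f"{threads_per_clock} threads/clock, {clocks_per_sweep} clocks, {sweep_ns:.1f} ns per sweep")
```

This prints `240 threads/clock, 128 clocks, 106.7 ns per sweep`, which is why "threads executing right now" and "threads active right now" differ by a factor of 128.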

//jp