Concurrent threads on a Kepler (GTX 680)

Hello guys :)

What is the maximum number of concurrent threads scheduled on a GeForce GTX 680?

  1. According to the documentation, the maximum number of resident threads per multiprocessor is 2048.
  2. Since the GTX 680 board has 8 SMX (multiprocessors), does that mean about 16,000 threads (8 × 2048 = 16,384) will be handled concurrently? What does “resident” mean?
  3. How does the thread-to-CUDA-core relation work? For instance, this board has 1536 CUDA cores; does this mean that 1536 threads will be handled concurrently?
  4. How do you calculate speed-up when using a GPU? The old-fashioned way had to do with the number of identical processors used:

Speed-up = (Time of the best sequential algorithm to solve problem X) / (Time for p processors to solve problem X in parallel)

Thanks in advance for the help!



  1. A resident thread is one whose context is held on the SMX, i.e. registers are dedicated to that thread. The SMX switches between the resident threads depending on which threads (or, more precisely, which warps) are ready to execute. In many applications the register usage of a kernel is the limiting factor. For example, if you have a kernel that uses the maximum of 63 registers per thread (on a GK104 device), you can have only 65536 registers / 63 registers per thread ≈ 1000 resident threads per SMX. (To calculate this you can use the occupancy calculator Excel sheet that is included in the toolkit.)
    That means you can have up to 8 × 2048 = 16,384 (~16000) resident threads on your device when nothing else limits them, but only about 8 × 1000 ≈ 8000 with the 63-register kernel.

  2. The term “CUDA core” refers to the single-precision floating-point units. That means up to 1536 single-precision floating-point operations can be executed in one cycle. There are other units (DP units, load/store units, …), so one warp (a group of 32 threads) might execute floating-point operations while another warp performs load/store operations and other warps are inactive.

  3. I think the most useful and commonly used definition of “speed-up” is the speed-up compared to a CPU: (time on one CPU core) / (time on the GPU), i.e. you measure how many CPU cores equal one GPU.
    (Comparing the speed-up of a whole GPU over one of its cores would not be very meaningful.)
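
The register arithmetic from point 1 can be sketched in a few lines of Python. This is only a rough estimate, not what the occupancy calculator does: it uses the figures quoted in this thread (65536 registers and 2048 resident threads per SMX, 8 SMX, 32 threads per warp) and assumes that threads become resident a whole warp at a time. It ignores register allocation granularity, shared memory, and block limits. The function names are made up for this example.

```python
# Figures quoted in this thread for a GTX 680 (GK104).
REGISTERS_PER_SMX = 65536
MAX_RESIDENT_THREADS_PER_SMX = 2048
WARP_SIZE = 32
SMX_COUNT = 8


def resident_threads_per_smx(registers_per_thread):
    """Simplified, register-limited estimate of resident threads on one SMX."""
    # How many threads fit into the register file.
    by_registers = REGISTERS_PER_SMX // registers_per_thread
    # Assumption: threads become resident one whole warp at a time.
    whole_warps = by_registers // WARP_SIZE
    # The hardware limit of 2048 resident threads still applies.
    return min(whole_warps * WARP_SIZE, MAX_RESIDENT_THREADS_PER_SMX)


def resident_threads_on_device(registers_per_thread):
    """Same estimate for all SMX on the board."""
    return SMX_COUNT * resident_threads_per_smx(registers_per_thread)


if __name__ == "__main__":
    for regs in (32, 63):
        print(regs,
              resident_threads_per_smx(regs),
              resident_threads_on_device(regs))
```

With 63 registers per thread this gives 1024 resident threads per SMX (8192 on the device), which matches the "~1000" above. With 32 registers per thread the 2048 limit is reached (16,384 on the device).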

Reading the whitepapers might give you more information.
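
As for the speed-up ratio from point 3, here is a minimal sketch in Python. The timings are hypothetical placeholders, not measurements, and `speed_up` is just a name chosen for this example. In practice you would time the same problem on one CPU core and on the GPU, and decide up front whether host-device transfers count toward the GPU time.

```python
def speed_up(reference_seconds, parallel_seconds):
    """Ratio of the reference (e.g. one CPU core) time to the GPU time."""
    if reference_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive")
    return reference_seconds / parallel_seconds


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    cpu_core_seconds = 12.0
    gpu_seconds = 0.5
    print(speed_up(cpu_core_seconds, gpu_seconds))  # 24.0
```

A result of 24.0 would mean that, for this problem, the GPU does the work of roughly 24 such CPU cores.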