Concurrent threads on a Kepler (GTX 680)

ElGuapo_Oficial · July 8, 2014, 12:07am

Hello guys :)

What is the maximum number of concurrent threads scheduled on a GeForce GTX 680?

According to documentation, the maximum number of resident threads per multiprocessor is 2048.
Since the GTX 680 board has 8 SMX (multiprocessors), does that mean roughly 16,000 threads (8 × 2048 = 16,384) will be executed concurrently? What does “resident” mean?
How does the thread-to-CUDA-core relation work? For instance, this board has 1536 CUDA cores; does this mean that 1536 threads will be executed concurrently?
How do you calculate speedup when using a GPU? The old-fashioned way had to do with the number of identical processors used:

Speedup = (time of the best sequential algorithm to solve problem X) / (time for p processors to solve problem X in parallel)

Thanks in advance for the help!

Regards!

hadschi118 · July 8, 2014, 9:07am

Hi!

A resident thread is one whose context is on the SMX, i.e. registers are dedicated to that thread. The SMX switches between the resident threads depending on which threads (or, more precisely, which warps) are ready to execute. In many applications the register usage of a kernel is a limiting factor. For example, if you have a kernel that uses the maximum of 63 registers per thread (on a GK104 device), you can have only 65536 registers / 63 registers per thread ≈ 1040, i.e. about 1000 resident threads per SMX. (To calculate this you can use the Occupancy Calculator Excel sheet, which is included in the toolkit.)
When registers (or other resources) are not the limit, you can have up to 8 × 2048 = 16384 resident threads on your device.
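The register arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: the function name `resident_threads_per_smx` is made up for illustration, the result is rounded down to whole warps, and block size, shared memory and register allocation granularity are ignored. The Occupancy Calculator remains the reference for real kernels.

```python
# Simplified estimate of register-limited resident threads on a GTX 680.
# Hardware figures are the ones quoted in this thread; the helper name and
# the round-down-to-whole-warps rule are illustrative assumptions.

REGISTERS_PER_SMX = 65536     # 32-bit registers per SMX (GK104)
MAX_THREADS_PER_SMX = 2048    # documented limit on resident threads per SMX
WARP_SIZE = 32
SMX_COUNT = 8                 # GTX 680


def resident_threads_per_smx(registers_per_thread):
    """Upper bound on resident threads per SMX when only registers limit."""
    by_registers = REGISTERS_PER_SMX // registers_per_thread
    whole_warps = by_registers // WARP_SIZE  # count only complete warps
    return min(whole_warps * WARP_SIZE, MAX_THREADS_PER_SMX)


if __name__ == "__main__":
    for regs in (32, 63):
        per_smx = resident_threads_per_smx(regs)
        print(f"{regs} regs/thread: {per_smx} per SMX, "
              f"{per_smx * SMX_COUNT} per device")
```

With 32 registers per thread the estimate reaches the full 2048 threads per SMX; with 63 it drops to 1024, which is the "about 1000" figure above.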
The term “CUDA core” refers to the single-precision floating-point units. That means up to 1536 single-precision floating-point operations can be executed in one cycle. There are other units (like DP units, load/store, …), so one warp (a group of 32 threads) might execute floating-point operations while another warp is doing load/store operations and other warps are inactive.
I think the most useful and commonly used definition of “speedup” is the speedup compared to a CPU: (time on one CPU core) / (time on the GPU), i.e. you measure how many CPU cores equal one GPU.
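To put the core count next to the resident-thread limit, here is a small back-of-the-envelope calculation in Python. It uses only the figures from this thread; the variable names are illustrative, and reading 192 cores as "6 warp-wide groups of single-precision lanes" is a simplification of how the SMX actually issues instructions.

```python
# Relating CUDA cores to resident warps on a GTX 680 (figures from this thread).
CUDA_CORES = 1536
SMX_COUNT = 8
WARP_SIZE = 32
MAX_THREADS_PER_SMX = 2048

cores_per_smx = CUDA_CORES // SMX_COUNT                  # SP units on one SMX
sp_warp_slots_per_smx = cores_per_smx // WARP_SIZE       # warp-wide groups of SP lanes
resident_warps_per_smx = MAX_THREADS_PER_SMX // WARP_SIZE

print(cores_per_smx, sp_warp_slots_per_smx, resident_warps_per_smx)
```

So each SMX can hold far more resident warps than it has single-precision lanes for at any instant, which is what lets it switch to another ready warp while some warps wait on memory.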
(Comparing the speed up of a whole GPU over one of its cores would not be very meaningful.)
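As a minimal sketch of that definition, the ratio can be computed as below. The function name `speedup` and the two timings are hypothetical placeholders, not measurements from any real CPU or GPU.

```python
def speedup(reference_seconds, accelerated_seconds):
    """Speedup = reference time / accelerated time (e.g. one CPU core vs. GPU)."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return reference_seconds / accelerated_seconds


if __name__ == "__main__":
    cpu_core_seconds = 12.0  # hypothetical wall-clock time on one CPU core
    gpu_seconds = 0.5        # hypothetical wall-clock time on the GPU
    print(speedup(cpu_core_seconds, gpu_seconds))
```

For a fair comparison, the GPU time should normally include host-to-device and device-to-host transfers if the application needs them.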
