You have a number of active (resident) threads, and the physical “GPU cores” switch between them.
The number of active threads depends on their resource requirements (registers, shared memory), or it hits the upper limit set by your particular GPU's compute capability (e.g. a maximum of 1024 threads per SM; the exact figure varies by compute capability, and you have N SMs on your GPU).
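To make the "whichever limit you hit first" idea concrete, here is a rough sketch in plain C++ (host-side arithmetic only, no CUDA calls). The struct names, field names and the function `resident_threads_per_sm` are made up for illustration, and the limits you feed in are assumptions you would look up for your own card. It also ignores register and shared memory allocation granularity, so treat the result as an upper bound rather than the real occupancy.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-SM limits; real values depend on the compute capability.
struct SmLimits {
    int max_threads;       // max resident threads per SM
    int max_blocks;        // max resident blocks per SM
    int registers;         // 32-bit registers available per SM
    int shared_mem_bytes;  // shared memory available per SM
};

// Hypothetical resource usage of one kernel launch configuration.
struct KernelUsage {
    int threads_per_block;
    int regs_per_thread;
    int shared_mem_per_block;  // bytes, 0 if the kernel uses none
};

// Upper bound on resident threads per SM. Only whole blocks can be resident,
// and the tightest of the thread, block, register and shared memory limits wins.
int resident_threads_per_sm(const SmLimits& sm, const KernelUsage& k) {
    assert(k.threads_per_block > 0 && k.regs_per_thread > 0);
    int blocks = std::min(sm.max_blocks, sm.max_threads / k.threads_per_block);
    blocks = std::min(blocks, sm.registers / (k.regs_per_thread * k.threads_per_block));
    if (k.shared_mem_per_block > 0) {
        blocks = std::min(blocks, sm.shared_mem_bytes / k.shared_mem_per_block);
    }
    return blocks * k.threads_per_block;
}
```

For example, with assumed limits of 2048 threads, 32 blocks and 65536 registers per SM, a kernel with 256 threads per block and 64 registers per thread is register-bound: only 4 blocks fit, so 1024 threads are resident even though the thread limit alone would allow 2048.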
The number of threads executing in each clock cycle should be roughly equal to the total number of FPUs/SPs/“CUDA cores” on your device (about 3500 on your card), so the number of warps executing per cycle is #warps = NbCores / 32, since a warp is 32 threads.
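The same back-of-the-envelope arithmetic as a small C++ sketch. The function names are made up for illustration, and the core count, SM count and per-SM thread limit in the usage note are assumed example values, not numbers read from your device.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per warp

// Warps that can execute in one clock cycle if every core is busy.
int warps_executing_per_cycle(int cuda_cores) {
    assert(cuda_cores >= 0);
    return cuda_cores / kWarpSize;
}

// Warps resident on the whole device, i.e. the pool the schedulers pick from.
int resident_warps(int sm_count, int resident_threads_per_sm) {
    assert(sm_count >= 0 && resident_threads_per_sm >= 0);
    return sm_count * resident_threads_per_sm / kWarpSize;
}
```

With an assumed 3584 cores, `warps_executing_per_cycle(3584)` gives 112 warps per cycle. If that card had 28 SMs with 2048 resident threads each, `resident_warps(28, 2048)` gives 1792 warps, so there are many more active warps than warps executing at any instant, which is what lets the hardware hide latency by switching between them.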