Our C++/CUDA application uses CPU multithreading, and each CPU thread has to run 4 CUDA kernels.
When we increase the number of CPU threads, GPU usage increases by approximately 11-12% for each CPU thread launched. That is normal and expected.
The problem appears when we run more than 6 CPU threads: GPU usage gets stuck at 65-70%, and performance starts to decrease because the additional threads do not produce any further increase in GPU usage. It looks as if there is some kind of limit in the operating system (Windows), or as if the GPU has reached its maximum number of active threads (I don't understand this very well yet).
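To make the numbers concrete, here is a small host-only C++ sketch of the arithmetic behind this observation. The function names are hypothetical and are not part of our application; the sketch simply assumes that every CPU thread keeps adding the same share of GPU usage and that nothing but the 100% ceiling limits it.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from our application): projected GPU usage,
// ASSUMING each CPU thread adds the same share and only the 100% ceiling
// limits the total.
double expected_gpu_usage_percent(int cpu_threads, double percent_per_thread) {
    assert(cpu_threads >= 0 && percent_per_thread >= 0.0);
    return std::min(100.0, cpu_threads * percent_per_thread);
}

// Smallest number of CPU threads at which the linear projection above
// reaches the given target usage (target must be within 0..100).
int threads_to_reach(double target_percent, double percent_per_thread) {
    assert(percent_per_thread > 0.0);
    assert(target_percent >= 0.0 && target_percent <= 100.0);
    int threads = 0;
    while (expected_gpu_usage_percent(threads, percent_per_thread) < target_percent) {
        ++threads;
    }
    return threads;
}
```

With about 11.5% per thread, `expected_gpu_usage_percent(6, 11.5)` gives 69, which matches the 65-70% we measure at 6 threads. Under the same linear assumption, `threads_to_reach(100.0, 11.5)` gives 9, so we would expect usage to keep growing until roughly 9 threads, yet it stops growing after 6.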
We are developing in Visual Studio.
Does it have to do with how the CUDA kernels are configured (number of blocks, number of threads)?
Thank you in advance.