I’m facing a problem with how the computing power of my GPU is used.
Here is my problem: I have a notebook with a GTX 460M that I’m using for a demonstration.
During the demonstration I run 3 different programs, each using its own OpenCL kernel, so the GPU has to drive the Windows 7 graphical interface and run 3 kernels at the same time. The amount of work is obviously significant, and the graphical interface becomes slow. To keep the interface fluid enough, I decided to reduce the number of threads each kernel can use when calling clEnqueueNDRangeKernel. The idea behind this is: if I have 100 cores and each kernel uses 25 cores, I still have 25 cores left for the system's graphical interface.
My idea also implies that running a single kernel on 25 cores should produce about a 25% GPU workload.
But that is not what happens: if I run a single kernel, GPU usage reaches 99% even with a single work-item. How is this possible?
To check the GPU workload I use GPU-Z.
Is there a way to limit the core usage? What is the mechanism behind clEnqueueNDRangeKernel and the global_work_size setting?
Thank you for your help.
First, you have 192 cores on a GTX 460M (see NVIDIA's specifications).
With the current OpenCL drivers from NVIDIA, there is no way to run multiple OpenCL kernels simultaneously: the GPU is simply locked by each kernel, exactly as it was with the old CUDA 1.0 drivers. So whatever number of threads you launch, the GPU will be unavailable for the duration of the kernel; that busy state is what GPU-Z reports, not real per-core usage.
You’d better change your strategy:
Launch at least 192 cores × 24 threads (4608 work-items) to exploit your GPU at a minimum
Make your kernel run as fast as possible; if it cannot finish its work within 1/50th of a second, …
Use a persistence strategy: stop your kernel and relaunch it, preserving its state between the two launches (the context should be kept!)
I see. So by reducing the resources available to each kernel I'm simply going in the wrong direction: each kernel becomes slower, which makes the graphical interface degradation worse because the GPU stays busy for longer.
It is the same strategy used when programming OpenGL: to keep the program fluid, you have to make sure the work can be executed in a fraction of time compatible with the frame rate.
Thank you for your help. In the future I'll have to pay more attention to the driver implementation. Is the information you gave me available on the NVIDIA website?