Can GPU usage be controlled through CUDA code?

My CUDA code is driving GPU usage to 100%.
Is there a way to control the GPU usage?

Why do you need this? And how is the usage calculated?

Background: I'm working with CUDA on the Windows platform.

I have a requirement where the GPU also needs to be used for rendering.
So if I reduce the GPU usage of my CUDA code, I assume the rendering can use the GPU (I may be wrong; I'm new to CUDA).

I measured the GPU usage with Performance Monitor.

Does anyone know how Performance Monitor calculates it? AFAIK it's part of the Windows 10 Task Manager.

So, your question is how to run rendering and a CUDA program simultaneously. Now you should describe what you mean by rendering: is it some 3D modeling program, or just using the video card to display your desktop?

The rendering is 3D modeling.

Great. So you need to run two GPGPU programs simultaneously? One is written in CUDA; is the second one (the 3D renderer) also CUDA, or OpenCL?

I know that two CUDA programs can run simultaneously via the MPS service, which essentially time-slices their kernels, so each kernel call should be reasonably fast (100 ms or less).

I assume the MPS service is only supported on Linux.
Can you suggest some alternative settings so that CUDA does not utilize the entire GPU, leaving it available for OpenGL rendering at the same time?

So, it's OpenGL? Can you try using shorter kernels (100 ms or less)? My GPU programs don't prevent the desktop environment from working.

Yes, there are hard ways to force a CUDA program to leave some SMs free. But those shouldn't be discussed before you try the simpler approaches (and I'm not sure they would solve your problem anyway).

Well-calculated pauses between consecutive kernel launches can give 3D rendering and other screen updates a chance to run in between CUDA work, giving each a fair share of the GPU.

If your kernel has a long runtime, cut the computation into small slices (each just a few milliseconds long). Optionally add a CPU sleep after each completed kernel execution so the GPU stays idle for a well-defined time. By adjusting the ratio of sleep time to kernel runtime, you can control the CUDA-related GPU utilization quite accurately. Tools like cuda-z or nvidia-smi average the instantaneous GPU utilization (a binary on/off state!) over a short time window and report a value between 0 and 100% as a result.
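A minimal sketch of that slicing-plus-sleep pattern might look like this. All names here (`workKernel`, `sliceSize`, the 2 ms sleep) are illustrative choices, not anything from the thread; the slice size and sleep duration would need tuning against your actual kernel runtime:

```cuda
// Sketch: split a large job into short kernel launches with CPU sleeps in
// between, so other GPU clients (e.g. an OpenGL renderer) can run.
#include <cuda_runtime.h>
#include <chrono>
#include <thread>

// Hypothetical placeholder kernel processing one slice of a large array.
__global__ void workKernel(float *data, int offset, int sliceSize)
{
    int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < offset + sliceSize)
        data[i] *= 2.0f;  // stand-in for the real computation
}

int main()
{
    const int N = 1 << 24;
    const int sliceSize = 1 << 18;  // pick small enough to finish in a few ms
    float *d_data;
    cudaMalloc(&d_data, N * sizeof(float));

    for (int offset = 0; offset < N; offset += sliceSize) {
        int blocks = (sliceSize + 255) / 256;
        workKernel<<<blocks, 256>>>(d_data, offset, sliceSize);
        cudaDeviceSynchronize();  // wait for this slice to complete

        // Leave the GPU idle for a defined time. With, say, a 2 ms kernel
        // and a 2 ms sleep, averaged utilization lands near 50%.
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
    }

    cudaFree(d_data);
    return 0;
}
```

The utilization you then see in nvidia-smi is roughly kernel time divided by (kernel time + sleep time), since those tools average the on/off busy state over a short window.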

The typical kernel runtime (plus the optional sleep) should not exceed 1/60th of a second (about 16.7 ms) if you want other software to be able to do 3D rendering at a smooth 60 FPS while your CUDA application computes.

Pascal devices might already be able to preempt long-running kernels with other work, but I am not aware of any APIs to control this behavior from an application.