Prioritization of GPU time between CUDA and DirectX

I have two processes. Process 1 renders a simulation using DirectX11, while process 2 performs calculations using CUDA. Naively running both processes in parallel causes unacceptable stuttering of the simulation, as process 2 steals too much GPU time from the simulation.

I want to maintain a stable 30fps for the simulation while performing CUDA calculations whenever process 1 is idle and not using DirectX. How can I configure the system to ensure that process 1 (or DirectX) always gets GPU priority when requested?

Alternatively, is it possible to artificially slow down the CUDA calculations by inserting sleep commands to give process 1 more GPU time?

I’m not aware of any settings or any method to tell the GPU to adjust the priority of graphics work relative to CUDA/compute work.

It is certainly possible to slow down the rate at which you issue CUDA kernels by modifying your CUDA code, and such throttling might improve the behavior you are observing, though I can't say for certain.
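As a minimal sketch of what such throttling could look like (the kernel name `computeStep`, the launch configuration, and the 10 ms sleep interval are all placeholders, not anything from your code), you could synchronize after each launch and sleep before issuing the next one:

```cpp
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

// hypothetical placeholder kernel standing in for one step of the computation
__global__ void computeStep(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void throttledLoop(float *d_data, int n, int steps)
{
    for (int s = 0; s < steps; ++s) {
        computeStep<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaDeviceSynchronize();  // wait for this kernel to finish before pausing
        // pause between launches so the GPU is free for graphics work;
        // the 10 ms interval is an arbitrary starting point to tune
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

The synchronize-then-sleep pattern trades throughput for idle gaps on the GPU; how much headroom the renderer actually gains from those gaps is something you would have to measure.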

Also, if I were attempting something like this, I would prefer trying it on a cc6.x (Pascal) or later GPU, as these GPUs have better support for preemption. But I don't know whether it would make an actual difference in any particular case.

Finally, as another option, you might do some careful profiling to determine the duration distribution of your GPU kernels. By structuring your GPU work issuance so as to limit the worst-case kernel duration to something on the order of 0.1 seconds or less, you may see some improvement in the responsiveness of the UI.
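As a rough illustration of that idea (the kernel, the names, and the chunk size here are all hypothetical), a single long-running grid can be broken into several shorter launches over subranges of the data, which caps how long any one kernel can occupy the GPU:

```cpp
#include <algorithm>
#include <cuda_runtime.h>

// hypothetical kernel that processes data[offset .. offset + count)
__global__ void processChunk(float *data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) data[offset + i] += 1.0f;
}

void launchInChunks(float *d_data, int n)
{
    const int chunk = 1 << 20;  // tune so each launch stays well under ~0.1 s
    for (int offset = 0; offset < n; offset += chunk) {
        int count = std::min(chunk, n - offset);
        processChunk<<<(count + 255) / 256, 256>>>(d_data, offset, count);
        // between launches, the driver has an opportunity to schedule
        // pending graphics work from the other process
    }
}
```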

I concur with the recommendation of finer granularity for the compute kernels. If I recall correctly, conventional wisdom on UI responsiveness is that delays below 20 milliseconds are generally imperceptible to humans, so try to keep the runtime of each compute kernel below that. Also, check that data transfers between host and device (in either direction) stay below this time limit.
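If a single large transfer exceeds that budget, one option (sketched here with made-up names and a piece size you would need to tune for your PCIe bandwidth) is to split it into smaller copies so that no individual transfer runs longer than a few milliseconds:

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// copy n floats host->device in pieces small enough that each
// individual transfer stays well under the ~20 ms budget
void copyInPieces(float *d_dst, const float *h_src, size_t n)
{
    const size_t piece = 4u << 20;  // 4M floats (~16 MB), roughly 1-2 ms on PCIe gen3 x16
    for (size_t offset = 0; offset < n; offset += piece) {
        size_t count = (n - offset < piece) ? (n - offset) : piece;
        cudaMemcpy(d_dst + offset, h_src + offset,
                   count * sizeof(float), cudaMemcpyHostToDevice);
    }
}
```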

Another approach worth trying in conjunction with finer granularity of compute work is to adjust the relative priority of the two processes via operating-system-specific means, giving the compute process lower priority than the rendering process. The higher-priority process then has, on average, more chances to submit work to the GPU.
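On Windows (which the DirectX 11 renderer implies), a minimal sketch of the compute process demoting itself at startup might look like the following; alternatively, you could set the priority externally via Task Manager or by launching with `start /belownormal`:

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    // demote this (compute) process so the renderer is favored for CPU time,
    // which indirectly affects how quickly each process can submit GPU work
    if (!SetPriorityClass(GetCurrentProcess(), BELOW_NORMAL_PRIORITY_CLASS)) {
        std::fprintf(stderr, "SetPriorityClass failed: %lu\n", GetLastError());
    }
    // ... rest of the CUDA compute application ...
    return 0;
}
```

Note that OS process priority governs CPU scheduling, not GPU scheduling directly, so this mainly helps when combined with the finer-grained work issuance described above.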

A “big mallet” approach would be to use a faster GPU.