My application needs to process images coming from a camera as fast as possible. The per-frame processing is short and fits within a frame period, so that part works fine.
However, I would like to add a long, compute-intensive task that also uses CUDA on a separate thread. This task is not required to run at the frame rate. The result is only needed every few frames.
I don’t want this long task to monopolize the GPU and interfere with the real-time processing, so I have broken it into several steps driven by a state machine. After every incoming frame I advance the state machine, which performs a small enough chunk of work before returning the GPU to the rest of the system.
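For readers who land on this thread, here is a minimal sketch of that chunked state-machine pattern. All kernel names, launch configurations, and sizes below are illustrative assumptions, not the actual code:

```cuda
// Hypothetical sketch: each frame, the real-time kernel runs first,
// then the long-running task advances by one small chunk so no single
// launch can hold the GPU for longer than a fraction of a frame period.
__global__ void processFrame(const unsigned char *frame, float *out) { /* ... */ }
__global__ void longTaskChunk(float *data, int chunkStart, int chunkSize) { /* ... */ }

struct LongTaskState {
    int cursor    = 0;        // how much of the long task is done so far
    int totalWork = 1 << 20;  // total work items (assumed)
    int chunkSize = 1 << 14;  // tuned so one chunk fits well inside a frame period
};

void onFrame(const unsigned char *frame, float *out, float *taskData,
             LongTaskState &st, cudaStream_t stream)
{
    // Real-time path: always runs, stays within the frame budget.
    processFrame<<<256, 256, 0, stream>>>(frame, out);

    // Advance the background task by exactly one chunk, then return.
    if (st.cursor < st.totalWork) {
        longTaskChunk<<<64, 256, 0, stream>>>(taskData, st.cursor, st.chunkSize);
        st.cursor += st.chunkSize;
    } else {
        // Result is complete: consume it here, then reset st.cursor
        // to restart the long task for the next cycle.
    }
}
```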
This works well for me, but I was wondering if there was a better or built-in method/paradigm to manage this kind of behaviour in CUDA. Are there synchronization or “yield” mechanisms?
CUDA stream priority may be useful. For it to help here, your long-running task will need to “cycle through” threadblocks at some reasonable rate, to give the block scheduler an opportunity to insert higher-priority blocks.
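A minimal sketch of setting that up with the runtime API (stream names are placeholders; note that numerically lower values mean higher priority):

```cuda
#include <cuda_runtime.h>

int least, greatest;
// Query the valid priority range for this device.
cudaDeviceGetStreamPriorityRange(&least, &greatest);

cudaStream_t frameStream, taskStream;
// Real-time camera processing on the highest-priority stream.
cudaStreamCreateWithPriority(&frameStream, cudaStreamNonBlocking, greatest);
// Long-running task on the lowest-priority stream: when both streams have
// blocks pending, the scheduler favors frameStream's blocks as the
// low-priority task's threadblocks retire.
cudaStreamCreateWithPriority(&taskStream, cudaStreamNonBlocking, least);
```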
In a multiprocess setting, there are also possibilities using CUDA MPS to assign resources to processes.
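For the multi-process case, a sketch of how MPS is typically enabled, with an execution-resource limit on the long-running client (the percentage value is an arbitrary example; consult the MPS documentation for your setup):

```
# Start the MPS control daemon (one per machine/GPU configuration).
nvidia-cuda-mps-control -d

# Launch the long-running process with a cap on the fraction of SM
# execution resources it may use, leaving headroom for the real-time app.
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=30 ./long_running_task
```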
@Robert_Crovella Purely from a performance perspective, would there be any reason to assume that these alternatives would be superior to the asker’s current scheme? I suspect the answer is “no”, but have no deep insight into the trade-offs from a performance angle.
Thanks for the quick feedback. I was really just trying to calibrate my own thinking: while the built-in methods have the potential to improve flexibility and ease of use, they are unlikely to lead to improved performance.
Understood, and I would expect as much. Thank you nevertheless for the clarification. I was indeed looking to improve flexibility and ease of use - “user friendliness”, in a way.
Thanks to all for the constructive conversation! I don’t post often on this forum, but it’s nice to see an active community!