Is there a way to allocate priority among different ExecutionContexts in TensorRT?

Hello,

I want to assign priorities in TensorRT so that higher-priority work can preempt lower-priority work on the GPU.
As far as I know, a stream can be given a priority by creating it with cuStreamCreateWithPriority(). Also, stream priorities are scoped to a single CUDA context [1].
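
For concreteness, here is a minimal sketch of the stream setup I have in mind, using the runtime-API equivalent cudaStreamCreateWithPriority() (the valid priority range is device-dependent, so it is queried first):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // The supported priority range is device-dependent; query it first.
    // Numerically lower values mean higher priority.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
    std::printf("least (lowest) = %d, greatest (highest) = %d\n",
                leastPriority, greatestPriority);

    // One high-priority and one low-priority stream.
    cudaStream_t streamHigh, streamLow;
    cudaStreamCreateWithPriority(&streamHigh, cudaStreamNonBlocking, greatestPriority);
    cudaStreamCreateWithPriority(&streamLow, cudaStreamNonBlocking, leastPriority);

    // ... enqueue work on the two streams here ...

    cudaStreamDestroy(streamHigh);
    cudaStreamDestroy(streamLow);
    return 0;
}
```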

This raises a question.
A CUDA context (CUcontext) and a TensorRT execution context (IExecutionContext) are different things.
Thus,

  1. Can preemption take place between streams with different priorities on different ExecutionContexts?
    For example, if ExecutionContext A uses a high-priority stream a and ExecutionContext B uses a low-priority stream b, can A preempt B? (A sketch of this setup follows the list.)

  2. If the answer to #1 is yes, does the same hold across different processes?
    For instance, if ExecutionContext A uses a high-priority stream a in application 1 and ExecutionContext B uses a low-priority stream b in application 2, can A preempt B?

  3. If the answer to #2 is no, is there a way to assign different priorities to different processes?

  4. In addition, is there any relationship between CUcontext and IExecutionContext in terms of priority assignment?
    Different CUcontexts cannot run concurrently, i.e., a GPU never runs kernels from two or more contexts simultaneously [1]. However, IExecutionContexts can run concurrently [2-3]. In fact, I confirmed with an experiment and Nsight Systems that different ExecutionContexts do run simultaneously on different streams, and it gives a great improvement. So the execution behaviors differ; how do they differ in terms of priority allocation?

Thanks

[1] GPU sharing among different application with different CUDA context - NVIDIA Developer Forums
[2] Can I inference two engine simultaneous on jetson using TensorRT? - NVIDIA Developer Forums
[3] Multiple concurrent Execution Contexts? - NVIDIA Developer Forums


Hi @urmydata,
You can use the stream that you created with priority for inference.
There is no behavior different from any other CUDA app. For more info about how this works, see:

Thanks!