Concurrent kernel execution on TX2/AGX


My application runs on a Jetson TX2 target. It has a detector and a tracker running concurrently on two streams. I have profiled the application using Nsight Systems: the detector's tasks are scheduled on thread ID 26335 and execute on stream 17, while the tracker's tasks are scheduled on thread ID 26340 and execute on stream 44 (see figure).

According to, it was advised that if multiple jobs are to be executed concurrently, each of them needs to be launched on a different CUDA stream.
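A minimal sketch of that advice, assuming the CUDA runtime API: two independent kernels are enqueued on two separate streams. `spin`, `detectorStream`, `trackerStream` and `launch_on_two_streams` are hypothetical names standing in for the detector and tracker work. The streams are created with `cudaStreamNonBlocking` so that they do not implicitly synchronize with the legacy default stream (stream 0); work issued to stream 0 can otherwise serialize the other streams.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): each thread busy-waits for roughly `cycles` GPU clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Prints the failing call and returns 1 from the enclosing function.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(err_));                     \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Launches one kernel per stream; returns 0 on success.
int launch_on_two_streams(long long cycles) {
    cudaStream_t detectorStream, trackerStream;  // hypothetical names
    CHECK(cudaStreamCreateWithFlags(&detectorStream, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&trackerStream, cudaStreamNonBlocking));

    // One small block per launch leaves SMs free for the other stream's kernel.
    spin<<<1, 32, 0, detectorStream>>>(cycles);
    spin<<<1, 32, 0, trackerStream>>>(cycles);
    CHECK(cudaGetLastError());

    CHECK(cudaStreamSynchronize(detectorStream));
    CHECK(cudaStreamSynchronize(trackerStream));
    CHECK(cudaStreamDestroy(detectorStream));
    CHECK(cudaStreamDestroy(trackerStream));
    return 0;
}

int main() {
    // 1e8 cycles is on the order of 0.1 s; the exact time depends on the GPU clock.
    int rc = launch_on_two_streams(100000000LL);
    std::printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Each launch deliberately uses a single block of 32 threads so that neither kernel can occupy the whole GPU; under Nsight Systems the two launches should then appear overlapped. Large kernels that fill every SM can still serialize even when they sit in different streams.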

I believe I’m doing so in my application, yet I do not observe any concurrent execution between the kernels “trtwell_fp16x2_hcudnn_fp16x2_128x64_relu_medium_nn” and “pyrDown”, or between “poolCHW_RS3_UV2_PQT_kernel” and “pyrDown”.
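Before digging further, it may help to confirm what the device itself reports. This is a small sketch, assuming device 0 is the integrated GPU; `supports_concurrent_kernels` is a hypothetical helper name. It queries `cudaDevAttrConcurrentKernels` and prints the SM count.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: 1 if device `dev` reports concurrent-kernel support,
// 0 if it does not, -1 if the attribute query itself fails.
int supports_concurrent_kernels(int dev) {
    int value = 0;
    if (cudaDeviceGetAttribute(&value, cudaDevAttrConcurrentKernels, dev) != cudaSuccess)
        return -1;
    return value != 0 ? 1 : 0;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    // The SM count bounds how many blocks from different kernels can be resident at once.
    std::printf("%s (sm_%d%d): concurrentKernels=%d, SMs=%d\n",
                prop.name, prop.major, prop.minor,
                supports_concurrent_kernels(0), prop.multiProcessorCount);
    return 0;
}
```

If the device reports support but the profiler still shows no overlap, one possible cause is that one of the kernels launches enough blocks to occupy every SM, so the other kernel cannot start until those blocks drain.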

My question is: how can I execute tasks from multiple streams concurrently on the TX2/AGX platform?



You will need to put the tasks into one application (a single process), but on different threads.
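A minimal sketch of that layout, assuming the detector and tracker can each be reduced to a kernel launch: two `std::thread`s in the same process each create their own non-blocking stream. Because both threads use the device's primary context, their kernels are eligible to overlap. `worker`, `run_two_threads` and the `spin` kernel are hypothetical placeholders, not part of the original application.

```cuda
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): busy-waits for roughly `cycles` GPU clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical worker standing in for the detector or tracker pipeline.
void worker(long long cycles, std::atomic<int>& failures) {
    // Every thread of this process binds to the same primary context of device 0.
    if (cudaSetDevice(0) != cudaSuccess) { ++failures; return; }
    cudaStream_t stream;
    if (cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking) != cudaSuccess) {
        ++failures;
        return;
    }
    spin<<<1, 32, 0, stream>>>(cycles);
    if (cudaGetLastError() != cudaSuccess || cudaStreamSynchronize(stream) != cudaSuccess)
        ++failures;
    cudaStreamDestroy(stream);
}

// Runs two workers on separate threads; returns the number of failed workers.
int run_two_threads(long long cycles) {
    std::atomic<int> failures{0};
    std::thread detector(worker, cycles, std::ref(failures));
    std::thread tracker(worker, cycles, std::ref(failures));
    detector.join();
    tracker.join();
    return failures.load();
}

int main() {
    int failures = run_two_threads(100000000LL);
    std::printf("failures: %d\n", failures);
    return failures == 0 ? 0 : 1;
}
```

Giving each thread its own stream keeps the two pipelines independent in the command queue; a single shared stream would serialize them regardless of how many threads issue work.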

This is because each CPU process creates its own CUDA context. If the CUDA tasks running on the GPU belong to different processes, they run in different CUDA contexts. GPU resources are time-sliced between different CUDA contexts, which means kernels from different contexts can't run in parallel.
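One way to check the single-context side of this is to confirm that two threads of the same process see the same current context. This is an illustrative sketch under assumptions: it calls the driver API's `cuCtxGetCurrent`, so it must be linked against the driver library (e.g. `nvcc ctx.cu -lcuda`), and `current_context` and `threads_share_context` are hypothetical helper names.

```cuda
#include <cstdio>
#include <thread>
#include <cuda.h>          // driver API: cuCtxGetCurrent (link with -lcuda)
#include <cuda_runtime.h>

// Returns the context the runtime made current on the calling thread, or nullptr on error.
CUcontext current_context() {
    if (cudaSetDevice(0) != cudaSuccess) return nullptr;
    if (cudaFree(0) != cudaSuccess) return nullptr;  // forces primary-context initialization
    CUcontext ctx = nullptr;
    if (cuCtxGetCurrent(&ctx) != CUDA_SUCCESS) return nullptr;
    return ctx;
}

// True if two threads of this process end up on the same CUDA context.
bool threads_share_context() {
    CUcontext a = nullptr, b = nullptr;
    std::thread detector([&a] { a = current_context(); });
    std::thread tracker([&b] { b = current_context(); });
    detector.join();
    tracker.join();
    return a != nullptr && a == b;
}

int main() {
    bool shared = threads_share_context();
    std::printf("threads share one context: %s\n", shared ? "yes" : "no");
    return shared ? 0 : 1;
}
```

Running the same program twice as separate processes would give each process its own context, which is the case in which the GPU time-slices between them.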