My application runs on a Jetson TX2 target. It has a detector and a tracker running concurrently on two streams. I have profiled the application using Nsight Systems. The detector's tasks are scheduled on thread ID 26335 and execute on stream 17, while the tracker's tasks are scheduled on thread ID 26340 and execute on stream 44 (see figure).
According to https://devtalk.nvidia.com/default/topic/1024457/jetson-tx2/concurrent-task-execution-from-multiple-processes-on-jetson-tx2/, jobs that are meant to execute concurrently should each be launched on a different CUDA stream.
I believe I’m doing so in my application, yet I do not observe any concurrent execution between the kernels “trtwell_fp16x2_hcudnn_fp16x2_128x64_relu_medium_nn” and “pyrDown”, or between “poolCHW_RS3_UV2_PQT_kernel” and “pyrDown”.
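For reference, here is a minimal sketch of the launch pattern I understand that advice to mean. It is not my actual application code: `busyKernel`, `runTwoStreams`, the stream names, and the launch sizes are hypothetical stand-ins for the detector and tracker work, and both launches come from a single host thread for brevity, whereas my application uses two threads. As far as I understand, separate streams only permit overlap and do not guarantee it, so the grids here are deliberately kept small.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return 1 from the enclosing function if a CUDA runtime call fails.
// (Sketch only: resources are not released on the error path.)
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: a grid-stride loop with some arithmetic
// per element, so the kernel runs long enough to show up in a timeline.
__global__ void busyKernel(float *data, int n, int iters) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 0.999f + 1.0f;
        data[i] = v;
    }
}

// Launches the same kernel on two independent streams.
// Returns 0 on success, 1 if any CUDA call fails.
int runTwoStreams(int n, int iters) {
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
    std::printf("concurrentKernels = %d\n", prop.concurrentKernels);

    // Non-blocking streams do not synchronize implicitly with the
    // legacy default stream.
    cudaStream_t detStream, trkStream;
    CUDA_CHECK(cudaStreamCreateWithFlags(&detStream, cudaStreamNonBlocking));
    CUDA_CHECK(cudaStreamCreateWithFlags(&trkStream, cudaStreamNonBlocking));

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *detBuf = nullptr, *trkBuf = nullptr;
    CUDA_CHECK(cudaMalloc(&detBuf, bytes));
    CUDA_CHECK(cudaMalloc(&trkBuf, bytes));
    CUDA_CHECK(cudaMemsetAsync(detBuf, 0, bytes, detStream));
    CUDA_CHECK(cudaMemsetAsync(trkBuf, 0, bytes, trkStream));

    // One small block per launch, so neither kernel occupies the whole GPU.
    busyKernel<<<1, 128, 0, detStream>>>(detBuf, n, iters);
    busyKernel<<<1, 128, 0, trkStream>>>(trkBuf, n, iters);
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaStreamSynchronize(detStream));
    CUDA_CHECK(cudaStreamSynchronize(trkStream));

    CUDA_CHECK(cudaFree(detBuf));
    CUDA_CHECK(cudaFree(trkBuf));
    CUDA_CHECK(cudaStreamDestroy(detStream));
    CUDA_CHECK(cudaStreamDestroy(trkStream));
    return 0;
}

int main() {
    return runTwoStreams(1 << 16, 1000);
}
```

Built with `nvcc` and profiled the same way, a sketch like this should show whether two small kernels on different streams overlap on the device at all, which would help separate a stream setup problem from the kernels in my application simply being too large to share the GPU.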
My question is: how can I execute tasks from multiple streams concurrently on the TX2/AGX platform?