Jetson Nano concurrent streams

Description

I am running inferences on an NVIDIA Jetson Nano, which has the Tegra X1 chip, and I am wondering whether multiple CUDA streams are supported there in the GPU hardware. For example, I want to run a heavy inference network taking 300 ms, but I also want to be able to concurrently run another TensorRT inference network that only takes 50 ms. Right now the Nsight profiler shows that my second, concurrent network waits for the first one, which is already running, to finish. I have a different CUDA stream created per network. I also tried cudaStreamCreateWithPriority, but that did not work: the lowest priority was 0 and the highest was -1, yet the result was the same. The running network did not get preempted by the newly started, higher-priority inference.
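Roughly, this is how I set up the priority streams (a simplified sketch; the stream names are just for illustration, not my actual code):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Query the priority range supported by the device.
    // On the Jetson Nano this reports leastPriority = 0, greatestPriority = -1.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
    printf("stream priorities: least=%d greatest=%d\n", leastPriority, greatestPriority);

    // One stream per network: the heavy 300 ms network on the low-priority
    // stream, the 50 ms network on the high-priority stream.
    cudaStream_t heavyStream, lightStream;
    cudaStreamCreateWithPriority(&heavyStream, cudaStreamNonBlocking, leastPriority);
    cudaStreamCreateWithPriority(&lightStream, cudaStreamNonBlocking, greatestPriority);

    // ... enqueue the TensorRT inferences on heavyStream / lightStream ...

    cudaStreamDestroy(heavyStream);
    cudaStreamDestroy(lightStream);
    return 0;
}
```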

Is what I am trying to do here possible at all on the NVIDIA Jetson Nano? How many CUDA streams are possible? I saw that concurrentKernels is 1 and asyncEngineCount is also 1 in the device properties.

Environment

Jetpack: 4.6
GPU Type: Jetson Nano, Tegra X1

Hi,

This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we will move this post to the Jetson related forum.

Thanks!

I found indications in the CUDA part of the forum that it can't be done. The docs say a kernel "may" be preempted when work is launched on a higher-priority CUDA stream, but preemption isn't guaranteed, so I will have to find another solution.

Hi,
Do you run two processes, or a single process with two threads? Please check
Concurrent kernel execution on TX2/AGX - #2 by AastaLLL

We would suggest running a single process with two threads.

I run one process; each TensorRT network is in a separate context and each has its own cudaStream, all memory is allocated using cudaMallocHost, and then inference is enqueued for each stream in separate threads.
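In simplified form, the setup looks roughly like this (a sketch with illustrative names, assuming the engines and binding buffers are created elsewhere and that enqueueV2 is used, as on JetPack 4.6 / TensorRT 8):

```cpp
#include <NvInfer.h>
#include <cuda_runtime.h>
#include <thread>

// One worker per network: its own execution context, its own stream.
void runInference(nvinfer1::ICudaEngine* engine, void** bindings, int iterations) {
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

    for (int i = 0; i < iterations; ++i) {
        // Asynchronously enqueue the whole network on this thread's stream.
        context->enqueueV2(bindings, stream, nullptr);
        cudaStreamSynchronize(stream);
    }

    cudaStreamDestroy(stream);
    context->destroy();  // deprecated in newer TensorRT, still valid on JetPack 4.6
}

// Usage: one std::thread per network, both inside the same process, e.g.
//   std::thread heavy(runInference, heavyEngine, heavyBindings, 100);
//   std::thread light(runInference, lightEngine, lightBindings, 600);
```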
After some more testing with Nsight: it does run concurrently as stated in the docs, but true preemption is not possible, if I understood the other posts in the CUDA section correctly. So as a workaround we are now aligning the inferences in time so that they interfere with each other as little as possible.
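As an illustration of what I mean by aligning the inferences in time (just a sketch of the idea, not our actual scheduler): the light network is only enqueued once an event recorded after the heavy network's enqueue has completed, so the two never contend for the GPU at the same time.

```cpp
#include <cuda_runtime.h>

// Illustrative names only; the real scheduling logic lives in the application.
static cudaEvent_t heavyDone;

void initScheduling() {
    // Timing disabled: the event is only used for completion queries.
    cudaEventCreateWithFlags(&heavyDone, cudaEventDisableTiming);
}

void enqueueHeavy(cudaStream_t heavyStream) {
    // ... enqueue the 300 ms network on heavyStream ...
    cudaEventRecord(heavyDone, heavyStream);
}

bool tryEnqueueLight(cudaStream_t lightStream) {
    // Only start the 50 ms network once the heavy network has finished.
    if (cudaEventQuery(heavyDone) == cudaErrorNotReady) {
        return false;  // heavy network still running; try again later
    }
    // ... enqueue the 50 ms network on lightStream ...
    return true;
}
```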
Thanks for the replies!


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.