I am running inference on an NVIDIA Jetson Nano, which has the Tegra X1 chip, and I am wondering whether multiple CUDA streams are supported there in the GPU hardware. For example, I want to run a heavy inference network taking 300 ms while concurrently running another TensorRT inference network that only takes 50 ms. The Nsight profiler shows that my second, concurrent network waits for the first one, which is already running, to finish. I have a separate CUDA stream per network. I also tried cudaStreamCreateWithPriority, but that did not help: the lowest priority was 0 and the highest was -1, yet the result was the same. The running network was not preempted by the newly started, higher-priority inference.
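For reference, here is roughly how I create the prioritized streams (a minimal sketch; the stream names are placeholders and error checking is omitted):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Query the priority range supported by the device.
    // On the Jetson Nano this returns 0 (lowest) and -1 (highest).
    int lowest, highest;
    cudaDeviceGetStreamPriorityRange(&lowest, &highest);
    printf("stream priorities: lowest=%d, highest=%d\n", lowest, highest);

    // One stream per network: the heavy 300 ms net on the low-priority
    // stream, the light 50 ms net on the high-priority stream.
    cudaStream_t heavyStream, lightStream;
    cudaStreamCreateWithPriority(&heavyStream, cudaStreamNonBlocking, lowest);
    cudaStreamCreateWithPriority(&lightStream, cudaStreamNonBlocking, highest);

    // ... enqueue each network's work on its own stream ...

    cudaStreamDestroy(heavyStream);
    cudaStreamDestroy(lightStream);
    return 0;
}
```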
Is what I am trying to do here possible at all on the NVIDIA Jetson Nano? How many CUDA streams are possible? I saw that the device properties report concurrentKernels = 1 (concurrent kernel execution is supported) and asyncEngineCount = 1 (a single async copy engine).
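This is how I read those properties (a straightforward device query, nothing assumed beyond device 0 being the Nano's integrated GPU):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // concurrentKernels is a boolean capability flag (1 = kernels from
    // different streams may run concurrently); asyncEngineCount is the
    // number of async copy engines available for overlapping transfers.
    printf("concurrentKernels: %d\n", prop.concurrentKernels);
    printf("asyncEngineCount:  %d\n", prop.asyncEngineCount);
    return 0;
}
```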
I found indications in the CUDA part of the forum that it can't be done. The docs say a higher-priority CUDA stream "may" preempt, but preemption isn't guaranteed, so I will have to find another solution.
I run one process; each TensorRT network has its own execution context and its own cudaStream, all memory is allocated with cudaMallocHost (pinned host memory), and inference is enqueued for each stream from a separate thread.
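In code, the setup looks roughly like this (a sketch only: engine deserialization and buffer allocation are omitted, and the function and variable names are mine):

```cpp
#include <cuda_runtime.h>
#include <NvInfer.h>
#include <thread>

// One execution context + one stream per network, enqueued from its own thread.
void runNetwork(nvinfer1::ICudaEngine* engine, void** bindings) {
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Asynchronously enqueue the inference on this network's own stream.
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    context->destroy();
}

void launchBoth(nvinfer1::ICudaEngine* heavyEngine, void** heavyBindings,
                nvinfer1::ICudaEngine* lightEngine, void** lightBindings) {
    // One thread per network, as in my setup.
    std::thread t1(runNetwork, heavyEngine, heavyBindings);
    std::thread t2(runNetwork, lightEngine, lightBindings);
    t1.join();
    t2.join();
}
```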
After some more testing with Nsight: the networks do run concurrently as stated in the docs, but real preemption is not possible, if I understood the other posts in the CUDA sections correctly. So as a workaround we are now aligning the inferences in time so they interfere with each other as little as possible.
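For anyone hitting the same issue, the time-alignment workaround looks roughly like this (a sketch under assumptions: the 400 ms period and the runHeavy/runLight function names are placeholders for the actual TensorRT enqueue-and-sync calls):

```cpp
#include <chrono>
#include <thread>

// Instead of relying on preemption, launch the light (50 ms) network in the
// gap after the heavy (300 ms) network completes, at a fixed phase offset.
void scheduleAligned(void (*runHeavy)(), void (*runLight)()) {
    using clock = std::chrono::steady_clock;
    const auto period = std::chrono::milliseconds(400);  // assumed frame budget

    while (true) {
        auto start = clock::now();
        runHeavy();   // blocks ~300 ms (enqueue + synchronize)
        runLight();   // runs in the remaining ~100 ms window
        std::this_thread::sleep_until(start + period);  // keep the phase fixed
    }
}
```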
Thanks for the replies!