All cudaStreamSynchronize() calls hang in TensorRT threads on Jetson Orin

simon.wei · December 22, 2023, 2:59pm

Hi all,
All my TensorRT threads hang at cudaStreamSynchronize().
This is part of my backtrace (bt) in gdb:
Thread 35 (Thread 0xfffe590b5500 (LWP 3604513)):
#0 0x0000ffff88069938 in ioctl () from target:/lib/aarch64-linux-gnu/libc.so.6
#1 0x0000fffeef8d5e98 in ?? () from target:/lib/libnvrm_host1x.so
#2 0x0000fffeefdec0ac in ?? () from target:/lib/libcuda.so.1
#3 0x0000fffeefca2de8 in ?? () from target:/lib/libcuda.so.1
#4 0x0000fffeefcbae34 in ?? () from target:/lib/libcuda.so.1
#5 0x0000fffeefd16f54 in ?? () from target:/lib/libcuda.so.1
#6 0x0000fffeefd220d8 in ?? () from target:/lib/libcuda.so.1
#7 0x0000fffeefd35298 in ?? () from target:/lib/libcuda.so.1
#8 0x0000ffff841b1568 in ?? () from target:/apollo/bazel-bin/modules/rs_perception/trafficlight/tfl_nn/component/…/…/…/…/…/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudart.so.11.0
#9 0x0000ffff8420c974 in cudaStreamSynchronize () from target:/apollo/bazel-bin/modules/rs_perception/trafficlight/tfl_nn/component/…/…/…/…/…/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudart.so.11.0
…
…
Thread 33 (Thread 0xfffe5b1db500 (LWP 3604511)):
#0 0x0000ffff88069938 in ioctl () from target:/lib/aarch64-linux-gnu/libc.so.6
#1 0x0000fffeef8d5e98 in ?? () from target:/lib/libnvrm_host1x.so
#2 0x0000fffeefdec0ac in ?? () from target:/lib/libcuda.so.1
#3 0x0000fffeefca2de8 in ?? () from target:/lib/libcuda.so.1
#4 0x0000fffeefcbae34 in ?? () from target:/lib/libcuda.so.1
#5 0x0000fffeefd16f54 in ?? () from target:/lib/libcuda.so.1
#6 0x0000fffeefd220d8 in ?? () from target:/lib/libcuda.so.1
#7 0x0000fffeefd35298 in ?? () from target:/lib/libcuda.so.1
#8 0x0000ffff841b1568 in ?? () from target:/apollo/bazel-bin/modules/rs_perception/trafficlight/tfl_nn/component/…/…/…/…/…/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudart.so.11.0
#9 0x0000ffff8420c974 in cudaStreamSynchronize () from target:/apollo/bazel-bin/modules/rs_perception/trafficlight/tfl_nn/component/…/…/…/…/…/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudart.so.11.0
…
…

I have 8 streams running in parallel in 8 threads. All 8 threads hang at cudaStreamSynchronize(), and they all have the same backtrace.
How can I fix it?

thanks

AastaLLL · December 25, 2023, 6:13am

Hi,

How long do you wait for cudaStreamSynchronize() to finish?

By default, the CPU launches GPU tasks without waiting for them to finish.
After calling synchronize, the CPU is expected to block until all GPU tasks in the stream are done.

Thanks.

simon.wei · December 25, 2023, 6:28am

Hi AastaLLL,
I call cudaStreamSynchronize() after enqueueV3() to wait for the GPU to finish.
When it runs correctly, the wait takes about 10-50 ms before the result is returned. In the failing case, it is still hanging in the same place after 10 minutes of waiting.
Thanks.
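
For debugging a wait that normally takes 10-50 ms but sometimes never returns, one option is to replace the unbounded cudaStreamSynchronize() with a bounded poll, so the thread can log which stream is stuck instead of blocking forever. The sketch below is only an illustration and not code from this thread: `wait_with_timeout` is a hypothetical helper, and the `is_done` callable is assumed to wrap something like `cudaStreamQuery(stream) == cudaSuccess` in the real application (cudaStreamQuery returns cudaErrorNotReady while work is still pending). The helper itself uses only the C++ standard library.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `is_done` until it returns true or `timeout`
// elapses. Returns true if the work completed, false on timeout.
// Assumption: in a CUDA application, `is_done` would wrap a non-blocking
// check such as `cudaStreamQuery(stream) == cudaSuccess`.
inline bool wait_with_timeout(const std::function<bool()>& is_done,
                              std::chrono::milliseconds timeout,
                              std::chrono::milliseconds poll_interval =
                                  std::chrono::milliseconds(1)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!is_done()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // Still pending: the caller can log the stream and network.
        }
        // Sleep briefly so the polling thread does not spin at 100% CPU.
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}
```

A caller could use a timeout well above the normal 10-50 ms (for example a few seconds) and, on a false return, record which of the 8 streams and 3 networks was involved before deciding how to recover.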

AastaLLL · December 25, 2023, 6:31am

Hi,

Could you share a sample and steps to reproduce this?
We want to check it further internally.

Thanks.

simon.wei · December 25, 2023, 7:07am

Hi,
Our test cases contain some trade secrets, which makes it difficult to provide a simple sample.
But I have some other information. There are 3 TensorRT networks running in these 8 streams. These 3 networks don’t have this problem when they run in 3 separate processes, but the scheduling is very inefficient that way, so we merged the 3 networks into the same process, and that is when the hang occurs.
Could you provide some debugging suggestions so that we can get more information?

Thanks.

AastaLLL · December 26, 2023, 5:46am

Hi,

Please try to profile the application with our Nsight Systems.
It should give you some hints about where the freeze happened.

Also, could you check if there is any error message in dmesg?

$ sudo dmesg

Thanks

simon.wei · December 29, 2023, 3:25am

Hi,
I can only see the thread backtraces, and all GPU inference threads hang at cudaStreamSynchronize(). dmesg has no information when this problem occurs because the program did not quit. I also don’t know how to use nsys to get more information about this bug.
I did some controlled tests: I reduced the number of inference threads, and when I remove any two of them, the problem doesn’t occur.
Is there a limit to the number of streams for TensorRT on Jetson Orin?

Thanks

AastaLLL · January 4, 2024, 5:51am

Hi,

Could you try increasing the number of CUDA work queues (connections) to see if it helps?

$ export CUDA_DEVICE_MAX_CONNECTIONS=32

The documentation for CUDA_DEVICE_MAX_CONNECTIONS is below for your reference:
https://docs.nvidia.com/deploy/mps/index.html#topic_5_2_4

Thanks.
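
If exporting the variable in the shell is inconvenient (for example when the process is started by a launcher), the same setting can be applied from inside the program. The sketch below is an assumption-laden illustration, not something confirmed in this thread: `set_cuda_max_connections` is a hypothetical helper, it relies on POSIX `setenv`, and it assumes the variable is only read when CUDA initializes, so it would have to run before the first CUDA or TensorRT call. The 1 to 32 range check reflects my understanding of the CUDA Programming Guide, which lists a default of 8; please verify it for your JetPack version.

```cpp
#include <stdlib.h>  // setenv (POSIX)
#include <cstdlib>   // std::getenv
#include <string>

// Hypothetical helper: sets CUDA_DEVICE_MAX_CONNECTIONS for this process.
// Assumption: must be called before any CUDA/TensorRT call, because the
// variable is expected to be read when the CUDA driver initializes.
// Returns false if `n` is outside the assumed valid range or setenv fails.
inline bool set_cuda_max_connections(int n) {
    if (n < 1 || n > 32) {
        return false;  // Assumed valid range: 1 to 32.
    }
    const std::string value = std::to_string(n);
    return setenv("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str(),
                  /*overwrite=*/1) == 0;
}
```

Calling `set_cuda_max_connections(32)` at the very top of `main()` would mirror the `export` line above.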
