I have two TensorRT plans compiled from ONNX using the standard TensorRT builder and ONNX parser.
I can successfully capture the ExecutionContexts derived from these plans into CUDA graphs and launch them on streams, with the expected outputs.
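For context, the capture step looks roughly like the following. This is a minimal sketch, not the actual fuzzer source: context is an IExecutionContext with all tensor addresses already set via setTensorAddress, error checking is omitted, and the CUDA 12 cudaGraphInstantiate signature is assumed.

#include <cuda_runtime.h>
#include <NvInfer.h>

// Sketch only: capture one enqueueV3 call into a CUDA graph.
void captureToGraph(nvinfer1::IExecutionContext* context, cudaStream_t stream,
                    cudaGraphExec_t* graphExec)
{
    // Warm-up enqueue outside capture so TensorRT finishes lazy resource setup.
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);

    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    context->enqueueV3(stream);
    cudaStreamEndCapture(stream, &graph);

    // CUDA 12 signature; on CUDA 11 use cudaGraphInstantiateWithFlags instead.
    cudaGraphInstantiate(graphExec, graph, 0);
    cudaGraphDestroy(graph);
}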
However, when launching these operations repeatedly in a loop, and if certain conditions are met, we eventually encounter an Xid 31 error after an arbitrary, large number of loop iterations. In the program, this error manifests as CUDA error 700 (an illegal memory access) returned when synchronizing the first stream.
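The replay loop has roughly the following shape (again a sketch of the assumed structure, not the actual fuzzer source; graphExecA/graphExecB and streamA/streamB are placeholders for the two captured graphs and their independent streams):

#include <cuda_runtime.h>
#include <cstdio>

// Sketch of the replay loop.
void replayLoop(cudaGraphExec_t graphExecA, cudaGraphExec_t graphExecB,
                cudaStream_t streamA, cudaStream_t streamB)
{
    for (;;) {
        cudaGraphLaunch(graphExecA, streamA);
        cudaGraphLaunch(graphExecB, streamB);

        // After many iterations (and only when another compute process shares
        // the GPU), this synchronize eventually returns cudaErrorIllegalAddress
        // (error 700), matching the Xid 31 report.
        cudaError_t err = cudaStreamSynchronize(streamA);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "streamA sync failed: %s\n",
                         cudaGetErrorString(err));
            break;
        }
        cudaStreamSynchronize(streamB);
    }
}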
The following conditions must all be true to trigger the error:
The ExecutionContexts must be captured to graphs.
The two ExecutionContexts must be executing in parallel (on two Streams).
There must be other compute processes on the same GPU.
compute-sanitizer (all tools) and cuda-memcheck (all tools) report no problems. The issue does not seem to reproduce when running under cuda-gdb. When CUDA_LAUNCH_BLOCKING=1 is set, the error is still reported at synchronization.
Environment
TensorRT Version: 8.6.1.6
GPU Type: tested with RTX 4070 and RTX A4500
Nvidia Driver Version: 550.78 (RTX 4070) or 525.60.13 (RTX A4500)
CUDA Version: tested with 11.8 and 12.3.2
CUDNN Version: 8.9.7
Operating System + Version: tested with Linux 6.6 and Linux 6.1
Python Version (if applicable): N/A
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): tested on nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 and nvcr.io/nvidia/tensorrt:24.01-py3
Relevant Files
Steps To Reproduce
git clone git@github.com:soooch/weird-trt-thing.git
cd weird-trt-thing
docker run --gpus all -it --rm -v .:/workspace nvcr.io/nvidia/tensorrt:24.01-py3
Once inside the container:
apt update
apt-get install -y parallel
make
# need at least 2, but will fail faster if more (hence 16)
parallel -j0 --delay 0.3 ./fuzzer ::: {1..16}
# wait up to ~10 minutes; usually much faster