Description
I have a test of the CUPTI Activity API, based on the official TensorRT sample trtexec.
CUPTI_CALL(cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted));
Two callback functions are registered through cuptiActivityRegisterCallbacks. They are responsible for allocating activity buffers and for processing the collected records when CUDA activities occur.
I encountered an issue when using CUDA Graphs and repeating model inference many times. In the first few iterations, CUDA activities trigger both bufferRequested and bufferCompleted as expected. In later iterations, however, only bufferRequested fires, so the collected activity records are never processed.
By the way, this issue cannot be reproduced on the x86 platform.
Environment
TensorRT Version: v8502
GPU Type: Jetson Orin
Nvidia Driver Version: nvidia-jetpack 5.1.1-b56
CUDA Version: 11.4
CUDNN Version: 8.6
Operating System + Version: Ubuntu 20.04.6 LTS
Python Version (if applicable): N/A
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): N/A
Relevant Files
trt_samples.tar.gz (838.9 KB)
Steps To Reproduce
tar xvzf trt_samples.tar.gz
cd trt_samples/trtexec
./complie.sh
make -j8
sudo ../../bin/trtexec --loadEngine=./resnet.engine --useCudaGraph
I added logging statements to both the bufferRequested and bufferCompleted functions. When running the test, you will initially see output from both functions; after a while, only the bufferRequested output remains.