Description
Main issue:
I’m implementing a YOLO model that performs inference on input video frames. I’m trying to upgrade my framework from Holoscan 2.3 (which uses TensorRT 8.6) to Holoscan 2.6 (which uses TensorRT 10.3). I convert the model from ONNX to a TensorRT engine with trtexec, passing the --useCudaGraph flag; the command I run is sketched below.
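For reference, the conversion command looks roughly like this (file names are placeholders for my actual paths):

trtexec --onnx=yolo_model.onnx --saveEngine=yolo_model.engine --useCudaGraph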
The model converts successfully, but I’ve observed the following logs in the trtexec output:
[11/20/2024-10:34:21] [I] Starting inference
[11/20/2024-10:34:21] [I] Capturing CUDA graph for the current execution context
[11/20/2024-10:34:21] [E] Error[1]: IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)
[11/20/2024-10:34:21] [W] The CUDA graph capture on the stream has failed.
[11/20/2024-10:34:21] [W] The built TensorRT engine contains operations that are not permitted under CUDA graph capture mode.
[11/20/2024-10:34:21] [W] The specified --useCudaGraph flag has been ignored. The inference will be launched without using CUDA graph launch.
[11/20/2024-10:34:21] [E] Error[1]: [defaultAllocator.cpp::deallocate::64] Error Code 1: Cuda Runtime (invalid argument)
When I start the application, the model inference operator fails and the following exception is thrown:
[info] [utils.hpp:46] IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)
[error] [infer_utils.cpp:31] Cuda runtime error, operation failed due to a previous error during capture
[error] [holoinfer_constants.hpp:82] Inference manager, Error in inference setup: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [gxf_wrapper.cpp:90] Exception occurred for operator: 'holoinfer' - Error in Inference Operator, Sub-module->Compute, Inference execution, Message->Error in Inference Operator, Sub-module->Compute, Inference execution, Inference manager, Error in inference setup: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [entity_executor.cpp:596] Failed to tick codelet holoinfer in entity: holoinfer code: GXF_FAILURE
[info] [utils.hpp:46] [defaultAllocator.cpp::deallocate::64] Error Code 1: Cuda Runtime (invalid argument)
[warning] [greedy_scheduler.cpp:243] Error while executing entity 23 named 'holoinfer': GXF_FAILURE
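For context, the inference operator in my application is configured roughly like this through the Holoscan Python API (a sketch only: the model path and tensor names are placeholders, and the surrounding pipeline operators are omitted):

from holoscan.core import Application
from holoscan.operators import InferenceOp
from holoscan.resources import UnboundedAllocator

class YoloApp(Application):
    def compose(self):
        # Placeholder model path and tensor names; the real pipeline also
        # contains the video source, preprocessing, and postprocessing operators.
        inference = InferenceOp(
            self,
            name="holoinfer",
            backend="trt",
            allocator=UnboundedAllocator(self, name="allocator"),
            model_path_map={"yolo": "yolo_model.onnx"},
            pre_processor_map={"yolo": ["preprocessed"]},
            inference_map={"yolo": ["inference_output"]},
        )
        # add_flow() connections to the other operators omitted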
What I’ve tried so far:
The new Holoscan version enforces the use of CUDA Graphs when inference is performed. Since my model is not compatible with CUDA graph capture, I’ve tried the following steps, so far without success:
- According to the TensorRT documentation, models containing loops or conditionals do not support CUDA Graphs. I removed the loops/conditionals from my model, but this did not help (the check I use for control-flow nodes is sketched after this list).
- I’ve also tried to find a way to disable the use of CUDA Graphs during inference, but this has been unsuccessful as well.
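For reference, this is roughly how I check whether any control-flow nodes remain in the exported ONNX model (the path is a placeholder, and this only inspects the top-level graph, not nested subgraphs):

import onnx

model = onnx.load("yolo_model.onnx")  # placeholder path
control_flow_ops = {"Loop", "If", "Scan"}
hits = [f"{node.op_type} ({node.name or '<unnamed>'})"
        for node in model.graph.node
        if node.op_type in control_flow_ops]
print(hits if hits else "no top-level control-flow nodes found")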
Is there a guide I can follow to run inference without CUDA Graphs, or an existing tool I can use to determine why my model is not compatible with CUDA graph capture?
Environment
TensorRT Version: 10.3.0
GPU Type: NVIDIA GeForce RTX 4070
NVIDIA Driver Version: 560.35.03
CUDA Version: 12.6
cuDNN Version: 9.4.0
Operating System + Version: Ubuntu 22.04.4