Model inference fails after Holoscan 2.6 update: Not compatible with useCudaGraph

Description

Main issue:

I’m implementing a YOLO model that performs inference on input video frames. I’m trying to upgrade my framework from Holoscan 2.3 (which uses TensorRT v8.6) to Holoscan 2.6, which uses TensorRT v10.3. When converting the model from ONNX to TensorRT with the --useCudaGraph flag, the model converts successfully, but I’ve observed the following logs:

[11/20/2024-10:34:21] [I] Starting inference
[11/20/2024-10:34:21] [I] Capturing CUDA graph for the current execution context
[11/20/2024-10:34:21] [E] Error[1]: IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)
[11/20/2024-10:34:21] [W] The CUDA graph capture on the stream has failed.
[11/20/2024-10:34:21] [W] The built TensorRT engine contains operations that are not permitted under CUDA graph capture mode.
[11/20/2024-10:34:21] [W] The specified --useCudaGraph flag has been ignored. The inference will be launched without using CUDA graph launch.
[11/20/2024-10:34:21] [E] Error[1]: [defaultAllocator.cpp::deallocate::64] Error Code 1: Cuda Runtime (invalid argument)
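For reference, the conversion command looked roughly like this (the file names below are placeholders, not my exact paths):

trtexec --onnx=yolo.onnx --saveEngine=yolo.engine --useCudaGraph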

When I start the application, the model inference operator fails and the following exception is thrown:

[info] [utils.hpp:46] IExecutionContext::enqueueV3: Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)
[error] [infer_utils.cpp:31] Cuda runtime error, operation failed due to a previous error during capture
[error] [holoinfer_constants.hpp:82] Inference manager, Error in inference setup: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [gxf_wrapper.cpp:90] Exception occurred for operator: 'holoinfer' - Error in Inference Operator, Sub-module->Compute, Inference execution, Message->Error in Inference Operator, Sub-module->Compute, Inference execution, Inference manager, Error in inference setup: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [entity_executor.cpp:596] Failed to tick codelet holoinfer in entity: holoinfer code: GXF_FAILURE
[info] [utils.hpp:46] [defaultAllocator.cpp::deallocate::64] Error Code 1: Cuda Runtime (invalid argument)
[warning] [greedy_scheduler.cpp:243] Error while executing entity 23 named 'holoinfer': GXF_FAILURE

What I’ve tried so far:

The new Holoscan release enforces the use of CUDA graphs when inference is performed. Since my model is not compatible with CUDA graphs, I’ve tried the following steps, so far without success.

  1. According to the TensorRT documentation, models containing loops or conditionals do not support CUDA graphs. I removed the loops/conditionals from my model, but without success (see the ONNX check sketched after this list).
  2. I’ve looked for ways to disable the use of CUDA graphs during inference, but this has also been unsuccessful.
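As a rough way to double-check step 1, I’ve been scanning the ONNX graph for the control-flow ops (If/Loop/Scan) that the TensorRT documentation flags as incompatible with CUDA graph capture. This is only a minimal sketch, assuming the onnx Python package is installed; the model filename is a placeholder:

# Sketch: list control-flow nodes (If/Loop/Scan) in an ONNX model, including
# nodes hidden inside subgraphs. "yolo.onnx" is a placeholder filename.
import onnx

CONTROL_FLOW_OPS = {"If", "Loop", "Scan"}

def find_control_flow(graph, path="main"):
    for node in graph.node:
        if node.op_type in CONTROL_FLOW_OPS:
            print(f"{path}: {node.op_type} node '{node.name}'")
        # Subgraphs live in GRAPH/GRAPHS attributes (e.g. If branches, Loop bodies).
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                find_control_flow(attr.g, f"{path}/{node.name}")
            elif attr.type == onnx.AttributeProto.GRAPHS:
                for sub in attr.graphs:
                    find_control_flow(sub, f"{path}/{node.name}")

model = onnx.load("yolo.onnx")
find_control_flow(model.graph)

When this reports nothing, I assume the incompatibility comes from something else, which is why I’m asking whether a proper diagnostic tool exists.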

Is there any guide to follow for running inference without CUDA graphs, or perhaps an existing tool I can use to determine why my model is not compatible with CUDA graphs?

Environment

TensorRT Version: 10.3.0
GPU Type: NVIDIA GeForce RTX 4070
NVIDIA Driver Version: 560.35.03
CUDA Version: 12.6
cuDNN Version: 9.4.0
Operating System + Version: Ubuntu 22.04.4

Hi @lutajf2,
I see there is an internal bug in progress for a similar issue; I can collect more details and get back to you.