Issues Running TensorRT Inference on Jetson Orin: CUDA Stream Capture Errors

Hi everyone,

I’m encountering a series of CUDA errors during inference using NVIDIA Holoscan.
The error logs include:

[error] [utils.hpp:63] IExecutionContext::enqueueV3: Error Code 1: Myelin ([cask.cpp:exec:1306] Platform (Cuda) error)
[error] [gxf_wrapper.cpp:118] Exception occurred for operator: ‘holoviz’ - CUDA driver error 906 (CUDA_ERROR_STREAM_CAPTURE_IMPLICIT): operation would make the legacy stream depend on a capturing blocking stream
[error] [infer_utils.cpp:31] Cuda runtime error, operation failed due to a previous error during capture
[error] [entity_executor.cpp:596] Failed to tick codelet holoviz in entity: holoviz code: GXF_FAILURE
[error] [holoinfer_constants.hpp:83] Inference manager, Error in inference execution: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [gxf_wrapper.cpp:118] Exception occurred for operator: ‘classification’ - Error in Inference Operator, Sub-module->Compute, Inference execution, Message->Error in Inference Operator, Sub-module->Compute, Inference execution, Inference manager, Error in inference execution: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [entity_executor.cpp:596] Failed to tick codelet classification in entity: classification code: GXF_FAILURE
[warning] [event_based_scheduler.cpp:329] Error while executing entity E155 named ‘classification’: GXF_FAILURE
[info] [event_based_scheduler.cpp:671] Stopping all async jobs

When I run the test again, the error often changes:

[error] [utils.hpp:63] IExecutionContext::enqueueV3: Error Code 1: Cask (Cask convolution execution)
[error] [infer_utils.cpp:31] Cuda runtime error, operation failed due to a previous error during capture
[error] [gxf_wrapper.cpp:118] Exception occurred for operator: ‘holoviz’ - [/workspace/holoscan-sdk/modules/holoviz/src/vulkan/resource.cpp:69] CUDA driver error 906 (CUDA_ERROR_STREAM_CAPTURE_IMPLICIT): operation would make the legacy stream depend on a capturing blocking stream
[error] [holoinfer_constants.hpp:83] Inference manager, Error in inference execution: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [entity_executor.cpp:596] Failed to tick codelet holoviz in entity: holoviz code: GXF_FAILURE
[error] [gxf_wrapper.cpp:118] Exception occurred for operator: ‘classification’ - Error in Inference Operator, Sub-module->Compute, Inference execution, Message->Error in Inference Operator, Sub-module->Compute, Inference execution, Inference manager, Error in inference execution: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [entity_executor.cpp:596] Failed to tick codelet classification in entity: classification code: GXF_FAILURE

or

[error] [utils.hpp:63] IExecutionContext::enqueueV3: Error Code 1: Myelin ([direct.cpp:exec:136] ‘__mye52244’: wait failed (1))
[error] [infer_utils.cpp:31] Cuda runtime error, operation failed due to a previous error during capture
[error] [holoinfer_constants.hpp:83] Inference manager, Error in inference execution: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [gxf_wrapper.cpp:118] Exception occurred for operator: ‘classification’ - Error in Inference Operator, Sub-module->Compute, Inference execution, Message->Error in Inference Operator, Sub-module->Compute, Inference execution, Inference manager, Error in inference execution: Cuda runtime error: cudaErrorStreamCaptureInvalidated, operation failed due to a previous error during capture
[error] [entity_executor.cpp:596] Failed to tick codelet classification in entity: classification code: GXF_FAILURE
[error] [gxf_wrapper.cpp:118] Exception occurred for operator: ‘holoviz’ - [/workspace/holoscan-sdk/modules/holoviz/src/vulkan/resource.cpp:69] CUDA driver error 906 (CUDA_ERROR_STREAM_CAPTURE_IMPLICIT): operation would make the legacy stream depend on a capturing blocking stream
[error] [entity_executor.cpp:596] Failed to tick codelet holoviz in entity: holoviz code: GXF_FAILURE

From what I understand, CUDA error 906 (CUDA_ERROR_STREAM_CAPTURE_IMPLICIT) and cudaErrorStreamCaptureInvalidated both relate to CUDA stream capture and dependencies on the legacy (default) stream. However, I’m not sure what triggers them in my application or how to fix it.
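
To make my understanding concrete, here is a minimal standalone sketch of how I believe this pair of errors can arise in plain CUDA. It is not taken from my pipeline or from the Holoscan sources. The kernel `noop` and the single-threaded layout are assumptions made purely for illustration; in my application the two operations would presumably come from different operators (classification and holoviz) running concurrently.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only to have something to launch.
__global__ void noop() {}

int main() {
    // Created with default flags, so this is a "blocking" stream:
    // it synchronizes implicitly with the legacy default stream.
    cudaStream_t captureStream;
    cudaStreamCreate(&captureStream);

    // Begin capturing work submitted to captureStream into a CUDA graph.
    cudaStreamBeginCapture(captureStream, cudaStreamCaptureModeGlobal);
    noop<<<1, 1, 0, captureStream>>>();  // recorded into the graph, not executed

    // Work on the legacy default stream while the blocking stream is capturing.
    // My understanding is that this is rejected with
    // cudaErrorStreamCaptureImplicit (906).
    noop<<<1, 1>>>();
    cudaError_t legacyErr = cudaGetLastError();
    printf("legacy-stream launch: %s\n", cudaGetErrorName(legacyErr));

    // The invalid operation should also invalidate the capture in progress,
    // so ending it is expected to fail with cudaErrorStreamCaptureInvalidated.
    cudaGraph_t graph = nullptr;
    cudaError_t endErr = cudaStreamEndCapture(captureStream, &graph);
    printf("end capture: %s\n", cudaGetErrorName(endErr));

    if (graph != nullptr) {
        cudaGraphDestroy(graph);
    }
    cudaStreamDestroy(captureStream);
    return 0;
}
```

This assumes compilation with `nvcc` using the default (legacy) default-stream behavior rather than `--default-stream per-thread`. If my reading is right, the first message corresponds to what holoviz reports and the second to what the inference operator reports.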

The first classification inference with HoloInfer runs fine, and I had no issues converting the ONNX model to a TensorRT engine with trtexec.

Environment:

  • JetPack 6.0
  • jtop 4.2.8
  • CUDA 12.6.20
  • TensorRT 10.3.0
  • ONNX model: mvit (classification)
  • Running on Jetson Orin (nvgpu) ARM64

Thanks in advance for any pointers or solutions!