I’m encountering the following TensorRT and CUDA errors during long-term aging tests in the environment described below. Do you know the cause, or have any related information?
[04/15/2026-06:55:53] [E] [TRT] 1: [checkMacros.cpp::catchCudaError::202] Error Code 1: Cuda Runtime (an async error has occurred in an external entity outside of CUDA)
Operating Environment
Jetson Orin NX 8GB
JetPack 5.1.2
custom board
Inference using a combination of 1 GPU model + 3 DLA models
1. The recurrence interval is random. It can occur as soon as 1–2 days, or take up to about a week.
2. When the symptom occurs, only the inference process fails. There are no system hangs or application crashes.
3. dmesg logs are not currently being saved, so I cannot attach them. However, when I checked dmesg at the time the symptom occurred, there were no notable error messages.
Based on your comment, it sounds more like an application-level issue.
Is there another application running concurrently?
Any possibility that the device is running out of memory?
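One way to check the out-of-memory question during a long run is to log available system memory periodically and compare the value just before a failure. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that `MemAvailable` in `/proc/meminfo` is a useful indicator on Jetson, where the CPU and GPU share the same physical memory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "MemAvailable:   2048000 kB"
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_mib(text):
    """Return MemAvailable in MiB from /proc/meminfo-style text."""
    return parse_meminfo(text)["MemAvailable"] // 1024


def read_available_mib(path="/proc/meminfo"):
    """Read the live value; call this from a periodic logging thread."""
    with open(path) as f:
        return available_mib(f.read())
```

Logging this value with a timestamp every few seconds makes it possible to see whether memory was trending down before the error appeared.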
Based on the message, the error seems related to the interaction with other applications/libraries:
Error Code 1: Cuda Runtime (an async error has occurred in an external entity outside of CUDA)
We need more logs to figure out the issue.
Could you try to reproduce the problem with the CUDA coredump:
Set the CUDA_ENABLE_COREDUMP_ON_EXCEPTION environment variable to 1 and share the file with us.
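The variable has to be present in the environment before the process initializes CUDA, so for a test that runs for days it may be easiest to set it in a small launcher. A minimal sketch follows; the launcher functions are hypothetical, and the use of `CUDA_COREDUMP_FILE` to choose the output path is an assumption about the setup (it can be dropped to use the default location).

```python
import os
import subprocess


def coredump_env(dump_path, base_env=None):
    """Return a copy of the environment with CUDA coredumps enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Requested in the reply above: generate a GPU coredump on exception.
    env["CUDA_ENABLE_COREDUMP_ON_EXCEPTION"] = "1"
    # Assumption: write the dump to a known path so it is easy to collect.
    env["CUDA_COREDUMP_FILE"] = dump_path
    return env


def run_with_coredump(cmd, dump_path):
    """Run the inference application (cmd is a list of arguments)."""
    return subprocess.run(cmd, env=coredump_env(dump_path)).returncode
```

For example, `run_with_coredump(["./my_inference_app"], "/tmp/cuda_coredump")`, where the application name is a placeholder.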
We will need to reproduce this issue locally to debug it further.
Do you have dependencies on JetPack 5?
If not, could you try to reproduce the issue on JetPack 6 to see if the error still occurs?
If yes, could you share a reproducible source and corresponding steps with us?
Our custom board depends on JetPack 5.1.2.
As a result, migrating to JetPack 6 is not straightforward.
The reproduction process involves decoding 8 streams at 1080p@10fps, extracting results using the GPU detection model, and then passing those results to 3 DLA models as needed to extract further results.
If this process is continuously run, the issue occurs intermittently.
PS. Please understand that we are unable to share the model due to company policy.
The DLA and GPU read from different buffers and write to different buffers.
A single flow is as follows:
In Buf -> GPU -> Out Buf -> GPU Result -> New In Buf -> DLA 1 -> New Out Buf -> DLA Result
                                       -> New In Buf -> DLA 2 -> New Out Buf -> DLA Result
                                       -> New In Buf -> DLA 3 -> New Out Buf -> DLA Result
DLA models are called selectively based on the GPU Result.
The above flow is running indefinitely at 7 fps on 8 threads.
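For readers who want to reproduce the structure without the models, the per-frame flow can be sketched roughly as below. This is not our actual code: `process_frame`, `run_gpu`, `should_run`, and `run_dla` are hypothetical stand-ins for the real inference calls, and buffer management is omitted.

```python
def process_frame(frame, run_gpu, dla_stages):
    """Run the GPU model, then each DLA model only if it is selected.

    dla_stages is a list of (should_run, run_dla) pairs: should_run
    decides from the GPU result whether that DLA model is needed, and
    run_dla performs the inference using its own input/output buffers.
    """
    gpu_result = run_gpu(frame)
    dla_results = []
    for should_run, run_dla in dla_stages:
        if should_run(gpu_result):
            dla_results.append(run_dla(gpu_result))
        else:
            dla_results.append(None)  # this DLA model was skipped
    return gpu_result, dla_results
```

Each of the 8 threads would call something like this in a loop at its configured frame rate.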
Before reaching out with this issue, I had already checked the synchronization between the CPU and GPU, and I’ve added synchronization measures to prevent concurrent access where necessary.
In short, the problem still occurs even after adding synchronization.
During aging tests under high-load conditions (complex scenes with many objects), we are encountering the following error:
[E] [TRT] 1: [cudlaUtils.cpp::submit::95] Error Code 1: DLA (Failed to submit program to DLA engine.)
This appears to be caused by DLA resource contention and scheduling bottlenecks when processing multiple models on a single DLA core.
Configuration: 1 GPU model + 3 DLA models (8-channel inference, 7fps per channel)
To prevent DLA queue overflow, we are considering a Dynamic FPS Scaling approach:
Monitor scene complexity (e.g., number of detected objects or DLA processing latency).
If complexity is high, temporarily reduce the input frame rate (e.g., 7 fps → 4–5 fps).
Restore to 7fps once the scene complexity decreases.
Is this dynamic scaling a recommended approach to mitigate DLA submit errors on Orin NX 8GB?
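To make the proposal concrete, here is a minimal sketch of the scaling logic we have in mind, using DLA processing latency as the complexity signal. The class name, the latency thresholds, and the window size are all hypothetical placeholders rather than measured values; the two thresholds add hysteresis so the frame rate does not flip back and forth on every frame.

```python
class FpsScaler:
    """Switch between a normal and a reduced frame rate with hysteresis."""

    def __init__(self, normal_fps=7.0, reduced_fps=4.0,
                 high_ms=120.0, low_ms=80.0, window=3):
        self.normal_fps = normal_fps
        self.reduced_fps = reduced_fps
        self.high_ms = high_ms   # above this, the scene counts as complex
        self.low_ms = low_ms     # below this, the scene counts as simple
        self.window = window     # consecutive samples needed to switch
        self.fps = normal_fps
        self._high = 0
        self._low = 0

    def update(self, latency_ms):
        """Feed one DLA latency sample and return the frame rate to use."""
        if latency_ms > self.high_ms:
            self._high += 1
            self._low = 0
        elif latency_ms < self.low_ms:
            self._low += 1
            self._high = 0
        else:
            # In between the thresholds: reset both streaks, keep the rate.
            self._high = 0
            self._low = 0

        if self.fps == self.normal_fps and self._high >= self.window:
            self.fps = self.reduced_fps
        elif self.fps == self.reduced_fps and self._low >= self.window:
            self.fps = self.normal_fps
        return self.fps
```

Each channel would call `update()` after every DLA inference and use the returned value as its target input frame rate.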
[04/15/2026-06:55:53] [E] [TRT] 1: [checkMacros.cpp::catchCudaError::202] Error Code 1: Cuda Runtime (an async error has occurred in an external entity outside of CUDA)
[one more log]
[05/02/2026-06:43:48] [E] [TRT] 1: [cudlaUtils.cpp::submit::95] Error Code 1: DLA (Failed to submit program to DLA engine.)
So far, we have been running inference in parallel; this time we would like to switch to sequential execution and test that.
@yj45.kwon is the software project leader on my team.
I’ve been concerned that simultaneous requests to the DLA might be causing issues, so we’re currently testing a change to limit requests to one at a time.
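The change we are testing is conceptually a single lock shared by all threads around the DLA call. A minimal sketch, assuming the DLA inference is wrapped in a callable (the `SerializedSubmitter` class and the `run_inference` argument are hypothetical names, not TensorRT APIs):

```python
import threading


class SerializedSubmitter:
    """Allow only one DLA request to be in flight at a time."""

    def __init__(self):
        self._lock = threading.Lock()

    def submit(self, run_inference, *args):
        # Every worker thread shares this lock, so a second request waits
        # here until the previous one has completed.
        with self._lock:
            return run_inference(*args)
```

All 8 worker threads would share one `SerializedSubmitter` instance. Note that this only serializes requests if `run_inference` returns after the inference has actually finished, not just after it has been enqueued.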