Errors running inference on a model with the DLA core


I built a TensorRT engine for a Mobilenet-v1-SSD model with trtexec, configured to run on the DLA core. Generating the engine file reports no errors, and inference with trtexec also works without problems.

However, when I run the same engine with my own inferencer, only the first frame of the video is processed correctly. Every subsequent frame fails. The inferencer I wrote is loosely based on the trtexec source provided in the TensorRT GitHub repository.


TensorRT Version: 7.1.0
GPU Type: Nvidia Xavier AGX
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

The console output with the errors:

[2021-08-18 13:28:07.974] [info] [main.cxx:522] (15, 69), (239, 201), 1 (aeroplane), 0.88964844
[2021-08-18 13:28:07.974] [info] [timeExecution.hpp:96] [UTILS] [TimeExecution (TensorRT Inference) ]: 1490.553682ms (0.670892 Hz)
NVMEDIA_DLA : 1928, ERROR: Submit failed.
[08/18/2021-13:29:26] [08/18/2021-13:29:26] [E] [TRT] ../rtExt/dla/native/dlaUtils.cpp (194) - DLA Error in submit: 7 (Failure to submit program to DLA engine.)
[E] [TRT] FAILED_EXECUTION: std::exception
[2021-08-18 13:29:48.642] [info] [main.cxx:522] (15, 69), (239, 201), 1 (aeroplane), 0.88964844
[2021-08-18 13:29:48.642] [info] [timeExecution.hpp:96] [UTILS] [TimeExecution (TensorRT Inference) ]: 41131.105716ms (0.024312 Hz)
NVMEDIA_DLA :  885, ERROR: runtime registerEvent failed. err: 0x4.
NVMEDIA_DLA : 1849, ERROR: RequestSubmitEvents failed. status: 0x7.

In the output above, the first prediction is produced, and I confirmed that it is correct. Every subsequent frame then fails to be submitted.

Please check the below links, as they might answer your concerns.

The model executes successfully with trtexec, so I don't think the issue is the compatibility of the layers with the DLA. It also executes for one frame with my own executor. I suspect the next frame cannot be submitted because the DLA may not have enough memory bandwidth remaining. I am also not sure whether any specific API calls are needed to clear or reset the DLA between frames. Could you clarify this?
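One detail in the log may be worth checking: the detection printed after the failed submit is identical to the first one, which would be consistent with the output buffers being read even though execution did not succeed. The sketch below shows one way to guard the per-frame loop so that a failed submit or synchronization is detected and the outputs are not parsed. It is only an illustration under assumptions, not the code from my inferencer: `FrameRunner`, `RunStats` and `runFrames` are hypothetical names, and the two callbacks stand in for the real calls (for example `IExecutionContext::enqueueV2`, which returns a `bool`, and a check of `cudaStreamSynchronize` against `cudaSuccess`).

```cpp
#include <cassert>
#include <cstdio>
#include <functional>

// Hypothetical stand-ins for the real per-frame calls.
// In a TensorRT 7 application, `submit` would wrap something like
// IExecutionContext::enqueueV2(...) and `sync` would wrap
// cudaStreamSynchronize(stream) == cudaSuccess.
struct FrameRunner {
    std::function<bool(int)> submit;  // enqueue frame i; false on failure
    std::function<bool()> sync;       // wait for completion; false on failure
};

struct RunStats {
    int succeeded = 0;          // frames that were submitted and completed
    int firstFailedFrame = -1;  // -1 means no frame failed
};

// Runs frames in order and stops at the first failure, so that stale
// output buffers from an earlier frame are never parsed as new results.
RunStats runFrames(const FrameRunner& runner, int frameCount) {
    RunStats stats;
    for (int i = 0; i < frameCount; ++i) {
        if (!runner.submit(i) || !runner.sync()) {
            stats.firstFailedFrame = i;
            std::fprintf(stderr,
                         "frame %d: inference failed, skipping output parsing\n", i);
            break;
        }
        ++stats.succeeded;  // only here is it safe to read the outputs
    }
    return stats;
}
```

With callbacks that fail from the second frame onward, `runFrames` reports one successful frame and `firstFailedFrame == 1`, which matches the pattern in the log. This does not explain why the DLA submit fails. It only keeps a failed frame from being reported as a valid detection.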

Hi @ninolendt,

We recommend posting your question on the Jetson forum to get better help.

Thank you.