TensorRT 5.0 problem on Tegra Xavier


I’m trying to run a deep neural network on a Xavier SoC. Part of the network has to run on the DLA, and another part has to run on the integrated Xavier GPU.

I’m working in a custom environment through the C++ API provided by TensorRT. I’m not using ONNX, Caffe, or any other model format.

Depending on how I place my tasks on the DLA, the following message appears:

DLA supports only 3 subgraphs per DLA core. Use allowGPUFallback() to fallback on GPU for these layers.

Then, during the creation of the engine:

RuntimeError: unidentifiable C++ exception
Cuda failure: driver shutting down

Do you have more information about the right way to map tasks to the DLA?
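
To make the question concrete, here is a minimal sketch of how I currently read the "3 subgraphs per DLA core" limit. It is only my assumption, not TensorRT's actual partitioning logic: it treats the network as a linear chain of layers and counts every maximal run of consecutive DLA layers as one subgraph. The names `Device`, `countDlaSubgraphs` and `fitsDlaLimit` are hypothetical helpers written for this post and are not part of the TensorRT API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical device tag for this sketch; NOT the TensorRT DeviceType enum.
enum class Device { GPU, DLA };

// Counts maximal runs of consecutive DLA-assigned layers in a linear chain.
// Assumption: each such run corresponds to one DLA subgraph.
std::size_t countDlaSubgraphs(const std::vector<Device>& placement) {
    std::size_t count = 0;
    bool inDlaRun = false;
    for (Device d : placement) {
        // A new subgraph starts whenever a DLA layer follows a non-DLA layer.
        if (d == Device::DLA && !inDlaRun) {
            ++count;
        }
        inDlaRun = (d == Device::DLA);
    }
    return count;
}

// True when the placement stays within the limit quoted in the builder message.
// The default of 3 is taken from that message.
bool fitsDlaLimit(const std::vector<Device>& placement,
                  std::size_t maxSubgraphs = 3) {
    return countDlaSubgraphs(placement) <= maxSubgraphs;
}
```

Under this reading, a placement that alternates DLA, GPU, DLA, GPU, DLA, GPU, DLA contains four separate DLA runs and would exceed the limit, while grouping the DLA layers into three or fewer contiguous blocks would not.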


Moreover, if I map just one layer to the DLA, there is a long latency caused by the reformatter that converts that layer’s output into the input of the next layer.
In my case, this added “input reformatter” layer takes 25% of the execution time of the full network.


Please refer to http://nvdla.org/sw/runtime_environment.html for an introduction to DLA jobs/tasks.

To help us debug, can you please share a small repro that demonstrates the CUDA error you are seeing?