DLA falls back to GPU even though the GPUFallback flag is not set


I’m using a Jetson AGX Xavier with JetPack 4.3 and TensorRT 6.0.
I used the TensorRT C++ API to build a network for DLA 0 and one for DLA 1, both without the GPUFallback flag, and a third network for the GPU. The builds don’t report any errors.

I run inference for the three engines concurrently.
I expected that the DLA inference would not show up in the profiler and that execution would be completely independent between DLA 0, DLA 1 and the GPU.

However, the following profiler screenshots were captured. The first thread is for DLA 0, the second for the GPU and the third for DLA 1. In the second picture the execution of DLA 1 is visible. Furthermore, if I look at the streams, more kernel executions belonging to the DLA are visible (streams 63 and 88 in picture 3).

My questions are:

  • Why is the execution of the DLAs visible even though the GPUFallback flag is not set?
  • Is there a way to avoid this fallback so that execution is completely independent?
  • How can I get an error if GPU fallback is not enabled and the DLA can’t execute a layer?


What kind of function does the DLA thread execute?
It is possible that the DLA uses the GPU for some input/output data reformatting.



Thanks for your quick response.

The DLA thread executes only convolutional layers and activation layers with ReLU.

Yes, the DLA uses the GPU for data reformatting (nchwtonhwc). This is the second line/block in picture 3, in streams 63 and 88.

The biggest block in the DLA stream could be caused by an unsupported layer, but I thought that if GPU fallback is not set, generating the engine would raise an error.

Is it possible that GPU fallback is implicitly enabled if a layer is not supported by the DLA cores?
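
One way to test the unsupported-layer theory before building would be to ask the builder config about every layer and stop with your own error if any layer is rejected. Below is a minimal sketch of that check, not a confirmed fix. It assumes that nvinfer1::IBuilderConfig::canRunOnDLA(), nvinfer1::INetworkDefinition::getNbLayers() / getLayer() and nvinfer1::ILayer::getName() behave as documented in your TensorRT version. The function is templated so it compiles without TensorRT; MockLayer, MockNetwork, MockConfig and layersNotOnDLA are hypothetical names used only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without TensorRT installed.
// In a real build these roles would be played by nvinfer1::ILayer,
// nvinfer1::INetworkDefinition and nvinfer1::IBuilderConfig (assumption:
// the method names below match your TensorRT version).
struct MockLayer {
    std::string name;
    bool dlaCapable;
    const char* getName() const { return name.c_str(); }
};

struct MockNetwork {
    std::vector<MockLayer> layers;
    int getNbLayers() const { return static_cast<int>(layers.size()); }
    const MockLayer* getLayer(int i) const { return &layers[i]; }
};

struct MockConfig {
    // Mirrors the shape of IBuilderConfig::canRunOnDLA(const ILayer*).
    bool canRunOnDLA(const MockLayer* layer) const { return layer->dlaCapable; }
};

// Collect the names of all layers that the config reports as not runnable
// on the DLA. An empty result means every layer was accepted.
template <typename Network, typename Config>
std::vector<std::string> layersNotOnDLA(const Network& network, const Config& config) {
    std::vector<std::string> rejected;
    for (int i = 0; i < network.getNbLayers(); ++i) {
        const auto* layer = network.getLayer(i);
        if (!config.canRunOnDLA(layer)) {
            rejected.emplace_back(layer->getName());
        }
    }
    return rejected;
}
```

With real TensorRT objects the call would look like layersNotOnDLA(*network, *config), made after the default device type and DLA core have been set on the config. If the returned list is not empty, the application can print the layer names and abort instead of building the engine. Note that this only covers layer support; it would not flag the input/output reformatting kernels mentioned above.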