I want to use the DLA cores to accelerate inference on my Orin, so I converted my ONNX model to a TensorRT engine using trtexec with FP16. When I built the engine, it showed me these messages:
[W] Dimension: 2 (21504) exceeds maximum allowed size for DLA: 8192
[W] [TRT] DLA supports only 16 subgraphs per DLA core. Switching to GPU for layer
[W] [TRT] DLA supports only 16 subgraphs per DLA core. Switching to GPU for layer
[W] [TRT] DLA supports only 16 subgraphs per DLA core. Switching to GPU for layer
[W] [TRT] DLA supports only 16 subgraphs per DLA core. Switching to GPU for layer
The first message indicates that the size of one of the tensor's dimensions exceeds the DLA capability.
The maximum size of each dimension is 8192.
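As a quick sanity check before targeting DLA, you can scan your tensor shapes against that limit. This is a minimal, hypothetical helper (not part of TensorRT); the 8192 constant and the example shape come from the warning above:

```python
# 8192 is the per-dimension DLA limit reported by the warning above.
DLA_MAX_DIM = 8192

def dla_oversized_dims(shape):
    """Return (index, size) pairs for dimensions that exceed the DLA limit."""
    return [(i, d) for i, d in enumerate(shape) if d > DLA_MAX_DIM]

# A shape whose dimension 2 is 21504, as in the warning log.
print(dla_oversized_dims((1, 3, 21504)))
```

Any dimension flagged by such a check would have to be reduced (e.g. by reshaping or tiling the network) before that layer can run on DLA.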
The second one indicates that you have more than 16 loadables, which also exceeds the DLA capability.
Multiple loadables are usually caused by fallback layers in the middle of the network.
TensorRT may split a network into multiple DLA loadables if any intermediate layers cannot run on DLA and GPUFallback is enabled; otherwise, TensorRT emits an error. For more information, refer to GPU Fallback Mode in the TensorRT documentation.
At most 16 DLA loadables can be loaded concurrently, per core, due to hardware and software memory limitations.
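For reference, a typical trtexec invocation that targets DLA with FP16 and allows unsupported layers to fall back to the GPU looks like the sketch below; the file names are placeholders for your own model and output paths:

```shell
# model.onnx / model.engine are hypothetical paths; substitute your own.
trtexec --onnx=model.onnx --saveEngine=model.engine \
        --fp16 \
        --useDLACore=0 \
        --allowGPUFallback
```

Without --allowGPUFallback, the build fails outright on any layer DLA cannot run; with it, each fallback region splits the DLA portion into another loadable, which is how you can end up over the 16-loadable limit.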