Unable to build tensorrt engine with DLA enabled on Jetson Xavier NX

Description

The TensorRT engine build failed with the error Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/cnn/cnn.0/Conv]}.)

Environment

TensorRT Version: 8.5.2

NVIDIA GPU: Volta (Xavier NX integrated GPU)

CUDA Version: 11.4

CUDNN Version: 8.6

Operating System: Ubuntu 20.04

Platform: Jetson Xavier NX

Jetpack version: 5.1.1

Relevant Files

Model link: https://drive.google.com/file/d/1K5kQxR0IR-SGF6Ry1V44R-bmfwF4NPPx/view?usp=sharing

Steps To Reproduce

  1. Took the example model from https://github.com/NVIDIA-AI-IOT/jetson_dla_tutorial
  2. Exported the model to ONNX format.
  3. Tried building the engine with the command /usr/src/tensorrt/bin/trtexec --onnx=model_gn.onnx --shapes=input:32x3x32x32 --saveEngine=model_gn.engine --exportProfile=model_gn.json --int8 --useDLACore=0 --allowGPUFallback --useSpinWait --separateProfileRun
  4. The build failed with the error Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/cnn/cnn.0/Conv]}.)
  5. The complete log is available here: https://drive.google.com/file/d/1Ude0Pb3VOb_rzJhbzu_AXtlk8HUUNbzT/view?usp=drive_link
  6. Got the same result when trying with Polygraphy as well.
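For reference, the trtexec invocation in step 3 can be assembled programmatically, which makes it easy to toggle flags such as --useDLACore and --allowGPUFallback when reproducing the failure. A minimal Python sketch (file names taken from the command above; running the command only works on the Jetson itself):

```python
def trtexec_dla_cmd(onnx_path, engine_path, dla_core=0):
    """Build the trtexec argument list for an INT8 DLA build with GPU fallback."""
    return [
        "/usr/src/tensorrt/bin/trtexec",
        f"--onnx={onnx_path}",
        "--shapes=input:32x3x32x32",   # static batch of 32, CIFAR-sized input
        f"--saveEngine={engine_path}",
        "--int8",                      # DLA requires INT8 or FP16
        f"--useDLACore={dla_core}",
        "--allowGPUFallback",          # layers unsupported on DLA should fall back to GPU
        "--useSpinWait",
        "--separateProfileRun",
    ]

cmd = trtexec_dla_cmd("model_gn.onnx", "model_gn.engine")
# On the Jetson: subprocess.run(cmd, check=True)
print(" ".join(cmd))
```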

Hi,

Could you run trtexec with --verbose and share the log with us?
Thanks.

Hi,
You can find the verbose log here: https://drive.google.com/file/d/1qNT8dqTcdHgU4Djzxj1ltr0KS-QiyfHY/view?usp=drive_link

Hi,

The link shows “Access Denied”.
Could you help to check?

Thanks.

Sorry, can you try again?

Hi,

Thanks for sharing.

[05/08/2024-10:14:17] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/cnn/cnn.0/Conv]}.)
[05/08/2024-10:14:17] [V] [TRT] =============== Computing costs for 
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(6144,2048,64,1) -> Int8(65536,1024,64,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(6144,2048,64,1) -> Int8(512,256:32,16,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(1024,1:4,32,1) -> Int8(65536,1024,64,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(1024,1:4,32,1) -> Int8(512,256:32,16,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(1024,1024:32,32,1) -> Int8(65536,1024,64,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(1024,1024:32,32,1) -> Int8(512,256:32,16,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf

Based on the log, the model already hits the DLA loadable limit (at most 16 DLA loadables can be loaded simultaneously per core).
In that case, TensorRT should fall back to the GPU for inference, but it doesn't.
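As a quick sanity check, the verbose log can be scanned for that DLA-context warning to confirm every candidate tactic was skipped for the same reason. A small stdlib sketch (reading the log from a file is left out; the text is passed in directly):

```python
import re

def count_dla_context_failures(log_text):
    """Count 'Skipping tactic ... Failed to create DLA runtime context' warnings."""
    pattern = re.compile(
        r"Skipping tactic \S+ due to exception Failed to create DLA runtime context"
    )
    return len(pattern.findall(log_text))

# Example against two lines of the log quoted above:
sample = (
    "[W] [TRT] Skipping tactic 0x0000000000000003 due to exception "
    "Failed to create DLA runtime context. Hint: You can load at most 16 "
    "DLA loadables simultaneously per core.\n"
    "[V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf\n"
)
print(count_dla_context_failures(sample))  # 1
```

If the count matches the number of autotuned format combinations, no DLA tactic succeeded at all, which is consistent with the build error above.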

Could you test it on JetPack 5.1.3, the latest software release for Xavier NX?

Thanks.