Unable to build tensorrt engine with DLA enabled on Jetson Xavier NX

Description

The TensorRT engine build failed with the error Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/cnn/cnn.0/Conv]}.)

Environment

TensorRT Version: 8.5.2

NVIDIA GPU: Volta (Xavier NX integrated GPU)

CUDA Version: 11.4

CUDNN Version: 8.6

Operating System: Ubuntu 20.04

Platform: Jetson Xavier NX

Jetpack version: 5.1.1

Relevant Files

Model link: https://drive.google.com/file/d/1K5kQxR0IR-SGF6Ry1V44R-bmfwF4NPPx/view?usp=sharing

Steps To Reproduce

  1. Took the example model from https://github.com/NVIDIA-AI-IOT/jetson_dla_tutorial
  2. Exported the model to ONNX format.
  3. Tried building the engine with the command /usr/src/tensorrt/bin/trtexec --onnx=model_gn.onnx --shapes=input:32x3x32x32 --saveEngine=model_gn.engine --exportProfile=model_gn.json --int8 --useDLACore=0 --allowGPUFallback --useSpinWait --separateProfileRun
  4. The build failed with the error Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/cnn/cnn.0/Conv]}.)
  5. The complete log is available here: https://drive.google.com/file/d/1Ude0Pb3VOb_rzJhbzu_AXtlk8HUUNbzT/view?usp=drive_link
  6. Got the same result when trying with Polygraphy as well.
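For reference, the trtexec invocation in step 3 can be assembled programmatically, which makes it easy to toggle flags such as --useDLACore and --allowGPUFallback when reproducing the failure. A minimal Python sketch (file names taken from the command above; running the command only works on the Jetson itself):

```python
def trtexec_dla_cmd(onnx_path, engine_path, dla_core=0):
    """Build the trtexec argument list for an INT8 DLA build with GPU fallback."""
    return [
        "/usr/src/tensorrt/bin/trtexec",
        f"--onnx={onnx_path}",
        "--shapes=input:32x3x32x32",   # static batch of 32, CIFAR-sized input
        f"--saveEngine={engine_path}",
        "--int8",                      # DLA requires INT8 or FP16
        f"--useDLACore={dla_core}",
        "--allowGPUFallback",          # layers unsupported on DLA should fall back to GPU
        "--useSpinWait",
        "--separateProfileRun",
    ]

cmd = trtexec_dla_cmd("model_gn.onnx", "model_gn.engine")
# On the Jetson: subprocess.run(cmd, check=True)
print(" ".join(cmd))
```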

Hi,

Could you run trtexec with --verbose and share the log with us?
Thanks.

Hi,
You can find the verbose log here: https://drive.google.com/file/d/1qNT8dqTcdHgU4Djzxj1ltr0KS-QiyfHY/view?usp=drive_link

Hi,

The link shows “Access Denied”.
Could you help to check?

Thanks.

Sorry, can you try again?

Hi,

Thanks for sharing.

[05/08/2024-10:14:17] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/cnn/cnn.0/Conv]}.)
[05/08/2024-10:14:17] [V] [TRT] =============== Computing costs for 
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(6144,2048,64,1) -> Int8(65536,1024,64,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(6144,2048,64,1) -> Int8(512,256:32,16,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(1024,1:4,32,1) -> Int8(65536,1024,64,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(1024,1:4,32,1) -> Int8(512,256:32,16,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(1024,1024:32,32,1) -> Int8(65536,1024,64,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[05/08/2024-10:14:17] [V] [TRT] *************** Autotuning format combination: Int8(1024,1024:32,32,1) -> Int8(512,256:32,16,1) ***************
[05/08/2024-10:14:17] [V] [TRT] --------------- Timing Runner: {ForeignNode[/cnn/cnn.0/Conv]} (DLA)
[05/08/2024-10:14:17] [V] [TRT] Setting a default quantization params because quantization data is missing for {ForeignNode[/cnn/cnn.0/Conv]}
[05/08/2024-10:14:17] [W] [TRT] Skipping tactic 0x0000000000000003 due to exception Failed to create DLA runtime context. Hint: You can load at most 16 DLA loadables simultaneously per core. Attempting to load more will cause context allocation to fail.
[05/08/2024-10:14:17] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf

Based on the log, the model already hits the DLA loadable limit (at most 16 DLA loadables can be loaded simultaneously per core).
In that case, TensorRT should fall back to the GPU for inference, but it doesn't.
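As a quick sanity check, the verbose log can be scanned for that DLA-context warning to confirm every candidate tactic was skipped for the same reason. A small stdlib sketch (reading the log from a file is left out; the text is passed in directly):

```python
import re

def count_dla_context_failures(log_text):
    """Count 'Skipping tactic ... Failed to create DLA runtime context' warnings."""
    pattern = re.compile(
        r"Skipping tactic \S+ due to exception Failed to create DLA runtime context"
    )
    return len(pattern.findall(log_text))

# Example against two lines of the log quoted above:
sample = (
    "[W] [TRT] Skipping tactic 0x0000000000000003 due to exception "
    "Failed to create DLA runtime context. Hint: You can load at most 16 "
    "DLA loadables simultaneously per core.\n"
    "[V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf\n"
)
print(count_dla_context_failures(sample))  # 1
```

If the count matches the number of autotuned format combinations, no DLA tactic succeeded at all, which is consistent with the build error above.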

Could you test it on JetPack 5.1.3, the latest software release for Xavier NX?

Thanks.