TensorRT Conversion Fails On Orin Nano

Hi,
I’m trying to convert an ONNX model to a TensorRT engine with FP16 precision on my Orin Nano 8GB, but the conversion fails. Here’s the last portion of the log from the failed conversion:

[01/01/1970-10:01:52] [V] [TRT] --------------- Timing Runner: {ForeignNode[/Concat_429_output_0.../Concat_431]} (Myelin[0x80000023])
[01/01/1970-10:01:53] [V] [TRT] [MemUsageChange] Subgraph create: CPU +72, GPU +0, now: CPU 4883, GPU 7414 (MiB)
[01/01/1970-10:04:11] [V] [TRT]  (foreignNode) Set user's cuda kernel library
[01/01/1970-10:04:11] [V] [TRT] Subgraph compilation completed in 137.516 seconds.
[01/01/1970-10:04:11] [V] [TRT] [MemUsageChange] Subgraph compilation: CPU +16, GPU -116, now: CPU 4899, GPU 7298 (MiB)
[01/01/1970-10:04:11] [W] [TRT] Tactic Device request: 399MB Available: 323MB. Device memory is insufficient to use tactic.
[01/01/1970-10:04:11] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 419033088 detected for tactic 0x0000000000000000.
[01/01/1970-10:04:11] [V] [TRT] {ForeignNode[/Concat_429_output_0.../Concat_431]} (Myelin[0x80000023]) profiling completed in 138.763 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[01/01/1970-10:04:11] [V] [TRT] *************** Autotuning format combination: Half(258048,1:8,2016,28), Half(258048,1:8,2016,28), Half(258048,1:8,2016,28), Half(18432,1:8,144,2), Half(18432,1:8,144,2) -> Half(2654208,82944,1:8,648,9) ***************
[01/01/1970-10:04:11] [V] [TRT] --------------- Timing Runner: {ForeignNode[/Concat_429_output_0.../Concat_431]} (Myelin[0x80000023])
[01/01/1970-10:04:12] [V] [TRT] [MemUsageChange] Subgraph create: CPU +71, GPU +3, now: CPU 4954, GPU 7303 (MiB)
[01/01/1970-10:06:12] [V] [TRT]  (foreignNode) Set user's cuda kernel library
[01/01/1970-10:06:12] [V] [TRT] Subgraph compilation completed in 120.025 seconds.
[01/01/1970-10:06:12] [V] [TRT] [MemUsageChange] Subgraph compilation: CPU +16, GPU -68, now: CPU 4970, GPU 7235 (MiB)
[01/01/1970-10:06:13] [W] [TRT] Tactic Device request: 438MB Available: 390MB. Device memory is insufficient to use tactic.
[01/01/1970-10:06:13] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 459730944 detected for tactic 0x0000000000000000.
[01/01/1970-10:06:13] [V] [TRT] {ForeignNode[/Concat_429_output_0.../Concat_431]} (Myelin[0x80000023]) profiling completed in 121.248 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[01/01/1970-10:06:14] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Concat_429_output_0.../Concat_431]}.)
[01/01/1970-10:06:14] [E] Engine could not be created from network
[01/01/1970-10:06:14] [E] Building engine failed
[01/01/1970-10:06:14] [E] Failed to create engine from model or file.
[01/01/1970-10:06:14] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # ./trtexec --onnx=/home/nvidia/foundation_stereo/p2.onnx --saveEngine=./test.engine --fp16 --verbose

It seems the failure is related to insufficient memory? I also converted this same ONNX model to TensorRT with FP16 precision on my local PC (with an RTX 4070 Ti Super), and it succeeded. The resulting TensorRT engine uses about 1.8 GB of memory during inference.

I have TensorRT 10.7.0.23 on my local PC and TensorRT 10.3.0.30 on my Jetson Orin Nano (JetPack 6.1).

Any suggestions? Thanks in advance for your help.

Best regards

Hi,

Please increase builderOptimizationLevel so that TensorRT spends more build time and can explore additional optimization options.

Ex.

$ /usr/src/tensorrt/bin/trtexec --builderOptimizationLevel=4 ...

Or

$ /usr/src/tensorrt/bin/trtexec --builderOptimizationLevel=5 ...
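
For reference, combining that flag with the options from the failing run in the log above, the full invocation would look something like this (a sketch only; the ONNX and engine paths are taken from the original trtexec command):

```shell
# Rebuild the engine with a higher builder optimization level (level 5 is the
# maximum); paths are reused from the failing command in the log above.
/usr/src/tensorrt/bin/trtexec \
    --onnx=/home/nvidia/foundation_stereo/p2.onnx \
    --saveEngine=./test.engine \
    --fp16 \
    --builderOptimizationLevel=5 \
    --verbose
```

Note that higher optimization levels increase engine build time, so expect the conversion to take noticeably longer on the Orin Nano.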

Thanks.

Following your suggested approach, I’ve successfully resolved the issue! Really appreciate your help!
