Profile TensorRT Model on Orin NX

I have installed Nsight DL Designer 2025.4 on a Jetson Orin NX and I am trying to run the “Profile TensorRT Model” activity.

It fails with the error listed below. I also tried “Export TensorRT Engine”, which seems to complete, so it appears to be the subsequent profiling step (after converting the ONNX model) that fails. I am invoking the tool as root and using mnist.onnx from the TensorRT installation.

Preparing to launch the Profile TensorRT Model activity on localhost...

Using target packages from the system. Skipping deployment.
Launched process: DLDesignerWorker (pid: 43637)

DLDesignerWorker profile-trt --use-system-trt --config "/tmp/NVIDIA Nsight Deep Learning Designer-dFCBmY/trtconfig.json" /usr/src/tensorrt/data/mnist/mnist.onnx /usr/src/tensorrt/data/mnist/mnist-TRT.nv-dld-report

Launch succeeded.

[INF] Profiling on Orin (GA10B)
[INF] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 286, GPU 4264 (MiB)
[INF] [TRT] [MemUsageChange] Init builder kernel library: CPU +927, GPU +752, now: CPU 1256, GPU 5059 (MiB)
[INF] [TRT] ----------------------------------------------------------------
[INF] [TRT] Input filename: /usr/src/tensorrt/data/mnist/mnist.onnx
[INF] [TRT] ONNX IR version: 0.0.3
[INF] [TRT] Opset version: 8
[INF] [TRT] Producer name: CNTK
[INF] [TRT] Producer version: 2.5.1
[INF] [TRT] Domain: ai.cntk
[INF] [TRT] Model version: 1
[INF] [TRT] Doc string: 
[INF] [TRT] ----------------------------------------------------------------
[INF] Building TensorRT engine. This may take some time.

[INF] [TRT] Global timing cache in use. Profiling results in this builder pass will be stored.
[INF] [TRT] Detected 1 inputs and 1 output network tensors.
[INF] [TRT] Total Host Persistent Memory: 18976
[INF] [TRT] Total Device Persistent Memory: 0
[INF] [TRT] Total Scratch Memory: 0
[INF] [TRT] [BlockAssignment] Started assigning block shifts. This will take 4 steps to complete.
[INF] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.015776ms to assign 2 blocks to 4 nodes requiring 31744 bytes.
[INF] [TRT] Total Activation Memory: 31744
[INF] [TRT] Total Weights Memory: 25704
[INF] [TRT] Engine generation completed in 0.0971114 seconds.
[INF] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 8 MiB
[INF] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 1835 MiB
[INF] TensorRT engine build complete.
[INF] [TRT] Serialized 26 bytes of code generator cache.
[INF] [TRT] Serialized 4238735 bytes of compilation cache.
[INF] [TRT] Serialized 11949 timing cache entries
[INF] [TRT] Loaded engine size: 0 MiB
[INF] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[WRN] DL Designer does not implement clock controls on this platform.

[INF] Beginning 10 whole-network measurement passes.
[INF] Completed all whole-network measurement passes.
[INF] Beginning 10 per-layer measurement passes.
[INF] Completed all per-layer measurement passes.
[ERR] Failed to collect all whole-network metrics.
Process terminated.
Profiler encountered an error during execution: 0x1u.

Thanks for reporting this issue. Based on the log you shared, this definitely looks like a profiler-side problem.

To aid our investigation, can you share which driver and TensorRT versions you encountered the issue with? You can use nvidia-smi to get the driver version.

NVIDIA Jetson Orin NX

L4T: 36.4.7

Jetpack 6.2.1

CUDA: 12.6.68

TensorRT: 10.3.0.30

Driver Version: 540.4.0
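If it helps, here is a small script sketch that collects the same info. The paths assume a stock JetPack 6.x install, and each lookup falls back to "not found" if the tool is missing on a given setup:

```shell
# Collect L4T, driver, CUDA, and TensorRT versions on a Jetson board.
# Each command falls back to "not found" so the script runs everywhere.
L4T=$(head -n1 /etc/nv_tegra_release 2>/dev/null || echo "not found")
DRIVER=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null || echo "not found")
CUDA=$(nvcc --version 2>/dev/null | grep -o 'release [0-9.]*' || echo "not found")
TRT=$(dpkg-query -W -f='${Version}' tensorrt 2>/dev/null || echo "not found")

echo "L4T:      $L4T"
echo "Driver:   $DRIVER"
echo "CUDA:     $CUDA"
echo "TensorRT: $TRT"
```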

@cvanderknyff : Was the info posted useful for you? Do you need more details?

Hi,

Chris is OOTO due to the Thanksgiving holiday. We will continue the investigation and report back once he is back in the office.


I spent some time trying to repro this issue on Orin devices, but all of my profiling attempts succeeded on the JetPack 6.x devices I had access to (L4T 36.4.3/JP 6.2 and L4T 36.3/JP 6.0), using both TensorRT 10.14.1 and 10.3.

My recommendation is to try reflashing the board and/or downgrading to JetPack 6.2. While your TensorRT version is old, changing it is unlikely to help.
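As an additional sanity check, you could confirm that the engine build succeeds outside DL Designer using trtexec, which ships with TensorRT. This is a sketch assuming the stock JetPack layout (override `TRTEXEC` if your install differs); if this also fails, the problem is not profiler-specific:

```shell
# Rebuild the same engine with trtexec to isolate whether the failure
# is specific to the DL Designer profiler or to TensorRT itself.
# The TRTEXEC path assumes the stock JetPack layout; override via env var.
TRTEXEC=${TRTEXEC:-/usr/src/tensorrt/bin/trtexec}
if [ -x "$TRTEXEC" ]; then
    "$TRTEXEC" --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx \
               --saveEngine=/tmp/mnist.engine >/dev/null 2>&1 \
        && STATUS="engine build ok" || STATUS="engine build failed"
else
    STATUS="trtexec not found"
fi
echo "trtexec check: $STATUS"
```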

Unfortunately, our next release is based on CUDA 13 and JetPack 7, so Orin devices will soon be temporarily unsupported by Nsight DL Designer until JetPack 7.2 reintegrates Orin support. The Jetson Roadmap currently has this scheduled for Q1 2026.