I have installed Nsight DL Designer 2025.4 on a Jetson Orin NX and I am trying to run “Profile TensorRT Model”.
It fails with the error listed below. I also tried “Export TensorRT Engine”, which seems to complete, so it looks like it is the subsequent profiling step (after the ONNX model is converted) that fails. I am invoking the tool as root, and I am using the mnist.onnx from the TensorRT installation.
Preparing to launch the Profile TensorRT Model activity on localhost...
Using target packages from the system. Skipping deployment.
Launched process: DLDesignerWorker (pid: 43637)
DLDesignerWorker profile-trt --use-system-trt --config "/tmp/NVIDIA Nsight Deep Learning Designer-dFCBmY/trtconfig.json" /usr/src/tensorrt/data/mnist/mnist.onnx /usr/src/tensorrt/data/mnist/mnist-TRT.nv-dld-report
Launch succeeded.
[INF] Profiling on Orin (GA10B)
[INF] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 286, GPU 4264 (MiB)
[INF] [TRT] [MemUsageChange] Init builder kernel library: CPU +927, GPU +752, now: CPU 1256, GPU 5059 (MiB)
[INF] [TRT] ----------------------------------------------------------------
[INF] [TRT] Input filename: /usr/src/tensorrt/data/mnist/mnist.onnx
[INF] [TRT] ONNX IR version: 0.0.3
[INF] [TRT] Opset version: 8
[INF] [TRT] Producer name: CNTK
[INF] [TRT] Producer version: 2.5.1
[INF] [TRT] Domain: ai.cntk
[INF] [TRT] Model version: 1
[INF] [TRT] Doc string:
[INF] [TRT] ----------------------------------------------------------------
[INF] Building TensorRT engine. This may take some time.
[INF] [TRT] Global timing cache in use. Profiling results in this builder pass will be stored.
[INF] [TRT] Detected 1 inputs and 1 output network tensors.
[INF] [TRT] Total Host Persistent Memory: 18976
[INF] [TRT] Total Device Persistent Memory: 0
[INF] [TRT] Total Scratch Memory: 0
[INF] [TRT] [BlockAssignment] Started assigning block shifts. This will take 4 steps to complete.
[INF] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.015776ms to assign 2 blocks to 4 nodes requiring 31744 bytes.
[INF] [TRT] Total Activation Memory: 31744
[INF] [TRT] Total Weights Memory: 25704
[INF] [TRT] Engine generation completed in 0.0971114 seconds.
[INF] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 8 MiB
[INF] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 1835 MiB
[INF] TensorRT engine build complete.
[INF] [TRT] Serialized 26 bytes of code generator cache.
[INF] [TRT] Serialized 4238735 bytes of compilation cache.
[INF] [TRT] Serialized 11949 timing cache entries
[INF] [TRT] Loaded engine size: 0 MiB
[INF] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[WRN] DL Designer does not implement clock controls on this platform.
[INF] Beginning 10 whole-network measurement passes.
[INF] Completed all whole-network measurement passes.
[INF] Beginning 10 per-layer measurement passes.
[INF] Completed all per-layer measurement passes.
[ERR] Failed to collect all whole-network metrics.
Process terminated.
Profiler encountered an error during execution: 0x1u.