When I use Nsight DL to remotely profile an ONNX file, I get the error listed at the end of this post.
I am running Nsight on Windows and connecting over SSH to a Jetson Orin Nano running JetPack 6.2.
I am wondering about the path /home/nvidia/Documents/onnxruntime, which exists on neither the remote nor the host. Am I also missing some installation on the remote?
: /home/nvidia/Documents/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1695 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by /home/nvidia/libonnxruntime_providers_cuda.so)
Thanks for visiting the NVIDIA Developer Forums.
To ensure better visibility and support, I've moved your post to the Jetson category, where it's more appropriate.
On further inspection, this seems to be a bug in Nsight Deep Learning Designer 2025.3. It is compiled against glibc 2.38, but the Jetson Orin Nano only has glibc 2.35. (The system requirements state glibc 2.29.)
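For anyone who wants to confirm the same mismatch on their own target, here is a rough sketch of how the two versions could be compared. It assumes `getconf`, `objdump` (from binutils) and GNU `sort -V` are available on the Jetson. The helper names (`glibc_version`, `required_glibc`, `version_ge`) are made up for illustration, and the library path is simply the one from the error message above.

```shell
# Sketch only: helper names are hypothetical, not part of any NVIDIA tool.

# Print the glibc version installed on this system, e.g. "2.35".
glibc_version() {
    getconf GNU_LIBC_VERSION | awk '{print $2}'
}

# Print the highest GLIBC_x.y symbol version a shared library references.
# Assumes objdump (binutils) is installed; prints nothing if it is not.
required_glibc() {
    objdump -T "$1" 2>/dev/null \
        | grep -o 'GLIBC_[0-9][0-9.]*' \
        | sed 's/GLIBC_//' \
        | sort -uV \
        | tail -n1
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Library path taken from the error message; adjust if yours differs.
lib=/home/nvidia/libonnxruntime_providers_cuda.so
if [ -f "$lib" ]; then
    have=$(glibc_version)
    need=$(required_glibc "$lib")
    if version_ge "$have" "$need"; then
        echo "glibc $have satisfies the required $need"
    else
        echo "glibc $have is older than the required $need"
    fi
fi
```

On a target showing the error above, this should report that the installed glibc is older than what the library requires.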
Yes, I also tried installing Nsight Deep Learning Designer on the Jetson Orin Nano. I tried profiling directly on the target and, after the install, remote profiling again. Both failed.
I can add the error output from on-target profiling later today.
Is it possible to try the previous version of Nsight Deep Learning Designer as well? Is there a download link for previously released versions?
We just tested it in a JetPack 6.2.1 environment and it works correctly.
Could you try it again? Please note that you will need to run it with sudo to get the GPU trace data.
$ sudo /opt/nvidia/nsight_dl/2025.3.25220.1113/target/linux-v4l_l4t-dl-t210-a64/ndld-prof /usr/src/tensorrt/data/mnist/mnist.onnx
[INF] Could not determine the previous TensorRT version. Will attempt to load from the local environment.
...
[INF] Completed all per-layer measurement passes.
[INF] Generating report data.
[INF] Profiling operations complete.
Median end-to-end latency: 0.11056 ms
Fastest end-to-end latency: 0.10182 ms
Slowest end-to-end latency: 0.14506 ms
Median GPU compute inference time: 0.10578 ms
Fastest GPU compute inference time: 0.096704 ms
Slowest GPU compute inference time: 0.13469 ms
Median input H2D copy time: 0.002544 ms
Median output D2H copy time: 0.002336 ms
Top 6 layer inference times within the median pass:
0.0577 ms Convolution28 + Parameter6 + ONNXTRT_Broadcast + Plus30 + ReLU32
0.0281 ms Convolution110 + Parameter88 + ONNXTRT_Broadcast_10 + Plus112 + ReLU114
0.0197 ms __myl_MulSumAdd_myl4_1
0.0188 ms Pooling160
0.0154 ms Pooling66
0.00246 ms dummy_shape_call__mye602_0_myl4_0
Network metric values:
SMs Active: 22.9 %
DRAM Read Throughput: 0.432 %
DRAM Write Throughput: 0.134 %
Tensor Active: 0.197 %
Compute Warps in Flight: 2.93 %