Remote profiling errors - unable to load CUDA profiling library

Hi, I'm trying to profile a dummy model on an Orin (DRIVE AGX) and I'm getting the following errors:
Launching: /home/orin/nsight-dl-designer/DLDesignerWorker
Process launched
[ERR] Unable to load CUDA profiling library.
[INF] The current user is not a member of the debug group. You may need to join that group in order to profile.
Launched application returned 1 (0x1).
Retrieving /home/orin/nsight-dl-designer/trtreport.nv-dld-report to /home/yz9qvs/projects/deconv_bev3_new_head_network_only_fs_cam_only/nvidia_x86/trt/fp16_no_custom_io/deconv_bev3_new_head_network_only_fs_cam_only-TRT.nv-dld-report
Failed to retrieve /home/orin/nsight-dl-designer/trtreport.nv-dld-report
Failed to retrieve files.
Profiler encountered an error during execution.

I couldn't find anything on these errors.
Any help would be great.

DL Designer does not support NVIDIA DriveOS as a target platform.

The primary error in your logs is the line prefixed by [ERR], indicating that the GPU hardware profiler could not be started. The subsequent errors are downstream failures; the profiler exited before it could do anything, so no report could be generated for transfer back to the DL Designer host machine.
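
As a quick check, the [INF] line about the debug group can be verified directly on the target. Below is a minimal sketch, assuming the group is literally named "debug" as the message implies; it only reports membership and does not change anything:

```python
# Minimal sketch: check whether the current user belongs to the group named in the
# [INF] message. The group name "debug" is an assumption taken from the log text.
import grp
import os
import pwd


def user_in_group(group_name: str) -> bool:
    user = pwd.getpwuid(os.getuid()).pw_name
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False  # the group does not exist on this system
    # Membership can come from the group's member list or from the user's primary GID.
    return user in group.gr_mem or pwd.getpwnam(user).pw_gid == group.gr_gid


if __name__ == "__main__":
    print("current user in 'debug' group:", user_in_group("debug"))
```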

The docs clearly state:

What platform does Nsight DL Designer run on?

We currently support Windows, Linux, and L4T.

After connecting with root privileges, I've gotten past the previous error, but encountered the following:

[INF] [TRT] [MemUsageChange] Init CUDA: CPU +313, GPU +0, now: CPU 695, GPU 6747 (MiB)

[INF] [TRT] [MemUsageChange] Init builder kernel library: CPU +944, GPU +1112, now: CPU 1682, GPU 7904 (MiB)

[INF] [TRT] ----------------------------------------------------------------

[INF] [TRT] Input filename:   /root/nsight-dl-designer/deconv_bev3_new_head_network_only_fs_cam_only.onnx

[INF] [TRT] ONNX IR version:  0.0.8

[INF] [TRT] Opset version:    17

[INF] [TRT] Producer name:    pytorch

[INF] [TRT] Producer version: 2.4.0

[INF] [TRT] Domain:           

[INF] [TRT] Model version:    0

[INF] [TRT] Doc string:       

[INF] [TRT] ----------------------------------------------------------------

[INF] Building TensorRT engine. This may take some time.

[INF] [TRT] Global timing cache in use. Profiling results in this builder pass will be stored.

[INF] Estimated builder progress: 2.57%...

[INF] Estimated builder progress: 9.49%...

[INF] Estimated builder progress: 13.5%...

[INF] Estimated builder progress: 19%...

[INF] Estimated builder progress: 22.7%...

[INF] Estimated builder progress: 27%...

[INF] Estimated builder progress: 28.6%...

[INF] Estimated builder progress: 34.3%...

[INF] [TRT] Compiler backend is used during engine build.

[INF] Estimated builder progress: 36.2%...

[INF] Estimated builder progress: 40.3%...

[INF] Estimated builder progress: 46.2%...

[INF] Estimated builder progress: 47.1%...

[INF] [TRT] Detected 3 inputs and 2 output network tensors.

[INF] [TRT] Total Host Persistent Memory: 187072 bytes

[INF] [TRT] Total Device Persistent Memory: 0 bytes

[INF] [TRT] Max Scratch Memory: 25165824 bytes

[INF] [TRT] [BlockAssignment] Started assigning block shifts. This will take 47 steps to complete.

[INF] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 1.07386ms to assign 5 blocks to 47 nodes requiring 106315776 bytes.

[INF] [TRT] Total Activation Memory: 106315776 bytes

[INF] [TRT] Total Weights Memory: 52303900 bytes

[INF] [TRT] Compiler backend is used during engine execution.

[INF] [TRT] Engine generation completed in 30.9241 seconds.

[INF] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 128 MiB

[INF] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 2074 MiB

[INF] TensorRT engine build complete.

[INF] [TRT] Serialized 27 bytes of code generator cache.

[INF] [TRT] Serialized 50298 bytes of compilation cache.

[INF] [TRT] Serialized 328 timing cache entries

[WRN] The timing cache could not be saved.

[INF] Exception text: /root/.config/NVIDIA Corporation/NVIDIA Nsight Deep Learning Designer/timing_cache.10.5.0.9.bin: No such file or directory

[INF] [TRT] Loaded engine size: 50 MiB

[INF] [TRT] [MS] Running engine with multi stream info

[INF] [TRT] [MS] Number of aux streams is 3

[INF] [TRT] [MS] Number of total worker streams is 4

[INF] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream

[INF] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +102, now: CPU 0, GPU 151 (MiB)

[WRN] DL Designer does not implement clock controls on this platform.

[INF] Beginning 10 whole-network measurement passes.

[INF] Completed all whole-network measurement passes.

[INF] Beginning 10 per-layer measurement passes.

[INF] Completed all per-layer measurement passes.

[ERR] Primary GPU sampling range should have remained open.

Launched application returned 1 (0x1).
Retrieving /root/nsight-dl-designer/trtreport.nv-dld-report to /home/yz9qvs/projects/deconv_bev3_new_head_network_only_fs_cam_only/nvidia_x86/trt/fp16_no_custom_io/deconv_bev3_new_head_network_only_fs_cam_only-TRT.nv-dld-report
Failed to retrieve /root/nsight-dl-designer/trtreport.nv-dld-report
Failed to retrieve files.
Profiler encountered an error during execution.
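
Side note: the timing-cache warning above ("No such file or directory") suggests the config directory may not exist on the target. A minimal sketch, assuming pre-creating the directory (path copied from the log) is enough for the cache to be saved on the next run:

```python
# Minimal sketch: pre-create the directory mentioned in the timing-cache warning.
# Path is copied from the log above; whether this resolves the warning is an assumption.
from pathlib import Path

cache_dir = Path("/root/.config/NVIDIA Corporation/NVIDIA Nsight Deep Learning Designer")
cache_dir.mkdir(parents=True, exist_ok=True)
print("created:", cache_dir)
```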

Sorry for the confusion. The L4T we officially support refers to the Jetson platform. We have not validated NDLD on DRIVE systems, but we will look into the specific profiler error: “[ERR] Primary GPU sampling range should have remained open.”

Thank you for the feedback; it is much appreciated.