DRIVE OS Version: 7.0.3.0-40740537
Hi everyone,
I’m running on an ARM platform with a DRIVE Thor (ThorU) SoC. I’m trying to profile a TensorRT engine using both Nsight Systems and Nsight Compute, but I’m unable to capture GPU utilization or memory bandwidth metrics for the operators.
Nsight Systems command:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/aarch64-linux-gnu/nsight-systems/target-linux-sbsa-armv8
nsys profile \
--trace=cuda,nvtx \
--stats=true \
--gpu-metrics-device=all \
--gpu-metrics-set=0 \
--cuda-graph-trace=node \
--force-overwrite true \
-o test \
trtexec \
--loadEngine=test_nodata.trt \
--fp16
Issue with Nsight Systems:
I can see kernel latency information, but GPU utilization and bandwidth utilization metrics are missing from the output.
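A minimal sketch for confirming from the report itself whether any GPU metrics samples were recorded, rather than relying on the `--stats` summary. It assumes the SQLite export stores those samples in a table named `GPU_METRICS`, that the `sqlite3` CLI is available on the target, and that the report is the `test.nsys-rep` file produced by `-o test`. The helper names `has_gpu_metrics_table` and `check_report` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether an exported Nsight Systems report contains GPU metrics data.
# Assumption: the SQLite export keeps the samples in a table named GPU_METRICS.

# Reads table names on stdin (one per line) and reports whether
# a table named exactly GPU_METRICS is among them.
has_gpu_metrics_table() {
    if grep -qx 'GPU_METRICS'; then
        echo "present"
    else
        echo "absent"
    fi
}

# Hypothetical helper: export a .nsys-rep report to SQLite and inspect its tables.
# Requires nsys and the sqlite3 CLI on the target.
check_report() {
    local report="$1"
    local db="${report%.nsys-rep}.sqlite"
    nsys export --type=sqlite --force-overwrite=true --output="$db" "$report" || return 1
    sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type='table';" | has_gpu_metrics_table
}

# Usage on the target (nothing runs until these are called):
#   nsys profile --gpu-metrics-device=help   # list GPUs that nsys can sample metrics from
#   check_report test.nsys-rep               # prints "present" or "absent"
```

If this prints "absent", the metrics were never sampled, which would point at device support or permissions rather than at how the report is displayed.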
Nsight Compute command:
/usr/local/NVIDIA-Nsight-Compute/ncu \
--target-processes all \
--kernel-name regex:.+ \
--kernel-name-base function \
--metrics all \
--csv \
--print-summary per-kernel \
--export ncu_trt_report \
--force-overwrite \
trtexec \
--loadEngine=test.trt \
--fp16
Issue with Nsight Compute:
No kernel information is captured at all.
System environment:
- Platform: DRIVE AGX Thor (ARM64/aarch64)
- DRIVE OS version: 7.0.3.0-40740537
- TensorRT version: [Please fill in]
Has anyone successfully profiled GPU metrics on Thor using these tools? Are there any specific flags, environment variables, or alternative methods required for the Thor platform? Any guidance would be greatly appreciated.
Thanks in advance!