Unable to capture GPU utilization and bandwidth metrics with Nsight Systems / Nsight Compute on DRIVE Thor (ARM)

DRIVE OS Version: Provide DRIVE OS version. Example: 7.0.3

Hi everyone,

I’m running on an ARM platform with a DRIVE Thor (ThorU) SoC. I’m trying to profile a TensorRT engine using both Nsight Systems and Nsight Compute, but I’m unable to capture GPU utilization or memory bandwidth metrics for the operators.

Nsight Systems command:

bash

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/aarch64-linux-gnu/nsight-systems/target-linux-sbsa-armv8

nsys profile \
    --trace=cuda,nvtx \
    --stats=true \
    --gpu-metrics-device=all \
    --gpu-metrics-set=0 \
    --cuda-graph-trace=node \
    --force-overwrite true \
    -o test \
    trtexec \
    --loadEngine=test_nodata.trt \
    --fp16

Issue with Nsight Systems:
I can see kernel latency information, but GPU utilization and bandwidth utilization metrics are missing from the output.
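
One way to narrow this down is to ask nsys which GPUs and metric sets it considers usable for GPU metrics sampling on the target. The following is only a sketch, not a confirmed fix for Thor: `check_gpu_metrics_support` is a hypothetical helper name, and it assumes the installed nsys accepts the `help` value for `--gpu-metrics-device` and `--gpu-metrics-set`, which may not hold for the build shipped with DRIVE OS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the GPUs and metric sets that nsys reports as
# available for GPU metrics sampling on this machine.
check_gpu_metrics_support() {
    # Bail out early if nsys is not installed or not on PATH.
    if ! command -v nsys >/dev/null 2>&1; then
        echo "nsys not found on PATH" >&2
        return 1
    fi
    # Assumption: this nsys version accepts 'help' for both options.
    nsys profile --gpu-metrics-device=help
    nsys profile --gpu-metrics-set=help
}
```

Calling `check_gpu_metrics_support` on the target prints whatever nsys lists. If the Thor GPU does not appear there, `--gpu-metrics-device=all` would have nothing to sample, which would be consistent with the missing utilization rows.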

Nsight Compute command:

bash

/usr/local/NVIDIA-Nsight-Compute/ncu \
    --target-processes all \
    --kernel-name regex:.+ \
    --kernel-name-base function \
    --metrics all \
    --csv \
    --print-summary per-kernel \
    --export ncu_trt_report \
    --force-overwrite \
    trtexec \
    --loadEngine=test.trt \
    --fp16

Issue with Nsight Compute:
No kernel information is captured at all.

System environment:

  • Platform: DRIVE AGX Thor (ARM64/aarch64)

  • DRIVE OS version: [Please fill in your version]

  • TensorRT version: [Please fill in]

Has anyone successfully profiled GPU metrics on Thor using these tools? Are there any specific flags, environment variables, or alternative methods required for the Thor platform? Any guidance would be greatly appreciated.

Thanks in advance!

Dear @kaiheng.weng,
Let me try to reproduce the issue locally and get back to you.

Dear @SivaRamaKrishnaNV,

Hope you’re doing well. Just gently checking in on the progress of reproducing the issue we discussed earlier. Please let me know if you need any additional information from my side. Thanks a lot for your time and help.

Do you mean that the timeline, like the one below, shows only latency information?

I see SM active info in my nsys-rep. Can you share your nsys-rep file?

Yes, I only see the latency info; the SM/GPU utilization and bandwidth info is missing.
My Nsight Systems run only produced a .qdstrm file, and no .nsys-rep was generated.
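
For anyone checking the same thing, here is a minimal sketch that reports which output a run left behind for a given `-o` base name. `nsys_output_kind` is a hypothetical helper, not part of Nsight Systems. As far as I understand, a leftover .qdstrm without an .nsys-rep usually means the conversion step on the target did not complete, and the .qdstrm can normally be imported on a host machine instead, but I have not verified that on Thor.

```bash
# Hypothetical helper: report which nsys output exists for a given -o base name.
nsys_output_kind() {
    base="$1"
    if [ -f "${base}.nsys-rep" ]; then
        # Final report is present and can be opened in the GUI or with nsys stats.
        echo "nsys-rep"
    elif [ -f "${base}.qdstrm" ]; then
        # Only the intermediate stream file exists; conversion did not finish.
        echo "qdstrm-only"
    else
        echo "missing"
    fi
}
```

With the command above, `nsys_output_kind test` would print `qdstrm-only` in my case.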

Could you share your nsys files?

Could you please provide an update on this topic?