I tried to use nsys to profile a CUDA kernel, but ran into the error below. I ran the command both with and without root, and neither works. The nsys version is 2022.3.3.18-4d5367b and the JetPack version is 5.0.2-b231. The command and output are as follows:
root@nvidia-desktop:/home/nvidia/Desktop/cuda-test/build# nsys profile ./cuda_test
GPU time is 1263.071289ms
Generating '/tmp/nsys-report-2a58.qdstrm'
FATAL ERROR: /build/agent/work/323cb361ab84164c/QuadD/Common/GpuTraits/Src/GpuTicksConverter.cpp(376): Throw in function QuadDCommon::TimestampType GpuTraits::GpuTicksConverter::ConvertToCpuTime(const QuadDCommon::Uuid&, uint64_t&) const
Dynamic exception type: boost::wrapexcept<QuadDCommon::NotFoundException>
std::exception::what: NotFoundException
[QuadDCommon::tag_message*] = No GPU associated to the given UUID
I also tried remote profiling as described in "Nsys cli cannot trace cuda - #5 by richsheep", but could not establish a connection between my Windows 11 host and the device through Nsight Systems. Oddly, the host can connect to the device over SSH from PowerShell or VS Code, but not through Nsight Systems. Can anybody help? 🙏 I have been stuck on this problem for almost two days.

Hi,
We cannot reproduce the issue with JetPack 5.1.2.
Could you update to the latest JetPack 5 and try it again?
$ sudo /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys --version
NVIDIA Nsight Systems version 2023.2.4.44-33011852v0
$ sudo /opt/nvidia/nsight-systems/2023.2.4/target-linux-tegra-armv8/nsys profile ./vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Generating '/tmp/nsys-report-be38.qdstrm'
[1/1] [========================100%] report1.nsys-rep
Generated:
/usr/local/cuda-11.4/samples/0_Simple/vectorAdd/report1.nsys-rep
Thanks.
Hi,
Thanks for replying. How do I update JetPack? Do I have to update both the BSP and L4T through SDK Manager, or is there a way to update only JetPack over the air (OTA)?
I pulled a JetPack 5.1 Docker image, and nsys works inside the container.
Hi,
Did you get the expected result with the nsys in the JetPack 5.1 container?
Thanks.
Hi,
It can collect CUDA events, but it is unable to configure the collection of CPU IP/backtrace samples, context switch data, or event sampling data:
root@nvidia-desktop:/workspace/cuda-test/build# /opt/nvidia/nsight-systems/2022.5.2/target-linux-tegra-armv8/nsys profile ./cuda_test
WARNING: CPU sampling in a Docker container requires `--pid=host` Docker option or `--sampling-trigger=perf` NSys option, disabling.
WARNING: CPU sampling in a Docker container requires `--privileged=true` Docker option, disabling.
WARNING: 'timer' backtrace collection trigger will not be used because sampling is disabled.
WARNING: 'sched' backtrace collection trigger will not be used because sampling is disabled.
GPU time is 1734.069458ms
Generating '/tmp/nsys-report-2b5a.qdstrm'
[1/1] [========================100%] report6.nsys-rep
Generated:
/workspace/cuda-test/build/report6.nsys-rep
I tried adding the option --sampling-trigger=perf; the warning disappeared, but the result remained the same.
I also tried adding the --pid=host or --privileged=true option, but got an error like:
unrecognised option '--privileged=true'
Thanks.
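For reference, the warnings above name --pid=host and --privileged=true as Docker options, so they would normally go on the docker run command that starts the container rather than on the nsys command line, which may explain the "unrecognised option" error. A minimal sketch, assuming a hypothetical image name (my-jetpack-image:5.1), that the NVIDIA container runtime is installed, and that the project is mounted from the host path used earlier in this thread. It only prints the command so it can be reviewed before running:

```shell
#!/bin/sh
# Hypothetical image name (assumption); replace with the JetPack 5.1 image actually pulled.
IMAGE="my-jetpack-image:5.1"
# Host and container paths taken from the prompts in this thread; the mount itself is an assumption.
HOST_DIR="/home/nvidia/Desktop/cuda-test"
CONTAINER_DIR="/workspace/cuda-test"

# --pid=host and --privileged=true are docker run options, so they are
# placed before the image name, not after "nsys profile".
# --runtime nvidia assumes the NVIDIA container runtime is available.
build_cmd() {
    echo "docker run -it --rm --runtime nvidia --pid=host --privileged=true -v $HOST_DIR:$CONTAINER_DIR $IMAGE"
}

# Dry run: print the command instead of executing it.
build_cmd
```

Once the container is started this way, nsys profile ./cuda_test would be run inside it as before, without the extra options.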
There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.
Thanks
Hi,
Sorry for the late update.
Do you want to profile an application within the container?
Or does the app run natively on Jetson, but nsys doesn't collect the data?
Thanks.