Hi,
I'm using the Nsight Systems CLI, version:
$ nsys --version
NVIDIA Nsight Systems version 2022.2.1.31-5fe97ab
But when I use -t cuda, a FATAL ERROR occurs and the generated qdstrm file is broken:
nvidia@tegra-ubuntu:/usr/local/cuda/samples/0_Simple/vectorAdd$ nsys profile --cudabacktrace=all -t cuda,cudnn,nvtx,mpi --output=./ ./vectorAdd
WARNING: ARMv8 PMU is not available, enabling `sampling-trigger=perf` switch, software events will be used for CPU sampling.
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Generating '/tmp/nsys-report-6178.qdstrm'
FATAL ERROR: /build/agent/work/20a3cfcd1c25021d/QuadD/Common/GpuTraits/Src/GpuTicksConverter.cpp(376): Throw in function QuadDCommon::TimestampType GpuTraits::GpuTicksConverter::ConvertToCpuTime(const QuadDCommon::Uuid&, uint64_t&) const
Dynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::NotFoundException>
std::exception::what: NotFoundException
[QuadDCommon::tag_message*] = No GPU associated to the given UUID
This does not happen when cuda is not in the trace list:
nvidia@tegra-ubuntu:/usr/local/cuda/samples/0_Simple/vectorAdd$ nsys profile --cudabacktrace=all -t cudnn,nvtx,mpi --output=./ ./vectorAdd
WARNING: ARMv8 PMU is not available, enabling `sampling-trigger=perf` switch, software events will be used for CPU sampling.
WARNING: CUDA backtraces will not be collected because CUDA tracing is disabled.
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Generating '/tmp/nsys-report-1936.qdstrm'
Failed to create '/usr/local/cuda-11.4/samples/0_Simple/vectorAdd/./.nsys-rep': Permission denied.
[1/1] [========================100%] nsys-report-dc7e.nsys-rep
Generated:
/tmp/nsys-report-dc7e.nsys-rep

