"Unrecognized GPU UUID" error in nsys profiler

I’ve been trying to debug a CUPTI_ERROR_INVALID_DEVICE error I’ve been getting from other software. A complicating factor is that I have two versions of CUDA installed – I’m using CUDA runtime 11.8 and the included version of nsys to do this, but the “CUDA Driver” reported by nvidia-smi is 12.2, and my GPU driver is version 535.86.05.

I get the following error when trying to profile one of the CUDA demo scripts (report file attached):

Events fetch failed: Source ID=
Type=ErrorInformation (18)
 Error information:
 ProcessEventsError (4005)
  Properties:
  ErrorText (100)=/build/agent/work/323cb361ab84164c/QuadD/Host/Analysis/EventHandler/TraceEventHandler.cpp(562): Throw in function void QuadDAnalysis::EventHandler::TraceEventParser::operator()(const QuadDCommon::FlatComm::Cuda::Event&)
Dynamic exception type: boost::wrapexcept
std::exception::what: InternalErrorException
[QuadDCommon::tag_message*] = Unrecognized GPU UUID: f88f7016-d57c-1856-9eb9-7c200786f0ce

The profiler also says

Installed CUDA driver version (12.2) is not supported by this build of Nsight Systems. CUDA trace will be collected using libraries for driver version 11.8

Here’s my nsys status -e output:

Timestamp counter supported: Yes

CPU Profiling Environment Check

Root privilege: disabled
Linux Kernel Paranoid Level = 1
Linux Distribution = arch
Linux Kernel Version = 6.4.8-arch1-1: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail

What is going wrong? Do I need to install an older version of the drivers?

report1.nsys-rep (181.2 KB)

This looks like a known issue with some versions of the CTK and some laptop GPUs.

There is a fix for this in Nsys 2023.3 (which just went live yesterday), so I would recommend that you download that version and give it a try (Nsight Systems | NVIDIA Developer).

Thanks! That seems to have fixed the problem (though nsys status -e is still showing CPU Profiling Environment (system-wide): Fail).

Your paranoid level is set to 1. At that level you can get backtraces for your own process tree, but you cannot get system-wide ones:

| Paranoid Level | CPU IP/backtrace Sampling, process-tree mode | CPU IP/backtrace Sampling, system-wide mode | CPU Context Switch Tracing, process-tree mode | CPU Context Switch Tracing, system-wide mode | Event Sampling, system-wide mode |
|---|---|---|---|---|---|
| 3 or greater | not available | not available | not available | not available | not available |
| 2 | User mode IP/backtrace samples only | not available | available | not available | not available |
| 1 | Kernel and user mode IP/backtrace samples | not available | available | not available | not available |
| 0, -1 | Kernel and user mode IP/backtrace samples | Kernel and user mode IP/backtrace samples | available | available | hardware and OS events |
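
If it helps, here is a minimal Python sketch that reads the current paranoid level and looks up the matching row of the table above. It is only an illustration, not part of nsys: the function names and dictionary keys are made up for this example, and treating any value below -1 like the "0, -1" row is an assumption.

```python
from pathlib import Path

# Standard Linux location of the perf paranoid setting.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

NA = "not available"
KERNEL_USER = "Kernel and user mode IP/backtrace samples"
USER_ONLY = "User mode IP/backtrace samples only"

# Hypothetical key names, one per table column after "Paranoid Level".
COLUMNS = (
    "ip_sampling_process_tree",
    "ip_sampling_system_wide",
    "context_switch_process_tree",
    "context_switch_system_wide",
    "event_sampling_system_wide",
)


def read_paranoid_level(path=PARANOID_PATH):
    """Return the paranoid level as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def nsys_cpu_capabilities(level):
    """Return the table row for the given paranoid level as a dict."""
    if level >= 3:
        row = (NA, NA, NA, NA, NA)
    elif level == 2:
        row = (USER_ONLY, NA, "available", NA, NA)
    elif level == 1:
        row = (KERNEL_USER, NA, "available", NA, NA)
    else:
        # 0 or -1 per the table; lower values are assumed to behave the same.
        row = (KERNEL_USER, KERNEL_USER, "available", "available",
               "hardware and OS events")
    return dict(zip(COLUMNS, row))


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print(f"Could not read {PARANOID_PATH}")
    else:
        print(f"perf_event_paranoid = {level}")
        for mode, status in nsys_cpu_capabilities(level).items():
            print(f"  {mode}: {status}")
```

With level 1, as in the nsys status -e output above, every system-wide entry comes back as "not available", which is consistent with the "CPU Profiling Environment (system-wide): Fail" line. According to the table, system-wide collection needs the kernel.perf_event_paranoid setting at 0 or lower, and changing that setting requires root.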

Oh, that makes sense. Thank you for explaining!