I’ve been trying to debug a CUPTI_ERROR_INVALID_DEVICE error that I get from other software. A complicating factor is that I have two versions of CUDA in play: I’m using the CUDA 11.8 runtime and the nsys bundled with it, but the “CUDA Driver” version reported by nvidia-smi is 12.2, and my GPU driver is version 535.86.05.
I get the following error when trying to profile one of the CUDA demo scripts (report file attached):
Events fetch failed: Source ID=
Type=ErrorInformation (18)
Error information:
ProcessEventsError (4005)
Properties:
ErrorText (100)=/build/agent/work/323cb361ab84164c/QuadD/Host/Analysis/EventHandler/TraceEventHandler.cpp(562): Throw in function void QuadDAnalysis::EventHandler::TraceEventParser::operator()(const QuadDCommon::FlatComm::Cuda::Event&)
Dynamic exception type: boost::wrapexcept
std::exception::what: InternalErrorException
[QuadDCommon::tag_message*] = Unrecognized GPU UUID: f88f7016-d57c-1856-9eb9-7c200786f0ce
The profiler also says:
Installed CUDA driver version (12.2) is not supported by this build of Nsight Systems. CUDA trace will be collected using libraries for driver version 11.8
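To narrow this down, one check that might help is whether the UUID in the error above matches a GPU that the driver actually reports. The following is only a rough sketch, assuming nvidia-smi is on the PATH and that its UUIDs carry a "GPU-" prefix that the nsys message omits; the helper names (`parse_gpu_query`, `uuid_is_known`, `query_gpus`) are made up for illustration and are not part of any NVIDIA tool.

```python
import subprocess

# UUID copied from the "Unrecognized GPU UUID" line in the nsys error.
REPORTED_UUID = "f88f7016-d57c-1856-9eb9-7c200786f0ce"


def parse_gpu_query(csv_text):
    """Parse 'index, uuid, driver_version' rows (CSV, no header) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, uuid, driver = [field.strip() for field in line.split(",")]
        gpus.append({"index": index, "uuid": uuid, "driver_version": driver})
    return gpus


def _normalize(uuid):
    # Assumption: nvidia-smi prints "GPU-<uuid>", while nsys prints the bare UUID.
    uuid = uuid.strip().lower()
    return uuid[4:] if uuid.startswith("gpu-") else uuid


def uuid_is_known(gpus, reported_uuid):
    """Return True if reported_uuid matches any GPU in the parsed list."""
    wanted = _normalize(reported_uuid)
    return any(_normalize(gpu["uuid"]) == wanted for gpu in gpus)


def query_gpus():
    """Ask nvidia-smi for the index, UUID and driver version of each GPU."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,uuid,driver_version",
         "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    for gpu in gpus:
        print(gpu["index"], gpu["uuid"], gpu["driver_version"])
    print("UUID from nsys error is known to the driver:",
          uuid_is_known(gpus, REPORTED_UUID))
```

If the UUID does show up in that list, the device itself is visible to the driver, which would point more toward the 11.8 trace libraries not recognizing it than toward a missing GPU. That is a guess on my part, not something the output proves.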
Here’s my nsys status -e output:
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 1
Linux Distribution = arch
Linux Kernel Version = 6.4.8-arch1-1: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail
What is going wrong? Do I need to install an older version of the drivers?
report1.nsys-rep (181.2 KB)