Issues using Nsight Systems (nsys) and Nsight Compute (ncu)

I tried to remotely analyze CUDA applications on WSL using nsys-ui and ncu-ui installed on Win10,
but encountered some errors. Does anyone know how to solve them?

1) nsys-ui error:
Full error information:
RuntimeError (120) {
RuntimeError (120) {
OriginalExceptionClass: struct boost::wrapexcept
OriginalFile: C:\dvs\p4\build\sw\devtools\Agora\Rel\QuadD_Main\QuadD\Host\Analysis\Clients\AnalysisHelper\AnalysisStatus.cpp
OriginalLine: 81
OriginalFunction: class Nvidia::QuadD::Analysis::Data::AnalysisStatusInfo __cdecl QuadDAnalysis::AnalysisHelper::AnalysisStatus::MakeFromErrorString(enum Nvidia::QuadD::Analysis::Data::AnalysisStatus,enum Nvidia::QuadD::Analysis::Data::AnalysisErrorType::Type,const class std::basic_string<char,struct std::char_traits,class std::allocator > &,const class boost::intrusive_ptr &)
ErrorText: /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Target/Daemon/TimeConversion.cpp(311): Throw in function int64_t QuadDDaemon::PostMortemTimeConverter::ConvertGpuTicksToSyncNs(const QuadDCommon::Uuid&, int64_t) const
Dynamic exception type: boost::wrapexcept<QuadDCommon::InternalErrorException>
std::exception::what: InternalErrorException

}

}

2) ncu-ui error:
==PROF== Connected to ncu-ui at 127.0.0.1:50152.
==PROF== Connected to process 1511 (/home/ray/proj/cpp/cuda/exercise02/00/a.out)
==PROF== Connected to process 1511 (/home/ray/proj/cpp/cuda/exercise02/00/a.out)

==ERROR== An error was reported by the counter measurement library:
==ERROR== Failed to initialize the profiler: LibraryNotLoaded. Check that a compatible driver library is loaded.
==PROF== Trying to shutdown target application
==ERROR== An error was reported by the counter measurement library:
==ERROR== Failed to initialize the profiler: LibraryNotLoaded. Check that a compatible driver library is loaded.
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).

3) An issue when starting nsys-ui on WSL:
OpenGL version is too low (0). Falling back to Mesa software rendering.
Skipping OpenGL version check on WSL.
OpenGL version: ""

[Hardware]
NVIDIA GeForce GTX 1080 Ti

[Software]
OS: Windows 10
NVIDIA Driver Version: 572.16
CUDA Version: 12.8

[Nsight and ncu]
os: win10
NVIDIA Nsight Systems version 2025.3.1.90-253135822126v0
NsightCompute Version 2025.2.1.0 (build 35987062) (public-release)

os: wsl2 (nsys and ncu are included in the CUDA 12.8 toolkit)
NVIDIA Nsight Systems version 2024.6.2.225-246235244400v0
NsightCompute Version 2025.1.1.0 (build 35528883) (public-release)

---- win10 ----
F:\Nsight\target-windows-x64>nsys status -e
Timestamp counter supported: Yes

Sampling Environment Check
Administrator privileges: No
Sampling Environment: Fail

---- wsl ----
nsys status -e
Timestamp counter supported: No

CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 6.6.87.1-microsoft-standard-WSL2: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail

Hi, @rayment

Please make sure your environment meets the requirements below:
1. NVIDIA GPU Accelerated Computing on WSL 2 (CUDA on WSL 12.9 documentation)

Hi, @veraj. Thank you for your response. My CUDA development environment is:

Windows 10 Professional Edition, OS build: 19045.6093.

The Win10 host NVIDIA driver has been updated to 572.83.

The WSL version is:
WSL version: 2.5.7.0
kernel version: 6.6.87.1-1
WSLg version: 1.0.66
MSRDC version: 1.2.6074
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release

I have checked the requirements. The setup meets the requirement that CUDA 12.8 Update 1 needs driver >= 572.61,
but the issues are still not fixed. Do you have any other suggestions?
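
For reference, the driver check described above can be sketched as a small shell comparison. This is only an illustration: `version_ge` is a hypothetical helper name, it assumes GNU `sort -V` is available, and the two version numbers are the ones quoted in this thread.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when dotted version A is >= B.
# Hypothetical helper; relies on GNU sort's -V (version sort).
version_ge() {
    # The smaller of the two versions sorts first; A >= B when that is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="572.83"   # host driver version reported in this post
required="572.61"    # minimum quoted in this post for CUDA 12.8 Update 1

if version_ge "$installed" "$required"; then
    echo "driver $installed satisfies >= $required"
else
    echo "driver $installed is older than $required"
fi
```

The same helper applied to the original 572.16 driver would report it as older than 572.61.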

Hi, @rayment

I just noticed that all the errors were reported from nsys-ui or ncu-ui.
Have you tried the ncu and nsys command lines?

Hi, @veraj. Running ncu and nsys from the command line on WSL2 gives the same issues.

ray@DESKTOP-TC0G374:~/proj/cpp/cuda/exercise02/00$ nsys profile a.out
Collecting data…
Using device 0:
NVIDIA GeForce GTX 1080 Ti; global mem: -1073872896B; compute v6.1; clock: 1582000 kHz

FATAL ERROR: /dvs/p4/build/sw/devtools/Agora/Rel/CUDA12.8/QuadD/Target/Daemon/TimeConversion.cpp(312): Throw in function int64_t QuadDDaemon::PostMortemTimeConverter::ConvertGpuTicksToSyncNs(const QuadDCommon::Uuid&, int64_t) const
Dynamic exception type: boost::wrapexcept<QuadDCommon::InternalErrorException>
std::exception::what: InternalErrorException

ray@DESKTOP-TC0G374:~/proj/cpp/cuda/exercise02/00$ ncu a.out
==PROF== Connected to process 4853 (/home/ray/proj/cpp/cuda/exercise02/00/a.out)

==ERROR== An error was reported by the counter measurement library:
==ERROR== Failed to initialize the profiler: LibraryNotLoaded. Check that a compatible driver library is loaded.
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).

It may or may not be the issue, but have you tried the workaround :

Hi, @hwilper. Your suggestion does work. nsys-ui can run now, but I get some warnings and am unable to select the GPU metrics option. By the way, do you know how to solve the ncu problem mentioned earlier?

Target does not support GPU Metrics.

Warning Analysis 00:00.002
Error when processing events: Source ID=
Type=ErrorInformation (18)
Error information:
ProcessEventsError (4005)
Properties:
ErrorText (100)=/dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/Analysis/EventHandler/TraceEventHandler.cpp(677): Throw in function void QuadDAnalysis::EventHandler::TraceEventParser::operator()(const QuadDCommon::FlatComm::Cuda::Event&)
Dynamic exception type: boost::wrapexcept<QuadDCommon::InternalErrorException>
std::exception::what: InternalErrorException
[QuadDCommon::tag_message*] = Unrecognized GPU UUID: 960ad48c-3fc1-4072-0fc2-1dafafbb1650

Warning Analysis 2040 00:00.006
Not all NVTX events might have been collected.

Warning Analysis 2040 00:00.006
No NVTX events collected. Does the process use NVTX?

Warning Analysis 2040 00:00.006
Not all CUDA events might have been collected.

Warning Analysis 2040 00:00.006
No CUDA events collected. Does the process use CUDA?

Warning Analysis 2040 00:00.006
Not all OS runtime libraries events might have been collected.

Can you please check the following within WSL?

  1. nvidia-smi
  2. ncu --version
  3. output of your sample without ncu/nsys
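
A minimal sketch of collecting the first two checks in one go, in case it helps. `report_tool` is a hypothetical helper name, and the script only assumes the tools are looked up on `PATH` inside WSL; the sample itself still has to be run by hand.

```shell
#!/usr/bin/env bash
# report_tool CMD [ARGS...]: run CMD if it is on PATH, otherwise say so.
# Hypothetical helper for gathering diagnostics to paste into a reply.
report_tool() {
    local label="$*"
    local tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $label =="
        "$tool" "$@"
    else
        echo "== $tool: not found on PATH =="
    fi
}

report_tool nvidia-smi
report_tool ncu --version
# Third check: run the sample directly, without ncu/nsys, e.g. ./a.out
```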

Also, please check on the Windows host side that you have the option below enabled.

------------------check in win10 ---------------------------

I have checked on the Windows host side: the "Allow access to the GPU performance counters to all users" option is enabled.

------------------- check in wsl ----------------------------

1) ray@DESKTOP-TC0G374:~$ nvidia-smi
Tue Jul 22 16:20:53 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 572.83         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  |   00000000:01:00.0  On |                  N/A |
| 20%   38C    P8             15W /  250W |     946MiB /  11264MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              28      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+

2) ray@DESKTOP-TC0G374:~$ ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2025 NVIDIA Corporation
Version 2025.1.1.0 (build 35528883) (public-release)

3) output of your sample without ncu/nsys
./a.out
using device 0
NVIDIA GeForce GTX 1080 Ti: global mem: 11263 MB; compute v6.1; clock: 1582000 kHz
running global reduce
reduction result: -5.172142, correct answer: -5.174542
average time elapsed: 1.458208

Latest progress on solving the problems:

  1. nsys-ui error 1 has been solved. The solution is as follows:

mkdir -p "$(dirname "$(nsys -z)")"

echo 'CuptiUseRawGpuTimestamps=false' >> "$(nsys -z)"

  2. The issue when starting nsys-ui on WSL has been solved. The solution is as follows:

unset WAYLAND_DISPLAY # Not permanently effective

sudo apt install libxcb-cursor0 libxkbcommon-x11-0 libxcb-icccm4 libxcb-keysyms1
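
As a hedged sketch, the two workarounds above could be wrapped in small shell helpers so they are safe to repeat. Both function names (`set_nsys_option`, `run_without_wayland`) are hypothetical and not part of nsys. The first avoids appending the same setting twice if the commands are re-run. The second unsets `WAYLAND_DISPLAY` for a single launch only, using `env -u` from GNU coreutils, rather than for the whole shell session.

```shell
#!/usr/bin/env bash
# set_nsys_option CONFIG_PATH SETTING
# Hypothetical helper: append SETTING (e.g. CuptiUseRawGpuTimestamps=false)
# to CONFIG_PATH only if that exact line is not already there.
set_nsys_option() {
    local config_path="$1"
    local setting="$2"

    mkdir -p "$(dirname "$config_path")"
    touch "$config_path"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -- "$setting" "$config_path"; then
        printf '%s\n' "$setting" >> "$config_path"
    fi
}

# run_without_wayland CMD [ARGS...]
# Hypothetical helper: run CMD with WAYLAND_DISPLAY removed from its
# environment, leaving the calling shell untouched.
run_without_wayland() {
    env -u WAYLAND_DISPLAY "$@"
}

# Usage inside WSL (assumes nsys and nsys-ui are on PATH):
#   set_nsys_option "$(nsys -z)" 'CuptiUseRawGpuTimestamps=false'
#   run_without_wayland nsys-ui
```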