Nsys is not collecting kernel data

Hi,
When I try to profile with nsys, no CUDA kernel data is collected. When I open the report in Nsight Systems, the kernels don't show up in the timeline.

I am running the following command in WSL2 on Windows 11: `sudo nsys profile --stats=true -t cuda <app path>`. The output is:

Generating '/tmp/nsys-report-48d3.qdstrm'
[1/6] [========================100%] report6.nsys-rep
[2/6] [========================100%] report6.sqlite
[3/6] Executing 'cuda_api_sum' stats report

 Time (%)  Total Time (ns)  Num Calls  Avg (ns)   Med (ns)   Min (ns)  Max (ns)  StdDev (ns)                Name
 --------  ---------------  ---------  ---------  ---------  --------  --------  -----------  ---------------------------------
     86.4       5726048558       1182  4844372.7   614840.0      3099  88526493   11888877.3  cuLibraryLoadData
      9.4        621915573        633   982489.1   745363.0       498  15934364    2063962.5  cudaDeviceSynchronize
      1.7        111028852        458   242421.1   115570.5       615   5818108     546821.2  cudaStreamSynchronize
      0.7         45797514         50   915950.3     3745.5       970  29626149    4344456.9  cudaFree
      0.6         42045311       4828     8708.6     6155.0      2567    245540      11405.5  cudaLaunchKernel
      0.3         23154932        167   138652.3    73144.0     39458   2466145     259722.2  cudaMemcpy
      0.3         20672259        637    32452.5    40169.0      6110    238175      27230.1  cudaMemcpyAsync
      0.2         11414782          6  1902463.7     7085.0       978   5858052    2941052.4  cudaStreamCreateWithFlags
      0.2         10872657       1383     7861.6     4521.0      1598    346083      14211.1  cudaMemsetAsync
      0.1          9897352         44   224939.8    14274.0      2459   1493941     400263.1  cudaMalloc
      0.0          2221999       2469      900.0      572.0       383     52343       1860.8  cudaEventRecord
      0.0          1075344          1  1075344.0  1075344.0   1075344   1075344          0.0  cuLibraryUnload
      0.0           784357          9    87150.8    69571.0     27117    206559      61072.6  cudaMemcpyToSymbol
      0.0           335479        104     3225.8      694.5       316     54593       7733.6  cudaEventCreateWithFlags
      0.0           217769       1149      189.5      159.0        91      4141        168.1  cuGetProcAddress_v2
      0.0            95910        107      896.4      404.0       271     13236       1812.6  cudaEventDestroy
      0.0            36526          7     5218.0     3884.0       883     17516       5929.3  cudaStreamDestroy
      0.0            28425         28     1015.2      852.5       333      2766        684.1  cudaEventQuery
      0.0            15508          3     5169.3     2724.0      1197     11587       5610.1  cudaEventCreate
      0.0             4239          3     1413.0     1491.0      1190      1558        196.0  cuInit
      0.0             3002          1     3002.0     3002.0      3002      3002          0.0  cudaStreamCreate
      0.0             2918          1     2918.0     2918.0      2918      2918          0.0  cudaGetDeviceProperties_v2_v12000
      0.0             1993          4      498.3      204.0       151      1434        624.4  cuModuleGetLoadingMode

[4/6] Executing 'cuda_gpu_kern_sum' stats report
SKIPPED: <path>/report6.sqlite does not contain CUDA kernel data.
[5/6] Executing 'cuda_gpu_mem_time_sum' stats report
SKIPPED: <path>/report6.sqlite does not contain GPU memory data.
[6/6] Executing 'cuda_gpu_mem_size_sum' stats report
SKIPPED: <path>/report6.sqlite does not contain GPU memory data.
Generated:
    <path>/report6.nsys-rep
    <path>/report6.sqlite

As you can see, there is no CUDA kernel data at all.

Nvidia-smi output:

nvidia-smi
Thu Mar  2 11:41:30 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 528.49       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A200...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P8     7W /  35W |   1244MiB /  8192MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Nsys status:

 sudo nsys status -e
Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.90.1-microsoft-standard-WSL2: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK

See the product documentation at https://docs.nvidia.com/nsight-systems for more information,
including information on how to set the Linux Kernel Paranoid Level.
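As an aside, the paranoid level of 2 reported above is fine for root-privileged runs like yours, but for completeness, here is a sketch of how it can be lowered so non-root CPU sampling also works (the value 1 is an assumption; pick the level the Nsight Systems docs recommend for your use case):

```shell
# Show the current perf_event paranoid level (2 in the output above)
cat /proc/sys/kernel/perf_event_paranoid

# Temporarily lower it so unprivileged sampling works (resets on reboot)
sudo sysctl -w kernel.perf_event_paranoid=1

# Make the setting persistent across reboots
echo 'kernel.perf_event_paranoid = 1' | sudo tee -a /etc/sysctl.conf
```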

Nsys version:

nsys --version
NVIDIA Nsight Systems version 2023.1.2.43-32377213v0

Hope you are able to help.

Can you attach the .nsys-rep file you are working with?

I sent you the .nsys-rep file as a pm.

Looping in @rknight to assist with this.

Hi orjan.grefstad,

I looked at the report1.nsys-rep file. I assume this is the file that you provided.

I noticed a couple of entries in the Diagnostics section of the nsys-rep file that could hint at the issue. See the following two entries:

Installed CUDA driver version (12.0) is not supported by this build of Nsight Systems. CUDA trace will be collected using libraries for driver version 12.1

CUDA device 0: Unified Memory cannot be traced on devices that don't support peer-to-peer transfers. Please verify that SLI/NVLink is functioning properly.

Is it possible that you have a prototype driver installed? Since this is running in the WSL2 environment, @jasoncohen might also have some insight into this issue.
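One quick way to cross-check which driver WSL2 actually sees (these are standard nvidia-smi query options; in WSL2 the reported driver comes from the Windows host):

```shell
# Driver version and GPU name as seen from inside WSL2
nvidia-smi --query-gpu=driver_version,name --format=csv,noheader

# CUDA driver API version reported in the nvidia-smi banner
nvidia-smi | grep "CUDA Version"
```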

Sorry, collecting CUDA kernel launches under WSL2 is not yet supported. I am actively working on that right now and I expect to have it shipping in the upcoming Nsight Systems release.

OK, looking forward to the next release. It would be nice if you could keep your guide up to date: NVIDIA GPU Accelerated Computing on WSL 2. As far as I could tell, it should be supported:

Developer tools - Profilers - Volta and later (Using driver r525+ and Windows 11)
but you also have a contradicting note further down:
Developers who require profiling support are encouraged to find alternatives in the meanwhile.
I updated to Windows 11 to get this feature working. Unfortunately, I'm stuck with Windows and WSL due to work restrictions.