Hi,
When I profile with nsys, no CUDA kernel data is collected, and when I open the report in Nsight Systems, the kernels don't show up in the timeline.
I'm running the following command in WSL2 on Windows 11:
sudo nsys profile --stats=true -t cuda <app path>
The output is:
Generating '/tmp/nsys-report-48d3.qdstrm'
[1/6] [========================100%] report6.nsys-rep
[2/6] [========================100%] report6.sqlite
[3/6] Executing 'cuda_api_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- --------- --------- -------- -------- ----------- ---------------------------------
86.4 5726048558 1182 4844372.7 614840.0 3099 88526493 11888877.3 cuLibraryLoadData
9.4 621915573 633 982489.1 745363.0 498 15934364 2063962.5 cudaDeviceSynchronize
1.7 111028852 458 242421.1 115570.5 615 5818108 546821.2 cudaStreamSynchronize
0.7 45797514 50 915950.3 3745.5 970 29626149 4344456.9 cudaFree
0.6 42045311 4828 8708.6 6155.0 2567 245540 11405.5 cudaLaunchKernel
0.3 23154932 167 138652.3 73144.0 39458 2466145 259722.2 cudaMemcpy
0.3 20672259 637 32452.5 40169.0 6110 238175 27230.1 cudaMemcpyAsync
0.2 11414782 6 1902463.7 7085.0 978 5858052 2941052.4 cudaStreamCreateWithFlags
0.2 10872657 1383 7861.6 4521.0 1598 346083 14211.1 cudaMemsetAsync
0.1 9897352 44 224939.8 14274.0 2459 1493941 400263.1 cudaMalloc
0.0 2221999 2469 900.0 572.0 383 52343 1860.8 cudaEventRecord
0.0 1075344 1 1075344.0 1075344.0 1075344 1075344 0.0 cuLibraryUnload
0.0 784357 9 87150.8 69571.0 27117 206559 61072.6 cudaMemcpyToSymbol
0.0 335479 104 3225.8 694.5 316 54593 7733.6 cudaEventCreateWithFlags
0.0 217769 1149 189.5 159.0 91 4141 168.1 cuGetProcAddress_v2
0.0 95910 107 896.4 404.0 271 13236 1812.6 cudaEventDestroy
0.0 36526 7 5218.0 3884.0 883 17516 5929.3 cudaStreamDestroy
0.0 28425 28 1015.2 852.5 333 2766 684.1 cudaEventQuery
0.0 15508 3 5169.3 2724.0 1197 11587 5610.1 cudaEventCreate
0.0 4239 3 1413.0 1491.0 1190 1558 196.0 cuInit
0.0 3002 1 3002.0 3002.0 3002 3002 0.0 cudaStreamCreate
0.0 2918 1 2918.0 2918.0 2918 2918 0.0 cudaGetDeviceProperties_v2_v12000
0.0 1993 4 498.3 204.0 151 1434 624.4 cuModuleGetLoadingMode
[4/6] Executing 'cuda_gpu_kern_sum' stats report
SKIPPED: <path>/report6.sqlite does not contain CUDA kernel data.
[5/6] Executing 'cuda_gpu_mem_time_sum' stats report
SKIPPED: <path>/report6.sqlite does not contain GPU memory data.
[6/6] Executing 'cuda_gpu_mem_size_sum' stats report
SKIPPED: <path>/report6.sqlite does not contain GPU memory data.
Generated:
<path>/report6.nsys-rep
<path>/report6.sqlite
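As you can see, the CUDA API calls are traced (cudaLaunchKernel alone was called 4828 times), but there is no data at all about the CUDA kernels or GPU memory operations.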
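To double-check that the kernel data is really missing from the export and not just skipped by the stats step, the SQLite file can be inspected directly. Below is a minimal sketch of such a check. It assumes the kernel records would be stored in a table named CUPTI_ACTIVITY_KIND_KERNEL, which is the name I understand the Nsight Systems SQLite export to use; the helper name has_table is my own and not part of any nsys tooling.

```python
import sqlite3

# Assumed name of the table that holds GPU kernel records in the
# nsys SQLite export; adjust if your export schema differs.
KERNEL_TABLE = "CUPTI_ACTIVITY_KIND_KERNEL"


def has_table(db_path, table_name):
    """Return True if the SQLite file at db_path contains table_name."""
    con = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every table defined in the database.
        row = con.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table_name,),
        ).fetchone()
        return row is not None
    finally:
        con.close()
```

Calling has_table("<path>/report6.sqlite", KERNEL_TABLE) with the real report path should return False if the kernel table was never written, which would match the SKIPPED messages above.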
nvidia-smi output:
nvidia-smi
Thu Mar 2 11:41:30 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02 Driver Version: 528.49 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A200... On | 00000000:01:00.0 On | N/A |
| N/A 52C P8 7W / 35W | 1244MiB / 8192MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Nsys status:
sudo nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.90.1-microsoft-standard-WSL2: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
See the product documentation at https://docs.nvidia.com/nsight-systems for more information,
including information on how to set the Linux Kernel Paranoid Level.
Nsys version:
nsys --version
NVIDIA Nsight Systems version 2023.1.2.43-32377213v0
I hope you can help.