'osrt_sum' stats report: no data available

Hi,

I followed the Unified Memory course.
I compiled the first example with
nvcc -o single-thread-vector-add 01-vector-add/01-vector-add.cu -run
and profiled it with
nsys profile --stats=true ./single-thread-vector-add
The results are:

Success! All values calculated correctly.
Generating '/tmp/nsys-report-eafc.qdstrm'
[1/8] [========================100%] report1.nsys-rep
[2/8] [========================100%] report1.sqlite
[3/8] Executing 'nvtx_sum' stats report
SKIPPED: /dli/task/report1.sqlite does not contain NV Tools Extension (NVTX) data.
[4/8] Executing 'osrt_sum' stats report

 Time (%)  Total Time (ns)  Num Calls   Avg (ns)    Med (ns)   Min (ns)  Max (ns)   StdDev (ns)           Name         
 --------  ---------------  ---------  ----------  ----------  --------  ---------  -----------  ----------------------
     90.4       6154621709        318  19354156.3  10074556.0      2220  100141907   27631341.2  poll                  
      8.7        590645191        280   2109447.1   2066063.0       170   20541289    1278921.7  sem_timedwait         
      0.6         43862349        499     87900.5     12780.0       380   10213399     606373.8  ioctl                 
      0.3         19354957         24    806456.5      5775.5      1080    7282794    2173280.8  mmap                  
      0.0          1141729         27     42286.3      4531.0      3030     785654     149169.7  mmap64                
      0.0           511139         44     11616.8     10965.0      4510      34401       5260.6  open64                
      0.0           200354         29      6908.8      4040.0      1470      55051      10010.5  fopen                 
      0.0           159332          4     39833.0     38960.5     27380      54031      13213.9  pthread_create        
      0.0           131542         11     11958.4     12681.0      1010      16110       4573.5  write                 
      0.0           126084         12     10507.0      4960.0      1510      62562      16942.7  munmap                
      0.0            58861         26      2263.9        90.0        70      56641      11090.8  fgets                 
      0.0            45330          6      7555.0      8395.0      3620       9840       2345.0  open                  
      0.0            38220         52       735.0       515.0       160       6160        847.7  fcntl                 
      0.0            32460         22      1475.5      1305.0       760       3340        697.1  fclose                
      0.0            23200         14      1657.1      1375.0       520       4390       1219.8  read                  
      0.0            17580          2      8790.0      8790.0      5310      12270       4921.5  socket                
      0.0            11700          5      2340.0       990.0        80       7170       2954.8  fread                 
      0.0            11490          1     11490.0     11490.0     11490      11490          0.0  connect               
      0.0             6020          1      6020.0      6020.0      6020       6020          0.0  pipe2                 
      0.0             5550         64        86.7        50.0        40        170         45.7  pthread_mutex_trylock 
      0.0             3390          1      3390.0      3390.0      3390       3390          0.0  bind                  
      0.0             1200          1      1200.0      1200.0      1200       1200          0.0  listen                
      0.0              450          1       450.0       450.0       450        450          0.0  pthread_cond_broadcast

[5/8] Executing 'cuda_api_sum' stats report

 Time (%)  Total Time (ns)  Num Calls    Avg (ns)      Med (ns)     Min (ns)    Max (ns)   StdDev (ns)          Name         
 --------  ---------------  ---------  ------------  ------------  ----------  ----------  -----------  ---------------------
     94.5       2471546863          1  2471546863.0  2471546863.0  2471546863  2471546863          0.0  cudaDeviceSynchronize
      4.8        124438386          3    41479462.0       58191.0       17761   124362434   71778762.1  cudaMallocManaged    
      0.7         19455429          3     6485143.0     6167994.0     5964250     7323185     732880.4  cudaFree             
      0.0            47231          1       47231.0       47231.0       47231       47231          0.0  cudaLaunchKernel     

[6/8] Executing 'cuda_gpu_kern_sum' stats report

 Time (%)  Total Time (ns)  Instances    Avg (ns)      Med (ns)     Min (ns)    Max (ns)   StdDev (ns)                       Name                     
 --------  ---------------  ---------  ------------  ------------  ----------  ----------  -----------  ----------------------------------------------
    100.0       2471537085          1  2471537085.0  2471537085.0  2471537085  2471537085          0.0  addVectorsInto(float *, float *, float *, int)

[7/8] Executing 'cuda_gpu_mem_time_sum' stats report

 Time (%)  Total Time (ns)  Count  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)              Operation            
 --------  ---------------  -----  --------  --------  --------  --------  -----------  ---------------------------------
     75.5         34140556   2304   14817.9    4351.5      1983     80192      22493.8  [CUDA Unified Memory memcpy HtoD]
     24.5         11060935    768   14402.3    3775.5      1279     80735      22787.8  [CUDA Unified Memory memcpy DtoH]

[8/8] Executing 'cuda_gpu_mem_size_sum' stats report

 Total (MB)  Count  Avg (MB)  Med (MB)  Min (MB)  Max (MB)  StdDev (MB)              Operation            
 ----------  -----  --------  --------  --------  --------  -----------  ---------------------------------
    402.653   2304     0.175     0.033     0.004     1.044        0.301  [CUDA Unified Memory memcpy HtoD]
    134.218    768     0.175     0.033     0.004     1.044        0.301  [CUDA Unified Memory memcpy DtoH]

Generated:
    /dli/task/report1.nsys-rep
    /dli/task/report1.sqlite

So far, no problem.
Next, I compiled and ran the same source code on Windows 11 with CUDA Toolkit 12.5 and used the same nsys command:

nsys profile --stats=true .\single-thread-vector.exe
Generating 'C:\Users\UTILIS~1\AppData\Local\Temp\nsys-report-ecd2.qdstrm'
[1/8] [========================100%] report9.nsys-rep
[2/8] [========================100%] report9.sqlite
[3/8] Executing 'nvtx_sum' stats report
SKIPPED: C:\Users\laurent\Documents\Visual Studio 2022\cuda_course\report9.sqlite does not contain NV Tools Extension (NVTX) data.
[4/8] Executing 'osrt_sum' stats report
SKIPPED: No data available.
[5/8] Executing 'cuda_api_sum' stats report

 Time (%)  Total Time (ns)  Num Calls   Avg (ns)     Med (ns)    Min (ns)   Max (ns)   StdDev (ns)           Name
 --------  ---------------  ---------  -----------  -----------  ---------  ---------  -----------  ----------------------
     52,0        485592590          1  485592590,0  485592590,0  485592590  485592590          0,0  cudaDeviceSynchronize
     25,0        234802399          1  234802399,0  234802399,0  234802399  234802399          0,0  cudaLaunchKernel
     18,0        170693090          3   56897696,0   11941656,0   11541091  147210343   78213302,0  cudaMallocManaged
      3,0         33701527          3   11233842,0    5590101,0    5406366   22705060    9934790,0  cudaFree
      0,0            25365          1      25365,0      25365,0      25365      25365          0,0  cuLibraryUnload
      0,0             4323          1       4323,0       4323,0       4323       4323          0,0  cuModuleGetLoadingMode
      0,0             2846          1       2846,0       2846,0       2846       2846          0,0  cuCtxSynchronize
      0,0              262          1        262,0        262,0        262        262          0,0  cuDeviceGetLuid

[6/8] Executing 'cuda_gpu_kern_sum' stats report

 Time (%)  Total Time (ns)  Instances   Avg (ns)     Med (ns)    Min (ns)   Max (ns)   StdDev (ns)                       Name
 --------  ---------------  ---------  -----------  -----------  ---------  ---------  -----------  ----------------------------------------------
    100,0        485544700          1  485544700,0  485544700,0  485544700  485544700          0,0  addVectorsInto(float *, float *, float *, int)

[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
SKIPPED: C:\Users\laurent\Documents\Visual Studio 2022\cuda_course\report9.sqlite does not contain GPU memory data.
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
SKIPPED: C:\Users\laurent\Documents\Visual Studio 2022\cuda_course\report9.sqlite does not contain GPU memory data.
Generated:
    C:\Users\laurent\Documents\Visual Studio 2022\cuda_course\report9.nsys-rep

What is wrong with my command on Windows?

Under WSL2, I get:

sudo nsys profile --stats=true ./single-thread-vector-add
[sudo] password for laurent:
Success! All values calculated correctly.
Generating '/tmp/nsys-report-6f22.qdstrm'
[1/8] [========================100%] report16.nsys-rep
[2/8] [========================100%] report16.sqlite
[3/8] Executing 'nvtx_sum' stats report
SKIPPED: /mnt/c/Users/laurent/Documents/Visual Studio 2022/cuda_course/report16.sqlite does not contain NV Tools Extension (NVTX) data.
[4/8] Executing 'osrt_sum' stats report

 Time (%)  Total Time (ns)  Num Calls   Avg (ns)    Med (ns)    Min (ns)  Max (ns)   StdDev (ns)           Name
 --------  ---------------  ---------  ----------  -----------  --------  ---------  -----------  ----------------------
     96.1       3905387096         43  90822955.7  100122039.0       557  100184919   26312807.9  poll
      3.3        134922500        555    243103.6      32105.0       156    5136647     670217.9  ioctl
      0.3         13443270         29    463561.0       3005.0       622    4482217    1380590.0  mmap
      0.1          5739727          5   1147945.4     119354.0      2938    5467016    2415309.5  fread
      0.0          1113219          6    185536.5     185546.5    173861     202606       9923.1  mprotect
      0.0           691997         22     31454.4       2189.5       345     453696      99665.4  fopen
      0.0           632149          3    210716.3     287338.0     56183     288628     133831.3  pthread_create
      0.0           507690          7     72527.1        740.0       545     196809      90938.5  read
      0.0           482596          1    482596.0     482596.0    482596     482596          0.0  pthread_join
      0.0           313205         12     26100.4        802.0       279     279833      80052.7  fclose
      0.0           141511          3     47170.3      32499.0     25620      83392      31556.9  sem_timedwait
      0.0            25451          4      6362.8       7475.0       422      10079       4196.6  write
      0.0            24667         35       704.8         24.0        23      23742       4008.5  fgets
      0.0            18979          6      3163.2       2939.5       514       5736       1770.9  open
      0.0             8009          6      1334.8        814.0        73       4285       1584.1  fwrite
      0.0             6337         10       633.7        301.0        82       2161        760.6  fcntl
      0.0             5256          3      1752.0       1856.0       537       2863       1166.5  pipe2
      0.0             3202          2      1601.0       1601.0       952       2250        917.8  munmap
      0.0             1726          1      1726.0       1726.0      1726       1726          0.0  fflush
      0.0             1314         64        20.5         17.0        16        146         19.8  pthread_mutex_trylock
      0.0              651          3       217.0        230.0       158        263         53.7  pthread_cond_broadcast

[5/8] Executing 'cuda_api_sum' stats report

 Time (%)  Total Time (ns)  Num Calls    Avg (ns)      Med (ns)     Min (ns)    Max (ns)   StdDev (ns)           Name
 --------  ---------------  ---------  ------------  ------------  ----------  ----------  -----------  ----------------------
     90.0       3486957086          1  3486957086.0  3486957086.0  3486957086  3486957086          0.0  cudaDeviceSynchronize
      5.6        217923759          1   217923759.0   217923759.0   217923759   217923759          0.0  cudaLaunchKernel
      3.8        146405113          3    48801704.3    25097953.0    23351220    97955940   42577775.1  cudaMallocManaged
      0.6         23709003          3     7903001.0     7180613.0     7080767     9447623    1338613.1  cudaFree
      0.0             1103          1        1103.0        1103.0        1103        1103          0.0  cuModuleGetLoadingMode

[6/8] Executing 'cuda_gpu_kern_sum' stats report
SKIPPED: /mnt/c/Users/laurent/Documents/Visual Studio 2022/cuda_course/report16.sqlite does not contain CUDA kernel data.
[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
SKIPPED: /mnt/c/Users/laurent/Documents/Visual Studio 2022/cuda_course/report16.sqlite does not contain GPU memory data.
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
SKIPPED: /mnt/c/Users/laurent/Documents/Visual Studio 2022/cuda_course/report16.sqlite does not contain GPU memory data.
Generated:
    /mnt/c/Users/laurent/Documents/Visual Studio 2022/cuda_course/report16.nsys-rep
    /mnt/c/Users/laurent/Documents/Visual Studio 2022/cuda_course/report16.sqlite

Is profiling possible on a Windows system?

I found an older thread about this problem on Windows, with this answer:

Greetings,

What you are experiencing is normal.

The OS Runtime Tracing (OSRT) option is Linux-only. When running natively on Windows it is not supported, so no data of that type is collected and there are no stats to report.

Currently, CUDA trace is not available under WSL.

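To illustrate the point that OSRT tracing is Linux-only, here is a minimal sketch, assuming a POSIX shell, of a hypothetical helper `nsys_profile_cmd` (not part of Nsight Systems or the course). It does not run nsys; it only prints an `nsys profile` command line with an explicit `--trace` list, requesting `cuda,nvtx,osrt` on Linux and `cuda,nvtx` on native Windows, where OSRT data is not collected. The printed Windows command would be typed in PowerShell or cmd. Whether this changes the GPU memory reports is not implied.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an nsys command that requests only
# the trace types the platform supports. Assumption: OSRT is Linux-only.
nsys_profile_cmd() {
    platform="$1"   # "linux" or "windows"
    exe="$2"        # program to profile
    case "$platform" in
        linux)   traces="cuda,nvtx,osrt" ;;  # OS runtime tracing available
        windows) traces="cuda,nvtx" ;;       # no OSRT data on native Windows
        *)       echo "unknown platform: $platform" >&2; return 1 ;;
    esac
    # %s does not interpret backslashes, so Windows paths print verbatim
    printf 'nsys profile --trace=%s --stats=true %s\n' "$traces" "$exe"
}

nsys_profile_cmd linux ./single-thread-vector-add
nsys_profile_cmd windows '.\single-thread-vector.exe'
```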

And regarding:

[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
 does not contain GPU memory data.
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
 does not contain GPU memory data.

Is collecting GPU memory data supported on Windows?