Hi,
I am following the Unified Memory course.
I compiled and ran the first example with
nvcc -o single-thread-vector-add 01-vector-add/01-vector-add.cu -run
and then profiled it with
nsys profile --stats=true ./single-thread-vector-add
The results are:
Success! All values calculated correctly.
Generating '/tmp/nsys-report-eafc.qdstrm'
[1/8] [========================100%] report1.nsys-rep
[2/8] [========================100%] report1.sqlite
[3/8] Executing 'nvtx_sum' stats report
SKIPPED: /dli/task/report1.sqlite does not contain NV Tools Extension (NVTX) data.
[4/8] Executing 'osrt_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ---------- ---------- -------- --------- ----------- ----------------------
90.4 6154621709 318 19354156.3 10074556.0 2220 100141907 27631341.2 poll
8.7 590645191 280 2109447.1 2066063.0 170 20541289 1278921.7 sem_timedwait
0.6 43862349 499 87900.5 12780.0 380 10213399 606373.8 ioctl
0.3 19354957 24 806456.5 5775.5 1080 7282794 2173280.8 mmap
0.0 1141729 27 42286.3 4531.0 3030 785654 149169.7 mmap64
0.0 511139 44 11616.8 10965.0 4510 34401 5260.6 open64
0.0 200354 29 6908.8 4040.0 1470 55051 10010.5 fopen
0.0 159332 4 39833.0 38960.5 27380 54031 13213.9 pthread_create
0.0 131542 11 11958.4 12681.0 1010 16110 4573.5 write
0.0 126084 12 10507.0 4960.0 1510 62562 16942.7 munmap
0.0 58861 26 2263.9 90.0 70 56641 11090.8 fgets
0.0 45330 6 7555.0 8395.0 3620 9840 2345.0 open
0.0 38220 52 735.0 515.0 160 6160 847.7 fcntl
0.0 32460 22 1475.5 1305.0 760 3340 697.1 fclose
0.0 23200 14 1657.1 1375.0 520 4390 1219.8 read
0.0 17580 2 8790.0 8790.0 5310 12270 4921.5 socket
0.0 11700 5 2340.0 990.0 80 7170 2954.8 fread
0.0 11490 1 11490.0 11490.0 11490 11490 0.0 connect
0.0 6020 1 6020.0 6020.0 6020 6020 0.0 pipe2
0.0 5550 64 86.7 50.0 40 170 45.7 pthread_mutex_trylock
0.0 3390 1 3390.0 3390.0 3390 3390 0.0 bind
0.0 1200 1 1200.0 1200.0 1200 1200 0.0 listen
0.0 450 1 450.0 450.0 450 450 0.0 pthread_cond_broadcast
[5/8] Executing 'cuda_api_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ------------ ------------ ---------- ---------- ----------- ---------------------
94.5 2471546863 1 2471546863.0 2471546863.0 2471546863 2471546863 0.0 cudaDeviceSynchronize
4.8 124438386 3 41479462.0 58191.0 17761 124362434 71778762.1 cudaMallocManaged
0.7 19455429 3 6485143.0 6167994.0 5964250 7323185 732880.4 cudaFree
0.0 47231 1 47231.0 47231.0 47231 47231 0.0 cudaLaunchKernel
[6/8] Executing 'cuda_gpu_kern_sum' stats report
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ------------ ------------ ---------- ---------- ----------- ----------------------------------------------
100.0 2471537085 1 2471537085.0 2471537085.0 2471537085 2471537085 0.0 addVectorsInto(float *, float *, float *, int)
[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
Time (%) Total Time (ns) Count Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Operation
-------- --------------- ----- -------- -------- -------- -------- ----------- ---------------------------------
75.5 34140556 2304 14817.9 4351.5 1983 80192 22493.8 [CUDA Unified Memory memcpy HtoD]
24.5 11060935 768 14402.3 3775.5 1279 80735 22787.8 [CUDA Unified Memory memcpy DtoH]
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
Total (MB) Count Avg (MB) Med (MB) Min (MB) Max (MB) StdDev (MB) Operation
---------- ----- -------- -------- -------- -------- ----------- ---------------------------------
402.653 2304 0.175 0.033 0.004 1.044 0.301 [CUDA Unified Memory memcpy HtoD]
134.218 768 0.175 0.033 0.004 1.044 0.301 [CUDA Unified Memory memcpy DtoH]
Generated:
/dli/task/report1.nsys-rep
/dli/task/report1.sqlite
So no problem there.
Now I compile and run the same source code and use the same nsys command on Windows 11 with CUDA Toolkit 12.5:
nsys profile --stats=true .\single-thread-vector.exe
Generating 'C:\Users\UTILIS~1\AppData\Local\Temp\nsys-report-ecd2.qdstrm'
[1/8] [========================100%] report9.nsys-rep
[2/8] [========================100%] report9.sqlite
[3/8] Executing 'nvtx_sum' stats report
SKIPPED: C:\Users\laurent\Documents\Visual Studio 2022\cuda_course\report9.sqlite does not contain NV Tools Extension (NVTX) data.
[4/8] Executing 'osrt_sum' stats report
SKIPPED: No data available.
[5/8] Executing 'cuda_api_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ----------- ----------- --------- --------- ----------- ----------------------
52,0 485592590 1 485592590,0 485592590,0 485592590 485592590 0,0 cudaDeviceSynchronize
25,0 234802399 1 234802399,0 234802399,0 234802399 234802399 0,0 cudaLaunchKernel
18,0 170693090 3 56897696,0 11941656,0 11541091 147210343 78213302,0 cudaMallocManaged
3,0 33701527 3 11233842,0 5590101,0 5406366 22705060 9934790,0 cudaFree
0,0 25365 1 25365,0 25365,0 25365 25365 0,0 cuLibraryUnload
0,0 4323 1 4323,0 4323,0 4323 4323 0,0 cuModuleGetLoadingMode
0,0 2846 1 2846,0 2846,0 2846 2846 0,0 cuCtxSynchronize
0,0 262 1 262,0 262,0 262 262 0,0 cuDeviceGetLuid
[6/8] Executing 'cuda_gpu_kern_sum' stats report
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ----------- ----------- --------- --------- ----------- ----------------------------------------------
100,0 485544700 1 485544700,0 485544700,0 485544700 485544700 0,0 addVectorsInto(float *, float *, float *, int)
[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
SKIPPED: C:\Users\laurent\Documents\Visual Studio 2022\cuda_course\report9.sqlite does not contain GPU memory data.
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
SKIPPED: C:\Users\laurent\Documents\Visual Studio 2022\cuda_course\report9.sqlite does not contain GPU memory data.
Generated:
C:\Users\laurent\Documents\Visual Studio 2022\cuda_course\report9.nsys-rep
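What is wrong with my command on Windows? The 'osrt_sum', 'cuda_gpu_mem_time_sum' and 'cuda_gpu_mem_size_sum' reports are all skipped there, so I get no Unified Memory memcpy statistics.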