I’m taking the accelerated computing course and trying to do the page faults module, but the output I get online does not include the CUDA memory statistics:
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
WARNING: The command line includes a target application therefore the CPU context-switch scope has been set to process-tree.
Collecting data...
Processing events...
Saving temporary "/tmp/nsys-report-41a4-313b-b3f8-5862.qdstrm" file to disk...
Creating final output files...
Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-41a4-313b-b3f8-5862.qdrep"
Exporting 1060 events: [==================================================100%]
Exported successfully to
/tmp/nsys-report-41a4-313b-b3f8-5862.sqlite
CUDA API Statistics:
Time(%)  Total Time (ns)  Num Calls      Average    Minimum    Maximum  Name
-------  ---------------  ---------  -----------  ---------  ---------  ---------------------
   91.2        261802252          1  261802252.0  261802252  261802252  cudaMallocManaged
    6.7         19130645          1   19130645.0   19130645   19130645  cudaDeviceSynchronize
    2.1          6008423          1    6008423.0    6008423    6008423  cudaFree
    0.0            37687          1      37687.0      37687      37687  cudaLaunchKernel
CUDA Kernel Statistics:
Time(%)  Total Time (ns)  Instances     Average   Minimum   Maximum  Name
-------  ---------------  ---------  ----------  --------  --------  -----------------------
  100.0         19121415          1  19121415.0  19121415  19121415  deviceKernel(int*, int)
Operating System Runtime API Statistics:
Time(%)  Total Time (ns)  Num Calls     Average  Minimum    Maximum  Name
-------  ---------------  ---------  ----------  -------  ---------  --------------------------
   69.0        360451138         20  18022556.9    48579  100127962  poll
   21.3        111434090        666    167318.5     1012   18594843  ioctl
    7.5         39163519         16   2447719.9    13751   20907532  sem_timedwait
    1.7          8672010         92     94261.0     1252    5869931  mmap
    0.4          2021984         82     24658.3     4659      49522  open64
    0.0           186682          4     46670.5    31469      64385  pthread_create
    0.0           167157          3     55719.0    53419      60267  fgets
    0.0           140769         25      5630.8     1511      24129  fopen
    0.0           106346         11      9667.8     4090      14039  write
    0.0            40206         27      1489.1     1058       6546  fcntl
    0.0            34644          7      4949.1     2445       8483  munmap
    0.0            34090          5      6818.0     3832       9746  open
    0.0            28303         18      1572.4     1020       5215  fclose
    0.0            27899          5      5579.8     1092       7321  pthread_rwlock_timedwrlock
    0.0            25177          2     12588.5     8203      16974  socket
    0.0            24503         12      2041.9     1092       4180  read
    0.0            22936          5      4587.2     1171      10734  fgetc
    0.0            14289          1     14289.0    14289      14289  pipe2
    0.0             9440          4      2360.0     1885       2856  mprotect
    0.0             9010          2      4505.0     3830       5180  fread
    0.0             8827          1      8827.0     8827       8827  connect
    0.0             2786          1      2786.0     2786       2786  bind
    0.0             1925          1      1925.0     1925       1925  listen
Report file moved to "/dli/task/report5.qdrep"
Report file moved to "/dli/task/report5.sqlite"
Earlier, for vector add in the same notebook, I was getting the CUDA memory stats, as shown below:
CUDA API Statistics:
Time(%)  Total Time (ns)  Num Calls       Average     Minimum     Maximum  Name
-------  ---------------  ---------  ------------  ----------  ----------  ---------------------
   88.8       2315390328          1  2315390328.0  2315390328  2315390328  cudaDeviceSynchronize
   10.4        271472573          3    90490857.7       19411   271376306  cudaMallocManaged
    0.8         21322764          3     7107588.0     6343104     8428518  cudaFree
    0.0            46645          1       46645.0       46645       46645  cudaLaunchKernel
CUDA Kernel Statistics:
Time(%)  Total Time (ns)  Instances       Average     Minimum     Maximum  Name
-------  ---------------  ---------  ------------  ----------  ----------  -------------------------------------------
  100.0       2315434815          1  2315434815.0  2315434815  2315434815  addVectorsInto(float*, float*, float*, int)
CUDA Memory Operation Statistics (by time):
Time(%)  Total Time (ns)  Operations  Average  Minimum  Maximum  Operation
-------  ---------------  ----------  -------  -------  -------  ---------------------------------
   76.5         68296926        2304  29642.8     1886   177502  [CUDA Unified Memory memcpy HtoD]
   23.5         20983319         768  27322.0     1119   165278  [CUDA Unified Memory memcpy DtoH]
CUDA Memory Operation Statistics (by size in KiB):
     Total  Operations  Average  Minimum   Maximum  Operation
----------  ----------  -------  -------  --------  ---------------------------------
393216.000        2304  170.667    4.000  1020.000  [CUDA Unified Memory memcpy HtoD]
131072.000         768  170.667    4.000  1020.000  [CUDA Unified Memory memcpy DtoH]
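To make it easier to see what the vector add memory tables above are reporting, here is a small sketch that recomputes the per-operation averages and the totals in MiB from the numbers in those two tables. The dictionary layout and the `summarize` helper are hypothetical names used only for illustration; the values are copied from the tables. The 384 MiB host-to-device total would be consistent with three equal managed buffers of 128 MiB each (the output shows three `cudaMallocManaged` calls), but the array size is not stated here, so that part is an assumption.

```python
# Values copied from the two "CUDA Memory Operation Statistics" tables above.
# The dictionary layout and key names are illustrative, not an nsys format.
mem_ops = {
    "HtoD": {"total_kib": 393216.0, "ops": 2304, "total_ns": 68296926},
    "DtoH": {"total_kib": 131072.0, "ops": 768, "total_ns": 20983319},
}


def summarize(ops):
    """Recompute totals in MiB, per-operation averages, and time shares."""
    all_ns = sum(row["total_ns"] for row in ops.values())
    summary = {}
    for name, row in ops.items():
        summary[name] = {
            "total_mib": row["total_kib"] / 1024,          # KiB -> MiB
            "avg_kib": row["total_kib"] / row["ops"],      # size per migration
            "avg_ns": row["total_ns"] / row["ops"],        # time per migration
            "time_pct": 100.0 * row["total_ns"] / all_ns,  # share of memcpy time
        }
    return summary


summary = summarize(mem_ops)
for name, s in summary.items():
    print(
        f"{name}: {s['total_mib']:.0f} MiB total, "
        f"{s['avg_kib']:.3f} KiB/op, {s['avg_ns']:.1f} ns/op, "
        f"{s['time_pct']:.1f}% of memcpy time"
    )
```

The recomputed averages and percentages should match the Average and Time(%) columns, which is a quick way to confirm the tables were read correctly before comparing them with a run that shows no memory operations at all.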
Notebook found here in paid courses: Jupyter Notebook