I am using Nsight Systems to profile a CUDA program (compiled with nvcc) with the following command:
nsys profile --stats=true ./a.out
However, the output does not contain any useful information about the CUDA kernel I wrote. The output is as follows:
Generating CUDA API Statistics...
CUDA API Statistics (nanoseconds)
CUDA trace data was not collected.
Generating Operating System Runtime API Statistics...
Operating System Runtime API Statistics (nanoseconds)
Time(%)      Total Time       Calls         Average         Minimum         Maximum  Name
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------
   65.2       300539226          14      21467087.6           24740       100154582  poll
   25.2       116053432         717        161859.7            1011        29997630  ioctl
    4.6        21346504          57        374500.1            1145         9892355  mmap
    4.3        19886802          38        523336.9            1829        19521784  fopen
    0.3         1194145          10        119414.5           17324          694206  sem_timedwait
    0.1          608474          57         10675.0            4294           31110  open64
    0.1          374910           1        374910.0          374910          374910  sem_wait
    0.1          358942           3        119647.3          118941          120625  fgets
    0.0          197620           3         65873.3           39925          116779  pthread_create
    0.0          123217          31          3974.7            1690            7461  fclose
    0.0           39858          11          3623.5            2294            6348  write
    0.0           39444          11          3585.8            1279            6223  munmap
    0.0           37736           4          9434.0            6129           16527  open
    0.0           34602          13          2661.7            1220            5190  read
    0.0           24693           4          6173.2            2601           13805  fread
    0.0           16219           2          8109.5            1912           14307  fwrite
    0.0           13166           7          1880.9            1005            4133  fcntl
    0.0           12768           2          6384.0            5034            7734  socket
    0.0           10715           1         10715.0           10715           10715  connect
    0.0            6306           1          6306.0            6306            6306  pipe2
    0.0            3811           1          3811.0            3811            3811  fflush
    0.0            2267           1          2267.0            2267            2267  bind
Generating NVTX Push-Pop Range Statistics...
NVTX Push-Pop Range Statistics (nanoseconds)
How can I get nsys to display statistics for the CUDA kernel? Thank you!