Nsight Systems reports "CUDA trace data was not collected." and shows no results for CUDA kernels

I am using Nsight Systems to profile a CUDA program (compiled with nvcc) with the following command:

nsys profile --stats=true ./a.out

However, the output contains no information about the CUDA kernel I wrote. The full output is shown below:

Generating CUDA API Statistics...
CUDA API Statistics (nanoseconds)



CUDA trace data was not collected.


Generating Operating System Runtime API Statistics...
Operating System Runtime API Statistics (nanoseconds)

Time(%)      Total Time       Calls         Average         Minimum         Maximum  Name
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------
   65.2       300539226          14      21467087.6           24740       100154582  poll
   25.2       116053432         717        161859.7            1011        29997630  ioctl
    4.6        21346504          57        374500.1            1145         9892355  mmap
    4.3        19886802          38        523336.9            1829        19521784  fopen
    0.3         1194145          10        119414.5           17324          694206  sem_timedwait
    0.1          608474          57         10675.0            4294           31110  open64
    0.1          374910           1        374910.0          374910          374910  sem_wait
    0.1          358942           3        119647.3          118941          120625  fgets
    0.0          197620           3         65873.3           39925          116779  pthread_create
    0.0          123217          31          3974.7            1690            7461  fclose
    0.0           39858          11          3623.5            2294            6348  write
    0.0           39444          11          3585.8            1279            6223  munmap
    0.0           37736           4          9434.0            6129           16527  open
    0.0           34602          13          2661.7            1220            5190  read
    0.0           24693           4          6173.2            2601           13805  fread
    0.0           16219           2          8109.5            1912           14307  fwrite
    0.0           13166           7          1880.9            1005            4133  fcntl
    0.0           12768           2          6384.0            5034            7734  socket
    0.0           10715           1         10715.0           10715           10715  connect
    0.0            6306           1          6306.0            6306            6306  pipe2
    0.0            3811           1          3811.0            3811            3811  fflush
    0.0            2267           1          2267.0            2267            2267  bind

Generating NVTX Push-Pop Range Statistics...
NVTX Push-Pop Range Statistics (nanoseconds)
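
In case the tool versions matter here (a mismatch between nsys and the installed CUDA toolkit or driver is one possible cause I cannot rule out), a small script along these lines would collect them. This is only a sketch: report_tool is a hypothetical helper name, and it assumes nsys, nvcc and nvidia-smi are the relevant tools and are expected on PATH.

```shell
#!/bin/sh
# Hypothetical helper: run a tool and show the first lines of its output,
# or say that the tool is missing from PATH.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 | head -n 5
    else
        echo "$1: not found in PATH"
    fi
}

report_tool nsys --version   # Nsight Systems CLI version
report_tool nvcc --version   # CUDA compiler (toolkit) version
report_tool nvidia-smi       # header lines include the driver version
```

Pasting the output of this next to the profiler log should make it easier to tell whether the profiler, toolkit and driver versions are compatible.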

How can I get nsys to display statistics for the CUDA kernel? Thank you!