Nsight Systems does not report unified memory page fault statistics in summary

Hi all,

I am using Nsight Systems on a DGX-A100 to collect unified memory page fault information. Here is the command I tried:

nsys profile --stats=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --show-output=true ./myapplication

However, the reported summary only shows the cudaMallocManaged call. It contains no detailed statistics about unified memory page faults on either the CPU or the GPU. Here is the summary I got from running the above command:

CUDA API Statistics:

 Time(%)  Total Time (ns)  Num Calls    Average     Minimum    Maximum    StdDev           Name        
 -------  ---------------  ---------  -----------  ---------  ---------  --------  --------------------
    99.6        307198070          1  307198070.0  307198070  307198070       0.0  cudaMallocManaged   
     0.2           656747          3     218915.7       3490     648467  372002.9  cudaMalloc          
     0.2           504097          1     504097.0     504097     504097       0.0  cudaEventSynchronize
     0.0            88401          3      29467.0       5551      76280   40544.4  cudaFree            
     0.0            49250          2      24625.0      20810      28440    5395.2  cudaMemcpy          
     0.0            38349          1      38349.0      38349      38349       0.0  cudaLaunchKernel    
     0.0            11751          2       5875.5       3871       7880    2834.8  cudaEventRecord     
     0.0             4180          2       2090.0        660       3520    2022.3  cudaEventCreate     



CUDA Kernel Statistics:

 Time(%)  Total Time (ns)  Instances  Average   Minimum  Maximum  StdDev                                    Name                                  
 -------  ---------------  ---------  --------  -------  -------  ------  ------------------------------------------------------------------------
   100.0           498458          1  498458.0   498458   498458     0.0  mykernel(float*, float const*, int const*, int const*, int, int, int)



CUDA Memory Operation Statistics (by time):

 Time(%)  Total Time (ns)  Operations  Average  Minimum  Maximum  StdDev      Operation     
 -------  ---------------  ----------  -------  -------  -------  ------  ------------------
   100.0            10816           2   5408.0     4288     6528  1583.9  [CUDA memcpy HtoD]



CUDA Memory Operation Statistics (by size in KiB):

 Total   Operations  Average  Minimum  Maximum  StdDev      Operation     
 ------  ----------  -------  -------  -------  ------  ------------------
 52.992           2   26.496   10.578   42.414  22.511  [CUDA memcpy HtoD]



Operating System Runtime API Statistics:

 Time(%)  Total Time (ns)  Num Calls   Average    Minimum  Maximum     StdDev         Name     
 -------  ---------------  ---------  ----------  -------  --------  ----------  --------------
    48.1        331545542         18  18419196.8    80550  83054204  23063304.4  poll          
    42.2        290751254       1710    170030.0     1020  17036417    630506.2  ioctl         
     5.3         36739837         15   2449322.5    20810  20961217   5796984.8  sem_timedwait 
     2.9         19888846        142    140062.3     2050  18737729   1571764.3  open64        
     0.9          5934531         78     76083.7     1311   5708922    646078.8  fopen         
     0.4          3059706         95     32207.4     1120   1082445    110447.7  mmap          
     0.0           198128          4     49532.0    44259     54410      4445.5  pthread_create
     0.0           176158          4     44039.5    39069     58120      9393.0  fgets         
     0.0           117225         12      9768.8     1940     62870     16921.7  write         
     0.0            81487         29      2809.9     1140      5170       777.7  read          
     0.0            67478         24      2811.6     1090     19799      4060.4  fgetc         
     0.0            35770          6      5961.7     2630     13100      4412.6  open          
     0.0            27239          8      3404.9     2040      7120      1661.2  munmap        
     0.0            25119         16      1569.9     1090      2300       346.0  fclose        
     0.0            15110          8      1888.8     1060      2660       608.1  fcntl         
     0.0            12309          1     12309.0    12309     12309         0.0  pipe2         
     0.0            10129          2      5064.5     4920      5209       204.4  socket        
     0.0             6240          1      6240.0     6240      6240         0.0  fopen64       
     0.0             6050          1      6050.0     6050      6050         0.0  connect       
     0.0             4270          1      4270.0     4270      4270         0.0  fflush        
     0.0             3650          1      3650.0     3650      3650         0.0  fwrite        
     0.0             1960          1      1960.0     1960      1960         0.0  bind          
     0.0             1240          1      1240.0     1240      1240         0.0  listen        


Did I miss some key steps?

Thanks!

I am sorry that there was no response to this earlier. Your forum post was dropped in an orphaned category that the Nsys team was unaware of until this afternoon.

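UVM (unified memory) data is not yet exported to the SQLite file that the --stats summary is generated from, which is why no page fault statistics appear. I have filed a request to get it added.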
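
To check whether a given Nsight Systems version already exports this data, one option is to export the report to SQLite (for example with nsys export --type sqlite on the report file) and look at which tables it contains. The Python sketch below only lists table names that look related to unified memory page faults. The function name find_um_tables, the file name in the usage comment, and the keyword filter are assumptions made for illustration, not names documented by Nsight Systems.

```python
import sqlite3
from pathlib import Path


def find_um_tables(db_path, keywords=("PAGE_FAULT", "_UM_")):
    """Return the names of tables whose name contains any of the keywords.

    The keywords are a guess at how unified memory tables might be named;
    adjust them after inspecting the full table list of your export.
    """
    # Open read-only so a mistyped path raises an error instead of
    # silently creating an empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows if any(k in name.upper() for k in keywords)]


# Usage (hypothetical file name):
#   print(find_um_tables("report1.sqlite"))
```

An empty list suggests that the export contains no unified memory page fault tables, which would match the behavior described above. In that case the page fault events may still be visible on the timeline in the Nsight Systems GUI.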