Nsight Systems does not report unified memory page fault statistics in summary

Hi all,

I am using Nsight Systems on a DGX A100 to collect unified memory page fault information. Here is the command I tried:

nsys profile --stats=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --show-output=true ./myapplication

However, in the reported summary I can only see the cudaMallocManaged call; there are no detailed statistics about unified memory page faults on either the CPU or the GPU. Here is the summary I got by running the above command:

CUDA API Statistics:

 Time(%)  Total Time (ns)  Num Calls    Average     Minimum    Maximum    StdDev           Name        
 -------  ---------------  ---------  -----------  ---------  ---------  --------  --------------------
    99.6        307198070          1  307198070.0  307198070  307198070       0.0  cudaMallocManaged   
     0.2           656747          3     218915.7       3490     648467  372002.9  cudaMalloc          
     0.2           504097          1     504097.0     504097     504097       0.0  cudaEventSynchronize
     0.0            88401          3      29467.0       5551      76280   40544.4  cudaFree            
     0.0            49250          2      24625.0      20810      28440    5395.2  cudaMemcpy          
     0.0            38349          1      38349.0      38349      38349       0.0  cudaLaunchKernel    
     0.0            11751          2       5875.5       3871       7880    2834.8  cudaEventRecord     
     0.0             4180          2       2090.0        660       3520    2022.3  cudaEventCreate     



CUDA Kernel Statistics:

 Time(%)  Total Time (ns)  Instances  Average   Minimum  Maximum  StdDev                                    Name                                  
 -------  ---------------  ---------  --------  -------  -------  ------  ------------------------------------------------------------------------
   100.0           498458          1  498458.0   498458   498458     0.0  mykernel(float*, float const*, int const*, int const*, int, int, int)



CUDA Memory Operation Statistics (by time):

 Time(%)  Total Time (ns)  Operations  Average  Minimum  Maximum  StdDev      Operation     
 -------  ---------------  ----------  -------  -------  -------  ------  ------------------
   100.0            10816           2   5408.0     4288     6528  1583.9  [CUDA memcpy HtoD]



CUDA Memory Operation Statistics (by size in KiB):

 Total   Operations  Average  Minimum  Maximum  StdDev      Operation     
 ------  ----------  -------  -------  -------  ------  ------------------
 52.992           2   26.496   10.578   42.414  22.511  [CUDA memcpy HtoD]



Operating System Runtime API Statistics:

 Time(%)  Total Time (ns)  Num Calls   Average    Minimum  Maximum     StdDev         Name     
 -------  ---------------  ---------  ----------  -------  --------  ----------  --------------
    48.1        331545542         18  18419196.8    80550  83054204  23063304.4  poll          
    42.2        290751254       1710    170030.0     1020  17036417    630506.2  ioctl         
     5.3         36739837         15   2449322.5    20810  20961217   5796984.8  sem_timedwait 
     2.9         19888846        142    140062.3     2050  18737729   1571764.3  open64        
     0.9          5934531         78     76083.7     1311   5708922    646078.8  fopen         
     0.4          3059706         95     32207.4     1120   1082445    110447.7  mmap          
     0.0           198128          4     49532.0    44259     54410      4445.5  pthread_create
     0.0           176158          4     44039.5    39069     58120      9393.0  fgets         
     0.0           117225         12      9768.8     1940     62870     16921.7  write         
     0.0            81487         29      2809.9     1140      5170       777.7  read          
     0.0            67478         24      2811.6     1090     19799      4060.4  fgetc         
     0.0            35770          6      5961.7     2630     13100      4412.6  open          
     0.0            27239          8      3404.9     2040      7120      1661.2  munmap        
     0.0            25119         16      1569.9     1090      2300       346.0  fclose        
     0.0            15110          8      1888.8     1060      2660       608.1  fcntl         
     0.0            12309          1     12309.0    12309     12309         0.0  pipe2         
     0.0            10129          2      5064.5     4920      5209       204.4  socket        
     0.0             6240          1      6240.0     6240      6240         0.0  fopen64       
     0.0             6050          1      6050.0     6050      6050         0.0  connect       
     0.0             4270          1      4270.0     4270      4270         0.0  fflush        
     0.0             3650          1      3650.0     3650      3650         0.0  fwrite        
     0.0             1960          1      1960.0     1960      1960         0.0  bind          
     0.0             1240          1      1240.0     1240      1240         0.0  listen        


Did I miss some key steps?

Thanks!

I am sorry that there was no response to this earlier. Your forum post was dropped in an orphaned category that the Nsys team was unaware of until this afternoon.

UVM data is not yet exported to the SQLite file. I have filed a request to get it added.

On CUDA 12.2 with NVIDIA Nsight Systems version 2024.2.1.106-242134037904v0, I still cannot find UVM data in the Nsight Systems report.

Information about UVM transfers and page faults is exported to the SQLite file in the following tables as of 2024.2 (which is the latest). I’m pretty sure they were also in 2024.1. I’m wondering if there is a path issue and you are ending up with the older version of Nsys that shipped in the 12.2 CTK.
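To check for the path issue mentioned above, one option is to list every `nsys` executable on the search path, in lookup order, and print the version each one reports. The following is a minimal Python sketch, not an official tool. The helper name `find_on_path` is hypothetical, and the sketch assumes a Linux-style `PATH` and that `nsys --version` prints the version string.

```python
import os
import subprocess


def find_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order.

    Hypothetical helper for illustration; the first entry is the one the shell runs.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    for exe in find_on_path("nsys"):
        # Assumption: `nsys --version` prints the Nsight Systems version.
        result = subprocess.run([exe, "--version"], capture_output=True, text=True)
        print(exe, "->", result.stdout.strip())
```

If more than one `nsys` is listed and the first one reports a version older than 2024.1, the CUDA toolkit copy is probably shadowing the newer standalone install.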

CREATE TABLE ENUM_CUDA_UNIF_MEM_MIGRATION (
-- CUDA unified memory migration cause labels

 id                          INTEGER   NOT NULL   PRIMARY KEY,      -- Enum numerical value
 name                        TEXT,                                  -- Enum symbol name
 label                       TEXT                                   -- Enum human name

);
CREATE TABLE ENUM_CUDA_UNIF_MEM_ACCESS_TYPE (
-- CUDA unified memory access type labels

 id                          INTEGER   NOT NULL   PRIMARY KEY,      -- Enum numerical value
 name                        TEXT,                                  -- Enum symbol name
 label                       TEXT                                   -- Enum human name

);
CREATE TABLE CUDA_UM_CPU_PAGE_FAULT_EVENTS (
 start                       INTEGER   NOT NULL,                    -- Event start timestamp (ns).
 globalPid                   INTEGER   NOT NULL,                    -- Serialized GlobalId.
 address                     INTEGER   NOT NULL,                    -- Virtual address of the page that faulted.
 originalFaultPc             INTEGER,                               -- Program counter of the CPU instruction that caused the page fault.
 CpuInstruction              INTEGER   NOT NULL,                    -- REFERENCES StringIds(id) -- Function name
 module                      INTEGER   NOT NULL,                    -- REFERENCES StringIds(id) -- Module name
 unresolvedFaultPc           INTEGER                                -- True if the program counter was not resolved.
);
CREATE TABLE CUDA_UM_GPU_PAGE_FAULT_EVENTS (
 start                       INTEGER   NOT NULL,                    -- Event start timestamp (ns).
 end                         INTEGER   NOT NULL,                    -- Event end timestamp (ns).
 globalPid                   INTEGER   NOT NULL,                    -- Serialized GlobalId.
 deviceId                    INTEGER   NOT NULL,                    -- Device ID.
 address                     INTEGER   NOT NULL,                    -- Virtual address of the page that faulted.
 numberOfPageFaults          INTEGER   NOT NULL,                    -- Number of page faults for the same page.
 faultAccessType             INTEGER   NOT NULL                     -- REFERENCES ENUM_CUDA_UNIF_MEM_ACCESS_TYPE(id)
);
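
Given the schema above, a short script can confirm whether an exported SQLite file actually contains the page fault tables and, if it does, produce the per-device totals that the text summary does not show. This is a rough Python sketch using only the standard `sqlite3` module. The table and column names are taken from the schema above; the function names and the file name `report.sqlite` in the usage note are hypothetical.

```python
import sqlite3

# Table names as listed in the schema above.
GPU_TABLE = "CUDA_UM_GPU_PAGE_FAULT_EVENTS"
CPU_TABLE = "CUDA_UM_CPU_PAGE_FAULT_EVENTS"


def table_exists(conn, name):
    """Return True if the database contains a table called `name`."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def summarize_page_faults(conn):
    """Return (gpu_faults_by_device, cpu_fault_events).

    gpu_faults_by_device maps deviceId to the sum of numberOfPageFaults.
    cpu_fault_events is the number of rows in the CPU page fault table.
    Either value is None when its table is missing, which suggests the
    export came from an nsys version that does not write UVM data.
    """
    gpu = None
    if table_exists(conn, GPU_TABLE):
        rows = conn.execute(
            f"SELECT deviceId, SUM(numberOfPageFaults) FROM {GPU_TABLE} "
            "GROUP BY deviceId ORDER BY deviceId"
        ).fetchall()
        gpu = dict(rows)
    cpu = None
    if table_exists(conn, CPU_TABLE):
        cpu = conn.execute(f"SELECT COUNT(*) FROM {CPU_TABLE}").fetchone()[0]
    return gpu, cpu
```

To use it, open the exported file with `sqlite3.connect("report.sqlite")` and pass the connection to `summarize_page_faults`. A result of `(None, None)` means neither table is present. An empty dictionary or a count of zero means the tables exist but no faults were recorded.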