[Question] NSys CUDA Profiler - Page Migration and Number of CPU/GPU page faults

I am running a CUDA application under nsys and trying to get the number of memory page migrations for my programs.

nsys profile --stats=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --show-output=true ./app

I am doing a study in which my program runs multiple times. On each run it allocates its GPU buffers with Unified Memory, so data migration between host and device is transparent: the data moves to the GPU automatically, and the CPU requests it back when the kernel finishes. In this scenario I expected the gap between the reported CPU page faults and GPU page faults to be small. However, I am getting 10-20x more GPU page faults than CPU page faults. Is this expected? What could be the reason? I guess my question is really about understanding what these numbers mean.

Put another way: are CPU page faults and GPU page faults related, or does nsys report them as separate metrics (e.g., a fault is counted as a CPU page fault because it was initiated by the CPU, not by the GPU)?

Here is an example of the output:

 Total HtoD Migration Size (MB)  Total DtoH Migration Size (MB)  Total CPU Page Faults  Total GPU PageFaults  Minimum Virtual Address  Maximum Virtual Address
 ------------------------------  ------------------------------  ---------------------  --------------------  -----------------------  -----------------------
                    102,412.583                      94,222.877                232,970             6,169,403  0x7F5188000000           0x7F5784B09000  

Any pointers will be appreciated.
Thanks

I think it would help to open the .nsys-rep file in the GUI and look at the pattern of the faults there. I (obviously) don’t know your code, but if I had to guess, in addition to host->device and device->host you have a lot of device->device memory transfers.

Basically, open your results in the GUI (available for Win/Lin/Mac), expand the GPU rows, and you will be able to see what the memory operations are (the green and pink sections in this example from Gromacs).

If you compare this to the line showing the page faults (this example doesn’t have them collected) you will be able to better correlate what is going on.
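
As a rough sanity check before digging into the timeline, the totals from the summary table in the question can be turned into per-fault averages. This is only a sketch with assumptions that may not hold for your run: it takes "MB" to mean 10^6 bytes, and it attributes all HtoD traffic to GPU faults and all DtoH traffic to CPU faults, which ignores any explicit prefetching and the fact that a single fault can pull in more than one page. The variable names are made up for illustration.

```python
# Totals copied from the summary table in the question.
htod_mb = 102_412.583      # Total HtoD Migration Size (MB)
dtoh_mb = 94_222.877       # Total DtoH Migration Size (MB)
cpu_faults = 232_970       # Total CPU Page Faults
gpu_faults = 6_169_403     # Total GPU PageFaults

# Assumption: the report uses decimal megabytes (10^6 bytes).
BYTES_PER_MB = 1_000_000

# How many GPU faults were recorded for every CPU fault.
fault_ratio = gpu_faults / cpu_faults

# Average bytes migrated per fault, under the (assumed) attribution
# HtoD <- GPU faults and DtoH <- CPU faults.
htod_bytes_per_gpu_fault = htod_mb * BYTES_PER_MB / gpu_faults
dtoh_bytes_per_cpu_fault = dtoh_mb * BYTES_PER_MB / cpu_faults

print(f"GPU faults per CPU fault:  {fault_ratio:.1f}")
print(f"HtoD bytes per GPU fault:  {htod_bytes_per_gpu_fault:,.0f}")
print(f"DtoH bytes per CPU fault:  {dtoh_bytes_per_cpu_fault:,.0f}")
```

With the numbers above this comes out to roughly 26 GPU faults per CPU fault, about 16.6 KB moved per GPU fault, and about 404 KB moved per CPU fault. Since the total sizes in each direction are similar, the difference in fault counts seems to be mostly a difference in how much data each fault ends up moving, which is the kind of thing the timeline view should help confirm or rule out.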