I am running a CUDA application under Nsight Systems (nsys), and I am trying to get the number of memory page migrations for my programs.
nsys profile --stats=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --show-output=true ./app
I am doing a study in which my program runs multiple times. Each run allocates its buffers with Unified Memory, so data is sent to the GPU automatically (the host-to-device migration is transparent), and the data is read back on the CPU when the kernel finishes. In this scenario I would expect the gap between the reported CPU page faults and GPU page faults to be fairly small. However, I am getting 10-20x more GPU page faults than CPU page faults. Is this expected? What could be the reason? I guess my question is about understanding better what these numbers mean.
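For reference, here is a minimal sketch of the access pattern I described (names, buffer size, and iteration count are made up for illustration): each run allocates a managed buffer, the CPU initializes it, a kernel touches it on the GPU, and the CPU reads it back.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// GPU touches every element: the first device access to each managed
// page triggers a GPU page fault and an HtoD migration.
__global__ void touch(float *buf, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;
}

int main() {
    const size_t n = 1 << 24;             // buffer size is an assumption
    for (int run = 0; run < 10; ++run) {  // program repeats multiple runs
        float *buf;
        cudaMallocManaged(&buf, n * sizeof(float));

        // CPU writes first, so pages start resident on the host.
        for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;

        touch<<<(unsigned)((n + 255) / 256), 256>>>(buf, n);
        cudaDeviceSynchronize();

        // CPU reads the result back: each first host access after the
        // kernel causes a CPU page fault and a DtoH migration.
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i) sum += buf[i];
        printf("run %d: sum = %f\n", run, sum);

        cudaFree(buf);
    }
    return 0;
}
```

Given that each direction migrates roughly the same amount of data (which matches the HtoD/DtoH totals in my output below), I expected the fault counts to be of the same order of magnitude.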
Perhaps my question is really: does a CPU page fault not correspond to a GPU page fault at all? Are the two counters related, or does nsys report them as independent metrics (e.g., a fault is counted as a CPU page fault because it was initiated by the CPU rather than by the GPU)?
See an example of an output:
Total HtoD Migration Size (MB) Total DtoH Migration Size (MB) Total CPU Page Faults Total GPU PageFaults Minimum Virtual Address Maximum Virtual Address
------------------------------ ------------------------------ --------------------- -------------------- ----------------------- -----------------------
102,412.583 94,222.877 232,970 6,169,403 0x7F5188000000 0x7F5784B09000
Any pointers will be appreciated.
Thanks