I’m profiling my application with the following command on a GH200. However, when I view the generated nsys report, I am not able to spot the statistics for the page faults. Where might I find these?
I had not run the script before, but this is what I see when I run it:
python /home1/apps/nvidia/Linux_aarch64/24.9/profilers/Nsight_Systems/target-linux-sbsa-armv8/reports/um_cpu_page_faults_sum.py report1.sqlite
report1.sqlite does not contain CUDA Unified Memory CPU page faults data.
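For anyone who wants to double-check what the report script is complaining about, here is a minimal sketch that looks directly in the exported SQLite file for the Unified Memory page fault tables. The table names `CUDA_UM_CPU_PAGE_FAULT_EVENTS` and `CUDA_UM_GPU_PAGE_FAULT_EVENTS` are my assumption about the nsys export schema and may differ between versions, and `um_table_row_counts` is just a hypothetical helper name.

```python
import sqlite3

# Assumed table names from the Nsight Systems SQLite export schema;
# verify them against the schema of your own nsys version.
UM_TABLES = ("CUDA_UM_CPU_PAGE_FAULT_EVENTS", "CUDA_UM_GPU_PAGE_FAULT_EVENTS")


def um_table_row_counts(db_path, tables=UM_TABLES):
    """Return {table: row count}, with None for tables missing from the file."""
    conn = sqlite3.connect(db_path)
    try:
        # Collect the names of all tables that exist in this export.
        present = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table'"
            )
        }
        counts = {}
        for table in tables:
            if table in present:
                counts[table] = conn.execute(
                    f'SELECT COUNT(*) FROM "{table}"'
                ).fetchone()[0]
            else:
                counts[table] = None  # table was never written to the export
        return counts
    finally:
        conn.close()


# Example call (path taken from the command above):
# print(um_table_row_counts("report1.sqlite"))
```

If a table comes back as `None`, no such events were recorded in the export at all; a count of 0 would mean the table exists but is empty.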
For the data collection, I am running nsys on the GH host while an application is running inside a singularity (apptainer) container. Nsys does not seem to be able to capture the page fault information when run outside the container. Are there any specific settings needed for nsys to monitor the page faults caused by an application running inside a container?
@liuyis I used your advice from the other thread to get the CUDA runtime events by attaching nsys to the background process, but I still am not able to see the page fault data while profiling inside the container. Could you please have a look at the nsys-rep file?
The page fault data that Nsys can collect is specifically for the CUDA Unified Memory feature. Does the application actually use CUDA UVM? Searching the CUDA API calls, I don’t see functions like cudaMallocManaged() being used.
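The same search can be roughly approximated against the SQLite export. This is only a sketch: it assumes the export has a `StringIds` table with a `value` column holding the API names, which should be checked against your nsys version, and `find_strings` is a hypothetical helper name. A hit only suggests that the name was recorded somewhere in the report.

```python
import sqlite3


def find_strings(db_path, needle):
    """Return sorted distinct StringIds.value entries that contain `needle`."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumption: recorded names are stored in a table called StringIds.
        has_table = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type='table' AND name='StringIds'"
        ).fetchone()
        if has_table is None:
            return []
        rows = conn.execute(
            "SELECT DISTINCT value FROM StringIds "
            "WHERE instr(value, ?) > 0 ORDER BY value",
            (needle,),
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


# Example call:
# print(find_strings("report1.sqlite", "cudaMallocManaged"))
```

An empty list for `cudaMallocManaged` would be consistent with the application not using managed memory, in which case there would be no Unified Memory page faults to report.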
cudaHostAlloc() calls that should be directly accessible by the device
but still no information about CPU/GPU page faults that may have occurred was captured in the trace. Could you please let me know what to expect here? I’m also not sure what the “UVM GPU1 BH” process refers to, since it shows high utilization.
@liuyis thank you for your response clarifying the differences between UVA and UVM. Also, thank you for the sample. I am able to confirm that I can profile within a container.
On a final note, from the perspective of the nsys reports, some of these seem ambiguous. For example, the following recommendation identifies the memcpy regions as using pageable memory. I am not sure whether this means that no pages were migrated by the driver or that no page faults occurred at the time of access. Could you please clarify this?