[Question] NSys CUDA Profiler - Page fault size

sudhanshugupta96 · August 27, 2024, 9:28pm

I am trying to create a trace of CPU and GPU page faults under unified memory. I ran nsys using a command from a previous question:

nsys profile --force-overwrite=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --export=sqlite ./add_vectors

I then queried the SQLite database for a trace of both CPU and GPU page faults. However, the CPU page faults appear to include only the demand accesses, not prefetches. The example below shows the initialization of two vectors. The stride between odd-numbered accesses grows from 4KB to 8KB and so on, all the way up to 2MB (even-numbered accesses follow the same pattern):

Is there any way to get a log of all page faults, including prefetches?
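For anyone who wants to reproduce the stride pattern described above, here is a minimal Python sketch that reads fault addresses from the exported SQLite file and computes the byte distance between consecutive faults. The table name `CUDA_UM_CPU_PAGE_FAULT_EVENTS` and the column names `start` and `address` are assumptions about the export schema and may differ between Nsight Systems versions, so check them against your own file first. The function names are hypothetical helpers, not part of any NVIDIA tool.

```python
import sqlite3

# Assumed schema names; verify them in your own export before relying on this.
FAULT_TABLE = "CUDA_UM_CPU_PAGE_FAULT_EVENTS"  # assumption
TIME_COL = "start"                              # assumption
ADDR_COL = "address"                            # assumption


def load_fault_addresses(conn, table=FAULT_TABLE, time_col=TIME_COL, addr_col=ADDR_COL):
    """Return the fault addresses ordered by timestamp."""
    query = f'SELECT "{addr_col}" FROM "{table}" ORDER BY "{time_col}"'
    return [row[0] for row in conn.execute(query)]


def strides(addresses):
    """Byte distance between each fault address and the one before it."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def interleaved_strides(addresses, streams=2):
    """Split the faults round-robin into `streams` sequences (for example,
    two vectors initialized in the same loop) and compute strides for each."""
    return [strides(addresses[i::streams]) for i in range(streams)]
```

Typical use would be `interleaved_strides(load_fault_addresses(sqlite3.connect("report.sqlite")))`, where `report.sqlite` is a placeholder for the file that `--export=sqlite` produced. A stride larger than 4KB between consecutive faults on the same vector suggests that more than one page was populated by the earlier fault.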

Sanjiv.Satoor · August 28, 2024, 3:55am

Moved to the Nsight Systems category.

hwilper · August 28, 2024, 1:27pm

I do not believe so, but I am going to loop in the engineer that developed the page fault trace.

@skottapalli can you comment?

skottapalli · August 28, 2024, 3:49pm

Nsight Systems gets the Unified memory CPU page fault events from CUPTI. See 6.95. CUpti_ActivityUnifiedMemoryCounter2 — Cupti 12.6 documentation
From what I understand, prefetch operations are user-provided hints and are not considered page faults, so they will not show up as page faults.

I think you may be wanting to see the DtoH transfers for prefetches. If so, please take a look at the “DtoH transfer” timeline row under “Managed Memory” or “Unified Memory” timeline row. See the second screenshot in User Guide — nsight-systems 2024.5 documentation
You should see DtoH transfer events with the migration cause listed as prefetch.

sudhanshugupta96 · August 28, 2024, 4:10pm

Thanks for the links! In the code snippet I posted, I am allocating two vectors and then initializing them. I am looking for the page faults on the CPU that create a page table entry. More specifically, if you look at the CPU UM fault trace, the page offsets don’t necessarily increase by 4KB, indicating that some pages are being “pre-created”. I used the word prefetch before because the behavior seems similar to the tree-based prefetching scheme used in CPU/GPU UM transfers.

My question is: is there a way of knowing how many pages on the CPU are being initialized when there is a page fault and the page does not exist anywhere yet? Looking at the CUPTI documentation link, I don’t see “size” listed as a public member. Do you know of any other way I can get this information?

skottapalli · August 28, 2024, 6:24pm

I don’t know of a way to get the size on initialization of pages when a page does not exist yet. However, on subsequent page faults, you could find the corresponding DtoH transfer event and it will contain the size of the transfer.

See the attached screenshot. The first 3 CPU page faults are due to initialization. The next 3 page faults occur when the CPU needs to access pages that reside on the GPU, so a DtoH transfer takes place. Those events have the size information.
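The pairing of a CPU page fault with its DtoH transfer could be sketched in Python roughly as follows. This is only an illustration: the `Transfer` record, the `match_fault` helper, and the `max_gap_ns` window are all hypothetical, and it assumes you have already extracted the transfer start time, starting virtual address, and size from the exported data. It also assumes the transfer starts at or shortly after the fault timestamp, which should be checked against a real trace.

```python
from typing import NamedTuple, Optional


class Transfer(NamedTuple):
    start_ns: int  # transfer start timestamp (ns)
    address: int   # first virtual address migrated
    size: int      # number of bytes migrated


def match_fault(fault_ns, fault_addr, transfers, max_gap_ns=1_000_000) -> Optional[Transfer]:
    """Return the earliest transfer that starts within `max_gap_ns` after the
    fault and whose address range covers the faulting address, or None."""
    candidates = [
        t for t in transfers
        if fault_ns <= t.start_ns <= fault_ns + max_gap_ns
        and t.address <= fault_addr < t.address + t.size
    ]
    return min(candidates, key=lambda t: t.start_ns) if candidates else None
```

A fault that returns `None` has no matching transfer in the window, which is what you would expect for the initialization faults, since no data is migrated for them.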
