[Question] Identifying All Instances of cudaMemcpy with Pageable Host Memory

user157267 · April 10, 2025, 6:52pm

This question isn’t strictly confined to the nsys profiler, though nsys serves as my starting point.

Here’s the motivation behind this query:
When using the nsys profiler, I detected certain cudaMemcpy operations involving pageable host memory. However, given the sampling-based nature of profiling, I’m concerned that nsys may not capture every single cudaMemcpy operation on pageable memory.

I’d like to know whether there is a reliable way to list every cudaMemcpy operation that uses pageable memory, whether through nsys or another profiling tool.
The reason is that pageable host memory is a serious problem in my multi-stream setup: copies to and from pageable memory introduce implicit synchronization.

Thank you!

hwilper · April 11, 2025, 7:34pm

Nsight Systems has both API trace and sampling functionality. In the case of CUDA, all of the functions are traced rather than sampled.

We get a callback every time the function is called, so you are seeing every instance.

I would also suggest running the expert systems rule here:

user157267 · April 12, 2025, 3:47pm

Thank you. I explored the Expert Systems, and it provided a useful summary regarding Pageable Memcpy, along with information on other aspects such as sync Memcpy.

I also learned that all relevant function calls are listed there. With this assistance, I’ve managed to eliminate all Pageable Memcpy operations from my program. Thanks!
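
As a complement to the profiler report, a runtime guard can catch pageable copies that get introduced later. The following is only a minimal sketch and not something proposed in this thread: `is_pageable_host` and `checked_memcpy_async` are hypothetical helper names, and the check assumes that `cudaPointerGetAttributes` reports `cudaMemoryTypeUnregistered` for host memory that was neither allocated with `cudaMallocHost`/`cudaHostAlloc` nor pinned with `cudaHostRegister`.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: returns true when `ptr` looks like ordinary pageable
// host memory, i.e. memory the CUDA runtime does not know about.
bool is_pageable_host(const void* ptr) {
    cudaPointerAttributes attr{};
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        // Assumption: older toolkits return an error instead of
        // cudaMemoryTypeUnregistered for unregistered host pointers.
        // Any other failure is also treated as "pageable" in this sketch.
        cudaGetLastError();  // clear the error state
        return true;
    }
    return attr.type == cudaMemoryTypeUnregistered;
}

// Hypothetical wrapper: refuses to issue an async copy whose host side is
// pageable. Only explicit host<->device kinds are checked; cudaMemcpyDefault
// and the other kinds are passed through unchecked.
cudaError_t checked_memcpy_async(void* dst, const void* src, size_t bytes,
                                 cudaMemcpyKind kind, cudaStream_t stream) {
    const void* host = nullptr;
    if (kind == cudaMemcpyHostToDevice) host = src;
    if (kind == cudaMemcpyDeviceToHost) host = dst;
    if (host != nullptr && is_pageable_host(host)) {
        std::fprintf(stderr,
                     "pageable host buffer %p passed to cudaMemcpyAsync\n",
                     host);
        return cudaErrorInvalidValue;
    }
    return cudaMemcpyAsync(dst, src, bytes, kind, stream);
}
```

With this in place, a buffer obtained from `malloc` would be expected to be rejected, while a buffer from `cudaMallocHost` is forwarded to `cudaMemcpyAsync` unchanged. The extra attribute query adds overhead to every copy, so a guard like this is probably best kept to debug builds.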

