[Question] Identifying All Instances of cudaMemcpy with Pageable Host Memory

This question isn’t strictly confined to the nsys profiler, though nsys serves as my starting point.

Here’s the motivation behind this query:
When using the nsys profiler, I detected certain cudaMemcpy operations involving pageable host memory. However, given the sampling-based nature of profiling, I’m concerned that nsys may not capture every single cudaMemcpy operation on pageable memory.

Is there a reliable way to list every cudaMemcpy operation that uses pageable memory, whether through nsys or another profiling tool?
The reason I ask is that pageable host memory is highly undesirable in my multi-stream setup: copies to and from pageable memory can block the host and prevent those copies from overlapping with work in other streams.
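As a complement to the nsys trace, a runtime check can flag pageable buffers at the call site. The sketch below is an illustration under stated assumptions, not something proposed in this thread. It assumes CUDA 11 or newer, where `cudaPointerGetAttributes` reports ordinary `malloc`/`new` memory as `cudaMemoryTypeUnregistered`. `isPageableHostPointer` and `memcpyAsyncChecked` are hypothetical helper names; the wrapper logs any host-to-device or device-to-host copy whose host-side buffer is pageable, then forwards to `cudaMemcpyAsync`.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if `ptr` is ordinary pageable host memory,
// i.e. memory the CUDA runtime neither allocated nor page-locked.
// Assumes CUDA 11+ semantics (cudaMemoryTypeUnregistered for such pointers).
bool isPageableHostPointer(const void* ptr) {
    assert(ptr != nullptr);
    cudaPointerAttributes attr{};
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
        // Pre-11 runtimes return an error for pageable pointers. Clear the
        // (non-sticky) error and conservatively treat the pointer as pageable.
        cudaGetLastError();
        return true;
    }
    return attr.type == cudaMemoryTypeUnregistered;
}

// Hypothetical wrapper (not part of the CUDA API): reports every copy whose
// host-side buffer is pageable, then performs the copy unchanged.
// Only cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost are inspected.
cudaError_t memcpyAsyncChecked(void* dst, const void* src, size_t bytes,
                               cudaMemcpyKind kind, cudaStream_t stream,
                               const char* callSite) {
    const void* hostSide = nullptr;
    if (kind == cudaMemcpyHostToDevice) hostSide = src;
    else if (kind == cudaMemcpyDeviceToHost) hostSide = dst;

    if (hostSide != nullptr && isPageableHostPointer(hostSide)) {
        std::fprintf(stderr, "pageable %s copy of %zu bytes at %s\n",
                     kind == cudaMemcpyHostToDevice ? "H2D" : "D2H",
                     bytes, callSite);
    }
    return cudaMemcpyAsync(dst, src, bytes, kind, stream);
}
```

Routing copies through a wrapper like this (for example via a macro that passes `__FILE__`) would print one line per pageable copy at run time. It does not replace the profiler: it ignores `cudaMemcpyDefault` and any copies made directly through the CUDA API.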

Thank you!

Nsight Systems has both API tracing and sampling functionality. For CUDA, every API function is traced rather than sampled.

We get a callback every time the function is called, so you are seeing every instance.

I would also suggest running the Expert Systems rule here:

Thank you. I explored Expert Systems, and it provided a useful summary of pageable memcpy operations, along with other findings such as synchronous memcpy.

I also saw that every relevant call is listed there. With that help, I've eliminated all pageable memcpy operations from my program. Thanks!
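The thread does not say how the pageable copies were removed. One common approach, sketched here as an assumption, is to page-lock the existing host buffer with `cudaHostRegister` (or allocate it with `cudaMallocHost` in the first place) so that `cudaMemcpyAsync` on multiple streams operates on pinned memory. `scaleOnGpuPinned` and `scaleKernel` are hypothetical names; the kernel only doubles each value so the round trip is observable.

```cuda
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical example kernel: multiplies each element by `factor`.
__global__ void scaleKernel(float* x, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) x[i] *= factor;
}

// Page-locks `data` in place, then processes it in `numStreams` chunks:
// H2D copy, kernel, D2H copy, each chunk on its own stream. With a pinned
// host buffer, chunks on different streams are able to overlap.
bool scaleOnGpuPinned(std::vector<float>& data, int numStreams, float factor) {
    assert(numStreams > 0);
    const size_t n = data.size();
    if (n == 0) return true;
    const size_t bytes = n * sizeof(float);

    // Pin the existing allocation; data.data() stays valid and unchanged.
    if (cudaHostRegister(data.data(), bytes, cudaHostRegisterDefault) != cudaSuccess)
        return false;

    float* dev = nullptr;
    bool ok = cudaMalloc(&dev, bytes) == cudaSuccess;

    std::vector<cudaStream_t> streams(numStreams, nullptr);
    for (auto& s : streams)
        ok = ok && cudaStreamCreate(&s) == cudaSuccess;

    const size_t chunk = (n + numStreams - 1) / numStreams;
    for (int i = 0; ok && i < numStreams; ++i) {
        const size_t begin = static_cast<size_t>(i) * chunk;
        if (begin >= n) break;
        const size_t count = std::min(chunk, n - begin);
        float* host = data.data() + begin;
        float* d = dev + begin;

        ok = cudaMemcpyAsync(d, host, count * sizeof(float),
                             cudaMemcpyHostToDevice, streams[i]) == cudaSuccess;
        if (!ok) break;

        const unsigned threads = 256;
        const unsigned blocks = static_cast<unsigned>((count + threads - 1) / threads);
        scaleKernel<<<blocks, threads, 0, streams[i]>>>(d, count, factor);

        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpyAsync(host, d, count * sizeof(float),
                             cudaMemcpyDeviceToHost, streams[i]) == cudaSuccess;
    }

    // Always wait for in-flight work before releasing memory it may touch.
    if (cudaDeviceSynchronize() != cudaSuccess) ok = false;
    for (cudaStream_t s : streams)
        if (s != nullptr) cudaStreamDestroy(s);
    cudaFree(dev);  // no-op if dev is nullptr
    cudaHostUnregister(data.data());
    return ok;
}
```

Page-locking is relatively expensive, so for a buffer reused across many iterations it would usually make sense to register it once up front, or allocate it with `cudaMallocHost`, rather than registering on every call as this self-contained sketch does.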