This question isn’t strictly confined to the nsys profiler, though nsys serves as my starting point.
Here’s the motivation behind this query:
When using the nsys profiler, I detected certain cudaMemcpy operations involving pageable host memory. However, given the sampling-based nature of profiling, I'm concerned that nsys may not capture every single cudaMemcpy operation on pageable memory.
I'd like to know whether there is a reliable way to list all cudaMemcpy operations that use pageable memory, either through nsys or through another profiling tool.
The reason is that pageable host memory is highly undesirable in my multi-streaming context: copies to and from pageable memory may behave synchronously with respect to the host, which serializes work that should overlap across streams.
Thank you. I explored the Expert Systems analysis, and it provided a useful summary of Pageable Memcpy operations, along with information on related issues such as synchronous Memcpy.
I also learned that all relevant function calls are listed there. With this assistance, I’ve managed to eliminate all Pageable Memcpy operations from my program. Thanks!
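Beyond the Expert Systems rule, the same information can be pulled from a report exported to SQLite (for example with `nsys export --type sqlite`), since CUDA memcpy activity is recorded by tracing rather than sampling. The sketch below is a minimal illustration under stated assumptions, not an official nsys recipe: it assumes the export contains a `CUPTI_ACTIVITY_KIND_MEMCPY` table with `start`, `end`, `bytes`, `streamId`, `srcKind` and `dstKind` columns, and that pageable host memory is encoded as `1` (as in CUPTI's `CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE`). The helper names `find_pageable_memcpys` and `report` are hypothetical; check the schema and the `ENUM_CUDA_MEM_KIND` table of your own export before relying on it.

```python
import sqlite3

# ASSUMPTION: pageable host memory is encoded as 1, as in CUPTI's
# CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE. Verify against the ENUM_CUDA_MEM_KIND
# table in your own export before relying on this value.
PAGEABLE_KIND = 1


def find_pageable_memcpys(conn: sqlite3.Connection) -> list:
    """List every traced memcpy whose source or destination is pageable memory.

    ASSUMPTION: the export has a CUPTI_ACTIVITY_KIND_MEMCPY table with
    "start", "end", bytes, streamId, srcKind and dstKind columns.
    Returns (start, end, bytes, streamId, srcKind, dstKind) tuples,
    ordered by start time.
    """
    # "start" and "end" are quoted because END is an SQL keyword.
    query = """
        SELECT "start", "end", bytes, streamId, srcKind, dstKind
        FROM CUPTI_ACTIVITY_KIND_MEMCPY
        WHERE srcKind = ? OR dstKind = ?
        ORDER BY "start"
    """
    return conn.execute(query, (PAGEABLE_KIND, PAGEABLE_KIND)).fetchall()


def report(path: str) -> None:
    """Print one line per pageable memcpy found in an exported .sqlite file."""
    conn = sqlite3.connect(path)
    try:
        rows = find_pageable_memcpys(conn)
    finally:
        conn.close()
    for start, end, nbytes, stream, src, dst in rows:
        print(f"stream {stream}: {nbytes} B, {end - start} ns, "
              f"srcKind={src}, dstKind={dst}")
    print(f"{len(rows)} memcpy operation(s) involving pageable memory")
```

For example, `report("report.sqlite")` (hypothetical file name) prints one line per matching copy, including the stream it was issued on. A count of zero only means no pageable copies were traced in that particular run; code paths the profiled run never exercised are not covered.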