This question isn’t strictly confined to the nsys profiler, though nsys serves as my starting point.
Here’s the motivation behind this query:
When using the nsys profiler, I detected some cudaMemcpy operations involving pageable host memory. However, since profiling can be sampling-based, I’m concerned that nsys may not capture every cudaMemcpy operation on pageable memory.
I’d like to know whether there is a reliable way to list all cudaMemcpy operations that use pageable memory, whether through nsys or another profiling tool.
The reason is that pageable host memory is highly undesirable in my multi-stream setup: copies to and from pageable memory can introduce implicit synchronization, which defeats the overlap between streams.
Thank you!