Is there any documentation on when to use cudaMalloc, cudaMallocHost, and gdrcopy, respectively?
Judging from the description, gdrcopy performs very well and seems to have no drawbacks.
Thank you for the explanation.
What I meant to ask about is the difference between cudaMallocHost and GDRCopy.
From the graph of latency versus data size (*), I gather that GDRCopy is the better choice for small transfers and cudaMallocHost for large ones.
(*) For example, the one on the GDRCopy page linked earlier.
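To make that rule of thumb concrete, here is a minimal host-side sketch of a size-based dispatch. It does not call CUDA or GDRCopy at all; `CopyPath`, `choose_copy_path`, and the 32 KiB default crossover are all hypothetical names and values of my own, not anything taken from the GDRCopy page. The real crossover depends on the GPU, the platform, and the driver, so it should be measured rather than assumed.

```cpp
#include <cassert>
#include <cstddef>

// Which transfer mechanism to use for a host-to-device copy.
// Both names are illustrative labels, not part of any library.
enum class CopyPath {
    GdrCopy,        // CPU writes through a GDRCopy mapping of GPU memory
    PinnedHostDma   // cudaMemcpy from a cudaMallocHost (pinned) buffer
};

// ASSUMPTION: placeholder crossover point. Replace it with the value
// measured on your own system; it is not a documented constant.
constexpr std::size_t kAssumedCrossoverBytes = 32 * 1024;

// Pick the low-latency path for small transfers and the
// high-bandwidth DMA path for transfers at or above the crossover.
inline CopyPath choose_copy_path(std::size_t bytes,
                                 std::size_t crossover = kAssumedCrossoverBytes) {
    assert(crossover > 0 && "crossover must be a positive byte count");
    return bytes < crossover ? CopyPath::GdrCopy : CopyPath::PinnedHostDma;
}

// Human-readable label, handy when logging benchmark results.
inline const char* describe(CopyPath path) {
    return path == CopyPath::GdrCopy ? "gdrcopy" : "pinned-host-dma";
}
```

In a real application the two branches would wrap the actual GDRCopy and cudaMemcpy calls, and the crossover would come from a benchmark run on the target machine.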
Thanks for the link to gdrcopy, which I hadn't come across before.
It in turn links to another document on RDMA that I found useful, and that perhaps indirectly explains a number of the problems that appear on these lists around P2P failures on assorted hardware.