Is there any documentation on the proper use of the various malloc methods?

Is there any documentation on the proper use of cudaMalloc, cudaMallocHost, and gdrcopy, respectively?

Judging from the following description, gdrcopy performs very well and seems to have no drawbacks.

The CUDA runtime API documentation describes the proper usage of cudaMalloc and cudaMallocHost.

gdrcopy is documented here.


Thank you for the explanation.
What I meant to ask about is the difference between cudaMallocHost and GDRCopy.

From the graph of latency versus data size (*), I gather that GDRCopy should be used for small data and cudaMallocHost for large data.

(*) For example, the graph on the GDRCopy page linked earlier.
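
For anyone who wants to check the size dependence on their own machine, here is a minimal sketch that times host-to-device copies into a cudaMalloc buffer from a pageable (malloc) and a pinned (cudaMallocHost) host buffer. It uses only CUDA runtime calls. The helper name `avg_h2d_us`, the list of sizes, and the iteration count are my own choices for illustration, and GDRCopy itself is not included here (see its own documentation for that API), so this only gives the cudaMemcpy side of the comparison.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            std::fprintf(stderr, "%s failed: %s\n", #call,        \
                         cudaGetErrorString(err_));               \
            std::exit(EXIT_FAILURE);                              \
        }                                                         \
    } while (0)

// Hypothetical helper: average time in microseconds for one
// host-to-device cudaMemcpy of `bytes`. `pinned` selects a
// cudaMallocHost (page-locked) or malloc (pageable) host buffer.
float avg_h2d_us(size_t bytes, bool pinned, int iters)
{
    void *host = nullptr;
    void *dev = nullptr;

    if (pinned) {
        CHECK(cudaMallocHost(&host, bytes));
    } else {
        host = std::malloc(bytes);
        if (host == nullptr) {
            std::fprintf(stderr, "malloc failed\n");
            std::exit(EXIT_FAILURE);
        }
    }
    std::memset(host, 0, bytes);
    CHECK(cudaMalloc(&dev, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up copy so one-time setup cost is not measured.
    CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));

    CHECK(cudaEventRecord(start));
    for (int i = 0; i < iters; ++i) {
        CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
    }
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dev));
    if (pinned) {
        CHECK(cudaFreeHost(host));
    } else {
        std::free(host);
    }
    return ms * 1000.0f / static_cast<float>(iters);
}

int main()
{
    // Sizes chosen arbitrarily to span small and large transfers.
    const size_t sizes[] = {64, 4096, 1u << 20, 64u << 20};
    const int iters = 100;

    std::printf("%12s %16s %16s\n", "bytes", "pageable (us)", "pinned (us)");
    for (size_t bytes : sizes) {
        float pageable_us = avg_h2d_us(bytes, false, iters);
        float pinned_us = avg_h2d_us(bytes, true, iters);
        std::printf("%12zu %16.2f %16.2f\n", bytes, pageable_us, pinned_us);
    }
    return 0;
}
```

Build with something like `nvcc -o h2d_bench h2d_bench.cu` (the file name is just an example). The numbers depend heavily on the GPU, the PCIe link, and the driver, so treat them as a rough guide for your own system rather than a general rule.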

Thanks for your link to gdrcopy, something I hadn’t come across before.

In turn, it linked to another document on RDMA that I found useful, and that perhaps indirectly explains a number of the problems that appear on these lists around P2P failures on assorted hardware.
