I’m coding a multi-GPU data flow pipeline. To get the best cudaMemcpyAsync performance, I know I have to use cudaMallocHost to get pinned host memory.
But it seems that this pinned host memory block is only pinned for the currently selected GPU device.
Is it possible to allocate one memory block that is pinned for several GPU devices?
Or should I manage several memory blocks, one per GPU device?
I’ve never heard that and don’t believe that it is true.
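If you want to make the intent explicit, here is a minimal sketch that allocates one pinned block with cudaHostAlloc and the cudaHostAllocPortable flag, then feeds a slice of it to every GPU. As far as I know, on systems with unified virtual addressing pinned allocations are already usable from all devices, so the flag mostly documents the intent. The names (hostBuf, chunkBytes, the CHECK macro) and the 1 MiB slice size are my own assumptions for illustration, not something from your pipeline.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));

    const size_t chunkBytes = 1 << 20;  // assumed slice size per GPU (1 MiB)

    // One pinned host block, requested as portable so that every CUDA
    // context treats it as pinned, not only the one that allocated it.
    unsigned char* hostBuf = nullptr;
    CHECK(cudaHostAlloc((void**)&hostBuf, chunkBytes * deviceCount,
                        cudaHostAllocPortable));
    memset(hostBuf, 0, chunkBytes * deviceCount);

    std::vector<unsigned char*> devBuf(deviceCount, nullptr);
    std::vector<cudaStream_t> stream(deviceCount);

    // Each GPU copies its own slice of the shared pinned block.
    for (int d = 0; d < deviceCount; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaMalloc((void**)&devBuf[d], chunkBytes));
        CHECK(cudaStreamCreate(&stream[d]));
        CHECK(cudaMemcpyAsync(devBuf[d], hostBuf + d * chunkBytes, chunkBytes,
                              cudaMemcpyHostToDevice, stream[d]));
    }

    // Wait for all copies, then release per-device resources.
    for (int d = 0; d < deviceCount; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaStreamSynchronize(stream[d]));
        CHECK(cudaStreamDestroy(stream[d]));
        CHECK(cudaFree(devBuf[d]));
    }

    CHECK(cudaFreeHost(hostBuf));
    printf("copied %zu bytes to each of %d device(s)\n", chunkBytes, deviceCount);
    return 0;
}
```

If the copies are still slow with a block allocated like this, the pinning itself is probably not the problem, and a profiler timeline would be the next place to look.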
Thanks for your quick answer.
So my bottleneck must come from somewhere else.
I have difficulty knowing when I should, and when I should not, call cudaSetDevice(i).
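For anyone with the same doubt, here is a rough sketch of the pattern I understand to be safe: call cudaSetDevice(d) before anything that implicitly targets the current device, which includes cudaMalloc, stream creation and kernel launches. A stream belongs to the device that was current when it was created, and launching a kernel into a stream from another device fails. The kernel name scale, the buffer size and the CHECK macro are assumptions made up for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel: multiplies every element by a factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));

    const int n = 1 << 16;  // assumed element count per GPU
    std::vector<float*> dev(deviceCount, nullptr);
    std::vector<cudaStream_t> stream(deviceCount);

    // Setup: cudaMalloc and cudaStreamCreate act on the current device,
    // so select the device first.
    for (int d = 0; d < deviceCount; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaMalloc((void**)&dev[d], n * sizeof(float)));
        CHECK(cudaStreamCreate(&stream[d]));
        CHECK(cudaMemsetAsync(dev[d], 0, n * sizeof(float), stream[d]));
    }

    // Launch: the current device must match the device that owns the stream.
    for (int d = 0; d < deviceCount; ++d) {
        CHECK(cudaSetDevice(d));
        scale<<<(n + 255) / 256, 256, 0, stream[d]>>>(dev[d], n, 2.0f);
        CHECK(cudaGetLastError());
    }

    // Teardown: selecting the device again keeps the pairing obvious.
    for (int d = 0; d < deviceCount; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaStreamSynchronize(stream[d]));
        CHECK(cudaStreamDestroy(stream[d]));
        CHECK(cudaFree(dev[d]));
    }
    return 0;
}
```

Note that cudaSetDevice only affects the calling host thread, so with one host thread per GPU each thread would call it once at startup instead of inside every loop.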