I am confused, so I want to know: what exactly are the advantages of a cudaMemPool on the host rather than on the device, given that with a host-side cudaMemPool the latency is a bit higher due to the PCIe transfer? Why exactly would we use a cudaMemPool on the host, and in which scenarios can we use it?
I would also like to see code demonstrating a use case for a cudaMemPool on the host rather than a cudaMemPool on the device.
Do you mean by “cudaMemPool on host” a memory pool which allocates pinned memory?
Yes, a cudaMemPool allocating pinned memory on the host.
Personally, I can imagine the following use cases:
- Allocating and freeing pinned memory is expensive. Using a pool reduces that latency, and you don’t need to implement your own pool.
- Without having to worry too much about the expensive operations, one can write simpler C++ code: just allocate pinned memory where you use it, with no need to allocate it at the highest level of the call hierarchy and pass it down to the functions. (Of course, in some situations allocating outside of a function can be more performant because of fewer CUDA API calls.)
- It can be used as a fallback if the memory pool for device memory reports out-of-memory.
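To make the first point concrete, here is a minimal sketch of creating a pool of pinned host memory and allocating from it in stream order. This assumes CUDA 12.2 or newer (which added `cudaMemLocationTypeHostNuma` for host-side pools) and a system with at least one NUMA node; error checking is omitted for brevity:

```cpp
#include <cuda_runtime.h>

int main() {
    // Describe a pool backed by pinned host memory (CUDA 12.2+).
    cudaMemPoolProps props = {};
    props.allocType = cudaMemAllocationTypePinned;
    props.location.type = cudaMemLocationTypeHostNuma; // pinned host memory
    props.location.id = 0;                             // NUMA node 0 (assumption)
    cudaMemPool_t pool;
    cudaMemPoolCreate(&pool, &props);

    // Keep freed memory cached in the pool instead of returning it to the OS,
    // so repeated allocations avoid the expensive pin/unpin cycle.
    cuuint64_t threshold = UINT64_MAX;
    cudaMemPoolSetAttribute(pool, cudaMemPoolAttrReleaseThreshold, &threshold);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Allocate pinned host memory from the pool right where it is needed;
    // cudaFreeAsync returns it to the pool, not to the operating system.
    void* h_buf = nullptr;
    cudaMallocFromPoolAsync(&h_buf, 1 << 20, pool, stream);
    cudaFreeAsync(h_buf, stream);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    cudaMemPoolDestroy(pool);
    return 0;
}
```

After the first warm-up allocation, subsequent `cudaMallocFromPoolAsync` calls of similar sizes are served from the cached pool memory, which is what makes “allocate where you use it” affordable.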
Can a cudaMemPool on the host give us an advantage over a CUDA device memory pool in the case of a multi-GPU process?
I do not understand your question. What do you mean by “advantage over CUDA device mempool”?
Wherever you use device memory or pinned memory, you could allocate it from a memory pool.
Actually, I want to know whether creating a cudaMemPool on the host can be useful for multi-GPU use, instead of creating cudaMemPools on the GPU devices.
Useful in which sense?
It is up to you to decide when to use pinned memory or device memory. This has nothing to do with memory pools.
We can create a cudaMemPool on the device side using cudaMemPoolProps specific to one GPU, which can then be shared with other GPU devices. Instead of that, can creating a memory pool on the host side, pinned and accessible to multiple GPUs, have an advantage?
There are two ways of creating a memory pool: one on the host side and the other on the device side. Which side is useful, and in which scenario? Also, for a multi-GPU process, which side can be useful?
The PCIe connection is full-duplex and depending on mainboard and CPU also supports peer-to-peer access.
Pinning memory for one or more GPUs does not make much difference, as it mostly means that the operating system may not swap out that memory and that it keeps a fixed address in host RAM.
Whether storing the memory on device side or host side has advantages, depends on your application and usage scenario.
Latency probably is better with storage on the device, because only one PCIe connection has to be used (GPU to GPU).
Bandwidth may be the same, if the PCIe interfaces are not used by other transactions: For bandwidth it does not matter, if it goes through one or two PCIe interfaces, if they have the same speed.
In general, storing on the device seems to be the faster option. OTOH the host side perhaps has more overall memory. Also, if several GPUs need that specific memory content, the one PCIe link to the GPU where it is stored may be overwhelmed, whereas the host has many more PCIe lanes than a single GPU.
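For the device-side alternative mentioned above, here is a minimal sketch of a device memory pool on GPU 0 whose allocations are made accessible to GPU 1, so GPU-to-GPU traffic uses the peer link instead of going through host RAM. It assumes two GPUs with peer access between them; error checking is omitted:

```cpp
#include <cuda_runtime.h>

int main() {
    // Create an explicit pool of device memory resident on GPU 0.
    cudaMemPoolProps props = {};
    props.allocType = cudaMemAllocationTypePinned; // only supported value
    props.location.type = cudaMemLocationTypeDevice;
    props.location.id = 0;                         // pool lives in GPU 0's memory
    cudaMemPool_t pool;
    cudaMemPoolCreate(&pool, &props);

    // Grant GPU 1 read/write access to all allocations from this pool
    // (requires peer access between the two devices).
    cudaMemAccessDesc desc = {};
    desc.location.type = cudaMemLocationTypeDevice;
    desc.location.id = 1;
    desc.flags = cudaMemAccessFlagsProtReadWrite;
    cudaMemPoolSetAccess(pool, &desc, 1);

    cudaSetDevice(0);
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    void* d_buf = nullptr;
    cudaMallocFromPoolAsync(&d_buf, 1 << 20, pool, stream);
    // Kernels launched on GPU 1 can now dereference d_buf over the peer link.
    cudaFreeAsync(d_buf, stream);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    cudaMemPoolDestroy(pool);
    return 0;
}
```

This is the lower-latency layout when only GPUs consume the data; a host-pinned pool (as sketched earlier in the thread) becomes attractive when the data exceeds a single GPU's memory or when many GPUs would saturate the one PCIe link of the owning GPU.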