I am very new to CUDA and CUDA programming, so please forgive my ignorance. I have a point cloud that I want to load into the GPU's memory. Once it is loaded, I want to perform a transformation on it in Program A using a CUDA kernel, then process the transformed point cloud further in Program B, and so on. I do not want to copy the point cloud back to the CPU and send it to another program through my messaging service. Instead, I want the point cloud to stay in GPU memory and be accessed by multiple programs, so that I avoid the cost of transferring it back and forth between host and device. I came across the CUDA IPC memory handle, but I am not clear on how it works. Is CUDA IPC the right approach here, or are there other techniques to handle this problem?
CUDA IPC could probably be used. If you have data resident on a GPU, and you want that data to be “persistent”, you will need to have a CPU process that “owns” that data/allocation. That would be the CPU process that allocates space on the device for storage of the data, and then copies that data from host to device.
Thereafter, as long as that process is continuing to run, it should be possible to “share” that data with other processes, especially those running on that GPU, via IPC.
I don’t know of any other method to have data stay “persistent” in GPU memory other than by having an owning process that continues to run. There is a CUDA IPC sample code.
Hi, thank you for the previous suggestion; it really helps. I also went through the example, and there the deallocation happens in the parent process. In my case the memory is allocated in a different program, so I am struggling with freeing the memory. Am I missing something trivial?
The program that allocates the memory should:
1. share the mem handle for that allocation with other programs
2. stay resident (i.e. running, as a program) until all usage of the data is finished
3. deallocate the device allocation used to store the data
4. only terminate at that point
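As a rough illustration of that lifecycle, here is a minimal sketch of what the allocating ("owner") program might look like. Several things in it are assumptions made for the example and are not from the sample code: the file path `/tmp/cloud.ipc` used to pass the handle, the `SharedCloud` struct, the placeholder point data, and waiting for Enter as the "all usage is finished" signal. In practice your messaging service would probably carry both the handle and the "done" notification. It also assumes a platform where CUDA IPC is supported (for example Linux).

```cuda
// owner.cu: hypothetical owner program (build: nvcc owner.cu -o owner)
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Assumed layout of the data handed to other programs.
struct SharedCloud {
    cudaIpcMemHandle_t handle;  // opaque handle to the device allocation
    size_t numPoints;           // number of float3 points in the allocation
};

int main() {
    const size_t numPoints = 1 << 20;  // assumed cloud size
    // Placeholder data; a real program would load the actual point cloud.
    std::vector<float3> hostCloud(numPoints, make_float3(0.f, 0.f, 0.f));

    // Allocate on the device and copy the cloud from host to device.
    float3* devCloud = nullptr;
    CHECK(cudaMalloc(&devCloud, numPoints * sizeof(float3)));
    CHECK(cudaMemcpy(devCloud, hostCloud.data(),
                     numPoints * sizeof(float3), cudaMemcpyHostToDevice));

    // 1. Share the mem handle (here: through a file, as an assumption).
    SharedCloud shared;
    shared.numPoints = numPoints;
    CHECK(cudaIpcGetMemHandle(&shared.handle, devCloud));
    FILE* f = fopen("/tmp/cloud.ipc", "wb");
    if (!f || fwrite(&shared, sizeof(shared), 1, f) != 1) {
        fprintf(stderr, "could not write handle file\n");
        return EXIT_FAILURE;
    }
    fclose(f);

    // 2. Stay resident until all usage of the data is finished.
    printf("Handle published. Press Enter once all consumers are done.\n");
    getchar();

    // 3. Deallocate the device allocation, 4. and only then terminate.
    CHECK(cudaFree(devCloud));
    return 0;
}
```

The point relevant to the freeing question is that only this program calls `cudaFree`; the other programs merely open and close the handle.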
For conciseness, the sample code that I linked to is not written as two separate programs or executables. Rather, it is a single program that you start, which creates an "original" process and a "new" process and communicates between the two. So you should imagine that it is possible to split those two processes into two separate programs.
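To make the "separate programs" idea concrete, here is a minimal sketch of a second, standalone program that opens an allocation owned by another running process. It assumes the owner has written a `SharedCloud` struct (IPC handle plus point count) to `/tmp/cloud.ipc`; that path, the struct, and the `translate` kernel are all hypothetical choices for this example. The handle must be opened in a different process from the one that created it, and the owner must still be running.

```cuda
// consumer.cu: hypothetical consumer program (build: nvcc consumer.cu -o consumer)
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Assumed layout of the data published by the owning program.
struct SharedCloud {
    cudaIpcMemHandle_t handle;  // opaque handle to the device allocation
    size_t numPoints;           // number of float3 points in the allocation
};

// Example transformation: shift every point by a fixed offset, in place.
__global__ void translate(float3* pts, size_t n, float3 offset) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        pts[i].x += offset.x;
        pts[i].y += offset.y;
        pts[i].z += offset.z;
    }
}

int main() {
    // Read the handle and point count published by the owner.
    SharedCloud shared;
    FILE* f = fopen("/tmp/cloud.ipc", "rb");
    if (!f || fread(&shared, sizeof(shared), 1, f) != 1) {
        fprintf(stderr, "could not read handle file\n");
        return EXIT_FAILURE;
    }
    fclose(f);

    // Map the owner's allocation into this process; no data is copied.
    float3* devCloud = nullptr;
    CHECK(cudaIpcOpenMemHandle((void**)&devCloud, shared.handle,
                               cudaIpcMemLazyEnablePeerAccess));

    // Work on the shared point cloud directly in GPU memory.
    const int threads = 256;
    const int blocks = (int)((shared.numPoints + threads - 1) / threads);
    translate<<<blocks, threads>>>(devCloud, shared.numPoints,
                                   make_float3(1.f, 0.f, 0.f));
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Unmap only; this program does not call cudaFree on the allocation.
    CHECK(cudaIpcCloseMemHandle(devCloud));
    return 0;
}
```

With this split, the consumer releases its mapping with `cudaIpcCloseMemHandle` and then tells the owner it is finished, and the owner is the one that eventually calls `cudaFree`.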