PyCUDA: passing pointers to GPU memory

Hi all,

Is it possible in PyCUDA to pass a “pointer to GPU memory” from one Python thread to another, where the second thread also calls a CUDA function through PyCUDA and uses the output of the first thread’s CUDA function as its input?

(I have a hard time explaining this in words, so I’ll add a small schematic)

The idea is that the first PyCUDA function returns a pointer to its result (which can be as large as 300k elements). The Python side then passes that pointer to the next process, which hands it to the next PyCUDA function, which in turn reads its input data through that pointer. I noticed that moving the data is one of the performance bottlenecks in my application.

I have been looking into shared memory, but I can’t quite get my head around it.

Thanks in advance,
Sam

https://gist.github.com/lebedov/5179201
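For reference, CUDA IPC through PyCUDA roughly follows the pattern below. This is a minimal sketch, not a copy of the linked gist, and it assumes a Linux desktop GPU where CUDA IPC is supported, with NumPy and PyCUDA installed. One process allocates a `GPUArray`, exports an IPC handle with `pycuda.driver.mem_get_ipc_handle`, and sends the handle (plus shape and dtype) through a `multiprocessing.Queue`; the other process opens it with `pycuda.driver.IPCMemoryHandle` and wraps it in a `GPUArray` without copying the data. The function names `producer`, `consumer` and `main` are hypothetical, and the PyCUDA imports sit inside the workers so the parent process never touches CUDA.

```python
import multiprocessing as mp


def producer(queue, done):
    """Allocate data on the GPU and export a CUDA IPC handle for it."""
    # PyCUDA is imported inside the workers so the parent never touches CUDA.
    import numpy as np
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray

    drv.init()
    ctx = drv.Device(0).make_context()
    try:
        x_gpu = gpuarray.to_gpu(np.arange(300_000, dtype=np.float64))
        # On Tegra (e.g. Jetson Nano) this raises pycuda.driver.LogicError.
        handle = drv.mem_get_ipc_handle(x_gpu.ptr)
        queue.put((handle, x_gpu.shape, x_gpu.dtype))
        # The allocation must stay alive until the consumer is finished.
        done.wait(timeout=120)
        del x_gpu  # free the buffer while the context is still current
    finally:
        ctx.pop()


def consumer(queue, done, result):
    """Open the exported handle and use the producer's buffer in place."""
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray

    drv.init()
    ctx = drv.Device(0).make_context()
    try:
        handle, shape, dtype = queue.get(timeout=60)
        ptr = drv.IPCMemoryHandle(handle)
        x_gpu = gpuarray.GPUArray(shape, dtype, gpudata=ptr)  # no copy
        result.put(float((2 * x_gpu).get().sum()))
        del x_gpu
        ptr.close()
    finally:
        done.set()  # let the producer release its allocation
        ctx.pop()


def main():
    queue, result = mp.Queue(), mp.Queue()
    done = mp.Event()
    procs = [
        mp.Process(target=producer, args=(queue, done)),
        mp.Process(target=consumer, args=(queue, done, result)),
    ]
    for p in procs:
        p.start()
    total = result.get(timeout=180)
    for p in procs:
        p.join()
    return total
```

Call `main()` under an `if __name__ == "__main__":` guard. Only the small handle crosses the process boundary; the 300k-element buffer stays on the GPU, which is why the producer has to keep it alive until the consumer signals `done`.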

Thanks for the answer.

This example seems to work on my desktop (GTX 1060), but not on a Jetson Nano.

On the Nano, I get the error:
<class 'pycuda._driver.LogicError'> : cuIpcGetMemHandle failed: operation not supported

Could it be that this IPC functionality is not implemented on Jetson?

Greetings,
Sam

Correct, see here:

" IPC functionality is not supported on Tegra platforms."

The only suggestion I have is to use Linux-based host IPC, which is not CUDA-specific and not something I can help with.

You might also ask on the Jetson sub-forum. Since Jetson memory is unified, there is usually no big penalty for using, e.g., host-pinned memory. I believe it may be possible to use host-pinned memory for host-based IPC, but I can’t really help with that, and you might get better ideas on the Jetson sub-forum.

I will start testing with host-pinned memory and post another question on the Jetson sub-forum.

Anyway, a big thank you; I learned a lot from the information you supplied.

I may have been unclear. Host-pinned memory by itself does not allow for or facilitate host-based IPC.

So the starting point would be to get host-based IPC working correctly in your Python environment (I assume there is plenty of online help for that). Then see whether that host-based IPC still works when the underlying source memory is host-pinned. Hope that helps.
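The first step suggested above, plain host-based IPC from Python, could be sketched with the standard library’s `multiprocessing.shared_memory` (Python 3.8+). That module choice is an assumption, not something from the thread, and it is not CUDA-specific: only the block’s name crosses the process boundary, and the consumer reads the float32 data in place. The helper names `create_block`, `sum_block`, `_worker` and `run` are hypothetical. Whether such a block can additionally be page-locked for CUDA on the Nano is the open question above and is not addressed here.

```python
import array
import multiprocessing as mp
from multiprocessing import shared_memory

ITEMSIZE = array.array("f").itemsize  # bytes per float32 element


def create_block(values):
    """Producer: copy a non-empty sequence into a new named shared-memory block."""
    data = array.array("f", values)
    nbytes = len(data) * ITEMSIZE
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    shm.buf[:nbytes] = data.tobytes()  # the one and only copy
    return shm, len(data)


def sum_block(name, n):
    """Consumer: attach to the block by name and read it in place (no copy)."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Reinterpret the raw bytes as float32; release the view before close().
        with shm.buf[: n * ITEMSIZE].cast("f") as view:
            return sum(view)
    finally:
        shm.close()


def _worker(name, n, out):
    out.put(sum_block(name, n))


def run(values):
    """Hand the block to a second process by name only, then clean up."""
    shm, n = create_block(values)
    try:
        out = mp.Queue()
        p = mp.Process(target=_worker, args=(shm.name, n, out))
        p.start()
        total = out.get(timeout=60)
        p.join()
        return total
    finally:
        shm.close()
        shm.unlink()  # only the creating process removes the block
```

For example, `run(range(300_000))` called under an `if __name__ == "__main__":` guard passes a 300k-element block to a child process without pickling the data itself.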

Again, asking on the Jetson forum may be a good idea before going down this avenue.