Would it be possible in PyCUDA to pass a pointer to GPU memory from one Python thread to another, so that a CUDA function invoked through PyCUDA in the second thread can use the output of the first thread’s CUDA function as its input?
(I have a hard time explaining this in words, so I’ll add a small schematic)
The idea would be that the first PyCUDA function returns a pointer to its result (which can be as large as 300k elements), the Python side then passes that pointer to the next thread, which hands it to the next PyCUDA function, which in turn reads its input data through the pointer. I noticed that moving the data back and forth between host and device is one of the performance bottlenecks in my application.
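Roughly, the hand-off I have in mind looks like the sketch below. It uses a stdlib `array` as a stand-in for the GPU buffer, since I can't easily show real device memory here; my assumption is that in PyCUDA the object put on the queue would instead be a `pycuda.gpuarray.GPUArray` (or its underlying `DeviceAllocation`), and that both threads would have to share one CUDA context for the device pointer to stay valid:

```python
import queue
import threading
from array import array

# Stand-in for a GPU buffer hand-off between threads: only the buffer
# *handle* crosses the queue, the data itself is never copied.
buf_queue = queue.Queue()
results = []

def producer():
    # First "kernel": produce a 300k-element result buffer.
    # (In PyCUDA this would be a GPUArray returned by the first kernel.)
    buf = array("f", range(300_000))
    buf_queue.put(buf)  # pass the handle, not a copy of the data

def consumer():
    buf = buf_queue.get()  # receive the very same buffer object
    # Second "kernel": consume the buffer in place -- no copy was made.
    results.append(sum(buf))

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results[0])  # -> 44999850000.0
```

Is this the right mental model, i.e. that the `GPUArray`/`DeviceAllocation` handle can be queued between threads like any Python object, as long as the context is shared?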
I have been looking into the shared-memory stuff, but I can’t seem to easily get my head around it.
Thanks in advance,