Hi,
I was thinking it would be really useful to be able to copy something into shared memory before kernel execution, either from the CPU or from another kernel…
to avoid performing a global store/load (gst/gld) when I know I will need those values right after.
Is there a way to allocate and copy data into a “static” portion of the shared memory, so that it doesn’t get cleared when I launch a new kernel?
To me it sounds like what you want is actually constant memory. Constant data is preserved across kernel invocations and is as fast as shared memory if your threads read it in a coherent way (all threads in a warp reading the same address).
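A minimal sketch of that approach (hypothetical names; error checking omitted): the host fills a `__constant__` array once with `cudaMemcpyToSymbol`, and every later kernel launch can read it without touching global memory in the kernel body.

```cuda
#include <cstdio>

__constant__ float coeffs[256];  // lives in constant memory; persists across kernel launches

__global__ void useCoeffs(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        // a broadcast read (whole warp hitting the same address) is served
        // from the constant cache at shared-memory-like speed
        out[i] = coeffs[i % 256] * 2.0f;
}

int main()
{
    float h_coeffs[256];
    for (int i = 0; i < 256; ++i) h_coeffs[i] = (float)i;

    // one-time copy into constant memory; survives any number of launches
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));

    float *d_out;
    cudaMalloc(&d_out, 256 * sizeof(float));
    useCoeffs<<<1, 256>>>(d_out, 256);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```

The main caveat is that constant memory is read-only from the device side and limited to 64 KB, so it fits lookup tables and coefficients rather than data another kernel produces.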
What you could try is to read the uninitialized shared memory inside a kernel after you’ve executed another kernel that wrote into that shared memory area.
I doubt that the whole shared memory block is cleared before each kernel invocation, so this might actually work if you keep the shared memory size per thread block (and the same set of launch parameters) unchanged. You also have no guarantee that all of shared memory will be touched by your ‘writing’ kernel, so you’d probably want to take that into account.
If you are going to experiment - write about your findings here.
You would also need exclusive access to the device to make that work, since your algorithm most probably will not survive someone else kicking in and using the device, to render the GUI for instance.
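For anyone tempted to try the experiment, a sketch might look like the following (hypothetical `writer`/`reader` kernel names; this deliberately reads uninitialized shared memory, which is undefined behavior in the programming model, so it is a probe, not a technique):

```cuda
#include <cstdio>

__global__ void writer(float *sink)
{
    __shared__ float buf[256];
    buf[threadIdx.x] = (float)threadIdx.x;   // leave a recognizable pattern behind
    __syncthreads();
    // impossible condition: keeps the compiler from eliminating the writes
    if (threadIdx.x == 0 && buf[0] < 0.0f) *sink = buf[0];
}

__global__ void reader(float *out)
{
    __shared__ float buf[256];               // deliberately NOT initialized
    out[threadIdx.x] = buf[threadIdx.x];     // whatever is left over, if anything
}

int main()
{
    float *d_out, h_out[256];
    cudaMalloc(&d_out, 256 * sizeof(float));

    // same grid/block shape and shared memory footprint for both launches
    writer<<<1, 256>>>(d_out);
    reader<<<1, 256>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    int survived = 0;
    for (int i = 0; i < 256; ++i)
        if (h_out[i] == (float)i) ++survived;
    printf("%d of 256 values survived\n", survived);

    cudaFree(d_out);
    return 0;
}
```

Even if the count comes back as 256 on one device, there is no guarantee the blocks land on the same SM, that the driver doesn’t clear shared memory, or that the result holds on another GPU or driver version.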
Yes, I think this would really be a hack; even if I could get it to work, it certainly wouldn’t work reliably, so I don’t think I will even try.
BTW, it would be useful if CUDA allowed a “kernel chain” where each kernel keeps the shared memory of the preceding one… but maybe it would have too many limitations (e.g. the block count would have to match).