I had the same problem and was told that you cannot share them (but why not?).
I was writing a computation library that had an initialization function and then a computation function. I had to carefully craft the thread model so that CUDA-related data was managed from its respective thread, while program data was shared between threads.
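For what it's worth, here is a minimal sketch of that kind of thread model, assuming the runtime API and pthreads (error checking omitted; `shared_t` and `cuda_worker` are names I made up for illustration). One thread owns every CUDA call and all device pointers, and only plain host data crosses the thread boundary:

```c
/* Sketch only: one worker thread owns all CUDA state; other threads
 * see results through a mutex-protected host buffer. */
#include <cuda_runtime.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    float host_buf[1024];   /* shared program data: any thread may read it */
    int   ready;            /* set once results have been published */
} shared_t;

static void *cuda_worker(void *arg)
{
    shared_t *s = (shared_t *)arg;
    float *d_buf;                     /* device memory: this thread only */
    cudaMalloc((void **)&d_buf, sizeof s->host_buf);

    /* ... launch kernels on d_buf from this thread ... */

    pthread_mutex_lock(&s->lock);
    cudaMemcpy(s->host_buf, d_buf, sizeof s->host_buf,
               cudaMemcpyDeviceToHost);
    s->ready = 1;                     /* publish results to other threads */
    pthread_mutex_unlock(&s->lock);

    cudaFree(d_buf);
    return NULL;
}

int main(void)
{
    shared_t s = { PTHREAD_MUTEX_INITIALIZER, {0}, 0 };
    pthread_t t;
    pthread_create(&t, NULL, cuda_worker, &s);
    pthread_join(t, NULL);
    return 0;
}
```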
Could anyone comment on this solution (or is it too weird?):
An OpenGL context can easily be shared between threads (I already do this), so CUDA global memory could be loaded into an FBO and provided that way. It should be faster than numerous host<->device transfers.
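To make that concrete, here is a rough sketch of the idea, with one caveat: CUDA's GL interop maps buffer objects (e.g. a PBO) rather than an FBO directly, so the sketch uses a PBO. It assumes the old-style `cudaGL*` interop calls, GLEW for the buffer-object entry points, and an already-created GL context that is shared between the threads; `pbo`, `setup_pbo`, and `publish_results` are invented names:

```c
/* Sketch only: publish CUDA results through a GL buffer object so that a
 * second thread sharing the GL context can read them without a host copy. */
#include <GL/glew.h>
#include <cuda_gl_interop.h>

static GLuint pbo;   /* visible to every thread sharing the GL context */

void setup_pbo(size_t bytes)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, bytes, NULL, GL_DYNAMIC_COPY);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    cudaGLRegisterBufferObject(pbo);        /* make it mappable from CUDA */
}

void publish_results(const float *d_src, size_t bytes)
{
    float *d_dst = NULL;
    cudaGLMapBufferObject((void **)&d_dst, pbo);  /* map PBO into CUDA */
    cudaMemcpy(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice);
    cudaGLUnmapBufferObject(pbo);                 /* hand it back to GL */
    /* another thread can now read the PBO, e.g. via glMapBuffer */
}
```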
This is a very interesting thread. I read through the section of the CUDA Programming Guide which says "Several host threads can execute device code on the same device".
I was thinking: can multiple host threads be used to invoke concurrent kernels on a device? Could this be a way to run concurrent "processes" on a GPU (I am not sure whether kernels invoked by different host threads will actually execute concurrently)? Has anyone tried something like this?
What are the possible ways of executing multiple concurrent "processes" on a GPU? I know that there is no straightforward CUDA support for invoking concurrent kernels (at least for mainstream GPUs).
You are completely wrong; that is exactly why they exist (the contexts), and it does work (I am using it). I have both solutions running: in the driver API I use contexts, which is much more elegant and faster, and in the runtime API I have a CUDA thread that does all the work.
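In case it helps, here is roughly what the driver-API version looks like: create one context, pop it off the creating thread, and push it onto whichever thread needs it next. This is only a sketch assuming one thread holds the context at a time, with error checking omitted; `worker` is an invented name:

```c
/* Sketch only: migrating one driver-API context between host threads
 * with cuCtxPopCurrent/cuCtxPushCurrent. */
#include <cuda.h>
#include <pthread.h>

static CUcontext ctx;   /* the single context both threads will use */

static void *worker(void *arg)
{
    cuCtxPushCurrent(ctx);      /* attach the floating context here */
    CUdeviceptr p;
    cuMemAlloc(&p, 1024);       /* allocations belong to the context,  */
    cuMemFree(p);               /* not to any particular host thread   */
    cuCtxPopCurrent(&ctx);      /* detach so another thread can use it */
    return NULL;
}

int main(void)
{
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuCtxPopCurrent(&ctx);      /* float the context off this thread */

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);

    cuCtxPushCurrent(ctx);      /* reclaim it to clean up */
    cuCtxDestroy(ctx);
    return 0;
}
```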