I had the same problem and was told that you cannot share them (but why?!).
I was writing a computation library that had an initialization function and then a computation function. I had to carefully craft the threading model so that CUDA-related data was managed from the respective threads while program data was shared between them.
Could anyone imagine this solution (or is it too weird?):
An OpenGL context can easily be shared between threads (I already do this), so CUDA global memory could be loaded into an FBO and exposed that way. It should still be faster than numerous host<->device transfers.
You can do CUDA in one thread and let other threads send messages to tell the CUDA thread what to do.
Kernel launches on a single device are serialized anyway, so that approach shouldn’t lose any performance (ideally).
Does cuCtxAttach() generate an error when attempting to attach from a thread that didn’t create the context? I don’t see this specified in the programming guide.
This is a very interesting thread. I read through Section 4.5.1.1, which says: “Several host threads can execute device code on the same device.”
I was wondering: can multiple host threads be used to invoke concurrent kernels on a device? Could this be a way to run concurrent “processes” on a GPU (I’m not sure whether kernels invoked by different host threads will actually execute concurrently)? Has anyone tried something like this?
What are the possible ways of executing multiple concurrent “processes” on a GPU? I know there is no straightforward CUDA support for invoking concurrent kernels (at least on mainstream GPUs).
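For what it’s worth, on devices that do support concurrent kernel execution (check the `concurrentKernels` device property), the usual way to overlap kernels is independent streams from a single context, rather than separate host threads. A hedged sketch (the kernel names and launch configuration are illustrative, and error checking is omitted):

```cuda
// Launch two independent kernels into separate streams so the
// hardware may overlap them (requires concurrentKernels == 1).
__global__ void kernelA(float *x) { /* ... */ }
__global__ void kernelB(float *y) { /* ... */ }

void launchConcurrently(float *d_x, float *d_y, int n) {
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    kernelA<<<(n + 255) / 256, 256, 0, s0>>>(d_x);
    kernelB<<<(n + 255) / 256, 256, 0, s1>>>(d_y);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
}
```

Note that streams only give the hardware *permission* to overlap; whether the kernels actually run concurrently depends on the device and on resource usage.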
Oh, I am sorry… I posted it again in a new thread because it suddenly struck me that no one might reply to a thread over a year old. Apologies, and thanks for the reply.
You are completely wrong; that is exactly why they exist (the contexts), and it does work (I am using it). I have both solutions running: with the driver API I use contexts, which is much more elegant and faster, and with the runtime API I have a CUDA thread that does all the work.
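For anyone looking for the driver-API route mentioned here: the context-migration calls `cuCtxPopCurrent` / `cuCtxPushCurrent` let a context created in one thread be handed to another. A minimal sketch (error checks and thread creation omitted; this requires a CUDA-capable system to run):

```c
#include <cuda.h>

CUcontext ctx;  /* shared between the two threads */

void creator_thread(void) {
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);  /* context becomes current to this thread */
    cuCtxPopCurrent(&ctx);      /* detach it so another thread can use it */
}

void worker_thread(void) {
    cuCtxPushCurrent(ctx);      /* make the context current here */
    /* ... cuMemAlloc, cuLaunchKernel, etc. ... */
    cuCtxPopCurrent(&ctx);      /* release it again when done */
}
```

Only one thread can have the context current at a time, so with several workers the push/pop pair has to be guarded by a mutex.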