Global CUDA memory and OS threads


I'm trying to share CUDA global memory between OS threads (e.g. Linux pthreads).

Is it true that sharing CUDA pointers between threads will not work? I only need read access from the other OS threads.

Is there a solution using the driver API that shares the context?
What would that code look like?


I had the same problem and was told that you cannot share them (but why?).

I was writing a computation library that had an initialization function followed by a computation function. I had to carefully craft the thread model so that CUDA-related data was managed from its respective thread, while ordinary program data was shared between them.

Correct, CUDA memory resources cannot be shared between host threads (Programming Guide Section


Well, OK, thank you for the fast response.

It seems I had not noticed that :-)

That's a pity.

Could anyone imagine this solution (or is it too weird?):
An OpenGL context can easily be shared between threads (I already do this), so CUDA global memory could be loaded into an FBO and made available that way. That should be faster than numerous host<->device transfers.

Any suggestions?

You can do CUDA in one thread and let other threads send messages to tell the CUDA thread what to do.
CUDA is serial anyway, so that approach shouldn’t lose any performance (ideally).
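
A minimal sketch of that message-passing idea, under loud assumptions: `DeviceWorker` and `read_element` are hypothetical names, not part of CUDA, and a plain host `std::vector` stands in for global memory so the sketch runs without a GPU. In real code the CUDA calls (allocation, copies, kernel launches) would all happen inside the jobs, i.e. only on the owning thread.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: one thread owns all "device" state; every other
// thread only posts jobs to it and waits for the result.
class DeviceWorker {
public:
    DeviceWorker() : worker_([this] { run(); }) {}

    ~DeviceWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Queue a job for the owning thread and return a future for its result.
    template <typename F>
    auto submit(F job) -> std::future<decltype(job())> {
        using R = decltype(job());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(job));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    // Runs on the owning thread: execute jobs one at a time, in order.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist first
};

// Assumption: device_data is a host stand-in for a buffer in global memory.
// A real job would copy the element back from the device instead.
float read_element(DeviceWorker& gpu, const std::vector<float>& device_data,
                   std::size_t i) {
    return gpu.submit([&device_data, i] { return device_data[i]; }).get();
}
```

Any thread can call `read_element(gpu, data, i)`; the read itself always executes on the worker thread, so no CUDA pointer would ever have to be used outside the thread that created it.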

Good idea!

Thank you.

I wonder whether this is possible with the driver API, since a CUDA context can be attached to and detached from a host thread.
Has anyone tried this?

I have that in mind too, but haven't tried it yet.

Unfortunately, cuCtxAttach/cuCtxDetach cannot be used to migrate CUDA contexts from one thread to another.

Does cuCtxAttach() generate an error when attempting to attach from a thread that didn’t create it? I don’t see this specified in the programming guide.

This is a very interesting thread. I read that “Several host threads can execute device code on the same device”.

I was thinking, can multiple host threads be used to invoke concurrent kernels on a device? Could this be a way to invoke concurrent “processes” on a GPU (not sure if the kernels invoked by different host threads will be executed concurrently)? Has someone tried something like this?

What are the possible ways of executing multiple concurrent “processes” on a GPU? I know that there is no straightforward CUDA support for invoking concurrent kernels (at least on mainstream GPUs).

Please share your experiences.



Cross-posting does not get you answers any quicker and in fact is counter-productive as it annoys people (well, I can only speak for myself).

Oh, I am sorry… I posted it again in a new thread because it suddenly struck me that nobody might reply to a thread over a year old. Apologies, and thanks for the reply.

You are completely wrong: that is exactly why contexts exist, and it does work (I am using it). I have both solutions running. With the driver API I use contexts, which is much more elegant and faster; with the runtime API I have a CUDA thread that does all the work.