global cuda memory and os-threads

mikemoik · July 30, 2007, 4:23pm

Hi,

I'm trying to share CUDA global memory between OS threads (e.g. Linux pthreads).

Is it true that sharing the CUDA pointers will not work? I only need read access from the other OS threads.

Is there a solution using the driver API to share the context?
What would that code look like?

thanks,
moik

e.ping · July 31, 2007, 1:02am

I had the same problem and was told that you cannot share them (but why?).

I was writing a computation library that had an initialization function followed by a computation function. I had to carefully craft the thread model so that CUDA-related data was managed only from the thread that created it, while ordinary program data was shared between threads.

paulius · July 31, 2007, 4:21am

Correct, CUDA memory resources cannot be shared between host threads (Programming Guide Section 4.5.1.1).

Paulius

mikemoik · July 31, 2007, 6:41am

Well OK, thank you for the fast response.

It seems I hadn't noticed 4.5.1.1 :-)

That's a pity.

Could anyone imagine this solution (or is it too weird?):
An OpenGL context can easily be shared between threads (I already do this), so CUDA global memory could be loaded into an FBO and made available that way. That should still be faster than numerous host<->device transfers.

any suggestions?

asadafag · July 31, 2007, 7:37am

You can do CUDA in one thread and let other threads send messages to tell the CUDA thread what to do.
CUDA is serial anyway, so that approach shouldn’t lose any performance (ideally).
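
A minimal sketch of that single-owner pattern, in plain C++ so it runs without a GPU. Everything here is an assumption made for illustration: the `GpuWorker` class, its `read` and `scale` methods, and the `device_buffer_` vector, which only stands in for memory that a real program would allocate with `cudaMalloc` and touch with `cudaMemcpy` or kernel launches from inside the worker thread. The point it tries to show is that other threads never hold the device pointer; they post a request and wait on a future.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: one thread owns the "device" buffer; all other
// threads post requests to it instead of touching the buffer directly.
class GpuWorker {
public:
    GpuWorker(std::size_t n, float init)
        : device_buffer_(n, init), thread_([this] { run(); }) {}

    ~GpuWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        thread_.join();  // worker drains pending requests, then exits
    }

    // Read-only access for other threads: the owner copies one element out.
    std::future<float> read(std::size_t index) {
        return post<float>([this, index] {
            assert(index < device_buffer_.size());
            return device_buffer_[index];
        });
    }

    // Stand-in for a kernel launch: multiply every element by `factor`.
    std::future<void> scale(float factor) {
        return post<void>([this, factor] {
            for (float& x : device_buffer_) x *= factor;
        });
    }

private:
    // Wrap the work in a packaged_task and queue it for the owner thread.
    template <typename R, typename F>
    std::future<R> post(F work) {
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(work));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

    // Runs on the owner thread; in a real program every CUDA call lives here.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutdown requested, queue drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // executed outside the lock so callers can keep posting
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::vector<float> device_buffer_;  // placeholder for device memory
    std::thread thread_;  // declared last so it starts after the other members
};
```

A caller on any thread would write something like `worker.scale(3.0f).get();` followed by `float v = worker.read(0).get();`. Requests are handled strictly in the order they were queued, which matches the "CUDA is serial anyway" argument: the queue adds little overhead as long as each request represents a reasonable amount of GPU work.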

mikemoik · August 2, 2007, 8:49am

good idea!

thank you

hqyang · August 6, 2007, 7:29am

I wonder whether this is possible with the driver API, because a CUDA context can be attached to and detached from a host thread.
Has anyone tried this?

mikemoik · August 6, 2007, 7:49am

I had that in mind too, but haven't tried it yet.

nwilt · August 6, 2007, 7:35pm

Unfortunately, cuCtxAttach/cuCtxDetach cannot be used to migrate CUDA contexts from one thread to another.

darr · January 10, 2008, 9:27pm

Does cuCtxAttach() generate an error when attempting to attach from a thread that didn’t create it? I don’t see this specified in the programming guide.

Aditi · January 21, 2009, 12:20am

This is a very interesting thread. I read through 4.5.1.1 which says “Several host threads can execute device code on the same device”.

I was thinking, can multiple host threads be used to invoke concurrent kernels on a device? Could this be a way to invoke concurrent “processes” on a GPU (not sure if the kernels invoked by different host threads will be executed concurrently)? Has someone tried something like this?

What are the possible ways of executing multiple concurrent “processes” on a GPU? I know that there is no straightforward CUDA support for invoking concurrent kernels (at least on mainstream GPUs).

Please share your experiences.

Thanks,

Aditi

MisterAnderson42 · January 21, 2009, 1:03am

Cross-posting does not get you answers any quicker and in fact is counter-productive as it annoys people (well, I can only speak for myself).

Aditi · January 21, 2009, 2:53am

Oh, I am sorry. I posted it again in a new thread because it suddenly struck me that nobody might reply to a thread over a year old. I apologize, and thanks for the reply.

erdooom · January 21, 2009, 1:09pm

You are completely wrong; that is exactly why contexts exist, and it does work (I am using it). I have both solutions running: with the driver API I use contexts, which is much more elegant and faster; with the runtime API I have a CUDA thread that does all the work.