Data setup for a multi-GPU program: can't it be done outside the threads?

I have a program that works on a single GPU and I'm porting it to a multi-GPU version, using the multi-GPU project in the SDK as a reference. What I want to do is set up a bunch of global memory, textures, etc., store the handles in structures, then spawn threads to run the kernels. Inside each thread the structure would be used to access the per-GPU variables, upload data, run the kernel, and download the results.

The problem is that when I do things like cudaMalloc outside of the thread, I get garbage out of my kernel. When I do the cudaMalloc inside the thread, the kernel works fine. When I call cudaMalloc before spawning the thread and print the pointer returned by cudaMalloc both outside and inside the thread, the addresses are the same, yet something still breaks. Any ideas? :huh:

CUDA resources created by different host (CPU) threads cannot be shared (Programming Guide, section 4.5.1.1).

Paulius

I'd say this is due to CUDA contexts. Each host thread has its own context, and each context has its own memory space. If you allocate or transfer memory in your main thread, that memory belongs to the main thread's context; the newly created thread runs in a different context and therefore cannot access it. The pointer value may look identical when you print it, but it is only meaningful within the context that created it.
My guess is that there is no way around this, but maybe someone can prove me wrong.
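To illustrate the pattern that does work under this one-context-per-thread model: keep only host-side information (the device index and input data) in the shared structure, and do every CUDA call, including cudaMalloc, inside the worker thread that will use the memory. This is just a sketch; the `GpuTask` struct, `worker` function, and `scale` kernel are made-up names for illustration, not from the original program.

```cuda
#include <cuda_runtime.h>
#include <pthread.h>
#include <stddef.h>

/* Shared struct holds only host-side info; device pointers are
   created inside the thread that will use them, so they belong
   to that thread's context. */
typedef struct {
    int    device;    /* which GPU this thread drives */
    float *hostData;  /* input buffer prepared by the main thread */
    size_t n;         /* element count */
} GpuTask;

/* Trivial example kernel: double every element in place. */
__global__ void scale(float *d, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

static void *worker(void *arg)
{
    GpuTask *t = (GpuTask *)arg;
    float   *dData;
    size_t   bytes = t->n * sizeof(float);

    /* All CUDA calls happen in this thread, so allocation,
       kernel launch, and copies all share one context. */
    cudaSetDevice(t->device);
    cudaMalloc((void **)&dData, bytes);
    cudaMemcpy(dData, t->hostData, bytes, cudaMemcpyHostToDevice);
    scale<<<(unsigned)((t->n + 255) / 256), 256>>>(dData, t->n);
    cudaMemcpy(t->hostData, dData, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dData);
    return NULL;
}
```

The main thread would fill one `GpuTask` per GPU and spawn one pthread per task; anything allocated with cudaMalloc before `pthread_create` would land in the main thread's context and be unusable inside `worker`.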

Thanks, good to know.