I tried to write a simple program based on the multiGPU example. So far, I have run into a couple of problems:
(1) If the kernel takes several arguments, how do I pass them to it? It seems I need to first pass the arguments to gpuThread(), and from there to the kernel, but I don't see how to pass them to gpuThread() in the first place. I don't quite understand the way gpuThread is called: "threads[i] = cutStartThread((CUT_THREADROUTINE)gpuThread, (void *)&threadIds[i]);"
For now, I can work around this by making the variables passed to the kernel global.
(2) I have two threads:
thread 0: cudaSetDevice(0), do some work
thread 1: cudaSetDevice(1), do some work
and then, later:
thread 1: cudaSetDevice(0)
At this point, are the results of the kernel previously run on device 0 still valid?
In the example, the memory allocation/copy and the kernel launch happen in the same function, gpuThread(). I'd like to separate the memory allocation/copy from the kernel launch, because after allocating and copying once, I want to run the same kernel many times, and after each run the two threads (GPU contexts) should exchange some information.
I tried dividing gpuThread() into two functions, gpuThreadAlloc() and gpuThreadWork(), but it doesn't work this way. I suspect the problem is that a CUDA context is tied to the host thread that created it, so the allocation and the launches may have to stay in the same thread.
I've attached the code. Thanks a lot!
multiGPU.zip (2.83 KB)