I have a GTX 295, so I want to use the power of both of its GPUs. However, I have a problem. When I assign a task (a function) to a host thread, it looks like the GPU memory used by that function must also be allocated inside that same function. For example:
for (i = 0; i < GPU_N; i++)
    threadID[i] = cutStartThread((CUT_THREADROUTINE)solverThread, (void *)(plan + i));
cutWaitForThreads(threadID, GPU_N);
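Here solverThread allocates, computes, and frees everything inside one routine, roughly like the sketch below (simplified and modeled on the SDK's simpleMultiGPU sample; the TGPUplan fields and myKernel are illustrative names, not my exact code):

#include <cuda_runtime.h>
#include "multithreading.h" // cutStartThread, CUT_THREADPROC, etc., from the SDK samples

typedef struct {
    int device;    // which GPU this host thread should use
    float *h_data; // host input buffer
    float *d_data; // device buffer (used by thread_init in the split version below)
    int dataN;     // number of elements
} TGPUplan;

__global__ void myKernel(float *data, int n) // illustrative kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

static CUT_THREADPROC solverThread(TGPUplan *plan)
{
    float *d_data;
    cudaSetDevice(plan->device); // bind this host thread to its GPU
    cudaMalloc((void **)&d_data, plan->dataN * sizeof(float));
    cudaMemcpy(d_data, plan->h_data, plan->dataN * sizeof(float), cudaMemcpyHostToDevice);
    myKernel<<<32, 256>>>(d_data, plan->dataN); // allocate and launch in the same thread: this works
    cudaThreadSynchronize();
    cudaFree(d_data);
    CUT_THREADEND;
}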
So any GPU memory used by solverThread must be allocated inside solverThread itself. In other words, if I split solverThread into two functions and invoke them like this:
for (i = 0; i < GPU_N; i++)
    threadID[i] = cutStartThread((CUT_THREADROUTINE)thread_init, (void *)(plan + i));
cutWaitForThreads(threadID, GPU_N);

for (i = 0; i < GPU_N; i++)
    threadID[i] = cutStartThread((CUT_THREADROUTINE)thread_run, (void *)(plan + i));
cutWaitForThreads(threadID, GPU_N);
then it doesn't work. The idea is that thread_init allocates the device memory (and calls cudaSetDevice), and thread_run then computes on that memory. Unfortunately, CUDA apparently cannot see the memory that was allocated in the first function. Has anybody run into this? The two routines look roughly like the sketch below.
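Using the same TGPUplan struct as in the sketch above (again simplified, not my exact code):

static CUT_THREADPROC thread_init(TGPUplan *plan)
{
    cudaSetDevice(plan->device); // select this thread's GPU
    cudaMalloc((void **)&plan->d_data, plan->dataN * sizeof(float)); // keep the device pointer in the plan
    CUT_THREADEND;
}

static CUT_THREADPROC thread_run(TGPUplan *plan)
{
    cudaSetDevice(plan->device); // same device number as in thread_init
    myKernel<<<32, 256>>>(plan->d_data, plan->dataN); // fails: plan->d_data no longer seems valid in this new thread
    cudaThreadSynchronize();
    CUT_THREADEND;
}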
Thanks very much!!
Mian