CUDA multi-GPU programming: question about multi-GPU memory allocation

I have a GTX 295, so I want to use the power of both GPUs. However, I have run into a problem. When I assign a task (a function) to a thread, it looks like any GPU memory used by that function must also be allocated inside the same function, for example:

for(i = 0; i < GPU_N; i++)
	threadID[i] = cutStartThread((CUT_THREADROUTINE)solverThread, (void *)(plan + i));
cutWaitForThreads(threadID, GPU_N);

Then any GPU memory used by solverThread must also be allocated in that function. In other words, if I want to split solverThread into two functions and invoke them like this:

for(i = 0; i < GPU_N; i++)
	threadID[i] = cutStartThread((CUT_THREADROUTINE)thread_init, (void *)(plan + i));
cutWaitForThreads(threadID, GPU_N);

for(i = 0; i < GPU_N; i++)
	threadID[i] = cutStartThread((CUT_THREADROUTINE)thread_run, (void *)(plan + i));
cutWaitForThreads(threadID, GPU_N);

The function thread_init allocates the memory (and also calls cudaSetDevice), and thread_run does the computation using that memory. Unfortunately, I found this doesn't work: in the second function, CUDA apparently cannot see the memory that was already allocated. Does anybody have experience with this?
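(A likely explanation, for anyone hitting the same thing: in the CUDA versions of this era, a CUDA context is bound to the host thread that created it and is destroyed when that thread exits, and device pointers are only valid inside their own context. So the allocations made by the thread_init threads are gone by the time the thread_run threads start. One fix is to keep a single long-lived thread per GPU and run both phases inside it. A sketch, reusing the cutil macros from the snippets above and assuming the poster's own TGPUplan struct and thread_init/thread_run functions:)

```cuda
// Sketch only: TGPUplan, thread_init and thread_run are the poster's
// names, not a real API. Both phases run in ONE host thread, so they
// share one context and the device pointers stay valid.
static CUT_THREADPROC solverThread(void *data)
{
    TGPUplan *plan = (TGPUplan *)data;

    cudaSetDevice(plan->device);   // bind this thread's context to its GPU

    thread_init(plan);   // allocate device memory in this thread's context
    thread_run(plan);    // same thread, same context: pointers are valid

    cudaThreadExit();    // release the context before the thread ends
    CUT_THREADEND;
}
```

Then launch it with a single cutStartThread loop and one cutWaitForThreads, exactly as in the first snippet.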

Thanks very much!!

Mian

Regarding multi-GPU programming, there is a nice framework by MisterAnderson at http://forums.nvidia.com/index.php?showtopic=66598&st=0 . It is quite easy to follow that structure to do multi-GPU work in CUDA; I have done my own dual-GPU work on a GTX 295 based on that framework.
You should take a look at it.
Giang,