Hi guys,
I am a Computer Engineering grad student, seeking some basic help with CUDA’s multi-GPU functionality.
My assignment is to write a CUDA app that uses both GPUs in my machine. I’ve downloaded and run the multiGPU SDK example successfully. However, I’m not yet experienced enough with CUDA to see how to pass a different data structure to each GPU, or how to get the results back to the CPU side.
The following code snippets are from NVIDIA’s example. It starts with main() creating a thread per GPU:
    int threadIds[MAX_CPU_THREAD];
    printf("%d GPUs found\n", s_gpuCount);

    CUTThread *threads = (CUTThread *)malloc(sizeof(CUTThread) * s_gpuCount);

    // Start one thread for each device.
    for (int i = 0; i < s_gpuCount; i++) {
        threadIds[i] = i;
        threads[i] = cutStartThread((CUT_THREADROUTINE)gpuThread, (void *)&threadIds[i]);
    }

    // Wait for all the threads to finish.
    cutWaitForThreads(threads, s_gpuCount);
    free(threads);
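As far as I can tell, the only per-thread information the example passes is the device index. My guess is that to hand each GPU its own data, you’d pass a small struct instead of a bare int. Below is my own sketch of that idea, not anything from the SDK — the names GPUPlan, h_data, and makePlans are all made up by me; it just records which slice of a host array each thread/device owns:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread argument: which device to use and
 * which slice of the host array that device is responsible for. */
typedef struct {
    int    device;   /* GPU index to pass to cudaSetDevice()     */
    float *h_data;   /* start of this device's slice on the host */
    size_t n;        /* number of elements in the slice          */
} GPUPlan;

/* Split n elements across gpuCount devices; the last device
 * picks up the remainder when n is not evenly divisible. */
static void makePlans(GPUPlan *plans, float *h_data, size_t n, int gpuCount)
{
    size_t chunk = n / gpuCount;
    for (int i = 0; i < gpuCount; i++) {
        plans[i].device = i;
        plans[i].h_data = h_data + (size_t)i * chunk;
        plans[i].n      = (i == gpuCount - 1) ? n - (size_t)i * chunk
                                              : chunk;
    }
}
```

Each thread would then receive &plans[i] instead of &threadIds[i], call cudaSetDevice(plans[i].device), and know exactly which host region to copy from and back into. Does that sound like the right approach?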
And here is the thread routine that runs the kernel on its assigned device:
static CUT_THREADPROC gpuThread(int *device)
{
    CUDA_SAFE_CALL(cudaSetDevice(*device));

    const int mem_size = NUM_BLOCKS * NUM_THREADS * sizeof(float) / s_gpuCount;

    float *idata;
    CUDA_SAFE_CALL(cudaMalloc((void **)&idata, mem_size));
    float *odata;
    CUDA_SAFE_CALL(cudaMalloc((void **)&odata, mem_size));

    // @@ Copy some values to the buffers.

    // Invoke kernel on this device.
    multigpu_kernel<<<NUM_BLOCKS / s_gpuCount, NUM_THREADS, NUM_THREADS * sizeof(float)>>>(idata, odata);

    // @@ Get the results back.

    CUT_THREADEND;
}
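For the two @@ placeholders, my current (untested) guess is a plain cudaMemcpy in each direction, with each thread offsetting into shared host arrays by its device index. The names h_idata, h_odata, and n_per_gpu below are mine, not the SDK’s; please correct me if this is wrong:

```cpp
// @@ Copy some values to the buffers:
// each thread copies its own slice of a shared host array h_idata.
CUDA_SAFE_CALL(cudaMemcpy(idata, h_idata + (*device) * n_per_gpu,
                          mem_size, cudaMemcpyHostToDevice));

// ... kernel launch as above ...

// @@ Get the results back:
// copy this device's results into the matching slice of h_odata.
CUDA_SAFE_CALL(cudaMemcpy(h_odata + (*device) * n_per_gpu, odata,
                          mem_size, cudaMemcpyDeviceToHost));

// Free the device buffers once the results are on the host.
CUDA_SAFE_CALL(cudaFree(idata));
CUDA_SAFE_CALL(cudaFree(odata));
```

Since each thread writes to a disjoint slice of h_odata, I assume no locking is needed on the host side — is that right?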
So, for instance, if my CPU-side code starts off with some arrays that need work done on them, how could I pass different arrays to different GPUs? The results also need to end up back on the CPU side eventually.
I would very much appreciate some guidance. Thank you for reading :)
- Vash