Managing multiple GPUs from a single host thread

From what I understand, each host thread is normally tied to exactly one device. I’m wondering whether it is possible to manage multiple GPUs from a single host thread. According to this post: http://forums.nvidia.com/index.php?showtop…rt=#entry984126 it seems like this should be possible. Has anyone tried this? I’m thinking it would look something like:

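CUcontext ctxs[NUM_DEVICES];
CUdevice dev;
int i;

cuInit(0);  /* the driver API must be initialized before any other cu* call */
for (i = 0; i < NUM_DEVICES; i++) {
    cuDeviceGet(&dev, i);           /* device handle by ordinal; no context needed yet */
    cuCtxCreate(&ctxs[i], 0, dev);  /* three-argument form; the new context becomes current */
    cuCtxPopCurrent(&ctxs[i]);      /* detach it so the thread is free for the next device */
}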

You could then use cuCtxPushCurrent and cuCtxPopCurrent to switch between contexts. Has anyone tried this, or does anyone see any basic problems with it? (I’m not familiar with how CUDA contexts work.) Would switching contexts introduce a lot of overhead?

Any input is appreciated. Thanks very much in advance.
