I’m trying to manually manage execution on multiple GPUs, which seems to require loading a module onto each device and then switching between devices (since the same function will most likely live at a different address on each device). However, I’m not certain exactly how modules work (whether they’re tied to individual CUDA contexts, for example), and this is causing trouble. What I want is a module loaded on each device, a CUfunction handle for the entry point of my program on that device, and the ability to switch between the per-device contexts and launch through that CUfunction handle (with cuFuncSetBlockShape, cuLaunchGridAsync, etc.); a sketch of the idea is below. However, my experiments so far suggest this is not possible. Has anyone managed multiple modules, one per device, or have any comments on possible approaches?
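To make that concrete, here is a minimal sketch of the pattern I’m describing, using the driver API. The module file name (kernel.cubin) and the entry-point name ("entry") are placeholders, the kernel is assumed to take no arguments, and error handling is reduced to an exit-on-failure check:

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>            /* CUDA driver API */

#define NUM_DEVICES 2        /* placeholder: however many GPUs you have */

/* Per-device state. The module and function are only meaningful in the
   context that was current when cuModuleLoad was called. */
typedef struct {
    CUcontext  ctx;
    CUmodule   mod;
    CUfunction entry;
} DeviceState;

static void check(CUresult r, const char *what)
{
    if (r != CUDA_SUCCESS) {
        fprintf(stderr, "%s failed: %d\n", what, (int)r);
        exit(1);
    }
}

int main(void)
{
    DeviceState dev[NUM_DEVICES];
    int i;

    check(cuInit(0), "cuInit");

    /* Setup: create one context per device, load the module and grab the
       entry point while that context is current, then pop the context off
       this thread so the next iteration starts clean. */
    for (i = 0; i < NUM_DEVICES; ++i) {
        CUdevice d;
        check(cuDeviceGet(&d, i), "cuDeviceGet");
        check(cuCtxCreate(&dev[i].ctx, 0, d), "cuCtxCreate"); /* becomes current */
        check(cuModuleLoad(&dev[i].mod, "kernel.cubin"), "cuModuleLoad");          /* placeholder file */
        check(cuModuleGetFunction(&dev[i].entry, dev[i].mod, "entry"),
              "cuModuleGetFunction");                                              /* placeholder name */
        check(cuCtxPopCurrent(NULL), "cuCtxPopCurrent");
    }

    /* Launch: push the matching context before touching its CUfunction. */
    for (i = 0; i < NUM_DEVICES; ++i) {
        check(cuCtxPushCurrent(dev[i].ctx), "cuCtxPushCurrent");
        check(cuFuncSetBlockShape(dev[i].entry, 256, 1, 1), "cuFuncSetBlockShape");
        check(cuParamSetSize(dev[i].entry, 0), "cuParamSetSize");  /* no kernel args in this sketch */
        check(cuLaunchGridAsync(dev[i].entry, 64, 1, 0), "cuLaunchGridAsync"); /* default stream */
        check(cuCtxPopCurrent(NULL), "cuCtxPopCurrent");
    }

    /* Wait for each device to finish, then clean up. */
    for (i = 0; i < NUM_DEVICES; ++i) {
        check(cuCtxPushCurrent(dev[i].ctx), "cuCtxPushCurrent");
        check(cuCtxSynchronize(), "cuCtxSynchronize");
        check(cuModuleUnload(dev[i].mod), "cuModuleUnload");
        check(cuCtxPopCurrent(NULL), "cuCtxPopCurrent");
        check(cuCtxDestroy(dev[i].ctx), "cuCtxDestroy");
    }
    return 0;
}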
Thanks for any help.
EDIT: Error found. I wasn’t keeping track of my contexts as well as I thought I was, so the context for one device was being used with the module loaded on another device, which is what caused the failures.
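In other words, cuModuleLoad loads the module into whichever context is current on the calling thread, so each CUmodule (and every CUfunction obtained from it) is only valid with that one context. Keeping the (context, module, function) triple together per device, as in the sketch above, avoids the mixup.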