hi,
I’m a CUDA beginner and I don’t understand the relation between a device and grid(s). If each grid used its own shared memory and grids communicated with each other through global memory, would it be necessary to partition a device into many grids? (Does anyone know an example?)
In other words, is the relation between device and grid always 1-to-1?
Shared memory is a block-level resource, not a grid-level one. Each block gets its own shared memory, and blocks cannot see each other’s shared memory; communication between blocks has to go through global memory.
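To illustrate: here is a sketch of a per-block reduction (the kernel name and block size of 256 are made up for the example). Note that the `__shared__` array and `__syncthreads()` are both scoped to a single block, which is exactly why shared memory is a block-level resource.

```cuda
// Sketch: each block sums its own 256-element slice of `in` into `out`.
__global__ void blockSum(const float *in, float *out)
{
    __shared__ float tile[256];          // one copy PER BLOCK, not per grid

    int tid = threadIdx.x;
    tile[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                     // synchronizes this block only

    // Tree reduction within this block; other blocks are invisible here.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = tile[0];       // block results go to global memory
}
```

If you need to combine the per-block results, you either launch a second kernel over `out` or copy it back to the host, since there is no cross-block synchronization within one launch.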
At the moment, the answer is yes. Fermi is supposed to be able to run multiple kernels (and hence grids) simultaneously, but for now it is one grid per physical device at a time. Further to that, you can’t map a single grid over multiple devices, so there is no intrinsic multi-GPU capability in CUDA. For that, the programmer has to devise a scheme for decomposing the task over multiple devices, launching a separate grid on each.
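A minimal sketch of such a manual decomposition, assuming a hypothetical kernel `work` and a runtime that lets one host thread switch devices with `cudaSetDevice` (in older CUDA releases each host thread could drive only one device, so you would use one CPU thread per GPU instead):

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real workload.
__global__ void work(float *d, int n) { /* ... */ }

void runOnAllDevices(float *host, int n)
{
    int devCount = 0;
    cudaGetDeviceCount(&devCount);
    int chunk = n / devCount;            // assumes n divides evenly, for brevity

    for (int dev = 0; dev < devCount; ++dev) {
        cudaSetDevice(dev);              // subsequent calls target this GPU
        float *d = 0;
        cudaMalloc(&d, chunk * sizeof(float));
        cudaMemcpy(d, host + dev * chunk, chunk * sizeof(float),
                   cudaMemcpyHostToDevice);
        work<<<chunk / 256, 256>>>(d, chunk);   // one grid per device
        cudaMemcpy(host + dev * chunk, d, chunk * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(d);
    }
}
```

Each device gets its own grid and its own chunk of the data; any exchange between the chunks has to go through host memory (or later, peer-to-peer copies), since the grids cannot share device memory.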