I’ve been doing some simple experiments on the GPUs this weekend and found that
- when I try to allocate memory for 5 million particles per GPU, the allocation fails at runtime with an insufficient-memory error.
- when I try to allocate memory for 1 million particles per GPU, the allocation succeeds, but when I try to execute a very simple kernel I get an unspecified kernel launch failure, which I suspect is due to insufficient resources/memory on the GPU.
- when I try the same with 750,000 particles, the same failure occurs as with 1 million.
- when I try the same with 700,000 particles, the allocation succeeds and the very simple kernel executes.
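For reference, this is roughly how I'm checking the failures above (the kernel and the per-particle layout are placeholders, not my real code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial stand-in for the "very simple kernel" mentioned above.
__global__ void touch(float* data, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 0.0f;
}

int main() {
    const size_t nParticles = 1000000;                   // the failing case
    const size_t bytes = nParticles * 6 * sizeof(float); // e.g. pos + vel, an assumption

    float* d_particles = nullptr;
    cudaError_t err = cudaMalloc(&d_particles, bytes);   // fails at runtime, not compile time
    if (err != cudaSuccess) {
        printf("cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    touch<<<(nParticles + 255) / 256, 256>>>(d_particles, nParticles);
    err = cudaGetLastError();                            // catches launch-time failures
    if (err != cudaSuccess)
        printf("launch: %s\n", cudaGetErrorString(err));
    err = cudaDeviceSynchronize();                       // catches failures during execution
    if (err != cudaSuccess)
        printf("execution: %s\n", cudaGetErrorString(err));

    cudaFree(d_particles);
    return 0;
}
```

With 700,000 particles this runs cleanly; with 1 million the synchronize step reports the unspecified launch failure.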
It appears that the memory requirement is not linear in the number of particles. Why is that? When I called cuMemGetInfo for smaller particle counts, the reported free memory suggested that 5 million particles should fit on each GPU, ignoring any kernel overhead.
How much memory, besides the data itself, is required to execute a kernel?
How much memory does the kernel itself require, and can this not be determined at compile time?
How can I get hold of this information or calculate it myself?
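In case it helps others answer: this is the kind of query I have in mind. A sketch (assuming the runtime API; `touch` is a placeholder kernel name) that reports free/total device memory and the per-kernel static resource usage that is known after compilation:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float* data, size_t n) { /* placeholder kernel */ }

int main() {
    // Runtime-API analogue of cuMemGetInfo: free vs. total device memory.
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);
    printf("free: %zu MB, total: %zu MB\n", freeB >> 20, totalB >> 20);

    // Static resource usage of one kernel, as recorded by the compiler.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, touch);
    printf("regs/thread: %d, shared: %zu B, local: %zu B, const: %zu B\n",
           attr.numRegs, attr.sharedSizeBytes, attr.localSizeBytes,
           attr.constSizeBytes);
    return 0;
}
```

The same per-kernel numbers can also be printed at compile time with `nvcc --ptxas-options=-v`, but I don't see how to get from those figures to the total memory a launch actually consumes.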