I have a program whose main process runs on the CPU, while part of the calculation is performed on the GPU. In each iteration of an outer loop, a system of linear equations (Ax = b) must be solved for four different RHS vectors, all with the same matrix A. The iterative solution is performed on the GPU. However, the four RHS vectors are not all available up front: at the start of each outer iteration, only the matrix and the first RHS vector are formed; each subsequent RHS vector is formed on the CPU using the solution of the previous system. So, in each outer iteration, I need to copy the matrix to the GPU once, and then only transfer the b and x vectors to and from the GPU for each solve. In other words, when I send the matrix to the GPU, I need to keep it resident there for subsequent uses, while in the meantime I return to the CPU to do other work.
Transferring the b and x vectors between the GPU and CPU is straightforward. But to avoid copying the same matrix four times per outer iteration (i.e., to copy it from CPU to GPU once and reuse it several times), what sort of memory allocation should I use for the matrix?
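To make the intended workflow concrete, here is a minimal sketch of the pattern I have in mind (the `solve_on_gpu` and `form_*` functions are just placeholders for my own routines, not real API calls):

```cuda
// d_A, d_b, d_x live in device global memory, which persists across
// kernel launches until it is explicitly freed.
float *d_A, *d_b, *d_x;
cudaMalloc(&d_A, N * N * sizeof(float));
cudaMalloc(&d_b, N * sizeof(float));
cudaMalloc(&d_x, N * sizeof(float));

for (int outer = 0; outer < n_outer; ++outer) {
    form_matrix_on_cpu(h_A);               // placeholder for my CPU-side setup
    // Copy the matrix once per outer iteration...
    cudaMemcpy(d_A, h_A, N * N * sizeof(float), cudaMemcpyHostToDevice);

    form_first_rhs_on_cpu(h_b);            // placeholder
    for (int k = 0; k < 4; ++k) {
        // ...and move only b and x for each of the four solves.
        cudaMemcpy(d_b, h_b, N * sizeof(float), cudaMemcpyHostToDevice);
        solve_on_gpu(d_A, d_b, d_x, N);    // placeholder for my iterative solver
        cudaMemcpy(h_x, d_x, N * sizeof(float), cudaMemcpyDeviceToHost);
        if (k < 3)
            form_next_rhs_on_cpu(h_b, h_x); // CPU work between solves
    }
}

cudaFree(d_A);
cudaFree(d_b);
cudaFree(d_x);
```

Is this the right pattern, i.e., is plain `cudaMalloc`'d global memory guaranteed to keep the matrix intact between kernel launches while the CPU does other work, or do I need some other kind of allocation?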
I hope I have described my question clearly enough.
Many thanks for your help.