On my compute node, 12 cores must share 2 Tesla M2070s. Since kernel execution takes only a small fraction of my program's runtime (4-5% of the total), I think 6 processes can easily share one Tesla (though I will have to try it to be sure).
The problem is that the GPU's memory is not enough for 6 processes. However, the large data set (1-2 GB) that must be transferred to the GPU is the same for all processes, so I was wondering whether I could transfer the common data once, send the GPU memory address to the other processes sharing the GPU, and treat that GPU address as a sort of shared memory ("shared" in the interprocess-communication sense).
Is this possible?
I don't know whether the GPU address is physical or virtual; I suspect that whether this can work depends on the driver.
Look at the cudaIpc* interfaces in CUDA 4.1.
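For reference, the core of that API is a small producer/consumer pair. This is only a sketch (error checking omitted, and `nbytes` is a placeholder for your actual buffer size), not code tested on an M2070 setup:

```c
#include <cuda_runtime.h>

/* Process A: allocates the common buffer once and exports an IPC handle. */
void export_buffer(size_t nbytes)
{
    float *d_data;
    cudaMalloc((void **)&d_data, nbytes);
    /* ... upload the 1-2 GB common data into d_data ... */

    cudaIpcMemHandle_t handle;
    cudaIpcGetMemHandle(&handle, d_data);
    /* ... pass `handle` (a small opaque struct) to the other processes ... */
}

/* Process B: opens the received handle and gets its own device pointer
 * to the same allocation, readable from its kernels. */
void import_buffer(cudaIpcMemHandle_t handle)
{
    float *d_shared;
    cudaIpcOpenMemHandle((void **)&d_shared, handle,
                         cudaIpcMemLazyEnablePeerAccess);
    /* ... launch kernels that read d_shared ... */
    cudaIpcCloseMemHandle(d_shared);  /* unmap when done */
}
```

Each consumer process must call `cudaIpcCloseMemHandle` before the producer frees the allocation.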
Thank you for your answer.
I think these cudaIpc* functions are what I need.
Unfortunately, I must wait for the cluster administrator to add a CUDA 4.1 module with the new toolkit. I hope he will do it soon.
Just one more question: to communicate the handle to the other processes, can I use:
MPI_Send(&handle, sizeof(cudaIpcMemHandle_t), MPI_CHAR, dest, tag, MPI_COMM_WORLD);
or does the handle contain pointers that would make a raw byte copy invalid?
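For context, the pattern I have in mind (assuming the handle really is a plain byte-copyable struct, which is exactly what I am asking) is roughly this, with rank 0 as the process that owns the allocation:

```c
#include <cuda_runtime.h>
#include <mpi.h>

/* Sketch: rank 0 exports the handle, the other ranks on the same node
 * receive it and map the shared allocation. `d_data` is assumed to be
 * already allocated and filled on rank 0. */
void share_handle(int rank, int nprocs, float *d_data, int tag)
{
    cudaIpcMemHandle_t handle;

    if (rank == 0) {
        cudaIpcGetMemHandle(&handle, d_data);
        for (int dest = 1; dest < nprocs; ++dest)
            MPI_Send(&handle, sizeof(handle), MPI_BYTE,
                     dest, tag, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&handle, sizeof(handle), MPI_BYTE,
                 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        float *d_shared;
        cudaIpcOpenMemHandle((void **)&d_shared, handle,
                             cudaIpcMemLazyEnablePeerAccess);
        /* ... kernels on this rank read d_shared ... */
    }
}
```

I used `MPI_BYTE` rather than `MPI_CHAR` since the handle is an untyped blob, but I would like confirmation that copying its bytes between processes is legal.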