I have an application that runs across multiple cores using MPI, and a CUDA kernel that any given core may need to call at an arbitrary time.
Essentially, the GPU needs to act as a service, accessible at any time by any user (host CPU core). If the GPU is already in use, a requesting core will have to wait its turn.
Is this functionality built into CUDA, or will I need to use something like a semaphore to coordinate access myself?
Also, can anyone point me to academic papers or whitepapers that describe using the GPU in this fashion? Finally, what is the relationship of each core to the GPU? Is one core the master, or does each have equal access?