I need this logic to execute on the GPU asynchronously with respect to the CPU:

1. Memory transfer from host to device
2. Kernel launch that uses the result of step 1

The only memory-transfer function I can use that doesn't block the CPU is memcpyHtoDAsync. But the header files also contain a comment that "if the hardware is available, may execute in parallel with the GPU", which basically means that step 2 could start before step 1 has completed. How can I synchronize step 1 with step 2 without stalling the CPU?
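For illustration, a minimal sketch of the usual pattern (runtime API, where cudaMemcpyAsync is the counterpart of the driver-level copy; the kernel and buffer names are invented). The key fact is that operations issued into a single stream execute in issue order on the device, so the kernel cannot begin until the copy has finished, and neither call blocks the CPU:

```cpp
#include <cuda_runtime.h>

// Hypothetical kernel standing in for step 2.
__global__ void process(const float* d_in, float* d_out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_out[i] = d_in[i] * 2.0f;
}

void enqueueWork(const float* h_in, float* d_in, float* d_out, int n,
                 cudaStream_t stream) {
    // Step 1: async host-to-device copy. h_in should be pinned memory
    // (cudaMallocHost), otherwise the transfer cannot overlap device work.
    cudaMemcpyAsync(d_in, h_in, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    // Step 2: kernel launch in the SAME stream. In-stream ordering
    // guarantees the kernel waits for the copy on the device side;
    // the CPU is not blocked by either call.
    process<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_out, n);

    // The CPU returns immediately; call cudaStreamSynchronize(stream)
    // only when the result is actually needed.
}
```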
OK, so if I want to preload data in the background and get an overlapped copy, I have to use a separate stream for that, correct?
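A rough sketch of that double-buffered pattern (all names here are invented, and h_next is assumed to be pinned host memory): the copy goes into its own stream so it can overlap the kernel, and an event chains the streams without a CPU-side sync.

```cpp
#include <cuda_runtime.h>

__global__ void process(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;  // stand-in for real work
}

// Compute on the current batch while preloading the next one.
// Both streams must be explicitly created (non-default), because
// the legacy default stream would serialize against them.
void pipelineStep(const float* h_next, float* d_cur, float* d_next,
                  float* d_out, int n,
                  cudaStream_t computeStream, cudaStream_t copyStream) {
    int block = 256, grid = (n + block - 1) / block;

    // Kernel on the current batch keeps the GPU busy...
    process<<<grid, block, 0, computeStream>>>(d_cur, d_out, n);

    // ...while the next batch is copied in the background. With a DMA
    // copy engine the transfer overlaps the kernel, because the two
    // calls are in different streams.
    cudaMemcpyAsync(d_next, h_next, n * sizeof(float),
                    cudaMemcpyHostToDevice, copyStream);

    // When a later launch must consume d_next, chain the streams with
    // an event instead of stalling the CPU.
    cudaEvent_t copied;
    cudaEventCreate(&copied);
    cudaEventRecord(copied, copyStream);
    cudaStreamWaitEvent(computeStream, copied, 0);
    cudaEventDestroy(copied);  // safe: destruction is deferred until it completes
}
```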
Edit: And the same behaviour should apply to concurrent kernel execution. Does that mean I can benefit from manually putting independent kernels into as many streams as possible, to use parallel kernel execution to the fullest? This would make a lot of sense for small kernels that cannot fully occupy the GPU on their own, but in practice I expect it to be very inconvenient to code (I could be wrong, though).
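A hedged sketch of the multi-stream kernel idea, assuming independent small kernels and a device that supports concurrent kernel execution (compute capability 2.0 or later); NTASKS, smallTask, and launchAll are invented names. Whether the kernels actually overlap depends on resource availability:

```cpp
#include <cuda_runtime.h>

__global__ void smallTask(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;  // stand-in for a real small kernel
}

const int NTASKS = 8;

// Launch NTASKS independent small kernels, one per stream, so the
// hardware is free to run them concurrently when each kernel alone
// cannot fill the GPU.
void launchAll(float* d_bufs[NTASKS], int n) {
    cudaStream_t streams[NTASKS];
    for (int i = 0; i < NTASKS; ++i) {
        cudaStreamCreate(&streams[i]);
        smallTask<<<(n + 255) / 256, 256, 0, streams[i]>>>(d_bufs[i], n);
    }
    // Destruction returns immediately; the streams' resources are
    // released once their queued work finishes.
    for (int i = 0; i < NTASKS; ++i) cudaStreamDestroy(streams[i]);
}
```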
Correct again on all points. I certainly find it difficult to take advantage of concurrent kernels in my applications. Look at it this way: your kernels need to be extremely small to get much of a benefit from this anyway.