I want to chain two kernels so that the second kernel reads the first kernel's result through a texture.
I also want the whole sequence to run asynchronously using the CUDA Driver API.
How can I do this?
I noticed that the CUDA Runtime has cudaMemcpyToArrayAsync, but the CUDA Driver API has no asynchronous counterpart of cuMemcpyDtoA.
Calling cudaMemcpyToArrayAsync with cudaMemcpyDeviceToDevice also seems to work.
Isn't the CUDA Runtime built on top of the CUDA Driver? Why is there no asynchronous cuMemcpyDtoA?
That seems odd.
If I want to stay with the current CUDA Driver API, how can I implement what I need (the second kernel using the first kernel's result as a texture)?
Should I poll with a stream query? Or use cuMemcpyDtoH followed by cuMemcpyHtoA?
Is there a better way?