I’m having problems setting up asynchronous copies. I want to use them to overlap host computation with data transfers. The execution path is roughly:
call to asynchronous data transfer subroutine
call to host computation subroutine
call to device computation subroutine
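As an illustration only (this is not the actual program), the path above might look like the following in CUDA C. All names here (`start_transfer`, `host_compute`, `device_compute`, `scale`) are hypothetical, and the sketch assumes pinned host memory and a single stream, so the kernel is ordered after the copy. The device pointer is simply passed by value between the host routines.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the device computation.
__global__ void scale(float *d_data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= 2.0f;
}

// Step 1: queue the host-to-device copy; the call returns without waiting.
void start_transfer(float *d_data, const float *h_pinned, int n, cudaStream_t s)
{
    cudaMemcpyAsync(d_data, h_pinned, n * sizeof(float),
                    cudaMemcpyHostToDevice, s);
}

// Step 2: host work that does not touch the buffer being copied.
double host_compute(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += 0.5 * i;
    return sum;
}

// Step 3: launch in the same stream, so the kernel runs after the copy.
// d_data is the same device address that start_transfer() wrote to.
void device_compute(float *d_data, int n, cudaStream_t s)
{
    scale<<<(n + 255) / 256, 256, 0, s>>>(d_data, n);
}

int main()
{
    const int n = 1 << 20;
    float *h_pinned = nullptr;
    float *d_data = nullptr;
    cudaStream_t s;

    // Pinned (page-locked) host memory is needed for a truly asynchronous copy.
    cudaMallocHost(&h_pinned, n * sizeof(float));
    cudaMalloc(&d_data, n * sizeof(float));
    cudaStreamCreate(&s);

    for (int i = 0; i < n; ++i) h_pinned[i] = 1.0f;

    start_transfer(d_data, h_pinned, n, s);
    double host_sum = host_compute(n);   // overlaps with the copy
    device_compute(d_data, n, s);

    // Copy the result back in the same stream, then wait for everything.
    cudaMemcpyAsync(h_pinned, d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, s);
    cudaStreamSynchronize(s);

    printf("host sum = %f, first device result = %f\n", host_sum, h_pinned[0]);

    cudaStreamDestroy(s);
    cudaFree(d_data);
    cudaFreeHost(h_pinned);
    return 0;
}
```

In this sketch the device pointer is an ordinary value on the host, so it can be handed from one host routine to another, including across source files, as long as every routine receives the address that was actually allocated and copied to.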
I don’t get any errors when the asynchronous data transfers are carried out, but the results I’m getting are incorrect.
Running the program in emulation mode showed that the device code is not using the data that was transferred. I suspect this is because the data transfers and the device code are in separate files, so the data referenced in the device code doesn’t point to the same location that the data was originally transferred to. I tried passing the device data variables as arguments to the device code, but this didn’t fix the problem.
Is there a way of passing a pointer to device memory between host subroutines?
Or does anyone have any other suggestions?