what is the best way to share data between subroutines


If I have Fortran code of the form

do t = 1, tn   ! main loop over time
   call a(parameters)
   call b(parameters)
   call a(parameters)
   call c(parameters)
end do


Many of the results and arrays passed into one routine are also used by the following routine (for example, “b” might use the results of “a”, and/or some of the same input parameters that “a” did not change).

Now, I have a CUDA kernel written for each routine. Currently there is a lot of unnecessary data transfer: the results of “a” are copied back to the host, then immediately copied back to the GPU for “b”. How do I keep the data “a” produces on the GPU so that “b” can use it directly?

One way to do it would be to combine all the subroutines into one massive routine and launch it as a single massive kernel. That seems the most straightforward approach, but I’m wondering what the more elegant/clean solution is. I know this must be basic CUDA 101, but I’m having trouble finding the exact documentation that covers it.


Hi brush,

Copy the data to the device before the do loop; then you can just pass the “device” arrays to the subroutines, or put them into a module so every routine can see them.
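A minimal CUDA Fortran sketch of what Mat describes, assuming the NVIDIA/PGI compilers; the kernel names “a” and “b”, the array names, the sizes, and the kernel bodies are all placeholders for your real routines:

```fortran
module gpu_routines
   use cudafor
   implicit none
contains
   ! Stand-in for routine "a": writes its result into y on the device
   attributes(global) subroutine a(x, y, n)
      real :: x(*), y(*)
      integer, value :: n
      integer :: i
      i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (i <= n) y(i) = 2.0 * x(i)
   end subroutine a

   ! Stand-in for routine "b": reads a's result directly from device memory
   attributes(global) subroutine b(x, y, n)
      real :: x(*), y(*)
      integer, value :: n
      integer :: i
      i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (i <= n) y(i) = y(i) + x(i)
   end subroutine b
end module gpu_routines

program share_on_gpu
   use cudafor
   use gpu_routines
   implicit none
   integer, parameter :: n = 1024, tn = 10
   real :: x(n), y(n)
   real, device :: x_d(n), y_d(n)   ! device arrays persist across all kernel calls
   integer :: t

   x = 1.0
   x_d = x                          ! one host-to-device copy, before the loop

   do t = 1, tn
      call a<<<(n+255)/256, 256>>>(x_d, y_d, n)
      call b<<<(n+255)/256, 256>>>(x_d, y_d, n)   ! no host round trip between a and b
   end do

   y = y_d                          ! one device-to-host copy, after the loop
end program share_on_gpu
```

The key point is that x_d and y_d carry the `device` attribute and live for the whole program, so passing them between kernels is just passing device pointers; data only crosses the PCIe bus at the two assignments outside the loop. The module variant works the same way, except the arrays are declared `real, device, allocatable` at module scope instead of being passed as arguments.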

  • Mat