In the code I currently work with, many of the host arrays that are copied to my device kernels are allocated far “above” the actual computation routines. By the time you have traversed the call tree down from, for example, physics to moist processes, the temperature array you see is really just a pointer to storage allocated far above.
This setup has often stymied my attempts at async copies and double buffering, faster memory transfers, etc., since I’d need pinned memory and it’s difficult to pinpoint exactly which routine first allocated a given array so that I could pin it there. (I could, of course, allocate new pinned memory buffers inside my local routines and work with them, but that would double the host memory needed for quite a few very large arrays. Plus, allocating pinned memory is slow, so doing it every timestep is not good.)
So I was intrigued when I learned of cudaHostRegister in CUDA 4.0. It seems like it would help: I could register, say, my local temperature pointer and gain the chance for faster transfers, the possibility of asynchronous copies, and so on.
But that leads to my question: reading the CUDA Fortran User’s Guide, am I right in thinking that I can’t HostRegister a Fortran array/pointer? Rather I’d need to use iso_c_binding and have fun with posix_memalign, c_ptr, c_f_pointer, &c.? (In which case, I’d be valloc’ing a new array and doubling space again…)
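To make the posix_memalign route concrete, here is a minimal C sketch of the kind of allocation I have in mind. It assumes the range handed to cudaHostRegister has to start on a page boundary and span a whole number of pages, which is my reading and not something I have confirmed. The helper names round_up_to_page and alloc_page_aligned are my own, purely for illustration; only posix_memalign and sysconf are real library calls here.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: round nbytes up to the next multiple of page.
   page must be a power of two for the mask trick to be valid. */
static size_t round_up_to_page(size_t nbytes, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (nbytes + page - 1) & ~(page - 1);
}

/* Hypothetical helper: allocate a buffer that starts on a page boundary
   and whose length is a whole number of pages.
   On success, returns the pointer and stores the padded length in
   *padded_bytes (if non-NULL). Returns NULL on failure.
   Release the buffer with free(). */
static void *alloc_page_aligned(size_t nbytes, size_t *padded_bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t padded = round_up_to_page(nbytes, page);
    void *ptr = NULL;

    if (posix_memalign(&ptr, page, padded) != 0)
        return NULL;
    if (padded_bytes != NULL)
        *padded_bytes = padded;

    /* Assumption: a buffer like this could then be passed to
       cudaHostRegister(ptr, padded, flags) from the CUDA runtime. */
    return ptr;
}
```

On the Fortran side I would presumably wrap the returned address in a c_ptr and map it onto an array with c_f_pointer, which is exactly the extra copy of the data I am hoping to avoid.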
Thanks for any help with this and other questions sure to surface as I explore all the new 4.0 routines.