CUDA Fortran 3D pitched memory transfers

I am performing a pitched memory transfer of a 3D array in CUDA Fortran. I have been looping over one of the array dimensions and using cudaMemcpy2DAsync to move each slice in turn. This works but is slow. I have also used CUDA streams to “parallelise” the loop, but this doesn’t noticeably improve the performance.
The excellent CUDA Fortran for Scientists and Engineers book (2014) shows how to do pitched transfers of a 2D array using cudaMemcpy2D, and says that “There is also an analogous cudaMemcpy3D() routine for transferring three-dimensional array sections.” (page 52). However, I can’t get this routine to work; it appears to be unsupported, even though the book says it should be available. Could you let me know how I can do this in 3D?
@MatColgrove I wonder if you might know?

Hi as14_n,

Brent is probably a better person to help here, but he’s on vacation today. I’ll send him a note and he can take a look when he’s back tomorrow.

Though in his post “How to use cudaMemcpy3D and cudaMemcpy3DParms in Cuda Fortran” he says:

"We don’t really support cudaMemcpy3D in CUDA Fortran. It is very awkward to use."

But he may have other suggestions.

If you have a minimal example which shows what you’re doing, that may help him offer improvements.


It may be faster to move the entire 3D array in a single transfer. How many extra elements would that add?
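
To put a rough number on that, here is a minimal sketch of the arithmetic (in Python, purely as a calculator; it does not touch the GPU). The array sizes are made up for illustration and do not come from your code, the helper name transfer_overhead is hypothetical, and it assumes your loop runs over the third dimension with one cudaMemcpy2DAsync call per slice.

```python
def transfer_overhead(full_shape, wanted_shape, bytes_per_element=8):
    """Compare copying a whole 3D array with copying only the part needed.

    full_shape   -- (nx, ny, nz) extents of the array as allocated
    wanted_shape -- (mx, my, mz) extents of the part that is actually needed
    """
    nx, ny, nz = full_shape
    mx, my, mz = wanted_shape
    full = nx * ny * nz        # elements moved by one whole-array copy
    wanted = mx * my * mz      # elements moved by the slice-by-slice loop
    extra = full - wanted      # elements transferred but not needed
    return {
        "extra_elements": extra,
        "extra_bytes": extra * bytes_per_element,
        "overhead_ratio": full / wanted,
        # Assumption: the loop issues one 2D copy per index of the third dimension.
        "copy_calls_replaced": mz,
    }


# Hypothetical sizes: a 512 x 512 x 256 double-precision array
# of which only 500 x 500 x 256 elements are needed.
result = transfer_overhead((512, 512, 256), (500, 500, 256))
for name, value in result.items():
    print(name, value)
```

If the overhead ratio comes out close to 1, a single contiguous copy of the whole array replaces many small pitched copies at the cost of only a few percent more data, which is why it may well be the faster option. Whether it actually is faster on your system is something only a timing run can tell you.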