How to use pinned memory for 3D arrays in CUDA?

Hello CUDA community,

I am working on optimizing data transfer between the CPU and GPU for 3D arrays in CUDA. I want to use pinned (page-locked) memory to speed up the transfers. Here is my approach; I would like to confirm that my understanding is correct:

  1. Allocate pinned memory: First, I plan to allocate pinned memory with the same size as my 3D arrays.
  2. Copy CPU arrays to pinned memory: Then, I will copy the data from the regular CPU arrays to the allocated pinned memory.
  3. Transfer data to GPU: Finally, I will transfer the data from the pinned memory to the GPU using cudaMemcpy.

Here is the code I currently have:

istat = cudaMemcpy(AXDENS1, AXDENS, 64*31*5)
istat = cudaMemcpy(AXXMOM1, AXXMOM, 64*31*5)
istat = cudaMemcpy(AXYMOM1, AXYMOM, 64*31*5)
istat = cudaMemcpy(AXENER1, AXENER, 64*31*5)

Is this the correct approach for using pinned memory with 3D arrays in CUDA? Should I first copy the data to pinned memory and then transfer it to the GPU?

Thank you in advance for your help!


Additional Info:

  • CUDA version: [specify version]
  • GPU model: [specify GPU model]
  • OS: [specify OS]

I usually suggest that questions pertaining to CUDA Fortran be posted in the subforum for nvfortran.

Perhaps you can somehow pin the regular CPU arrays in the first place.
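
One way to do that in CUDA Fortran is to declare the host array as allocatable with the pinned attribute, so it can be passed to cudaMemcpy directly and the extra staging copy is not needed. Below is a minimal sketch, not a drop-in fix: it assumes the arrays are default real with extents 64 x 31 x 5 (guessed from the 64*31*5 count), that AXDENS is the host array and AXDENS1 the device array, and it shows only one of the four arrays.

```fortran
program pinned_demo
  use cudafor
  implicit none

  ! Extents are an assumption taken from the 64*31*5 element count
  integer, parameter :: nx = 64, ny = 31, nz = 5

  ! Host array requested in page-locked memory (must be allocatable)
  real, allocatable, pinned :: AXDENS(:,:,:)
  ! Device array (assumed to be what AXDENS1 refers to)
  real, allocatable, device :: AXDENS1(:,:,:)

  logical :: is_pinned
  integer :: istat

  ! pinned= reports whether the page-locked allocation succeeded;
  ! if it did not, the array is placed in ordinary pageable memory
  allocate(AXDENS(nx, ny, nz), stat=istat, pinned=is_pinned)
  if (istat /= 0) stop 'host allocation failed'
  if (.not. is_pinned) print *, 'warning: AXDENS is pageable, not pinned'

  allocate(AXDENS1(nx, ny, nz))

  AXDENS = 1.0   ! placeholder data for the sketch

  ! The count is given in elements, not bytes
  istat = cudaMemcpy(AXDENS1, AXDENS, nx*ny*nz)
  if (istat /= cudaSuccess) print *, cudaGetErrorString(istat)

  deallocate(AXDENS1)
  deallocate(AXDENS)
end program pinned_demo
```

This should build with nvfortran using the -cuda flag. If the existing host arrays are statically declared, they would need to become allocatable for the pinned attribute to apply.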