Asynchronous Memory Copy in CUDA Fortran

TheMatt · June 3, 2010, 12:58pm

Folks,

I was wondering if anyone has experience with, or examples of, using asynchronous memcpy with CUDA Fortran? At the moment, my program is structured like this:

compute Aerosol Arrays
copy All Device Arrays including Aerosol Arrays to device
copy Constant Data to device
execute Kernel

The issue is that the compute Aerosol Arrays step can take quite a long time, so I figure why not try to overlap as much of the memory copy with that step as I can. In truth, a good chunk of the data copied to the device is those Aerosol Arrays, but, well, every little bit helps (plus I can learn for the future).

From what I can glean from the CUDA Fortran guides, I assume I’ll have to use the API calls since I don’t think the implicit memory copies are asynchronous. Is this correct?

If so, that’s why I thought I’d ask for examples while I stumble through the cudaStreamCreate, cudaMemcpyToSymbolAsync, etc.

Matt

MatColgrove · June 3, 2010, 8:07pm

Hi Matt,

Although I haven’t done it myself, you should be able to use the CUDA API calls to accomplish this. I don’t have an example, though (sorry).

We’re currently working on expanding the CUDA Fortran language to define this asynchronous behavior. Unfortunately, it doesn’t fit well into the current Fortran syntax, so we’ll most likely need to add an extension.

Mat

TheMatt · June 4, 2010, 6:11pm

Hmm. Okay. Do you have any examples showing the allocation/copy process using the API calls?

I ask mainly for the 2D and larger arrays. I figure cudaMalloc and cudaMemcpy are pretty simple, since 1D is 1D in Fortran or C. But when one starts getting into the 2D realm, I’m wondering whether you have to use cudaMallocPitch/cudaMemcpy2D (since Fortran arrays usually don’t act like C “arrays”)?

ETA: Never mind, I figured this out (essentially it does what a padded array version of a program I wrote does). I’m next going to start a new topic on 3D arrays since that’s all new to me.
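
Since no example was posted in the thread, here is a minimal sketch of the overlap asked about in the first post, using the CUDA API calls from the cudafor module. It is only an illustration under stated assumptions, not the original poster's code: the array names (field_h, aerosol_h), the combine kernel, and the sizes are all hypothetical, and constant data (cudaMemcpyToSymbolAsync) is not covered. It also assumes the host arrays are declared with the pinned attribute, because asynchronous copies generally need page-locked host memory to actually overlap with host work.

```fortran
module aerosol_kernels
  use cudafor
  implicit none
contains
  ! Hypothetical stand-in for the real kernel: adds aerosol data to a field.
  attributes(global) subroutine combine(field, aerosol, n)
    integer, value :: n
    real :: field(n), aerosol(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) field(i) = field(i) + aerosol(i)
  end subroutine combine
end module aerosol_kernels

program overlap_copy
  use cudafor
  use aerosol_kernels
  implicit none
  integer, parameter :: n = 1024*1024, tpb = 256
  ! Pinned (page-locked) host arrays, assumed here so the copies can be asynchronous.
  real, pinned, allocatable :: field_h(:), aerosol_h(:)
  real, device, allocatable :: field_d(:), aerosol_d(:)
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat, i

  allocate(field_h(n), aerosol_h(n))
  allocate(field_d(n), aerosol_d(n))
  istat = cudaStreamCreate(stream)

  ! 1. Start copying the data that is already available.
  !    In CUDA Fortran the count is given in elements, not bytes.
  field_h = 1.0
  istat = cudaMemcpyAsync(field_d, field_h, n, stream)

  ! 2. Meanwhile, do the long host-side computation (placeholder loop).
  do i = 1, n
    aerosol_h(i) = 2.0
  end do

  ! 3. Queue the aerosol copy and the kernel in the same stream, so they
  !    run in order after the first copy has finished.
  istat = cudaMemcpyAsync(aerosol_d, aerosol_h, n, stream)
  call combine<<<(n + tpb - 1)/tpb, tpb, 0, stream>>>(field_d, aerosol_d, n)

  ! 4. Copy the result back and wait for everything queued in the stream.
  istat = cudaMemcpyAsync(field_h, field_d, n, stream)
  istat = cudaStreamSynchronize(stream)
  if (istat /= cudaSuccess) print *, cudaGetErrorString(istat)
  print *, 'field_h(1) =', field_h(1)

  istat = cudaStreamDestroy(stream)
  deallocate(field_h, aerosol_h, field_d, aerosol_d)
end program overlap_copy
```

Only the first copy overlaps with host work in this sketch. As noted in the first post, the aerosol arrays themselves cannot be sent until they have been computed, so the gain depends on how much of the data is ready before that step.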
