cudaMemcpy2DAsync run-time error


I tried to use asynchronous 2D data transfer in CUDA Fortran. Although the code gives neither a compile-time nor a run-time error, it does not produce the expected results. For comparison, I have attached both the asynchronous and the synchronous versions. I could not find the problem; could you help?

cuda02 is the asynchronous version and cuda03 is the synchronous one.

! Expected result !
81.30000000000001 162.6000000000000 243.9000000000000
81.30000000000001 162.6000000000000 243.9000000000000

Is cudaMemcpy2DAsync available for Fortran?

Thank you.

This is pretty difficult to debug, and there might be a few issues. However, cudaMemcpy2DAsync itself appears to be working: the expected line of output shows up in a different part of the final array. While debugging, you can swap out the API call and do something like:

Also, the source and destination pitches should not be LDA as you have defined them. They should be the actual first dimension of the arrays.
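Why the pitch has to be the allocated first dimension can be illustrated with a small host-only model of a 2D pitched copy. It is written in Python rather than CUDA Fortran so it runs without a GPU, and it is a sketch, not the CUDA API: the helper names `copy2d_model` and `fidx` and the array shapes are hypothetical, and all counts are in elements. In a column-major array, consecutive columns are `lda` elements apart, where `lda` is the first dimension the array was allocated with, so that is the stride between the "rows" of the 2D copy; the width is the number of contiguous elements taken from each column.

```python
# Host-only model of a 2D pitched copy (not a CUDA call). Counts are in
# elements: copy `height` rows of `width` contiguous elements, stepping the
# destination by `dpitch` and the source by `spitch` elements per row.
def copy2d_model(dst, doff, dpitch, src, soff, spitch, width, height):
    for row in range(height):
        d = doff + row * dpitch
        s = soff + row * spitch
        dst[d:d + width] = src[s:s + width]


def fidx(i, j, ld):
    """0-based flat offset of 1-based element (i, j) in a column-major
    array whose first (leading) dimension is ld."""
    return (i - 1) + ld * (j - 1)


# Hypothetical shapes: a(lda, na) holds the data, and an m x n block of it
# is copied into b(ldb, nb). lda and ldb are the *allocated* first dimensions.
lda, na = 8, 5
ldb, nb = 4, 3
m, n = 3, 3  # block size: m contiguous rows, n columns

a = [float(e) for e in range(lda * na)]  # a(i, j) holds its own flat offset
b = [0.0] * (ldb * nb)

# Copy a(2:4, 2:4) into b(1:3, 1:3): the pitches are the arrays' first
# dimensions, width is the block's row count, height its column count.
copy2d_model(b, fidx(1, 1, ldb), ldb, a, fidx(2, 2, lda), lda, m, n)

ok = all(b[fidx(i, j, ldb)] == a[fidx(i + 1, j + 1, lda)]
         for i in range(1, m + 1) for j in range(1, n + 1))

# A source pitch that is not a's first dimension (here, hypothetically,
# the block height m) reads the wrong columns of a.
b_wrong = [0.0] * (ldb * nb)
copy2d_model(b_wrong, fidx(1, 1, ldb), ldb, a, fidx(2, 2, lda), m, m, n)
wrong = b_wrong != b

print(ok, wrong)
```

Running it prints `True True`: with the first dimensions as pitches the copy matches the section assignment, while the copy using the block height as the source pitch does not.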

I still could not find the problem in the code; as you say, it does not report any error related to the asynchronous data transfer. I have attached the code.

Is there any example code showing the correct way to use cudaMemcpy2DAsync? I could not find one.

Thank you for your answer,

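Here is an example. The array-section assignment is done with cudaMemcpy2D, and the explicit API call uses cudaMemcpy2DAsync on a stream:
program testmemcpy2d
use cudafor
implicit none
integer, parameter :: n1 = 50, n2 = 60, n3 = 70

integer, allocatable, pinned :: b_h(:,:,:)
integer, allocatable, device :: b_d(:,:,:)
integer(cuda_stream_kind) :: istrm
integer :: istat

allocate(b_h(n1,n2,n3), b_d(n1,n2,n3))
istat = cudaStreamCreate(istrm)

! Array-section assignment: the compiler generates the strided copy
b_h = 0
b_d = -99
b_h(:, 1:n2/2, :) = b_d(:, n2/2+1:n2, :)
print *, all(b_h(:,1:n2/2,:) .eq. -99)
print *, all(b_h(:,n2/2+1:n2,:) .eq. 0)

! Same copy with the explicit API call. Pitch, width and height are given
! in elements here: each of the n3 "rows" is the contiguous half-plane
! b_d(:, n2/2+1:n2, k), n1*n2/2 elements long, and rows are n1*n2 apart.
b_h = 0
istat = cudaMemcpy2DAsync(b_h(1,1,1), n1*n2, b_d(1,n2/2+1,1), n1*n2, n1*n2/2, n3, stream=istrm)
istat = cudaStreamSynchronize(istrm)
print *, all(b_h(:,1:n2/2,:) .eq. -99)
print *, all(b_h(:,n2/2+1:n2,:) .eq. 0)

istat = cudaStreamDestroy(istrm)
deallocate(b_h, b_d)
end program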
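The arguments in the call above can be checked on the host. The following Python model is a sketch, not CUDA code (`copy2d_model` and `fidx3` are hypothetical helpers): it treats each of the n3 planes of the column-major n1 x n2 x n3 array as one "row" of the 2D copy, with a pitch of n1*n2 elements (one plane), a width of n1*n2/2 (the contiguous half-plane) and a height of n3. Unlike the all -99 device array in the example, the source here holds distinct values, so the check also confirms which elements land where.

```python
N1, N2, N3 = 50, 60, 70


def copy2d_model(dst, doff, dpitch, src, soff, spitch, width, height):
    # Host-only model of a 2D pitched copy, counted in elements.
    for row in range(height):
        d = doff + row * dpitch
        s = soff + row * spitch
        dst[d:d + width] = src[s:s + width]


def fidx3(i, j, k):
    # 0-based flat offset of 1-based element (i, j, k) of an
    # (N1, N2, N3) column-major (Fortran-order) array.
    return (i - 1) + N1 * ((j - 1) + N2 * (k - 1))


size = N1 * N2 * N3
b_d = list(range(1, size + 1))  # distinct values stand in for device data
b_h = [0] * size

# Same arguments as the CUDA Fortran call: pitch n1*n2, width n1*n2/2, height n3.
copy2d_model(b_h, fidx3(1, 1, 1), N1 * N2,
             b_d, fidx3(1, N2 // 2 + 1, 1), N1 * N2,
             N1 * N2 // 2, N3)

# Equivalent of b_h(:, 1:n2/2, :) = b_d(:, n2/2+1:n2, :), rest left at 0.
lower_ok = all(b_h[fidx3(i, j, k)] == b_d[fidx3(i, j + N2 // 2, k)]
               for k in range(1, N3 + 1)
               for j in range(1, N2 // 2 + 1)
               for i in range(1, N1 + 1))
upper_ok = all(b_h[fidx3(i, j, k)] == 0
               for k in range(1, N3 + 1)
               for j in range(N2 // 2 + 1, N2 + 1)
               for i in range(1, N1 + 1))
print(lower_ok, upper_ok)
```

It prints `True True`, matching the two checks in the CUDA Fortran program. Note that the C runtime API's cudaMemcpy2DAsync takes pitch and width in bytes, whereas the Fortran example above passes element counts.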

