cudaMemcpy2DAsync run-time error

Hi,

I tried to use asynchronous 2D data transfer in CUDA Fortran. The code produces neither a compile-time nor a run-time error, but it does not give the expected results. For comparison, I have attached both the asynchronous and the synchronous versions. I could not find the problem; could you help?

cuda02.zip (2.2 KB)

cuda03.zip (2.0 KB)

cuda02 is the asynchronous version; cuda03 is the synchronous one.

! Expected result !
!!!
81.30000000000001 162.6000000000000 243.9000000000000
81.30000000000001 162.6000000000000 243.9000000000000
!!!

Is cudaMemcpy2DAsync available for Fortran?

Yunus,
Thank you.

This is pretty difficult to debug, and there may be a few issues. However, cudaMemcpy2DAsync itself appears to be working: the expected line of output shows up, but in a different part of the final array. While debugging, you can swap out the API call for an array-section assignment such as:
B(1+OFFSET:LDA+OFFSET,1:NUMPPV) = B_D(1+OFFSET:LDA+OFFSET,1:NUMPPV)

Also, the source and destination pitches should not be LDA as you have defined them. Each pitch should be the actual first dimension of the corresponding array.
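
To make the pitch point concrete, here is a minimal sketch of what the API call might look like for the same section as the assignment above. It is not taken from the attached code: the names lda, offset, and numppv mirror LDA, OFFSET, and NUMPPV, while nrows_total, the array sizes, the real(8) type, and the fill values are assumptions chosen only for illustration. The pitches are passed as size(b,1) and size(b_d,1), i.e. the real first dimensions, and the width is the number of rows copied per column, all counted in elements.

```fortran
program pitch_sketch
  use cudafor
  implicit none
  ! Hypothetical sizes: each array has nrows_total rows, but only lda rows
  ! (starting after offset) of each of the numppv columns are copied.
  integer, parameter :: nrows_total = 100, numppv = 3
  integer, parameter :: lda = 20, offset = 40
  real(8), allocatable, pinned :: b(:,:)
  real(8), allocatable, device :: b_d(:,:)
  integer(cuda_stream_kind) :: istrm
  integer :: istat

  allocate(b(nrows_total, numppv))
  allocate(b_d(nrows_total, numppv))
  istat = cudaStreamCreate(istrm)

  b = 0.0d0
  b_d = 1.0d0

  ! dpitch and spitch are the actual first dimensions of the arrays,
  ! width is the number of rows copied per column, height is the
  ! number of columns. All counts are in elements, not bytes.
  istat = cudaMemcpy2DAsync(b(1+offset,1), size(b,1), &
                            b_d(1+offset,1), size(b_d,1), &
                            lda, numppv, stream=istrm)
  istat = cudaStreamSynchronize(istrm)

  ! Only rows offset+1 .. offset+lda should have been overwritten.
  print *, all(b(1+offset:lda+offset, :) == 1.0d0)
  print *, all(b(1:offset, :) == 0.0d0)
  print *, all(b(lda+offset+1:, :) == 0.0d0)

  istat = cudaStreamDestroy(istrm)
end program pitch_sketch
```

If the copy lands where intended, all three checks would be expected to print T. Passing lda as the pitch instead of the first dimension would make each copied column start lda elements after the previous one, which is one way data can end up in a different part of the array.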

I still could not find the problem in the code. As you say, it does not give any error related to the asynchronous data transfer.

cuda024.zip (2.5 KB)

Is there any example code that shows the correct way to call cudaMemcpy2DAsync? I could not find any.

Thank you for your answer,
Yunus

Here is an example. The array-section assignment uses cudaMemcpy2D, and the explicit API call uses cudaMemcpy2DAsync:
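program testmemcpy2d
  use cudafor
  implicit none
  integer, parameter :: n1 = 50, n2 = 60, n3 = 70

  integer, allocatable, pinned :: b_h(:,:,:)   ! pinned (page-locked) host array
  integer, allocatable, device :: b_d(:,:,:)   ! device array
  integer(cuda_stream_kind) :: istrm
  integer :: istat

  allocate(b_h(n1,n2,n3))
  allocate(b_d(n1,n2,n3))
  istat = cudaStreamCreate(istrm)

  ! Array-section assignment: copy the second half of dimension 2 on the
  ! device into the first half of dimension 2 on the host.
  b_h = 0
  b_d = -99
  b_h(:, 1:n2/2, :) = b_d(:, n2/2+1:n2, :)
  print *, all(b_h(:,1:n2/2,:) .eq. -99)
  print *, all(b_h(:,n2/2+1:n2,:) .eq. 0)

  ! Same copy through the API. Pitches and width are in elements:
  ! each "row" spans n1*n2 elements, n1*n2/2 of them are copied, for n3 rows.
  b_h = 0
  istat = cudaMemcpy2dAsync(b_h(1,1,1), n1*n2, b_d(1,n2/2+1,1), n1*n2, &
                            n1*n2/2, n3, stream=istrm)
  istat = cudaStreamSynchronize(istrm)
  print *, all(b_h(:,1:n2/2,:) .eq. -99)
  print *, all(b_h(:,n2/2+1:n2,:) .eq. 0)
end program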
