equivalent to cudaMemcpy2D for copying submatrices?

bertoc · March 27, 2019, 2:21am

In CUDA, there is cudaMemcpy2D, which lets you copy a 2D sub-matrix of a larger matrix on the host to a smaller matrix on the device (or vice versa).

For instance, say A is a 6x6 matrix on the host, and we previously allocated a 3x3 matrix B on the device. cudaMemcpy2D lets you copy a 3x3 submatrix of A, defined by rows 0 to 2 and columns 0 to 2, into the space already allocated for B on the device.

To the best of my understanding based on the specs, OpenACC does not have anything similar. We would have to copy the non-contiguous slices of the matrix individually, right?

Thanks.

MatColgrove · March 27, 2019, 2:59pm

Hi bertoc,

In OpenACC, the host and device copies of an array need to match in size and shape, so you can’t map a 6x6 host matrix to a 3x3 device matrix. You can, however, keep a mirrored copy of A and copy only the 3x3 portion to the device. Something like:

real, dimension(:,:), allocatable :: A
allocate(A(6,6))
A=..something

!$acc enter data copyin(A(2:4,1:3))
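
To make the mirrored-copy idea concrete, here is a minimal sketch of a full program. It rests on a few assumptions that are not in the snippet above: the fill values, the doubling loop, and the choice to allocate the whole device mirror with create(A) and then move only the block with update directives are all illustrative. How the strided block is actually transferred is, as far as I understand, left to the implementation.

```fortran
program submatrix_update
  implicit none
  real, dimension(:,:), allocatable :: A
  integer :: i, j

  allocate(A(6,6))

  ! Illustrative fill (an assumption): A(i,j) = 10*i + j
  do j = 1, 6
    do i = 1, 6
      A(i,j) = real(10*i + j)
    end do
  end do

  ! Allocate a full-size device mirror of A without transferring any data.
  !$acc enter data create(A)

  ! Transfer only the 3x3 block (rows 2..4, columns 1..3) to the device.
  !$acc update device(A(2:4,1:3))

  ! Work on the block on the device; here each element is simply doubled.
  !$acc parallel loop collapse(2) present(A)
  do j = 1, 3
    do i = 2, 4
      A(i,j) = 2.0 * A(i,j)
    end do
  end do

  ! Bring only the block back to the host, then release the device copy.
  !$acc update self(A(2:4,1:3))
  !$acc exit data delete(A)

  ! Two entries from the doubled block and one untouched entry.
  print *, A(2,1), A(4,3), A(1,1)
end program submatrix_update
```

Without OpenACC enabled, the directives are treated as comments and the program runs on the host with the same result, which is a convenient way to check the indexing.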

Alternatively, you can have B be the device array:

real, dimension(:,:), allocatable :: A, B
allocate(A(6,6))
allocate(B(3,3))
A=..something
B(1:3,1:3) = A(2:4,1:3)

!$acc enter data copyin(B)
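
The staging approach can be sketched end to end as follows. This is only an illustration under stated assumptions: the fill values, the add-one loop, and the copy back into A are hypothetical additions. The sub-block is packed into the contiguous array B on the host, so the device only ever sees a plain 3x3 array.

```fortran
program submatrix_staging
  implicit none
  real, dimension(:,:), allocatable :: A, B
  integer :: i, j

  allocate(A(6,6))
  allocate(B(3,3))

  ! Illustrative fill (an assumption): A(i,j) = 10*i + j
  do j = 1, 6
    do i = 1, 6
      A(i,j) = real(10*i + j)
    end do
  end do

  ! Pack the non-contiguous block of A into the contiguous array B on the host.
  B(1:3,1:3) = A(2:4,1:3)

  ! Create B on the device and copy its contents in.
  !$acc enter data copyin(B)

  ! Work on B on the device; here each element is incremented by one.
  !$acc parallel loop collapse(2) present(B)
  do j = 1, 3
    do i = 1, 3
      B(i,j) = B(i,j) + 1.0
    end do
  end do

  ! Copy B back to the host and free the device copy.
  !$acc exit data copyout(B)

  ! Unpack the result into the original block of A.
  A(2:4,1:3) = B

  ! Two entries from the updated block and one untouched entry.
  print *, A(2,1), A(4,3), A(1,1)
end program submatrix_staging
```

The cost of this version is the extra host-side pack and unpack, but every transfer is a single contiguous array and the device footprint is only 3x3.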

Keep in mind that OpenACC is device agnostic, so implementation details such as whether cudaMemcpy2D is used shouldn’t be introduced into your program; the underlying mechanism may differ when targeting other devices, such as a multicore CPU.

If you do want to use cudaMemcpy2D explicitly, then you’d want to mix in CUDA Fortran by making B a “device” matrix, though your code will no longer be portable.

-Mat

bertoc · March 27, 2019, 6:56pm

Thank you for the clear response!