cudaMemcpy fails copying ACC variable to CUF variable

rikisyo · August 7, 2013, 3:17pm

This sample code is to expose a problem when I am trying to make cudamemcpy between the device copy of an ACC variable and a CUF device variable:

program cuf2acc

use cudafor
use openacc

implicit none

integer, parameter :: DA=8

real, allocatable, dimension(:) :: sendbuf, recvbuf
real, device, allocatable, dimension(:) :: sendbuf_d, recvbuf_d

integer :: i

call acc_init( acc_device_default )

allocate(sendbuf(DA),recvbuf(DA))
allocate(sendbuf_d(DA),recvbuf_d(DA))

sendbuf=-2. ; recvbuf=-2.
sendbuf_d=1 ; recvbuf_d=-1.

!$acc data copy( sendbuf, recvbuf )
i = cudaDeviceSynchronize()
!$acc host_data use_device( sendbuf, recvbuf )
i = cudaMemcpy(sendbuf,sendbuf_d,DA,cudaMemcpyDeviceToDevice)
i = cudaMemcpy(recvbuf,recvbuf_d,DA,cudaMemcpyDeviceToDevice)
!$acc end host_data
!$acc end data

write(*,'("S: "8F8.2)') (sendbuf(i),i=1,DA)
write(*,'("R: "8F8.2)') (recvbuf(i),i=1,DA)

end program cuf2acc

It’s compiled by

$ pgf90 -o cuf2acc -Mcuda -acc -ta=nvidia,cc20 cuf2acc.f90

When it runs it always gives an error:

0: copyover Memcpy (dst=0xa15d80, src=0xb00200000, size=32) FAILED: 11(invalid argument)

It seems that the dst is a host address. So I try removing the “host_data” construct around the memcpy, and the error is the same. It seems to me that the host_data construct never exposes the device address of the ACC variables. Why is it so and how can I get around this?

MatColgrove · August 7, 2013, 5:36pm

Hi rikisyo,

Are you trying to test the “host_data” directive or just need to copy the data between the device and host? The “host_data” directive isn’t supported yet in Fortran and is why this example is failing.

If you just need to copy the data, then the simple method is to just assign the host to device array. No need for OpenACC at all.

program cuf2acc

use cudafor
use openacc

implicit none

integer, parameter :: DA=8

real, allocatable, dimension(:) :: sendbuf, recvbuf
real, device, allocatable, dimension(:) :: sendbuf_d, recvbuf_d

integer :: i

call acc_init( acc_device_default )

allocate(sendbuf(DA),recvbuf(DA))
allocate(sendbuf_d(DA),recvbuf_d(DA))

sendbuf=-2. ; recvbuf=-2.
sendbuf_d=1 ; recvbuf_d=-1.

sendbuf=sendbuf_d
recvbuf=recvbuf_d

write(*,'("S: "8F8.2)') (sendbuf(i),i=1,DA)
write(*,'("R: "8F8.2)') (recvbuf(i),i=1,DA)

end program cuf2acc

Mat

rikisyo · August 8, 2013, 2:16pm

Thanks for the reply!

My original purpose is not just copying data back-and-forth between the host and the device, but to use CUDA-aware MPI calls to do face data exchange of my multiblock CFD code. Particularly, I would be extremely happy if there is a supported way to do “CUDA-AWARE MPI AND OPENACC” as claimed in the following webpage:

https://developer.nvidia.com/content/benchmarking-cuda-aware-mpi

If host_data is not supported then there is no way these MPI API can work with ACC variables, much for the same reason as the cudaMemcpy failure. Is there a way to safely expose the device address so that cudaMemcpy can work?

MatColgrove · August 8, 2013, 2:34pm

Hi rikisyo,

First let me clarify that the host_data directive is supported in C and under development in Fortran.

In reading the article, it seems to me that you would simply pass in the CUDA Fortran device array. Depending on the MPI implementation, you may need to write an interface to the MPI_SENDRECV buffer.

Mat

Topic		Replies	Views
memcpy for arrays allocated on the device in openacc Legacy PGI Compilers	1	1951	November 27, 2017
OpenACC C/CUDA error Legacy PGI Compilers	2	4964	October 30, 2013
call to cuMemHostRegister returned error 1 CUDA Programming and Performance	0	653	March 5, 2013
cuMemcpyDtoH error Legacy PGI Compilers	12	25519	January 26, 2010
Data transfer from device to host Legacy PGI Compilers	2	3798	July 10, 2014
A simple memcpy implementation in openacc Legacy PGI Compilers	1	4386	April 17, 2015
memcpy problem CUDA Programming and Performance	5	2782	May 10, 2008
How to create data on device Legacy PGI Compilers	2	3978	December 25, 2014
Simple (?) question on OpenACC host_data usage Legacy PGI Compilers	1	2897	January 7, 2014
program crash when copying from device to host <br /> CUDA Programming and Performance	11	1949	March 31, 2009

cudaMemcpy fails copying ACC variable to CUF variable

Related topics