Acc_malloc() in Fortran to avoid host allocation

gjt · April 6, 2022, 10:52pm

Hi,

I am trying to use acc_malloc() in a Fortran code to avoid host allocations for scratch variables on the GPU. So far my attempts have failed. Here is my latest attempt:

real(ESMF_KIND_R8), allocatable:: copyArray(:,:,:)
type(c_devptr)  :: dev_copyArray
...
dev_copyArray = acc_malloc(size)
call acc_map_data(copyArray, dev_copyArray, size)
...
!$acc data present(copyArray)
!$acc kernels
...
!$acc end kernels
!$acc end data

This fails with

Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

However, when I explicitly allocate the copyArray on the host side, the same code works, but of course then it isn’t any different than using the create data clause.
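
For comparison, here is a minimal sketch of the create-clause version referred to above. It is an illustration rather than the original code: the array sizes, the loop body, and the use of real(8) in place of ESMF_KIND_R8 are all assumptions.

```fortran
program create_scratch
implicit none
real(8), allocatable :: copyArray(:,:,:)   ! real(8) stands in for ESMF_KIND_R8 (assumption)
integer :: nx, ny, nz, i, j, k

nx = 32
ny = 32
nz = 32
allocate(copyArray(nx,ny,nz))   ! the host allocation is still needed here

! create allocates the device copy without transferring any host data
!$acc data create(copyArray)
!$acc kernels
do k = 1, nz
   do j = 1, ny
      do i = 1, nx
         copyArray(i,j,k) = 1.0d0
      enddo
   enddo
enddo
!$acc end kernels
!$acc update self(copyArray)    ! copy the results back only if the host needs them
!$acc end data

print *, copyArray(1:2,1,1)
end program create_scratch
```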

Is it possible to avoid host allocation for GPU scratch arrays?

-Gerhard

MatColgrove · April 7, 2022, 3:58pm

Hi Gerhard,

Can you post a more complete example? This should work as expected (see my example below), so I suspect the error is coming from something in the omitted code, such as using “dev_copyArray” in the kernels region.

Here is an example:

% cat test.F90

program foo
use iso_c_binding
use cudafor
use openacc
real(8), allocatable:: copyArray(:,:,:)
type(c_devptr)  :: dev_copyArray
integer :: nx,ny,nz,size,i,j,k

nx=32
ny=32
nz=32
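size = nx*ny*nz*8                ! size in bytes: 8 bytes per real(8) element
allocate(copyArray(nx,ny,nz))    ! host allocation, so there is a host address to map
dev_copyArray = acc_malloc(size) ! raw device allocation
call acc_map_data(copyArray, dev_copyArray, size) ! associate the host array with the device memory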
!$acc data present(copyArray)
!$acc kernels
do k=1,nz
   do j=1,ny
      do i=1,nx
         copyArray(i,j,k)=1.0
      enddo
   enddo
enddo
!$acc end kernels
!$acc update self(copyArray)
!$acc end data
print *, copyArray(1:2,1,1)

end program foo
% nvfortran -acc test.F90 -cuda -V22.3; a.out
    1.000000000000000         1.000000000000000

Also, since you’re using CUDA Fortran features anyway, you should be able to simplify things by making “dev_copyArray” a device array, especially if you’re using “dev_copyArray” in the kernels region.

% cat test.cuf


program foo
use iso_c_binding
use cudafor
use openacc
real(8), allocatable:: copyArray(:,:,:)
real(8), allocatable, device :: dev_copyArray(:,:,:)   ! device-only array (CUDA Fortran attribute)
integer :: nx,ny,nz,i,j,k

nx=32
ny=32
nz=32
allocate(copyArray(nx,ny,nz))
allocate(dev_copyArray(nx,ny,nz))
!$acc kernels
do k=1,nz
   do j=1,ny
      do i=1,nx
         dev_copyArray(i,j,k)=1.0
      enddo
   enddo
enddo
!$acc end kernels
copyArray=dev_copyArray   ! CUDA Fortran assignment: copies device data to the host array
print *, copyArray(1:2,1,1)

end program foo
% nvfortran -acc test.cuf -cuda -V22.3 ; a.out
    1.000000000000000         1.000000000000000

-Mat

gjt · April 11, 2022, 9:00pm

Hi Mat,

Thank you for your help! Sorry for taking some time to get back to this.

From your first example, it looks like I must allocate the array on the host before I can call acc_map_data(). I had thought I could get around host allocation in that case. From your second example, I understand that what I was trying to do is possible with a “device array”. That is good info. Thank you.

I am exploring several options here, and have a new question about managed memory. I will start a new thread for that. Thanks again!

-Gerhard

SkyCool · August 8, 2024, 8:16am

Hello, administrator.
In the first example, if “allocate” is executed before “!$acc data”, why not directly use “!$acc data copyin(copyArray)”?

MatColgrove · August 8, 2024, 4:49pm

Hi SkyCool,

You could certainly do this, but the original question was about how to use acc_malloc and map that device memory to a host array. Mapping and unmapping device data allows the device memory to be reused, in this case as scratch memory. In Fortran, though, it’s easier to use the CUDA Fortran “device” attribute to create device-only data.
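
A minimal sketch of that reuse pattern follows. It assumes the nvfortran openacc module provides Fortran interfaces for acc_unmap_data and acc_free alongside the acc_malloc and acc_map_data calls used earlier in this thread. The names scratchA, scratchB, dev_scratch and nbytes are hypothetical. The two host arrays take turns being mapped onto the same device buffer.

```fortran
program reuse_scratch
use iso_c_binding
use cudafor
use openacc
implicit none
real(8), allocatable :: scratchA(:,:,:), scratchB(:,:,:)   ! hypothetical host arrays
type(c_devptr) :: dev_scratch                              ! one device buffer shared by both
integer :: nx, ny, nz, nbytes, i, j, k

nx = 32
ny = 32
nz = 32
nbytes = nx*ny*nz*8                  ! bytes for nx*ny*nz real(8) elements
allocate(scratchA(nx,ny,nz), scratchB(nx,ny,nz))

dev_scratch = acc_malloc(nbytes)     ! device memory is allocated once

! First use: map scratchA onto the device buffer
call acc_map_data(scratchA, dev_scratch, nbytes)
!$acc kernels present(scratchA)
do k = 1, nz
   do j = 1, ny
      do i = 1, nx
         scratchA(i,j,k) = 1.0d0
      enddo
   enddo
enddo
!$acc end kernels
!$acc update self(scratchA)
call acc_unmap_data(scratchA)        ! assumed interface: drops the mapping, keeps the device memory

! Second use: map scratchB onto the same device buffer
call acc_map_data(scratchB, dev_scratch, nbytes)
!$acc kernels present(scratchB)
do k = 1, nz
   do j = 1, ny
      do i = 1, nx
         scratchB(i,j,k) = 2.0d0
      enddo
   enddo
enddo
!$acc end kernels
!$acc update self(scratchB)
call acc_unmap_data(scratchB)

call acc_free(dev_scratch)           ! assumed interface: releases the device buffer
print *, scratchA(1,1,1), scratchB(1,1,1)
end program reuse_scratch
```

With a plain “copyin” or “create” clause, each array would get its own device allocation for the lifetime of its data region; here the device allocation happens once and is shared.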

-Mat
