call to cuMemFreeHost returned error 700: Illegal address during kernel execution

Hi,

I am trying to use OpenACC to port a Fortran 90 code to GPU. The code uses very complex data structures. A typical do-loop looks like this:

!$acc data copy(tmp(:,:,:)) copyin(var(1:ni,1:nj,1:nk), el%metric%detJ(1:ni,1:nj,1:nk))         
!$ACC PARALLEL LOOP                                                                             
    do k=1,nk
       do j=1,nj
          do i=1,ni
            tmp(i,j,k) = var(i,j,k)*el%metric%detJ(i,j,k)
          enddo
       enddo
    enddo
!$acc end data

where the variables “tmp”, “var”, and el%metric%detJ are declared like this:

real(kind=dp), allocatable, dimension(:,:,:)  :: tmp, var
real(kind=dp), allocatable, dimension(:,:,:)  :: detJ

When I compile the code with PGI v18.10.0, I get messages like these:

  1546, Generating copyin(el%metric%detj(1:ni,1:nj,1:nk),var(:ni,:nj,:nk))
         Generating copy(tmp(1:ni,1:nj,1:nk))
   1547, Accelerator kernel generated
         Generating Tesla code
       1548, !$acc loop gang ! blockidx%x
       1549, !$acc loop seq
       1550, !$acc loop vector(128) ! threadidx%x
   1547, Generating implicit copy(el)
   1549, Loop is parallelizable
   1550, Loop is parallelizable

and it crashes when I run it on a P100 node, with the following error:

    1546: data region reached 1 time
        1546: data copyin transfers: 5
             device time(us): total=57 max=19 min=4 avg=11
    1547: data region reached 1 time
        1547: data copyin transfers: 2
             device time(us): total=11 max=7 min=4 avg=5
    1547: compute region reached 1 time
        1547: kernel launched 1 time
            grid: [1]  block: [128]
             device time(us): total=0 max=0 min=0 avg=0
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

Is the error due to the data type of “el”? Why does the compiler report

Generating implicit copy(el)

?


type(compElement),intent(inout),target :: el

Thanks for your help!

You are almost certainly having a deep copy problem here. If all the data is allocatable, you might try compiling with -ta=tesla:managed. If you want to explicitly manage the data, have a look at some of our blog posts on deep copy:
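
For the explicit route, here is a minimal sketch of a manual deep copy, not a drop-in fix. The background: "el" is a derived type, so the implicit copy(el) moves only the top-level structure, and the allocatable member detJ inside the device copy still refers to host memory. The sketch assumes an OpenACC 2.6 (or later) compiler that attaches members automatically, and it assumes the type layout: metric_t is a hypothetical stand-in for the type of el%metric, and the contents of compElement are guessed from the declarations in the question.

```fortran
! Minimal sketch of manual deep copy with OpenACC (2.6 or later).
! The type layouts are assumptions: the thread shows only the
! declarations of el and detJ, not the full derived types.
program deep_copy_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ni = 8, nj = 8, nk = 8   ! illustrative sizes

  type metric_t                 ! hypothetical stand-in for the type of el%metric
     real(kind=dp), allocatable, dimension(:,:,:) :: detJ
  end type metric_t

  type compElement              ! name taken from the thread; contents assumed
     type(metric_t) :: metric
  end type compElement

  type(compElement) :: el
  real(kind=dp), allocatable, dimension(:,:,:) :: tmp, var
  integer :: i, j, k

  allocate(tmp(ni,nj,nk), var(ni,nj,nk), el%metric%detJ(ni,nj,nk))
  var = 2.0_dp
  el%metric%detJ = 3.0_dp
  tmp = 0.0_dp

  ! Top-down: copy the parent first, then the allocatable member.
  ! The second copyin allocates detJ on the device and attaches it
  ! to the device copy of el.
  !$acc enter data copyin(el)
  !$acc enter data copyin(el%metric%detJ)

  ! el is already on the device, so declare it present instead of
  ! letting the compiler generate an implicit copy(el).
  !$acc data copy(tmp) copyin(var) present(el)
  !$acc parallel loop collapse(3)
  do k = 1, nk
     do j = 1, nj
        do i = 1, ni
           tmp(i,j,k) = var(i,j,k) * el%metric%detJ(i,j,k)
        end do
     end do
  end do
  !$acc end data

  ! Bottom-up: release the member before the parent.
  !$acc exit data delete(el%metric%detJ)
  !$acc exit data delete(el)

  print *, 'max deviation from 6.0 =', maxval(abs(tmp - 6.0_dp))
end program deep_copy_sketch
```

Built without OpenACC enabled, the directives are treated as comments and the program runs on the CPU, which gives a quick reference result to compare against the GPU build. In the real code the enter data / exit data pair would normally sit where "el" is set up and torn down, so the structure stays resident on the device across many loops.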

Hi Brent,
Thanks for the information.

-ta=tesla:managed.

seems to be too strict for this code. Even when I compile a pure CPU version of the code with the flag, it crashes. So I have to fix the memory issues in the CPU version first.

Regards, /JG