Hello, I would like to ask about using unified memory in CUDA Fortran.
I am computing on a 3D array and want to increase its size from 512^3 to 768^3,
but the GPU does not have enough memory for the 768^3 computation.
(My GPU is a Tesla P100 16GB; CUDA version 8.0, PGI compiler 17.7, OS: CentOS Linux.)
So I declared the arrays as managed. An outline of the code is given below.
Module cuda_kernel
Use cudafor
Contains
! Compute phi at the next time step and store it in phi_out
attributes(global)subroutine calphi(—some variables—)
! declare variables
Real(8),dimension(nx,ny,nz),device:: phi, phi_out
phi_out = phi + dt * ( ~~ )
return
end subroutine calphi
end module cuda_kernel
program main
use cudafor
use cuda_kernel
! declare variables
Real(8),managed,allocatable,dimension(:,:,:):: phi, phi_out
! the 3-dimensional arrays are declared as managed memory
state = cudaSetDevice(0)
blocks = dim3(nx/16,ny/16,1)
threads = dim3(16,16,1)
allocate(phi(nx,ny,nz))
allocate(phi_out(nx,ny,nz))
! set initial condition of phi
phi(i,j,k) = ~~~
state = cudaThreadSynchronize()
do timestep = 1,10000   ! time-stepping loop
call calphi<<<blocks,threads>>>(—some variables—)
state = cudaThreadSynchronize()
phi = phi_out   ! update the array for the next time step
state = cudaThreadSynchronize()
end do
deallocate(phi)
deallocate(phi_out)
state = cudaFree(phi)
state = cudaFree(phi_out)
stop
end program main
With nx=ny=nz=512, the calculation runs successfully.
But with nx=ny=nz=768, phi_out is not computed over the entire domain.
(I checked, after synchronizing, that the initial condition of phi is defined over the entire domain.)
A second problem occurs when I update the array (phi = phi_out).
With nx=ny=nz=512 there is no trouble, but with 768
the following memory-related error occurs:
0: copyover Memcpy (dst=0x0x700200000, src=0x0x7d8200000, size=3623878656) FAILED: 77(an illegal memory access was encountered)
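For reference, the size reported in the message equals the footprint of one 768^3 array of Real(8). The arithmetic can be sketched in standard Fortran as below; the program name and variable names are hypothetical and only for illustration, they are not part of my actual code.

```fortran
program array_footprint
  use iso_fortran_env, only: int64
  implicit none
  ! Grid sizes taken from the post (nx = ny = nz)
  integer, parameter :: sizes(2) = [512, 768]
  ! One Real(8) element occupies 8 bytes
  integer(int64), parameter :: bytes_per_elem = 8_int64
  integer(int64) :: n, bytes
  integer :: i

  do i = 1, size(sizes)
     n = int(sizes(i), int64)
     ! 64-bit arithmetic: the 768^3 result exceeds the default 32-bit integer range
     bytes = n * n * n * bytes_per_elem
     print '(a,i0,a,i0,a)', 'n = ', sizes(i), ': ', bytes, ' bytes per array'
  end do
end program array_footprint
```

For n = 768 this gives 3623878656 bytes, the same value as size= in the error message, so the failing copy covers one whole array. For n = 512 it gives 1073741824 bytes per array.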
How can I resolve these problems?
I would really appreciate any help with them!