Hi Mat,
I set up a 2nd machine with exactly the same configuration and the same GPU (Tesla P4). However, it runs out of device memory, as shown in the error message below. The 1st machine executes the same code without any issues.
Here is the test code:
module vecaddmod
  implicit none
contains
  subroutine vecaddgpu( r, a, b, n )
    real, dimension(:) :: r, a, b
    integer :: n
    integer :: i
    !$acc kernels loop copyin(a(1:n),b(1:n)) copyout(r(1:n))
    do i = 1, n
      r(i) = a(i) + b(i)
    enddo
  end subroutine
end module
program main
  use vecaddmod
  implicit none
  integer :: n, i, errs, argcount
  real, dimension(:), allocatable :: a, b, r, e
  character*10 :: arg1
  argcount = command_argument_count()
  n = 1000000000 ! default value
  if( argcount >= 1 )then
    call get_command_argument( 1, arg1 )
    read( arg1, '(i)' ) n
    if( n <= 0 ) n = 100000
  endif
  allocate( a(n), b(n), r(n), e(n) )
  do i = 1, n
    a(i) = i
    b(i) = 1000*i
  enddo
  ! compute on the GPU
  call vecaddgpu( r, a, b, n )
  ! compute on the host to compare
  do i = 1, n
    e(i) = a(i) + b(i)
  enddo
  ! compare results
  errs = 0
  do i = 1, n
    if( r(i) /= e(i) )then
      errs = errs + 1
    endif
  enddo
  print *, errs, ' errors found'
  if( errs ) call exit(errs)
end program
I compile the above code with:
nvfortran -acc=gpu -fast -gpu=cc61,cuda12.2,managed -Minfo=accel -stdpar=gpu f1.F90
Running the resulting executable then yields the following error:
Out of memory allocating 4000000000 bytes of device memory
Failing in Thread:1
total/free CUDA memory: 7975862272/3861118976
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 6.1, threadid=1
Hint: specify 0x800 bit in NV_ACC_DEBUG for verbose info.
host:0x7fa5bf74d020 device:0x7fa1d2000000 size:4000000000 presentcount:1+0 line:8 name:a(:n)
allocated block device:0x7fa1d2000000 size:4000000000 thread:1
Accelerator Fatal Error: call to cuMemAlloc returned error 2: Out of memory
File: f1.F90
Function: vecaddgpu:4
Line: 8
I am at a loss as to what is going on here. Any ideas? I wonder whether unified memory is not working on the 2nd machine, because the same code runs fine there for a smaller value of n.
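As a rough sanity check on the sizes involved, the sketch below works out the footprint of the arrays. It assumes 4-byte default reals (consistent with the 4000000000-byte request in the error) and takes the total/free figures directly from the error message. The program name footprint and its variable names are made up for illustration and are not part of f1.F90.

```fortran
program footprint
  use iso_fortran_env, only: int64
  implicit none
  ! Default n from the test code; 64-bit so the byte counts do not overflow.
  integer(int64), parameter :: n = 1000000000_int64
  ! "total/free CUDA memory" values copied from the error message.
  integer(int64), parameter :: total_bytes = 7975862272_int64
  integer(int64), parameter :: free_bytes  = 3861118976_int64
  integer(int64) :: bytes_per_array, device_bytes

  ! storage_size returns bits; divide by 8 to get bytes per default real.
  bytes_per_array = n * int(storage_size(1.0) / 8, int64)
  ! Only a, b and r appear in the data clauses; e stays on the host.
  device_bytes = 3_int64 * bytes_per_array

  print *, 'bytes per array:                  ', bytes_per_array
  print *, 'bytes for a, b and r:             ', device_bytes
  print *, 'one array fits in free memory:    ', bytes_per_array <= free_bytes
  print *, 'three arrays fit in total memory: ', device_bytes <= total_bytes
end program footprint
```

If this arithmetic is right, a single array is already larger than the free device memory reported, and a, b and r together are larger than the total. That would be consistent with my suspicion that the data is being allocated up front with cuMemAlloc on the 2nd machine instead of being handled as managed memory.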
Cheers,
Jyoti