array copy from device to device

bguler · November 2, 2010, 5:13pm

Hi,

I have a problem with a simple CUDA Fortran application; the code is below. It works fine when the array dimensions are small (e.g. Nb=2, Ns=3), but with the larger dimensions shown in the code it fails with the following error:
0: copyover Memcpy <…> FAILED: 30

It looks like the problem occurs while copying the device array, but I don’t understand why it appears only when the array dimensions are large.

Can anyone suggest anything?

Thanks

Bulent

!==============================================!

module gpu
use cudafor

implicit none

real, device, dimension(:,:,:,:), allocatable :: vd,vdp
integer, device :: Ns_dev,Nb_dev

contains

attributes(global) subroutine v_kernel()
implicit none

integer :: i,j,k,s,b,n,q
real :: v1,v2
real, shared :: vtemp(768),vtemp2(768)

i=blockidx%y
j=blockidx%x
k=ceiling(real(j)/(Ns_dev*Nb_dev))
s=ceiling(real(j-(k-1)*Ns_dev*Nb_dev)/Nb_dev)
b=j-(k-1)*(Ns_dev*Nb_dev)-(s-1)*Nb_dev
j=threadidx%x

vtemp(j)=vdp(b,s,k,j)
call syncthreads()
vtemp2(j)=log(real(b+s+k+i-j))+vtemp(j)
call syncthreads()
vd(b,s,k,i)=maxval(vtemp2)
end subroutine v_kernel
end module

!==============================================================================!

program main
use cudafor
use gpu

implicit none

integer :: istat, idevice
integer :: j
type(dim3) :: dimGrid, dimBlock
real :: begin, finish, error
integer, parameter :: Nb=20,Ns=30,Nk=7,Ni=768

idevice=0

istat=cudaSetDevice(idevice)

dimGrid = dim3(Nb*Ns*Nk,Ni,1)
dimBlock = dim3(Ni,1,1)

Ns_dev=Ns
Nb_dev=Nb

allocate(vd(Nb,Ns,Nk,Ni),vdp(Nb,Ns,Nk,Ni))

vdp=0
call cpu_time(begin)
do j=1,5
call v_kernel<<<dimGrid,dimBlock>>>()
istat = cudaThreadSynchronize()
vdp=vd
end do
istat = cudaThreadSynchronize()
call cpu_time(finish)
print*, 'gpu time=', finish-begin, 'seconds'

deallocate(vd,vdp)

end program
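
Kernel launches are asynchronous, so a failure inside a kernel is often reported by a later runtime call, such as a device array copy, rather than by the launch itself. The sketch below shows one way to check for errors right after a launch. The module `demo_kernels` and the kernel `fill` are hypothetical stand-ins and not part of the program above; the runtime calls (`cudaGetLastError`, `cudaThreadSynchronize`, `cudaGetErrorString`) are the ones provided by the `cudafor` module.

```fortran
module demo_kernels
  use cudafor
  implicit none
contains
  ! Hypothetical stand-in kernel: each thread writes its own element.
  attributes(global) subroutine fill(a, n)
    integer, value :: n
    real :: a(n)
    integer :: idx
    idx = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (idx <= n) a(idx) = real(idx)   ! guard against out-of-bounds writes
  end subroutine fill
end module demo_kernels

program check_launch
  use cudafor
  use demo_kernels
  implicit none
  integer, parameter :: n = 1024
  real, device, allocatable :: a_d(:)
  integer :: istat

  allocate(a_d(n))
  call fill<<<(n + 255) / 256, 256>>>(a_d, n)

  ! Errors in the launch itself (for example an invalid block size)
  ! are reported by cudaGetLastError.
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Errors raised while the kernel runs only surface at the next
  ! synchronizing call. Newer toolkits name this cudaDeviceSynchronize.
  istat = cudaThreadSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel error: ', trim(cudaGetErrorString(istat))

  deallocate(a_d)
end program check_launch
```

Checking the status after each launch in the timing loop would show whether the kernel or the subsequent copy is the step that actually fails.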

MatColgrove · November 3, 2010, 12:30am

Hi Bulent,

I’m not sure what’s going on here. When I run the program, it sometimes succeeds and sometimes fails with:

0: copyout Memcpy (host=0x2abfc0ea9020, dev=0x5100000, size=12835200) FAILED: 4(unspecified launch failure)

“unspecified launch failure” typically means that a memory access violation occurred, so we’ll need to look at your kernel to make sure you’re not getting an out-of-bounds error. I’m out of time for today, though; I’ll look again tomorrow.

Mat
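
One way to start looking for an out-of-bounds access is to replay the kernel's index arithmetic on the host. The sketch below is only an illustration under the posted values Nb=20, Ns=30, Nk=7: it loops over every value that `blockidx%x` can take, decomposes it into (b, s, k) with the same formulas as the kernel, and counts any index that leaves the allocated extents or fails to recombine to the original value. The program name and the counter `bad` are made up for this example.

```fortran
program check_index_mapping
  implicit none
  integer, parameter :: Nb = 20, Ns = 30, Nk = 7
  integer :: j, k, s, b, bad

  bad = 0
  do j = 1, Nb*Ns*Nk                      ! j plays the role of blockidx%x
    k = ceiling(real(j) / (Ns*Nb))
    s = ceiling(real(j - (k-1)*Ns*Nb) / Nb)
    b = j - (k-1)*(Ns*Nb) - (s-1)*Nb
    ! Every index must stay inside the allocated extents ...
    if (b < 1 .or. b > Nb .or. s < 1 .or. s > Ns .or. k < 1 .or. k > Nk) bad = bad + 1
    ! ... and recombining (b, s, k) must give back j.
    if (b + (s-1)*Nb + (k-1)*Ns*Nb /= j) bad = bad + 1
  end do
  print *, 'index mismatches: ', bad
end program check_index_mapping
```

This runs with any Fortran compiler and needs no GPU. It covers only the block-index decomposition; the fourth array index (`blockidx%y` and `threadidx%x`) and the launch configuration would need to be checked separately.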
