cuMemcpyDtoH error

RobertsGroup · January 11, 2010, 8:06pm

Hi,

I’m trying to use the accelerator directives on a simple loop, but the compiler gives this error:

call to cuMemcpyDtoH returned error 700: Launch failed

The code that I’m using with the !$acc directive is

!$acc region copyin(r_c(1:resids,1:3,1:2),r_cb(1:resids,1:3,1:2)),&
!$acc copyin(epsij(1:20),dat1(1:resids,1)),&
!$acc copyout(Ener)
do j=1,resids
  Vbb=0.0D0
  Vhp=0.0D0
  mol1=dat1(i,1)
  mol2=dat1(j,1)

  if (mol1.ne.10) then
    dx=(r_cb(i,1,1)-r_c(j,1,2))
    dy=(r_cb(i,2,1)-r_c(j,2,2))
    dz=(r_cb(i,3,1)-r_c(j,3,2))
    r=(dx**2+dy**2+dz**2)**0.50D0
    sigma=(s_c+s_cb)/2.0D0
    rc=sigma*2.0D0**(1.0D0/6.0D0)
    if (r.le.rc) then
      rr=(sigma/r)**6
      Vbb=4.0D0*epsbb*&
      (rr**2-rr+0.250D0)
    end if
  end if

  if ((mol1.ne.10).and.(mol2.ne.10)) then
    dx=r_cb(i,1,1)-r_cb(j,1,2)
    dy=r_cb(i,2,1)-r_cb(j,2,2)
    dz=r_cb(i,3,1)-r_cb(j,3,2)
    r=(dx**2+dy**2+dz**2)**0.5000
    rr=(s_cb/r)**6
    rc=s_cb*2.0**(1.0/6.0)
    eij=(epsij(mol1)*epsij(mol2))**0.50D0

    if (r.le.rc) then
      Vhp=4.0D0*epshp*(rr**2-rr)+&
      epshp*(1.0D0-eij)
    else
      Vhp=4.0D0*epshp*eij*(rr**2-rr)
    end if

  end if

  Ener(j)=Vhp+Vbb

end do
!$acc end region

I don’t understand why it gives me that error, since the arrays are small (the variable resids is not bigger than 10), and my GPU has 1.5GB of memory. Could you help me with this problem?

Thanks,
Marco

MatColgrove · January 11, 2010, 8:39pm

Hi Marco,

Would it be possible for you to send me an example code which exhibits this behavior? If so, please send a report to PGI Customer Service (trs@pgroup.com) and ask them to send it to me.

The error “cuMemcpyDtoH” means there was a failure in copying from the device to the host. It could be an actual error in the copy (for example, if Ener’s size is smaller than resids), but it could also mean the kernel itself has an error. Most likely it’s a problem with the compiler, but I’ll need a full example to tell.
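
As a rough illustration of the size mismatch mentioned above, here is a minimal host-side sketch. It assumes Ener is an allocatable array and that resids is 10; both are assumptions, since the declarations are not shown in the post, and the program name is made up.

```fortran
program check_copyout_size
  implicit none
  ! Assumed value: the post only says resids is "not bigger than 10".
  integer, parameter :: resids = 10
  ! Assumed declaration: the original declaration of Ener is not shown.
  double precision, allocatable :: Ener(:)

  allocate(Ener(resids))

  ! The loop writes Ener(1:resids) and copyout(Ener) moves the array back
  ! to the host, so Ener must hold at least resids elements.
  if (size(Ener) < resids) then
    print *, 'Ener is too small: ', size(Ener), ' < ', resids
    stop 1
  end if
  print *, 'Ener size OK: ', size(Ener)
end program check_copyout_size
```

A check like this only rules out the copy-size explanation; it says nothing about a fault inside the kernel itself.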

Thanks,
Mat

RobertsGroup · January 14, 2010, 4:04pm

Hi Mat,
I’ve been checking the code, but I haven’t found any mistakes in terms of vector sizes or undeclared variables. I sent the sample code the same day that you asked for it. Have you checked it? Is it a problem with the compiler?

Thanks,
Marco

MatColgrove · January 14, 2010, 4:36pm

Hi Marco,

I went through all of the TRS mail back to the 10th, but don’t see any messages from you. It’s possible that it got stopped by the corporate spam filter or that the attachment was too big. I’ll send you an email directly.

Mat

MatColgrove · January 14, 2010, 6:42pm

Hi Marco,

Thank you for the code. This does appear to be a compiler error where a bad value is being used when initializing the cached copies of a variable. I have submitted this problem to our engineers as TPR#16500.

The error does appear to have been found and fixed in our internal development compiler, and I have requested that this fix be added to our next release (10.2), due at the beginning of February. The workaround for this issue is to use the flag “-ta=nvidia,oldcg”.

Best Regards,
Mat

RobertsGroup · January 14, 2010, 8:44pm

Thanks Mat. Using the “-ta=nvidia,oldcg” flag, the code runs correctly. However, I still have a problem. That piece of code is just a subroutine in a main code. When I copy that subroutine, with the same accelerator directives, it gives me another error:

“call ctxSynchronize returned error 700: Launch failed”

What does that error mean?

Thanks,
Marco

MatColgrove · January 14, 2010, 9:28pm

Hi Marco,

It’s a generic error, so it could be caused by a number of things. Typically, though, I’ve seen it when there was a seg fault copying the data to the device or a seg fault in the kernel.

Mat

RobertsGroup · January 14, 2010, 9:43pm

Mat,
How could that happen if the segment runs perfectly with the acc directive when it is isolated, and it is the only segment in the code that uses an accelerator region?

Could you give me any hint to solve that issue?

Thanks,
Marco

MatColgrove · January 14, 2010, 11:44pm

Array bounds violation? Feel free to send me the full source if you’re able.
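
One way to look for a bounds violation on the host before the region runs is to validate any value that is later used as an array index. The sketch below is only an illustration: the array names follow the posted code (epsij is declared as epsij(1:20) in the region, and mol1/mol2 come from dat1), but resids and the contents of dat1 are made up.

```fortran
program check_indices
  implicit none
  integer, parameter :: resids = 4      ! hypothetical size
  integer :: dat1(resids,1)
  integer :: j, bad

  dat1(:,1) = (/ 3, 10, 20, 7 /)        ! made-up residue types

  ! epsij(mol1) and epsij(mol2) are read inside the region, so every
  ! value taken from dat1(:,1) must lie within the bounds 1..20.
  bad = 0
  do j = 1, resids
    if (dat1(j,1) < 1 .or. dat1(j,1) > 20) then
      print *, 'dat1(', j, ',1) =', dat1(j,1), ' is outside 1..20'
      bad = bad + 1
    end if
  end do
  print *, 'out-of-range indices found: ', bad
end program check_indices
```

The same idea applies to i and j against the declared extents of r_c, r_cb, and Ener in the full code.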

RobertsGroup · January 15, 2010, 3:35pm

Thanks Mat. I’d appreciate it if you could help me with that, since I don’t understand why it gives me that error when the subroutine runs perfectly after I copy it to a different project. I’ll send both codes (the full code which gives me the error, and the code with just the subroutine) to the same email.

Thanks again,
Marco

Karloss · January 20, 2010, 12:16pm

Hi Marco,

I have had an error with similar behavior. I managed to bypass/fix it by declaring the inner (non-parallel) loops of my kernel as sequential using !$acc do seq.
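
For anyone unfamiliar with the directive, a minimal sketch of what that looks like is below. It is not the code from either post: the program name, array names, and sizes are hypothetical, and the region uses the same PGI accelerator directive style as the original question.

```fortran
program acc_do_seq_sketch
  implicit none
  integer, parameter :: n = 8, m = 4    ! hypothetical sizes
  double precision :: a(n,m), total(n)
  integer :: i, k

  a = 1.0D0

  !$acc region copyin(a(1:n,1:m)) copyout(total(1:n))
  do i = 1, n                 ! outer loop: left for the compiler to parallelize
    total(i) = 0.0D0
    !$acc do seq
    do k = 1, m               ! inner loop: marked to run sequentially for each i
      total(i) = total(i) + a(i,k)
    end do
  end do
  !$acc end region

  print *, total
end program acc_do_seq_sketch
```

A compiler that does not recognize the directives treats them as comments, so the same source also builds as an ordinary host program for comparison.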

Good luck!

Karl

Tuan · January 26, 2010, 7:19pm

May I ask you what is “oldcg” for?

Tuan

MatColgrove · January 26, 2010, 10:21pm

Hi Tuan,

“oldcg” is being used as a workaround for a bug in the “newcg”. “cg” stands for code generator. As of 10.0, we added a new code generator targeting the NVIDIA GPU. Unfortunately, like many new features, it has bugs. In this case, the bug did not occur in the old code generator from the 9.0 release. Note that the “oldcg” flag is not documented and will eventually go away.

Mat
