device array values modified without reason (garbage?)

Hi everybody

I am experiencing a problem with a device array. I declare it in a module as follows:

real, allocatable, save, device :: d_coefx(:)

The module contains a subroutine that copies the host array to the device like this:

d_coefx(:)=p%modeling%coefx(:)

After that, in the same subroutine, a kernel is launched inside a loop:

time: do nstep = 1, p%nt
   call kernel<<<dimGrid, dimBlock>>>(p%modeling%order, p%nmin1, p%nmax1, p%nmin2, p%nmax2, d_p2, &
                                      d_p3, d_v2, d_coefx)
   xc(:) = d_coefx(:)
   if (nstep .ge. 600 .and. nstep .le. 605) then
      print*, 'nstep', nstep
      print*, xc
   end if
end do time

The values of the device array are not modified inside the kernel; they are only read. After each launch they are copied to a host array named xc, which is then printed. At step 603 one of the values has changed, at step 604 two values have changed, and at step 605 three values have changed. This is the output:

step          600
  -3.6168982E-05   5.7870371E-04  -1.0850694E-03   5.7870371E-04 
  -3.6168982E-05
step          601
  -3.6168982E-05   5.7870371E-04  -1.0850694E-03   5.7870371E-04 
  -3.6168982E-05
 step           602
  -3.6168982E-05   5.7870371E-04  -1.0850694E-03   5.7870371E-04 
  -3.6168982E-05
 step            603
  -0.4724865    5.7870371E-04  -1.0850694E-03   5.7870371E-04 
  -3.6168982E-05
 step            604
  -0.4724865      -0.4724865           -1.0850694E-03   5.7870371E-04 
  -3.6168982E-05
step             605
  -0.4724865      -0.4724865        495.1141       5.7870371E-04 
  -3.6168982E-05

I really don’t understand where these values come from, since the array is only used in that kernel and nothing ever writes to it. Before step 600 the values stay unmodified. Does anyone have a tip on how to catch the error?

Hi oscar_ml,

Unfortunately, there’s not a lot to go on here, but my first guess would be to look for an out-of-bounds write in your kernel. It’s interesting that the bad values are being added progressively between steps 603 and 605 (does it continue for steps 606 and 607?). Since you state that d_coefx is read-only on the device, I suspect that a write to some other variable is running past its bounds and landing in d_coefx.

Of course, this is just a guess. Are you able to post a reproducing example, or send one to PGI Customer Service (trs@pgroup.com)? I or another PGI person will take a look and see if we can find the error.
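
To pin down the first step at which d_coefx changes, one option is to compare the device copy against the untouched host array after every launch and stop at the first mismatch. The following is only a sketch under assumptions: the module name coefx_check and the helper check_coefx are hypothetical, and it assumes that the host array p%modeling%coefx is never modified after the initial copy to the device.

```fortran
module coefx_check
   use cudafor
   implicit none
contains
   ! Hypothetical helper: stops at the first time step where the device
   ! copy of the coefficients no longer matches the host reference.
   subroutine check_coefx(nstep, d_coef, h_ref)
      integer, intent(in) :: nstep
      real, device, intent(in) :: d_coef(:)   ! device array under suspicion
      real, intent(in) :: h_ref(:)            ! host values it was copied from
      real :: h_copy(size(h_ref))
      integer :: i, istat

      istat = cudaDeviceSynchronize()         ! wait for the kernel to finish
      h_copy = d_coef                         ! device-to-host copy
      do i = 1, size(h_ref)
         ! Exact comparison is intended: an untouched copy is bit-identical.
         if (h_copy(i) /= h_ref(i)) then
            print *, 'first mismatch at step', nstep, 'index', i, &
                     'expected', h_ref(i), 'got', h_copy(i)
            stop
         end if
      end do
   end subroutine check_coefx
end module coefx_check
```

Inside the time loop this would be called right after the kernel launch, for example as call check_coefx(nstep, d_coefx, p%modeling%coefx). Knowing the exact step and index of the first change can help relate the corruption to whatever else the kernel writes at that step.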

-Mat

> It’s interesting that the bad values are being progressively added between steps 603 and 605 (does it continue for steps 606 and 607?).

Yes, it does:

isnap            602
-3.6168982E-05   5.7870371E-04  -1.0850694E-03   5.7870371E-04 
  -3.6168982E-05
 isnap            603
  -0.4724865       5.7870371E-04  -1.0850694E-03   5.7870371E-04 
  -3.6168982E-05
 isnap            604
  -0.4724865      -0.4724865      -1.0850694E-03   5.7870371E-04 
  -3.6168982E-05
 isnap            605
  -0.4724865      -0.4724865        495.1141       5.7870371E-04 
  -3.6168982E-05
 isnap            606
  -0.4724865      -0.4724865        495.1141        495.1141     
  -3.6168982E-05
 isnap            607
  -0.4724865      -0.4724865        495.1141        495.1141     
    495.1141



> Since you state that d_coefx is read-only on the device, that’s why I’m thinking that there’s some type of write to another variable that’s crossing over its bounds and writing into d_coefx.

How can I test your hypothesis when the -Mbounds flag is disabled by CUDA Fortran?

> Of course, this is just a guess. Are you able to post or send to PGI Customer Service (trs@pgroup.com) a reproducing example? Myself or another PGI person will take a look and see if we can find the error.

OK. I’ll try to write a small example.

> How can I test your hypothesis when the -Mbounds flag is disabled by CUDA Fortran?

Correct, there’s no bounds checking support on the device. You’ll need to manually inspect the code or run it under the cuda-gdb debugger.
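
Since -Mbounds is not available on the device, manual inspection can be backed up by hand-written guards around each array write in the kernel. What follows is a minimal sketch, not the code from this thread: the module guarded_access, the device variable d_oob, and the function in_range are all hypothetical names, and the guard has to be applied by hand to every index expression that is used for a write.

```fortran
module guarded_access
   use cudafor
   implicit none
   ! Hypothetical debug record: (1) = tag of the offending write,
   ! (2) = the out-of-range index. The host must set it to 0 before the
   ! launches; it stays 0 as long as every guarded index is in range.
   integer, device :: d_oob(2)
contains
   ! Returns .true. when idx lies in [lo, hi]. Otherwise it records the
   ! caller-chosen tag and the bad index, then returns .false.
   ! If several threads fail, the record holds whichever wrote last.
   attributes(device) function in_range(idx, lo, hi, tag) result(ok)
      integer, value :: idx, lo, hi, tag
      logical :: ok
      integer :: old

      ok = (idx >= lo .and. idx <= hi)
      if (.not. ok) then
         old = atomicexch(d_oob(1), tag)
         old = atomicexch(d_oob(2), idx)
      end if
   end function in_range
end module guarded_access
```

In a kernel that has access to this module, a write such as d_p3(i, j) = ... could be wrapped as if (in_range(i, p_nmin1, p_nmax1, 1)) d_p3(i, j) = ..., with a different tag per write and with the bounds taken from the actual allocation of each array (p_nmin1 and p_nmax1 are placeholders here). On the host, d_oob = 0 before the time loop and a copy back after each launch (h_oob = d_oob) show whether any guard fired. Threads that are merely padding in the grid should return before reaching the guards so that they do not raise false alarms.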

> OK. I’ll try to write a small example.

That would be very helpful.

Thanks!
Mat

> Correct, there’s no bounds checking support on the device. You’ll need to manually inspect the code or run in under the cuda-gdb debugger.

cuda-gdb didn’t help much:

cuda-gdb --args ./exe input2.dat
NVIDIA (R) CUDA Debugger
8.0 release
Portions Copyright (C) 2007-2016 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/oscar/Área de Trabalho/yemoja_backup/shell-psofwi/src/pso_dvtypes/mpi/grid/cuda/cuda2/reproducible/exe...done.
(cuda-gdb) r
Starting program: /home/oscar/Área de Trabalho/yemoja_backup/shell-psofwi/src/pso_dvtypes/mpi/grid/cuda/cuda2/reproducible/./exe input2.dat
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
fatal:  All CUDA devices are used for display and cannot be used while debugging. (error code = CUDBG_ERROR_ALL_DEVICES_WATCHDOGGED(0x18)
(cuda-gdb) bt
Target is executing.
(cuda-gdb)

I have already sent you a reproducing example.

Hi Mat

Have you had time to look at my code? Please let me know as soon as you find anything.

PS: I hope I am not making a silly mistake

Thanks