Hi! I have two codes performing the same operation (calculating the dt value for each spatial cell, then a 2D parallel reduction to find the minimum value):
- The first one works correctly. It reads 3 arrays (h, u and v), calculates the dt for each cell and then performs the reduction;
- The second one randomly gives the error stated in the subject. It differs from the first one in having a part which calculates the h, u and v arrays from other data (given as input), plus some other parts which calculate the evolution of the variables h, u and v over time.
Since I can’t share the second code right now, I’d like to know where the copyout error could come from.
Thanks in advance,
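For readers following the thread without the source, the operation described above (a per-cell dt followed by a 2D block-wise minimum reduction) might look roughly like the sketch below. The poster's code was not shared, so everything here is an assumption: the module and kernel names, the 16x16 block shape, and in particular the shallow-water CFL formula used for dt, which is only guessed from the variable names h, u and v.

```fortran
module dt_reduction
  use cudafor
  implicit none
  integer, parameter :: BX = 16, BY = 16   ! assumed block shape; BX*BY must be a power of two
  real, parameter :: g = 9.81              ! assumed: gravity, for a shallow-water wave speed
contains
  ! Each block writes the minimum dt of its tile to blockmin (global memory).
  attributes(global) subroutine dt_block_min(h, u, v, nx, ny, dx, dy, cfl, blockmin)
    integer, value :: nx, ny
    real, value :: dx, dy, cfl
    real, device :: h(nx,ny), u(nx,ny), v(nx,ny)
    real, device :: blockmin(*)
    real, shared :: s(BX*BY)               ! scratch space, visible only inside one block
    integer :: i, j, t, stride
    real :: c, a, b, dt

    i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    j = (blockIdx%y - 1)*blockDim%y + threadIdx%y
    t = (threadIdx%y - 1)*blockDim%x + threadIdx%x   ! linear thread index, 1..BX*BY

    ! Threads outside the domain contribute a neutral value for min().
    dt = huge(dt)
    if (i <= nx .and. j <= ny) then
      c = sqrt(g*max(h(i,j), 0.0))
      a = abs(u(i,j)) + c
      b = abs(v(i,j)) + c
      if (a > 0.0) dt = min(dt, cfl*dx/a)
      if (b > 0.0) dt = min(dt, cfl*dy/b)
    end if
    s(t) = dt
    call syncthreads()

    ! Tree reduction in shared memory: halve the active range each pass.
    stride = (blockDim%x*blockDim%y)/2
    do while (stride >= 1)
      if (t <= stride) s(t) = min(s(t), s(t + stride))
      call syncthreads()
      stride = stride/2
    end do

    ! Only the per-block result leaves shared memory.
    if (t == 1) blockmin((blockIdx%y - 1)*gridDim%x + blockIdx%x) = s(1)
  end subroutine dt_block_min
end module dt_reduction

program demo
  use cudafor
  use dt_reduction
  implicit none
  integer, parameter :: nx = 100, ny = 80  ! arbitrary test size
  real :: h(nx,ny), u(nx,ny), v(nx,ny), dt
  real, device :: h_d(nx,ny), u_d(nx,ny), v_d(nx,ny)
  real, device, allocatable :: blockmin_d(:)
  real, allocatable :: blockmin(:)
  type(dim3) :: grid, tblock

  h = 1.0; u = 0.5; v = 0.2                ! made-up uniform test data
  h_d = h; u_d = u; v_d = v

  tblock = dim3(BX, BY, 1)
  grid = dim3((nx + BX - 1)/BX, (ny + BY - 1)/BY, 1)
  allocate(blockmin_d(grid%x*grid%y), blockmin(grid%x*grid%y))

  call dt_block_min<<<grid, tblock>>>(h_d, u_d, v_d, nx, ny, 1.0, 1.0, 0.5, blockmin_d)

  blockmin = blockmin_d                    ! copy per-block minima from global memory to the host
  dt = minval(blockmin)                    ! finish the reduction on the host
  print *, 'dt =', dt
end program demo
```

The final step over the per-block minima is done on the host only to keep the sketch short; a second kernel launch over `blockmin_d` would serve the same purpose.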
Is this CUDA Fortran? What version of the compiler are you using? How are h, u, and v declared? Are they module variables that are being passed into the kernel?
Looking through our problem reports, I see one similar error in TPR#16767. In that case there was a problem passing module device variables as arguments. This error was fixed in the 11.0 release.
The code is written in CUDA Fortran and compiled with the PGI Accelerator compiler, version 11.9. To answer the other questions, I made a simpler, independent code: this one reproduces the error on my machine every time I run it.
Note that this code contains only a portion of the code I’m supposed to port to CUDA Fortran. In fact, it reads the arrays from files generated by the “original code” and stops just after a couple of subroutines (the “original code” has more subroutines).
As usual, I uploaded the VS 2008 project to MediaFire:
Thanks for your help,
It looks like I found the problem: I had a shared temporary array which was used to store the value of a variable that (later on in the code) is copied to the host from the device. It seems this operation gives the error reported in the thread’s title.
Hope this helps someone.
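A general note for anyone who lands here with the same symptom: kernel launches are asynchronous with respect to the host, so an error raised inside a kernel is often reported by a later call, such as the device-to-host copy. One way to narrow it down is to check for errors right after each launch. The sketch below is only an illustration of that pattern, not the poster's code; the module name, the `fill` kernel and the `check` helper are hypothetical.

```fortran
module check_mod
  use cudafor
  implicit none
contains
  ! Hypothetical stand-in for a real kernel.
  attributes(global) subroutine fill(a, n)
    integer, value :: n
    real, device :: a(n)
    integer :: i
    i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    if (i <= n) a(i) = real(i)
  end subroutine fill

  ! Hypothetical helper: report launch errors, then wait for the kernel
  ! so that execution errors show up here instead of at a later copy.
  subroutine check(label)
    character(*), intent(in) :: label
    integer :: istat
    istat = cudaGetLastError()
    if (istat == cudaSuccess) istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
      print *, trim(label), ': ', trim(cudaGetErrorString(istat))
      stop 1
    end if
  end subroutine check
end module check_mod

program demo
  use cudafor
  use check_mod
  implicit none
  integer, parameter :: n = 1000           ! arbitrary test size
  real :: a(n)
  real, device :: a_d(n)

  call fill<<<(n + 255)/256, 256>>>(a_d, n)
  call check('after fill kernel')

  a = a_d                                  ! the copy back to the host
  print *, a(n)
end program demo
```

On toolchains older than CUDA 4.0, `cudaThreadSynchronize` played the role of `cudaDeviceSynchronize`. The extra synchronization slows the program down, so a check like this is best kept for debugging.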