CopyOut error caused by an undeclared variable in a device subroutine

Using PGI Visual Fortran 2008 v10.9 on Win7

I made a mistake by forgetting to declare an input variable in a global device subroutine. The code still compiles, because the undeclared variable is implicitly typed as REAL by default. After calling the subroutine with a device array as input, copying the array back to the host fails and the program exits. This wouldn’t have happened if I had put “IMPLICIT NONE” in the device subroutine.

This code reproduces the error. It doesn’t really do anything, as I’m just trying to find out what’s wrong.

      MODULE CUDADEV
      USE CUDAFOR
     
      PARAMETER (ISIZE=4,JSIZE=10)     
      CONTAINS
     
      ATTRIBUTES(GLOBAL) SUBROUTINE ABCD(M,THREAL,THINT)
      REAL,DEVICE :: M(ISIZE,JSIZE)
      INTEGER :: I,J
!     NO DECLARATION OF INPUT VARIABLES : THREAL, THINT
     
      I = (blockidx%x-1) * 16 + threadidx%x
      J = (blockidx%y-1) * 16 + threadidx%y
     
      IF (I <= ISIZE .AND. J <= JSIZE) THEN
        IF (M(I,J) > THREAL) THEN
          M(I,J) = -1
        END IF
        IF (J == THINT) THEN
          M(I,J) = 1
        END IF
      END IF
      
      RETURN
      END SUBROUTINE
     
      END MODULE
     
      PROGRAM MAIN
      USE CUDAFOR
      USE CUDADEV
     
      REAL,DEVICE :: Mdev(ISIZE,JSIZE)
      REAL :: M(ISIZE,JSIZE)
     
      DO I=1,MIN(ISIZE,JSIZE)
        M(I,I) = 5
      END DO
     
      Mdev = M
      CALL ABCD<<<dim3(ISIZE/16+1,JSIZE/16+1,1),dim3(16,16,1)>>>(Mdev,2.0,3.0)
      
      WRITE (*,*) cudaGetErrorString(cudaGetLastError())
      
      M = Mdev
	
      WRITE(*,*) M
      STOP
      
      END PROGRAM

And here’s the output:

no error

0: copyout Memcpy (host=0x4aaa48, dev=0x5100000, size=160) FAILED: 30(unknown error)

In my opinion, this error message just doesn’t help much.

Hi Mr.Smith,

I see a few issues.

In Fortran, the default is to pass arguments by reference. This includes constants. So here you’re passing host memory references to the constant values “2.0” and “3.0”. Since the device cannot dereference host memory, the kernel aborts abnormally. Instead, you need to pass these arguments by value.

      ATTRIBUTES(GLOBAL) SUBROUTINE ABCD(M,THREAL,THINT)
      REAL,DEVICE :: M(ISIZE,JSIZE)
      REAL,VALUE :: THREAL, THINT
      INTEGER :: I,J

The second is that your launch configuration should be changed to:

CALL ABCD<<<dim3((ISIZE+15)/16,(JSIZE+15)/16,1),dim3(16,16,1)>>>(Mdev,2.0,3.0)
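
The intended block count is the ceiling division (N+15)/16, and the parentheses matter: in Fortran integer arithmetic 15/16 truncates to 0, so N+15/16 is just N. A small host-only sketch of the arithmetic (the program name GRIDSIZE is made up for illustration; the sizes are the ones from the module above):

```fortran
PROGRAM GRIDSIZE
  IMPLICIT NONE
  INTEGER, PARAMETER :: ISIZE = 4, JSIZE = 10

  ! Without parentheses, 15/16 is evaluated first and truncates to 0,
  ! so this is just ISIZE: 4 blocks requested where 1 would do.
  WRITE (*,*) 'ISIZE+15/16   =', ISIZE+15/16

  ! Ceiling division: smallest block count whose 16 threads cover N.
  WRITE (*,*) '(ISIZE+15)/16 =', (ISIZE+15)/16
  WRITE (*,*) '(JSIZE+15)/16 =', (JSIZE+15)/16

  ! N/16+1 also covers N, but adds a spare block when N is a multiple of 16.
  WRITE (*,*) '32/16+1       =', 32/16+1
  WRITE (*,*) '(32+15)/16    =', (32+15)/16
END PROGRAM GRIDSIZE
```

Extra blocks are harmless in this kernel because of the bounds check on I and J, so the choice between the two working forms only affects how many idle threads are launched.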

Finally, you need to add a call to “cudaThreadSynchronize()” before calling “cudaGetErrorString”. A kernel launch is asynchronous, so when you ask for the last error right after the launch, you’re actually getting the last error issued before the kernel ran. Granted, the error message returned isn’t much better, but it’s coming from the CUDA driver, so there’s not much we can do to improve it.

      ...
      CALL ABCD<<<dim3((ISIZE+15)/16,(JSIZE+15)/16,1),dim3(16,16,1)>>>(Mdev,2.0,3.0)
      istat = cudaThreadSynchronize()
      WRITE (*,*) cudaGetErrorString(cudaGetLastError())
      ...



% pgf90 test1.cuf; a.out
 unspecified launch failure                                                                                                      
0: copyout Memcpy (host=0x6bb180, dev=0x100000, size=160) FAILED: 4(unspecified launch failure)
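
Putting the three changes together, the corrected program might look roughly like the sketch below. It has not been run here, and a few things are assumptions beyond what is discussed above: IMPLICIT NONE throughout (which turns the missing declarations into compile-time errors), zero-initializing the host array, using blockdim instead of the literal 16, and the istat variable name. Newer CUDA toolkits name the synchronization call cudaDeviceSynchronize; cudaThreadSynchronize is kept here to match the rest of this thread.

```fortran
MODULE CUDADEV
  USE CUDAFOR
  IMPLICIT NONE
  INTEGER, PARAMETER :: ISIZE = 4, JSIZE = 10
CONTAINS
  ATTRIBUTES(GLOBAL) SUBROUTINE ABCD(M, THREAL, THINT)
    REAL, DEVICE :: M(ISIZE,JSIZE)
    REAL, VALUE  :: THREAL, THINT   ! scalars are passed by value to the kernel
    INTEGER :: I, J

    I = (blockidx%x-1) * blockdim%x + threadidx%x
    J = (blockidx%y-1) * blockdim%y + threadidx%y

    ! Threads outside the array bounds do nothing.
    IF (I <= ISIZE .AND. J <= JSIZE) THEN
      IF (M(I,J) > THREAL) M(I,J) = -1
      IF (J == THINT) M(I,J) = 1
    END IF
  END SUBROUTINE ABCD
END MODULE CUDADEV

PROGRAM MAIN
  USE CUDAFOR
  USE CUDADEV
  IMPLICIT NONE
  REAL, DEVICE :: Mdev(ISIZE,JSIZE)
  REAL :: M(ISIZE,JSIZE)
  INTEGER :: I, istat

  M = 0.0                           ! assumption: start from a defined host array
  DO I = 1, MIN(ISIZE,JSIZE)
    M(I,I) = 5
  END DO

  Mdev = M                          ! host -> device copy
  CALL ABCD<<<dim3((ISIZE+15)/16,(JSIZE+15)/16,1),dim3(16,16,1)>>>(Mdev,2.0,3.0)

  ! Wait for the kernel so that the error query reflects this launch.
  istat = cudaThreadSynchronize()
  WRITE (*,*) cudaGetErrorString(cudaGetLastError())

  M = Mdev                          ! device -> host copy
  WRITE (*,*) M
END PROGRAM MAIN
```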

Hope this helps,
Mat