Multiple declarations in generated C for CUDA FORTRAN

Hello,

I’m using PGF90 11.7 with CUDA FORTRAN on Linux (OpenSuse 11.3, x86_64, 2.6.34.8, NVIDIA Quadro 4000, CUDA 4.0, drivers 275.09.07).
I have worked on small programs for a few weeks now and haven’t run into much trouble (apart from some issues already raised on this forum). But now that I’m trying to convert an existing program to CUDA, I’m getting strange errors: the generated C code contains multiple declarations and thus cannot be compiled by pgnvd.

      SUBROUTINE  prodscal(  rnorm,  a,  b  )
      USE numerics
      IMPLICIT  NONE
      REAL(rp), DEVICE, INTENT(OUT)   ::  rnorm
      REAL(rp), DEVICE, DIMENSION(  sx-1:ex+1,  sy-1:ey+1,  sz-1:ez+1),  INTENT(IN)  ::  a
      REAL(rp), DEVICE, DIMENSION(  sx-1:ex+1,  sy-1:ey+1,  sz-1:ez+1),  INTENT(IN)  ::  b
      INTEGER   ::  i,  j,  k
!$cuf kernel do <<<*,*>>>
      do k = 1, (sx+ex+3)*(sy+ey+3)*(sz+ez+3)
          rnorm = rnorm + a(k) * b(k)
      end do
      RETURN
      END SUBROUTINE prodscal

If I do

pgf90 -Mcuda -ta=nvidia:4.0 -Minfo -c prodscal.f90

I get:

<errors about multiple declarations in a /tmp/<random>.gpu file>
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code       0 (prodscal.f90: 32)
PGF90/x86-64 Linux 11.7-0: compilation aborted

The generated .gpu file contains (excerpt)

prodscal_35_gpu(
    int tc1,
    signed char* p1,
    signed char* p7,
    int u1,
    int u2,
    int u3,
    int x2,
    int x3,
    int x4,
    int x5,
    int x6,
    signed char* p7)
{

In another version, I even get this (it’s a generated header):

struct DT1_50{int m0;int m4;int m8;};
struct DT1_56{long long m0;};
struct DT1_62{long long m0;};
struct DT1_68{long long m0;};
struct DT1_74{struct DT1_68 m0;};
struct DT1_50{int m0;int m4;int m8;};
struct DT1_56{long long m0;};
struct DT1_62{long long m0;};
struct DT1_68{long long m0;};
struct DT1_74{struct DT1_68 m0;};
extern "C" __global__ void prodscal(signed char* _prnorm,signed char* _pa,signed char* _pb)

which obviously results in

/tmp/pgcudaforA2feohoyjhk8.h(6): error: invalid redeclaration of type name "DT1_50"
(1): here

My code may not be straightforward or even correct, but even so, such problems should be detected before code generation, which is why I’m reporting this.

Antoine

PS: The original code comes from CRIHAN [fr].

Hi Antoine,

Internal compiler errors (ICE) are always problems with the compiler. Often, the output produced by the ICE is only meaningful to our compiler engineers within the context of your program.

Can you please post a reproducing example or send one to PGI Customer Service (trs@pgroup.com)?

When I try your code, I get syntax errors due to the missing numerics module and the dimension mismatch of a and b. After fixing these issues, I do get a different error, caused by rnorm being a device variable rather than a host variable. You can try removing the device attribute from rnorm to see if it’s the same problem; otherwise, I’ll need a reproducing example.
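
For readers following along, here is a minimal sketch of what those two fixes might look like. It is not the code from the thread's attachments. The numerics module is never shown, so the stand-in below (the kind rp and the bounds sx, ex, sy, ey, sz, ez with their values) is hypothetical. The sketch indexes a and b with three subscripts to match their declared rank, keeps rnorm on the host, and zeroes it before the loop. It sums over the full declared extent including the halo cells; the interior range sx:ex may be what was actually intended.

```fortran
! Hypothetical stand-in for the numerics module, which the thread does not show.
MODULE numerics
  IMPLICIT NONE
  INTEGER, PARAMETER :: rp = KIND(1.0D0)   ! assumed working precision
  INTEGER :: sx = 1, ex = 32               ! assumed index bounds
  INTEGER :: sy = 1, ey = 32
  INTEGER :: sz = 1, ez = 32
END MODULE numerics

SUBROUTINE prodscal(rnorm, a, b)
  USE numerics
  USE cudafor
  IMPLICIT NONE
  ! rnorm is a host scalar here (no DEVICE attribute).
  REAL(rp), INTENT(OUT) :: rnorm
  REAL(rp), DEVICE, DIMENSION(sx-1:ex+1, sy-1:ey+1, sz-1:ez+1), INTENT(IN) :: a
  REAL(rp), DEVICE, DIMENSION(sx-1:ex+1, sy-1:ey+1, sz-1:ez+1), INTENT(IN) :: b
  INTEGER :: i, j, k

  ! The accumulator must be defined before the reduction loop.
  rnorm = 0.0_rp

  ! Three nested loops, one subscript per dimension of a and b.
  !$cuf kernel do(3) <<<*,*>>>
  DO k = sz-1, ez+1
     DO j = sy-1, ey+1
        DO i = sx-1, ex+1
           rnorm = rnorm + a(i,j,k) * b(i,j,k)
        END DO
     END DO
  END DO
END SUBROUTINE prodscal
```

Because the dummy arrays carry the DEVICE attribute, callers need an explicit interface to this routine, for example by placing it in a module.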

Thanks,
Mat

Thanks Mat. Here’s a complete example:

numerics.f90
prodscal1.CUF
prodscal2.CUF

$ pgf90 -Mcuda -ta=nvidia -g -Mprof=lines -Minfo -c numerics.f90



$ pgf90 -Mcuda -ta=nvidia -g -Mprof=lines -Minfo -c prodscal1.CUF 
prodscal:
     11, CUDA kernel generated
         11, !$cuf kernel do <<<1>>>
         12, Sum reduction generated for rnorm
/tmp/pgcudafor5imgRjvmLpV-.gpu(62): error: invalid type conversion

/tmp/pgcudafor5imgRjvmLpV-.gpu(62): error: invalid type conversion

/tmp/pgcudafor5imgRjvmLpV-.gpu(66): error: invalid type conversion

/tmp/pgcudafor5imgRjvmLpV-.gpu(66): error: invalid type conversion

/tmp/pgcudafor5imgRjvmLpV-.gpu(70): error: invalid type conversion

/tmp/pgcudafor5imgRjvmLpV-.gpu(70): error: invalid type conversion

/tmp/pgcudafor5imgRjvmLpV-.gpu(81): error: invalid type conversion

/tmp/pgcudafor5imgRjvmLpV-.gpu(81): error: invalid type conversion

8 errors detected in the compilation of "/tmp/pgnvdnjmgNIq13oPj.nv0".
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code       0 (prodscal1.CUF: 16)
PGF90/x86-64 Linux 11.7-0: compilation aborted



$ pgf90 -Mcuda -ta=nvidia -g -Mprof=lines -Minfo -c prodscal2.CUF 
PGF90-W-0155-The number of subscripts is less than the rank of a (prodscal2.CUF: 10)
PGF90-W-0155-The number of subscripts is less than the rank of b (prodscal2.CUF: 10)
  0 inform,   2 warnings,   0 severes, 0 fatal for prodscal
prodscal:
      9, CUDA kernel generated
          9, !$cuf kernel do <<<256>>>
         10, Sum reduction generated for rnorm
/tmp/pgcudaforRpmgbLxjN9wS.gpu(24): error: duplicate parameter name

/tmp/pgcudaforRpmgbLxjN9wS.gpu(40): error: invalid type conversion

/tmp/pgcudaforRpmgbLxjN9wS.gpu(40): error: invalid type conversion

/tmp/pgcudaforRpmgbLxjN9wS.gpu(91): error: duplicate parameter name

4 errors detected in the compilation of "/tmp/pgnvdwqmgcJwUeaR7.nv0".
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code       0 (prodscal2.CUF: 12)
PGF90/x86-64 Linux 11.7-0: compilation aborted

And here are the corresponding generated files:

pgcudafor5imgRjvmLpV-.gpu
pgcudaforRpmgbLxjN9wS.gpu

I’m sending this to your Customer Service.

Although I didn’t get any error message saying that rnorm shouldn’t be on the device, I tried leaving it on the host, and it works! (But I don’t see why.)

Edit: here is another example of “invalid type conversion”. It happens because the scalars are declared on the device (everything is fine if they are on the host).

      SUBROUTINE  saxpy(  a,  scala, b,  scalb  )
      USE numerics
      USE cudafor
      IMPLICIT  NONE
      REAL(rp), DEVICE, DIMENSION(sx-1:ex+1,sy-1:ey+1,sz-1:ez+1),  INTENT(INOUT)  ::  a
      REAL(rp), DEVICE, DIMENSION(sx-1:ex+1,sy-1:ey+1,sz-1:ez+1),  INTENT(IN)     ::  b
      REAL(rp), DEVICE, INTENT(IN)  ::  scala
      REAL(rp), DEVICE, INTENT(IN)  ::  scalb
      INTEGER   ::  i,  j,  k
      !$cuf kernel do(3) <<<*,*>>>
      DO  k  =  sz,  ez
         DO  j  =  sy,  ey
            DO  i  =  sx,  ex
               a(i,j,k) = scala * a(i,j,k) + scalb * b(i,j,k)
            END  DO
         END  DO
      END  DO
      RETURN
      END SUBROUTINE saxpy

But I don’t have other “multiple declarations” problems.
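
A sketch of the host-scalar variant described above, for comparison. The only change to the routine is that scala and scalb lose the DEVICE attribute. The numerics module shown here is a hypothetical stand-in (the real one is not part of this post), so the kind rp and the bound values are assumptions.

```fortran
! Hypothetical stand-in for the numerics module, which is not shown in the thread.
MODULE numerics
  IMPLICIT NONE
  INTEGER, PARAMETER :: rp = KIND(1.0D0)   ! assumed working precision
  INTEGER :: sx = 1, ex = 32               ! assumed index bounds
  INTEGER :: sy = 1, ey = 32
  INTEGER :: sz = 1, ez = 32
END MODULE numerics

SUBROUTINE saxpy(a, scala, b, scalb)
  USE numerics
  USE cudafor
  IMPLICIT NONE
  REAL(rp), DEVICE, DIMENSION(sx-1:ex+1, sy-1:ey+1, sz-1:ez+1), INTENT(INOUT) :: a
  REAL(rp), DEVICE, DIMENSION(sx-1:ex+1, sy-1:ey+1, sz-1:ez+1), INTENT(IN)    :: b
  ! Host scalars: no DEVICE attribute on scala and scalb.
  REAL(rp), INTENT(IN) :: scala
  REAL(rp), INTENT(IN) :: scalb
  INTEGER :: i, j, k

  ! Update the interior points only, as in the version above.
  !$cuf kernel do(3) <<<*,*>>>
  DO k = sz, ez
     DO j = sy, ey
        DO i = sx, ex
           a(i,j,k) = scala * a(i,j,k) + scalb * b(i,j,k)
        END DO
     END DO
  END DO
END SUBROUTINE saxpy
```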

Hi Antoine,

Thanks for the examples. I’ve logged this as TPR#18028 and sent it on to our engineers for further investigation.

- Mat

Antoine,

TPR 18028 has been corrected in 11.8. Thanks again for the report.

regards,
dave