error with derived types in PGI CUDA 10.4

Today I installed PGI CUDA Fortran v. 10.4.
The following code (working with v. 10.3) cannot be compiled anymore with the new version.

module mod_test
 use cudafor
 implicit none
 type comp
   integer :: n
   real*8  :: r
 end type
 type(comp), device :: struct_dev
contains
 attributes(device) real*8 function calc2(i)
   implicit none
   integer, value :: i
   integer :: j
   calc2 = 0.d0
   do j=1,struct_dev%n
     calc2 = calc2 + log(real(i)) * struct_dev%r
   end do
 end function
 attributes(device) subroutine calc(counter,idx,Vettore)
   implicit none
   integer, value :: counter, idx
   real*8, dimension(counter),device :: Vettore
   !real*8, dimension(32), shared :: vettore_shared
   real*8 :: var
   integer :: i, tid
   tid = threadidx%x
   !var = 0.
   do i=1,20
      var = calc2(i)  
   enddo
   Vettore(idx) = Vettore(idx)+var
   !vettore_shared(tid) =vettore_shared(tid)+ var
 end subroutine
 attributes(global) subroutine kernel_test(counter,Vettore)
   implicit none
   integer, value :: counter
   real*8, dimension(counter),device :: Vettore
   integer :: idx
   idx = (blockidx%x-1)*blockdim%x + threadidx%x
   call calc(counter,idx,Vettore)
 end subroutine
end module
program test
 use cudafor
 use mod_test
 implicit none
 integer :: nblocks, nthreads, counter
 integer :: c1, c2
 real*8, dimension(:), allocatable :: Vettore_host
 real*8, dimension(:), allocatable, device :: Vettore_dev
 counter = 9216
 nthreads = 32
 nblocks = counter/nthreads
 call system_clock(count=c1)
 allocate(Vettore_host(counter),Vettore_dev(counter))
 Vettore_dev = 0.d0
 Vettore_host = 0.d0
 struct_dev%n = 10
 struct_dev%r = 0.1d0
 call kernel_test<<<nblocks,nthreads>>>(counter,Vettore_dev)
 Vettore_host = Vettore_dev
 deallocate(Vettore_host,Vettore_dev)
 call system_clock(count=c2)
 write(*,*) 'time ', c2-c1 
end program

This is the error message I get:

/tmp/pgcudaforp5ebTWCUwOMw.gpu(14): error: no suitable conversion function from “struct ::” to “int *” exists

/tmp/pgcudaforp5ebTWCUwOMw.gpu(18): error: no suitable conversion function from “struct ::” to “double *” exists

2 errors detected in the compilation of “/tmp/pgnvdd6ebjwUXYAt3.nv0”.
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (test.cuf: 41)
PGF90/x86-64 Linux 10.4-0: compilation aborted

It seems that the problem is in the use of the derived type variable struct_dev, but I cannot understand why.

Thank you for any hint!

Hi Goblinsqueen,

Looks like a new compiler error. 10.4 has a lot of changes in it and we obviously missed this issue. I have sent a report to our engineers (TPR#16836) and hopefully we can have it fixed shortly.

The work around is to continue using 10.3 or replace “struct_dev” with it’s components.

     integer,device :: devn
     real*8,device  :: devr

Thanks,
Mat

Hi Mat,
I think that for the moment I’ll keep using 10.3.
Do you think this problem will be fixed in 10.5?
Do you already know when 10.5 will be released?
Thanks!

Hi goblinsqueen,

Do you think this problem will be fixed in 10.5?

Depending upon when the issue is submitted and it’s severity, we do try an resolve issues within a month. Though, there no guarantee. Feel free to ping me or PGI Customer Service for status.

Do you already know when 10.5 will be released?

We try to release the first Thursday of each month with 10.5 currently schedule for May 6th. (Our versioning changed this year to mean Year.Month). Granted releases date can and do occasionally slip by a few days.

Hope this helps,
Mat

Thank you!

Hi goblinqueen,

I just verified that TPR#16836 will be fixed in 10.5.

  • Mat

It is half-fixed. I can use derived type constant variables, which is very nice because I have packed all my small constants in one derived type, to save on memory transfer, but derived type device arrays do not work yet. E.g.

MODULE cuda_arrays
  USE cudafor
  implicit none
  type :: float2
    real x, y
  end type
  type(float2), dimension(:),  allocatable, device :: gf2
CONTAINS
!==============================
attributes(global) SUBROUTINE kernel(gf2)
  implicit none
  type(float2), dimension(100) :: gf2
  if (threadidx%x==1) gf2(blockidx%x)%x = 1.
END SUBROUTINE kernel
END MODULE
!==============================
PROGRAM test
  USE cuda_arrays
  implicit none
  type(dim3)      :: nt, nb
  allocate(gf2(100))
  nt = dim3(32, 1, 1)
  nb = dim3(100, 1, 1)
  call kernel<<<nb,nt>>>(gf2)
END

gives internal compiler error with v10.5 and compiles with v10.3

Troels

I have just found a work-around. It works if the device array is not declared in the module, but passed as argument. The following compiles and runs with both v10.3, v10.4, and v10.5 :

MODULE cuda_arrays
  USE cudafor
  implicit none
CONTAINS
!==============================
attributes(global) SUBROUTINE kernel(gf2)
  implicit none
  type :: float2
    real x, y
  end type
  type(float2), dimension(100) :: gf2
  if (threadidx%x==1) gf2(blockidx%x)%x = 1.
END SUBROUTINE kernel
END MODULE
!==============================
PROGRAM test
  USE cuda_arrays
  implicit none
  type :: float2
    real x, y
  end type
  type(float2), dimension(:),  allocatable, device :: gf2
  type(dim3)      :: nt, nb
  allocate(gf2(100))
  nt = dim3(32, 1, 1)
  nb = dim3(100, 1, 1)
  call kernel<<<nb,nt>>>(gf2)
END

Troels

Thank you Troels,
your comments are very useful!

I’ll try to fix my code with your solution, but it will be not straightforward (many of my data are declared as derived type).

I hope it will be completely fixed in 10.6.