Pointer Arrays in CUDA Fortran

Hi, all

I need to deal with lots of data which consists of column vectors with different length. As a result, it’s not appropriate to use the two-dimension array. The pointer arrays seems can solve the problem perfectly. Although, it’s not possible to declare an array of pointers in Fortran, we can create an pointer array by using derived data types as the sample routine does.

PROGRAM test_array_pointer

  IMPLICIT NONE

  TYPE :: ptr
    REAL, POINTER :: p(:)
  END TYPE

  TYPE(ptr), ALLOCATABLE :: x(:)

  REAL, TARGET :: a(1)
  REAL, TARGET :: b(2)
  REAL, TARGET :: c(3)


  a=(/1.0/)
  b=(/1.0,2.0/)
  c=(/1.0,2.0,3.0/)

  ALLOCATE(x(3))

  x(1)%p => a
  x(2)%p => b
  x(3)%p => c

  WRITE(*,*)  x(1)%p
  WRITE(*,*)  x(2)%p
  WRITE(*,*)  x(3)%p

  DEALLOCATE(x)

  STOP

END PROGRAM test_array_pointer

The program output:

    1.000000    
    1.000000        2.000000    
    1.000000        2.000000        3.000000    
FORTRAN STOP

It seems that the roundabout method works fine. Then I want to use it in CUDA Fortran, I write a test routine as follows.

PROGRAM test

  USE cudafor

  IMPLICIT NONE

  TYPE :: ptr
    REAL, POINTER :: p(:)
  END TYPE

  TYPE(ptr), DEVICE, ALLOCATABLE :: dev_x(:)

  REAL, DEVICE, TARGET :: a(1)
  REAL, DEVICE, TARGET :: b(2)
  REAL, DEVICE, TARGET :: c(3)

  REAL :: x(1)

  INTEGER :: istat

  a=(/1.0/)
  b=(/1.0,2.0/)
  c=(/1.0,2.0,3.0/)

  ALLOCATE(dev_x(3),STAT=istat)

  WRITE(*,*) istat

  dev_x(1)%p => a
  dev_x(2)%p => b
  dev_x(3)%p => c

  x = dev_x(1)%p

  WRITE(*,*) x

  DEALLOCATE(dev_x)

  STOP

END PROGRAM test

The program output:

           0
Segmentation fault (core dumped)

Errors occur on the memory operations, unfortunately.

BTW, the NVIDIA card is K20C and the compile enviroment shows below.

  • PGI Fortran 13.10 X64, Cent OS 6.4 X64
    -Mcuda=cuda5.5,cc35

Can someone tell me where it goes wrong?
Or is there any other ways to deal with the column vectors with different length?

Hi OceanCloud,

The basic problem here is with “dev_x(1)%p”. You’re trying to access a device data structure on the host. While the base device address of “dev_x” is stored on the host, any internal addresses are not. Hence, to make this work, you need to move these pointer assignments over to the device.

Something like the following:

% cat testp.cuf
module foo
  TYPE :: ptr
    REAL, POINTER :: p(:)
  END TYPE

  TYPE(ptr), DEVICE, ALLOCATABLE    :: dev_x(:)
  REAL, DEVICE                      :: xD(3)
  REAL, DEVICE, allocatable, TARGET :: a(:)
  REAL, DEVICE, allocatable, TARGET :: b(:)
  REAL, DEVICE, allocatable, TARGET :: c(:)

contains

  attributes(global) subroutine assignPtr()
     dev_x(1)%p => a
     dev_x(2)%p => b
     dev_x(3)%p => c
  end subroutine assign

  attributes(global) subroutine getA()
     xD(1:1)= dev_x(1)%p
  end subroutine getA

  attributes(global) subroutine getB()
     xD(1:2) = dev_x(2)%p
  end subroutine getB

  attributes(global) subroutine getC()
     xD = dev_x(3)%p
  end subroutine getC


end module foo

PROGRAM test

  USE cudafor
  use foo
  IMPLICIT NONE


  REAL :: x(3)
  INTEGER :: istat

  allocate(a(1), b(2), c(3))
  a=(/1.0/)
  b=(/1.0,2.0/)
  c=(/1.0,2.0,3.0/)
  x=0.0

  ALLOCATE(dev_x(3),STAT=istat)

  WRITE(*,*) istat
  call assignPtr<<<1,1>>>()
  xD=0.0
  call getA<<<1,1>>>()
  x=xD
  WRITE(*,*) x
  xD=0.0
  call getB<<<1,1>>>()
  x=xD
  WRITE(*,*) x
  xD=0.0
  call getC<<<1,1>>>()
  x=xD
  WRITE(*,*) x

  DEALLOCATE(dev_x,a,b,c)

  STOP

END PROGRAM test

% pgf90 -Mcuda testp.cuf -V13.9 ; a.out
            0
    1.000000        0.000000        0.000000
    1.000000        2.000000        0.000000
    1.000000        2.000000        3.000000
Warning: ieee_inexact is signaling
FORTRAN STOP
  • Mat
1 Like

I know exactly where the problem is

You are a great help, Mat.