openacc-interoperability

galanti · May 18, 2015, 8:50pm

Dear Friends,
I am working through an example from an article by Jeff Larkin on techniques for combining OpenACC and CUDA. I adapted the Fortran example called openacc_cublas to test a simple call to my own subroutine, “barak”. Here is the code:

!-----------------------------------------------------------------------
program main
use cublas
integer, parameter :: N = 2**20
real, dimension(N) :: X, Y

!$acc data create(x,y)
!$acc kernels
X(:) = 1.0
Y(:) = 0.0
!$acc end kernels

!$acc host_data use_device(x,y)
call barak(N, 2.0, x, 1, y, 1)
!$acc end host_data
!$acc update self(y)
!$acc end data

print *, y(1)
end program

subroutine barak(n,c,x,y)
real, dimension(N) :: X, Y
y=c*x
return
end
!------------------------------------------------------------------------------

I compile the code as in the original example:
export CUDA_HOME=/usr/local/cuda-7.0/
pgfortran openacc_barak.o -L$CUDA_HOME/lib64 -lcudart -Mcuda -fast -acc -ta=nvidia -Minfo=accel

But when I run it I get:
Segmentation fault (core dumped)

Could anyone point out what is wrong and how to fix the code?

Thanks in advance,
Barak

MatColgrove · May 18, 2015, 9:19pm

Hi Barak,

There are a couple of problems here.

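First, you’re passing the wrong number of arguments to “barak” (six in the call, four in the definition) and, worse, you’re passing the literal “1” where the Y array is expected. While not required here, it’s a good idea to use explicit interfaces (for example, by putting the subroutine in a module) to catch these types of errors.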

Second, inside the host_data region you’re passing the device pointers for the x and y arrays, but “barak” then accesses them on the host. This is what causes your seg fault. To fix it, I put the computation in a compute region with a deviceptr clause.

Since you’ll be using the CUDA 7.0 cuBLAS, if you have PGI 15.4 or later, have the PGI compiler use CUDA 7.0 as well (-ta=tesla:cuda7.0 or -Mcuda=7.0).
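
To illustrate the first point about interfaces, here is a minimal host-only sketch. It assumes nothing beyond a standard Fortran compiler, and the names scale_mod and scale_vec are hypothetical, chosen only for this illustration. Placing the subroutine in a module gives it an explicit interface, so a compiler should reject a call with the wrong number or type of arguments instead of failing at run time.

```fortran
! Minimal sketch: a module procedure has an explicit interface, so the
! compiler can check the argument count and types at every call site.
! scale_mod and scale_vec are hypothetical names, not from the thread.
module scale_mod
  implicit none
contains
  subroutine scale_vec(n, c, x, y)
    integer, intent(in) :: n
    real, intent(in) :: c
    real, dimension(n), intent(in) :: x
    real, dimension(n), intent(out) :: y
    y = c * x
  end subroutine scale_vec
end module scale_mod

program check_interface
  use scale_mod
  implicit none
  integer, parameter :: n = 8
  real, dimension(n) :: x, y

  x = 1.0
  ! Correct call: four arguments matching the interface.
  call scale_vec(n, 2.0, x, y)

  ! Simple self-check: every element of y should now be 2.0.
  if (abs(y(1) - 2.0) > 1.0e-6) stop 1
  print *, y(1)

  ! With the explicit interface in scope, a compiler should reject a
  ! call like the one below, because the argument count and types do
  ! not match:
  !   call scale_vec(n, 2.0, x, 1, y, 1)
end program check_interface
```

Without the module (an external subroutine with an implicit interface), the mismatched call can compile cleanly and only misbehave when the program runs.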

Mat

module barak_mod

contains
 subroutine barak(n,c,x,b,y,a)
 integer n,b,a
 real c
 real, dimension(N) :: X, Y
!$acc kernels deviceptr(x,y)
 y=c*x
!$acc end kernels
 return
 end
end module barak_mod

program main
 use barak_mod
! use cublas
 integer, parameter :: N = 2**20
 real, dimension(N) :: X, Y

 !$acc data create(x,y)
 !$acc kernels
 X(:) = 1.0
 Y(:) = 0.0
 !$acc end kernels

 !$acc host_data use_device(x,y)
 call barak(N, 2.0, x, 1, y, 1)
 !$acc end host_data
 !$acc update self(y)
 !$acc end data

 print *, y(1)
 end program

% pgfortran -Mcuda -fast -acc -Minfo=accel test.f90
main:
     12, Generating create(x(:),y(:))
     14, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         14, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     21, Generating update self(y(:))
barak:
     32, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         32, !$acc loop gang, vector(128) ! blockidx%x threadidx%x

% pgfortran -Mcuda=7.0 -fast -acc -Minfo=accel test.f90
main:
     12, Generating create(x(:),y(:))
     14, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         14, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     21, Generating update self(y(:))
barak:
     32, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         32, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
% a.out
    2.000000

galanti · May 18, 2015, 10:14pm

Hi Mat,
I copied your corrected example and compiled it. Unfortunately, I still get a “Segmentation fault”. Perhaps it is because of how I compile:

export CUDA_HOME=/usr/local/cuda-6.0

pgfortran test.f90 -L$CUDA_HOME/lib64 -lcudart -Mcuda=cuda6.0 -fast -acc -ta=nvidia -Minfo=accel

Thanks,
Barak

MatColgrove · May 19, 2015, 3:27pm

Hi Barak,

You must be using PGI 14.7 or earlier. “host_data” had some issues that were fixed in the 14.9 release. Also, the meaning of “update self” was clarified by the OpenACC committee to be equivalent to “update host” around this time as well. Your options are to either update to PGI 14.9 or later, or use the following code, which replaces host_data/deviceptr with a present clause and uses “update host”:

module barak_mod

 contains
  subroutine barak(n,c,x,b,y,a)
  integer n,b,a
  real c
  real, dimension(N) :: X, Y
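 ! X and Y are already on the device (caller's data region), so use present
 !$acc kernels present(X,Y)
 ! Disabled (no "$" sentinel): the deviceptr variant used with host_data
 !acc kernels deviceptr(x,y)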
  y=c*x
 !$acc end kernels
  return
  end
 end module barak_mod

 program main
  use barak_mod
 ! use cublas
  integer, parameter :: N = 2**20
  real, dimension(N) :: X, Y

  !$acc data create(x,y)
  !$acc kernels
  X(:) = 1.0
  Y(:) = 0.0
  !$acc end kernels

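  ! host_data is disabled (no "$" sentinel), so barak is called with the host arrays
  !acc host_data use_device(x,y)
  call barak(N, 2.0, x, 1, y, 1)
  !acc end host_data
  ! "update host" is used in place of the disabled "update self"
  !acc update self(y)
  !$acc update host(y)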
  !$acc end data

  print *, y(1)
  end program

Mat