Hi,
I am trying to get the small example from https://devblogs.nvidia.com/easy-introduction-cuda-fortran/ to run on our GPU cluster with P100 GPUs.
I took the example as is from the website, added the MPI initialization code, and compiled it using mpif90.
We are using MVAPICH2-GDR-2.3a.
mpif90 --version
pgf90 17.10-0 64-bit target on x86-64 Linux -tp px
PGI Compilers and Tools
Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
The program is compiled with the command:
mpif90 -o saxp saxpy.cuf
module mathOps
contains
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a
    integer :: i, n
    n = size(x)
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy
end module mathOps

program testSaxpy
  use mpi
  use mathOps
  use cudafor
  implicit none
  integer, parameter :: N = 4000
  integer ierr, npe0, iam0
  real :: x(N), y(N), a
  real, device :: x_d(N), y_d(N)
  type(dim3) :: grid, tBlock

  call MPI_Init(ierr)
  write(*,*) "Programstart"
  call MPI_Comm_size(mpi_comm_world,npe0,ierr)
  call MPI_Comm_rank(mpi_comm_world,iam0,ierr)
  ierr = cudaSetDevice(0)

  tBlock = dim3(256,1,1)
  grid = dim3(ceiling(real(N)/tBlock%x),1,1)
  x = 1.0; y = 2.0; a = 2.0
  x_d = x
  y_d = y
  call saxpy<<<grid, tBlock>>>(x_d, y_d, a)
  y = y_d
  write(*,*) 'Max error: ', maxval(abs(y-4.0))
end program testSaxpy
However, when I run the program, it fails with a segmentation fault.
If I comment out the kernel launch and the copy operations, it runs fine:
!x_d = x
!y_d = y
!call saxpy<<<grid, tBlock>>>(x_d, y_d, a)
!y = y_d
cudaSetDevice() returns 0, i.e. cudaSuccess.
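To narrow down which step fails, each CUDA operation could be checked separately. The following is only a minimal sketch of that idea, not a tested fix. It assumes the same compiler and MPI setup as above. check_cuda, the module cudaCheck and the program name testSaxpyChecked are hypothetical names introduced here. The explicit cudaMemcpy calls and the MPI_Finalize call are not in my program above.

```fortran
! Sketch only: the same steps as above, each followed by a status report.
module mathOps
contains
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a
    integer :: i, n
    n = size(x)
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy
end module mathOps

module cudaCheck
  use cudafor
  implicit none
contains
  ! Hypothetical helper: prints the label and the CUDA status text.
  subroutine check_cuda(istat, label)
    integer, intent(in) :: istat
    character(len=*), intent(in) :: label
    if (istat /= cudaSuccess) then
      write(*,*) label, ': ', trim(cudaGetErrorString(istat))
    else
      write(*,*) label, ': ok'
    end if
  end subroutine check_cuda
end module cudaCheck

program testSaxpyChecked
  use mpi
  use mathOps
  use cudaCheck
  use cudafor
  implicit none
  integer, parameter :: N = 4000
  integer :: ierr
  real :: x(N), y(N), a
  real, device :: x_d(N), y_d(N)
  type(dim3) :: grid, tBlock

  call MPI_Init(ierr)
  call check_cuda(cudaSetDevice(0), 'cudaSetDevice')

  tBlock = dim3(256,1,1)
  grid = dim3(ceiling(real(N)/tBlock%x),1,1)
  x = 1.0; y = 2.0; a = 2.0

  ! Explicit copies return a status, unlike the assignment form x_d = x.
  call check_cuda(cudaMemcpy(x_d, x, N), 'copy x to device')
  call check_cuda(cudaMemcpy(y_d, y, N), 'copy y to device')

  call saxpy<<<grid, tBlock>>>(x_d, y_d, a)
  call check_cuda(cudaGetLastError(), 'kernel launch')
  call check_cuda(cudaDeviceSynchronize(), 'kernel execution')

  call check_cuda(cudaMemcpy(y, y_d, N), 'copy y to host')
  write(*,*) 'Max error: ', maxval(abs(y-4.0))

  call MPI_Finalize(ierr)   ! added here; not in the program above
end program testSaxpyChecked
```

If the process still crashes instead of reporting a CUDA error, the last label printed should at least show which step was reached.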
What could cause the error?
Thank you for your help!