CUDA Fortran examples don't work with PGI 19.4

Dear someone,

We recently updated PGI to 19.4, CUDA to 10.1, and the Nvidia driver to 418.39. It seemed everything was fine before I tested the new version with some simple examples (like increment2.cuf). It looks as if the calculations on the GPU are never carried out. We are using a Titan V, so we always add the flag -Mcuda=cc70, but this flag does not work any more. The example program can be compiled and executed but gives the wrong answer.

The workstation information is as follows:

CPU: Intel E5-2630
GPU: Nvidia Titan V
OS: CentOS 6.9

Any help? Thank you very much!

Ye Luo

Hi Ye Luo,

While I don’t have a system with your exact setup, I just tried running the example on a number of systems including a Titan V, V100, CUDA 10.1, CUDA 10.0, and CentOS 6.9, and they all worked for me. Hence, I’m thinking the issue is something specific to your system.


It seems everything is fine before I test the new version with some simple examples (like increment2.cuf).

Was the only change to update to 19.4 or did you also update the CUDA driver at the same time?

Can you run the PGI utility “pgaccelinfo” to see if the PGI runtime can recognize your GPU?
What’s the output from running “nvidia-smi”?
Are you able to compile and successfully run the CUDA C examples that ship with CUDA 10.1?
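
The first two checks above can be collected in one go. The following is only a minimal sketch: the helper name run_if_present is made up for illustration, and it assumes pgaccelinfo and nvidia-smi are on the PATH when installed. A tool that is missing is reported rather than treated as a fatal error.

```shell
#!/bin/sh
# Hypothetical helper: run a diagnostic tool if it is installed,
# otherwise report that it is missing and carry on.
run_if_present() {
    tool=$1
    echo "== $tool =="
    if command -v "$tool" >/dev/null 2>&1; then
        # Report a non-zero exit status instead of aborting the script.
        "$@" || echo "$tool exited with status $?"
    else
        echo "$tool: not found on PATH"
    fi
}

run_if_present pgaccelinfo   # does the PGI runtime recognize the GPU?
run_if_present nvidia-smi    # driver version and visible devices
```

Redirecting the output to a file (for example, sh gpu_diag.sh > gpu_diag.txt 2>&1, where both file names are placeholders) makes it easy to paste the result into a reply.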


Hi Mat, here is the problem so far:
0) I uninstalled everything and reinstalled the latest versions (PGI 19.4, CUDA 10.1, Nvidia driver 418.39).
1) Running pgaccelinfo gives the right info. It reports the two Titan Vs we have.
2) Running nvidia-smi also gives the right info.
3) The CUDA C examples have no problem. For instance, I can run a matrix multiplication case and get the right answer.

More details:
4) All the cases without calculation, like bandwidthTest and deviceQuery, have no problem.
5) All the cases with calculation (like increment2, limitingFactor and finiteDifference) have problems. When you run the code, it will never get the right answer and show “Segmentation fault (core dumped)”.

Thank you a lot!

Ok, so it sounds like your system is fine and it’s only these specific examples that are causing issues.

As I mentioned, they worked fine for me, so I’m not sure what’s wrong.

When you run the code, it will never get the right answer and show “Segmentation fault (core dumped)”.

A seg fault happens in host code which makes things even more mysterious.

Can you post the full output from the build and run of “increment2”?

Also, can you compile with “-g” and then run the increment2 binary through the debugger to see where the segv is occurring?
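
One way to do this is sketched below. It assumes gdb is the debugger at hand (any debugger that understands the compiler's debug information would do), and the helper name backtrace_of and the output name increment2 are chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace for a crashing program,
# assuming gdb is installed.
backtrace_of() {
    exe=$1
    if [ ! -x "$exe" ]; then
        echo "$exe: no such executable"
        return 1
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb: not found on PATH"
        return 1
    fi
    # -batch exits when the commands finish; "run" starts the program
    # and "bt" prints the call stack at the point where it stopped.
    gdb -batch -ex run -ex bt "$exe"
}

# Build with debug symbols, then ask where the segv occurs.
if command -v pgfortran >/dev/null 2>&1 && [ -f increment2.cuf ]; then
    pgfortran -g -Mcuda=cc70 increment2.cuf -o increment2 && backtrace_of ./increment2
else
    echo "pgfortran or increment2.cuf not available; skipping build"
fi
```

The frames printed by "bt" show whether the crash is in user host code or inside a runtime library.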


Hi, Mat, here are more details,

I built increment2.cuf using the command:
pgfortran -Mcuda=cc70 increment2.cuf
then ran it and got:
**** Program Failed ****
Segmentation fault (core dumped)

I tried to print a value from inside the kernel but got no output (printing from host code works). It seems the program never launches the GPU part. But when I check with nvidia-smi, it shows the program running and using global memory.
Compiling with “-g” gives the same result. The segv happens at the end of the program (yes, it is in the host code).

At first, because the Community Edition license key expired, I needed to install PGI 19.4. Then, when I compiled the program, it said I needed to update the CUDA driver (before, it was version 9.0 or 9.1). The new CUDA driver in turn needed the latest Nvidia driver (418.39). I did all of this by uninstalling everything first and reinstalling one by one (Nvidia driver, then CUDA, then the PGI compiler). But the problem is the same.

Thank you very much for your patience. This problem is troubling us a lot.


I’m still at a bit of a loss as to why this would occur.

Let’s try a different tack. Can you go to the OpenACC examples and see if they work?

The OpenACC runtime contains more error checking so might help to show any issues.

CUDA Fortran just calls the CUDA API, so error checking needs to be done by the user. So another thing to try is to capture the return code from the kernel launch to see if/why it may be failing.

% cat increment2.cuf
!     Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
! NVIDIA CORPORATION and its licensors retain all intellectual property
! and proprietary rights in and to this software, related documentation
! and any modifications thereto.
!    These example codes are a portion of the code samples from the companion
!    website to the book "CUDA Fortran for Scientists and Engineers":

module simpleOps_m
contains
  attributes(global) subroutine increment(a, b)
    implicit none
    integer, intent(inout) :: a(:)
    integer, value :: b
    integer :: i, n

    i = blockDim%x*(blockIdx%x-1) + threadIdx%x
    n = size(a)
    if (i <= n) a(i) = a(i)+b

  end subroutine increment
end module simpleOps_m

program incrementTest
  use cudafor
  use simpleOps_m
  implicit none
  integer, parameter :: n = 1024*1024
  integer, allocatable :: a(:)
  integer, device, allocatable :: a_d(:)
  integer :: b, tPB = 256
  integer :: ierr

  allocate(a(n), a_d(n))
  a = 1
  b = 3

  a_d = a
  call increment<<<ceiling(real(n)/tPB),tPB>>>(a_d, b)
  ierr = cudaGetLastError()
  print *, "Error code: ", cudaGetErrorString(ierr)
  a = a_d

  if (any(a /= 4)) then
     write(*,*) '**** Program Failed ****'
  else
     write(*,*) 'Program Passed'
  endif
  deallocate(a, a_d)
end program incrementTest
% pgfortran -O2 increment2.cuf -V19.4; a.out
 Error code:
 no error                                                                                                  
 Program Passed

Hi Mat, the OpenACC examples also have problems. I am not sure about the cause, but it seems the versions conflict with each other. I have now downgraded the CUDA driver from 10.1 to 9.2 and the problem is solved. I also noticed something in the CUDA Toolkit documentation:

Table 1. Native Linux Distribution Support in CUDA 10.1

Distribution   Kernel*   GCC     GLIBC   ICC   PGI   XLC   CLANG
CentOS 6.10    2.6.32    4.4.7   2.12

It seems CentOS 6.9 is no longer supported. (not sure)

Thank you very much for the help!


It seems CentOS 6.9 is no longer supported.

I’m not sure either but this is most likely the cause. I don’t have a system with CentOS 6.9 that has a GPU so can’t confirm. I do have a CentOS 6.8 system, but it’s using CUDA 10.0 drivers.
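
To compare a machine against the CentOS 6.10 row quoted from the CUDA 10.1 table (kernel 2.6.32, GLIBC 2.12), the corresponding values can be printed as in the sketch below. The function name show_platform is made up for illustration, and the sketch assumes a glibc-based Linux system where getconf knows GNU_LIBC_VERSION.

```shell
#!/bin/sh
# Hypothetical helper: print the distribution, kernel and C library
# versions so they can be compared with a row of the support table.
show_platform() {
    # The release file location differs between distributions.
    if [ -r /etc/redhat-release ]; then
        echo "distribution: $(cat /etc/redhat-release)"
    elif [ -r /etc/os-release ]; then
        echo "distribution: $(grep '^PRETTY_NAME=' /etc/os-release)"
    else
        echo "distribution: unknown"
    fi

    echo "kernel: $(uname -r)"

    # On glibc systems getconf prints something like "glibc 2.12".
    echo "libc: $(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"
}

show_platform
```

A release that does not appear in the table for the installed CUDA version is a hint, not proof, that the combination is unsupported.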