MPI+CUDA Fortran

I would like to run CUDA Fortran code on multiple GPU cards. On my small cluster, I installed the PGI Community Edition on each node. I installed OpenMPI v2.0.2 instead of the OpenMPI v1.10.2 that is bundled with the PGI Community Edition. I use the following command to compile the program test.cuf:

mpif90 -Mcuda test.cuf

The compiler returns the following error message:

gfortran: error: unrecognized command line option ‘-Mcuda’

However, if I use the OpenMPI v1.10.2 bundled with the PGI Community Edition, the above compile command works.

If I want to use OpenMPI v2.0, is there an additional option I need to add to link the CUDA library and make the compile command work?


Hi Neo,

Your OpenMPI needs to be built with PGI in order to use CUDA Fortran. So if you need to use a version other than the one we ship as part of the install, you’ll need to build it yourself. You can find the source here:

We also have a guide on how to build OpenMPI that you can find here:

The caveat is that the guide is for version 1.4.1, but 2.0 should follow the same steps.
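A minimal sketch of that build follows. It assumes it is run from the top of an unpacked OpenMPI source tree with pgcc, pgc++ and pgfortran on PATH. The function name and the install prefix in the usage line are hypothetical, and the guide remains the authority on any extra configure options.

```shell
#!/bin/sh
# Sketch: configure, build, test and install OpenMPI with the PGI compilers.
# Assumptions: run from the top of an unpacked OpenMPI source tree, with
# pgcc, pgc++ and pgfortran on PATH. The function name is hypothetical.
build_openmpi_pgi() {
    prefix=${1:?usage: build_openmpi_pgi INSTALL_PREFIX}
    # RUN is empty by default; set RUN=echo for a dry run that only prints.
    $RUN ./configure CC=pgcc CXX=pgc++ FC=pgfortran --prefix="$prefix" &&
        $RUN make -j4 &&
        $RUN make check &&
        $RUN make install
}
```

For example, build_openmpi_pgi /opt/openmpi-2.0.2-pgi (a hypothetical prefix), then put that prefix's bin directory first in PATH and its lib directory in LD_LIBRARY_PATH on every node, so that mpif90 resolves to the PGI-built wrapper. RUN=echo build_openmpi_pgi /tmp/x only prints the commands.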

Hope this helps,

The problem is that you installed OpenMPI 2.0.2 built with gcc/gfortran,
so its driver scripts (mpicc, mpif90, mpic++) call the GNU compilers
(gfortran in the case of mpif90).

The OpenMPI that PGI installs was built with the PGI compilers, so its
driver scripts call pgfortran/pgcc/pgc++, which understand -Mcuda.
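To see which back end a given mpif90 wrapper calls, a sketch like the following can help. It assumes an OpenMPI wrapper, which accepts the --showme:command option; the function names are hypothetical, and the name matching only covers the compilers mentioned in this thread.

```shell
#!/bin/sh
# Sketch: report which Fortran compiler an OpenMPI wrapper drives.
# Assumes an OpenMPI wrapper compiler, which accepts --showme:command.

# Classify an underlying compiler command such as "pgfortran" or
# "/usr/bin/gfortran" by the base name of its first word.
classify_backend() {
    first=${1%% *}          # drop any arguments after the compiler name
    case ${first##*/} in    # drop any leading directory
        pgfortran*|pgf90*|pgf95*) echo pgi ;;
        gfortran*) echo gnu ;;
        *) echo unknown ;;
    esac
}

# Ask the wrapper (default: mpif90) which compiler it calls, then classify it.
wrapper_backend() {
    cmd=$("${1:-mpif90}" --showme:command) || return 1
    classify_backend "$cmd"
}
```

Running wrapper_backend mpif90 should print gnu for a gfortran-based OpenMPI 2.0.2 like the one in the question, and pgi for the wrapper shipped with the PGI compilers; only the latter accepts -Mcuda.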

Building with PGI means we also run the MPI tests, and they need to pass,
so you can’t just swap compiler back ends under the MPI drivers and expect
everything to work.

We don’t recommend that people use CUDA Fortran unless they are familiar
with programming in CUDA C. CUDA Fortran programs don’t build or run
with compilers that don’t recognize the CUDA lines in the file (since,
unlike OpenACC directives, those lines do not look like comments).

If you were to build a multi-GPU program foo.f90 using OpenACC instead, you
could first check that, with the 2.0.2 mpif90 built over gfortran,

mpif90 foo.f90 -o foo_gfortran -fopenmp

works, since gfortran ignores the OpenACC directives. Then, with a PGI-built
mpif90,

mpif90 foo.f90 -o foo_pgi -mp

should also work, running on the CPU only, and finally

mpif90 foo.f90 -o foo_pgi_gpu -mp -acc

should work and use the GPUs (or the host, if none are present).
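The three builds can be wrapped in one helper. This is a sketch under two assumptions: the first build uses the gfortran-based wrapper and the other two use a PGI-built wrapper (-mp and -acc are PGI options), and the two wrapper paths are whatever your installs use. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: build the three variants of an OpenACC + MPI program.
# $1: gfortran-based mpif90 (OpenACC directives ignored, OpenMP via -fopenmp)
# $2: PGI-based mpif90 (-mp enables OpenMP, -acc enables OpenACC)
# $3: source file, e.g. foo.f90
build_variants() {
    gnu_fc=$1 pgi_fc=$2 src=$3
    base=${src%.f90}
    # CPU only, GNU back end
    "$gnu_fc" "$src" -o "${base}_gfortran" -fopenmp || return 1
    # CPU only, PGI back end
    "$pgi_fc" "$src" -o "${base}_pgi" -mp || return 1
    # OpenACC: runs on the GPUs, or on the host if none are present
    "$pgi_fc" "$src" -o "${base}_pgi_gpu" -mp -acc
}
```

Usage: build_variants "$GNU_MPIF90" "$PGI_MPIF90" foo.f90, where the two (hypothetical) variables hold the paths of the two wrappers. Stopping at the first failure keeps it obvious which back end broke.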

Installing the compilers on every node in the cluster is not necessary.
Just put the build area (compilers and scratch) in a commonly mounted
directory, and then build on the cluster master node and run from the
master node across the cluster.
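With a shared install, every node should resolve mpirun and mpif90 to the same place. A sketch of that check, assuming passwordless ssh from the master node; RSH is a hypothetical hook (defaulting to ssh) so the function can be tried without a cluster, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: check that every node resolves mpirun and mpif90 to the same paths
# in a non-interactive remote shell. RSH defaults to ssh; it is a
# hypothetical hook so the check can be tried without a cluster.
check_same_paths() {
    ref= status=0 seen=0
    for host in "$@"; do
        paths=$(${RSH:-ssh} "$host" 'command -v mpirun; command -v mpif90')
        echo "$host: $paths"
        if [ -z "$paths" ]; then
            echo "neither mpirun nor mpif90 found on $host" >&2
            status=1
        elif [ "$seen" -eq 0 ]; then
            # The first host that answers becomes the reference.
            ref=$paths seen=1
        elif [ "$paths" != "$ref" ]; then
            echo "paths on $host differ from the first node" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the cluster in this thread: check_same_paths master slave1 slave2 slave3. A nonzero return means at least one node sees a different (or no) installation.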


Mat, Dave: thank you for your kind replies.
I have another question, about the OpenMPI 1.10.2 bundled with the PGI Community Edition.
The following is a simple MPI test code (testmpi.f90):

  program mpitest
    use mpi
    implicit none
    integer :: id, nproc, ierr, namelen
    character(MPI_MAX_PROCESSOR_NAME) :: processor_name
    call mpi_init(ierr)
    call mpi_comm_rank(MPI_COMM_WORLD, id, ierr)
    call mpi_comm_size(MPI_COMM_WORLD, nproc, ierr)
    call mpi_get_processor_name(processor_name, namelen, ierr)
    ! trim() drops the trailing blanks that pad the fixed-length name buffer
    write(*,*) " Hello ! I am : ", id, " on ", trim(processor_name)
    call mpi_finalize(ierr)
  end program mpitest

The code is compiled with: mpif90 testmpi.f90
Test 01 : Run the code by using 3 nodes (master,slave1,slave2)
Command : mpirun -host master,slave1,slave2 -np 3 ./a.out
The result is as follows

Hello ! I am : 0 on master
Hello ! I am : 1 on slave1
Hello ! I am : 2 on slave2

Test 02 : Run the code by using 3 nodes (master,slave1,slave3)
Command : mpirun -host master,slave1,slave3 -np 3 ./a.out
The result is as follows

Hello ! I am : 0 on master
Hello ! I am : 1 on slave1
Hello ! I am : 2 on slave3

Test 03 : Run the code by using 3 nodes (master,slave2,slave3)
Command : mpirun -host master,slave2,slave3 -np 3 ./a.out
The result is as follows

Hello ! I am : 0 on master
Hello ! I am : 1 on slave2
Hello ! I am : 2 on slave3

Test 04 : Run the code by using 4 nodes (master,slave1,slave2,slave3)
Command : mpirun -host master,slave1,slave2,slave3 -np 4 ./a.out
The run fails with the following error message:

[slave3:15722] [[7177,0],3] tcp_peer_send_blocking: send() to socket 10 failed: Broken pipe (32)

ORTE was unable to reliably start one or more daemons.
This usually is caused by:

  • not finding the required libraries and/or binaries on
    one or more nodes. Please check your PATH and LD_LIBRARY_PATH
    settings, or configure OMPI with --enable-orterun-prefix-by-default

  • lack of authority to execute on one or more specified nodes.
    Please verify your allocation and authorities.

  • the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
    Please check with your sys admin to determine the correct location to use.

  • compilation of the orted with dynamic libraries when static are required
    (e.g., on Cray). Please check your configure cmd line and consider using
    one of the contrib/platform definitions for your system type.

  • an inability to create a connection back to mpirun due to a
    lack of common network interfaces and/or no route found between
    them. Please check network connectivity (including firewalls
    and network routing requirements).

The error message suggests a communication problem between the master and
slave3 nodes. However, Tests 02 and 03 show that there is no communication
problem between master and slave3.
Now I change the OpenMPI version from 1.10.2 to 2.0.2 and run Test 04 again.
The result is correct:

Hello ! I am : 0 on master
Hello ! I am : 1 on slave1
Hello ! I am : 2 on slave2
Hello ! I am : 3 on slave3

Can anyone explain this? Is this a bug in OpenMPI 1.10.2?


A late contribution. Try using the 17.4 compilers with OpenMPI 1.10
and see if things work.

My suspicion is that if you run

ldd ./a.out

on slave3, you will find that some libraries cannot be found, and
you need to add their paths to LD_LIBRARY_PATH so that slave3
works.

Compare the output with that of

ldd ./a.out

on slave2.
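A sketch of that comparison, assuming passwordless ssh between the nodes and that a.out sits at the same absolute path on both nodes (for example in a shared directory). The function names are hypothetical. Running ldd through a non-interactive ssh also reflects the library search path that a remotely started process sees, which may differ from an interactive login.

```shell
#!/bin/sh
# Sketch: list the shared libraries that ldd reports as "not found".
# Reads ldd output on stdin and prints one library name per line.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Compare missing libraries for one binary on two nodes, e.g. slave2 vs slave3.
# Assumes passwordless ssh and that the binary has the same absolute path on both.
compare_ldd() {
    bin=$1 good=$2 bad=$3
    echo "missing on $good:"
    ssh "$good" ldd "$bin" | missing_libs
    echo "missing on $bad:"
    ssh "$bad" ldd "$bin" | missing_libs
}
```

Usage: compare_ldd "$PWD/a.out" slave2 slave3. Any library listed only under slave3 points to a directory that needs to be added to LD_LIBRARY_PATH there.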