I am using PGI 20.7 and I have code written using both OpenACC and MPI. Is there a command in the PGI compilers that I can use to execute the code?
Thank you in advance.
Sorry, I’m not clear on the question. Are you asking about using “mpirun” to execute your MPI program? How to invoke the compilers to build your code? How to use both OpenACC and MPI in the same code? Or are you having a runtime issue when running your program?
I am sorry if my question was not clear. I have code written in C++, I added OpenACC pragmas to it, and it also uses MPI, so my source code is C++ + OpenACC + MPI.
Can I use mpirun directly with PGI, or do I have to do some configuration first?
The MPI you use does need to be configured and built with the NVIDIA HPC Compilers (formerly called PGI). We do ship a version of OpenMPI with the NVIDIA HPC SDK that you can use (see the “/20.7/comm_libs/openmpi/openmpi-3.1.5/bin” directory). Set your PATH environment variable to include this directory, and set LD_LIBRARY_PATH to include the compiler runtime libraries (i.e. “<base_path>/20.7/compilers/lib”) so the MPI drivers can find them. Then you can use “mpicxx” in place of “nvc++” to compile (passing the same flags you would give nvc++, such as “-acc” to enable OpenACC) and use “mpirun” to execute your program.
If you are using multiple GPUs, such as one per rank, you should also add code to perform the device assignment. Otherwise all ranks will use the default device. I’ll typically put the following boilerplate code in my MPI applications just after calling MPI_Init:
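```cpp
// ...
#include <mpi.h>
#ifdef _OPENACC
#include <openacc.h>
#endif
// ...
MPI_Comm_rank(MPI_COMM_WORLD, &settings->rank);
MPI_Comm_size(MPI_COMM_WORLD, &settings->num_ranks);
#ifdef _OPENACC
int num_devices;
int gpuId;
MPI_Comm shmcomm;
int local_rank;
acc_device_t my_device_type;
// Group the ranks that share memory (typically the ranks on one node)
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                    MPI_INFO_NULL, &shmcomm);
// This rank's id within its node
MPI_Comm_rank(shmcomm, &local_rank);
MPI_Comm_free(&shmcomm);
// How many devices of the current type this node has
my_device_type = acc_get_device_type();
num_devices = acc_get_num_devices(my_device_type);
// Round-robin the node's devices over its local ranks
// (guarded, since local_rank % 0 would be undefined)
if (num_devices > 0) {
    gpuId = local_rank % num_devices;
    acc_set_device_num(gpuId, my_device_type);
}
#endif
```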
Basically, this gets the local rank ID on the node, checks how many devices the node has, and then assigns the devices to the local ranks round-robin.
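To make the mapping concrete, here is a minimal sketch of the same arithmetic in plain C++ with no MPI or OpenACC calls, so it can run anywhere. It assumes a host name stands in for the shared-memory domain that MPI_COMM_TYPE_SHARED groups ranks by (typically one node) and that every node has the same number of GPUs. `assign_device`, `local_ranks`, and `device_map` are hypothetical helper names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: device picked by the rule gpuId = local_rank % num_devices.
// Returns -1 when no devices are found, since local_rank % 0 is undefined behavior.
int assign_device(int local_rank, int num_devices) {
    if (num_devices <= 0) return -1;
    return local_rank % num_devices;
}

// Hypothetical helper mimicking MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, 0, ...):
// with every key equal to 0, ties are broken by MPI_COMM_WORLD rank, so ranks on the
// same host are numbered 0, 1, 2, ... in world-rank order.
std::vector<int> local_ranks(const std::vector<std::string>& host_of_rank) {
    std::map<std::string, int> next_on_host;  // next unused local id per host
    std::vector<int> local(host_of_rank.size());
    for (std::size_t r = 0; r < host_of_rank.size(); ++r) {
        local[r] = next_on_host[host_of_rank[r]]++;
    }
    return local;
}

// Device chosen by each world rank, assuming gpus_per_node devices on every node.
std::vector<int> device_map(const std::vector<std::string>& host_of_rank,
                            int gpus_per_node) {
    std::vector<int> devices;
    for (int lr : local_ranks(host_of_rank)) {
        devices.push_back(assign_device(lr, gpus_per_node));
    }
    return devices;
}
```

For example, six ranks placed on nodeA, nodeA, nodeA, nodeB, nodeB, nodeA, with two GPUs per node, get local ranks 0, 1, 2, 0, 1, 3 and devices 0, 1, 0, 0, 1, 1, so the ranks sharing a node are spread evenly across its GPUs.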