Partially present on the device with multiple (more than 3) GPUs and multiple MPI communicators

Hello,

I have an application that runs on multiple GPUs using MPI, and these GPUs can be grouped into different MPI communicators.

The mapping to the real world is this: assume we have an assembly of mechanical components, and each component can be computed by a few GPUs. There are boundary interfaces among the components, and the components exchange information through MPI as the simulation progresses.

Now for simplicity, assume each component uses only one GPU. I have tested the code on two components, each using one GPU, and it works fine.

But I have issues when I run with three components. Suppose I am using 3 GPUs (i.e. A, B and C), one per component. Now GPU B needs to exchange information with both A and C, and the code crashes when B is doing that exchange. The error I get is: FATAL ERROR: variable in data clause is partially present on the device: name=stmp

I have attached the code snippet where it crashes. Do I need to do something special when a GPU needs to “talk” to two other GPUs simultaneously that are in different MPI communicators?

Sorry for the long text, and if you need more information about the code, I am happy to provide it.

Many thanks for your help in advance,
Feng

tmp.cpp (5.8 KB)

Hi Feng,

A “partially present” error means that the memory already exists on the device but with a different size than what was originally allocated. While there are many possible causes, it often happens when a variable is not put in an “exit data” directive before being reallocated on the host.

The error I see in your code is at line 106:

    kdt->build( nx, n, tmp );
    delete[] stmp;  stmp  = NULL;
    delete[] stmp1; stmp1 = NULL;
    #pragma acc exit data delete(stmp,stmp1)

You’re deleting the device data after deleting the host data and setting these pointers to NULL. Passing a null pointer to “exit data” makes it a no-op, so the next time through this section “stmp” still has an associated device pointer, but with a different size. To fix this, move the “exit data” directive before the host deletes.

It might happen to work if “stmp” and “stmp1” weren’t set to NULL, but it’s best practice to always delete the device data before the host data.

Now for simplicity, assume each component will use only one GPU. I have tested the code on two components and each component use one GPU and it is working fine.

My guess is that it only works with two ranks because “stmp” isn’t reallocated there, but with more ranks it is.

Do I need to do something special when a GPU need to “talk” to another two GPUs simultaneously which are in a different MPI_COMM_WORLD?

The error doesn’t have to do with the number of ranks, but rather the code path taken as you add more ranks. Hence, I’ll answer this more generally.

Each MPI rank is its own process, so it has its own memory space and unique CUDA context. You can set up a shared memory segment that’s visible from multiple ranks and GPUs on the same node, but this is complex to set up and not something I’d recommend.

The more common approach is to use CUDA-aware MPI, so that MPI communication calls with device data transfer directly between devices (either on the same node or across nodes) rather than having to copy the data back to the host.

To use CUDA-aware MPI, you need an MPI that has it enabled (such as the OpenMPI we ship with the compilers) and to pass device pointers to the MPI calls. In OpenACC, you wrap the calls in a “#pragma acc host_data use_device(<variable>)” region; “host_data” says to use the device pointer within host code.

For example:

          #pragma acc host_data use_device(topSndBuf, topRxBuf)
          {
            MPI_Irecv ( topRxBuf,  (15+8+3)*LSIZEX, MPI_DOUBLE, mpi_peer_top, tagDOWN, MPI_COMM_WORLD, &reqRxTopHalo   );

            MPI_Isend ( topSndBuf, (15+8+3)*LSIZEX, MPI_DOUBLE, mpi_peer_top, tagUP,   MPI_COMM_WORLD, &reqTxTopBorder );
          }

Hope this helps,
Mat


Hi Mat,

Many thanks for your reply! You are right about the mistake. I need to delete it on the device first and then on the host. I had been staring at the code for a while and did not notice this.

Once again, thanks for your help.
Feng

Thanks for this post Mat, very informative, and it helped me fix a problem in my code!

Paul

