The Usage of Unified Memory in MPI Programs

Hey guys:

I have a question about the usage of unified memory.
As I understand it, the prototype of cudaMallocManaged takes two parameters plus a third one with a default value. Calling it with two parameters works in a pure CUDA program, while calling it with three parameters ran into a segmentation fault.

However, I get this compilation error in an MPI program (a simple ping-pong latency test) built with MVAPICH2 mpicc:
too few arguments to function ‘cudaMallocManaged’

cuerr = cudaMallocManaged((void **)buffer, MYBUFSIZE);
if (cudaSuccess != cuerr) {
    fprintf(stderr, "Could not allocate device memory: %s\n", cudaGetErrorString(cuerr));
    return 1;
}

If I use three parameters, like this:
cudaMallocManaged((void **)buffer, MYBUFSIZE, 0);
the compilation passes, but I get a runtime error: invalid argument (from cudaGetErrorString).

I don’t think there is any interaction with the MPI library, since the CUDA memory allocation itself fails.

How should I deal with this?
Is there any restriction on how the MPI library is compiled as well?

Thanks a lot.

If you specify the 3rd parameter, the documentation:

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge7d3c4b1d1bb4810e5ffd90fbe8f5dda

indicates that 0 is not a valid choice (the flag must be either cudaMemAttachGlobal or cudaMemAttachHost), hence the runtime error.

You may also be interested in the relevant section of the programming guide:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-unified-memory-programming-hd

Thanks, the documentation really helps.