CUDA-aware OpenMPI fails on Jetson TK1

Hi

I’ve set up a fresh L4T R21 system and compiled OpenMPI 1.8.3 myself via
“./configure --enable-mpi-thread-multiple --with-threads --with-cuda ; make ; sudo make install”.

When I run the simple sample code from
https://www.olcf.ornl.gov/tutorials/gpudirect-mpich-enabled-cuda/#Examples
compiled via
mpicc -o example example.c -L/usr/local/cuda/lib -lcudart -I/usr/local/cuda/include
with the following toolchain:

dribbroc@tegra-ubuntu:~/ompi_cuda$ mpicc --version
gcc (Ubuntu/Linaro 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

dribbroc@tegra-ubuntu:~/ompi_cuda$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_18:43:29_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
dribbroc@tegra-ubuntu:~/ompi_cuda$

I get

dribbroc@tegra-ubuntu:~/ompi_cuda$ mpirun -np 2 example
Success!
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xace67b80, 33792, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
[tegra-ubuntu:03031] 11 more processes have sent help message help-mpi-common-cuda.txt / cuMemHostRegister failed
[tegra-ubuntu:03031] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

It seems that the program executes and finishes correctly, but these errors are still strange.

Has anyone else used CUDA-aware OpenMPI and run into similar problems?

Thanks
Dirk

PS: OpenMPI 1.8.4 shows the same behaviour.

PPS: More verbose output:

dribbroc@tegra-ubuntu:~/ompi_cuda$ mpirun --mca orte_base_help_aggregate 0 -np 2 ./example
Success!
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xace81700, 33792, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xace92000, 263168, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xacf12900, 1024, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xa4e80400, 33792, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xa4e91000, 263168, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xa4f11900, 1024, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xa53ddb80, 33792, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xa5426480, 263168, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xa546ae80, 1024, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xadf0fb80, 33792, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xadf58480, 263168, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The call to cuMemHostRegister(0xadf9ce80, 1024, 0) failed.
  Host:  tegra-ubuntu
  cuMemHostRegister return value:  801
  Memory Pool:  sm
--------------------------------------------------------------------------

Just a quick update:

cudaHostRegister is not supported on ARM32.
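
This matches the return value in the log: 801 is CUDA_ERROR_NOT_SUPPORTED in the driver API's CUresult enum. As a minimal sketch for decoding such values when reading Open MPI's help messages, the table below hard-codes a handful of CUresult numbers from cuda.h. The helper name cu_result_name is hypothetical (it is not part of CUDA or Open MPI), and the table is intentionally incomplete.

```c
#include <stddef.h>

/* A few CUresult values as defined in the CUDA driver API header (cuda.h).
 * Deliberately incomplete: this is only an illustration. */
struct cu_code {
    int value;
    const char *name;
};

static const struct cu_code cu_codes[] = {
    {   0, "CUDA_SUCCESS" },
    {   1, "CUDA_ERROR_INVALID_VALUE" },
    {   2, "CUDA_ERROR_OUT_OF_MEMORY" },
    { 712, "CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED" },
    { 801, "CUDA_ERROR_NOT_SUPPORTED" },
    { 999, "CUDA_ERROR_UNKNOWN" },
};

/* Hypothetical helper: map a numeric return value, as printed in the
 * Open MPI message, to its symbolic name. */
const char *cu_result_name(int value)
{
    size_t i;
    for (i = 0; i < sizeof cu_codes / sizeof cu_codes[0]; ++i) {
        if (cu_codes[i].value == value)
            return cu_codes[i].name;
    }
    return "(code not in this table)";
}
```

Calling cu_result_name(801) returns "CUDA_ERROR_NOT_SUPPORTED". With the toolkit installed, the driver API's own cuGetErrorName should give the same answer without a hand-written table.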

Sorry to resurrect an old thread…

Is there something I can replace cudaHostRegister with in order to run this code? Also, can it then be assumed that the code is actually running on the GPUs? The code still runs and I get output, but I get this error once for each machine. I am running this on a small group of Jetson TK1s.

Hi spencer_k,

cudaHostRegister cannot be supported on ARM, because we don’t have I/O coherence on Tegra.
You may try cudaHostAlloc instead, but the memory it returns is uncached on both the CPU and the GPU, so you might see performance problems.
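
A minimal sketch of that suggestion, for buffers the application allocates itself: ask cudaHostAlloc for pinned memory and fall back to plain malloc if the call fails. The host_buf struct and the two helper functions are hypothetical names for illustration. Note that the failing call in the log above comes from inside Open MPI (Memory Pool: sm), so this sketch does not by itself silence that message; it only shows the allocation pattern for application code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical wrapper that remembers how the buffer was allocated. */
typedef struct {
    void *ptr;
    int   from_cuda;  /* 1: cudaHostAlloc, 0: plain malloc */
} host_buf;

/* Try pinned memory first; fall back to pageable memory on failure. */
static int alloc_host_buf(host_buf *b, size_t bytes)
{
    cudaError_t err = cudaHostAlloc(&b->ptr, bytes, cudaHostAllocDefault);
    if (err == cudaSuccess) {
        b->from_cuda = 1;
        return 0;
    }
    fprintf(stderr, "cudaHostAlloc failed (%s), using malloc\n",
            cudaGetErrorString(err));
    cudaGetLastError();           /* reset the runtime's last-error state */
    b->ptr = malloc(bytes);
    b->from_cuda = 0;
    return b->ptr != NULL ? 0 : -1;
}

static void free_host_buf(host_buf *b)
{
    if (b->from_cuda)
        cudaFreeHost(b->ptr);
    else
        free(b->ptr);
    b->ptr = NULL;
}

int main(void)
{
    const size_t n = 1024, bytes = n * sizeof(float);
    host_buf b;
    float *h, *d = NULL;
    size_t i;

    if (alloc_host_buf(&b, bytes) != 0)
        return 1;
    h = (float *)b.ptr;
    for (i = 0; i < n; ++i)
        h[i] = (float)i;

    /* Round trip through device memory to confirm the buffer is usable. */
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess ||
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) != cudaSuccess ||
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "device copy failed\n");
        cudaFree(d);
        free_host_buf(&b);
        return 1;
    }

    printf("round trip ok, h[10] = %f\n", h[10]);
    cudaFree(d);
    free_host_buf(&b);
    return 0;
}
```

This should build the same way as the example in the first post, e.g. with mpicc or gcc plus -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart. Whether the uncached pinned buffer is faster than the malloc fallback on a given Tegra board is something to measure rather than assume.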

Thanks

Sorry to interrupt, but would you mind giving more info? I tried replacing “cuMemHostRegister” with “cuMemHostAlloc” in ‘ompi/opal/mca/accelerator/cuda/acclerator_cuda_component.c’ / ‘acclerator_cuda.c’ and ‘/ucx/src/uct/cuda/cuda_copy/cuda_copy_md.c’, but I still get the same problem:

The call to cuMemHostRegister(0x7f342e0398, 4, 0) failed.
Host: hzx-desktop
cuMemHostRegister return value: 801
Registration cache: checkmem

Or can I just ignore the problem? I get the correct output on the Jetson Nano.