Dear all,
I would like to become more familiar with CUDA-aware MPI and NCCL, so I am trying to compile (and eventually run) the NCCL examples from the library documentation (https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/examples.html#communicator-creation-and-destruction-examples).
I am using CentOS/7.5, CUDA/10.0, OpenMPI/4.0.0 and NCCL/2.4 on Skylake compute nodes with 4 P100 devices per node.
More concretely, when I try to compile the first example from the URL above, I get the following warnings:
$> gcc -g -Wall -Wextra -pedantic -std=gnu99 -lcudart -lnccl ex-01.c
ex-01.c: In function 'main':
ex-01.c:46:5: warning: passing argument 1 of 'cudaMalloc' from incompatible pointer type [enabled by default]
CUDACHECK(cudaMalloc(sendbuff + i, size * sizeof(float)));
^
In file included from /apps/software/CUDA/10.0.130/include/cuda_runtime.h:96:0,
from ex-01.c:3:
/apps/software/CUDA/10.0.130/include/cuda_runtime_api.h:4126:58: note: expected 'void **' but argument is of type 'float **'
extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaMalloc(void **devPtr, size_t size);
^
ex-01.c:47:5: warning: passing argument 1 of 'cudaMalloc' from incompatible pointer type [enabled by default]
CUDACHECK(cudaMalloc(recvbuff + i, size * sizeof(float)));
^
In file included from /apps/software/CUDA/10.0.130/include/cuda_runtime.h:96:0,
from ex-01.c:3:
/apps/software/CUDA/10.0.130/include/cuda_runtime_api.h:4126:58: note: expected 'void **' but argument is of type 'float **'
extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaMalloc(void **devPtr, size_t size);
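As far as I can tell, both warnings come from the same pattern: sendbuff + i and recvbuff + i have type 'float **', while cudaMalloc expects 'void **'. Below is a minimal sketch of that pattern which builds without CUDA. Note that fake_device_malloc, alloc_buffers and free_buffers are hypothetical names I made up for illustration; fake_device_malloc only mimics the out-parameter shape of cudaMalloc and allocates host memory. An explicit cast to 'void **' appears to silence the warning, but I am not sure this is the recommended fix for the documentation example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for cudaMalloc: same 'void **' out-parameter shape,
   but it allocates host memory so the sketch builds without CUDA. */
int fake_device_malloc(void **dev_ptr, size_t size)
{
    *dev_ptr = malloc(size);
    return *dev_ptr == NULL; /* 0 on success, nonzero on failure */
}

/* Allocates n_dev buffers of 'count' floats each; returns 0 on success. */
int alloc_buffers(float **buffers, int n_dev, size_t count)
{
    for (int i = 0; i < n_dev; ++i) {
        /* buffers + i has type 'float **'; the explicit cast matches the
           'void **' parameter and avoids the incompatible-pointer warning. */
        if (fake_device_malloc((void **)(buffers + i), count * sizeof(float)) != 0)
            return 1;
    }
    return 0;
}

/* Releases the buffers and resets the pointers to NULL. */
void free_buffers(float **buffers, int n_dev)
{
    for (int i = 0; i < n_dev; ++i) {
        free(buffers[i]);
        buffers[i] = NULL;
    }
}
```

In the real example the corresponding change would presumably be the same cast on the first argument of each cudaMalloc call.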
Could you show me how to compile the examples into executables correctly?
Best regards,
E.M.