Hi Martijin
Thank you for your reply about the issue.
I didn’t describe the issue clearly before; the hardware and software environment is listed below:
- Hardware:
ConnectX-3 (Mellanox Technologies MT27500 Family [ConnectX-3])
Nvidia K80
- Software:
Ubuntu 16.04, kernel 4.8.7
nvidia-driver: nvidia-diag-driver-local-repo-ubuntu1604-384.66_1.0-1_amd64.deb (Tesla driver 384.66 for Ubuntu 16.04, from the NVIDIA driver download site)
cuda-toolkit: cuda_8.0.61_375.26_linux.run (from the NVIDIA Developer CUDA Toolkit download page)
MLNX_OFED: MLNX_OFED_SRC-debian-4.1-1.0.2.0.tgz (http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.1-1.0.2.0/MLNX_OFED_SRC-debian-4.1-1.0.2.0.tgz)
nv_peer_mem: 1.0.5
I have two servers, one of which has a K80 GPU. I want to use perftest to test RDMA and GPUDirect RDMA. Following this guide, https://devblogs.nvidia.com/parallelforall/benchmarking-gpudirect-rdma-on-modern-server-platforms , I installed nv_peer_mem on the server with the K80 GPU.
Without --use_cuda, ib_write_bw works well, but with --use_cuda it fails. I added error printing and found that ib_write_bw fails inside ibv_reg_mr with the error “File has opened”. If I do not insmod nv_peer_mem, ibv_reg_mr instead fails with “Bad address”.
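To isolate the failing step, here is a minimal sketch of what I understand ib_write_bw --use_cuda does at the point of the error: allocate the buffer with cudaMalloc() and register it with ibv_reg_mr(). This is my own test program, not code taken from perftest; the device choice (the first HCA) and the buffer size are arbitrary values for illustration.

    /* Minimal sketch (my own test, not perftest code): register a
     * cudaMalloc()'d buffer with ibv_reg_mr(), which is the step that
     * fails in my --use_cuda runs. Device index and size are arbitrary. */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <cuda_runtime.h>
    #include <infiniband/verbs.h>

    #define BUF_SIZE (64 * 1024)

    int main(void)
    {
        int num = 0;
        struct ibv_device **dev_list = ibv_get_device_list(&num);
        if (!dev_list || num == 0) {
            fprintf(stderr, "no RDMA devices found\n");
            return 1;
        }

        struct ibv_context *ctx = ibv_open_device(dev_list[0]);
        if (!ctx) {
            fprintf(stderr, "ibv_open_device failed\n");
            return 1;
        }
        struct ibv_pd *pd = ibv_alloc_pd(ctx);
        if (!pd) {
            fprintf(stderr, "ibv_alloc_pd failed\n");
            return 1;
        }

        /* Allocate device memory on the K80 (default CUDA device). */
        void *gpu_buf = NULL;
        cudaError_t cerr = cudaMalloc(&gpu_buf, BUF_SIZE);
        if (cerr != cudaSuccess) {
            fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(cerr));
            return 1;
        }

        /* The registration step that fails for me with --use_cuda. */
        struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, BUF_SIZE,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);
        if (!mr)
            fprintf(stderr, "ibv_reg_mr failed: %s\n", strerror(errno));
        else
            printf("ibv_reg_mr OK, lkey=0x%x\n", mr->lkey);

        if (mr)
            ibv_dereg_mr(mr);
        cudaFree(gpu_buf);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(dev_list);
        return 0;
    }

It can be built with nvcc, linking against libibverbs (-libverbs). As far as I understand, registering GPU memory this way can only succeed when nv_peer_mem is loaded so the HCA can pin the CUDA buffer, which is why I suspect the problem is in that path.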
For background: I had run the same experiment successfully before, with kernel 4.4.0 and MLNX_OFED 4.0-2.0.0.1, and without NVMe over Fabrics installed. Then my colleague installed kernel 4.8.7 and NVMe over Fabrics, and since then ib_write_bw with --use_cuda has never run correctly.
Is there anything wrong with my experiment or my environment? And a second question: can one ConnectX-3 support NVMe over Fabrics and GPUDirect RDMA at the same time?
Thanks again for your help; I look forward to your reply.
Yours
Haizhu Shao