MLNX_OFED version:
MLNX_OFED_LINUX-5.4-3.0.3.0-ubuntu20.04-x86_64
System:
Ubuntu 20.04
When the MR size reaches about 2 GB:
- ibv_dereg_mr(mr) gets stuck (it never returns, and the process hangs)
- kill -9 cannot immediately kill the process; htop shows the process occupying 100% CPU
- ibv_devinfo shows “failed to open device”
However, mr = ibv_reg_mr() still works well; I can even perform RDMA operations as long as ibv_dereg_mr is not called.
When the MR size stays just under 2 GB (about 2045 MB), everything also works well.
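One detail that may or may not be related (a guess, not a confirmed cause): 2 GB is exactly 2^31 bytes, the first size that no longer fits in a signed 32-bit integer, while 2045 MB still does. The helper names below are hypothetical and only illustrate that arithmetic; they are not part of libibverbs.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: convert a size in MiB to bytes using 64-bit arithmetic.
constexpr std::uint64_t mib_to_bytes(std::uint64_t mib) {
    return mib * 1024ULL * 1024ULL;
}

// Hypothetical helper: true if `bytes` fits in a signed 32-bit integer.
constexpr bool fits_in_int32(std::uint64_t bytes) {
    return bytes <=
           static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// 2045 MiB is still below the signed 32-bit limit; 2048 MiB (2 GiB) is not.
static_assert(fits_in_int32(mib_to_bytes(2045)), "2045 MiB fits in int32");
static_assert(!fits_in_int32(mib_to_bytes(2048)), "2 GiB does not fit in int32");
```

If the hang really starts exactly at 2^31 bytes, this check would separate the working and failing sizes; I have not verified that the boundary is that exact.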
Code Example:
// mem_ptr points to mmap'ed 2 MB hugepages; the dereg problem occurs when mem_sz reaches 2 GB.
auto mr = ibv_reg_mr(pd, mem_ptr, mem_sz,
                     IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
                     IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_ATOMIC);
// While sleeping, everything works well; ibv_devinfo also shows the device
int count = 30;
while (count > 0) {
    count--;
    sleep(1);
    LOG(2) << "sleeping... mr size: " << mr->length;
}
{
    auto rc = ibv_dereg_mr(mr); // When the problem occurs, ibv_devinfo prints "failed to open device"
    LOG(2) << "dereg mr"; // When the problem occurs, the process gets stuck and this line is not printed
    // ibv_dereg_mr returns 0 on success or the errno value on failure, so report rc directly
    LOG_IF(2, rc != 0) << "dereg mr error: " << strerror(rc);
}