cudaHostRegister with cudaHostRegisterIoMemory flag returns cudaErrorOperatingSystem


For my current project, I need to pin the DMA region of an RDMA NIC (ConnectX-4), which rdma-core calls the User Access Region (UAR), and map it into the CUDA address space using cudaHostGetDevicePointer. I have K40c and K80 GPUs. With the K40c (256 MB of BAR1 space), I don't need to enable Above 4G Decoding in the BIOS, and the mapping succeeds. The K80, however, requires Above 4G Decoding. When this option is enabled, cudaHostRegister with the cudaHostRegisterIoMemory flag returns cudaErrorOperatingSystem, regardless of which of the two GPUs is installed.
dmesg shows the following error:
NVRM: 0000:05:00.0: Failed to DMA map MMIO range [0x38fc06013000-0x38fc06013fff]

My system configuration:
OS: Ubuntu 20.04
CUDA version: 11.4
NVIDIA driver: 470.239.06
Motherboard: Supermicro X9SRA/X9SRA-3