Why does disabling IRQs on Linux cause RDMA read and RDMA write to fail?

I have two host machines connected by Mellanox InfiniBand HCAs. I’m running a simple RDMA application that performs RDMA write and RDMA read operations from one machine (the client) on the other machine (the server). To find out which interrupts belong to the HCA on each machine, I ran the following command: less /proc/interrupts

67: 475880  50253   0   0  PCI-MSI-edge  mlx4-async@pci:0000:01:00.0
68: 399002      0  73   0  PCI-MSI-edge  mlx4_0-0
69:      0   3264  23   0  PCI-MSI-edge  mlx4_0-1
70:      0      0   0   0  PCI-MSI-edge  mlx4_0-2
71:      0      0   0   0  PCI-MSI-edge  mlx4_0-3
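The same lookup can be scripted instead of read by eye. The following is a minimal sketch, assuming the usual /proc/interrupts layout (IRQ number, one count per CPU, interrupt type, device name). The function name `mlx4_irqs` and the sample text are made up for illustration and are not part of any Mellanox tool.

```python
def mlx4_irqs(interrupts_text, needle="mlx4"):
    """Return {irq: (per_cpu_counts, name)} for rows whose device name contains needle.

    Hypothetical helper: it only parses text in the /proc/interrupts layout.
    """
    result = {}
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header and non-numeric rows such as NMI
        fields = rest.split()
        counts = []
        for field in fields:
            if not field.isdigit():
                break  # first non-numeric field is the interrupt type
            counts.append(int(field))
        name = fields[-1] if fields else ""
        if needle in name:
            result[int(head)] = (counts, name)
    return result


# Sample rows copied from the output above, plus one unrelated row.
SAMPLE = """\
 67:475880 50253 0 0 PCI-MSI-edge mlx4-async@pci:0000:01:00.0
 68:399002 0 73 0 PCI-MSI-edge mlx4_0-0
  0: 12 0 0 0 IO-APIC-edge timer
"""

for irq, (counts, name) in sorted(mlx4_irqs(SAMPLE).items()):
    print(irq, name, counts)
```

On a live system the text would come from reading /proc/interrupts rather than from the sample string.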

On the server machine, I found that calling the function __disable_irq() on those 4 interrupts causes all RDMA read/write operations performed by the client to fail with the error message “transport retry counter exceeded”.

My question is: why and when can RDMA read/write operations generate IRQs on the remote machine? I thought they did not involve the remote CPU, so they should not raise any kind of IRQ there.

So why does disabling those interrupts cause these operations to fail?

Message was edited by: FOPA Léon Constantin

Hello FOPA,

To answer your question properly, we would need access to your server to see what is really happening.

I suggest you contact Mellanox Support directly at Support@mellanox.com.