H100 PCIe RDMA crashes


I have written a kernel driver for a third-party device that performs RDMA over PCIe to my H100 GPUs. When I route the RDMA through the root complex it works, but when I route it through a PCIe switch to the nearest GPU, the machine crashes at the hardware level and leaves no crash logs. The switch is a fairly standard Gen4 PEX-series part. Everything shows up in lspci as expected, and I don’t suspect a fault on the third-party device.
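One way to confirm which of the two paths a transfer takes is to compare where the device and the GPU sit in the PCIe hierarchy. The sketch below assumes a Linux host with sysfs, where resolving `/sys/bus/pci/devices/<BDF>` yields a path listing every bridge above the device. If the deepest bridge the two devices share is a switch port, peer traffic can be turned around inside the switch. If they share only the host bridge, it has to cross the root complex. The function names and BDFs here are hypothetical, for illustration only.

```python
import os
import sys
from typing import List, Optional


def pci_chain(sysfs_path: str) -> List[str]:
    """Split a resolved sysfs PCI path into [host_bridge, bridge..., device]."""
    parts = [p for p in sysfs_path.split("/") if p]
    # The host bridge component looks like "pci0000:00"; everything after it is a BDF.
    for i, part in enumerate(parts):
        if part.startswith("pci"):
            return parts[i:]
    raise ValueError(f"not a PCI sysfs path: {sysfs_path}")


def lowest_common_bridge(path_a: str, path_b: str) -> Optional[str]:
    """Return the BDF of the deepest bridge shared by both devices.

    Returns None if the devices share nothing but the host bridge, or sit
    under different host bridges; in both cases traffic crosses the root complex.
    """
    chain_a, chain_b = pci_chain(path_a), pci_chain(path_b)
    if chain_a[0] != chain_b[0]:
        return None  # different host bridges entirely
    common = None
    # Compare bridge components only: skip the host bridge (index 0)
    # and the endpoint devices themselves (last element of each chain).
    for a, b in zip(chain_a[1:-1], chain_b[1:-1]):
        if a != b:
            break
        common = a
    return common


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical BDFs): python3 pcipath.py 0000:03:00.0 0000:04:00.0
    paths = [os.path.realpath(f"/sys/bus/pci/devices/{bdf}") for bdf in sys.argv[1:]]
    bridge = lowest_common_bridge(*paths)
    print(bridge if bridge else "only the root complex is shared")
```

For two devices under the same switch, the result is typically the switch's upstream port. Whether the switch actually routes the traffic locally still depends on how its ports are configured.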

I would appreciate any tips, advice or technical support.
Thank you,

Correction: these are actually “A100 SXM” GPUs, not H100s.