Unable to Write to Output Buffer of Network Card Using GPUDirect

I am attempting to use GPUDirect to read from and write to the memory of a network card installed in a nearby PCIe slot. I have a Tesla K20c GPU and a SuperMicro X11SSM-F motherboard, with the GPU plugged into the “PCI-E 3.0 x8 (in x16 slot)”. Our GPU algorithm successfully reads from the card's input buffer, but its writes to the card's output buffer do not succeed. When we run the identical algorithm on a different system with an identical GPU plugged into a full x16 slot, it successfully writes to the digitizer output. We are therefore confident that the algorithm itself is correct, but for some reason the GPU on the system with the x8 slot is unable to write to the correct memory buffer. Our question is: does being plugged into an x8 slot affect the GPU's ability to write to the desired memory buffer?

This is outside my area of expertise, but my guess is that it has to do with how the PCIe slots are wired. According to the SuperMicro website, that single-socket motherboard has four PCIe slots:

1 PCI-E 3.0 x8 (in x16 slot)
1 PCI-E 3.0 x8
2 PCI-E 3.0 x4 (in x8 slot)

If the GPU is in the first slot listed, what slot is the network card in?
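On Linux, one way to find out is to look at where each card sits in the PCIe tree (`lspci -tv` prints it). The sketch below works on the resolved sysfs path of a device, i.e. what `os.path.realpath("/sys/bus/pci/devices/<address>")` returns, whose components list every bridge between the root complex and the device. The helper names and the two example paths are hypothetical, chosen only to illustrate the idea; on a real system, substitute the paths of the K20c and the network card.

```python
import re

# A PCI address component in a sysfs path looks like "0000:01:00.0"
# (domain:bus:device.function, hexadecimal).
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def bridge_chain(sysfs_path: str) -> list[str]:
    """Return the PCI addresses from the root port down to the device itself."""
    return [part for part in sysfs_path.strip("/").split("/") if BDF_RE.match(part)]


def share_root_port(path_a: str, path_b: str) -> bool:
    """True if both devices hang off the same PCIe root port."""
    chain_a, chain_b = bridge_chain(path_a), bridge_chain(path_b)
    return bool(chain_a) and bool(chain_b) and chain_a[0] == chain_b[0]


if __name__ == "__main__":
    # Hypothetical paths; obtain real ones with os.path.realpath on a live system.
    gpu = "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0"
    nic = "/sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0"
    print(bridge_chain(gpu))          # ['0000:00:01.0', '0000:01:00.0']
    print(share_root_port(gpu, nic))  # False
```

The first element of each chain is the root port; if the two cards report different root ports, peer-to-peer traffic between them has to pass through the root complex rather than staying below a single port.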

The CPUs supported by the motherboard all seem to be low-end processors with only 16 PCIe lanes, while the slots provide 24 lanes in total (8 + 8 + 4 + 4), more than the CPU has. The chipset used on the board (C236) supports up to 20 PCIe lanes.
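The lane budget in the paragraph above can be written out as a small calculation. This sketch assumes every CPU lane is routed to a slot, so the result is only a lower bound on how many slot lanes must come from the chipset; it says nothing about which slots those are.

```python
# Electrical slot widths as listed on the SuperMicro product page.
SLOT_LANES = {
    "PCI-E 3.0 x8 (in x16 slot)": 8,
    "PCI-E 3.0 x8": 8,
    "PCI-E 3.0 x4 (in x8 slot) #1": 4,
    "PCI-E 3.0 x4 (in x8 slot) #2": 4,
}
CPU_LANES = 16  # PCIe lanes provided by the supported CPUs


def min_chipset_lanes(slot_lanes: dict[str, int], cpu_lanes: int) -> int:
    """Lower bound on slot lanes that cannot be served by the CPU."""
    return max(0, sum(slot_lanes.values()) - cpu_lanes)


if __name__ == "__main__":
    print(sum(SLOT_LANES.values()))                  # 24
    print(min_chipset_lanes(SLOT_LANES, CPU_LANES))  # 8
```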

One speculative hypothesis is that not all slots are connected to the CPU; some are connected to the chipset instead, and this may interfere with the GPUDirect transfers when one device hangs directly off the CPU while the other hangs off the chipset.
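To test this hypothesis, each card's root port can be classified as CPU-attached or chipset-attached. The sketch below rests on an assumption that should be checked against `lspci -tv` and the board's block diagram: on Skylake/Kaby Lake-class Intel platforms such as a C236 board, CPU root ports usually appear as bus 00 device 01, and chipset (PCH) root ports as bus 00 devices 1b, 1c and 1d. The function names are hypothetical. If one card lands on each side, peer-to-peer traffic between them would have to cross the DMI link between the CPU and the chipset.

```python
# ASSUMPTION: Intel Skylake/Kaby Lake-class numbering. CPU root ports are
# 00:01.x; PCH root ports are 00:1b.x, 00:1c.x and 00:1d.x. Verify locally.
CPU_ROOT_DEVICES = {0x01}
PCH_ROOT_DEVICES = {0x1B, 0x1C, 0x1D}


def root_port_owner(root_bdf: str) -> str:
    """Classify a root port address such as '0000:00:1c.4'."""
    _domain, bus, devfn = root_bdf.split(":")
    device = int(devfn.split(".")[0], 16)
    if int(bus, 16) != 0:
        return "unknown"
    if device in CPU_ROOT_DEVICES:
        return "cpu"
    if device in PCH_ROOT_DEVICES:
        return "chipset"
    return "unknown"


def p2p_crosses_dmi(root_a: str, root_b: str) -> bool:
    """True if one root port belongs to the CPU and the other to the chipset."""
    return {root_port_owner(root_a), root_port_owner(root_b)} == {"cpu", "chipset"}


if __name__ == "__main__":
    print(root_port_owner("0000:00:01.0"))                  # cpu
    print(root_port_owner("0000:00:1c.4"))                  # chipset
    print(p2p_crosses_dmi("0000:00:01.0", "0000:00:1c.4"))  # True
```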