GPU + InfiniBand: CUDA allocation failures

Hi all -

Does anyone here use InfiniBand together with NVIDIA GPUs running CUDA? I’m using Mellanox InfiniBand adapters and Quadro 5600s, and I see intermittent CUDA allocation failures, even for a 1-byte allocation right at CUDA startup.

I can run my application either under MPI or standalone. With MPI over Ethernet I don’t see any problems, and standalone runs are fine too. When I run MPI over IB, though, cudaMalloc fails right off the bat about 90% of the time. The failure rate climbs sharply as I add nodes, and with 3 nodes I can practically never get it to work.

Anyway, I figured I’d post this here to see if anyone has run into it. I’ve been trying to build a repro case, but I can’t get it to fail outside my own program, which is about 50,000 lines of code (I’m sure NVIDIA doesn’t want to deal with that).