GPU + infiniband CUDA allocation failures

bbudge · January 26, 2008, 12:45am

Hi all -

Just wondering if anyone uses InfiniBand with NVIDIA GPUs running CUDA? I’m using Mellanox InfiniBand adapters and Quadro 5600s, and I randomly see CUDA allocation failures, even for an allocation of 1 byte at CUDA startup.

I run my application in either MPI mode or standalone mode. Standalone mode and MPI over Ethernet both run without any problems. When I run MPI over IB, I get cudaMalloc failures right off the bat about 90% of the time. As I add more nodes to the run, the failure rate climbs further, to the point that with 3 nodes I can pretty much never get it to work.

Anyway, I figured I’d post this here to see if anyone has experience with this. I have been trying to build a repro case, but I can’t seem to get it to fail without running my program (which consists of about 50,000 lines of code; I’m sure NVIDIA doesn’t want to deal with that).
