Extremely slow cudaMalloc

Hi all,

I’m seeing some very strange (and slow) performance from a cudaMalloc call to allocate space for an array of floats. I get up to 10 seconds of waiting when allocating 300k floats. I’ve isolated the call in a separate piece of code that can be tested (below).

Strangely, the test routinely takes 8-10 seconds on a Tesla 2050, but frequently finishes almost instantly on a GTX480. The behavior seems to depend on the particular hardware being used.

There is no X server running on the Tesla system, but there is one on the GTX480 system. Both systems use CUDA 4.0 with the 270.41.19 driver on CentOS 5.5 (x86_64).

Is it a bug? Is there an undocumented syntax issue?

Any suggestions would be much appreciated.

Sasha

#include <cstdio>
#include <iostream>

__global__ void kernel() {
    // Device-side printf (supported on Fermi-class GPUs with CUDA 4.0).
    printf("kernel executed\n");
}

int main() {
    int size = 300000;

    float *result = new float[size];
    float *dev_result;

    std::cout << "Allocating memory..." << std::endl;
    cudaMalloc((void**)&dev_result, size * sizeof(float));
    std::cout << "Completed" << std::endl;

    kernel<<<1,1>>>();

    // cudaFree synchronizes with the device, so the kernel's printf output is flushed here.
    cudaFree(dev_result);
    delete [] result;

    return 0;
}

Enable persistence mode for the driver (nvidia-smi -pm 1, needs root) or keep nvidia-smi running in a loop. Without persistence mode the driver is unloaded whenever no client is attached to the GPU, so every run pays the full driver/GPU initialization cost again.
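For example, something along these lines (the loop interval is arbitrary; it just keeps a client attached so the driver stays loaded):

# enable persistence mode (does not survive a reboot)
nvidia-smi -pm 1

# alternative workaround: keep nvidia-smi running in a loop in the background
while true; do nvidia-smi > /dev/null; sleep 5; done &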

ssh c0-4 "time ./a.out" (node with driver in non-persistent mode)
Allocating memory...
Completed
kernel executed

real 0m1.998s
user 0m0.003s
sys 0m1.916s

ssh c0-0 "time ./a.out" (node with driver in persistent mode)
Allocating memory...
Completed
kernel executed

real 0m0.140s
user 0m0.005s
sys 0m0.125s
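If you want to confirm where the time is going, the slow part is almost certainly the driver/context initialization triggered by the first CUDA call, not the allocation itself. A minimal sketch (the wallTime helper is just illustrative, not from the original program): calling cudaFree(0) first forces context creation, so the subsequent cudaMalloc can be timed on its own.

#include <iostream>
#include <sys/time.h>
#include <cuda_runtime.h>

// Illustrative wall-clock timer, not part of the original test program.
static double wallTime() {
    timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main() {
    int size = 300000;
    float *dev_result;

    double t0 = wallTime();
    cudaFree(0);                       // forces context creation / GPU initialization
    double t1 = wallTime();
    cudaMalloc((void**)&dev_result, size * sizeof(float));
    double t2 = wallTime();

    std::cout << "Context creation: " << (t1 - t0) << " s" << std::endl;
    std::cout << "cudaMalloc:       " << (t2 - t1) << " s" << std::endl;

    cudaFree(dev_result);
    return 0;
}

With persistence mode enabled (or an X server keeping the driver loaded, as on the GTX480 machine), the context-creation time should drop to a fraction of a second, as in the 0.14 s run above.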

Thanks for the advice; persistence mode did the trick in the production code.