About the overhead of cudaMalloc(): how much time does cudaMalloc take to allocate memory?

takeuchi · June 24, 2009, 6:05am

Three arrays of 1023 elements (A) and three arrays of 1024 elements (B) are allocated as below:

int *a1,*a2,*a3;
int *b1,*b2,*b3;
int sizeA = 1023 * sizeof(int);
int sizeB = 1024 * sizeof(int);

cutStartTimer(timer);
CUDA_SAFE_CALL(cudaMalloc( (void**) &a1, sizeA));
CUDA_SAFE_CALL(cudaMalloc( (void**) &a1, sizeA));
CUDA_SAFE_CALL(cudaMalloc( (void**) &a1, sizeA));
cutStopTimer(timer);

printf("alloc time = %f", cutGetTimerValue(timer)); // alloc time = 0.016ms

cutStartTimer(timer);
CUDA_SAFE_CALL(cudaMalloc( (void**) &a1, sizeB));
CUDA_SAFE_CALL(cudaMalloc( (void**) &a1, sizeB));
CUDA_SAFE_CALL(cudaMalloc( (void**) &a1, sizeB));
cutStopTimer(timer);

printf("alloc time = %f", cutGetTimerValue(timer)); // alloc time = 0.825ms

Can anyone tell me what causes the difference between the allocation times for the arrays of 1023 elements and the arrays of 1024 elements?

Tobi_W · June 24, 2009, 7:07am

First of all, you’re overwriting sizeA, so all mallocs are using the same size.

If this is just a typo in your post and not in your application, try calling cudaMalloc() in a loop of 10 to 100 iterations and divide the measured time by the number of iterations. This is more accurate, because the noise in a single measurement can lead to weird results…
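A minimal sketch of that averaging approach, not taken from the original posts. The helper names average_ms and average_alloc_ms are hypothetical, and std::chrono is used here in place of the cutil timer. It assumes that one untimed warm-up call is enough to absorb one-time costs (the first CUDA runtime call typically also pays for context initialization), and it times an allocate/free pair, so the figure includes cudaFree. Built with nvcc it times cudaMalloc; built with a plain C++ compiler it falls back to std::malloc so the harness can still be tried without a GPU.

```cpp
// Sketch: average the cost of an allocation over many iterations.
// With nvcc this times cudaMalloc/cudaFree; with a plain C++ compiler it
// falls back to std::malloc/std::free (an assumption made for portability).
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: runs fn once untimed (warm-up), then `iterations`
// times under a wall-clock timer, and returns the mean time per call in ms.
template <typename Fn>
double average_ms(int iterations, Fn fn) {
    assert(iterations > 0);
    fn();  // warm-up call, excluded from the measurement
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Hypothetical helper: mean time of one allocate/free pair of `bytes` bytes.
double average_alloc_ms(std::size_t bytes, int iterations) {
    return average_ms(iterations, [bytes] {
        void* p = nullptr;
#ifdef __CUDACC__
        cudaMalloc(&p, bytes);  // error checking omitted for brevity
        cudaFree(p);
#else
        p = std::malloc(bytes);
        std::free(p);
#endif
    });
}
```

Calling average_alloc_ms(1023 * sizeof(int), 100) and average_alloc_ms(1024 * sizeof(int), 100) would then compare the two sizes from the question on equal footing.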

Nico · June 24, 2009, 7:13am

Check out this thread.

Here’s the output of the code (modified to make it do what you want it to do :) ) in a Linux environment:

alloc time = 178.873993 (1023)
alloc time = 179.369995 (1024)

N.