Calculating time?

I want to measure computation time, but I have run into a problem.

My experimental results are as follows.

Before executing the kernel function, I call cudaMalloc() several times.

When I call cutStartTimer(timer) before the first cudaMalloc(), the measured time is extremely long (120 ms).

When I move cutStartTimer(timer) to after the first cudaMalloc(), the measured time is very short (1.33 ms).

Why does the first cudaMalloc() take so long (120 ms - 1.33 ms, roughly 119 ms)?

Thanks for any info.

  1. Dynamic memory allocation is very expensive.
  2. Is cudaMalloc() the first cuda* call in your program? The first such call also initializes the driver, the runtime, and the GPU to prepare them for CUDA calculations. That takes a significant amount of time as well.
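
To check whether initialization is what dominates the 120 ms, one option is to force it before the timer starts. The following is only a sketch: calling cudaFree(0) is a commonly used idiom for triggering initialization, std::chrono stands in for cutStartTimer(), and timedMallocMs is a hypothetical helper name, not something from the original post.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock milliseconds spent in a single cudaMalloc() call.
static double timedMallocMs(void** ptr, size_t bytes) {
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaMalloc(ptr, bytes);
    auto stop = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    // Warm-up: the first runtime call pays the initialization cost,
    // so make it here, outside the timed region.
    cudaFree(0);

    void* d_buf = nullptr;
    double ms = timedMallocMs(&d_buf, 1 << 20);  // 1 MiB, arbitrary size
    std::printf("cudaMalloc after warm-up: %.3f ms\n", ms);

    cudaFree(d_buf);
    return 0;
}
```

If the time reported after the warm-up is close to the 1.33 ms figure, the remaining ~119 ms is initialization rather than allocation.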

So, is there any way to replace dynamic memory allocation with something more efficient?

Thanks for the reply.

Are there static memory allocation methods?

Thank you for the reply.

Only allocate once, at the beginning of the program.
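
A minimal sketch of that advice, assuming the buffer size is known up front: the cudaMalloc() is hoisted out of the loop and the same device buffer is reused on every iteration. The kernel addOne and the sizes are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element of the buffer.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 16;
    float* d_data = nullptr;

    // Allocate once, before the loop.
    if (cudaMalloc((void**)&d_data, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Reuse the same buffer in every iteration; no allocation inside the loop.
    for (int step = 0; step < 100; ++step) {
        addOne<<<(n + 255) / 256, 256>>>(d_data, n);
    }
    cudaDeviceSynchronize();

    // Free once, at the end.
    cudaFree(d_data);
    return 0;
}
```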

Of course. Just declare a device array. You will need to use cudaMemcpyToSymbol to copy to it.

Sure. Here’s one:

float myStaticMemory[1000];

Extend it to CUDA as per what MrAnderson said.
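
For what it's worth, the CUDA version might look roughly like this. It is a sketch, not a definitive implementation: the array keeps the name myStaticMemory from above, while the scale kernel and the host buffer h_data are invented for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 1000

// Statically allocated device array: no cudaMalloc() needed.
__device__ float myStaticMemory[N];

// Hypothetical kernel that works directly on the static array.
__global__ void scale(float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) myStaticMemory[i] *= factor;
}

int main() {
    float h_data[N];
    for (int i = 0; i < N; ++i) h_data[i] = (float)i;

    // Host -> device copy into the symbol.
    cudaError_t err = cudaMemcpyToSymbol(myStaticMemory, h_data, sizeof(h_data));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    scale<<<(N + 255) / 256, 256>>>(2.0f);

    // Device -> host copy back out of the symbol.
    cudaMemcpyFromSymbol(h_data, myStaticMemory, sizeof(h_data));
    std::printf("h_data[10] = %f\n", h_data[10]);
    return 0;
}
```

The size has to be a compile-time constant, so this only fits cases where the maximum size is known in advance.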

There’s also lightweight dynamic allocation. E.g., you can make your own stack.

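#include <stdlib.h>

// One large allocation up front: 1000000 bytes, i.e. room for 250000 floats.
float* stackmemory = (float*)malloc(1000000);
int stackpointer = 0;  // index of the next free float

// Hand out the next `size` floats from the stack.
float* myMalloc(int size) {
    float* pointer = &stackmemory[stackpointer];
    stackpointer += size;
    return pointer;
}

// Release the most recently allocated `size` floats.
void myFree(int size) {
    stackpointer -= size;
}

// Usage, inside some function:
for (int i = 0; i < 100; i += 1) {
    float* a = myMalloc(i);
    float* b = myMalloc(10*i);
    // use a and b
    myFree(10*i + i);
}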

The above code will be much faster than calling malloc() multiple times. Extend the concept to cudaMalloc() as well.
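
Extended to cudaMalloc(), the same idea could look something like the sketch below: one cudaMalloc() for the whole pool, then plain pointer arithmetic for each sub-allocation. All names (d_pool, poolInit, poolAlloc, poolFree, poolDestroy) are hypothetical, and a capacity check is added that the host version above does not have.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stack ("bump") allocator over a single cudaMalloc'd pool.
static float* d_pool = nullptr;   // base of the device pool
static size_t poolCapacity = 0;   // capacity, in floats
static size_t poolTop = 0;        // floats currently handed out

// One expensive cudaMalloc() for the whole pool.
bool poolInit(size_t count) {
    if (cudaMalloc((void**)&d_pool, count * sizeof(float)) != cudaSuccess) return false;
    poolCapacity = count;
    poolTop = 0;
    return true;
}

// Cheap sub-allocation: just advance the top of the stack.
float* poolAlloc(size_t count) {
    if (count > poolCapacity - poolTop) return nullptr;  // pool exhausted
    float* p = d_pool + poolTop;
    poolTop += count;
    return p;
}

// Release the most recently allocated `count` floats (stack order only).
void poolFree(size_t count) {
    poolTop = (count > poolTop) ? 0 : poolTop - count;
}

void poolDestroy() {
    cudaFree(d_pool);
    d_pool = nullptr;
    poolCapacity = 0;
    poolTop = 0;
}

int main() {
    if (!poolInit(1 << 20)) return 1;

    for (int i = 1; i < 100; ++i) {
        float* a = poolAlloc(i);
        float* b = poolAlloc(10 * i);
        if (!a || !b) break;
        // launch kernels that use a and b here
        poolFree(10 * i + i);
    }

    poolDestroy();
    return 0;
}
```

One caveat: sub-allocations from this pool are only aligned to sizeof(float), unlike pointers returned directly by cudaMalloc(), so kernels that rely on wider alignment (e.g. float4 loads) would need the sizes rounded up.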