cudaFree before cudaMalloc, how is that possible?

Hello All,

I have a question that may be too simple for some, but I am confused.

I have written a program in which I call cudaMalloc. It works fine but takes around 51 ms to complete. I have another program that uses the cuSPARSE library to do the same task, but there cudaMalloc takes much less time (5 ms) for the same data type and the same size as before. Why is that?

When profiling, I noticed that in the second (cuSPARSE) case a cudaFree is issued before cudaMalloc, but I have no cudaFree statement at that point… Is the library cleaning up memory before any operation is called?

Regards,

Walter

Hi,
The first call to cudaMalloc, cudaFree, or any other CUDA function that manages memory also triggers CUDA context initialisation on the card. That makes this very first call always take longer. Usually, one tries to trigger it as early as possible in the code, and a common way of doing so is either a “cudaMalloc(&foo, 0)” or a “cudaFree(NULL)”. In your cuSPARSE program, the cudaFree you see in the profile is presumably issued by the library itself and absorbs that one-time cost, so your cudaMalloc is no longer the first call. I guess that explains the behaviour you see.
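
If you want to check this yourself, here is a minimal sketch that times a warm-up cudaFree and then a cudaMalloc separately. The helper time_ms and the 1 MiB buffer size are my own placeholders, not taken from your program, and the exact timings will depend on your card and driver.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock time of one host-side call, in milliseconds.
template <typename F>
static double time_ms(F&& call) {
    auto start = std::chrono::steady_clock::now();
    call();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    const size_t bytes = 1 << 20;  // assumed buffer size (1 MiB), for illustration only
    float* d_buf = nullptr;
    cudaError_t err = cudaSuccess;

    // First runtime call in the process: this is where context initialisation happens.
    double warmup = time_ms([&] { err = cudaFree(nullptr); });
    if (err != cudaSuccess) {
        fprintf(stderr, "warm-up failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The context already exists, so this measures the allocation alone.
    double alloc = time_ms([&] { err = cudaMalloc((void**)&d_buf, bytes); });
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("warm-up cudaFree: %.3f ms\n", warmup);
    printf("cudaMalloc:       %.3f ms\n", alloc);

    cudaFree(d_buf);
    return 0;
}
```

If you remove the warm-up call, the cudaMalloc should become the slow one instead, which would match what you measured in your first program.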