Does the first cudaMalloc() take longer? If so, how much?

Hello,

Does the first cudaMalloc() call take longer than later ones?

char *D_MALLOC(size_t size)
{
    char *buf = NULL;
    CUDA_SAFE_CALL(cudaMalloc((void **)&buf, size));
    CUDA_SAFE_CALL(cudaMemset(buf, 0, size));

    return buf;
}

I ask because I wrote some code and measured the time taken by D_MALLOC.
The first D_MALLOC call takes nearly 77 ms, while the others take only around 0.17 ms.

Is this behaviour expected?

Yes. The first CUDA call an application makes through the runtime API implicitly initialises the device, and that takes a relatively long time. For this reason, I have every application do a dummy cudaMalloc/cudaFree in its initialisation routines, so the one-off cost is paid before any code that is being timed.
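A rough sketch of that warm-up, in case it helps. The helper names warmUpDevice and timedMallocMs are made up for this illustration, the 1 MiB size is arbitrary, and host-side std::chrono timing is just one way to measure. The numbers you see will depend on your GPU and driver, so treat it as a way to check the effect rather than a benchmark.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: a dummy allocation so that the implicit device
// initialisation happens here rather than inside timed code.
static bool warmUpDevice()
{
    void *dummy = NULL;
    cudaError_t err = cudaMalloc(&dummy, 1);
    if (err != cudaSuccess) {
        fprintf(stderr, "warm-up cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    cudaFree(dummy);
    return true;
}

// Hypothetical helper: host-side wall-clock time of one cudaMalloc, in ms.
// Returns a negative value if the allocation fails.
static double timedMallocMs(size_t size)
{
    void *buf = NULL;
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaMalloc(&buf, size);
    auto t1 = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    cudaFree(buf);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    // Pay the one-off initialisation cost up front.
    if (!warmUpDevice())
        return 1;

    // Both timed calls now run after initialisation has already happened.
    printf("first timed cudaMalloc:  %.3f ms\n", timedMallocMs(1 << 20));
    printf("second timed cudaMalloc: %.3f ms\n", timedMallocMs(1 << 20));
    return 0;
}
```

If you comment out the warmUpDevice() call, the first timed cudaMalloc should absorb the initialisation cost again, which is the kind of gap described in the question.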