cudaMalloc calls taking different times

anshu · December 20, 2010, 12:37pm

Dear all,

I am using cudaMalloc to allocate device memory and calculating the timings in the following way:

    clock_t start = clock();
    cutilSafeCall(cudaMalloc((void**) &sumImg_d, sizeof(int)*W*H)); // 1st call
    tGPU = tGPU + ((double)clock() - start)/CLOCKS_PER_SEC;

    cutilSafeCall(cudaMalloc((void**) &skinIntData_d, sizeof(int)*W*H)); // 2nd call
    cutilSafeCall(cudaMalloc((void**) &out_d, sizeof(bool)*(W-dw)*(H-dh))); // 3rd call

    cutilSafeCall(cudaMemcpy(sumImg_d, sumImg, numImgBytes, cudaMemcpyHostToDevice)); // 4th call
    cutilSafeCall(cudaMemcpy(skinIntData_d, skinIntData, numImgBytes, cudaMemcpyHostToDevice)); // 5th call

Now, for every frame, the first cudaMalloc takes around 30 ms while the second takes only around 8 ms (these are total timings, not the time for a single call of the function). Both allocations are the same size, so why does the first one take longer? Is anything wrong? I heard it could be because of the context, but I am not sure that is the problem. The third cudaMalloc also takes around 35 ms, and I am not sure what this time depends on. Is it because I am allocating bool?

I am using time.h to get the timings: I call clock() at the start and end of each function to calculate the total time. Is it necessary to call cudaThreadSynchronize() to get correct timings in this case? As far as I know these functions are not asynchronous, so I think it should be OK to leave it out. Kindly point out if this is not correct.

Regards,

YDD · December 20, 2010, 2:53pm

I’m not sure I’d trust clock() to have that sort of time resolution. For ms time resolution, use gettimeofday; if you’re still using cutil, the cutil timers should be a convenient wrapper. In general though, the first cuda* call will be slow, since the driver has to initialise a GPU context (the exceptions are the device query routines).
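To see the context-creation cost separately from the allocation cost, something like the following could be tried. It is only a sketch: wallMs and timeMalloc are made-up helper names, the 640x480 frame size is an assumption, and cudaFree(0) is used as a common idiom for forcing the context to be created before the timed calls.

```cuda
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

// Wall-clock time in milliseconds, based on POSIX gettimeofday.
static double wallMs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

// Times one cudaMalloc of `bytes` bytes (hypothetical helper).
// Returns the elapsed milliseconds, or -1.0 if the allocation failed.
static double timeMalloc(size_t bytes)
{
    void *p = NULL;
    double t0 = wallMs();
    cudaError_t err = cudaMalloc(&p, bytes);
    double t1 = wallMs();
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    cudaFree(p);  // released outside the timed region
    return t1 - t0;
}

int main(void)
{
    const size_t bytes = sizeof(int) * 640 * 480;  // assumed frame size

    // Force context creation up front so it is not charged to cudaMalloc.
    double t0 = wallMs();
    cudaError_t err = cudaFree(0);
    double t1 = wallMs();
    if (err != cudaSuccess) {
        fprintf(stderr, "context creation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("context creation: %.3f ms\n", t1 - t0);

    // With the context already in place, time two same-sized allocations.
    printf("1st cudaMalloc:   %.3f ms\n", timeMalloc(bytes));
    printf("2nd cudaMalloc:   %.3f ms\n", timeMalloc(bytes));
    return 0;
}
```

If the first line of output accounts for most of the time, the slow first cudaMalloc in the original measurement is probably context setup rather than the allocation itself.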

anshu · December 22, 2010, 1:42am

Thanks for the reply.

I also tried the cutil timers (cutCreateTimer etc.), but the timings are still the same. I am working on a video and I call the GPU kernel every frame. If the first malloc takes this long every time I call it, the performance of my implementation will be very bad. Is there any way to avoid this?

Regards

Lev · December 22, 2010, 2:01am

Call cudaMalloc once at the start of the program and reuse the buffers for every frame.
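A rough sketch of what that could look like, reusing the buffer names from the first post. The FrameBuffers struct, allocBuffers, freeBuffers, the CHECK macro, and the sizes W, H, dw, dh and numFrames are all assumptions made up for illustration; the kernel launch is left as a comment.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (illustrative macro).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e));                           \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical container for the device buffers from the original post.
struct FrameBuffers {
    int  *sumImg_d;
    int  *skinIntData_d;
    bool *out_d;
};

// Allocate every device buffer once, before the frame loop.
static FrameBuffers allocBuffers(int W, int H, int dw, int dh)
{
    FrameBuffers b = {NULL, NULL, NULL};
    CHECK(cudaMalloc((void**)&b.sumImg_d, sizeof(int) * W * H));
    CHECK(cudaMalloc((void**)&b.skinIntData_d, sizeof(int) * W * H));
    CHECK(cudaMalloc((void**)&b.out_d, sizeof(bool) * (W - dw) * (H - dh)));
    return b;
}

// Release the buffers once, after the last frame.
static void freeBuffers(FrameBuffers *b)
{
    CHECK(cudaFree(b->sumImg_d));
    CHECK(cudaFree(b->skinIntData_d));
    CHECK(cudaFree(b->out_d));
    b->sumImg_d = NULL;
    b->skinIntData_d = NULL;
    b->out_d = NULL;
}

int main(void)
{
    const int W = 640, H = 480, dw = 24, dh = 24;  // assumed sizes
    const int numFrames = 100;                     // assumed frame count
    const size_t numImgBytes = sizeof(int) * W * H;

    // Host-side stand-ins for the per-frame input data.
    std::vector<int> sumImg(W * H, 0), skinIntData(W * H, 0);

    FrameBuffers b = allocBuffers(W, H, dw, dh);   // paid once

    for (int frame = 0; frame < numFrames; ++frame) {
        // A real program would fill sumImg and skinIntData for this frame here.
        CHECK(cudaMemcpy(b.sumImg_d, sumImg.data(), numImgBytes,
                         cudaMemcpyHostToDevice));
        CHECK(cudaMemcpy(b.skinIntData_d, skinIntData.data(), numImgBytes,
                         cudaMemcpyHostToDevice));
        // The kernel launch using b.sumImg_d, b.skinIntData_d and b.out_d goes here.
    }

    freeBuffers(&b);
    printf("processed %d frames\n", numFrames);
    return 0;
}
```

Only the cudaMemcpy calls and the kernel launch stay inside the loop, so the allocation and context setup costs are paid once instead of once per frame.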
