Calling kernel in a loop spends much time in cudaFree

jsyu3799 · July 16, 2018, 4:21am

Hello, I have a problem with calling a kernel in a loop.

I launch a kernel inside a for-loop.
When the number of iterations is small, there is no problem.
But when I increase the number of iterations, the computation time increases, particularly in cudaFree.

So my question is the following:

Does calling a kernel in a loop cause a lot of time to be spent in cudaFree or cudaDeviceSynchronize?

GPU: GTX 1060
Memory size: 180000 * sizeof(float)
Threads per block: 1024
Blocks: about 177

Thank you for any answers.
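The post does not include the code, so here is a hypothetical sketch of the pattern being described, using the sizes listed above. The kernel `scale` and the helpers `grid_size` and `run_loop` are illustrative names, not the poster's code. Covering 180000 elements with 1024 threads per block takes ceil(180000 / 1024) = 176 blocks, in line with the "about 177" quoted above.

```cuda
#include <cuda_runtime.h>

// Hypothetical element-wise kernel standing in for the poster's kernel.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // guard: the last block is only partly used
}

// Ceiling division: smallest number of blocks whose threads cover n elements.
int grid_size(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: launches the kernel `iterations` times on one buffer.
cudaError_t run_loop(int iterations) {
    const int n = 180000;      // 180000 * sizeof(float) bytes, as in the post
    const int threads = 1024;  // threads per block, as in the post
    const int blocks = grid_size(n, threads);

    float* d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Launches return immediately; this loop only queues work on the GPU.
    for (int it = 0; it < iterations; ++it)
        scale<<<blocks, threads>>>(d_data, n, 1.0001f);

    // The first synchronizing call after the loop waits for every queued
    // kernel, so its duration grows with `iterations`.
    return cudaFree(d_data);
}
```

Calling `run_loop` from `main` with a growing iteration count and timing the final `cudaFree` should show the extra time landing there, since the launches themselves return almost immediately.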

cbuchner1 · July 16, 2018, 9:02am

cudaDeviceSynchronize() waits for the GPU to finish before allowing the CPU thread to continue. This may involve a polling busy loop (100% utilization of one CPU core) to achieve the lowest possible latency.
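If the busy-wait matters (for example, because that CPU core is needed for other work), the runtime can be asked to block instead of spin via `cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync)`. The default mode, `cudaDeviceScheduleAuto`, chooses between spinning and yielding heuristically. A minimal sketch follows; the `spin` kernel and the `wait_without_spinning` helper are hypothetical, and since the flag must be set before the CUDA context is created, it assumes nothing else in the process has touched the device yet.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that only burns GPU cycles so the host has something to wait for.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: requests blocking (non-spinning) synchronization,
// then waits for a long-running kernel.
// Must run before anything else creates the CUDA context in this process.
cudaError_t wait_without_spinning(long long cycles) {
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    if (err != cudaSuccess) return err;
    spin<<<1, 1>>>(cycles);
    // With BlockingSync the host thread sleeps on an OS primitive here
    // instead of polling, trading some wake-up latency for an idle CPU core.
    return cudaDeviceSynchronize();
}
```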

cudaFree() probably implicitly synchronizes the CUDA context as well because altering the memory heap on the device while it’s still computing would be unacceptably risky.
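One way to check whether `cudaFree()` is really just waiting for the GPU is to queue work, then time `cudaFree()` with and without an explicit `cudaDeviceSynchronize()` in front of it. This sketch assumes, as suspected above, that `cudaFree()` synchronizes the device; the `spin` kernel, the cycle count, and the `time_cudafree` helper are hypothetical.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical kernel that only burns GPU cycles.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Milliseconds elapsed on the host since t0.
static double ms_since(std::chrono::steady_clock::time_point t0) {
    using namespace std::chrono;
    return duration<double, std::milli>(steady_clock::now() - t0).count();
}

// Hypothetical helper: queues `launches` kernels, then times cudaFree().
// If sync_first is true, an explicit cudaDeviceSynchronize() absorbs the
// wait before cudaFree() runs. Returns the time spent inside cudaFree() in ms.
double time_cudafree(int launches, bool sync_first) {
    float* buf = nullptr;
    cudaMalloc(&buf, 180000 * sizeof(float));
    for (int i = 0; i < launches; ++i) spin<<<1, 1>>>(1LL << 20);
    if (sync_first) cudaDeviceSynchronize();
    auto t0 = std::chrono::steady_clock::now();
    cudaFree(buf);
    return ms_since(t0);
}
```

If `time_cudafree(100, false)` comes out much larger than `time_cudafree(100, true)`, the time attributed to `cudaFree()` is queued kernel work rather than the cost of freeing memory.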

It’s likely that the time spent in these API calls is just waiting for the GPU to finish. The fact that there is no problem at low iteration counts may also mean that you are hitting some kind of limit on the kernel launch queue, which blocks the host at larger iteration counts. It's impossible to tell without knowing the details of your kernel (e.g. how long one iteration takes to compute).
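Both the per-iteration kernel time and whether the launch queue is filling up can be measured directly. The sketch below (hypothetical `scale` kernel and `time_loop` helper, sizes taken from the question) records CUDA events around the loop for GPU time and uses a host clock for the time spent issuing the launches.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel under test.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

struct LoopTiming {
    float gpu_ms_per_iter;   // measured with CUDA events on the GPU timeline
    double host_enqueue_ms;  // host wall time spent inside the launch loop
};

// Hypothetical helper: times `iterations` launches on a 180000-float buffer.
LoopTiming time_loop(int iterations) {
    const int n = 180000, threads = 1024;
    const int blocks = (n + threads - 1) / threads;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    scale<<<blocks, threads>>>(d, n, 1.0f);  // warm-up, excluded from timing
    cudaDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    cudaEventRecord(start);
    for (int i = 0; i < iterations; ++i)
        scale<<<blocks, threads>>>(d, n, 1.0f);
    cudaEventRecord(stop);
    auto t1 = std::chrono::steady_clock::now();  // launches issued, not finished

    cudaEventSynchronize(stop);  // wait until the GPU has reached `stop`
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return {gpu_ms / iterations,
            std::chrono::duration<double, std::milli>(t1 - t0).count()};
}
```

If `host_enqueue_ms` stays small while the total GPU time grows, the launches are simply being queued and the wait surfaces in the next synchronizing call, such as `cudaFree()`. If `host_enqueue_ms` starts tracking the GPU time at large iteration counts, the host is blocking inside the launches because the queue is full; the queue depth is an implementation detail and not something to rely on.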

In general it is good advice to keep heap allocations out of tight compute loops. It’s better to allocate enough memory for your use case once and reuse it in the inner loops over and over. In performance-critical cases you might have to allocate several buffers in page-locked host memory and use CUDA streams to overlap memory transfers and compute.
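A sketch of the allocate-once pattern with two page-locked host buffers, two device buffers, and two streams, so that the copies for one chunk can overlap with the kernel for the other. The chunk size follows the question; the kernel, the input values, and `run_pipeline` are hypothetical. All allocations happen once before the loop and are freed once after it.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-chunk kernel: multiplies every element by `factor`.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: processes `chunks` chunks (chunks >= 1) of 180000 floats
// with two buffers and two streams. Returns the first element of the last
// chunk's result, which is 2 * (chunks - 1) for the input used here.
float run_pipeline(int chunks) {
    const int n = 180000;
    const int threads = 1024;
    const int blocks = (n + threads - 1) / threads;
    const size_t bytes = n * sizeof(float);
    const int kBuffers = 2;  // double buffering

    // Allocate once, before the loop.
    float* h_buf[kBuffers];
    float* d_buf[kBuffers];
    cudaStream_t stream[kBuffers];
    for (int b = 0; b < kBuffers; ++b) {
        cudaMallocHost(&h_buf[b], bytes);  // page-locked, so async copies can overlap
        cudaMalloc(&d_buf[b], bytes);
        cudaStreamCreate(&stream[b]);
    }

    for (int c = 0; c < chunks; ++c) {
        const int b = c % kBuffers;
        // Wait for the previous round on this buffer (chunk c - kBuffers).
        // Its results are in h_buf[b] at this point and would be consumed here.
        cudaStreamSynchronize(stream[b]);
        for (int i = 0; i < n; ++i) h_buf[b][i] = static_cast<float>(c);  // new input
        cudaMemcpyAsync(d_buf[b], h_buf[b], bytes, cudaMemcpyHostToDevice, stream[b]);
        scale<<<blocks, threads, 0, stream[b]>>>(d_buf[b], n, 2.0f);
        cudaMemcpyAsync(h_buf[b], d_buf[b], bytes, cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();  // drain both streams
    const float last = h_buf[(chunks - 1) % kBuffers][0];

    // Free once, after the loop.
    for (int b = 0; b < kBuffers; ++b) {
        cudaStreamDestroy(stream[b]);
        cudaFree(d_buf[b]);
        cudaFreeHost(h_buf[b]);
    }
    return last;
}
```

Page-locked memory from `cudaMallocHost` is what lets `cudaMemcpyAsync` overlap with kernel execution; with ordinary pageable host memory the copies generally will not overlap with compute.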
