about latency to free device memory

astrox · February 16, 2008, 7:31am

This is a newbie’s question.

I wonder how much time “cudaFree” needs to free device memory.
Is it asynchronous, and does it depend on the size of the allocation?
Recently, I found that its latency can be a significant factor in the performance of my applications.
In some cases, deallocating the device memory takes around 10-100 microseconds, which seems OK.
But sometimes it takes about 1000-2000 microseconds.
This huge latency occurs randomly and seriously degrades the performance of my code.

I measured the timing with the cutCreateTimer, cutStartTimer, and cutGetTimerValue functions, and it makes no difference whether or not I use threadSync.

Does anyone know the expected time for cudaFree?

Thank you very much in advance.

Juhan

wumpus · February 16, 2008, 11:51am

Like I posted in another topic recently: if performance is an issue, always use your own memory pool that you can optimize for your own allocation patterns. Just grab a big block (or several) at the beginning of your program. Never rely on operating system alloc() and free() to be fast, and be sure to never use them in inner loops.
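For illustration, here is a minimal sketch of the "grab a big block and hand out pieces" idea. `BumpPool` is a hypothetical helper, not part of the CUDA runtime, and the 256-byte default alignment is an assumption. It only does pointer arithmetic on a block you give it, so the one `cudaMalloc` at startup and the one `cudaFree` at shutdown appear only in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a bump ("arena") sub-allocator over one large block.
// Intended use (assumed, not from the thread):
//   void* block = nullptr;
//   cudaMalloc(&block, poolBytes);        // once, at program start
//   BumpPool pool(block, poolBytes);
//   ... in the hot loop: pool.allocate(n); ...; pool.reset();
//   cudaFree(block);                      // once, at program end
class BumpPool {
public:
    // `alignment` must be a power of two. Offsets are aligned relative to
    // `base`, so `base` itself is assumed to be at least that well aligned.
    BumpPool(void* base, std::size_t capacity, std::size_t alignment = 256)
        : base_(static_cast<std::uint8_t*>(base)),
          capacity_(capacity),
          alignment_(alignment),
          offset_(0) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    }

    // Returns a pointer into the block, or nullptr if the request does not fit.
    // The pool never dereferences the block, so it may be device memory.
    void* allocate(std::size_t bytes) {
        // Round the current offset up to the next multiple of alignment_.
        std::size_t start = (offset_ + alignment_ - 1) & ~(alignment_ - 1);
        if (start > capacity_ || bytes > capacity_ - start) {
            return nullptr;
        }
        offset_ = start + bytes;
        return base_ + start;
    }

    // Releases every sub-allocation at once; no driver call is involved.
    void reset() { offset_ = 0; }

    // Bytes consumed so far, including alignment padding.
    std::size_t used() const { return offset_; }

private:
    std::uint8_t* base_;
    std::size_t capacity_;
    std::size_t alignment_;
    std::size_t offset_;
};
```

A bump allocator only fits patterns where all temporaries can be released together, for example once per loop iteration. If buffers have to be freed individually, a free-list or size-bucketed pool would be the closer match. Either way, make sure any kernels still using the memory have finished before calling `reset()`.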

astrox · February 17, 2008, 2:45am

Oh, that’s a good idea. My CPU program already has that kind of memory management tool, and I will have to do a similar job for GPU memory.

Thanks again,

Juhan

Sarnath · February 18, 2008, 6:46am

Just to share my experience: I had a CPU loop (running around 1000 or 2000 times) that made 2 calls to cudaMalloc(). That loop alone used to take “seconds” (like 20 or 40 seconds) to complete. When I did one massive allocation up front and shared it among the 1000 iterations, I found that it took just a few milliseconds.
