This is a newbie’s question.
I wonder how long it takes to free device memory with “cudaFree”.
Is the call asynchronous, and does its duration depend on the size of the allocation?
Recently, I found that its latency can be a significant factor in the performance of my applications.
In some cases, deallocating the device memory takes around 10-100 microseconds, which seems OK.
But sometimes it takes about 1000-2000 microseconds.
This large latency occurs randomly and seriously degrades the performance of my code.
I measured the time with the cutCreateTimer, cutStartTimer, and cutGetTimerValue functions, and calling threadSync beforehand or not makes no difference.
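For reference, here is a minimal sketch of the kind of measurement I mean. It is not my actual code: it assumes std::chrono in place of the cut* timers, uses cudaDeviceSynchronize as the synchronization call, and the helper name time_cuda_free_us and the buffer sizes are made up for illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from my application): returns the host-side wall
// time of a single cudaFree call in microseconds, or -1.0 on any CUDA error.
static double time_cuda_free_us(size_t bytes) {
    void* d_ptr = nullptr;
    if (cudaMalloc(&d_ptr, bytes) != cudaSuccess) return -1.0;

    // Wait for all pending GPU work first, so the measured interval does not
    // include time spent waiting on earlier kernels or copies.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(d_ptr);
        return -1.0;
    }

    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(d_ptr);
    auto t1 = std::chrono::steady_clock::now();
    if (err != cudaSuccess) return -1.0;

    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

int main() {
    // Assumed sizes for illustration: 1 KiB, 1 MiB, 64 MiB.
    const size_t sizes[] = {1u << 10, 1u << 20, 64u << 20};
    for (size_t bytes : sizes) {
        // Repeat each size a few times to expose run-to-run variation.
        for (int i = 0; i < 5; ++i) {
            double us = time_cuda_free_us(bytes);
            std::printf("cudaFree of %zu bytes: %.1f us\n", bytes, us);
        }
    }
    return 0;
}
```

Repeating each size several times is meant to show whether the occasional slow calls follow the allocation size or appear independently of it.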
Does anyone know how long cudaFree is expected to take?
Thank you very much in advance.