cudaMalloc taking 4 seconds

spwanasin · November 20, 2011, 3:44pm

Hello,

I just started to analyse one of my CUDA programs with NVIDIA Nsight and noticed that cudaMalloc is taking 4 seconds to complete. So I started commenting out parts of the program to find exactly where the problem was, and I found that even a two-line program takes about 1-4 seconds, regardless of how many times I call cudaMalloc or how much memory I allocate.

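#include <cuda_runtime.h>

int main(void)
{
    int *test;  /* device pointer; cudaMalloc stores the allocation's address here */
    cudaMalloc((void**)&test, sizeof(int));

    return 0;
}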

But then I noticed something even weirder: if I compile the program 2-3 times, cudaMalloc's time drops to about 0.1-0.4 seconds. But 0.1 seconds just to allocate an integer is still a long time.
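One way to check whether the delay belongs to cudaMalloc itself or to a one-time startup cost is to time two allocations back to back. This is a minimal sketch, assuming the CUDA runtime API and a host-side std::chrono timer. The helper name timed_malloc_ms is made up for illustration, and the numbers will differ by driver, OS, and GPU.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): allocates `bytes` on the device and
// returns the wall-clock duration of the cudaMalloc call in milliseconds.
static double timed_malloc_ms(void **ptr, size_t bytes)
{
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaMalloc(ptr, bytes);
    auto stop = std::chrono::steady_clock::now();
    if (err != cudaSuccess)
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main(void)
{
    int *a = nullptr;
    int *b = nullptr;

    // The first runtime call in the process also pays for any one-time setup.
    double first = timed_malloc_ms((void **)&a, sizeof(int));
    // The second call should be closer to the cost of the allocation alone.
    double second = timed_malloc_ms((void **)&b, sizeof(int));

    std::printf("first cudaMalloc:  %.3f ms\n", first);
    std::printf("second cudaMalloc: %.3f ms\n", second);

    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

If the second number is far smaller than the first, the time is probably going into startup work rather than into the allocation.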

If anyone has any advice, please share.

Thanks!

L_F · November 22, 2011, 8:31pm

Have you initialized the device (e.g. cuInit(0)) before allocating?

spwanasin · November 23, 2011, 12:14am

No, I didn't have cuInit(0) in my program because I have never seen it used in any example. So I added the line “cuInit(0);” to the beginning of my program and nothing changed.

I have also tried

cudaSetDevice(0);
cudaThreadSynchronize();

which shifts the 4-second overhead to the cudaThreadSynchronize call.
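If the cost moves to whichever CUDA call comes first, that would be consistent with it being paid once per process. Below is a sketch of an explicit warm-up step, assuming the commonly used cudaFree(0) idiom for forcing the runtime to initialize. It uses cudaDeviceSynchronize, which replaced the deprecated cudaThreadSynchronize in CUDA 4.0. This only relocates the delay so it can be timed separately; it does not remove it.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

using wall_clock = std::chrono::steady_clock;

// Milliseconds elapsed since `start`, measured on the host.
static double ms_since(wall_clock::time_point start)
{
    return std::chrono::duration<double, std::milli>(wall_clock::now() - start).count();
}

int main(void)
{
    // Warm-up: select the device and force runtime initialization up front.
    auto t0 = wall_clock::now();
    cudaSetDevice(0);
    cudaFree(0);               // frees nothing; commonly used to trigger initialization
    cudaDeviceSynchronize();
    std::printf("warm-up:    %.3f ms\n", ms_since(t0));

    // With initialization already done, time the allocation on its own.
    auto t1 = wall_clock::now();
    int *test = nullptr;
    cudaError_t err = cudaMalloc((void **)&test, sizeof(int));
    std::printf("cudaMalloc: %.3f ms (%s)\n", ms_since(t1), cudaGetErrorString(err));

    cudaFree(test);
    return 0;
}
```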

I have been searching all over the internet trying to figure out a fix and still no luck. I know that there are other people with this same problem, because I have found threads about it but no real answers.

This article really summarizes what I’m experiencing in the “warm up” part

The guy reports that there is an initialization overhead for CUDA of 3-5 seconds!!! Are you experiencing this? Because I’m calling shenanigans on the idea that there is a 3-5 second overhead for everyone.

Please help me out,

Thanks!

tmurray · November 23, 2011, 12:28am

What OS and what GPU?

spwanasin · November 23, 2011, 5:40am

OS: Windows 7 Pro 64-bit

GPU: EVGA GTX 560 Ti 2GB

The device query can be found in the attached file
