I have two systems, one with a Tesla C1060 and the other with a Tesla M1060. I am running identical programs on both. When I call cudaMalloc() (for about 32 MB), the C1060 takes about 51 ms, but the M1060 takes about 1.4 seconds.
Does anyone know what I should do to bring the time down on the M1060?
Presumably you're not running X on the M1060 box but are on the C1060 box. There's a period of time needed to initialize internal driver state, and that state gets torn down whenever there are no clients of the Linux driver, which is why the warmup cost shows up on the M1060. There will be a utility to manage this in a future driver release.
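To see how much of the cost is one-time driver/context initialization versus the allocation itself, you can force context creation before timing the allocation. This is a minimal sketch, not code from the thread; `cudaFree(0)` is a common idiom for forcing CUDA context creation, and the 32 MB size matches the original post.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

int main() {
    // Force context creation up front so driver/context initialization
    // is not attributed to the first cudaMalloc below.
    cudaFree(0);

    void *buf = nullptr;
    auto t0 = std::chrono::steady_clock::now();
    cudaMalloc(&buf, 32u << 20);   // 32 MB, as in the original post
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("cudaMalloc took %.3f ms\n", ms);

    cudaFree(buf);
    return 0;
}
```

If the 1.4 s disappears once the `cudaFree(0)` line is present (or once another driver client keeps the state alive), the slowdown was the initialization described above, not the allocation.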
Thank you. That worked in bringing the cudaMalloc time down.
But there is still a difference in kernel execution time. The program runs in a very tight loop, calling the kernel for an iteration count that can go up to 1000, and the kernel is about 4% slower on the M1060.
Anything else I should run to bring everything on par between these two Tesla platforms?
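For a like-for-like comparison between the two boards, it helps to time the kernel loop with CUDA events after a warm-up launch, so one-time setup cost is excluded on both machines. A minimal sketch; the `step` kernel here is a hypothetical placeholder for whatever kernel the loop actually calls:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the original post.
__global__ void step(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 1.0001f + 0.5f;
}

int main() {
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    // Warm-up launch: excludes module load / first-launch overhead.
    step<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    cudaEventRecord(t0);
    for (int it = 0; it < 1000; ++it)   // tight loop, as described
        step<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("avg per-iteration: %.3f ms\n", ms / 1000.0f);

    cudaFree(d);
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    return 0;
}
```

If a consistent 4% gap remains even with identical driver versions and clocks, it may come down to hardware differences (cooling and sustained clocks differ between the actively cooled C1060 and the passively cooled M1060 module), which no software setting will remove.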