Long initialization time C1060

We just received our new Fedora 10 workstation, which contains dual C1060s that I access through SSH. After running a few tests I noticed that it takes ~1.7 seconds to run the first cudaMalloc, which seems quite high compared to what I was able to find elsewhere. Can anyone confirm or deny that ~1.7 seconds for the initial cudaMalloc on a C1060 is normal? I have made a simple sample that demonstrates this; the result is ~1.7 seconds. Thanks

I compile the program as such:

nvcc -O3 MallocTest.cu

//MallocTest.cu
#include <iostream>
#include <cuda.h>

//Timers
#include <sys/time.h>

timeval startTime, stopTime, totalTime;

int main(void)
{
    float *a_d;                          // pointer to device memory
    int N = 1024;
    size_t size = N * sizeof(float);

    // Get starting time
    gettimeofday(&startTime, NULL);

    // allocate array on device
    cudaMalloc((void **) &a_d, size);

    // Stop timer and print total time
    gettimeofday(&stopTime, NULL);
    timersub(&stopTime, &startTime, &totalTime);
    std::cout << "Wallclock time  : "
              << totalTime.tv_sec + totalTime.tv_usec / 1000000.0
              << " seconds." << std::endl;

    // cleanup
    cudaFree(a_d);
    return 0;
}
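
To see how much of the ~1.7 seconds is one-time driver/context initialization rather than the allocation itself, a common idiom is to force initialization with a no-op call such as cudaFree(0) before starting the timer. The following is only a sketch of that variation (the warm-up call and the second timer are additions, not part of the original test):

//MallocWarmupTest.cu  (sketch: split context init from the first cudaMalloc)
#include <iostream>
#include <cuda.h>
#include <sys/time.h>

static double seconds(const timeval &a, const timeval &b)
{
    timeval d;
    timersub(&b, &a, &d);
    return d.tv_sec + d.tv_usec / 1000000.0;
}

int main(void)
{
    timeval t0, t1, t2;
    float *a_d;
    size_t size = 1024 * sizeof(float);

    gettimeofday(&t0, NULL);
    cudaFree(0);                         // no-op that forces context creation
    gettimeofday(&t1, NULL);
    cudaMalloc((void **) &a_d, size);    // the allocation itself
    gettimeofday(&t2, NULL);

    std::cout << "Context init : " << seconds(t0, t1) << " s" << std::endl;
    std::cout << "cudaMalloc   : " << seconds(t1, t2) << " s" << std::endl;

    cudaFree(a_d);
    return 0;
}

If nearly all of the time lands in the "Context init" line, the cost is initialization rather than cudaMalloc itself, which matches the driver explanation in the replies below.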

Is your nvidia driver loaded? Check with lsmod | grep nvidia. Normally, if you are at runlevel 3 (no X up), each time you start that code the driver has to be initialized first, and that takes some time.

Yes, my runlevel is 3, and lsmod | grep nvidia returned:

nvidia			   9679432  0 

i2c_core			   29216  2 nvidia,i2c_i801

If I am reading the above correctly, the module is listed but its use count is 0, so nothing is keeping the driver initialized, as you said? Is this pretty much the nature of using a Tesla series card, or is there any way to get around it? Thanks for the help.

I think you can run nvidia-smi in a loop in the background to get around this. Keep the sampling rate low, though, so it uses a minimum of CPU.

(there’s a non-trivial amount of time required for the first device to attach to the driver, and that’s reset when all devices have detached)
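
If running nvidia-smi in a loop is not appealing, the same idea can be sketched as a tiny keep-alive program: it creates a context on each device and then sleeps, so at least one process stays attached to the driver and later runs skip the slow first-time attach. This is only an illustrative sketch of that approach, not a tool mentioned in this thread:

//keepalive.cu  (sketch: hold a context on every device so the driver stays attached)
#include <iostream>
#include <unistd.h>
#include <cuda.h>

int main(void)
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // Touch each device once so a context exists and the device stays attached.
    for (int d = 0; d < deviceCount; ++d) {
        cudaSetDevice(d);
        cudaFree(0);                     // no-op that forces context creation
    }

    std::cout << "Holding contexts on " << deviceCount
              << " device(s); leave this running." << std::endl;

    // Sleep until killed; the devices detach when the process exits.
    for (;;)
        pause();
}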