cuInit taking a long time? (cuInit takes about a second)

Hi,
I’m somewhat new to CUDA, but I’m working on an FCT implementation on the GPU. I just noticed that, for some reason, cuInit is taking about a second to run.

I use the following code portion to time cuInit:

gettimeofday(&tm0, NULL);
CUresult result = cuInit(0);
gettimeofday(&tm1, NULL);
printTimeDiff("Choose Device", tm1, tm0);

where printTimeDiff prints the difference from tm0 to tm1 in seconds.
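For anyone who wants to reproduce the measurement: the post does not show printTimeDiff itself, so here is a minimal sketch of what such a helper could look like. The body and the extra function timeDiffSeconds are my assumptions, not the original poster's code; only the name printTimeDiff, its argument order, and the output format come from the post.

```c
#include <stdio.h>
#include <sys/time.h>

/* Elapsed time from `start` to `end`, in seconds.
 * Hypothetical helper, not part of the original post. */
double timeDiffSeconds(struct timeval end, struct timeval start)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Assumed reconstruction of printTimeDiff: prints "<label> <seconds>",
 * matching the "Choose Device 1.269787" style of output. */
void printTimeDiff(const char *label, struct timeval end, struct timeval start)
{
    printf("%s %f\n", label, timeDiffSeconds(end, start));
}
```

Note that gettimeofday measures wall-clock time, so the number includes any time the process spends blocked inside the driver, which is what you want here.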
I get the following output:

./FCT
Choose Device 1.269787

Does cuInit always take this long, or is there some sort of error?
Also, I’ve noticed that if I don’t use cuInit and I call cudaMallocPitch, instead of getting the CUDA_ERROR_NOT_INITIALIZED error that I should get, the malloc takes over a second.

The first call to any CUDA function initializes the driver and copies a bunch of stuff to the card. That overhead cannot be avoided. I’ve never timed it on my machine, but over 1 s seems a little long to me. What are your GPU, mobo, and OS, and is there anything else about the hardware you think might be relevant?
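To check whether the cost really is a one-time initialization, you can time the first call separately from later ones. Below is a rough sketch in plain C. The names timeFirstVersusRest and lazyInitStandIn are made up for illustration, and lazyInitStandIn only simulates a slow first call with a 20 ms sleep so the code runs without a GPU. In a real test you would pass a small wrapper around the CUDA call you want to measure.

```c
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Wall-clock time in seconds, based on gettimeofday as in the original post. */
static double nowSeconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Calls fn() `calls` times. Stores the duration of the first call in
 * *firstSeconds and the mean duration of the remaining calls in
 * *restSeconds (0.0 if there are fewer than two calls). */
void timeFirstVersusRest(void (*fn)(void), int calls,
                         double *firstSeconds, double *restSeconds)
{
    double t0, t1;
    int i;

    *firstSeconds = 0.0;
    *restSeconds = 0.0;
    if (calls < 1)
        return;

    t0 = nowSeconds();
    fn();
    t1 = nowSeconds();
    *firstSeconds = t1 - t0;
    if (calls < 2)
        return;

    t0 = nowSeconds();
    for (i = 1; i < calls; ++i)
        fn();
    t1 = nowSeconds();
    *restSeconds = (t1 - t0) / (double)(calls - 1);
}

/* Stand-in for a call with lazy initialization (an assumption for
 * illustration only): sleeps 20 ms on the first call, then returns at once. */
void lazyInitStandIn(void)
{
    static int initialized = 0;
    if (!initialized) {
        struct timespec delay = {0, 20000000L};
        nanosleep(&delay, NULL);
        initialized = 1;
    }
}
```

If the first call is slow and the rest are fast, you are looking at initialization overhead rather than a problem with the call itself.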

Thanks for the reply. I’m not sure of the exact machine specs, as it is a school-owned computer that I ssh into, but it is running dual 8800 GTXs on some sort of Dell (probably an XPS). The OS is SUSE Linux.

Does anyone else have an idea of what the problem could be?

Use the bandwidthTest example from the SDK to test the link speed between the host and the card.

OK, I’ll try that as soon as I can figure out why -lcutil, -lcuda, and -lcudart aren’t working correctly.

Thanks for your suggestions, but my professor finally figured out that it is a problem with the stability of the particular version of Linux that we are using. Reading the device file takes a really long time.
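In case someone else runs into this, one way to check the device file directly is to time how long it takes to open it, independent of CUDA. This is only a sketch: timeOpenSeconds is a made-up name, it times open() rather than an actual read, and the path to pass is an assumption (on a typical NVIDIA Linux driver install the device nodes are /dev/nvidiactl and /dev/nvidia0, but I don't know what the machine in this thread used).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns the wall-clock seconds spent in open() on `path`,
 * or -1.0 if the file could not be opened. */
double timeOpenSeconds(const char *path)
{
    struct timeval t0, t1;
    int fd;

    gettimeofday(&t0, NULL);
    fd = open(path, O_RDONLY);
    gettimeofday(&t1, NULL);

    if (fd < 0)
        return -1.0;
    close(fd);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
}
```

If a call such as timeOpenSeconds("/dev/nvidia0") already accounts for most of the second, the delay is in the system or driver rather than in your own code.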