Why does every CUDA program take more than 1 s? Driver initialization time?

I am running CUDA 2.1 on Linux. I noticed that every CUDA program takes more than 1 s to run, even the simplest toy program. There seems to be some fixed overhead in every CUDA program. Is it the initialization of the CUDA driver or something like that?

It shouldn’t take more than 1 second. There is a kernel call overhead, though. I think it’s ~10 µs per launch, which is probably negligible in many cases.
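For anyone who wants to check the per-launch figure on their own machine, here is a minimal sketch. It assumes a recent toolkit (C++11 and `cudaDeviceSynchronize`; on CUDA 2.x the equivalent call was `cudaThreadSynchronize`). The names `empty_kernel` and `average_launch_us` are made up for this example. It launches an empty kernel many times after a warm-up launch and averages the wall-clock time, so the one-off start-up cost is not counted.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Kernel that does nothing, so only the launch cost is measured.
__global__ void empty_kernel() {}

// Average wall-clock cost per launch in microseconds.
// Returns -1.0 if a CUDA call fails (for example, no usable device).
double average_launch_us(int launches) {
    // Warm-up launch: pays for context creation and module loading.
    empty_kernel<<<1, 1>>>();
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < launches; ++i) {
        empty_kernel<<<1, 1>>>();
    }
    // Wait until all queued launches have finished before stopping the clock.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;
    auto stop = std::chrono::steady_clock::now();

    double total_us =
        std::chrono::duration<double, std::micro>(stop - start).count();
    return total_us / launches;
}

int main() {
    double us = average_launch_us(10000);
    if (us < 0.0) {
        fprintf(stderr, "CUDA error: %s\n",
                cudaGetErrorString(cudaGetLastError()));
        return 1;
    }
    printf("average cost per empty kernel launch: %.2f us\n", us);
    return 0;
}
```

Build with something like `nvcc -o launch_overhead launch_overhead.cu`. The number will vary with the GPU, driver and OS, so treat it as a rough estimate rather than a fixed constant.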

So there might be something wrong with my system? We have several different machines here, and all of them have the same problem: even the smallest program takes a minimum of 1 s. They are all Linux boxes, but the driver, OS and hardware differ from machine to machine.

Does anyone else have the same problem?

Could you post the code you are using to time your kernels, etc.? I had no such problems on 64-bit RHEL 5.3.

Yes, there is a certain amount of driver initialization time. This is normal. But it does seem to vary from machine to machine.
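To see how much of the run time is initialization, one option is to time the first CUDA runtime call separately from the rest of the program. Below is a minimal sketch, assuming a recent toolkit with C++11 support. The helper name `time_call_ms` is made up for this example, and `cudaFree(0)` is used only as a common idiom to force the runtime to create its context.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Times one cudaFree(0) call in milliseconds and reports its status via *err.
// The first call in a process forces the runtime to initialize the driver
// and create a context, so it carries the start-up cost.
double time_call_ms(cudaError_t *err) {
    auto start = std::chrono::steady_clock::now();
    *err = cudaFree(0);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    cudaError_t err;

    // First call: includes driver and context initialization.
    double first_ms = time_call_ms(&err);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Second call: the context already exists, so this should be cheap.
    double second_ms = time_call_ms(&err);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("first call  (with initialization): %.3f ms\n", first_ms);
    printf("second call (already initialized): %.3f ms\n", second_ms);
    return 0;
}
```

If the first call accounts for nearly all of what `time` reports for the whole program, the delay is start-up overhead rather than anything in your own kernels.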

On CentOS 5 with a Tesla D870 running CUDA 2.1 and no X windows:

time ./deviceQuery --noprompt

2.7s

This time is repeatable (about 3s) for any of the quick SDK samples, like scan.

On Gentoo with a 9800 GX2 / CUDA 2.2 beta / running KDE4:

time ./deviceQuery --noprompt

0.007s

That’s exactly what I am talking about. We have EL5.2, CentOS and Fedora 9 here, and they all have more than 1 s of initialization time. Maybe I should try a distribution that is not from Red Hat.

Initialization from a console is slow. I think we have a workaround coming with 2.2 final, but I need to check on that…

Hmm, interesting. Could you tell me why it is slow from the console? Just curious.