5000ms for warm up?

Hello!
Recently I encountered a strange problem: every time I launch my program, the first cudaMalloc takes some 5000ms, and after that everything seems OK. I also ran some SDK samples and found the same 5000ms latency before the program really starts doing work. Is something wrong with the driver? My GPU is an S870.
Thanks~

That is the initialization overhead for the context setup and such (I believe it is incurred for each GPU).
I believe NVIDIA is working on getting it down.
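
If you want to see where the time goes, here is a minimal sketch that times the context setup separately from the first cudaMalloc. It assumes a toolchain with C++11 support for std::chrono, and it uses cudaFree(0) as a do-nothing runtime call to trigger the lazy context creation. The helper ms_since and the 1 MiB buffer size are just made up for illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock milliseconds elapsed since `start`.
static double ms_since(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - start).count();
}

int main() {
    // The runtime API creates its context lazily, so the first runtime call
    // pays the setup cost. cudaFree(0) does no real work but triggers it.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);
    double init_ms = ms_since(t0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "context init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // With the context already up, this cudaMalloc should be cheap.
    void* buf = nullptr;
    auto t1 = std::chrono::steady_clock::now();
    err = cudaMalloc(&buf, 1 << 20);  // 1 MiB, arbitrary size
    double malloc_ms = ms_since(t1);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("context init: %.1f ms\n", init_ms);
    std::printf("first cudaMalloc after init: %.1f ms\n", malloc_ms);

    cudaFree(buf);
    return 0;
}
```

If the first number is large and the second is small, the delay is the context setup and not cudaMalloc itself. Doing the dummy call once at program start at least keeps the overhead out of any timed section.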

Specifically, it was mentioned that startup times of 1+ seconds are an issue when running on a Linux console without X Windows. With X Windows running, startup times are ~100ms. It was mentioned that CUDA 2.2 might improve this. If I have the time, I might try timing the beta later today.