CUDA device initialization very slow on Ubuntu 8.04 with new driver (different driver/card combos)

On an Ubuntu 8.04 system, I was running the 195.36.15 driver with CUDA 2.3. Initially I was using a GTS 250 card and everything was lovely: CUDA would initialize in around 200 ms (measured as the time taken by the first cudaMalloc call).

Now the GTS 250 is essentially discontinued, but we must still run on Ubuntu 8.04 for a little while. A possible replacement card is the GTS 450. However, this card needs a newer driver to work, so my new setup is:

Ubuntu 8.04, driver 260.19.21, and CUDA 3.2.16, even though CUDA 3.2 is meant for Ubuntu 10.04.
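Since the slowdown only shows up with one driver/card combination, it may help to confirm what the runtime actually sees. The following is a minimal sketch using the CUDA runtime API calls `cudaDriverGetVersion`, `cudaRuntimeGetVersion` and `cudaGetDeviceProperties`; the helper name `print_cuda_config` is hypothetical. Note that `cudaDriverGetVersion` reports the highest CUDA version the driver supports (for example 3020 for CUDA 3.2), not the driver package number such as 260.19.21.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the CUDA version supported by the driver,
// the runtime version, and the name and compute capability of every
// visible device. Returns the device count, or -1 if none could be queried.
int print_cuda_config()
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // stays 0 if no driver is installed
    cudaRuntimeGetVersion(&runtimeVersion);

    // Versions are encoded as 1000 * major + 10 * minor, e.g. 3020 for 3.2.
    std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                driverVersion / 1000, (driverVersion % 100) / 10,
                runtimeVersion / 1000, (runtimeVersion % 100) / 10);

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        // A GTS 250 reports compute capability 1.1, a GTS 450 (Fermi) 2.1.
        std::printf("device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
    }
    return count;
}
```

Calling `print_cuda_config()` once with each card installed makes it easy to compare the two configurations side by side.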

The driver works, and initializing the driver DOES work; however, the first cudaMalloc now takes 3-4 seconds with a GTS 450. If I use a GTS 250 in this same setup, everything is quick and works as before.
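To see whether those 3-4 seconds are spent creating the CUDA context or in the allocation itself, the two can be timed separately. A minimal sketch, assuming a C++11-capable nvcc; it relies on the commonly used `cudaFree(0)` idiom to force the runtime to create its context before any real allocation. The helper names `ms_since` and `time_cuda_startup` and the 1 MiB test size are hypothetical.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical helper: milliseconds elapsed since 'start'.
static double ms_since(std::chrono::steady_clock::time_point start)
{
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - start).count();
}

// Hypothetical helper: times context creation separately from the first
// allocation. *init_ms is always written; *malloc_ms only if init succeeded.
cudaError_t time_cuda_startup(int device, double* init_ms, double* malloc_ms)
{
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaSetDevice(device);    // select which GPU to use
    if (err == cudaSuccess)
        err = cudaFree(0);                      // force lazy context creation
    *init_ms = ms_since(t0);
    if (err != cudaSuccess)
        return err;

    void* buf = nullptr;
    auto t1 = std::chrono::steady_clock::now();
    err = cudaMalloc(&buf, 1 << 20);            // 1 MiB test allocation
    *malloc_ms = ms_since(t1);
    if (err == cudaSuccess)
        cudaFree(buf);
    return err;
}
```

If nearly all of the time lands in `*init_ms`, the delay is in context creation rather than in cudaMalloc itself.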

Is there anything I can do to remove this delay? Ideally we want it to be as small as possible. It would take a bit of work to re-architect my program to initialize CUDA earlier, so that users do not experience the initialization delay.
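If re-architecting does become necessary, one way to hide the delay is to overlap it with other startup work. The sketch below is hypothetical (the helper `init_cuda_overlapped` and its callback are not from the original program) and assumes the application has some non-CUDA startup work, such as loading files, that can run on a worker thread. The context is deliberately created on the calling thread, because with the CUDA 3.x runtime a context belongs to the host thread that created it; creating it on a background thread would not help a main thread that does the CUDA work.

```cuda
#include <cstdio>
#include <functional>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical helper: creates the CUDA context on the *calling* thread
// while 'other_startup_work' runs on a worker thread, so the two costs
// overlap instead of adding up. The callback must not make CUDA calls.
cudaError_t init_cuda_overlapped(int device,
                                 const std::function<void()>& other_startup_work)
{
    std::thread worker(other_startup_work);    // non-CUDA startup work only

    cudaError_t err = cudaSetDevice(device);
    if (err == cudaSuccess)
        err = cudaFree(0);                      // force context creation now

    worker.join();                              // wait for the other work
    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

Called early in `main` on the thread that later issues all CUDA calls, this keeps the context where the rest of the program expects it, and the user only waits for whichever of the two tasks takes longer.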