263.06 developer drivers are 3x SLOWER! The speed of calculations is 3x slower when using new 2

I have developed a modelling code using CUDA 3.1 and drivers 257.21 and, much to my surprise, found that after I upgraded to CUDA 3.2 with its 263.06 drivers the same code takes about 3x longer to run!

To check what was going on, I downgraded the drivers to 257.21 and compiled my code with the environment settings pointing to the “c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.1” paths. After profiling the code, I upgraded the drivers to 263.06 and ran the same 3.1 build. Next, I compiled the code with the environment pointing to 3.2. In all cases the code was compiled for production.

These are the benchmarks from my modelling program (time for a single step of the model):
CUDA 3.1 and Drivers 257.21: 2 - 3 seconds
CUDA 3.1 and Drivers 263.06: 7 - 8 seconds
CUDA 3.2 and Drivers 263.06: 7 - 8 seconds

Apparently the culprit is the 263.06 drivers. So instead of the expected boost in performance, I got the opposite!
Judging from the new version of the profiler that comes with CUDA 3.2, it is the CUDA API calls that slow down the calculations.

You can see this effect in the enclosed graphs from the profiler. They are labelled c31 or c32 (CUDA version) and d257 or d263 (driver version).
As can be seen, the API calls with 263 are about 3x slower than with 257. There is also a difference between c31 and c32 with d263.
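
One way to cross-check the profiler result is to time a large number of trivial API calls from the host and compare the per-call cost under each driver. The following is only a minimal sketch, not the modelling code: the kernel `emptyKernel`, the macro `CHECK`, the helper `microsSince` and the iteration count are all made up for illustration. It also assumes a newer toolkit than the ones discussed here (a C++11 host compiler and `cudaDeviceSynchronize`; on CUDA 3.x the equivalent call would be `cudaThreadSynchronize`).

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative empty kernel: it does no work, so the time per launch is
// dominated by launch and driver overhead rather than by computation.
__global__ void emptyKernel() {}

// Illustrative helper macro: report the failing call and leave main().
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            return 1;                                              \
        }                                                          \
    } while (0)

// Microseconds elapsed on the host clock since 'start'.
static double microsSince(std::chrono::steady_clock::time_point start) {
    std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

int main() {
    // These are CUDA version codes (for example 3020 for CUDA 3.2),
    // not the display driver number such as 263.06.
    int driverVersion = 0, runtimeVersion = 0;
    CHECK(cudaDriverGetVersion(&driverVersion));
    CHECK(cudaRuntimeGetVersion(&runtimeVersion));
    std::printf("driver CUDA version %d, runtime CUDA version %d\n",
                driverVersion, runtimeVersion);

    const int iterations = 10000;  // arbitrary; large enough to average out noise
    int hostValue = 0;
    int *deviceValue = nullptr;
    CHECK(cudaMalloc(&deviceValue, sizeof(int)));

    // Warm-up so that context creation is not included in the timings.
    CHECK(cudaMemcpy(deviceValue, &hostValue, sizeof(int), cudaMemcpyHostToDevice));
    emptyKernel<<<1, 1>>>();
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // 1) Tiny synchronous copies: the cost is almost entirely API overhead.
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        CHECK(cudaMemcpy(&hostValue, deviceValue, sizeof(int), cudaMemcpyDeviceToHost));
    }
    double copyMicros = microsSince(t0) / iterations;

    // 2) Empty kernel launch followed by a synchronisation.
    t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        emptyKernel<<<1, 1>>>();
        CHECK(cudaDeviceSynchronize());
    }
    double launchMicros = microsSince(t0) / iterations;

    std::printf("mean 4-byte cudaMemcpy:      %.2f us per call\n", copyMicros);
    std::printf("mean empty launch plus sync: %.2f us per call\n", launchMicros);

    CHECK(cudaFree(deviceValue));
    return 0;
}
```

Building the same source against each toolkit and running it under each driver gives per-call numbers that can be compared directly. If the per-call times differ by a similar factor to the step times above, that would support the API-overhead explanation.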

Can anything be done to bring the drivers back to their previous speed?

I have been using these drivers:
devdriver_3.1_winvista-win7_64_257.21_general.exe
devdriver_3.2_winvista-win7_64_263.06_general.exe

[attachment=24670:test_257_263.jpg]

Can you post a repro case?

I will try to prepare a version of the code that would be suitable for this purpose.