I have developed a modelling code using CUDA 3.1 and drivers 257.21, and much to my surprise I found that when I upgraded to CUDA 3.2 with its drivers 263.07, the same code takes about 3x longer to calculate!
To find out what was going on, I downgraded the drivers to 257 and compiled my code with the environment settings pointing to the "c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.1" paths. After profiling the code, I upgraded the drivers to 263 and ran the same 3.1 build. Next, I compiled the code with the environment pointing to 3.2. In all cases the code was compiled for production.
These are the benchmarks from my modelling program (time for a single step of the model):
CUDA 3.1 and Drivers 257.21: 2 - 3 seconds
CUDA 3.1 and Drivers 263.06: 7 - 8 seconds
CUDA 3.2 and Drivers 263.06: 7 - 8 seconds
Apparently the culprit is driver 263.06. So instead of the expected performance boost, I got the opposite!
Judging by the new version of the profiler that ships with CUDA 3.2, it is the CUDA API calls that slow down the calculations.
You can see this effect in the enclosed graphs from the profiler. They are labelled c31 or c32 and d257 or d263.
As can be seen, the API calls under 263 are 3x slower than under 257. There is also a difference between c31 and c32 with d263.
Can anything be done to bring the drivers back to their former speed?
I have been using these drivers: