Strange performance on CUDA 4.0?

Hi there.

I tested a program on 2 machines with the same GPU but different CUDA versions.
The results were 37.9 Gbps and 36.7 Gbps, which looks reasonable.
I guess CUDA 4.0 may simply be a few percent faster than CUDA 2.3 (about 3% here).

Next, I tested another program on the same 2 machines.
Strangely, this time the results were 30.7 Gbps and 19.9 Gbps (the slower one is only about 65% of the faster one)???

Why does this difference happen? Is this performance on CUDA 4.0 expected?

If anyone knows the reason, please help me…

<Environment 1>
CPU : Core i7
OS : CentOS 5.5
GPU : GTX 285 (with 1 GB global memory)
CUDA version : 4.0

<Environment 2>
CPU : Core i7
OS : CentOS 5.5
GPU : GTX 285 (with 1 GB global memory)
CUDA version : 2.3