GTX580 is 1.5x-2x SLOWER than GTX480

We first bought 6 single-GPU GTX480 cards and found them a little slower than the dual-GPU GTX295.
Then we bought 120 single-GPU GTX580 cards, hoping they would be faster than the 480 and even the 295, because many websites said the GTX580 is 20% faster than the GTX480.

For all our tests, we put 6 cards in the same computer and run exactly the same program on the same input data.
With 6 GTX480s it takes 26 hours to finish, but with 6 GTX580s it takes 39 hours.
We use CUDA 3.2, because we failed to compile the program with CUDA 4.
The computer has 48 GB of memory and two 6-core Xeon CPUs. I use CentOS, and I always copy all the data into /dev/shm before I run the program. The program has 6 CPU threads, each controlling one GPU.
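
To put numbers on the gap, here is a quick sketch in Python using only the two timings above. It assumes that "20% faster" means 1.2x the throughput, so the same job would take 1/1.2 of the time; that reading is my own assumption, not something from a spec sheet.

```python
# Timings reported above, in hours, for the same job on 6 cards.
hours_gtx480 = 26.0
hours_gtx580 = 39.0

# Slowdown of the GTX580 run relative to the GTX480 run.
slowdown = hours_gtx580 / hours_gtx480

# Assumption for illustration: "20% faster" means 1.2x the throughput,
# so the same job should take 1/1.2 of the GTX480 time.
claimed_speedup = 1.2
expected_hours_gtx580 = hours_gtx480 / claimed_speedup

# How far the measured runtime is from the runtime we hoped for.
gap_vs_expected = hours_gtx580 / expected_hours_gtx580

print(f"measured slowdown vs GTX480: {slowdown:.2f}x")
print(f"expected GTX580 runtime:     {expected_hours_gtx580:.1f} h")
print(f"measured vs expected:        {gap_vs_expected:.2f}x")
```

So the 580 run is 1.5x slower than the 480 run, and about 1.8x slower than what we hoped for from the 20% figure.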

Does anybody know the reason? Has anybody run a real test on the GTX580 and measured the 20% speedup that NVIDIA claims?
By the way, the GTX580s are made by PNY and the GTX480s by Zotac.

Although you are using CUDA 3.2, are you at least using the latest NVIDIA drivers? The GTX 580 was released after the drivers that were distributed with CUDA 3.2, so you should at least eliminate that potential problem.

I have switched to the new driver, but it is still just as slow.