C1060 slower than S1070?

We have two HPC systems: one has two desktops connected to a single S1070-400, and the other has two desktops with two C1060 cards each. We ran the same benchmark program using ONE GPU on both systems and observed that the C1060 is about 10% slower than the S1070, although we expected the performance to be the same.
The desktops have the same OS (Windows XP 64-bit), the same CPUs, and the same configuration otherwise; only the GPUs differ. The graphics card information reported is also the same on both:
Driver version: 197.03
CUDA Cores: 240
Graphics clock: 610 MHz
Processor clock: 1296 MHz
Memory clock: 800 MHz (1600 MHz data rate)
Memory interface: 512-bit
Memory: 4096 MB
Bus: PCI Express x16 Gen2

The only difference I can see is the Video BIOS version: the C1060 has 62.00.62.00.07 and the S1070 has 62.00.62.00.09.
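
As a sanity check on "the performance should be the same", here is a rough sketch of the theoretical peaks implied by the figures listed above. The double-data-rate factor and the 3 flops per core per clock are my assumptions about how peak numbers are usually quoted for this GPU generation; they do not come from the driver panel.

```python
# Figures taken from the driver panel listing above.
cuda_cores = 240
processor_clock_mhz = 1296   # shader ("processor") clock
memory_clock_mhz = 800
memory_bus_bits = 512

# Assumption: the memory transfers data twice per memory clock (double data rate).
transfers_per_clock = 2
# Assumption: peak single precision counts 3 flops per core per clock
# (one multiply-add plus one multiply).
flops_per_core_per_clock = 3

# bytes/s = clock * transfers per clock * bus width in bytes
bandwidth_gb_s = memory_clock_mhz * 1e6 * transfers_per_clock * memory_bus_bits / 8 / 1e9
peak_gflops = cuda_cores * processor_clock_mhz * 1e6 * flops_per_core_per_clock / 1e9

print(f"Peak memory bandwidth: {bandwidth_gb_s:.1f} GB/s")
print(f"Peak single precision: {peak_gflops:.1f} GFLOPS")
```

Both cards report identical inputs, so both get the same theoretical peaks (roughly 102 GB/s and 933 GFLOPS under these assumptions). The 10% gap therefore cannot be explained by the listed specs alone.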

Has anybody had the same experience, or could anyone give us some idea why the C1060 is slower?

Many thanks.

P.S. I meant to post here but posted in the “CUDA on Windows XP” board by mistake.
Can anyone tell me how to delete the post there?

There are two models of the S1070: the S1070-400, with the same clocks as the C1060, and the S1070-500, with a faster shader clock (around 10%).
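
A quick sketch of where the "around 10%" comes from. The 1296 MHz figure is the processor clock listed in the question; the 1440 MHz figure for the S1070-500 is an assumption based on my recollection of NVIDIA's published spec and is not stated anywhere in this thread.

```python
# Shader clock of the S1070-400 / C1060, as listed in the question.
clock_400_mhz = 1296
# Assumption: S1070-500 shader clock as I recall it from NVIDIA's spec sheet.
clock_500_mhz = 1440

speedup = clock_500_mhz / clock_400_mhz
print(f"S1070-500 shader clock is {100 * (speedup - 1):.1f}% faster")
```

For a compute-bound benchmark, a gap of that size would be consistent with comparing a C1060 against a -500 unit, which is why it is worth confirming which model is installed.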

What we have is the S1070-400, so the clock rates are the same as the C1060's.