C1060 slower than S1070?

We have two HPC systems: one has two desktops connected to one S1070-400, and the other has two desktops, each with two C1060 cards. We ran the same benchmark program using ONE GPU on each system and observed that the C1060 is about 10% slower than the S1070, though we expected the performance to be the same.
The desktops have the same CPUs and the same configuration otherwise, apart from the GPUs. The reported graphics card information is also the same:
Driver version: 197.03
CUDA Cores: 240
Graphics clock: 610 MHz
Processor clock: 1296 MHz
Memory clock: 800 MHz (1600 MHz data rate)
Memory interface: 512-bit
Memory: 4096 MB
Bus: PCI Express x16 Gen2

The only difference I can see is the Video BIOS version, C1060 has and S1070 has

Has anybody had the same experience, or could anyone give us some idea why the C1060 is slower?

Many thanks.

There are two models of the S1070: the S1070-400, with the same clocks as the C1060, and the S1070-500, with a faster shader clock (around 10%).

What we have is S1070-400, the clock is the same…

  1. I believe that the S1070 has a slightly higher clock than the C1060. According to the specs, the S1070 runs at 1.296 to 1.44 GHz and the C1060 at 1.3 GHz.

  2. If you are also counting PCIe bandwidth: connecting the S1070 to two machines means one interface card per machine, and so one PCIe slot per machine. The S1070 uses a 16-to-32-lane PCIe switch internally, which means that if you are using only one card you get the full PCIe bandwidth. Putting two C1060s in one machine means that you are using two PCIe slots, and then the performance depends on the motherboard. Some switch to 2x8 lanes mode (slowest); some have a 16-to-32 switch, which gives the same performance as the S1070 (more common in systems with 4 or 8 PCIe slots); and some have two PCIe controllers (fastest for your setup). What you need to check is that you are not dropping to 2x8 mode (usually reported in the BIOS).
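To see how much a drop from x16 to x8 could matter, here is a rough Python sketch of the theoretical per-direction bandwidth of a PCIe Gen2 link. It uses the standard Gen2 figures (5 GT/s per lane, 8b/10b encoding); the function name is made up for illustration, and real transfer rates will be noticeably lower than these peak numbers.

```python
# Theoretical peak bandwidth of a PCIe Gen2 link, per direction.
GEN2_GT_PER_S = 5.0          # raw transfer rate per lane (gigatransfers/s)
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b encoding: 8 data bits per 10 line bits


def pcie_gen2_bandwidth_gbs(lanes):
    """Return the theoretical peak bandwidth in GB/s for the given lane count."""
    usable_gbits = GEN2_GT_PER_S * ENCODING_EFFICIENCY * lanes
    return usable_gbits / 8  # bits -> bytes


for lanes in (16, 8):
    print(f"x{lanes}: {pcie_gen2_bandwidth_gbs(lanes):.1f} GB/s peak per direction")
```

An x8 link has half the peak bandwidth of an x16 link, so a transfer-heavy benchmark would be affected far more than a compute-bound one.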

Thanks laughingrice.

The S1070-400 has a GPU processor clock of 1.296 GHz and the S1070-500 has a clock of 1.44 GHz. What we have is the S1070-400, so it should be the same as the C1060.
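As a rough check of what the clock difference alone would be worth, here is a small Python sketch using the two clock values quoted above. It assumes a purely compute-bound program whose runtime scales inversely with the shader clock, which is an idealization; the function name is made up for illustration.

```python
# Shader clocks in GHz, as quoted in this thread.
S1070_400_GHZ = 1.296  # also the clock reported for the C1060 here
S1070_500_GHZ = 1.44


def relative_speedup(fast_ghz, slow_ghz):
    """Fractional speedup, assuming runtime scales inversely with clock."""
    return fast_ghz / slow_ghz - 1.0


gain = relative_speedup(S1070_500_GHZ, S1070_400_GHZ)
print(f"1.44 GHz vs 1.296 GHz: {gain:.1%} faster")
```

Under that assumption the two S1070 models differ by roughly 11%, which is why it is worth confirming the clock each device actually reports.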

I checked the BIOS of both systems and didn’t find anything regarding PCIe speed. But according to the NVIDIA control panel, both systems are running at PCIe x16, as shown in the attached snapshots.
s1070.bmp (262 KB)
c1060.bmp (262 KB)

You can try running the program in the profiler (or enable command-line profiling). For now, all you probably need is the timings, to see where the difference in speed occurs (just the CUDA kernels, just memcpy, or across the board). That would help pinpoint the problem.
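Once you have per-phase timings from both systems, a comparison along these lines shows which phase accounts for the gap. This is only a sketch: the phase names and the function are hypothetical, and the numbers are placeholders, not measurements from either system.

```python
def compare_timings(baseline, other):
    """Return {phase: (delta_seconds, share_of_total_gap)} for other vs baseline."""
    total_gap = sum(other.values()) - sum(baseline.values())
    report = {}
    for phase in baseline:
        delta = other[phase] - baseline[phase]
        share = delta / total_gap if total_gap else 0.0
        report[phase] = (delta, share)
    return report


# Placeholder values in seconds, made up purely to show the calculation.
s1070 = {"memcpy_h2d": 1.0, "kernel": 8.0, "memcpy_d2h": 1.0}
c1060 = {"memcpy_h2d": 1.5, "kernel": 8.0, "memcpy_d2h": 1.5}

for phase, (delta, share) in compare_timings(s1070, c1060).items():
    print(f"{phase:12s} {delta:+.2f} s ({share:.0%} of the gap)")
```

If the kernel times match and only the copies differ, the PCIe path is the likely culprit; if everything is slower by the same fraction, look at clocks instead.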