Looking at the architecture of the Tesla S1070 unit, versus the Tesla C1060 unit, I wonder if the Device to Host (and Host to Device) bandwidth for the S1070 is only half of the bandwidth of the Tesla C1060.
Bear in mind that I am no hardware wizard. However, my first impression is this: each C1060 has its own PCIe x16 bus, while the S1070 (which is basically four C1060s bundled together) has only two PCIe x16 buses. So the S1070 must have only half the I/O performance of four C1060 units, each connected to its own PCIe x16 bus.
Am I missing something important here, or am I basically right?
If you hit both GPUs that share a link at the same time, yes, you effectively get PCIe x8 to each GPU. However, if you’re only hitting a single GPU at a time, you’ll get the full PCIe x16 to that GPU. (The BR04, the PCIe switch used, is a real switch, not just a splitter.)
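
To make the arithmetic concrete, here is a minimal sketch of that sharing behaviour. It assumes an idealized switch that divides the upstream x16 link evenly among whichever GPUs are transferring at that moment, and it ignores protocol overhead and any real-world arbitration details. The function name and the "lane-equivalent" unit are made up for illustration; this is not a measurement.

```python
def effective_lanes_per_gpu(upstream_lanes, active_gpus):
    """Idealized lane-equivalent bandwidth each active GPU sees behind a shared switch.

    upstream_lanes: width of the host-facing link (16 for a PCIe x16 link).
    active_gpus: number of GPUs behind the switch transferring at the same time.
    Assumption: the switch shares the upstream link evenly among active GPUs.
    """
    if active_gpus < 1:
        raise ValueError("at least one GPU must be active")
    return upstream_lanes / active_gpus


# One GPU transferring alone gets the whole upstream link.
print(effective_lanes_per_gpu(16, 1))

# Two GPUs behind the same switch transferring together split it.
print(effective_lanes_per_gpu(16, 2))
```

So the "half the bandwidth" worry only applies while both GPUs on the same link are moving data at once. If transfers to the two GPUs are staggered, each one can see the full link in turn.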