This is just a general question to double check my thoughts, so I hope I am in the right place.
With the Tesla 1U units, where two cards share each PCIe slot, the maximum transfer rate is presumably shared between the two cards. So if a computation is bandwidth limited, is there nothing to be gained by using a second card plugged into the same PCIe slot?
There can still be a benefit. Each card still gets full x16 transfer performance individually, so whenever one process stops transferring in order to run a kernel on GPU 1, the other process can copy data to GPU 2 without any penalty.
So unless your “bandwidth limited” application is perfectly 100% streamed using overlapping async operations, there could be a benefit to using the second card. The ideal case is a duty cycle of 50% or less between transferring data and running non-overlapped kernels, since then neither process sees a slowdown from sharing the PCIe slot.
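To put rough numbers on the duty-cycle argument, here is a back-of-the-envelope sketch in Python. It is only an idealized model, not a measurement: it assumes each process repeats a fixed transfer followed by a non-overlapped kernel, that the shared slot serves one card at a time at full rate, and that the two processes interleave perfectly. The function name `two_card_speedup` and its parameters are made up for illustration.

```python
def two_card_speedup(transfer_time, kernel_time):
    """Estimated throughput gain from a second card on the same slot.

    Idealized assumptions (not from any vendor documentation):
    - each iteration is one transfer followed by one non-overlapped kernel
    - the shared slot carries one transfer at a time, at full rate
    - the two processes interleave perfectly in steady state
    """
    single_period = transfer_time + kernel_time
    if transfer_time < 0 or kernel_time < 0 or single_period <= 0:
        raise ValueError("times must be non-negative and not both zero")

    # With two cards, each iteration pair needs the link for 2 * transfer_time,
    # and each card still needs transfer_time + kernel_time for its own work.
    # The steady-state period is whichever of those is longer.
    shared_period = max(single_period, 2 * transfer_time)

    # One card completes 1 iteration per single_period;
    # two cards complete 2 iterations per shared_period.
    return 2 * single_period / shared_period


if __name__ == "__main__":
    for transfer, kernel in [(1, 3), (1, 1), (3, 1), (1, 0)]:
        duty = transfer / (transfer + kernel)
        gain = two_card_speedup(transfer, kernel)
        print(f"duty cycle {duty:.0%}: speedup {gain:.2f}x")
```

Under these assumptions the second card gives the full 2x up to a 50% transfer duty cycle, tapers off above that, and gives nothing once the link is busy 100% of the time. Real hardware will deviate from this because interleaving is never perfect.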