Ok, now I see what you are getting at.
At first I assumed he was swapping cards, but now that I read it again, he has multiple GPUs in the same system.
Then again, how does a device-to-device memory transfer between cards work? If it goes via PCI Express, it is still slow.
But I suppose there is a special SLI connector between the cards? In that case a device-to-device transfer might not be bottlenecked by PCI Express.
Also, if you think about it, a device-to-device test between two cards still doesn’t make any sense.
The bandwidth will be bottlenecked by the slowest card.
How do you tell which card is the slow one?
It will still require testing each card separately.
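
To put the bottleneck argument in numbers, here is a rough sketch in Python. All bandwidth figures are made up for illustration (they are not measurements of anyone's hardware), and it assumes the card-to-card copy goes over PCI Express with no faster link between the cards. The names `path_bandwidth` and `card_to_card` are hypothetical helpers, not part of any CUDA tool.

```python
# Hypothetical figures in GB/s, chosen only for illustration; not measurements.
PCIE_LINK = 8.0


def path_bandwidth(*hops):
    """A transfer runs no faster than the slowest hop on its path."""
    return min(hops)


def card_to_card(src_mem, dst_mem, link=PCIE_LINK):
    """Assumed path: source card memory -> link -> destination card memory."""
    return path_bandwidth(src_mem, link, dst_mem)


fast_card = 140.0  # assumed on-card memory bandwidth
slow_card = 60.0   # assumed on-card memory bandwidth

# Both cards outrun the link, so every pairing reports the link speed.
print(card_to_card(fast_card, slow_card))
print(card_to_card(fast_card, fast_card))

# Only a card slower than the link changes the number, and the result is
# the same in both directions, so it does not say which card is the slow one.
crippled = 5.0
print(card_to_card(crippled, fast_card))
print(card_to_card(fast_card, crippled))
```

Under these assumptions the card-to-card figure is the same whichever card is slower, which is why each card would still need its own test.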
However, you seem to be talking about a device-to-device transfer on a single card.
I still think your theory is incorrect.
As far as I can tell, the bandwidth you are seeing, even for device-to-device on a single card, is the PCI Express bandwidth and not the GPU <-> GPU RAM bandwidth.
No kernel is ever executed for the bandwidth test.
It will only show a difference if one of the cards is slower than the PCI Express bandwidth.
For now I assume both cards are faster than the PCI Express bandwidth, so both will be bottlenecked by PCI Express in the same way. At least that’s my expected outcome of this test for his system ;)