I’m testing the matrix transpose sample (transposeNew) on a GeForce GTX 285, and for some reason I’m not seeing the effects of bank conflicts. Any explanation? In fact, the no-bank-conflict code actually runs a bit slower.
Your GTX 285 has roughly 50% more memory bandwidth than a GTX 275. You should probably be seeing numbers over 100 GB/s for the no-conflict cases. What does the simple copy version of the transpose show on your card?
Partition camping still has an effect in this example. I’ve reached around 120 GB/s on a simple copy (give or take a bit depending on the version). Interestingly, I get better copy performance using textures, by the way: on the Tesla, a coalesced copy gets me 75 GB/s, versus 110 GB/s with textures.
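For anyone following along, here is a rough host-side C++ sketch of why partition camping shows up in a transpose. It is only a model, not code from the sample: it assumes 8 memory partitions of 256 bytes each (the figures usually quoted for G200/T10-class parts), a hypothetical 2048-float-wide matrix, and 32x32 float tiles. The names `tilePartition` and `distinctPartitions` are made up for illustration.

```cpp
#include <cstddef>

// Assumed hardware figures (not taken from this thread): 8 partitions,
// each 256 bytes wide, with addresses interleaved across them.
constexpr std::size_t kPartitions = 8;
constexpr std::size_t kPartitionBytes = 256;

// Partition that holds the first float of tile (bx, by) in a row-major
// matrix that is widthElems floats wide, using tileDim x tileDim tiles.
std::size_t tilePartition(std::size_t bx, std::size_t by,
                          std::size_t widthElems, std::size_t tileDim) {
    std::size_t byteOffset =
        (by * tileDim * widthElems + bx * tileDim) * sizeof(float);
    return (byteOffset / kPartitionBytes) % kPartitions;
}

// Number of distinct partitions hit by the first nTiles tiles walking
// down one tile column (alongColumn == true) or across one tile row.
std::size_t distinctPartitions(bool alongColumn, std::size_t nTiles,
                               std::size_t widthElems, std::size_t tileDim) {
    bool seen[kPartitions] = {};
    std::size_t count = 0;
    for (std::size_t i = 0; i < nTiles; ++i) {
        std::size_t p = alongColumn ? tilePartition(0, i, widthElems, tileDim)
                                    : tilePartition(i, 0, widthElems, tileDim);
        if (!seen[p]) {
            seen[p] = true;
            ++count;
        }
    }
    return count;
}
```

Under those assumptions, `distinctPartitions(false, 16, 2048, 32)` gives 8 (a row of tiles spreads over every partition), while `distinctPartitions(true, 16, 2048, 32)` gives 1 (a column of tiles all lands in the same partition). A transpose reads along one direction and writes along the other, so one side of it can end up queued on a single partition regardless of how the shared memory tile is laid out.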
I’ve also rewritten the code myself with the same result. I tested it on a laptop with an NVS 140M, which does show bank conflict effects, and on Linux with a Tesla S1070, which doesn’t. So as far as I can tell it’s something in the G200/T10 architecture and not the code. It just goes against all the documentation and claims, so I don’t understand what’s happening (unless scheduling is somehow able to hide bank conflicts).
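To make the expected effect concrete, here is a small host-side C++ model of the shared memory bank mapping. It is a sketch under stated assumptions, not a measurement: it assumes 16 banks of 32-bit words with conflicts resolved per half-warp of 16 threads (the compute capability 1.x layout), and `conflictDegree` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <array>

// Assumed shared memory layout for compute capability 1.x parts:
// 16 banks, one 32-bit word per bank, requests issued per half-warp.
constexpr int kBanks = 16;
constexpr int kHalfWarp = 16;

// Worst-case number of threads in a half-warp that hit the same bank when
// thread t reads tile[t][col] from a float tile whose rows are
// rowStrideWords words apart (a column read, as in the transpose).
int conflictDegree(int rowStrideWords, int col) {
    std::array<int, kBanks> hits{};
    for (int t = 0; t < kHalfWarp; ++t) {
        int word = t * rowStrideWords + col;  // word index in shared memory
        ++hits[word % kBanks];                // consecutive words, consecutive banks
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

In this model `conflictDegree(16, 0)` is 16 (an unpadded `tile[16][16]` puts the whole column in one bank) and `conflictDegree(17, 0)` is 1 (padding each row by one word spreads the column over all banks). That is the difference the documentation predicts. The model says nothing about whether the extra shared memory cycles are visible in a kernel that is limited by global memory bandwidth, which is the open question here.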