Has anyone else seen main-memory bandwidth vary a lot depending on how widely the accesses are spread overall? I'm talking about random reads that are far enough apart that coalescing will almost never happen.
That is, if you randomly read uint32 values from a 64 MB range versus a 512 MB range, would you expect similar performance in both cases? In a case like this I saw a substantial drop-off with the larger range (a factor of 2-5x).
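For concreteness, here is a minimal sketch of the kind of experiment described above, assuming the only variable is the size of the range the reads are drawn from (the number of reads stays fixed). It is written as host-side C++ so it stands alone; on the GPU the gather loop would become a kernel in which each thread does one or more of the random reads. The helper names (`make_random_indices`, `gather_sum`, `random_read_gbps`) are made up for illustration, and the indices are generated before timing so the RNG cost is not measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: `count` uniformly random indices into a buffer of
// `words` uint32 elements. Generated up front so the RNG is not timed.
std::vector<uint32_t> make_random_indices(std::size_t words, std::size_t count,
                                          uint32_t seed) {
    assert(words > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<uint32_t> dist(
        0, static_cast<uint32_t>(words - 1));
    std::vector<uint32_t> idx(count);
    for (auto& i : idx) i = dist(rng);
    return idx;
}

// One random uint32 read per index; summing keeps the loads from being
// optimised away.
uint64_t gather_sum(const std::vector<uint32_t>& data,
                    const std::vector<uint32_t>& idx) {
    uint64_t sum = 0;
    for (uint32_t i : idx) sum += data[i];
    return sum;
}

// Useful bandwidth (4 bytes per read) for `reads` random reads spread over
// a working set of `bytes` bytes.
double random_read_gbps(std::size_t bytes, std::size_t reads) {
    const std::size_t words = bytes / sizeof(uint32_t);
    std::vector<uint32_t> data(words, 1u);  // initialised, so every page is touched before timing
    const auto idx = make_random_indices(words, reads, 12345u);

    const auto t0 = std::chrono::steady_clock::now();
    volatile uint64_t sink = gather_sum(data, idx);
    const auto t1 = std::chrono::steady_clock::now();
    (void)sink;

    const double secs = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(reads * sizeof(uint32_t)) / secs / 1e9;
}
```

Comparing `random_read_gbps(std::size_t{64} << 20, reads)` with `random_read_gbps(std::size_t{512} << 20, reads)` for the same `reads` isolates the effect of the range size. The figure counts only the 4 useful bytes per read, so it understates the raw memory traffic: each isolated random read still pulls in a whole memory transaction.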
My tentative explanation is that this comes from the memory architecture: that I'm seeing something like the latency of switching rows within each memory bank, or something along those lines. This is well outside my area, and I don't know much about how high-level concepts (the bits of my memory addresses) translate to rows, columns and banks, although I have seen detailed documentation for the GDDR3 memory used on one of our cards.
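To make the "address bits to rows, columns and banks" question concrete, here is a sketch of how a memory controller might slice an address. The layout is entirely hypothetical (2 KB rows, 8 banks, 4 channels, fields taken in that order from the low bits), as are the names `DramCoord`, `decode`, `same_row` and `rows_per_bank`. A DRAM datasheet describes the device's banks, rows and columns, but the mapping from addresses onto them is chosen by the GPU's memory controller, and real controllers often use vendor-specific or hashed mappings.

```cpp
#include <cstdint>

// HYPOTHETICAL address layout, for illustration only (low bits first):
//   bits  0..10  column  -> 2 KB row
//   bits 11..13  bank    -> 8 banks
//   bits 14..15  channel -> 4 channels
//   bits 16..    row
constexpr unsigned kColumnBits  = 11;
constexpr unsigned kBankBits    = 3;
constexpr unsigned kChannelBits = 2;

struct DramCoord {
    uint32_t column;   // byte offset within the open row
    uint32_t bank;     // bank within the device
    uint32_t channel;  // memory controller / partition
    uint32_t row;      // row (page) within the bank
};

// Extract `bits` bits of `addr` starting at bit `shift`.
constexpr uint64_t field(uint64_t addr, unsigned shift, unsigned bits) {
    return (addr >> shift) & ((uint64_t{1} << bits) - 1);
}

constexpr DramCoord decode(uint64_t addr) {
    return DramCoord{
        static_cast<uint32_t>(field(addr, 0, kColumnBits)),
        static_cast<uint32_t>(field(addr, kColumnBits, kBankBits)),
        static_cast<uint32_t>(field(addr, kColumnBits + kBankBits, kChannelBits)),
        static_cast<uint32_t>(addr >> (kColumnBits + kBankBits + kChannelBits)),
    };
}

// Two reads can only be served from the same open row if channel, bank
// and row all match.
constexpr bool same_row(uint64_t a, uint64_t b) {
    const DramCoord x = decode(a);
    const DramCoord y = decode(b);
    return x.channel == y.channel && x.bank == y.bank && x.row == y.row;
}

// Distinct rows per bank touched by an aligned, contiguous working set of
// `bytes` bytes (each row index recurs once per bank per channel here).
constexpr uint64_t rows_per_bank(uint64_t bytes) {
    const uint64_t span = uint64_t{1} << (kColumnBits + kBankBits + kChannelBits);
    return (bytes + span - 1) / span;
}
```

Under this made-up layout, a 64 MB working set spans 1024 rows in every bank and a 512 MB one spans 8192, so `rows_per_bank` gives a quick way to see how the range size changes the number of rows a random read can land in.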
Does anyone know anything about this? Am I way off base with this explanation? I'm fairly sure coalescing isn't the cause.