Variable latency/bandwidth of main memory

Has anyone else seen quite different bandwidth to main memory depending on the overall distribution of the memory accesses? I’m talking about random reads that are likely to be sufficiently far apart that coalescing is almost never going to happen…

That is, suppose you were to randomly pull uint32’s from a 64MB buffer vs. a 512MB buffer - would you expect to see similar performance in the two cases? As it happens, I saw quite a substantial performance drop-off (a factor of 2-5x) going from the smaller to the larger buffer.

The tentative explanation I have is that this is down to the memory architecture: that I’m seeing something like the latency of switching rows within each memory bank… or something. This is very much outside my area, and I don’t know much about how the high-level concepts (bits of my memory addresses) translate to rows, columns and banks (although I’ve seen detailed documentation for the GDDR3 memory used on one of the cards we have).

Anyone know anything about this? Am I way off base on the explanation here? I’m pretty sure that coalescing isn’t it…

or TLB entries thrashing in their cache?

Good point. I had forgotten that there’s some sort of address-translation/protection machinery going on… hmm, yet another possibility.

Hi Geoff,

My investigation showed a memory page size of 32KB: accesses within a page run at the same speed as consecutive but misaligned accesses. There is also an undocumented effect, something to do with the arbitration for global memory access, that can make more than a 2x difference in performance just from moving an opcode around, a slight difference in occupancy, or a change in the code-segment length between global accesses. I consider this a hardware bug, and of course it is not documented!