I was measuring memory bandwidth in my kernel and it was too low, so I wrote a small test kernel to see what effective maximum memory bandwidth I could actually get. I measured a little over 50% of the advertised peak, which makes me think I'm doing something wrong; perhaps I'm not measuring it correctly. All I do is move a chunk of data from global memory to on-chip memory and measure how long that took. I'm using an 8800 GTX.
What percentage of the advertised bandwidth should I be expecting?
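The post doesn't show the test kernel, so what follows is a minimal sketch of this kind of measurement, written for illustration and not taken from the original test. It swaps the global-to-shared load for a plain global-to-global copy, which is a common way to measure effective bandwidth. The reason is that loads whose results are never used may be optimized away by the compiler, which would distort the timing. The sketch counts bytes read plus bytes written, times many launches with CUDA events after an untimed warm-up, and compares the result against the 8800 GTX's advertised 86.4 GB/s (384-bit bus, 1800 MT/s effective GDDR3). The buffer size, launch configuration, and helper names (`copy_kernel`, `peak_bandwidth_gbps`, `effective_bandwidth_gbps`) are all assumptions.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Theoretical peak in GB/s (1 GB = 1e9 bytes): bus width in bytes times effective transfer rate.
double peak_bandwidth_gbps(int bus_width_bits, double effective_mtps) {
    return (bus_width_bits / 8.0) * effective_mtps * 1e6 / 1e9;
}

// Effective bandwidth in GB/s: all bytes moved (read + written) divided by elapsed time.
double effective_bandwidth_gbps(double bytes_read, double bytes_written, double ms) {
    return (bytes_read + bytes_written) / (ms * 1e-3) / 1e9;
}

// Hypothetical test kernel: a grid-stride copy, so consecutive threads touch
// consecutive floats and the global accesses coalesce.
__global__ void copy_kernel(const float* in, float* out, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x) {
        out[i] = in[i];
    }
}

int main() {
    const int n = 1 << 24;                      // 16M floats = 64 MB per buffer (assumed size)
    const size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 0, bytes);

    // Fixed grid of 4096 blocks stays under the 65535-block limit of older GPUs.
    const int threads = 256, blocks = 4096;
    copy_kernel<<<blocks, threads>>>(d_in, d_out, n);   // warm-up launch, not timed
    cudaDeviceSynchronize();
    if (cudaGetLastError() != cudaSuccess) {
        printf("kernel launch failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int reps = 20;                         // average over several launches
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        copy_kernel<<<blocks, threads>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);
    const double ms = total_ms / reps;
    assert(ms > 0.0);

    // The copy reads every byte once and writes every byte once.
    const double eff = effective_bandwidth_gbps(bytes, bytes, ms);
    const double peak = peak_bandwidth_gbps(384, 1800.0);  // 8800 GTX: 384-bit, 900 MHz GDDR3
    printf("%.3f ms per copy, %.1f GB/s effective, %.0f%% of %.1f GB/s peak\n",
           ms, eff, 100.0 * eff / peak, peak);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Counting both directions matters here: if only the bytes read were counted for a kernel that also writes them back, the reported figure would be half the real traffic.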