What's a reasonable memory bandwidth to expect? My current maximum is only around 50%

I was measuring memory bandwidth in my kernel and it looked too low, so I wrote a small test kernel to see what effective maximum memory bandwidth I could get. I measured a little over 50% of the advertised figure, which makes me think I’m doing something wrong. Perhaps I’m not measuring this correctly. All I do is move a chunk of data from global to on-chip memory and then measure how long that took. I’m using an 8800GTX.
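For comparison, here is a minimal sketch of one common way to measure effective bandwidth. It is not the test kernel described above: it times a plain global-to-global copy with CUDA events instead of a global-to-on-chip read, because a read-only kernel can have its loads optimized away. The names `copy_kernel` and `effective_gb_per_s` are made up for illustration, and the 86.4 GB/s peak is the published 8800 GTX figure (384-bit bus at 1800 MT/s); change it for other cards.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: effective bandwidth in GB/s for `bytes` moved in `ms` milliseconds.
static double effective_gb_per_s(double bytes, double ms) {
    return bytes / (ms * 1.0e-3) / 1.0e9;
}

// Simple coalesced copy: consecutive threads touch consecutive 4-byte words.
__global__ void copy_kernel(const float* in, float* out, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    const size_t n = 1 << 22;                 // 4M floats = 16 MiB per buffer
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);  // 16384, within old grid limits
    const int reps = 20;                      // average over several launches

    float* d_in = 0;
    float* d_out = 0;
    if (cudaMalloc((void**)&d_in, bytes) != cudaSuccess ||
        cudaMalloc((void**)&d_out, bytes) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_in, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not included in the timing.
    copy_kernel<<<blocks, threads>>>(d_in, d_out, n);

    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r) {
        copy_kernel<<<blocks, threads>>>(d_in, d_out, n);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);               // wait until the timed work has finished

    if (cudaGetLastError() != cudaSuccess) {
        std::printf("kernel launch failed\n");
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Each launch reads `bytes` and writes `bytes`, so count both directions.
    const double gbps = effective_gb_per_s(2.0 * (double)bytes * reps, ms);
    const double peak_gbps = 86.4;            // assumption: published 8800 GTX peak
    std::printf("%.1f GB/s (%.0f%% of %.1f GB/s peak)\n",
                gbps, 100.0 * gbps / peak_gbps, peak_gbps);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Two things that commonly skew this kind of measurement: counting only the bytes read (or only the bytes written), and timing a single launch on a small buffer, where launch overhead dominates.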

What percentage of the advertised bandwidth should I be expecting?

How you access the data makes a difference. You need to do some loop unrolling, select the right data type, etc. See this for an implementation that gets close to the max: http://code.google.com/p/thrust/source/bro…/trivial_copy.h
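To make the two suggestions concrete, here is a rough sketch of a copy kernel that is templated on the element type and unrolled by hand. It is not taken from the linked Thrust source; `copy_unrolled`, `blocks_needed`, and the choice of `float4` with an unroll factor of 4 are assumptions for illustration only. Which width and unroll factor work best depends on the hardware, so it is worth timing a few combinations (for example `float`, `float2`, `float4`) on the card in question.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread copies UNROLL elements of type T.
// For every k, consecutive threads access consecutive elements, so accesses stay coalesced.
template <typename T, int UNROLL>
__global__ void copy_unrolled(const T* in, T* out, size_t n) {
    size_t base = (size_t)blockIdx.x * blockDim.x * UNROLL + threadIdx.x;
    T tmp[UNROLL];

    // Issue all loads first so several memory requests are in flight per thread.
    #pragma unroll
    for (int k = 0; k < UNROLL; ++k) {
        size_t i = base + (size_t)k * blockDim.x;
        if (i < n) tmp[k] = in[i];
    }
    // Then write the values back out with the same bounds check.
    #pragma unroll
    for (int k = 0; k < UNROLL; ++k) {
        size_t i = base + (size_t)k * blockDim.x;
        if (i < n) out[i] = tmp[k];
    }
}

// Number of blocks needed when each block covers threads * unroll elements.
static int blocks_needed(size_t n, int threads, int unroll) {
    size_t per_block = (size_t)threads * unroll;
    return (int)((n + per_block - 1) / per_block);
}

int main() {
    const size_t n_floats = 1 << 22;          // 16 MiB of floats
    const size_t n_vec = n_floats / 4;        // same data viewed as float4 elements
    const size_t bytes = n_floats * sizeof(float);

    std::vector<float> h_in(n_floats), h_out(n_floats, -1.0f);
    for (size_t i = 0; i < n_floats; ++i) h_in[i] = (float)(i % 1024);

    float* d_in = 0;
    float* d_out = 0;
    if (cudaMalloc((void**)&d_in, bytes) != cudaSuccess ||
        cudaMalloc((void**)&d_out, bytes) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_in, &h_in[0], bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = blocks_needed(n_vec, threads, 4);

    // cudaMalloc returns suitably aligned pointers, so viewing them as float4 is safe here.
    copy_unrolled<float4, 4><<<blocks, threads>>>(
        (const float4*)d_in, (float4*)d_out, n_vec);

    cudaMemcpy(&h_out[0], d_out, bytes, cudaMemcpyDeviceToHost);

    // Check the copy before trusting any timing taken from it.
    size_t mismatches = 0;
    for (size_t i = 0; i < n_floats; ++i) {
        if (h_out[i] != h_in[i]) ++mismatches;
    }
    std::printf("mismatches: %lu\n", (unsigned long)mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches == 0 ? 0 : 1;
}
```

This only checks that the copy is correct. To compare variants, wrap the launch in the same event-based timing used for any other bandwidth test and count both the bytes read and the bytes written.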