I am working on characterizing the performance of the GPU, but I am seeing some weird upticks on the exponential curve. Below is a chart that shows the optimal cost per element in a stream. I think for that chart I fixed the number of blocks at 400, which gives stable optimal performance for this code.

The next chart zooms in on the first part of the graph in finer detail. Each column is a block size, going from 1 to 55. This time I varied the number of blocks from 1 to 512, which is what makes up the distribution of points in each block-size column. The top bound is when the number of blocks is 1, and the bottom bound, or optimal performance, is when it is 512.

[IMG]
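For anyone who wants to reproduce this kind of sweep, here is a minimal sketch of how such a measurement could be set up. The kernel `scale`, the element count `n`, and the timing with CUDA events are all assumptions for illustration, not the exact code behind the charts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical streaming kernel: every thread walks the array with a
// grid-sized stride, so any block size / block count covers all n elements.
__global__ void scale(float *data, int n, float factor)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;  // assumed stream length
    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not charged to the first sample.
    scale<<<64, 64>>>(d_data, n, 1.0f);
    cudaDeviceSynchronize();

    // Sweep the same ranges as the zoomed-in chart:
    // block size 1..55, number of blocks 1..512.
    for (int blockSize = 1; blockSize <= 55; ++blockSize) {
        for (int numBlocks = 1; numBlocks <= 512; ++numBlocks) {
            cudaEventRecord(start, 0);
            scale<<<numBlocks, blockSize>>>(d_data, n, 1.0f);
            cudaEventRecord(stop, 0);
            cudaEventSynchronize(stop);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);

            // One line per sample: block size, block count, ns per element.
            printf("%d %d %g\n", blockSize, numBlocks, ms * 1.0e6 / n);
        }
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

Plotting the third column against the first gives one column of points per block size, with the spread in each column coming from the different block counts.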

So this is what is confusing me. On the graphs you can see that for the first half (block size < 192) there are sharp upticks in the lower bound just after multiples of 16. It doesn't make sense to me why this is. There are 16 multiprocessors, but I thought that only affected performance based on the number of blocks. There are 8 stream processors per multiprocessor, but that isn't exactly sixteen. Can someone tell me what is going on here?

The other thing I am confused about is that for block sizes less than 192 there is a big difference between max and min performance, which eventually converges at 192. After 192, the gap between max and min is very tight regardless of the number of blocks. Why is that the case?

Thanks for the help.