Partition Camping for Memory Bandwidth Optimization

I would like to write some code to demonstrate the bandwidth improvement you get from avoiding partition camping.
Can anyone suggest a simple example for that, other than the matrix transpose in the SDK?

Will it work on 8-series and 9-series cards?

Spawn as many blocks as there are MPs. Make sure each block is heavy enough to fill an MP by itself (i.e. it uses the majority of the shared memory or registers), so that only one block fits on each MP. Then try accessing global memory in different patterns: one where the blocks interfere (all blocks reading the same memory region), one where they do not interfere, and so on.

Thank you, Sarnath.
I am trying to do that, but I still have a question. I am using a GeForce 9600 GT card, which has 8 multiprocessors, and its global memory has 6 partitions.
Does this imply that some active blocks will always interfere, leading to partition camping?

Whether blocks interfere or not depends on which memory addresses they are accessing.

But yes, if you have 6 buckets and 8 accesses, you may find at least 2 partitions each experiencing a collision from 2 blocks (or 1 partition experiencing a collision from 3 blocks).