No performance gain after 0 bank conflicts

Hi All,

I removed the bank conflicts in my program and verified this with CUT_BANK_CHECKER, which reports 0 bank conflicts in my kernel code. But I got no performance gain after that!

Has anyone encountered this before, or does anyone have a clue what’s going on?
I would appreciate any help; I’m kind of stuck on this issue …

Thanks Much.

If your kernel accesses global/local memory a lot, then its runtime might be dominated by global/local memory latency. Everything else (arithmetic, shared memory access) would happen while the kernel is waiting on this latency, and so speeding up arithmetic and/or shared memory access wouldn’t decrease kernel runtime.
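
To make that concrete, here is a rough back-of-envelope sketch in Python. It assumes an idealized model in which global/local memory time and on-chip work (arithmetic plus shared memory access) overlap perfectly, so the kernel time is simply the larger of the two. The function name and all the numbers are hypothetical and chosen only for illustration; they are not measurements from any real kernel.

```python
def estimated_kernel_time(mem_time_us, onchip_time_us):
    """Idealized estimate: memory latency fully overlaps on-chip work,
    so the slower of the two determines the kernel runtime."""
    return max(mem_time_us, onchip_time_us)


# Hypothetical figures (microseconds), for illustration only.
mem_time_us = 100.0            # time spent waiting on global/local memory
onchip_with_conflicts = 60.0   # arithmetic + shared memory, with bank conflicts
onchip_no_conflicts = 40.0     # same work after removing bank conflicts

before = estimated_kernel_time(mem_time_us, onchip_with_conflicts)
after = estimated_kernel_time(mem_time_us, onchip_no_conflicts)

print(before, after)  # prints "100.0 100.0": the on-chip speedup is hidden
```

Under this model, removing bank conflicts only shows up in the runtime once the on-chip time exceeds the memory time. Real kernels overlap less cleanly than this, so treat it as a way to reason about the bottleneck, not as a prediction.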