I removed the bank conflicts from my program and verified this with CUT_BANK_CHECKER, which now reports 0 bank conflicts in my kernel code. But I got no performance gain after that!
Has anyone encountered this before, or does anyone have a clue what's going on?
I would appreciate any help, I'm kind of stuck on this issue...
Yes, I have encountered this before. Shared memory bank conflicts usually add only a minor overhead, so if your kernel is limited by global memory bandwidth or by arithmetic throughput (FLOPs), removing them will not produce a noticeable performance difference.
Please do not cross-post to every single CUDA forum. It is very annoying to read the same question again and again. It will not get you an answer any faster.