Memory transactions

I have profiled a program, and here are the memory transaction results:

Shared memory load transactions per request = 2.6
Shared memory store transactions per request = 1.1
Global load transactions per request = 33.1
Global store transactions per request = 0
Shared memory efficiency = 22%

I have read [1], which states that lower transaction numbers are preferred. Assuming each transaction accesses 4 bytes, 33 transactions access 132 bytes. Is that considered bad? Is it true to say that global memory has a high number of bank conflicts?
The kernel code is not mine.

[1] https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/memorystatisticsglobal.htm

It depends. When it is hard to make global memory accesses coalesced, switching the load caching strategy may help; see link 1 below (p. 44) for details.
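
To make "transactions per request" more concrete, here is a rough host-side sketch (plain Python, not CUDA) that counts how many memory segments one warp-wide load touches. It is a simplified model, not the profiler's exact accounting: the function name `transactions_per_request` is made up for illustration, and the 32-byte segment size is an assumption that depends on the GPU architecture and on whether loads are cached in L1.

```python
WARP_SIZE = 32  # threads per warp


def transactions_per_request(addresses, segment_bytes=32, access_bytes=4):
    """Count the distinct memory segments touched by one warp-wide request.

    Simplified model: each distinct segment costs one transaction.
    segment_bytes=32 is an assumption, not a value taken from the profiler.
    """
    segments = set()
    for addr in addresses:
        first = addr // segment_bytes
        last = (addr + access_bytes - 1) // segment_bytes
        segments.update(range(first, last + 1))
    return len(segments)


# Coalesced: lane i reads the 4-byte word at byte offset 4 * i.
coalesced = [4 * lane for lane in range(WARP_SIZE)]

# Strided: neighbouring lanes read words that are 128 bytes apart.
strided = [128 * lane for lane in range(WARP_SIZE)]

print(transactions_per_request(coalesced))  # 4
print(transactions_per_request(strided))    # 32
```

In this model a transaction moves a whole segment rather than a single 4-byte word, and 32 is the worst case for 4-byte loads (every lane in its own segment). A measured value around 33 sits near that worst case, which would point to poorly coalesced loads. Passing `segment_bytes=128` models 128-byte cache lines instead; the coalesced pattern then needs a single transaction.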

BTW, "bank conflict" is only used to describe the behavior of shared memory or registers (in the latter case, people usually say "register bank conflict" explicitly to distinguish it). It is not used for global memory; use (non-)coalesced access instead.
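
For contrast with global memory, here is a small host-side sketch (plain Python again) of what a shared memory bank conflict means. It assumes 32 banks that are each 4 bytes wide, which is the common default but not universal, and the helper name `bank_conflict_degree` is hypothetical. Lanes that read the same word are treated as a broadcast, not a conflict.

```python
from collections import defaultdict

WARP_SIZE = 32  # threads per warp


def bank_conflict_degree(addresses, num_banks=32, bank_width=4):
    """Return the worst-case number of distinct words mapped to one bank.

    1 means conflict-free; N means an N-way bank conflict.
    num_banks and bank_width are assumptions about the hardware.
    """
    words_per_bank = defaultdict(set)
    for addr in addresses:
        word = addr // bank_width
        words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())


# Lane i reads 4-byte element i: every lane hits a different bank.
unit_stride = [4 * lane for lane in range(WARP_SIZE)]

# Lane i reads element 2 * i: two lanes share each even-numbered bank.
stride_two = [8 * lane for lane in range(WARP_SIZE)]

# Lane i reads element 32 * i (a column of a 32x32 float tile): all lanes hit bank 0.
column = [4 * 32 * lane for lane in range(WARP_SIZE)]

# All lanes read the same word: served as a broadcast.
broadcast = [0] * WARP_SIZE

print(bank_conflict_degree(unit_stride))  # 1
print(bank_conflict_degree(stride_two))   # 2
print(bank_conflict_degree(column))       # 32
print(bank_conflict_degree(broadcast))    # 1
```

In this model, an N-way conflict serializes the request into N passes, which is the sort of thing a shared memory transactions-per-request value above 1 can reflect.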

  1. http://on-demand.gputechconf.com/gtc/2013/presentations/S3466-Programming-Guidelines-GPU-Architecture.pdf
  2. Programming Guide :: CUDA Toolkit Documentation