When I profiled my kernel, I got this message:
This kernel has uncoalesced shared accesses resulting in a total of 25165824 excessive wavefronts (43% of the total 58720256 wavefronts). Check the L1 Wavefronts Shared Excessive table for the primary source locations. The CUDA Best Practices Guide has an example on optimizing shared memory accesses.
I’m not sure what “uncoalesced shared accesses” means. I know that for global memory, coalescing accesses means having neighboring threads read neighboring elements from a properly aligned starting address. However, as far as I know, different banks in shared memory can be accessed simultaneously, so I don’t understand what causes the “uncoalesced shared accesses”.
My guess is that it is caused by bank conflicts, but if so, why does the ncu report list bank conflicts and uncoalesced shared accesses in two different sections?
I think the best way to interpret it is as “probably bank-conflicted”. “Excessive” here means that, for whatever reason, the request results in more transactions than the minimum (usually 1 or 2 for shared memory) needed to satisfy it. In current profiler terminology, requests and transactions go by different names: transactions are now called wavefronts.
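As a quick sanity check on the numbers in that message, here is a small sketch in Python. It assumes that the ideal wavefront count is simply the total minus the excessive count, which is how I read the message; the variable names are mine, not profiler metric names.

```python
# Figures copied from the profiler message quoted above.
total_wavefronts = 58_720_256
excessive_wavefronts = 25_165_824

# Assumption: ideal wavefronts = total wavefronts - excessive wavefronts.
ideal_wavefronts = total_wavefronts - excessive_wavefronts

# Share of all wavefronts that were excessive (the message rounds this to 43%).
excessive_share = excessive_wavefronts / total_wavefronts

# Average number of wavefronts actually issued per ideal wavefront.
slowdown_factor = total_wavefronts / ideal_wavefronts

print(ideal_wavefronts)          # 33554432
print(f"{excessive_share:.1%}")  # 42.9%
print(slowdown_factor)           # 1.75
```

Under that assumption, the shared accesses in this kernel issue on average 1.75 wavefronts for every wavefront that would ideally be needed.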
Detailed profiler questions could be asked on a profiler forum. You can also find more involved descriptions of how to determine shared activity from the profiler.
For shared memory, coalescing means that the threads of a request come together in the same wavefront, either because each thread uses a different bank or because threads hitting the same bank read the same word (broadcast). Excessive wavefronts are due to bank conflicts. The additional wavefronts needed for vector loads/stores (e.g. 64-bit, 128-bit) are counted as part of the ideal wavefronts, not the excessive ones.
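To make that rule concrete, here is a simplified host-side model in Python, not the hardware's actual scheduling logic. It assumes 32 banks that are each 4 bytes wide, one 32-bit access per thread, and a 32-thread warp; `wavefronts` and `column_access` are hypothetical helper names for illustration only.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 shared memory banks
BANK_WIDTH = 4   # assumption: each bank serves one 4-byte word per wavefront

def wavefronts(byte_addresses):
    """Estimate the wavefronts needed for one warp-wide 32-bit shared access.

    Threads that touch the same word in a bank are merged (broadcast);
    distinct words in the same bank must be served in separate wavefronts.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())

def column_access(row_stride_words):
    """Thread t reads element [t][0] of a float tile with the given row stride."""
    return [t * row_stride_words * BANK_WIDTH for t in range(32)]

unit_stride = [t * BANK_WIDTH for t in range(32)]  # thread t reads word t
broadcast = [0] * 32                               # every thread reads word 0

print(wavefronts(unit_stride))        # 1: every thread hits a different bank
print(wavefronts(broadcast))          # 1: same word, served by broadcast
print(wavefronts(column_access(32)))  # 32: all threads hit bank 0, different words
print(wavefronts(column_access(33)))  # 1: one word of padding per row spreads the banks
```

In this model the ideal count for each of these accesses is 1, so the stride-32 column read would show up as 31 excessive wavefronts per request, and padding each row by one word removes them.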