Hi again. I want to ask about a strange result I get when I profile my program.
My program uses a 128-byte short stack. I switched on the local-memory counter, so I know exactly how much local memory is in use.
With the stack in local memory I get this (128 bytes of lmem in use):
and this with the stack in shared memory (0 bytes of lmem in use):
The last result puzzled me. I don't use any global or local memory in the shared-memory case, so what does “gst uncoalesced” mean there?
The second strange thing is that the local-memory stack runs faster than the shared-memory stack.
Could shared-memory bank conflicts be the cause of that?