Now that I have the profiler working after adding -noprompt as an argument, I can see what’s going on. It reports counts of coalesced and uncoalesced global loads.
For example, the totals for kernel2 are:
gld uncoalesced = 60762750
gld coalesced = 93318
This does not look good: uncoalesced loads outnumber coalesced ones by roughly 650 to 1.
How can I identify where coalescing can be done? Is there a general rule or set of rules I can apply?
How would using texture and/or shared memory affect this?
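
For concreteness, here is a minimal sketch of the two access patterns these counters are generally understood to distinguish. It is not kernel2; the kernel names, the `run_copy_demo` helper, the array size, and the stride are all hypothetical and chosen only for illustration. The assumption is that loads tend to coalesce when consecutive threads read consecutive, suitably aligned words, and tend not to when neighbouring threads read addresses far apart; the exact rules depend on the device's compute capability.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernels for illustration only; not the kernel2 from the post.

// Thread i reads element i, so neighbouring threads touch neighbouring words.
// This is the pattern that is expected to coalesce.
__global__ void copy_coalesced(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Thread i reads element (i * stride) % n, so neighbouring threads touch words
// that are `stride` elements apart. This is expected to show up as uncoalesced.
__global__ void copy_strided(const float* in, float* out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);
        out[i] = in[j];
    }
}

// Runs both kernels and checks their results on the host.
// Returns true only if every CUDA call succeeded and both outputs are correct.
bool run_copy_demo()
{
    const int n = 1 << 20;      // assumed problem size
    const int stride = 32;      // assumed stride for the "bad" pattern
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const size_t bytes = n * sizeof(float);

    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = 0, *d_out = 0;
    if (cudaMalloc((void**)&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void**)&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    bool ok = true;

    // Coalesced copy: out[i] should equal in[i].
    copy_coalesced<<<blocks, threads>>>(d_in, d_out, n);
    ok = ok && (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n && ok; ++i) ok = (h_out[i] == h_in[i]);

    // Strided copy: out[i] should equal in[(i * stride) % n].
    copy_strided<<<blocks, threads>>>(d_in, d_out, n, stride);
    ok = ok && (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n && ok; ++i) {
        int j = (int)(((long long)i * stride) % n);
        ok = (h_out[i] == h_in[j]);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main()
{
    bool ok = run_copy_demo();
    printf("%s\n", ok ? "both kernels produced correct results" : "demo failed");
    return ok ? 0 : 1;
}
```

Profiling the two kernels separately should show how the same amount of data moves with very different coalesced and uncoalesced counts, which may help when looking for the equivalent indexing pattern in kernel2.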