I’m not sure whether this forum is the proper place for profiler questions, but I’ll try.
I am new to CUDA and have just written my first kernel. The profiler showed about 90% global memory excess load, which I understand is caused by severely uncoalesced memory access.
I changed the memory read pattern and the kernel became twice as fast, but the profiler now shows "Global memory excess load(%): -43.15" in the memory throughput analysis window.
As I understand it, and the documentation confirms, the excess load value should be in the 0-100% range. Or am I wrong? How should I interpret a negative number?
I am using the CUDA 4.0 RC2 tools and a GTX 460.
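
To make clear what I mean by coalesced versus uncoalesced reads, here is a small host-side C++ sketch of how I picture it. It is not my actual kernel and not the profiler's formula. It is only a simplified model that assumes a 32-thread warp, 4-byte elements, 128-byte aligned segments, and a base address aligned to a segment. The names `segmentsTouched` and `excessPercent` are my own and purely illustrative.

```cpp
#include <cstddef>
#include <set>

// Simplified model (an assumption, not profiler output): count how many
// distinct 128-byte segments one warp touches when every thread loads
// one 4-byte element.
constexpr std::size_t kWarpSize = 32;      // threads per warp
constexpr std::size_t kSegmentBytes = 128; // assumed transaction size
constexpr std::size_t kElemBytes = 4;      // e.g. one float

// strideElems: distance, in elements, between the addresses read by
// consecutive threads of the warp (1 = fully coalesced).
std::size_t segmentsTouched(std::size_t strideElems) {
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        std::size_t byteAddr = lane * strideElems * kElemBytes;
        segments.insert(byteAddr / kSegmentBytes);
    }
    return segments.size();
}

// Share of the fetched bytes that the warp did not ask for, in percent.
// This is my own illustrative definition of "excess", not the profiler's.
double excessPercent(std::size_t strideElems) {
    double requested = static_cast<double>(kWarpSize * kElemBytes);
    double fetched =
        static_cast<double>(segmentsTouched(strideElems) * kSegmentBytes);
    return 100.0 * (fetched - requested) / fetched;
}
```

In this model a stride of 1 touches a single segment with no excess, while a stride of 32 touches 32 segments and most of the fetched bytes are wasted. The value can never drop below zero here, which is why the negative number from the profiler confuses me.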