Non-coalesced memory access in the CUDA visual profiler


I am writing a kernel and I suspect that I am not reaching peak bandwidth because of some non-coalesced memory accesses. From previous posts I understand that I can find the number of non-coalesced memory accesses using the visual profiler. I see some options in the configuration tab of the settings, but I don’t understand what they mean. Also, I get an error saying “Minimum expected columns not found in profiler output file”.

Can someone help me with this or point me to some place where I can find this info? Also, is there some other way to find the number of non-coalesced memory accesses and bank conflicts in the kernel?


For non-coalesced accesses, you want to select at least the non-coalesced reads and writes.
The error you see means that either:

  • your program ran too long, and the visual profiler ‘timed out’ (you can change how long it needs to wait)
  • your program changes the current directory before starting the kernel (the profiler expects the output in the directory the program was started in)
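
If the counters are not available, the access pattern can also be checked by hand. Below is a minimal host-side C++ sketch of the coalescing rule as I understand it for compute capability 1.0/1.1 devices: a half-warp of 16 threads reading 4-byte words coalesces only if thread k reads the k-th word of a 64-byte-aligned segment. The function names `half_warp_addresses` and `is_coalesced_cc1x` are made up for illustration and are not part of CUDA, and the sketch assumes all 16 threads are active.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a CUDA API): byte addresses touched by the 16
// threads of a half-warp when thread k reads the 4-byte element
// (first_index + k * stride) of an array starting at byte address array_base.
std::vector<std::uint64_t> half_warp_addresses(std::uint64_t array_base,
                                               std::uint64_t first_index,
                                               std::uint64_t stride) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t k = 0; k < 16; ++k) {
        addrs.push_back(array_base + 4 * (first_index + k * stride));
    }
    return addrs;
}

// Hypothetical check, assuming compute capability 1.0/1.1 rules for 4-byte
// words with all 16 threads active: the first address must be 64-byte
// aligned and thread k must access exactly the k-th word after it.
bool is_coalesced_cc1x(const std::vector<std::uint64_t>& addrs) {
    assert(addrs.size() == 16);
    if (addrs[0] % 64 != 0) {
        return false;  // segment is not aligned to 16 * 4 bytes
    }
    for (std::size_t k = 0; k < addrs.size(); ++k) {
        if (addrs[k] != addrs[0] + 4 * k) {
            return false;  // thread k is not on the k-th word
        }
    }
    return true;
}
```

For instance, `half_warp_addresses(0, 1, 1)` (every thread shifted by one element) fails the alignment check, and `half_warp_addresses(0, 0, 2)` (stride of two elements) fails the sequence check. On these devices, a half-warp that fails either check is served with separate transactions per thread.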

Bank conflicts are not measured by the profiler, but there are macros in the SDK examples that you can use to find out whether you have them. However, bank conflicts are about the last thing to optimize.

Doesn’t “warp_serialize” give an indication of bank conflicts? More specifically, if one has 0 serialized warps, doesn’t that mean that no bank conflicts occurred?


I believe that if you have zero serializations, you have no bank conflicts. However, if you have some number of serializations, you may or may not have bank conflicts (as there are other things that can cause a warp to serialize).
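
To estimate bank conflicts by hand instead of relying on a counter, here is a rough C++ sketch of the shared memory model on these cards: 16 banks, each 32 bits wide, with successive 32-bit words mapped to successive banks. It reports the largest number of distinct words a half-warp requests from a single bank, which approximates the degree of serialization. The names `strided_words` and `bank_conflict_degree` are hypothetical, and treating every repeated word as a free broadcast is a simplification.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: 32-bit word indices accessed by a half-warp in which
// thread k reads shared-memory word k * stride.
std::vector<std::uint32_t> strided_words(std::uint32_t stride) {
    std::vector<std::uint32_t> words;
    for (std::uint32_t k = 0; k < 16; ++k) {
        words.push_back(k * stride);
    }
    return words;
}

// Simplified model (an assumption, not a profiler counter): 16 banks, and
// word w lives in bank w % 16. Threads reading the same word are counted
// once; distinct words in the same bank are counted as serialized accesses.
// Returns 1 for a conflict-free pattern and n for an n-way conflict.
int bank_conflict_degree(const std::vector<std::uint32_t>& word_indices) {
    const std::size_t kBanks = 16;
    std::vector<std::set<std::uint32_t>> words_per_bank(kBanks);
    for (std::uint32_t w : word_indices) {
        words_per_bank[w % kBanks].insert(w);
    }
    int degree = 0;
    for (const auto& words : words_per_bank) {
        degree = std::max(degree, static_cast<int>(words.size()));
    }
    return degree;
}
```

With this model, a stride of 1 or 17 words gives degree 1 (no conflict), a stride of 2 gives a 2-way conflict, and a stride of 16 puts every thread in bank 0 for a 16-way conflict.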

My timeout settings are long enough for the program to run. Also, the program does not change the current directory. In fact, I get this error message only when I enable the gld and gst stats in the configuration. If those are not enabled, the program runs fine.

Any help will be appreciated.


The profiler seems to work when the gld and gst options in the configuration tab are turned off, but when I turn them on I get error 94. I am testing this on a Mac.

I read somewhere that detailed profiling does not work on some cards. I am using an NVIDIA 8800 GS. Does detailed profiling work with this card?


G200 based cards don’t properly report uncoalesced loads to the profiler: all counters are zero.

You should have checked the release notes… profiler counters are not supported on Mac yet.