I am writing a kernel and I suspect that I am not getting peak bandwidth because of some non-coalesced memory accesses. From previous posts I understand that I can find the number of non-coalesced memory accesses using the visual profiler. I see some options in the configuration tab in the settings, but I don’t understand what they mean. Also, I get an error saying “Minimum expected columns not found in profiler output file”.
Can someone help me with this or point me to some place where I can find this info? Also, is there some other way to find the number of non-coalesced memory accesses and bank conflicts in the kernel?
For non-coalesced accesses, you want to select at least the non-coalesced read and write counters.
If you get that error, it means that either:
your program ran too long and the visual profiler ‘timed out’ (you can change how long it waits), or
your program changes the current directory before starting the kernel (the profiler expects the output in the directory the program was started in).
Bank conflicts are not measured by the profiler, but there are macros in the SDK examples that you can use to find out if you have them. However, bank conflicts are usually among the last things to optimize.
I believe that if you have zero serializations, you have no bank conflicts. However, if you have some number of serializations, you may or may not have bank conflicts (as there are other things that can cause a warp to serialize).
My timeout setting is long enough for the program to run, and the program does not change the current directory. In fact, I get this error message only when I enable the gld and gst stats in the configuration. If those are not enabled, the program runs fine.