Unexplained low gld_efficiency


I am currently optimizing a kernel, but I have an issue with the gld_efficiency reported by nvvp and nvprof.

When I run the profiler I get a gld_efficiency of 15%, which is very low, but when I look at the global memory access pattern analysis in nvvp, the profiler tells me that “No issue has been found”.
I checked the code and I cannot find any uncoalesced global memory reads.
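
To make clear what I mean by checking for uncoalesced reads, here is a rough Python sketch of the hand calculation I have in mind. It is not what the profiler actually does. It assumes that global loads on this device are serviced in 32-byte segments, and the function name `estimate_load_efficiency` and its parameters are my own, made up for illustration.

```python
def estimate_load_efficiency(addresses, access_size, segment_size=32):
    """Estimate requested bytes / transferred bytes for one warp-wide load.

    addresses    -- byte address read by each active thread of the warp
    access_size  -- bytes read per thread (e.g. 4 for a float)
    segment_size -- ASSUMED memory transaction granularity in bytes
    """
    if not addresses:
        raise ValueError("at least one thread address is required")

    # Collect every segment touched by any thread's access.
    segments = set()
    for addr in addresses:
        first = addr // segment_size
        last = (addr + access_size - 1) // segment_size
        segments.update(range(first, last + 1))

    requested_bytes = len(addresses) * access_size
    transferred_bytes = len(segments) * segment_size
    return requested_bytes / transferred_bytes


# 32 threads reading consecutive 4-byte floats (fully coalesced).
coalesced = [4 * tid for tid in range(32)]
# 32 threads reading 4-byte floats 32 bytes apart (one segment per thread).
strided = [32 * tid for tid in range(32)]

print(estimate_load_efficiency(coalesced, 4))
print(estimate_load_efficiency(strided, 4))
```

Under this model the consecutive reads give 1.0 and the 32-byte stride gives 0.125. When I apply the same reasoning to my kernel's indexing, I do not find a load that should fall anywhere near 15%.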

Is it possible to have a very low gld_efficiency while the global memory access pattern analysis reports “no issue”?

I am using an NVIDIA Quadro K4200 (CC 3.0) with CUDA 7.5.

Thank you very much for your help.
Best Regards


Hi again.

My post does not seem to have had much success :( .
Do you know if there is another way to measure gld_efficiency (without using the CUPTI API), in order to confirm the previous measurement?

Thank you
Best Regards