Trying to understand some cudaprof counters: gld efficiency and gst efficiency


I don’t understand how I can get efficiency values like these:

gld efficiency    gst efficiency
10.042            40.6479
0.217913          1873.17
1                 11.2941
1                 8.47059
1                 12
1                 2.34146
1                 48
1                 10.6667
1                 6
0                 0

I thought efficiency ranged from 0 to 1 (number of requests / number of transactions), but I must be making a mistake.
If that is wrong, what is the maximum value the efficiency can take?
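
To make my assumption concrete, here is a minimal sketch of the ratio I had in mind. The function name and the sample counts are hypothetical, and this is only my reading of the metric, not the profiler's documented formula.

```python
def assumed_efficiency(requests, transactions):
    """Ratio I assumed for gld/gst efficiency: requests / transactions.

    This is a hypothetical definition used only to illustrate the question.
    """
    if transactions == 0:
        # Avoid division by zero when nothing was counted.
        return 0.0
    return requests / transactions


# Hypothetical counts: one transaction per request, then four per request.
print(assumed_efficiency(100, 100))
print(assumed_efficiency(100, 400))
```

Under this definition, if every request generates at least one transaction, the result can never exceed 1, which is why values such as 40.6479 or 1873.17 confuse me.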

About divergent branches: if the code has an “if” without an “else”, such as if (gid < N), then there are not two paths of execution. I wonder how bad that is for performance, because this kind of thing is counted as a divergent branch.
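
As a rough way to reason about it, the sketch below counts how many warps contain threads on both sides of the gid < N test. It is a plain host-side model, assuming a warp size of 32 and a hypothetical helper name; it does not reproduce how the profiler actually counts divergent branches.

```python
WARP_SIZE = 32  # assumed warp size


def count_divergent_warps(total_threads, n, warp_size=WARP_SIZE):
    """Count warps in which some threads satisfy gid < n and others do not."""
    divergent = 0
    for start in range(0, total_threads, warp_size):
        end = min(start + warp_size, total_threads)
        # Set of distinct outcomes of the guard within this warp.
        outcomes = {gid < n for gid in range(start, end)}
        if len(outcomes) == 2:
            divergent += 1
    return divergent


# Hypothetical launch of 1024 threads guarding an array of n elements.
print(count_divergent_warps(1024, 1000))
print(count_divergent_warps(1024, 1024))
```

In this model only the warp that straddles N sees both outcomes; every other warp takes a single path, so I would expect the cost of such a guard to be small.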

Thank you.