I apologize for this newbie question.
I am unable to understand the global memory store efficiency metric. Basically, the relationship between the number of global memory store requests and the total number of global memory store transactions is not clear to me. The two terms “requests” and “transactions” are confusing me (I assumed that every request leads to exactly one transaction).
And therein lies the distinction you have missed. There isn’t always a 1:1 relationship between requests and transactions. A single request might generate several transactions, depending on whether the coalescing rules can be met and on the size of the data being written.
Yes, which means that instead of a failure to coalesce automatically resulting in 16 transactions (one per thread in the half-warp), it can now (on compute capability 1.2 devices) result in fewer transactions, though not necessarily one. The programming guide explains the situations where this can happen.
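To make the request/transaction distinction concrete, here is a rough host-side sketch in Python (not device code). It assumes the simplified model that, on compute capability 1.2 devices, one store request from a half-warp of 16 threads writing 4-byte words is serviced with one transaction per distinct 128-byte aligned segment touched. The function name `count_store_transactions` and the three access patterns are made up for illustration, and the real hardware also shrinks transaction sizes where it can, which this ignores.

```python
HALF_WARP = 16       # threads whose stores form one request on 1.x devices
WORD_BYTES = 4       # assumed element size (e.g. a float)
SEGMENT_BYTES = 128  # assumed segment size for 4-byte words


def count_store_transactions(addresses, segment_bytes=SEGMENT_BYTES):
    """Count distinct aligned segments touched by one half-warp store request.

    Simplified model: one transaction per segment touched.
    """
    return len({addr // segment_bytes for addr in addresses})


# Thread k writes word k, starting at a segment boundary.
coalesced = [k * WORD_BYTES for k in range(HALF_WARP)]

# Same contiguous pattern, but starting 96 bytes into a segment,
# so the half-warp straddles two segments.
misaligned = [96 + k * WORD_BYTES for k in range(HALF_WARP)]

# Thread k writes word 32 * k, so every thread lands in its own segment.
strided = [k * 32 * WORD_BYTES for k in range(HALF_WARP)]

for name, pattern in [("coalesced", coalesced),
                      ("misaligned", misaligned),
                      ("strided", strided)]:
    print(name, "-> 1 request,", count_store_transactions(pattern), "transaction(s)")
```

Under this model each pattern is still a single request, but the misaligned one costs 2 transactions rather than 16, and only the fully strided one degrades to one transaction per thread.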
Is that the version 3.0 profiler? It has some obvious bugs in the memory throughput and efficiency calculations - I get global memory throughputs of 500Gb/s from my GT200 with that version (see here).