Global memory store efficiency


I apologize for this newbie question.
I am unable to understand global memory store efficiency. Basically, the difference between the number of global memory store requests and the total number of global memory store transactions is not clear. The two terms “requests” and “transactions” are confusing me (I assume that whenever there is a request, it is followed by a transaction, i.e. every request leads to one transaction).


And therein lies the distinction which you have missed. There isn’t always a 1:1 relationship between requests and transactions. A single request might generate a number of transactions, depending on whether the coalescing rules can be met and on the size of the data being written.
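To make the request/transaction distinction concrete, here is a rough host-side sketch (plain C++, not device code) of how one store request from a half-warp could map to transactions on compute capability 1.0/1.1 hardware. It is a simplified model, not the hardware's actual logic: it assumes 4-byte words and that all 16 threads are active. The names `transactionsStrict` and `stridedAddresses` are made up for illustration.

```cpp
#include <array>
#include <cstdint>

// Simplified model of compute capability 1.0/1.1 coalescing for 4-byte words.
// Assumption: all 16 threads of the half-warp take part in the store.
constexpr int kHalfWarp = 16;

// addr[k] is the byte address written by thread k of the half-warp.
// Returns how many memory transactions this single store request generates.
int transactionsStrict(const std::array<std::uint64_t, kHalfWarp>& addr) {
    // The first thread must write to a 64-byte aligned address ...
    bool coalesced = (addr[0] % 64 == 0);
    // ... and thread k must write the k-th consecutive 4-byte word.
    for (int k = 0; k < kHalfWarp && coalesced; ++k) {
        coalesced = (addr[k] == addr[0] + 4 * static_cast<std::uint64_t>(k));
    }
    // Either everything goes out as one transaction, or each thread gets its own.
    return coalesced ? 1 : kHalfWarp;
}

// Hypothetical helper: thread k writes to base + k * strideBytes.
std::array<std::uint64_t, kHalfWarp> stridedAddresses(std::uint64_t base,
                                                      std::uint64_t strideBytes) {
    std::array<std::uint64_t, kHalfWarp> addr{};
    for (int k = 0; k < kHalfWarp; ++k) {
        addr[k] = base + static_cast<std::uint64_t>(k) * strideBytes;
    }
    return addr;
}
```

Under these assumptions, `transactionsStrict(stridedAddresses(0, 4))` counts 1 transaction for the one request, while shifting the base by 4 bytes or using a stride of 8 bytes makes the same single request count as 16 transactions.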

Got it! But I understand that on compute capability 1.2 devices the coalescing rules are relaxed…

Yes, which means that instead of a failure to coalesce automatically resulting in 16 transactions (one per thread of the half-warp), it can now (with 1.2 devices) result in fewer transactions, but not necessarily one. The programming guide explains the situations where this can happen.
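As a rough illustration of the relaxed behaviour, the sketch below (host-side C++, not device code) counts transactions for one half-warp store the way the programming guide describes it for 1.2/1.3 devices: threads whose addresses fall into the same memory segment are served together. It assumes 4-byte words, 128-byte segments and all 16 threads active, and it ignores the reduction of a transaction's size, which does not change the count. `transactionsRelaxed` and `halfWarpAddresses` are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <set>

// Simplified model of compute capability 1.2/1.3 coalescing for 4-byte words.
// Assumption: all 16 threads of the half-warp take part in the store.
constexpr int kHalfWarpSize = 16;
constexpr std::uint64_t kSegmentBytes = 128;

// addr[k] is the byte address written by thread k of the half-warp.
// One transaction is issued per distinct 128-byte segment that is touched.
int transactionsRelaxed(const std::array<std::uint64_t, kHalfWarpSize>& addr) {
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : addr) {
        segments.insert(a / kSegmentBytes);  // index of the segment holding a
    }
    return static_cast<int>(segments.size());
}

// Hypothetical helper: thread k writes to base + k * strideBytes.
std::array<std::uint64_t, kHalfWarpSize> halfWarpAddresses(std::uint64_t base,
                                                           std::uint64_t strideBytes) {
    std::array<std::uint64_t, kHalfWarpSize> addr{};
    for (int k = 0; k < kHalfWarpSize; ++k) {
        addr[k] = base + static_cast<std::uint64_t>(k) * strideBytes;
    }
    return addr;
}
```

In this model a contiguous write starting at byte 4 stays within one segment and costs 1 transaction, a contiguous write starting at byte 100 straddles two segments and costs 2, and a stride of 128 bytes costs 16. If efficiency is read as requests divided by transactions, as this thread does, those cases would correspond to 1, 1/2 and 1/16.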

Thanks a lot!

So one request could result in up to 16 transactions, and the efficiency should therefore range from 1/16 to 1. But why do I sometimes see an efficiency of about 2000?

Where can you see global memory load or store efficiency values >1?

Here. A profiler screenshot is also attached.

[codebox]__global__ void dispKernel(int inters, int width, int height, int istep, float *res, size_t rstep)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if ( x > winsz2 && y > winsz2 && x < (width - winsz2) && y < (height - winsz2) )
    {
        int minInd = 0;
        int minDiff = 0x7FFFFFFF; // int max

        for(int e = 0; e < n; ++e)
        {
            // ...
        }

        res[y * rstep + x] = minInd;
    }
}[/codebox]

Is that the version 3.0 profiler? It has some obvious bugs in the memory throughput and efficiency calculations - I get global memory throughputs of 500 GB/s from my GT200 with that version (see here).

I think you can discount them as wrong.

Thank you for the answer. The profiler version is 2.3.10.