Global memory store efficiency

Kiran_CUDA · March 5, 2010, 2:57pm

Hi,

I apologize for this newbie question.
I am unable to understand global memory store efficiency. Basically, the difference between the number of global memory store requests and the total number of global memory store transactions is not clear to me. The two terms “requests” and “transactions” are confusing me (I assume that every request is followed by exactly one transaction).

Thanks

avidday · March 5, 2010, 3:04pm

And therein lies the distinction you have missed. There isn’t always a 1:1 relationship between requests and transactions. A single request might generate several transactions, depending on whether the coalescing rules are met and on the size of the data being written.
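To make the request/transaction split concrete, here is a minimal host-side sketch in plain C++ (not device code). It assumes the strict coalescing rule the programming guide describes for compute capability 1.0/1.1 devices with 4-byte words and all 16 threads of a half-warp active. The names `strictTransactions` and `stridedStoreAddresses` are made up for illustration, and real hardware has more cases than this model covers.

```cpp
#include <array>
#include <cstdint>

constexpr int kHalfWarp = 16;  // one store request is issued per half-warp

// Simplified model (an assumption, not the full hardware rule set):
// the request becomes 1 transaction if thread k writes word k of a
// 64-byte-aligned segment, and 16 separate transactions otherwise.
int strictTransactions(const std::array<std::uint64_t, kHalfWarp>& byteAddr) {
    const std::uint64_t base = byteAddr[0];
    if (base % 64 != 0) return kHalfWarp;  // first word is not segment-aligned
    for (int k = 0; k < kHalfWarp; ++k) {
        if (byteAddr[k] != base + 4ull * k) return kHalfWarp;  // not in sequence
    }
    return 1;
}

// Hypothetical helper: byte addresses written by `out[tid * stride] = ...`
// for a float array `out` that starts at byte address `base`.
std::array<std::uint64_t, kHalfWarp> stridedStoreAddresses(std::uint64_t base,
                                                           int stride) {
    std::array<std::uint64_t, kHalfWarp> a{};
    for (int k = 0; k < kHalfWarp; ++k) a[k] = base + 4ull * k * stride;
    return a;
}
```

In this model a unit-stride store from an aligned base counts as one transaction for the one request, while a stride of 2, or a base that is offset by one word, counts as 16.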

Kiran_CUDA · March 5, 2010, 3:06pm

Got it! But as I understand it, the coalescing rules are relaxed on compute capability 1.2 devices…

seibert · March 5, 2010, 3:53pm

Yes, which means that instead of a failure to coalesce resulting automatically in 16 transactions, it can now (with 1.2 devices) result in fewer transactions, but not necessarily one. The programming guide explains the situations where this can happen.
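As a rough illustration of "fewer, but not necessarily one", the sketch below counts how many distinct 128-byte segments a half-warp touches with 4-byte accesses. That is a simplified reading of the segment-based procedure in the programming guide for compute capability 1.2/1.3. It is host-side C++, it ignores the step where the hardware shrinks a transaction to 64 or 32 bytes, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Simplified model (an assumption): each distinct 128-byte segment touched
// by the half-warp costs one transaction, regardless of access order.
int relaxedTransactions(const std::vector<std::uint64_t>& byteAddr) {
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : byteAddr) segments.insert(a / 128);
    return static_cast<int>(segments.size());
}

// Hypothetical helper: byte addresses of 4-byte words accessed by the
// 16 threads of a half-warp, starting at `base` with a stride in words.
std::vector<std::uint64_t> halfWarpAddresses(std::uint64_t base, int strideWords) {
    std::vector<std::uint64_t> a(16);
    for (int k = 0; k < 16; ++k) a[k] = base + 4ull * k * strideWords;
    return a;
}
```

Under this model a unit-stride access starting at byte 100 straddles two segments and costs 2 transactions rather than 16, whereas a stride of 32 words puts every thread in its own segment and still costs 16.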

Kiran_CUDA · March 6, 2010, 3:45am

Thanks a lot!

Nerei · March 22, 2010, 2:38pm

So one request can lead to up to 16 transactions. Therefore the efficiency should be between 1/16 and 1. But why do I sometimes see an efficiency of about 2000?
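The arithmetic behind that range, spelled out as a small C++ sketch. It assumes the efficiency is simply requests divided by transactions, which is how this thread reads the metric; the profiler's exact formula is not quoted here, and `storeEfficiency` is a made-up name.

```cpp
// Assumption: efficiency = store requests / store transactions.
// With 1 to 16 transactions per request the result lies in [1/16, 1].
double storeEfficiency(unsigned long long requests,
                       unsigned long long transactions) {
    if (transactions == 0) return 0.0;  // nothing was stored
    return static_cast<double>(requests) / static_cast<double>(transactions);
}
```

With 100 requests, 100 transactions give 1.0 and 1600 transactions give 0.0625, so a reading near 2000 cannot come out of this ratio with consistent counters.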

avidday · March 22, 2010, 2:42pm

Where can you see global memory load or store efficiency values >1?

Nerei · March 22, 2010, 3:10pm

Here. A profiler screenshot is also attached.

[codebox]__global__ void dispKernel(int *inters, int width, int height, int istep, float *res, size_t rstep)
{
    // winsz2 and n are not defined in the posted excerpt
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if ( x > winsz2 && y > winsz2 && x < (width - winsz2) && y < (height - winsz2) )
    {
        int minInd = 0;
        int minDiff = 0x7FFFFFFF; // int max

        for(int e = 0; e < n; ++e)
        {
        }

        res[y * rstep + x] = minInd;
    }
}
[/codebox]

avidday · March 22, 2010, 3:16pm

Is that the version 3.0 profiler? It has some obvious bugs in the memory throughput and efficiency calculations - I get global memory throughputs of 500 GB/s from my GT200 with that version (see here).

I think you can discount them as wrong.

Nerei · March 22, 2010, 3:35pm

Thank you for the answer. The profiler version is 2.3.10.