what does gld_request really mean?

pacard · January 18, 2010, 11:30am

I am trying to profile my program with the CUDA profiler. I used the following events:

gld_32b : 32-byte global memory load transactions
gld_64b : 64-byte global memory load transactions
gld_128b : 128-byte global memory load transactions
gld_request : Global memory loads

My understanding is that gld_request=gld_32b+gld_64b+gld_128b.
But I am getting this output:

gld_32b=[ 9315200 ]
gld_64b=[ 3031600 ]
gld_128b=[ 1537150 ]
gld_request=[ 1962736 ]

So what does gld_request really mean?

I am using GTX280 on Redhat EL_5.3_x86.

avidday · January 18, 2010, 12:26pm

I don’t think that is correct. On the GT200, a global load request is a half-warp-wide request for a global memory load from the memory controller. The GT200 memory controller can decompose the request into a sequence of transactions to service the gld_request, so one gld_request can produce more than one 32-byte, 64-byte or 128-byte load (see for example Figure 5-4 in the CUDA 2.3 programming guide). So I wouldn’t expect the relationship you suggest to hold in any case except perhaps code with perfectly coalesced read behaviour.
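To make the decomposition concrete, here is a minimal Python sketch of how one half-warp request might be split into transactions. It assumes the coalescing rules for compute capability 1.2/1.3 devices as described in the programming guide (pick the segment of the lowest-numbered active thread, serve every thread in that segment, halve the transaction while only one half is used, repeat). The function name `half_warp_transactions` and the example addresses are made up for illustration; this is a model, not what the hardware or the profiler actually runs.

```python
def half_warp_transactions(addresses, word_size=4):
    """Model one half-warp request; return the transaction sizes in bytes.

    addresses: byte addresses requested by the active threads, in thread order.
    Assumption: every address is aligned to word_size.
    """
    # Assumed segment sizes: 32 B for 1-byte words, 64 B for 2-byte words,
    # 128 B for 4-, 8- and 16-byte words.
    segment = {1: 32, 2: 64}.get(word_size, 128)
    pending = list(addresses)
    sizes = []
    while pending:
        # Segment containing the address of the lowest-numbered active thread.
        seg_base = pending[0] // segment * segment
        served = [a for a in pending if seg_base <= a < seg_base + segment]
        lo, hi = min(served), max(served) + word_size - 1
        # Halve the transaction while only one half of it is touched.
        base, size = seg_base, segment
        while size > 32:
            half = size // 2
            if hi < base + half:
                size = half
            elif lo >= base + half:
                base, size = base + half, half
            else:
                break
        sizes.append(size)
        # Threads served by this transaction become inactive.
        pending = [a for a in pending
                   if not (seg_base <= a < seg_base + segment)]
    return sizes


# 16 threads reading consecutive 4-byte words: a single transaction.
coalesced = half_warp_transactions([4 * t for t in range(16)])
# 16 threads reading 4-byte words 128 bytes apart: one transaction per thread.
scattered = half_warp_transactions([128 * t for t in range(16)])
print(len(coalesced), len(scattered))
```

In this model a fully scattered half-warp costs 16 transactions for one request, while a coalesced one costs a single transaction, which is why the transaction counters need not add up to the request counter.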

pacard · January 18, 2010, 1:03pm

avidday, thank you for your reply.

Now the data can be explained.

But I still have another odd set of data here:

gst_32b=1638400

gst_64b=0

gst_128b=307200

gst_request=57344

Summing them all together, the number of memory transactions is 1945600. So each request generates 1945600/57344=33.9 transactions. :-(

Am I getting something wrong here?
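For reference, the same arithmetic can be run on both sets of counters quoted in this thread. This is only a sketch: the helper name `transactions_per_request` is hypothetical, and it simply divides the summed transaction counters by the request counter without assuming anything about how the profiler samples them.

```python
def transactions_per_request(t32, t64, t128, requests):
    """Return (total transactions, transactions per request)."""
    total = t32 + t64 + t128
    return total, total / requests


# Load counters from the first post (gld_32b, gld_64b, gld_128b, gld_request).
loads = transactions_per_request(9315200, 3031600, 1537150, 1962736)
# Store counters from this post (gst_32b, gst_64b, gst_128b, gst_request).
stores = transactions_per_request(1638400, 0, 307200, 57344)

print("loads:  %d transactions, %.2f per request" % loads)
print("stores: %d transactions, %.2f per request" % stores)
```

The loads come out at roughly 7 transactions per request and the stores at roughly 34, matching the figure worked out by hand above.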

pacard · January 18, 2010, 1:07pm

By the way, I am using the atomicCAS and atomicAdd operations heavily. Could that have caused the problem?

Cygnus_X1 · January 18, 2010, 1:49pm

That looks somewhat suspicious and incorrect, but it still hints that you have nearly completely random global memory writes, and that calls for optimisation :)

pacard · January 18, 2010, 2:04pm

Well, I tried my best to make the memory access pattern more coalesce-able. This is totally beyond my imagination.

I would accept the reality if transaction/request = 10, but not 33.
