Cuda Profiler 1.1 - question on gst coalesced value

kpg · April 5, 2009, 7:38pm

The number of coalesced stores reported by the profiler (1.0 or 1.1) doesn’t seem to match my understanding.

For a very simple case:
I have declared two global memory pointers of type float*: float* data_1 and float* data_2.

I allocated memory for them as follows:

long d_size = sizeof(float) * 8192;
size_t pitch = 128;
CUDA_SAFE_CALL( cudaMallocPitch( (void**) &data_1, &pitch, d_size, 1));
CUDA_SAFE_CALL( cudaMallocPitch( (void**) &data_2, &pitch, d_size, 1));

The grid and block dimensions are set as follows:
dim3 dimBlock(1, 1, 1);
dim3 dimGrid(1, 1, 1);

And the instructions in the kernel code are:
unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
data_2[x] = data_1[x] * 2;

The profiler reports: gld coalesced = 1 and gst coalesced = 2

If I change to 32 threads in one block: dim3 dimBlock(32, 1, 1);
The profiler reports: gld coalesced = 2 and gst coalesced = 8
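
For anyone who wants to reproduce these numbers, a minimal self-contained sketch of the test case might look like the following. It assumes the kernel indexes both arrays by x (data_2[x] = data_1[x] * 2). The kernel name scale, the helper run_scale, and the CHECK macro (a stand-in for the SDK's CUDA_SAFE_CALL) are hypothetical names, not part of the original code. It uses cudaDeviceSynchronize from the current runtime API; on a CUDA 2.x toolkit the equivalent call was cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-check macro standing in for the SDK's CUDA_SAFE_CALL.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Each thread performs one global load from in and one global store to out.
__global__ void scale(const float* in, float* out, unsigned int n)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < n) {
        out[x] = in[x] * 2.0f;
    }
}

// Launches a single block of `threads` threads (assumed <= 8192 and within
// the device's block-size limit) and checks the results on the host.
bool run_scale(unsigned int threads)
{
    const unsigned int n = 8192;
    const size_t d_size = sizeof(float) * n;
    size_t pitch = 0;  // output parameter: cudaMallocPitch overwrites it
    float* data_1 = NULL;
    float* data_2 = NULL;
    CHECK(cudaMallocPitch((void**)&data_1, &pitch, d_size, 1));
    CHECK(cudaMallocPitch((void**)&data_2, &pitch, d_size, 1));

    float* host = (float*)malloc(d_size);
    for (unsigned int i = 0; i < n; ++i) {
        host[i] = (float)i;
    }
    CHECK(cudaMemcpy(data_1, host, d_size, cudaMemcpyHostToDevice));
    CHECK(cudaMemset(data_2, 0, d_size));

    dim3 dimBlock(threads, 1, 1);
    dim3 dimGrid(1, 1, 1);
    scale<<<dimGrid, dimBlock>>>(data_1, data_2, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpy(host, data_2, d_size, cudaMemcpyDeviceToHost));
    bool ok = true;
    for (unsigned int i = 0; i < threads; ++i) {
        ok = ok && (host[i] == 2.0f * (float)i);
    }

    free(host);
    CHECK(cudaFree(data_1));
    CHECK(cudaFree(data_2));
    return ok;
}

int main()
{
    // The two launch configurations from the post: 1 thread and 32 threads.
    printf("1 thread:   %s\n", run_scale(1) ? "ok" : "mismatch");
    printf("32 threads: %s\n", run_scale(32) ? "ok" : "mismatch");
    return 0;
}
```

Running this under the profiler with each block size should exercise the same single-load, single-store pattern per thread; the counter values themselves depend on the profiler version and the hardware.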

What am I missing? Why isn’t the number of loads equal to the number of stores?
Even if I read from data_2 and write to data_1 (to check whether global memory alignment is the problem), I still get the same numbers of loads and stores reported above.

Any insights?

I am using a GTX 280 card (compute capability 1.3) on a Linux (Fedora 3) machine with CUDA 2.0. Both profilers (1.0 and 1.1) return the same values.

Thanks -
kpg

tmurray · April 5, 2009, 7:57pm

I think I remember what they mean, but you don’t want to use them anyway because they are pretty useless on GT200. You should grab the 2.2 beta from the registered developer site and use the new counters that measure individual memory transaction sizes.
