Hello everyone!
Let’s say I have the following custom data structure:
typedef struct
{
int idata;
float fdata;
}custom_type_t;
and the following kernel:
__global__ void kernel(custom_type_t *data, int stride)
{
int cx = (blockIdx.x * blockDim.x) + threadIdx.x;
int cy = (blockIdx.y * blockDim.y) + threadIdx.y;
int idx = (cy * stride) + cx;
data[idx].idata = 23;
data[idx].fData = 2.0;
}
The data array has 64x64 elements and the execution configuration is a grid of 4x4 Blocks with each block having
16x16 threads. So it should be possible to coalesce the 16x8 byte transactions of a half-warp into 1x128 byte
transaction IIRC.
When I am running this kernel on my Geforce 9600 GT the profiler output tells me that all global stores are incoherent.
gputime=[ 12.672 ] cputime=[ 28.216 ] gridSize=[ 4, 4 ] blockSize=[ 16, 16, 1 ] occupancy=[ 1.000 ] gst_coherent=[ 0 ] gst_incoherent=[ 4096 ]
I already tried to add “align(8)” but that didn’t help. After having a look at the alignedTypes-Samples, I noticed that
the variables of the structures do all have the same type. Do I have to pass the structure members as seperate arrays or
is there another way to achieve coalescing with the above data structure?
__global__ void kernel(int *data, float *fdata, int stride)
(...)
When using the following data structure all global stores are coherent:
typedef struct __align__(8)
{
float idata; // changed from int to float
float fdata;
}custom_type_t;
Profiler Output:
gputime=[ 7.168 ] cputime=[ 20.952 ] gridSize=[ 4, 4 ] blockSize=[ 16, 16, 1 ] occupancy=[ 1.000 ] gst_coherent=[ 512 ] gst_incoherent=[ 0 ]
Thanks a lot in advance for your help!
Bye,
kameo