How to make float2 and float4 data coalesced?

viswyh · May 27, 2008, 6:09am

The CUDA Programming Guide says the device can read 32-bit, 64-bit, or 128-bit words from global memory into registers in a single instruction.
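Here is a minimal sketch of what that statement means in practice, assuming a plain element-wise copy. With the built-in types float, float2 and float4, each thread moves one 32-, 64- or 128-bit word, and consecutive threads touch consecutive words. The kernel copyWords and the host helper roundTripWords are hypothetical names, and the block size of 256 is an arbitrary multiple of 16.

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Element-wise copy. For T = float, float2 or float4 (built-in types aligned
// to 4, 8 and 16 bytes) nvcc typically emits one 32-, 64- or 128-bit global
// load and one store per thread. Consecutive threads touch consecutive words.
template <typename T>
__global__ void copyWords(T *out, const T *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Hypothetical host helper: pushes n words of type T through copyWords and
// reports whether the data comes back bit-for-bit identical.
template <typename T>
bool roundTripWords(int n)
{
    size_t bytes = n * sizeof(T);
    size_t count = bytes / sizeof(float);    // float, float2, float4 are all built from floats
    float *h_in = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (size_t k = 0; k < count; ++k)
        h_in[k] = (float)k;

    T *d_in = 0, *d_out = 0;
    cudaMalloc(&d_in, bytes);                // cudaMalloc memory is at least 256-byte aligned
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                 // multiple of 16, so every half-warp starts on an aligned segment
    copyWords<T><<<(n + threads - 1) / threads, threads>>>(d_out, d_in, n);
    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess) && memcmp(h_in, h_out, bytes) == 0;
    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return ok;
}
```

On a working device, roundTripWords<float2>(1 << 20) should return true. The three instantiations can then be compared in the profiler.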
So I tried a test with the following code:
#define BLOCK_DIM 256  // threads per block; value assumed, not given in the post

typedef struct __align__(16)
{
    float coeff[4];
} element;

__global__ void
float4_coalesced(element *odata, element *idata)
{
    __shared__ element sdata[BLOCK_DIM];
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int tx = threadIdx.x;
    sdata[tx] = idata[index];   // one 16-byte element per thread
    __syncthreads();
    odata[index] = sdata[tx];
}
I thought it should be coalesced. However, when I test it with the CUDA Visual Profiler, the accesses are reported as incoherent (not coalesced).
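For reference, the coalescing rule for compute capability 1.0/1.1 devices (the hardware generation this thread dates from), as the CUDA Programming Guide of that era describes it, can be written down as a small host-side check. This is a simplified model that assumes all 16 threads of a half-warp are active. isCoalescedCC1x and the two address helpers are hypothetical names. The model shows how the same 16-byte elements can coalesce when each thread issues one 128-bit load, but not when the copy is split into four 32-bit loads.

```cuda
#include <cstddef>
#include <cstdint>

// Simplified model of the compute capability 1.0/1.1 coalescing rule.
// Assumes all 16 threads of the half-warp are active. addr[k] is the byte
// address thread k accesses for one load instruction; wordSize is the size
// of that access in bytes.
bool isCoalescedCC1x(const uintptr_t addr[16], size_t wordSize)
{
    // 1. Only 32-, 64- and 128-bit words can be coalesced.
    if (wordSize != 4 && wordSize != 8 && wordSize != 16)
        return false;
    // 2. The 16 words must lie in one segment of 16 * wordSize bytes,
    //    aligned to that size.
    uintptr_t base = addr[0];
    if (base % (16 * wordSize) != 0)
        return false;
    // 3. Thread k must access the k-th word of that segment.
    for (int k = 0; k < 16; ++k)
        if (addr[k] != base + k * wordSize)
            return false;
    return true;
}

// Hypothetical helper for an array of 16-byte elements starting at `base`:
// one 128-bit load per thread, thread k reads element k.
void vectorLoadAddresses(uintptr_t base, uintptr_t addr[16])
{
    for (int k = 0; k < 16; ++k)
        addr[k] = base + 16 * k;
}

// Hypothetical helper for the same copy split into four 32-bit loads: for
// load j (0..3), thread k reads coeff[j] of element k, so neighbouring
// threads are 16 bytes apart instead of 4.
void splitLoadAddresses(uintptr_t base, int j, uintptr_t addr[16])
{
    for (int k = 0; k < 16; ++k)
        addr[k] = base + 16 * k + 4 * j;
}
```

If the compiler did split the struct copy this way, those devices would issue a separate transaction for each thread of the half-warp. That is the kind of access the profiler counts as incoherent. The PTX would confirm or rule this out.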
Can anyone tell me why? And how can I make it coalesced, other than with the following method:
// assumes idata is a float* (4 floats per element) and index = blockIdx.x * 4 * BLOCK_DIM + threadIdx.x
__shared__ float s_data[BLOCK_DIM*4];
s_data[threadIdx.x] = idata[index];
s_data[threadIdx.x+BLOCK_DIM] = idata[index+BLOCK_DIM];
s_data[threadIdx.x+2*BLOCK_DIM] = idata[index+2*BLOCK_DIM];
s_data[threadIdx.x+3*BLOCK_DIM] = idata[index+3*BLOCK_DIM];
__syncthreads();
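The shared-memory workaround can be fleshed out into a complete kernel. This is a minimal sketch under several assumptions: idata is a flat float array with 4 floats per element, the block size is BLOCK_DIM = 128 (an assumed value), and a per-element sum stands in for whatever work the real kernel does. sumCoeffsStaged and checkStagedSum are hypothetical names. The global loads use only the 32-bit coalesced pattern, and shared memory does the regrouping into elements.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK_DIM 128   // assumed threads per block (also elements per block)

// Hypothetical kernel: each element is 4 consecutive floats, and the output is
// the sum of the 4 coefficients of each element. idata holds
// 4 * BLOCK_DIM * gridDim.x floats; launch with exactly BLOCK_DIM threads.
__global__ void sumCoeffsStaged(float *odata, const float *idata)
{
    __shared__ float s_data[BLOCK_DIM * 4];
    int tx = threadIdx.x;
    int base = blockIdx.x * BLOCK_DIM * 4;   // first float of this block's tile

    // Four passes of 32-bit loads: in each pass thread k reads float
    // base + p*BLOCK_DIM + k, so a half-warp reads 16 consecutive, aligned floats.
    for (int p = 0; p < 4; ++p)
        s_data[tx + p * BLOCK_DIM] = idata[base + tx + p * BLOCK_DIM];
    __syncthreads();

    // Thread tx's element is s_data[4*tx .. 4*tx+3]. This stride-4 pattern
    // causes shared-memory bank conflicts, the usual price of this trick.
    float sum = s_data[4 * tx] + s_data[4 * tx + 1]
              + s_data[4 * tx + 2] + s_data[4 * tx + 3];

    // One float per element: consecutive threads write consecutive floats.
    odata[blockIdx.x * BLOCK_DIM + tx] = sum;
}

// Hypothetical host helper: element e holds {e, e+1, e+2, e+3}, so the
// expected sum is 4e + 6 (exact in float for the small sizes used here).
bool checkStagedSum(int numBlocks)
{
    int numElems = numBlocks * BLOCK_DIM;
    size_t inBytes = numElems * 4 * sizeof(float);
    size_t outBytes = numElems * sizeof(float);
    float *h_in = (float *)malloc(inBytes);
    float *h_out = (float *)malloc(outBytes);
    for (int e = 0; e < numElems; ++e)
        for (int j = 0; j < 4; ++j)
            h_in[4 * e + j] = (float)(e + j);

    float *d_in = 0, *d_out = 0;
    cudaMalloc(&d_in, inBytes);
    cudaMalloc(&d_out, outBytes);
    cudaMemcpy(d_in, h_in, inBytes, cudaMemcpyHostToDevice);
    sumCoeffsStaged<<<numBlocks, BLOCK_DIM>>>(d_out, d_in);
    cudaError_t err = cudaMemcpy(h_out, d_out, outBytes, cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess);
    for (int e = 0; ok && e < numElems; ++e)
        ok = (h_out[e] == (float)(4 * e + 6));
    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return ok;
}
```

The kernel has no bounds check, so the element count must be a multiple of BLOCK_DIM.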

seibert · May 27, 2008, 12:28pm

Can you dump the PTX for this kernel? You've defined your element struct as wrapping a float[4] array, whereas the built-in float4 struct is defined as:

struct __align__(16) float4
{
  float x, y, z, w;
};

While both structs should occupy the same memory layout, they aren’t semantically the same, and it is possible that the compiler is doing something funny.
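Following that suggestion, here is a minimal sketch of one way to rule out the float[4] vs. float4 difference. It keeps the original element type but moves the data through float4 pointers. The reinterpret_cast is a common CUDA idiom that relies on element and float4 having the same size and 16-byte alignment. The kernel name float4_copy and the helper roundTripElements are hypothetical. Whether this changes the profiler result depends on what the compiler emitted for the original kernel. The PTX shows that: look for ld.global.v4.f32 versus four separate ld.global.f32 loads.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// The poster's element type: same size and alignment as float4.
typedef struct __align__(16)
{
    float coeff[4];
} element;

// Copy that moves each element through the built-in float4 type, so the
// compiler can use one 128-bit load and one 128-bit store per thread instead
// of possibly copying coeff[0..3] one float at a time.
__global__ void float4_copy(element *odata, const element *idata, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    const float4 *in4 = reinterpret_cast<const float4 *>(idata);
    float4 *out4 = reinterpret_cast<float4 *>(odata);
    if (index < n)
        out4[index] = in4[index];
}

// Hypothetical host helper: copies n elements and checks every coefficient.
bool roundTripElements(int n)
{
    size_t bytes = n * sizeof(element);
    element *h_in = (element *)malloc(bytes);
    element *h_out = (element *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < 4; ++j)
            h_in[i].coeff[j] = (float)(4 * i + j);

    element *d_in = 0, *d_out = 0;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    const int threads = 256;
    float4_copy<<<(n + threads - 1) / threads, threads>>>(d_out, d_in, n);
    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        for (int j = 0; j < 4; ++j)
            ok = ok && (h_out[i].coeff[j] == h_in[i].coeff[j]);
    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return ok;
}
```

The shared-memory round trip in the original kernel adds nothing to a plain copy, so it is left out here. Another option is to give element a float4 member instead of float coeff[4].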
