performance problem of aligned structure

mianlu · August 3, 2008, 10:00am

Hi,

I’d like to have a data structure holding four double values (on a GTX280 card). However, there is no built-in double4 vector type in CUDA, so I have to implement it myself. I use the following alignment code to define the structure type GPU_qd:

struct __align__(16) GPU_qd
{
    double2 d1;
    double2 d2;
};

I used the Profiler to check the accesses, and it shows they are still coalesced. But the performance does not seem very good. Accessing 4M GPU_qd values takes about 55 ms (no other operations: a kernel just reads from one array and writes to another, rather than using the memcpy function), while accessing 16M double values takes only about 27 ms. Both arrays have the same size (128 MB), so I expected similar times. Does anyone know what the problem is? I also wonder why NVIDIA only supports up to double2 rather than double4, when both float2 and float4 are supported.
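A minimal sketch of the measurement described above, assuming one grid-stride copy kernel per case and CUDA event timing. The kernel names, the launch configuration, and the benchmark() helper are hypothetical, not the poster's code.

```cuda
#include <cuda_runtime.h>

// The 32-byte type from the post, built from two 16-byte double2 halves.
struct __align__(16) GPU_qd {
    double2 d1;
    double2 d2;
};

// One thread per GPU_qd (grid-stride loop). Each thread typically issues two
// 16-byte loads and two 16-byte stores; neighbouring threads are 32 bytes apart.
__global__ void copy_qd(const GPU_qd *in, GPU_qd *out, size_t n)
{
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x)
        out[i] = in[i];
}

// One thread per double: 8-byte accesses, neighbouring threads 8 bytes apart.
__global__ void copy_double(const double *in, double *out, size_t n)
{
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x)
        out[i] = in[i];
}

// Copies the same buffer once as n_qd GPU_qd values and once as 4 * n_qd
// doubles, timing each kernel with CUDA events. Returns false on CUDA errors.
bool benchmark(size_t n_qd, float *ms_qd, float *ms_double)
{
    const size_t bytes = n_qd * sizeof(GPU_qd);  // 4M elements -> 128 MB
    void *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&out, bytes) != cudaSuccess) { cudaFree(in); return false; }

    const int blocks = 1024, threads = 256;  // grid-stride loops cover any n
    const GPU_qd *qin = static_cast<const GPU_qd *>(in);
    GPU_qd *qout = static_cast<GPU_qd *>(out);
    const double *din = static_cast<const double *>(in);
    double *dout = static_cast<double *>(out);

    // Warm-up launches so the timed runs exclude one-time start-up costs.
    copy_qd<<<blocks, threads>>>(qin, qout, n_qd);
    copy_double<<<blocks, threads>>>(din, dout, 4 * n_qd);

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);
    cudaEventRecord(t0);
    copy_qd<<<blocks, threads>>>(qin, qout, n_qd);
    cudaEventRecord(t1);
    copy_double<<<blocks, threads>>>(din, dout, 4 * n_qd);
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);
    cudaEventElapsedTime(ms_qd, t0, t1);
    cudaEventElapsedTime(ms_double, t1, t2);

    const bool ok = cudaGetLastError() == cudaSuccess;
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaEventDestroy(t2);
    cudaFree(in);
    cudaFree(out);
    return ok;
}
```

To mirror the post, one would call benchmark(4 << 20, &ms_qd, &ms_double); no timings are claimed here. The grid-stride loops keep the grid within the 65535-block limit of that hardware generation even for 16M elements.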

Thanks!!!

Mian

mianlu · August 3, 2008, 10:19am

Okay… I found one problem: the number of coalesced accesses seems to have doubled. So I think the built-in vector types do have some optimization. I really hope CUDA will support double4.
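One common workaround, sketched here as an assumption rather than anything confirmed in this thread: copy the GPU_qd array as an array of double2, one 16-byte word per thread. With one GPU_qd per thread, the d1 loads of neighbouring threads are 32 bytes apart, so each load instruction touches twice the memory it actually uses (and likewise for d2). Viewing the data as double2 makes neighbouring threads read neighbouring 16-byte words. The kernel and helper names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Same 32-byte layout as in the post.
struct __align__(16) GPU_qd {
    double2 d1;
    double2 d2;
};
static_assert(sizeof(GPU_qd) == 2 * sizeof(double2), "GPU_qd must be two packed double2");

// Copies n GPU_qd values by viewing both arrays as 2 * n double2 words.
// Thread k moves word k, so the 16 threads of a half-warp read one
// contiguous 256-byte span instead of 16 loads spread 32 bytes apart.
__global__ void copy_qd_as_double2(const GPU_qd *in, GPU_qd *out, size_t n)
{
    const double2 *src = reinterpret_cast<const double2 *>(in);
    double2 *dst = reinterpret_cast<double2 *>(out);
    const size_t words = 2 * n;  // two double2 per GPU_qd
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < words;
         i += (size_t)gridDim.x * blockDim.x)
        dst[i] = src[i];
}

// Hypothetical host helper: launches the copy and reports success.
bool copy_qd_device(const GPU_qd *d_in, GPU_qd *d_out, size_t n)
{
    copy_qd_as_double2<<<1024, 256>>>(d_in, d_out, n);
    if (cudaGetLastError() != cudaSuccess) return false;
    return cudaDeviceSynchronize() == cudaSuccess;
}
```

For kernels that do real work on each GPU_qd, a structure-of-arrays layout (one double2 array for all d1 halves and another for all d2 halves) is another common way to keep neighbouring threads on neighbouring addresses.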

E.D_Riedijk · August 3, 2008, 10:24am

On GT200 you will only ever see coalesced accesses reported, because the memory access rules have changed there. There is no such thing as an uncoalesced access anymore.
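The relaxed GT200 rules can be illustrated with a rough count of memory segments. The function below is a simplified, hypothetical model: it assumes 128-byte segments for 4-, 8- and 16-byte words and ignores the hardware's reduction of half-used transactions to 64 or 32 bytes.

```cuda
#include <cassert>
#include <cstddef>

// Simplified model (an assumption, not the exact hardware rule): count the
// distinct 128-byte segments touched when the 16 threads of a half-warp each
// access `word_bytes` bytes at base + t * stride_bytes. GT200 can also shrink
// a transaction to 64 or 32 bytes when only half a segment is used; that step
// is ignored here.
int segments_touched(size_t base, size_t stride_bytes, size_t word_bytes)
{
    const size_t seg = 128;
    assert(word_bytes > 0 && word_bytes <= seg);  // each word spans <= 2 segments
    size_t seen[32];
    int count = 0;
    for (size_t t = 0; t < 16; ++t) {
        const size_t lo = base + t * stride_bytes;
        const size_t first = lo / seg;
        const size_t last = (lo + word_bytes - 1) / seg;
        for (size_t s = first; s <= last; ++s) {
            bool already = false;
            for (int k = 0; k < count; ++k)
                if (seen[k] == s) { already = true; break; }
            if (!already) seen[count++] = s;
        }
    }
    return count;
}
```

Under this model, the d1 loads of a half-warp (segments_touched(0, 32, 16)) and the d2 loads (segments_touched(16, 32, 16)) each touch 4 segments, so reading 16 GPU_qd values (512 bytes) touches 8 segments, while reading the same 512 bytes contiguously as double2 or double touches 4. That 2x ratio is consistent with the doubled access count and the roughly doubled time reported above.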

I believe the memory controller supports transfers of up to 128 bits (= 4 floats or 2 doubles), so that could be the reason double4 is not supported.
