Alignment Issue?

Hi,

If I compile this code:

__global__ void test(double d, float f, unsigned int i) { }

int main()
{
	test<<<1, 1>>>(2.0, 2.0f, 2);
	return 0;
}

like this:

nvcc-4.0 --keep main.cu -o main -arch=sm_20

and look at the cudaSetupArgument functions (cleaned up a little), I get this:

cudaSetupArgument(__par0, 0UL);
cudaSetupArgument(__par0, 8UL);
cudaSetupArgument(__par0, 12UL);

However, if I swap the order of the float and double parameters, I get this:

cudaSetupArgument(__par0, 0UL);
cudaSetupArgument(__par0, 8UL);
cudaSetupArgument(__par0, 16UL);

What’s going on here? Why does the float parameter take up 4 bytes when it comes second but 8 bytes when it comes first? Why can’t I just use sizeof(variable) to calculate what these offsets should be?

Dan

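All primitive types must be aligned to their size. This is a restriction of the GPU hardware (most processors that are not x86 have this restriction, and even x86 takes a performance hit on misaligned accesses). So if the double comes after the float, the float starts at offset 0 and still occupies only 4 bytes, but the double cannot start at offset 4 because 4 is not 8-byte aligned. It is padded and placed at offset 8 instead, which in turn pushes the unsigned int to offset 16. That is why summing sizeof values is not enough: each offset also has to be rounded up to the alignment of the parameter being placed.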

That makes complete sense. I guessed it had something to do with alignment. Thanks.