Data alignment: alignment of data declarations in a struct

I am experimenting with porting some existing code (the x264 encoder) to CUDA. The issue I am running into is how to control data alignment with nvcc.

The existing x264 code base uses declarations that force the alignment of individual members within a struct. They look like this:

    /* Current MB DCT coeffs */
    struct
    {
        DECLARE_ALIGNED( int, luma16x16_dc[16], 16 );
        DECLARE_ALIGNED( int, chroma_dc[2][4], 16 );
        // FIXME merge with union
        DECLARE_ALIGNED( int, luma8x8[4][64], 16 );
        union
        {
            DECLARE_ALIGNED( int, residual_ac[15], 16 );
            DECLARE_ALIGNED( int, luma4x4[16], 16 );
        } block[16+8];
    } dct;

where the DECLARE_ALIGNED macro is defined elsewhere as

# define DECLARE_ALIGNED( type, var, n ) type var __attribute__((aligned(n)))

The struct declaration above fails to compile with version 0.8 of nvcc on RHEL 4.3.
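For reference, here is a small host-side sketch of what the macro does under GCC. The struct `demo` and its members are made up for illustration, and a 4-byte int is assumed. The attribute pushes the array up to the next 16-byte offset and raises the alignment of the enclosing struct to 16.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* The macro as quoted above (GCC/Clang attribute syntax). */
#define DECLARE_ALIGNED( type, var, n ) type var __attribute__((aligned(n)))

/* Hypothetical struct, for illustration only. */
struct demo
{
    char tag;                               /* offset 0 */
    DECLARE_ALIGNED( int, coeffs[4], 16 );  /* moved up to offset 16 */
};

/* Without the attribute, coeffs would start at offset 4 (4-byte int assumed). */
static_assert( offsetof( struct demo, coeffs ) == 16, "coeffs starts on a 16-byte boundary" );
static_assert( sizeof( struct demo ) == 32, "struct size is padded to a multiple of 16" );
```

This only shows the behaviour of GCC on the host; it says nothing about what nvcc accepts.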

In the CUDA Programming Guide Section, the __align__ specifier is stated to apply to structs. How do I force alignment of individual members of a struct?



You don’t have this control with nvcc. You can only align an entire struct.

As mentioned in Section, you only have to worry about alignment issues if you want to efficiently read from or write to global memory. If this is the case for your struct, you should just apply __align__(16) to the struct as a whole.
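A minimal sketch of whole-struct alignment applied to the DCT coefficients, assuming a 4-byte int. `ALIGN_STRUCT` and `dct_coeffs` are made-up names, and the `__CUDACC__` branch assumes the CUDA headers supply `__align__(n)`; the fallback branch uses the GCC attribute so the same declaration builds on the host. For this particular struct, the compile-time checks suggest every member already starts on a 16-byte boundary once the struct itself is aligned.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Assumption: nvcc defines __CUDACC__ and provides __align__(n).
   ALIGN_STRUCT is a hypothetical helper, not part of x264 or CUDA. */
#if defined(__CUDACC__)
#  define ALIGN_STRUCT( n ) __align__( n )
#else
#  define ALIGN_STRUCT( n ) __attribute__((aligned( n )))
#endif

/* Current MB DCT coeffs, with no per-member attributes. */
struct ALIGN_STRUCT( 16 ) dct_coeffs
{
    int luma16x16_dc[16];      /*   64 bytes at offset    0 */
    int chroma_dc[2][4];       /*   32 bytes at offset   64 */
    int luma8x8[4][64];        /* 1024 bytes at offset   96 */
    union
    {
        int residual_ac[15];
        int luma4x4[16];
    } block[16+8];             /* 24 * 64 bytes at offset 1120 */
};

static_assert( sizeof( int ) == 4, "the offsets below assume a 4-byte int" );
static_assert( offsetof( struct dct_coeffs, chroma_dc ) % 16 == 0, "chroma_dc" );
static_assert( offsetof( struct dct_coeffs, luma8x8 )   % 16 == 0, "luma8x8" );
static_assert( offsetof( struct dct_coeffs, block )     % 16 == 0, "block" );
static_assert( sizeof( struct dct_coeffs ) % 16 == 0, "array elements stay aligned" );
```

If a member were added whose size is not a multiple of 16 bytes, the checks would fail at compile time and explicit padding would be needed.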

The issue is that, to preserve the semantics of copying a complete struct between the GPU and the host, I need the memory layout of the struct to be the same on both sides (or at least compatible). Otherwise, I will have to copy each member individually to make sure the data lands in the right place in the struct.
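One way to check that concern is to instantiate the same field list twice, once with the per-member attribute and once without, and compare the layouts at compile time. The sketch below assumes GCC on the host and a 4-byte int. `DECLARE_PLAIN`, `DCT_FIELDS`, `dct_host`, `dct_device`, `SAME_OFFSET` and `dct_copy_to_device_layout` are all hypothetical names, and the plain `memcpy` merely stands in for a whole-struct copy between host and GPU.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

/* Per-member alignment (host build) versus a plain declaration. */
#define DECLARE_ALIGNED( type, var, n ) type var __attribute__((aligned(n)))
#define DECLARE_PLAIN( type, var, n )   type var

/* One field list, expanded with whichever declaration macro is passed in. */
#define DCT_FIELDS( DECL )                      \
    DECL( int, luma16x16_dc[16], 16 );          \
    DECL( int, chroma_dc[2][4], 16 );           \
    DECL( int, luma8x8[4][64], 16 );            \
    union                                       \
    {                                           \
        DECL( int, residual_ac[15], 16 );       \
        DECL( int, luma4x4[16], 16 );           \
    } block[16+8]

struct dct_host { DCT_FIELDS( DECLARE_ALIGNED ); };
struct __attribute__((aligned(16))) dct_device { DCT_FIELDS( DECLARE_PLAIN ); };

/* Fail the build if a member sits at a different offset in the two layouts. */
#define SAME_OFFSET( member )                                      \
    static_assert( offsetof( struct dct_host, member ) ==          \
                   offsetof( struct dct_device, member ),          \
                   #member " is laid out differently" )

SAME_OFFSET( luma16x16_dc );
SAME_OFFSET( chroma_dc );
SAME_OFFSET( luma8x8 );
SAME_OFFSET( block );
static_assert( sizeof( struct dct_host ) == sizeof( struct dct_device ),
               "struct sizes differ" );

/* Stand-in for a whole-struct copy; valid only because the checks above hold. */
static inline void dct_copy_to_device_layout( struct dct_device *dst,
                                              const struct dct_host *src )
{
    memcpy( dst, src, sizeof( *dst ) );
}
```

If the assertions hold for the real declarations, a single whole-struct copy should be enough; if they fail, the failing member shows where explicit padding would have to go.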