float4 alignment inconsistency...

I have some CUDA code that was getting memory corruptions, and it has come down to an issue with the alignment of the float4 structures.

In my .cpp module, sizeof(TestStruct) returns 20. If I call into a function in the .cu module and evaluate sizeof(TestStruct) there, still in host code, it returns 32.

So, the .cpp module is not recognising the 16-byte alignment of the float4.

#include "cuda.h"
#include "cuda_runtime.h"
#include "vector_types.h"

class TestStruct {
	float	test1;
	float4	test2;
};

void CMainFram::OnTest()
{
    int ltest = sizeof(TestStruct);  // returns 20
}

I’m using Visual Studio 2010, and the project has been moved over the years from Visual Studio 6.0, so there might be some odd settings. What do I need to fix to get Visual Studio to correctly align the float4 in the .cpp files?

Edit to add: I am working on Win32. I looked into host_defines.h and discovered that the CUDA alignment is only set for the CUDA compiler and for Win64. Win32 just defines __builtin_align__ to nothing. I tried adding the definition for Win32 as well, but it produced lots of C2719 compiler errors, such as:

error C2719: 'val': formal parameter with __declspec(align('16')) won't be aligned

I’m guessing 32-bit Windows can’t handle this, and the CUDA libraries just haven’t got this problem sorted. For now, I will have to restructure my code to move all float4 references into the .cu source file. I think I can get away with this, but it would still be really nice to resolve this properly. (Yeah, yeah, I know 64-bit code IS the future. You know what I mean.)

For now, it’s DANGER DANGER, don’t ever use the float4 type in .cpp files, or any header files that will be included in .cpp files. I will stick to the .cuh header file naming convention to help remind me to keep float4 variables out of .h files, and that .cuh files MUST only ever be included into .cu files, never .cpp or .h, even though they can.

You can try using the __declspec(align(16)) directive (in a .cpp file).

CUDA has a similar __align__(16) specifier for use in .cu files.

The CUDA C Programming Guide has various discussions about vector type alignment and structure alignment.

In this particular case, you could probably also fix the issue simply by reversing the positions of test1 and test2 in the structure.

I have not used a 32-bit Windows platform in years. However, as I recall, the Win32 ABI simply has no provisions for 16-byte alignment; the maximum alignment it supports is 8 bytes. This restriction will therefore apply to all host code, whether it is in a .cu file compiled with nvcc (which in turn calls MSVC to compile it) or in a separate .c or .cpp file compiled directly with MSVC.

Back when the Win32 ABI was defined, x86 processors did not impose alignment restrictions on data. Alignment was only needed to achieve optimal performance, not functional correctness. The largest data types natively supported by the hardware were floating-point types of the x87 FPU, in particular ‘double’ (8 bytes) and ‘long double’ (10 bytes) and both required only 8-byte alignment for optimal performance.

If your use case allows it, I would suggest transitioning your code to a 64-bit Windows application. From my perspective, 64-bit is the mainstream for all platforms supported by CUDA except for ARM platforms.

Even if you cannot get 16-byte alignment, some padding issues in the form of undesired host/device differences in structs can be worked around by ordering structure members according to size, largest size first, then moving down to increasingly smaller sizes.

Thanks for your thoughts. It seems there is no proper solution, but there are plenty of options to work around it. I will just have to remain aware of such issues. For now, I have moved the float4 structures back into the .cu module, and the .cpp files just stay away from such things.

Thanks for your help.