Defining my own float3 and float4

How can I declare my own float3 and float4 types so that CUDA's float3 and float4 can be converted to them in device code?

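// declaration on the host side
struct my_float3 { float x, y, z; } f3;
struct my_float4 { float x, y, z, w; } f4;

// inside device code
float4 c4;
float3 c3;

f4 = c4;  // desired: assign a CUDA float4 to a my_float4
f3 = c3;  // desired: assign a CUDA float3 to a my_float3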

Is there any way to do this?
I suspect that alignment might cause issues.

Thanks in advance for the help.

Is there a particular reason why you would want to define your own float4 type? You can certainly define a float3 struct as shown and it should work fine across host and device code. It will have four-byte alignment.

Yes. I have code that I cannot change.
I will try this then.


Note that if you define your own float4 type as a simple struct, it will have insufficient alignment to qualify for vector loads on the device side, which may reduce performance. CUDA’s built-in float4 type is implemented as a struct with added alignment attributes on both host and device.

What won’t work (in the general case) is using CUDA’s aligned float4 in device code and interfacing it with your own unaligned float4 on the host side. This kind of mix-and-match might work under carefully constrained circumstances, but generally speaking you want to use the same type for both host and device code. So your original concerns along those lines were justified.
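To illustrate why the two types are not interchangeable, here is a minimal host-only sketch. The type names are hypothetical stand-ins (not the CUDA types themselves), and alignas(16) is used here only to mimic a 16-byte-aligned float4. It shows that the two structs have the same size in isolation but produce a different layout once they are embedded in another struct.

```cpp
#include <cstddef>

// Hypothetical stand-ins, not the CUDA types themselves.
struct plain_float4 { float x, y, z, w; };                // alignment of float
struct alignas(16) aligned_float4 { float x, y, z, w; };  // 16-byte alignment

// The same record, declared once with each vector type.
struct record_plain   { float scale; plain_float4 v; };
struct record_aligned { float scale; aligned_float4 v; };

// Byte offsets of the member `v`. They differ because the aligned type
// forces padding after `scale`.
constexpr std::size_t offset_plain   = offsetof(record_plain, v);
constexpr std::size_t offset_aligned = offsetof(record_aligned, v);

static_assert(sizeof(plain_float4) == sizeof(aligned_float4),
              "same size in isolation");
static_assert(offset_plain != offset_aligned,
              "different layout once embedded in another struct");
```

On a typical platform with a 4-byte float, the member v sits at offset 4 in record_plain and at offset 16 in record_aligned, so a buffer filled using one declaration would be misread by code compiled against the other. Separately, an array of plain_float4 is only guaranteed to start on a 4-byte boundary, which is weaker than what code expecting a 16-byte-aligned type assumes.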

Note that CUDA does not provide a built-in float3 type so you have no choice but to define your own.

Can I declare my_float4 with the same alignment as CUDA's float4?

CUDA provides a builtin float3 type afaik

I stand corrected, there is indeed a float3 struct declared in the CUDA header file vector_types.h. No alignment attributes are used, so it has four-byte alignment based on the component type.

You can of course declare your own float4 type with compatible alignment (16 bytes). Inside CUDA code, you can use the __align__ attribute, for example __align__(16) (see the CUDA C Programming Guide). Outside of CUDA code you will need to use the alignment mechanism provided by your host’s native toolchain.
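A minimal sketch of one way to set this up in a header shared by host and device code. The macros MY_ALIGN and MY_HD and the helper to_my_float4 are hypothetical names introduced for illustration, and the host branch assumes a C++11 compiler so that alignas is available; an older toolchain would need its own attribute there. Since the two types stay distinct, a plain assignment such as f4 = c4 still will not compile, so the helper copies member by member.

```cpp
// Pick the alignment and function qualifiers depending on who compiles the header.
#if defined(__CUDACC__)
#define MY_ALIGN(n) __align__(n)          // CUDA's alignment attribute
#define MY_HD __host__ __device__         // callable from host and device
#else
#define MY_ALIGN(n) alignas(n)            // assumption: C++11 host compiler
#define MY_HD
#endif

// float3 carries no extra alignment, so a plain struct matches it.
struct my_float3 { float x, y, z; };

// float4 is 16-byte aligned, so request the same alignment here.
struct MY_ALIGN(16) my_float4 { float x, y, z, w; };

// Member-wise conversion from any type that has x, y, z, w members
// (for example CUDA's float4 when compiled as CUDA code).
template <typename V>
MY_HD constexpr my_float4 to_my_float4(const V& v) {
    return my_float4{v.x, v.y, v.z, v.w};
}

static_assert(alignof(my_float4) == 16, "my_float4 must be 16-byte aligned");
static_assert(sizeof(my_float4) == 4 * sizeof(float), "no padding expected");
```

In device code, the assignment from the original question would then be written as f4 = to_my_float4(c4); an analogous helper could be added for my_float3.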