Using float4

I am expanding 1D code to 3D, so making use of float4 would be useful for coalescing.

But if I only wish to manipulate the x, y and z components of a float4, can I just refer to them, or do I also need to refer to the w component even though it is redundant?

For example, what happens with the following code?

float4 pos;

pos.x+=2;

pos.y+=3;

pos.z+=1;

Would this be coalesced? If not, would I need to add a statement referring to pos.w for coalescing?

And for the compiler to treat the three (or four) statements as coalesced, do they need to be sequential, as above, or can other statements be inserted in between, such as

float4 pos;

int c;

pos.x+=2;

c+=10;

pos.y+=3;

pos.z+=1;

Also, the compiler tells me that operations such as adding two float4s cannot be done, such as

float4 pos1, pos2, pos3;

pos3 = pos1 + pos2;

and that if adding two float4s the addition must be written explicitly, such as

float4 pos1, pos2, pos3;

pos3.x = pos1.x + pos2.x;

pos3.y = pos1.y + pos2.y;

pos3.z = pos1.z + pos2.z;

If this is true, when will float4 addition be available? I am using a C870.

You can also use the “volatile” keyword to force the compiler to read all the components even though you only use some of them, e.g.:

volatile float4 pos = posArray[i];

pos.x += 2;

It shouldn’t matter which order you access the components in.
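As a rough sketch of the pattern (the kernel name shiftPositions and the parameter n are made up for illustration; posArray is the array from the snippet above), each thread can copy its whole float4 element into a local variable, update x, y and z in any order, and store the whole element back. Because the full element is stored again, w is used as well, so the sketch does not rely on volatile here. Whether the load actually becomes a single 16-byte access depends on the compiler version, so it is worth checking the generated PTX.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the name and the parameter n are illustrative only.
__global__ void shiftPositions(float4 *posArray, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Copy the whole 16-byte element into a local variable.
    float4 pos = posArray[i];

    // These updates act on the local copy only, so their order
    // (and any unrelated statements between them) does not change
    // how global memory is accessed.
    pos.z += 1.0f;
    pos.x += 2.0f;
    pos.y += 3.0f;

    // Store the whole element back; pos.w passes through unchanged.
    posArray[i] = pos;
}
```

Consecutive threads touch consecutive float4 elements, which is the access pattern coalescing needs.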

CUDA doesn’t provide operations on the vector types out of the box, but you can define them yourself or use the “cutil_math.h” header included in the SDK.
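If you go the define-it-yourself route, a minimal sketch could look like the following. These overloads are my own illustration, not a copy of the SDK header. If you include the SDK header instead, leave them out, since it supplies its own float4 operators and the definitions would clash.

```cuda
#include <cuda_runtime.h>

// Component-wise addition of two float4 values (illustrative overload).
__host__ __device__ inline float4 operator+(float4 a, float4 b)
{
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

// Compound assignment built on top of operator+.
__host__ __device__ inline float4 &operator+=(float4 &a, float4 b)
{
    a = a + b;
    return a;
}
```

With these in scope, pos3 = pos1 + pos2; compiles in both host and device code.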

Would this cause every use of pos to result in a memory access? For example, will

volatile float4 pos = posArray[i];

pos.x = 2;

pos.y = 3;

pos.z = 4;

be 3 (or maybe even 4) coalesced memory accesses?

Edit: changed example to make it simpler

Upon further reflection, I realized that this question is dumb: I got confused by the volatile keyword. Since pos and posArray are separate variables, writing to pos will have no effect on posArray. If pos is in registers, then writing to pos just writes to those registers. Making pos volatile just means that those writes cannot be optimized out (the value in the register has to be modified), but this does not involve memory.