Converting vector types to arrays


I am working with some legacy code that passes a float4 into the kernel. I need to access one of its coordinates, chosen at runtime: I have a variable currAxis, and if it’s 0 I need point.x, if it’s 1 I need point.y, and so on. What is the fastest way of doing this? Can I count on (float*)&point holding the address of point.x, (float*)&point + 1 holding the address of point.y, etc.? Or do I have to run through a series of if statements?

Eventually I will convert my code to pass in a float[4], but I’d rather get it working first with as few modifications to the original as possible. Also, I am a little puzzled by the original developer’s choice of a float4, because my data is at most three-dimensional. Could anyone think of a good reason why he did it this way, maybe something about alignment?



I handle this in my kernel with an if-block. Not particularly fast or elegant, but it isn’t the bottleneck in my code, so I can afford to be a little wasteful. The pointer trick sounds interesting, though. I’d try that and see how it works.
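For reference, the if-block approach can be folded into a small accessor. This is only a sketch: `getComponent` is a made-up name, and the `#ifndef __CUDACC__` part is a stand-in for the built-in float4 so that the snippet also compiles with an ordinary C++ compiler (under nvcc the real type and qualifiers are used).

```cpp
// Stand-in so the sketch builds without the CUDA toolkit. Under nvcc,
// __CUDACC__ is defined and the built-in float4 / qualifiers are used instead.
#ifndef __CUDACC__
struct float4 { float x, y, z, w; };
#define __host__
#define __device__
#endif

// Hypothetical helper: returns the component of v selected at runtime.
// axis 0 -> x, 1 -> y, 2 -> z, any other value -> w.
__host__ __device__ inline float getComponent(const float4& v, int axis)
{
    switch (axis) {
        case 0:  return v.x;
        case 1:  return v.y;
        case 2:  return v.z;
        default: return v.w;
    }
}
```

In a kernel this would be called as `float c = getComponent(point, currAxis);`. A switch this small may well be compiled into selects rather than real branches, though that is worth checking in the generated code rather than assuming.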

Correct, alignment is the likely reason for the float4. Devices with compute capability 1.0 and 1.1 (i.e. pre-GTX 200 series) cannot load a 96-bit data type in a coalesced way. They can only coalesce reads of 32-, 64- and 128-bit types. Compute capability 1.2 and later does not have this problem, and even on the older cards there is a trick for reading float3 coalesced by using shared memory as a buffer.

Yup, it works!
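A minimal sketch of the pointer trick, for anyone who finds this thread later. `componentAt` is a hypothetical name, and the stand-in struct is only there so the snippet compiles outside nvcc. The static_asserts spell out the layout assumption the trick depends on, so the build fails instead of silently reading the wrong float.

```cpp
#include <cstddef>  // offsetof

// Stand-in so the sketch builds without the CUDA toolkit. Under nvcc,
// __CUDACC__ is defined and the built-in float4 / qualifiers are used instead.
#ifndef __CUDACC__
struct float4 { float x, y, z, w; };
#define __host__
#define __device__
#endif

// The trick only makes sense if the four members are packed back to back.
static_assert(sizeof(float4) == 4 * sizeof(float), "unexpected padding in float4");
static_assert(offsetof(float4, y) == 1 * sizeof(float), "y is not the second float");
static_assert(offsetof(float4, z) == 2 * sizeof(float), "z is not the third float");
static_assert(offsetof(float4, w) == 3 * sizeof(float), "w is not the fourth float");

// Hypothetical helper: view the float4 as four consecutive floats.
// The cast to float* has to happen before the indexing, because adding 1
// to a float4* would step over the whole 16-byte vector.
__host__ __device__ inline float componentAt(const float4& v, int axis)
{
    return reinterpret_cast<const float*>(&v)[axis];  // axis must be in 0..3
}
```

Strictly speaking the language standard does not bless indexing past the first member like this, so treat it as something that works in practice rather than a guarantee. On the GPU, indexing with a runtime value can also make the compiler keep the vector in local memory instead of registers, so it may be worth timing this against the if/switch version.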

So, alignment means that if I have an array of float3 and treat it as a flat array of floats, I cannot assume the x-coordinate of the second vector comes right after the z-coordinate of the first, because there would be spacing between one float3 and the next?

What he means by coalesced access is that to read a float3 the device must issue more than one load instruction (a float3 is only 4-byte aligned, so it cannot be fetched with a single wide load), whereas for a float4 the device can issue one 128-bit load instruction, which is faster even though it moves more data.

I’m not sure about the alignment, though; I can’t remember whether the compiler automatically upgrades float3 to float4 alignment. I think most people around here just use float4 because of the coalesced access.
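On the alignment question, a small layout check may help. As far as I know, the CUDA headers declare float3 as a plain struct of three floats and float4 as a 16-byte-aligned struct of four floats. The sketch below mirrors that with stand-in types (`Float3`, `Float4`, `flatIndex3` and `readFlat` are all made-up names, not CUDA identifiers) and shows what it implies for an array of float3 viewed as flat floats.

```cpp
#include <cstddef>  // std::size_t

// Stand-ins mirroring the assumed CUDA declarations (not the real types).
struct Float3 { float x, y, z; };                  // no alignment specifier
struct alignas(16) Float4 { float x, y, z, w; };   // 16-byte aligned

// A Float3 stays 12 bytes with 4-byte alignment, so an array of them is
// tightly packed. A Float4 is 16 bytes and 16-byte aligned.
static_assert(sizeof(Float3) == 12 && alignof(Float3) == 4, "Float3 is padded");
static_assert(sizeof(Float4) == 16 && alignof(Float4) == 16, "Float4 layout differs");

// Flat float index of component `axis` (0..2) of vector `i` in a Float3 array.
constexpr std::size_t flatIndex3(std::size_t i, std::size_t axis)
{
    return 3 * i + axis;
}

// Reads one component by viewing the Float3 array as consecutive floats.
inline float readFlat(const Float3* points, std::size_t i, std::size_t axis)
{
    return reinterpret_cast<const float*>(points)[flatIndex3(i, axis)];
}
```

Under these assumptions there is no gap between consecutive float3 elements: the x-coordinate of the second vector is flat element 3, directly after the z-coordinate of the first. The cost of float3 is in how it is loaded, not in hidden spacing.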

The C99 standard states that

(C++ is a bit more complicated but I remember it to be essentially the same.)

What you have to be aware of is that the compiler is free to insert padding within the struct.

So what you are doing will probably work for simple structs whose members all have the same size, but you could run into trouble with more complex structs.
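To make the padding point concrete, here is a sketch comparing a struct of equal-sized members with a mixed one. Both struct names and their members are invented for illustration, and the exact amount of padding depends on the compiler and target.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Equal-sized members: compilers normally have no reason to insert padding.
struct Uniform { float a, b, c, d; };

// Mixed sizes: the float usually has to start on a 4-byte boundary, so the
// compiler typically inserts padding bytes after the single char.
struct Mixed { char tag; float value; double weight; };

// Padding = total struct size minus the sum of the member sizes.
constexpr std::size_t uniformPadding =
    sizeof(Uniform) - 4 * sizeof(float);
constexpr std::size_t mixedPadding =
    sizeof(Mixed) - (sizeof(char) + sizeof(float) + sizeof(double));

// Where `value` really starts, as opposed to right after `tag`.
constexpr std::size_t valueOffset = offsetof(Mixed, value);
```

With `Uniform`, stepping a float pointer from one member to the next lands where you expect. With `Mixed`, a pointer to `tag` advanced by `sizeof(char)` would land inside the padding rather than on `value`, which is the kind of trouble meant above.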