swizzling? float4 arithmetic support?

I’m not sure if this is a bug or just how the developers intend things to work, but inside a kernel I would expect the following to work (and I would certainly like it to):

float4 a, b;

//initialize a,b

a += b;

However, the compiler gives the following error:

error: no operator "+=" matches these operands

        operand types are: float4 += float4

Do you really intend that we should have to write

a.x += b.x;

a.y += b.y; 

...

?

It would also be nice if swizzling worked.

i.e.

float4 a;

a.xy = make_float2(1, 2);

gives a compile-time error saying that a has no member xy.

Or something like

a = b.xyxy;

also fails the same way (the compiler reports that b has no member xyxy).

You’re correct: we support vector data types (float4, etc.) for efficient memory access, but currently no operations are defined on them.

Note that on a scalar architecture like G80 there’s no performance advantage to using vector types for calculation, but I agree they are convenient, especially if you’re porting code from shaders.

We’ve considered providing a header that defines the standard operators similar to Cg (although probably not swizzling). Would this be enough?

Note that if you don’t want to wait for us to provide one, you can write your own custom operators. We aren’t officially supporting C++ yet, so we can’t guarantee anything, but I have used operator overloading successfully for cases like this. You would just need to write some code like this:

__device__ __host__ float4
operator+(float4 lhs, float4 rhs)
{
    // Build the result component-wise, including the w component.
    float4 res = { lhs.x + rhs.x, lhs.y + rhs.y, lhs.z + rhs.z, lhs.w + rhs.w };
    return res;
}

(I didn’t test this code, so I might have some mistakes, but that’s the basic idea.)

Note how I declared the function as both device and host. This causes the compiler to create two versions of the function: one that can be called from host code, and one that can be called from device code (from a __global__ or another __device__ function).

Mark

I hope operator overloading is working OK; otherwise I would be worried about the min/max implementations in $(CUDA_SDK)/include/math_functions.h, for example :whistling:

Peter

I think such a header would be very useful, especially for people with some kind of graphics background moving to CUDA. I’ll try out Mark’s suggestion and at least add support for the operators I need right now.

Thanks,

Erich