union and local memory

Please help. nvcc places these structures in local memory if I use a union.

struct __align__(16) my_vec
{
  union
  {
    float M[4];
    struct
    {
      float x,y,z,w;
    };
  };
};

Is there any special option to switch off this behaviour?

I want my vectors to be placed in registers.

You cannot use indexing with registers, nor padding. Turn off the padding and don’t use an array of floats. As an alternative you can keep the array and place your struct in shared memory, i.e. __shared__ my_vec vec0;.
Whatever you do, don’t use padding at this memory level; this is not SSE code.

Yes, CUDA will use local memory to represent unions. This is because registers don’t support re-interpretation or indexing.

Just use .x, .y, .z and .w and you should be fine. The alignment is no problem; it is simply ignored for registers.
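To make that suggestion concrete, here is a minimal sketch of the struct with named members only (no union, no array). The names my_vec4, sum4 and sum_kernel are made up for illustration and are not from the original post; whether the values really end up in registers is still the compiler's decision.

```cuda
// Hypothetical replacement for my_vec: named members only, no union, no array.
struct my_vec4 {
    float x, y, z, w;
};

// Every access names a member directly, so no dynamic indexing is involved.
__host__ __device__ inline float sum4(const my_vec4 &v) {
    return v.x + v.y + v.z + v.w;
}

// Each thread copies one vector into a local variable and sums its components.
__global__ void sum_kernel(const my_vec4 *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        my_vec4 v = in[i];   // local copy; a candidate for registers
        out[i] = sum4(v);
    }
}
```

The helper is marked __host__ __device__ only so that it can also be checked on the CPU; the kernel itself would be launched as usual, e.g. sum_kernel<<<blocks, threads>>>(d_in, d_out, n).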

ok, thanks for reply

Hold on, what?!?!?! This means that I practically can’t use short arrays at all? How on earth is one supposed to program on this??? Does everything have to be typed out manually every time? Shouldn’t these kinds of things be something that the compiler has to worry about? And shouldn’t this be one of the reasons that the loop-unrolling pragma exists?

I mean this seems exactly the kind of thing that programming languages were created for, no? To abstract away these stupidities. I guess in this sense the real culprit is C being too low-level, but I think this particular problem could be optimized by the compiler…

Hmm, the Programming Guide (v 1.1) says the following:

"An automatic variable declared in device code without any of these qualifiers generally resides in a register. However in some cases the compiler might choose to place it in local memory. This is often the case for large structures or arrays that would consume too much register space, and arrays for which the compiler cannot determine that they are indexed with constant quantities."

This would seem to suggest that one can use arrays, if the compiler is smart enough in the indexing - so I guess only the disassembly will tell the final truth… :)

Benchmarking and --ptxas-options=-v will tell the truth also :)
If you’re using an array with constant indices that can be determined at compile time, then the compiler will probably use registers.
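As a rough sketch of the constant-index case: a short array whose loops have a fixed trip count and are unrolled, so every index is known at compile time. The names dot4 and dot_kernel are hypothetical, and there is no guarantee the compiler keeps the arrays in registers; the ptxas report is the way to check.

```cuda
// Hypothetical helper: short local arrays indexed only by an unrolled loop counter.
__host__ __device__ inline float dot4(const float *a, const float *b) {
    float va[4], vb[4];

    // Fixed trip count of 4: after unrolling, every index is a constant.
    #pragma unroll
    for (int k = 0; k < 4; ++k) {
        va[k] = a[k];
        vb[k] = b[k];
    }

    float acc = 0.0f;
    #pragma unroll
    for (int k = 0; k < 4; ++k) {
        acc += va[k] * vb[k];
    }
    return acc;
}

// Each thread computes the dot product of one pair of 4-float vectors.
__global__ void dot_kernel(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = dot4(a + 4 * i, b + 4 * i);
    }
}
```

Compiling with something like nvcc --ptxas-options=-v vec.cu (the file name is just an example) prints per-kernel resource usage, including the register count and any local memory or spills, which shows whether the arrays stayed out of local memory.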