More data movement/conversion functions


I know the current CUDA API provides the __int_as_float and __float_as_int functions. However, I would like to convert to and from more data types, such as short2 and unsigned int, and even vector types. At the PTX level, these "as" operators are implemented with the mov.b32 instruction, and from my reading of the PTX manual, I think the mov instruction is capable of moving from any type to any other type. So it should be possible to implement something like __uint_as_float, __float_as_uint, …although it would probably need to be split into two mov instructions for __short2_as_int, __int_as_short2, __short2_as_float…
Since we can't write inline PTX, I think these kinds of functions would be handy for anyone who wants to do tricks with data types. Any suggestions? Is there an alternative way to do this?


By the way, I tried to implement this with a union, but it ends up using local memory (a store and a load to perform each conversion), which is bad for performance.

I hope I can either write inline PTX assembly or get those functions from NVIDIA. I think inline PTX is unlikely, though, because of the design of device emulation (supporting it would be really complicated), so it may be better to have conversion functions that abstract the operation.

Can’t you just use explicit casts for this?

How do I do that without using local memory?

If I write this:

uint a = 1234567;

short2 b = *((short2*)&a);

then NVCC forces the conversion through local memory (it stores the value as a uint and loads it back as a short2).

Could you show me how to do an any-type-to-any-type conversion (as long as the types are the same size), such as short2 to uint, ushort2 to float, and so on?
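
For reference, one common way to express a same-size bitwise conversion is a small memcpy-based helper. The following is only a sketch under assumptions: the name bit_cast_sketch and the HD macro are made up for illustration and are not part of the CUDA API, and whether NVCC keeps the copy in registers or falls back to local memory would have to be checked in the generated PTX.

```cpp
#include <string.h>   // memcpy
#include <stdint.h>   // uint32_t, int32_t

// Hypothetical helper macro: expands to CUDA qualifiers when compiled
// by nvcc and to nothing in a plain C++ build.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper (not a CUDA API function): copies the raw bytes
// of `src` into a value of type To. The bits are unchanged; only the
// type used to interpret them differs.
template <typename To, typename From>
HD inline To bit_cast_sketch(const From& src) {
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination types must have the same size");
    To dst;
    memcpy(&dst, &src, sizeof(To));
    return dst;
}
```

With this helper, bit_cast_sketch<uint32_t>(1.0f) yields the IEEE 754 bit pattern 0x3F800000. Under nvcc the same call shape should apply to the built-in vector types, for example bit_cast_sketch<short2>(a) for a 32-bit a, since the only requirement is that the two sizes match.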

I tried casting directly between short2 and uint, with no success. I think a bitwise shift operation is slower than the cvt instruction in PTX. What do you think? What is the correct or fastest way to do a bitwise conversion between size-compatible data types?
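
For comparison, the shift-based route mentioned above can be sketched as follows. This is an illustration only: Short2 is a hypothetical stand-in for CUDA's short2 so that the code builds without CUDA headers, pack_short2 and unpack_short2 are made-up names, and the layout (x in the low 16 bits) is an assumption chosen to match what the pointer cast gives on a little-endian target. It does not settle whether shifts are faster or slower than the alternatives.

```cpp
#include <stdint.h>   // int16_t, uint16_t, uint32_t

// Hypothetical helper macro: CUDA qualifiers under nvcc, nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for CUDA's short2 (two 16-bit signed components).
struct Short2 {
    int16_t x;
    int16_t y;
};

// Pack two 16-bit values into one 32-bit word using only shifts and ORs.
// Assumed layout: x occupies bits 0..15, y occupies bits 16..31.
HD inline uint32_t pack_short2(Short2 v) {
    return (uint32_t)(uint16_t)v.x | ((uint32_t)(uint16_t)v.y << 16);
}

// Split a 32-bit word back into its two 16-bit halves.
HD inline Short2 unpack_short2(uint32_t u) {
    Short2 v;
    v.x = (int16_t)(uint16_t)(u & 0xFFFFu);  // low half
    v.y = (int16_t)(uint16_t)(u >> 16);      // high half
    return v;
}
```

For the value from the example above, unpack_short2(1234567) gives x = -10617 and y = 18, because 1234567 is 0x0012D687 and the low half 0xD687 is negative when read as a signed 16-bit value. Because everything stays in integer arithmetic, nothing here takes the address of a variable.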