Nomenclature may differ between programmers. When bits are simply moved between variables of different types, I would call that re-interpretation, not conversion. So uint32_t x; __int_as_float(x) is a re-interpretation, while uint32_t x; float(x) is a conversion. The (not very useful) worked example at the end of this post uses different flavors of mov.
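To make the distinction concrete, here is a minimal sketch that applies both operations to the same 32-bit pattern. The kernel name, the BothViews struct, and the host helper both_views are made up for illustration, and error checking of the CUDA runtime calls is omitted. It uses __uint_as_float, the unsigned counterpart of __int_as_float, so that no implicit signed conversion is involved.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical container for the two results (not from the thread).
struct BothViews { float reinterpreted; float converted; };

__global__ void reinterpret_vs_convert(uint32_t x, BothViews *out)
{
    // Re-interpretation: the 32 bits are left untouched, only the type changes.
    out->reinterpreted = __uint_as_float(x);
    // Conversion: the numeric value is preserved (subject to rounding), the bits change.
    out->converted = static_cast<float>(x);
}

// Hypothetical host helper: runs the kernel once and copies the result back.
// Error checking is left out to keep the sketch short.
BothViews both_views(uint32_t x)
{
    BothViews h = {0.0f, 0.0f};
    BothViews *d = nullptr;
    cudaMalloc(&d, sizeof h);
    reinterpret_vs_convert<<<1, 1>>>(x, d);
    cudaMemcpy(&h, d, sizeof h, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

With x = 0x3f800000, the re-interpreted result is 1.0f, because that bit pattern is the IEEE-754 encoding of 1.0f, while the converted result is 1065353216.0f, the integer value of the same pattern.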
The problem in the thread-starting post is with the binding, i.e. the "=h". Binding a symbolic operand in inline PTX to a CUDA HLL variable requires that variable to be of scalar type, but ret is of vector type uchar2. I am reasonably sure this is a limitation of the inline-assembly-operand binding mechanism common to clang, gcc, etc. Robert Crovella’s first post in the thread already showed how to get around this limitation.
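I am not reproducing that post here, but one plausible way around the limitation is to bind the "=h" operand to a scalar temporary and fill the vector-typed variable in ordinary C++ afterwards. The sketch below assumes this approach. The names split_u16, split_kernel, and split_on_device are hypothetical, the mov.b16 merely stands in for whatever PTX the original code executed, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

__device__ uchar2 split_u16(unsigned short v)
{
    unsigned short tmp;  // scalar temporary: something "=h" is able to bind to
    asm ("mov.b16 %0, %1;" : "=h"(tmp) : "h"(v));
    uchar2 ret;          // vector type: filled outside the asm statement
    ret.x = static_cast<unsigned char>(tmp & 0xff);  // low byte
    ret.y = static_cast<unsigned char>(tmp >> 8);    // high byte
    return ret;
}

__global__ void split_kernel(unsigned short v, uchar2 *out)
{
    *out = split_u16(v);
}

// Hypothetical host helper: runs the kernel once and copies the result back.
uchar2 split_on_device(unsigned short v)
{
    uchar2 h = make_uchar2(0, 0);
    uchar2 *d = nullptr;
    cudaMalloc(&d, sizeof h);
    split_kernel<<<1, 1>>>(v, d);
    cudaMemcpy(&h, d, sizeof h, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

Binding directly to the scalar members of a vector variable, e.g. "=l"(res.x), also works, since each member is itself a scalar.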
__device__ ulonglong2 foo (unsigned long long a, unsigned long long b)
{
    ulonglong2 res;
    asm ("{\n\t"
         ".reg .u32 alo, ahi, blo, bhi, r0, r1, r2, r3;\n\t"
         ".reg .u16 a0, a1, a2, a3, b0, b1, b2, b3;\n\t"
         // unpack each 64-bit input into two 32-bit halves (low half first)
         "mov.b64 {alo,ahi}, %2;\n\t"
         "mov.b64 {blo,bhi}, %3;\n\t"
         // unpack each 32-bit half into two 16-bit lanes (low lane first)
         "mov.b32 {a0,a1}, alo;\n\t"
         "mov.b32 {a2,a3}, ahi;\n\t"
         "mov.b32 {b0,b1}, blo;\n\t"
         "mov.b32 {b2,b3}, bhi;\n\t"
         // lane-wise 16-bit x 16-bit -> 32-bit products
         "mul.wide.u16 r0, a0, b0;\n\t"
         "mul.wide.u16 r1, a1, b1;\n\t"
         "mul.wide.u16 r2, a2, b2;\n\t"
         "mul.wide.u16 r3, a3, b3;\n\t"
         // pack pairs of 32-bit products into the two 64-bit outputs
         "mov.b64 %0, {r0,r1};\n\t"
         "mov.b64 %1, {r2,r3};\n\t"
         "}"
         : "=l"(res.x), "=l"(res.y)   // outputs bind to scalar members, not to res itself
         : "l"(a), "l"(b));
    return res;
}
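For checking what foo computes, a plain C++ reference can be handy. The sketch below is my reading of the PTX above, assuming that the first element of a brace-enclosed mov operand corresponds to the least significant bits. The name foo_ref is made up for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical reference for foo: multiplies the four 16-bit lanes of a and b
// pairwise and packs the four 32-bit products into two 64-bit words.
__host__ __device__ ulonglong2 foo_ref(unsigned long long a, unsigned long long b)
{
    unsigned long long p[4];
    for (int i = 0; i < 4; i++) {
        unsigned int ai = (unsigned int)((a >> (16 * i)) & 0xffff);  // lane i of a
        unsigned int bi = (unsigned int)((b >> (16 * i)) & 0xffff);  // lane i of b
        p[i] = ai * bi;  // at most 0xffff * 0xffff = 0xfffe0001, fits in 32 bits
    }
    ulonglong2 res;
    res.x = p[0] | (p[1] << 32);  // products of lanes 0 and 1
    res.y = p[2] | (p[3] << 32);  // products of lanes 2 and 3
    return res;
}
```

For example, with a = 0x0004000300020001 and b = 0x0008000700060005 the lane products are 5, 12, 21, and 32, so the reference returns x = 0x0000000C00000005 and y = 0x0000002000000015. Comparing foo against foo_ref inside a small kernel would confirm that the operand ordering assumed here matches the hardware.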