I think this feature is pretty powerful. It is not equivalent to what we can do with C/C++ inline assembly, where variable names can be referenced directly, but the syntax is pretty clear (just like printf).
The feature has existed for quite a while (at least v2.3 can do the trick). Using the OptiX source as a starting point, one can define a PTX function and call it from CUDA source as follows:
__noinline__ __device__ uint _something(uint x, float y, double z)
{
    // PTX version of: return z + y + x;
    uint ret;
    __asm("\
    .reg .u32 %rx, %ry, %rz;\n\
    mov.u32 %rx, %1;\n\
    cvt.rzi.u32.f32 %ry, %2;\n\
    cvt.rzi.u32.f64 %rz, %3;\n\
    add.u32 %rx, %rx, %ry;\n\
    add.u32 %rx, %rx, %rz;\n\
    mov.u32 %0, %rx;" : "=r"(ret) : "r"(x), "f"(y), "d"(z) : );
    return ret;
}
__device__ uint something(uint x, float y, double z)
{
    uint ret;
    // call the function above through its mangled name
    asm ("call (%0), _Z10_somethingjfd, (%1,%2,%3);" : "=r"(ret) : "r"(x), "f"(y), "d"(z) : );
    return ret;
    _something(0, 0.0f, 0.0); // dummy call (placed after the return, so never executed) to avoid compiler optimization
}
This is just a dummy example, but I think it's enough to show the possibilities of PTX inlining.
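For anyone who wants something smaller to experiment with, here is a minimal sketch of the same operand syntax. It assumes a reasonably recent nvcc; the names ptx_add, ptx_cube, demo_kernel and run_ptx_demo are made up for illustration and are not from the OptiX source. The second function declares a scratch register inside a braced PTX scope, which avoids needing a '%' in the register name.

```cuda
#include <cuda_runtime.h>

// Adds two unsigned ints with one PTX instruction.
// %0 is the output operand; %1 and %2 are the inputs, in list order.
__device__ unsigned int ptx_add(unsigned int a, unsigned int b)
{
    unsigned int ret;
    asm("add.u32 %0, %1, %2;" : "=r"(ret) : "r"(a), "r"(b));
    return ret;
}

// Computes x*x*x using a scratch register declared inside a PTX scope.
__device__ unsigned int ptx_cube(unsigned int x)
{
    unsigned int ret;
    asm("{\n\t"
        ".reg .u32 t1;\n\t"
        "mul.lo.u32 t1, %1, %1;\n\t"
        "mul.lo.u32 %0, t1, %1;\n\t"
        "}"
        : "=r"(ret) : "r"(x));
    return ret;
}

// Writes ptx_add(a, b) to out[0] and ptx_cube(a) to out[1].
__global__ void demo_kernel(unsigned int a, unsigned int b, unsigned int *out)
{
    out[0] = ptx_add(a, b);
    out[1] = ptx_cube(a);
}

// Hypothetical host helper: launches the kernel once and copies both
// results back into the caller's variables (error checking omitted).
void run_ptx_demo(unsigned int a, unsigned int b,
                  unsigned int *sum, unsigned int *cube)
{
    unsigned int *d_out = 0;
    unsigned int h_out[2] = {0, 0};
    cudaMalloc((void **)&d_out, 2 * sizeof(unsigned int));
    demo_kernel<<<1, 1>>>(a, b, d_out);
    cudaMemcpy(h_out, d_out, 2 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    *sum = h_out[0];
    *cube = h_out[1];
}
```

Calling run_ptx_demo(2, 3, &sum, &cube) from main on a machine with a CUDA device should leave the sum of the two inputs in sum and the cube of the first input in cube.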
Each parameter is specified with a constraint letter that corresponds to its type. Here's what I've found out so far:
d => double
f => float
h => short
l => long
m => ??
n => ??
r => int
s => ??
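To make the letters I could identify concrete, here is a rough sketch that uses each of h, r, l, f and d once, with the register width I would expect for it (16-bit, 32-bit, 64-bit integer, float, double). The Sums struct and the function names are hypothetical, and I use unsigned long long for the "l" case on the assumption that it maps to a 64-bit register regardless of how wide long is on the host.

```cuda
#include <cuda_runtime.h>

// One tiny wrapper per constraint letter; each performs a single PTX add.
__device__ unsigned short add_u16(unsigned short a, unsigned short b)
{
    unsigned short r;
    asm("add.u16 %0, %1, %2;" : "=h"(r) : "h"(a), "h"(b)); // h: 16-bit integer
    return r;
}
__device__ unsigned int add_u32(unsigned int a, unsigned int b)
{
    unsigned int r;
    asm("add.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); // r: 32-bit integer
    return r;
}
__device__ unsigned long long add_u64(unsigned long long a, unsigned long long b)
{
    unsigned long long r;
    asm("add.u64 %0, %1, %2;" : "=l"(r) : "l"(a), "l"(b)); // l: 64-bit integer
    return r;
}
__device__ float add_f32(float a, float b)
{
    float r;
    asm("add.f32 %0, %1, %2;" : "=f"(r) : "f"(a), "f"(b)); // f: float
    return r;
}
__device__ double add_f64(double a, double b)
{
    double r;
    asm("add.f64 %0, %1, %2;" : "=d"(r) : "d"(a), "d"(b)); // d: double
    return r;
}

// Hypothetical result record, one field per constraint letter.
struct Sums {
    unsigned short h;
    unsigned int r;
    unsigned long long l;
    float f;
    double d;
};

__global__ void sums_kernel(Sums *out)
{
    out->h = add_u16(1000, 234);
    out->r = add_u32(100000u, 23456u);
    out->l = add_u64(5000000000ULL, 1ULL);
    out->f = add_f32(1.5f, 2.25f);
    out->d = add_f64(1.5, 2.25);
}

// Hypothetical host helper: runs the kernel once and returns the results
// (error checking omitted).
Sums run_sums()
{
    Sums h_out = {0, 0, 0, 0.0f, 0.0};
    Sums *d_out = 0;
    cudaMalloc((void **)&d_out, sizeof(Sums));
    sums_kernel<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(Sums), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

The operand type in each PTX instruction (.u16, .u32, .u64, .f32, .f64) has to agree with the register class picked by the constraint letter, so a mismatch such as "r" with add.u64 would be a good first thing to check if ptxas complains.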
Also, as far as I can tell, whenever you use the asm parameter list trick you must specify an output for the asm code (just like the “=r” in the example above).
Of course, I think it would be great to be able to reference variables directly within the PTX code, so we shouldn't rely on this trick too extensively. But for now it is convenient enough for us to do some PTX/CUDA hybrid compilation, which is AWESOME. I hope CUDA 3.0 keeps this feature or improves it further!