Hello. CUDA 4.0 is very useful, and asm() is a convenient new feature. I would like to request that the full expressiveness of PTX be made available through asm() in CUDA 4.0. One specific request in that regard is the ability to specify vector operand types, as PTX does with .v2 and .v4.
This seems like a useful enhancement. I would suggest filing an official feature request (RFE = request for enhancement) through the bug reporting mechanism. Meanwhile, you could handle vector types on a per-component basis, as in the example below.
typedef uint4 my_uint128_t;

// Add two 128-bit integers held as four 32-bit limbs (x = least significant).
__device__ my_uint128_t add_uint128 (my_uint128_t addend, my_uint128_t augend)
{
    my_uint128_t res;
    asm ("add.cc.u32      %0, %4, %8;\n\t"   // low limb, writes the carry flag
         "addc.cc.u32     %1, %5, %9;\n\t"   // add with carry-in, writes carry-out
         "addc.cc.u32     %2, %6, %10;\n\t"
         "addc.u32        %3, %7, %11;\n\t"  // high limb, final carry-out is dropped
         : "=r"(res.x), "=r"(res.y), "=r"(res.z), "=r"(res.w)
         : "r"(addend.x), "r"(addend.y), "r"(addend.z), "r"(addend.w),
           "r"(augend.x), "r"(augend.y), "r"(augend.z), "r"(augend.w));
    return res;
}
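
If you want something to check the device function against, a host-side reference along the following lines might help. This is only a sketch: the names limbs128 and add_uint128_ref are made up for illustration, and limbs128 is a plain stand-in for uint4 so that the code builds without the CUDA headers. It assumes the same limb order as above, with x as the least significant 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's uint4; x is the least significant limb.
struct limbs128 {
    uint32_t x, y, z, w;
};

// Reference 128-bit add: ripple the carry through the four limbs,
// mirroring the add.cc / addc.cc / addc chain in the PTX version.
inline limbs128 add_uint128_ref(limbs128 a, limbs128 b)
{
    limbs128 r;
    uint64_t t;
    t = (uint64_t)a.x + b.x;              r.x = (uint32_t)t;  // add.cc
    t = (uint64_t)a.y + b.y + (t >> 32);  r.y = (uint32_t)t;  // addc.cc
    t = (uint64_t)a.z + b.z + (t >> 32);  r.z = (uint32_t)t;  // addc.cc
    t = (uint64_t)a.w + b.w + (t >> 32);  r.w = (uint32_t)t;  // addc, carry-out dropped
    return r;
}

// Small sanity check: a carry out of the two low limbs must reach z.
inline void add_uint128_ref_self_test()
{
    limbs128 a = {0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0u};
    limbs128 b = {1u, 0u, 0u, 0u};
    limbs128 r = add_uint128_ref(a, b);
    assert(r.x == 0u && r.y == 0u && r.z == 1u && r.w == 0u);
}
```

Copying the result of a kernel that calls add_uint128 back to the host and comparing it limb by limb with add_uint128_ref, on inputs that force carries, should catch operand-numbering mistakes in the asm string.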