PTX Assembly in CUDA 4.0

Hello. CUDA 4.0 is very useful, and asm() is a convenient new feature. I would like to request that the full expressiveness of PTX be available through asm() in CUDA 4.0. One specific request in that regard is the ability to specify vector operand types, as with .v2 and .v4.

This seems like a useful enhancement. I would suggest filing an official feature request (RFE = request for enhancement) through the bug reporting mechanism. Meanwhile, you could handle vector types on a per-component basis, as in the example below.

typedef uint4 my_uint128_t;

// 128-bit addition on four 32-bit words; .x is the least significant word
__device__ my_uint128_t add_uint128 (my_uint128_t addend, my_uint128_t augend)
{
    my_uint128_t res;
    // add.cc sets the carry flag, addc.cc consumes and sets it, addc consumes it
    asm ("add.cc.u32      %0, %4, %8;\n\t"
         "addc.cc.u32     %1, %5, %9;\n\t"
         "addc.cc.u32     %2, %6, %10;\n\t"
         "addc.u32        %3, %7, %11;\n\t"
         : "=r"(res.x), "=r"(res.y), "=r"(res.z), "=r"(res.w)
         : "r"(addend.x), "r"(addend.y), "r"(addend.z), "r"(addend.w),
           "r"(augend.x), "r"(augend.y), "r"(augend.z), "r"(augend.w));
    return res;
}
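
To sanity-check the carry chain in add_uint128, it may help to compare it against a plain host-side reference. The following is only a sketch: the names u128_words and add_uint128_ref are hypothetical (not part of CUDA), and the struct merely stands in for uint4 so the code also builds without the CUDA headers. It assumes, as in the example above, that .x holds the least significant 32-bit word.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for CUDA's uint4 (x = least significant word).
struct u128_words {
    uint32_t x, y, z, w;
};

// Reference 128-bit add. Each step mirrors one instruction of the PTX
// sequence; the carry lives in bit 32 of a 64-bit intermediate sum.
inline u128_words add_uint128_ref(u128_words a, u128_words b)
{
    u128_words res;
    uint64_t sum = (uint64_t)a.x + b.x;             // add.cc.u32
    res.x = (uint32_t)sum;
    sum = (uint64_t)a.y + b.y + (sum >> 32);        // addc.cc.u32
    res.y = (uint32_t)sum;
    sum = (uint64_t)a.z + b.z + (sum >> 32);        // addc.cc.u32
    res.z = (uint32_t)sum;
    sum = (uint64_t)a.w + b.w + (sum >> 32);        // addc.u32, carry-out dropped
    res.w = (uint32_t)sum;
    return res;
}
```

One possible use is to run add_uint128 in a small test kernel, copy the results back, and compare them word by word with add_uint128_ref for inputs that force a carry across every word boundary, such as 0x00000000FFFFFFFFFFFFFFFFFFFFFFFF + 1.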