How to add/subtract long numbers with PTX

I don't know how to subtract properly in PTX when I have long numbers.

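For example, to compute

abcdefghil - mnopqrst

(each letter is one 32-bit unsigned int chunk, with 'a' and 'm' the most significant), I split the operation in two. First I compute

fghil - pqrst

and save the borrow in “borrow” (it's an int, while all the other variables are unsigned int). Then I apply the borrow to abcde and do the rest, that is

abcde - mno

Sometimes the result is correct and sometimes it is not, although it differs only slightly from the correct one.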
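The decomposition itself is arithmetically sound, provided exactly one borrow bit (0 or 1) crosses from the low half into the high half. The host-side C++ model below is a hypothetical reference and not the poster's code. It stores words least significant first, so index 0 holds 'l' (and 't') and index 9 holds 'a'. The helpers sub_words, sub_split and check_split are illustrative names. The model checks that the two-part computation matches a single 10-word pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reference model. Subtracts b plus an incoming borrow (0 or 1)
// from a over n words (index 0 = least significant) and returns the outgoing
// borrow (0 or 1), the way a sub.cc/subc.cc chain propagates it.
uint32_t sub_words(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                   std::size_t n, uint32_t borrow_in) {
    uint32_t borrow = borrow_in;
    for (std::size_t k = 0; k < n; ++k) {
        uint64_t diff = (uint64_t)a[k] - b[k] - borrow;
        dst[k] = (uint32_t)diff;
        borrow = (uint32_t)(diff >> 63);   // 1 iff this word subtraction wrapped
    }
    return borrow;
}

// The question's split: low five words first, then the high five words,
// with the borrow (0 or 1) passed from the first part to the second.
void sub_split(uint32_t r[10], const uint32_t a[10], const uint32_t b[10]) {
    uint32_t borrow = sub_words(r, a, b, 5, 0);   // fghil - pqrst
    sub_words(r + 5, a + 5, b + 5, 5, borrow);    // abcde - 00mno - borrow
}

// Asserts that the split result equals a single 10-word subtraction.
void check_split(const uint32_t a[10], const uint32_t b[10]) {
    uint32_t split[10], whole[10];
    sub_split(split, a, b);
    sub_words(whole, a, b, 10, 0);
    for (int k = 0; k < 10; ++k) assert(split[k] == whole[k]);
}
```

The point to note is that the value handed to the second part must be a borrow of 0 or 1, or an equivalent carry flag. Any other encoding changes the result.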

// first part: fghil - pqrst, then capture the borrow
asm("{\n"
    "sub.cc.u32   %5, %5, %10;\n\t"
    "subc.cc.u32  %4, %4, %9;\n\t"
    "subc.cc.u32  %3, %3, %8;\n\t"
    "subc.cc.u32  %2, %2, %7;\n\t"
    "subc.cc.u32  %1, %1, %6;\n\t"
    "subc.u32     %0, 0, 0;\n\t"
    "}"
    : "=r"(borrow), "+r"(f), "+r"(g), "+r"(h), "+r"(i), "+r"(l)
    : "r"(p), "r"(q), "r"(r), "r"(s), "r"(t));

// apply the saved borrow to abcde
asm("{\n"
    "sub.cc.u32   %4, %4, %5;\n\t"
    "subc.cc.u32  %3, %3, 0;\n\t"
    "subc.cc.u32  %2, %2, 0;\n\t"
    "subc.cc.u32  %1, %1, 0;\n\t"
    "subc.u32     %0, %0, 0;\n\t"
    "}"
    : "+r"(a), "+r"(b), "+r"(c), "+r"(d), "+r"(e)
    : "r"(borrow));

// second part: abcde - mno
asm("{\n"
    "sub.cc.u32   %4, %4, %7;\n\t"
    "subc.cc.u32  %3, %3, %6;\n\t"
    "subc.cc.u32  %2, %2, %5;\n\t"
    "subc.cc.u32  %1, %1, 0;\n\t"
    "subc.u32     %0, %0, 0;\n\t"
    "}"
    : "+r"(a), "+r"(b), "+r"(c), "+r"(d), "+r"(e)
    : "r"(m), "r"(n), "r"(o));

Is there a particular reason not to process the entire long integer subtraction in a single asm() statement? That would certainly be the easiest way to achieve the desired behavior.

If for some reason you have to break up the computation, the carry needs to be extracted to C level at the end of the first asm() statement and then re-created in the second asm() statement. There should be no need for a third asm() statement. The extraction of the carry flag in the posted code seems correct. One can simply use an unsigned int for the captured value as well, since all we want to know is whether it is zero or nonzero. Note, however, that subc.u32 %0, 0, 0 yields 0 or 0xFFFFFFFF, not 0 or 1. Subtracting that value directly from the next chunk, as the posted code does, therefore subtracts 0xFFFFFFFF instead of 1. I am not in front of a CUDA-capable machine, so I can't try this, but if I am thinking about it correctly, one can re-create the carry flag via

sub.cc.u32  bit_bucket, 0, carry

at the start of the second asm() statement, then continue with the subc.cc.u32 instructions to subtract the more significant 32-bit chunks. In other words, to subtract two long integers e:f:g:h from a:b:c:d, where ‘a’ and ‘e’ represent the most significant 32-bit chunks, one would have two pieces of code operating according to the following scheme:

sub.cc.u32  res0, d, h
subc.cc.u32 res1, c, g
subc.u32    carry, 0, 0           // capture carry flag
// ---- end of first asm() statement, start of second ----
sub.cc.u32  bit_bucket, 0, carry  // re-create carry flag
subc.cc.u32 res2, b, f
subc.u32    res3, a, e
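To make the scheme above concrete without a GPU, here is a host-side C++ model of the carry flag used by the subtract instructions. It is a sketch that assumes the PTX semantics where CC.CF acts as a borrow: sub.cc sets it when a < b, and subc computes a - (b + CF). BorrowFlag, sub_cc, subc_cc, subc and sub128_split are hypothetical names that mirror the PTX mnemonics in the comments. The model shows that the captured value is 0 or 0xFFFFFFFF and that 0 - carry restores the flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of CC.CF for subtraction: cf == 1 means "borrow".
struct BorrowFlag { uint32_t cf = 0; };

uint32_t sub_cc(BorrowFlag &flag, uint32_t a, uint32_t b) {      // sub.cc.u32
    flag.cf = (a < b) ? 1u : 0u;
    return a - b;
}
uint32_t subc_cc(BorrowFlag &flag, uint32_t a, uint32_t b) {     // subc.cc.u32
    uint64_t d = (uint64_t)a - b - flag.cf;
    flag.cf = (uint32_t)(d >> 63);
    return (uint32_t)d;
}
uint32_t subc(const BorrowFlag &flag, uint32_t a, uint32_t b) {  // subc.u32
    return a - b - flag.cf;
}

// a:b:c:d - e:f:g:h ('a' and 'e' most significant) in two parts, following
// the scheme above. res[0] is the least significant result word.
void sub128_split(uint32_t res[4], uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                  uint32_t e, uint32_t f, uint32_t g, uint32_t h) {
    uint32_t carry;
    {   // first asm() statement
        BorrowFlag cc;
        res[0] = sub_cc(cc, d, h);
        res[1] = subc_cc(cc, c, g);
        carry = subc(cc, 0, 0);          // capture: 0 or 0xFFFFFFFF, not 0 or 1
    }
    {   // second asm() statement: the flag is not carried over, so re-create it
        BorrowFlag cc;
        (void)sub_cc(cc, 0, carry);      // bit_bucket; sets cf = 1 iff carry != 0
        res[2] = subc_cc(cc, b, f);
        res[3] = subc(cc, a, e);
    }
}
```

Since 0 - carry borrows exactly when carry is nonzero, the all-ones encoding from the capture step is harmless here. Only its zero/nonzero status matters.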

I created a little test app, and what I suggested in terms of carry-flag capture and re-creation seems to work fine. Here is the code I tried:

__global__ void sub128_kernel (uint4 minuend, uint4 subtrahend, uint4 *diff)
{
    uint4 res;
#if 0
    // single asm() statement
    asm ("sub.cc.u32      %0, %4, %8;\n\t"
         "subc.cc.u32     %1, %5, %9;\n\t"
         "subc.cc.u32     %2, %6, %10;\n\t"
         "subc.u32        %3, %7, %11;"
         : "=r"(res.x), "=r"(res.y), "=r"(res.z), "=r"(res.w)
         : "r"(minuend.x), "r"(minuend.y), "r"(minuend.z), "r"(minuend.w),
           "r"(subtrahend.x), "r"(subtrahend.y), "r"(subtrahend.z),
           "r"(subtrahend.w));
#elif 1
    // two asm() statements: carry flag captured, then re-created
    unsigned int carry;
    asm ("sub.cc.u32      %0, %3, %5;\n\t"
         "subc.cc.u32     %1, %4, %6;\n\t"
         "subc.u32        %2,  0,  0;"       // capture carry-flag
         : "=r"(res.x), "=r"(res.y), "=r"(carry)
         : "r"(minuend.x), "r"(minuend.y),
           "r"(subtrahend.x), "r"(subtrahend.y));
    asm ("sub.cc.u32       %0,  0, %0;\n\t"  // recreate carry-flag
         "subc.cc.u32      %1, %3, %5;\n\t"
         "subc.u32         %2, %4, %6;"
         : "+r"(carry), "=r"(res.z), "=r"(res.w)
         : "r"(minuend.z), "r"(minuend.w),
           "r"(subtrahend.z), "r"(subtrahend.w));
#else
    // plain C emulation of the borrow chain
    unsigned int c, c2, t;
    c = (subtrahend.x > minuend.x);       // SUB.cc
    res.x = minuend.x - subtrahend.x;
    t = minuend.y - c;                    // SUBC.cc: subtract incoming borrow
    c2 = t > minuend.y;
    res.y = t - subtrahend.y;
    c = res.y > t;                        // borrow of this step: compare with t
    c = c | c2;
    t = minuend.z - c;                    // SUBC.cc
    c2 = t > minuend.z;
    res.z = t - subtrahend.z;
    c = res.z > t;
    c = c | c2;
    res.w = minuend.w - c;                // SUBC
    res.w = res.w - subtrahend.w;
#endif
    *diff = res;
}
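The thread only covers subtraction, but the title also asks about addition. Assuming the same capture/re-create idea carries over, the counterpart would capture the carry with addc.u32 carry, 0, 0, which yields 0 or 1. It would then re-create the flag with add.cc.u32 bit_bucket, carry, 0xFFFFFFFF, because carry + 0xFFFFFFFF overflows exactly when carry is 1. The sketch below is a host-side C++ model of that idea, not tested on a GPU. CarryFlag, add_cc, addc_cc, addc and add128_split are hypothetical names mirroring the PTX mnemonics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of CC.CF for addition: cf == 1 means "carry out".
struct CarryFlag { uint32_t cf = 0; };

uint32_t add_cc(CarryFlag &flag, uint32_t a, uint32_t b) {       // add.cc.u32
    uint64_t s = (uint64_t)a + b;
    flag.cf = (uint32_t)(s >> 32);
    return (uint32_t)s;
}
uint32_t addc_cc(CarryFlag &flag, uint32_t a, uint32_t b) {      // addc.cc.u32
    uint64_t s = (uint64_t)a + b + flag.cf;
    flag.cf = (uint32_t)(s >> 32);
    return (uint32_t)s;
}
uint32_t addc(const CarryFlag &flag, uint32_t a, uint32_t b) {   // addc.u32
    return a + b + flag.cf;
}

// a:b:c:d + e:f:g:h ('a' and 'e' most significant) in two parts.
// res[0] is the least significant result word.
void add128_split(uint32_t res[4], uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                  uint32_t e, uint32_t f, uint32_t g, uint32_t h) {
    uint32_t carry;
    {   // first asm() statement
        CarryFlag cc;
        res[0] = add_cc(cc, d, h);
        res[1] = addc_cc(cc, c, g);
        carry = addc(cc, 0, 0);                  // capture: 0 or 1
    }
    {   // second asm() statement: re-create the carry flag
        CarryFlag cc;
        (void)add_cc(cc, carry, 0xFFFFFFFFu);    // bit_bucket; cf = 1 iff carry == 1
        res[2] = addc_cc(cc, b, f);
        res[3] = addc(cc, a, e);
    }
}
```

For addition the captured value is already 0 or 1, so the re-creation step adds 0xFFFFFFFF rather than subtracting from zero.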