I’m trying to add together some large integers (on the order of several hundred bits). I was planning to add the lowest 32 bits of each operand first, then the next 32 bits plus the carry from the previous stage, and so on.
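The limb-by-limb scheme described above can be written in plain CUDA C without any carry instruction. This is a minimal sketch, assuming each number is stored as an array of `n` 32-bit limbs, least significant first. The name `add_limbs` is hypothetical. It uses a 64-bit intermediate, so the carry into the next limb is just bit 32 of each partial sum.

```cuda
#include <stdint.h>

// Let the file also compile as plain C/C++ without nvcc (e.g. host-only tests).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: r = a + b for n-limb numbers stored little-endian
// (limb 0 = least significant 32 bits). Returns the final carry (0 or 1).
// r may be the same array as a or b, because each limb is read before it is written.
__host__ __device__ uint32_t add_limbs(uint32_t *r, const uint32_t *a,
                                       const uint32_t *b, int n)
{
    uint32_t carry = 0;
    for (int i = 0; i < n; ++i) {
        // a[i] + b[i] + carry is at most 2^33 - 1, so it fits in 64 bits.
        uint64_t s = (uint64_t)a[i] + b[i] + carry;
        r[i]  = (uint32_t)s;          // low 32 bits are this limb of the sum
        carry = (uint32_t)(s >> 32);  // bit 32 is the carry into the next limb
    }
    return carry;
}
```

Whether nvcc turns this loop into a hardware add-with-carry chain depends on the compiler. Inspecting the generated code (for example with `cuobjdump -sass`) is the way to check.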
My only problem is that I can't find a CUDA C function that performs this add-with-carry step. Does one exist? I see that the PTX ISA has the `addc` instruction to handle this. Is my only option to code this part directly in PTX? From what I can tell, there is no way to “insert” PTX instructions inline with the C code, so there doesn't seem to be an elegant way to use this approach.
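For reference, nvcc does accept inline PTX through `asm()` statements, as described in NVIDIA's "Inline PTX Assembly in CUDA" guide. In PTX, `add.cc` sets the carry flag and `addc.cc` both consumes and sets it. The following is a minimal sketch, assuming a fixed width of four 32-bit limbs (128 bits). The name `add128` is hypothetical. The whole chain is kept inside one `asm` statement so that no compiler-generated instruction can sit between the steps and disturb the carry flag. When compiled for the host, a portable fallback loop runs instead.

```cuda
#include <stdint.h>

// Let the file also compile as plain C/C++ without nvcc (e.g. host-only tests).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: r = a + b for 4 little-endian 32-bit limbs.
// Returns the carry out of the top limb (0 or 1).
__host__ __device__ uint32_t add128(uint32_t r[4], const uint32_t a[4],
                                    const uint32_t b[4])
{
    uint32_t carry;
#ifdef __CUDA_ARCH__
    // Device path: a single inline-PTX statement for the full carry chain.
    asm("add.cc.u32  %0, %5, %9;\n\t"   // limb 0: sets the carry flag
        "addc.cc.u32 %1, %6, %10;\n\t"  // limb 1: adds carry in, sets carry out
        "addc.cc.u32 %2, %7, %11;\n\t"  // limb 2
        "addc.cc.u32 %3, %8, %12;\n\t"  // limb 3
        "addc.u32    %4, 0, 0;"         // 0 + 0 + carry = final carry out
        : "=r"(r[0]), "=r"(r[1]), "=r"(r[2]), "=r"(r[3]), "=r"(carry)
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]), "r"(b[2]), "r"(b[3]));
#else
    // Host path: same result using a 64-bit intermediate per limb.
    carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t s = (uint64_t)a[i] + b[i] + carry;
        r[i]  = (uint32_t)s;
        carry = (uint32_t)(s >> 32);
    }
#endif
    return carry;
}
```

For several hundred bits, the same pattern extends by adding more `addc.cc.u32` lines and operands. Note that operand numbering in the asm string must be updated to match.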